2011
12.20

When I planned an RSS site project I investigated many RSS aggregator options, and ended up with Autoblog. As one of the leading figures of the Hungarian WordPress community I have access to the wpmudev plugins, but I also had previous experience with AutoBlogged and FeedWordPress.

I have made many modifications to the plugin so that it fits my needs as closely as possible.
These are the things I want to show you in this article.

Modify RSS update method

The modification I’ll show prevents every feed update while a user is browsing the pages. Because of the nature of the blog feed processor, a page view can start a feed update, and that is very annoying for the visitor: if you have many feeds, it can take 30, 45, 60 or even 90 seconds.
Meanwhile the user has to wait, wait and wait…

On the other hand, if you cache the pages to avoid this, you will end up with outdated feeds and missed update times. So I inserted this piece of code at the beginning of the autoblog/autoblogpremium.php file.

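// Stop loading the plugin on normal front-end page views;
// it only continues when WordPress is running cron or an admin page.
if(!defined('DOING_CRON') && !is_admin())
{
    return;
}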

Trigger the scheduler

If we only make the modification above we have ruined everything, because the feeds will update quite rarely.
WordPress has its own cron method, which sends a POST request to wp-cron.php from time to time, but only when WordPress code actually runs. It does not run when you serve cached pages to users and nobody browses the admin pages.

That’s why I created a Unix cron job that runs every minute and calls wp-cron.php, or more precisely sends a POST request to it.

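# trigger wp-cron every minute
* * * * * www /home/www/cronrobot.sh

The /home/www/cronrobot.sh script:

#!/bin/sh
# POST an empty body to wp-cron.php; the URL is quoted so the shell does not treat "?" as a wildcard
/usr/bin/wget --post-data '' -O /dev/null "http://domain.com/wp-cron.php?doing_wp_cron=$(date +%s)"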

Handling the load

When you are on shared hosting the load can get too high, and then it may not be the best idea to run a heavy task like rss+xml/rss+atom parsing.
I presume that the only thing on my site that raises the load is the RSS updating, so when the load is higher than a given threshold the RSS fetcher simply stops running until the load falls again.

To support load watching I had to modify the constructor in autoblogincludes/classes/autoblogprocess.php, around line 22.

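    // sys_getloadavg() returns the 1, 5 and 15 minute load averages, or false on failure
    $loadavg = sys_getloadavg();

    if(is_array($loadavg) && $loadavg[0] > AUTOBLOG_HALT_ON_HIGH_LOAD)
    {
        if($this->debug) {
            // record why processing stopped
            $this->errors[] = sprintf(__('Notice: Processing stopped because the load is %s.','autoblogtext'), $loadavg[0]);
        }
        die;
    }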

And I also defined the given AUTOBLOG_HALT_ON_HIGH_LOAD constant at the end of autoblogincludes/includes/conf.php

if(!defined('AUTOBLOG_HALT_ON_HIGH_LOAD')) define( 'AUTOBLOG_HALT_ON_HIGH_LOAD', 4);

So, if the load is higher than 4 (on a single-core machine that is roughly four times more work queued than the CPU can handle), we’ll wait.
If you use the cron setup I explained above, the next attempt will occur one minute later, so it won’t hurt.
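
The check can also be pulled out into a small helper, so that a failed sys_getloadavg() call does not stop processing by accident. This is only a sketch: the function name my_load_is_too_high() and its optional second parameter are made up for illustration and are not part of Autoblog.

```php
<?php
// Hypothetical helper, not part of Autoblog.
// Returns true only when a 1 minute load average is known and above the threshold.
function my_load_is_too_high($threshold, $loadavg = null)
{
    if ($loadavg === null) {
        // sys_getloadavg() is not available on every platform (for example Windows)
        $loadavg = function_exists('sys_getloadavg') ? sys_getloadavg() : false;
    }

    // A failed call gives false; treat that as "load unknown, keep going"
    if (!is_array($loadavg) || !isset($loadavg[0])) {
        return false;
    }

    return $loadavg[0] > $threshold;
}
```

In the constructor the condition would then become if(my_load_is_too_high(AUTOBLOG_HALT_ON_HIGH_LOAD)). The second parameter exists only so the function can be tried out with fixed numbers.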

URL handling

Since I need to know where the original posts came from, and I don’t want to use the postmeta table for performance reasons, I modified the plugin to use the posts table’s guid field for URL storage. This field is varchar(255), so it should be enough in most cases, although there are valid URLs much longer than that.

So I changed the duplicate check in autoblogincludes/classes/autoblogprocess.php

from this code

// We are going to store the permalink for imported posts in a meta field so we don't import duplicates
$results = $this->db->get_row( $this->db->prepare("SELECT post_id FROM {$this->db->postmeta} WHERE meta_key = %s AND meta_value = %s", 'original_source', $item->get_permalink()) );

to this one

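// Look up imported posts by the permalink stored in the guid field, so we don't import duplicates
$results = $this->db->get_row( $this->db->prepare("SELECT ID FROM {$this->db->posts} WHERE guid = %s", $item->get_permalink()) );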

I also had to change how the post is saved for this lookup to work.
Autoblog has some filters and action hooks in the code (and I’ve inserted some more ;)). I used the autoblog_pre_post_insert filter, like this:

    add_filter('autoblog_pre_post_insert','my_pre_post_insert',10,3);

    function my_pre_post_insert($post_data,$ablog,$item)
    {
        $link = $item->get_permalink();
        $post_data['guid'] = trim($link);//just in case

        return $post_data;
    }
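
Because guid is varchar(255), a longer permalink would be truncated or rejected on insert, depending on the database setup, and the duplicate check would never match it. One possible guard, as a sketch only: my_guid_from_link() and my_safe_pre_post_insert() are hypothetical names, and leaving the guid alone for over-long links is my assumption about a reasonable fallback. The side effect is that such items are not protected against being imported twice. It is meant as a replacement for the filter above, not an addition to it.

```php
<?php
// Hypothetical helper, not part of Autoblog or WordPress.
// Returns the trimmed link if it fits into a varchar(255) column, otherwise null.
// strlen() counts bytes, so for multibyte URLs this errs on the safe side.
function my_guid_from_link($link, $max_length = 255)
{
    $link = trim((string) $link);

    if ($link === '' || strlen($link) > $max_length) {
        return null;
    }

    return $link;
}

// Hypothetical filter callback with the same signature as my_pre_post_insert()
function my_safe_pre_post_insert($post_data, $ablog, $item)
{
    $guid = my_guid_from_link($item->get_permalink());

    // Only override the guid when the whole URL can be stored
    if ($guid !== null) {
        $post_data['guid'] = $guid;
    }

    return $post_data;
}

// Register the filter only when running inside WordPress
if (function_exists('add_filter')) {
    add_filter('autoblog_pre_post_insert', 'my_safe_pre_post_insert', 10, 3);
}
```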

More hooks

There are two pretty handy hooks in the code, called autoblog_pre_process_feeds and autoblog_post_process_feeds. The only pity is that these hooks are only available if you start the feed processing manually, so I extended the code to fire them in the automatic processing as well.

I like “autoblog_post_process_feeds” a lot, because I can run checks after all feeds have been processed. For example, I set every post with the status future to published (RSS dates should be given in GMT, but not everybody respects this recommendation, so you can end up with many scheduled posts).
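
As an illustration of that last idea, a callback along these lines could publish everything that ended up scheduled. It is a sketch under assumptions: my_publish_future_posts() is a made-up name, I assume the callback does not need any arguments from the hook, and get_posts() and wp_publish_post() are standard WordPress functions rather than anything Autoblog-specific.

```php
<?php
// Hypothetical callback, not part of Autoblog.
// Publishes every post that WordPress currently has scheduled for the future.
function my_publish_future_posts()
{
    // Ask WordPress for the IDs of all scheduled posts
    $future_ids = get_posts(array(
        'post_type'   => 'post',
        'post_status' => 'future',
        'numberposts' => -1,
        'fields'      => 'ids',
    ));

    foreach ($future_ids as $post_id) {
        // Switch the status from "future" to "publish" right away
        wp_publish_post($post_id);
    }

    return count($future_ids);
}

// Register the callback only when running inside WordPress
if (function_exists('add_action')) {
    add_action('autoblog_post_process_feeds', 'my_publish_future_posts');
}
```

As far as I know wp_publish_post() only changes the status, so these posts keep the future date they arrived with.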

3 comments so far

  1. [...] I’ve explained earlier here I’m working on an RSS based project, and I’ve got a request to rethink the feed [...]

  2. Hey DjZoNe,

    I made a PHP function that fetches data from any blog’s feed like its title, published date, categories etc. and also checks if the feed is updated and fetches the new data. To get the updated feeds I want this program to run every 30 minutes automatically but don’t know how to do this. Will you plzzzzz help me out to resolve this issue? Just need the code and way to run my function automatically after every 30 minutes.

    I have this project to submit and really need your help desperately. Any help would be divine… Just counting on you….

    Thank you.

  3. Hey Sahil, the answer is right here:
    http://djzone.im/2013/01/how-to-create-custom-cron-interval-in-wordpress/