
Facebook RSS import results
Technical · Thursday March 5, 2009 @ 02:16 EST
I'm adding a few log items as a test. My newest entry ("Obama's Magic") was dated February 12; I've now posted two more, one older than that (February 11) and one later on the 12th, to see how the Facebook RSS importer works. Does it import anything newer than its "last seen" date? (Then it'll pick up only the second one.) Does it import anything it hasn't seen, using the RSS GUID or a checksum? (Then it'll get both.) Or does it only import entries dated later than the current date? (Then it'll add neither.) It'll be a revelatory test. (I'm also testing how frequently it checks, although I believe it says every hour. And I just updated one of the entries: will it get the old or the new version, and does it notice updates to entries it has already imported? True, if it does notice updates, it'll be impossible to tell whether it first picked up the old one.) Also: set up a (user) cron job to save some Facebook data; had to add myself to the cron and crontab groups to make it work (and remember that every command needs an explicit path, even /bin/date).
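For the record, here are the three hypotheses written out as a small Python sketch. This is only a model of the guesses above, not Facebook's actual importer; the GUID strings and the times of day are made up for illustration.

```python
from datetime import datetime

# Hypothetical importer state; GUIDs and times of day are invented.
LAST_SEEN = datetime(2009, 2, 12, 12, 0)   # date of "Obama's Magic"
NOW = datetime(2009, 3, 5, 2, 16)          # when the test entries were posted
SEEN_GUIDS = {"obamas-magic"}              # already imported

ENTRIES = [
    {"guid": "obamas-magic", "date": LAST_SEEN},
    {"guid": "older", "date": datetime(2009, 2, 11, 12, 0)},
    {"guid": "later", "date": datetime(2009, 2, 12, 18, 0)},
]

def newer_than_last_seen(entries):
    """Hypothesis 1: import anything dated after the newest entry seen so far."""
    return [e["guid"] for e in entries if e["date"] > LAST_SEEN]

def not_seen_before(entries):
    """Hypothesis 2: import anything whose GUID has not been imported yet."""
    return [e["guid"] for e in entries if e["guid"] not in SEEN_GUIDS]

def dated_after_now(entries):
    """Hypothesis 3: import only entries dated later than the current date."""
    return [e["guid"] for e in entries if e["date"] > NOW]

for strategy in (newer_than_last_seen, not_seen_before, dated_after_now):
    print(strategy.__name__, strategy(ENTRIES))
```

Each strategy gives a different answer for the two backdated entries, which is what makes the test able to tell them apart.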
… And the results are in! From my (Apache access) logs, Facebook (out.nnn.01.snc1.facebook.com) fetched index.rss (which runs my script, using XML::RSS, my pH::Journal module, and a bit of glue) at 0208, and both new entries showed up perhaps a minute later (sorted correctly). So it imports anything it hasn't seen, whatever the date. It fetches the RSS feed about every two hours (minimum 1:40, maximum 2:54, so far this month; found using a one-line perl program and the excellent DateTime modules). Since I had changed the output a little (added categories), I could tell that no, they don't update existing entries. Well done, Facebook, although I wish the behavior were better documented (but since most people don't program their own blog and RSS feed, and only post current entries, it's understandable that they don't document these implementation details). The importer does tend to mangle formatting a little, though.
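The fetch-interval figures came from a perl one-liner that isn't reproduced here; the same idea can be sketched in Python. This assumes the access log is in Apache's common/combined format with hostnames in the first field. The sample lines, the `/index.rss` path, and the function name `fetch_gaps` are made up for illustration and are not my real log data.

```python
import re
from datetime import datetime

# Host, timestamp, and request path from a common/combined-format log line.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "GET (\S+)')

def fetch_gaps(lines, host_suffix=".facebook.com", path="/index.rss"):
    """Return the time gaps between successive fetches of `path` by matching hosts."""
    times = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group(1).endswith(host_suffix) and m.group(3) == path:
            # Apache timestamps look like 05/Mar/2009:02:08:00 -0500
            times.append(datetime.strptime(m.group(2), "%d/%b/%Y:%H:%M:%S %z"))
    times.sort()
    return [later - earlier for earlier, later in zip(times, times[1:])]

# Invented sample lines, not taken from the real log.
SAMPLE = [
    'out.nnn.01.snc1.facebook.com - - [05/Mar/2009:00:10:00 -0500] "GET /index.rss HTTP/1.1" 200 5120',
    'out.nnn.01.snc1.facebook.com - - [05/Mar/2009:02:08:00 -0500] "GET /index.rss HTTP/1.1" 200 5300',
    'other.example.com - - [05/Mar/2009:02:30:00 -0500] "GET /index.rss HTTP/1.1" 200 5300',
    'out.nnn.01.snc1.facebook.com - - [05/Mar/2009:04:00:00 -0500] "GET /index.rss HTTP/1.1" 200 5300',
]

gaps = fetch_gaps(SAMPLE)
print(min(gaps), max(gaps))  # prints: 1:52:00 1:58:00
```

To run it against a real log, pass the open file in place of `SAMPLE`, since the function only needs something that yields lines.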
Books finished: The Minority Report.