Operational Dynamics
This section:

Blog postings by Operational Dynamics partners and staff


Please note the disclaimer at the bottom of this page.


Fri, 13 May 2005

Whither GUIDs?

I’ve been trying to understand the impact of updating or moving a blog entry within Blosxom.

The RSS 2.0 schema provides two fields for each article that relate to links and identity. First is the <link> field which, reasonably enough, is expected to be a link to an online (presumably HTML) version of the article. The second is the <guid> field, intended to be a globally unique string which can be used by a reader to identify an article.

<guid> has one attribute, isPermaLink, defaulting to true. If true, the field can be interpreted as a usable permanent URL to the story.
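
To make the two fields concrete, here is a minimal sketch (in Python rather than Blosxom's Perl) of how a reader might pull the link and GUID out of an item and apply the isPermaLink default. The item, its example.org URL and the function name read_identity are made up for illustration; real feed readers will differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical item: the URL is invented for illustration only.
ITEM_XML = """<item>
  <title>Whither GUIDs?</title>
  <link>http://example.org/blog/tech/whither-guids.html</link>
  <guid>http://example.org/blog/tech/whither-guids.html</guid>
</item>"""


def read_identity(item_xml):
    """Return (link, guid, guid_is_permalink) for one RSS 2.0 <item>."""
    item = ET.fromstring(item_xml)
    link = item.findtext("link")
    guid_el = item.find("guid")
    if guid_el is None:
        return link, None, False
    # isPermaLink is optional; when it is absent the value is taken as true.
    flag = guid_el.get("isPermaLink", "true").strip().lower()
    guid = (guid_el.text or "").strip()
    return link, guid, flag == "true"
```

With the attribute left out, the GUID is read as a permalink; with isPermaLink="false" it is just an opaque string that happens to be unique.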

If the date changes but the GUID stays the same, my reader marks the article as updated, which is cool. And on a planet, well, it just gets reslotted and the post shows up wherever its new modification time indicates it should be. So updates are all good.

But what happens if I ever want to move entries around to make a more sensible category tree?

Recategorizing entries

Until now I’ve been doing what everyone else seems to do, that is, sticking the link in as the guid and allowing the isPermaLink="true" default to work.

One of the problems of using a hierarchical blog store (i.e. a set of files in a directory hierarchy) is that an entry’s category is defined by the directory it lives in. That’s exactly what you want, but it creates a problem when you’re trying to figure out what a good GUID would be: binding the GUID (and implicitly the permalink) to the file’s name and position in the directory hierarchy means that if you move the file up or down, or rename it (regardless of whether you preserve the modification time), you change the GUID, making readers think the article is new. That’s no good.

Since I want to be able to recategorize postings at some point in the future, the alternative would seem to be changing my rss20 plugin and flavour template to say <guid isPermaLink="false"> and use something actually unique for the globally unique identifier. But what?

Generate persistent and unique IDs

Anyone who has ever done any work in the enterprise application server or core database world has come across the design problem of generating primary keys. There are several approaches to try. The AUTO_INCREMENT or SEQUENCE mechanisms in relational databases are an obvious source of sequential numbers, but this runs into problems when there are multiple redundant masters which have to co-ordinate handing out unique IDs. Blogging is certainly a far simpler case, but we still have to find a way to form GUIDs that will be stable.

The approach that Blosxom takes, as shipped, is to use a permalink of the form /2005/03/08#posting-name (which it serves as a virtual and autogenerated category). The trouble with using this string for the GUID field (even assuming isPermaLink="false") is “what happens when I make an update to the post and the date has changed?”. The whole point is to have a GUID field that survives updates.

Another idea would be to use the inode number, as that would survive a file move. The inode number would actually be perfect except that, unfortunately, I publish my blog by rsyncing my local draft copy to our R&D server where the feed is hosted. rsync regards a moved file as a combination of a delete of the original and an addition of a file in a new location, and a new file means a new inode number server side. (Damn)
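
The inode behaviour is easy to check locally. The following is a small sketch, assuming a POSIX filesystem and a temporary directory; the file and directory names are invented, and the copy-then-delete step only approximates what a delete-plus-add transfer amounts to on the receiving side (it is not rsync itself).

```python
import os
import shutil
import tempfile


def inode_after_rename_and_copy():
    """Return (original, after_rename, after_copy) inode numbers."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "old"))
        os.mkdir(os.path.join(root, "new"))
        src = os.path.join(root, "old", "posting-name.txt")
        with open(src, "w") as f:
            f.write("Title\n\nBody\n")
        original = os.stat(src).st_ino

        # A rename within one filesystem keeps the same inode.
        moved = os.path.join(root, "new", "posting-name.txt")
        os.rename(src, moved)
        after_rename = os.stat(moved).st_ino

        # Create a fresh file with the same content, then drop the source.
        # The fresh file exists alongside the source, so its inode differs.
        copied = os.path.join(root, "old", "posting-name.txt")
        shutil.copy2(moved, copied)
        after_copy = os.stat(copied).st_ino
        os.remove(moved)
        return original, after_rename, after_copy
```

The first two numbers match and the third does not, which is the whole problem: the identifier is only stable as long as nothing ever recreates the file.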

Finally, of course, there are any number of options which involve generating sequential UIDs and storing them in the plugin cache somewhere. That is do-able but is really rather complicated, and would imply some kind of item-to-uid mapping that would then need to be maintained internally, and synced up to the server. Ick.
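
For a sense of what that bookkeeping would look like, here is a rough sketch of a sequential UID store. Everything in it is an assumption for illustration: the JSON file, the function names uid_for and record_move, and the choice to key the mapping on the entry's path. It is not how any Blosxom plugin actually works.

```python
import json
import os


def _load(mapping_path):
    """Read the mapping file, or start a fresh one if it does not exist."""
    if os.path.exists(mapping_path):
        with open(mapping_path) as f:
            return json.load(f)
    return {"next": 1, "uids": {}}


def _save(mapping_path, state):
    with open(mapping_path, "w") as f:
        json.dump(state, f, indent=2, sort_keys=True)


def uid_for(mapping_path, entry_path):
    """Return the UID recorded for entry_path, allocating the next one if new."""
    state = _load(mapping_path)
    if entry_path not in state["uids"]:
        state["uids"][entry_path] = state["next"]
        state["next"] += 1
        _save(mapping_path, state)
    return state["uids"][entry_path]


def record_move(mapping_path, old_path, new_path):
    """Carry an entry's UID over to its new path; must be called on every move."""
    state = _load(mapping_path)
    state["uids"][new_path] = state["uids"].pop(old_path)
    _save(mapping_path, state)
```

Forget to call record_move after recategorizing, or forget to sync the mapping file to the server, and the entry quietly gets a new UID. That is the maintenance burden in a nutshell.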

Keep it simple

At this point, I gave up and decided that posting-name itself would probably cut it. This implies that I’m imposing on myself the requirement that story names be unique across the entire blog, but that’s not so terribly bad. It also means that when I move a file I must not rename it, and I have to preserve its mtime. But that’s easy, because that’s exactly what mv does! And mving makes perfect sense for a Blosxom blog because that’s how you recategorize something.
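
Since uniqueness of story names is now a rule I have to keep myself, a quick check before publishing would not hurt. This is a minimal sketch in Python; the function name duplicate_story_names is hypothetical, and it assumes entries are files ending in .txt somewhere under a single blog root.

```python
import os
from collections import defaultdict


def duplicate_story_names(blog_root, extension=".txt"):
    """Map each story name used more than once to the paths that use it."""
    seen = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(blog_root):
        for filename in filenames:
            name, ext = os.path.splitext(filename)
            if ext == extension:
                full = os.path.join(dirpath, filename)
                seen[name].append(os.path.relpath(full, blog_root))
    # Keep only the names that appear in more than one place.
    return {name: sorted(paths) for name, paths in seen.items() if len(paths) > 1}
```

An empty result means every posting-name is unique across the tree, whatever category it currently sits in.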

So, the RSS entry for this story now contains <guid isPermaLink="false">reverse-understanding-rss20</guid> which is fine. What I wanted was a GUID that would survive category moves. The file’s basename is good enough.
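
In other words, the GUID depends only on the file's basename. A small sketch of that rule in Python (the real change lives in my rss20 flavour, which is not shown here; the function name guid_element and the directory names in the usage note are invented, and a .txt extension is assumed):

```python
import os
from xml.sax.saxutils import escape


def guid_element(entry_path):
    """Build a non-permalink <guid> from the entry's basename, minus extension."""
    name = os.path.splitext(os.path.basename(entry_path))[0]
    return '<guid isPermaLink="false">%s</guid>' % escape(name)
```

Calling guid_element("tech/rss/reverse-understanding-rss20.txt") and guid_element("linux/reverse-understanding-rss20.txt") gives the same element, so recategorizing the entry leaves its identity alone.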

[Yes, there is the obvious problem of stale links floating around out there in people’s readers and on Planets and whatnot, but having decided on unique filenames, a 404 page which goes hunting for the file within the blog tree and redirects to the new location accordingly will be easy to implement]
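
The lookup behind such a 404 page could be as simple as the sketch below. It is an assumption-laden illustration, not something I have deployed: find_new_location is a hypothetical name, entries are assumed to be .txt files, and the new URL is assumed to use an .html flavour under the blog root.

```python
import os


def find_new_location(blog_root, requested_path, extension=".txt"):
    """Return the entry's current URL path, or None if no such story exists."""
    # Reduce the stale URL to the story name, e.g. "/tech/foo.html" -> "foo".
    wanted = os.path.splitext(os.path.basename(requested_path))[0]
    if not wanted:
        return None
    for dirpath, _dirnames, filenames in os.walk(blog_root):
        if wanted + extension in filenames:
            rel_dir = os.path.relpath(dirpath, blog_root)
            parts = [] if rel_dir == "." else rel_dir.split(os.sep)
            return "/" + "/".join(parts + [wanted + ".html"])
    return None
```

The handler would then issue a redirect to the returned path, or fall through to an ordinary 404 when the result is None. This only works because story names are unique across the tree.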

I’ve updated my rss20 flavour to do the guids this way.

AfC

Sorry to anyone whose reader (correctly) interprets this as if all my posts are new entries — that’s what I get for mucking with the guid field. It won’t happen again … well, not until someone tells me a better way to generate a GUID for a Blosxom feed :)




Material on this site copyright © 2005-2008 Operational Dynamics Consulting Pty Ltd, unless otherwise noted. All rights reserved. Not for redistribution or attribution without permission in writing.

We make this service available to our staff in order to promote the discourse of ideas especially as relates to the development of Open Source worldwide. Blog entries on this site, however, are the musings of the authors as individuals and do not represent the views of Operational Dynamics. All times UTC.