Fri, 13 May 2005
Whither GUIDs?
I’ve been trying to understand the impact of updating or moving a blog entry within Blosxom.
The RSS 2.0 schema provides two fields for each article
that relate to links and identity.
First is the <link> field which,
reasonably enough, is expected to be a link to an online (presumably
HTML) version of the article. The second is the <guid> field,
intended to be a globally unique string which can be used by a reader
to identify an article.
<guid> has one attribute, isPermaLink, defaulting to true. If true, the field
can be interpreted as a usable permanent URL to the story.
If the date changes but the GUID stays the same, my reader marks the article as updated, which is cool. And on a planet, well, it just gets reslotted and the post shows up wherever its new modification time indicates it should be. So updates are all good.
But what happens if I ever want to move entries around to make a more sensible category tree?
Recategorizing entries
Until now I’ve been doing what everyone else seems to do, that is,
sticking the link in as the guid and allowing the
isPermaLink="true" default to work.
One of the problems of using a hierarchical blog store (i.e. a set of
files in a directory hierarchy) is that category is defined by what
directory you’re in. That’s exactly what you want, but it creates a
problem when you’re trying to figure out what a good GUID would be:
binding the GUID (and implicitly the permalink) to the file’s name and
position in the directory hierarchy means that if you move the file up
or down, or rename it (regardless of whether you preserve the
modification time), you change the guid, making readers think the
article is new. That’s no good.
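To make the problem concrete, here is a minimal Python sketch (Blosxom itself is written in Perl, and this is not its code). The base URL, the `permalink_guid` helper and the `tech/rss` category are all hypothetical, chosen only for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical base URL, for illustration only.
BASE_URL = "http://example.org/blogs"


def permalink_guid(entry: str) -> str:
    """Build a GUID from the entry's category path and filename (extension dropped)."""
    return f"{BASE_URL}/{PurePosixPath(entry).with_suffix('')}"


# The same story, before and after a (hypothetical) recategorization.
before = permalink_guid("meta/blosxom/reverse-understanding-rss20.txt")
after = permalink_guid("tech/rss/reverse-understanding-rss20.txt")

# The GUID follows the directory, so the move produces a different identifier.
guid_changed = before != after
```

Nothing about the story changed except its directory, yet a reader comparing GUIDs would treat it as a brand new item.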
Since I want to be able to recategorize postings at some point in the
future, the alternative would seem to be changing my rss20
plugin and flavour template to say
<guid isPermaLink="false"> and use something actually unique for the
globally unique identifier. But what?
Generate persistent and unique IDs
Anyone who has ever done any work in the enterprise application server or core database world has come across the design problem of generating primary keys. There are several approaches to try. The AUTO_INCREMENT or SEQUENCE mechanisms in relational databases are an obvious source of sequential numbers, but this runs into problems when there are multiple redundant masters which have to coordinate handing out unique IDs. Blogging is certainly a far simpler case, but we still have to find a way to form GUIDs that will be stable.
The approach that Blosxom takes, as shipped, is to use a permalink of
the form /2005/03/08#posting-name (which it serves as a virtual and
autogenerated category). The trouble with using this string for the
GUID field (even assuming isPermaLink="false") is “what happens when
I make an update to the post and the date has changed?”. The whole
point is to have a GUID field that survives updates.
Another idea would be to use the inode number as that would survive a
file move. Inode number would actually be perfect except that,
unfortunately, I publish my blog by rsyncing my local draft copy to
our R&D server where the feed is hosted. rsync regards a moved file
as a combination of a delete of the original and an addition of a
file in a new location, and a new file means a new inode number server
side. (Damn)
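The difference between a local move and a delete-plus-add can be sketched in Python. This is only an illustration under stated assumptions: it runs in a temporary directory on one POSIX-style filesystem, the `demo` helper and the file names are hypothetical, and a plain copy stands in for what arrives on the server (it is not rsync itself).

```python
import os
import shutil
import tempfile


def inode_of(path: str) -> int:
    """Return the inode number the filesystem reports for path."""
    return os.stat(path).st_ino


def demo():
    """Return (inode survives rename, inode survives copy-then-delete)."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "old"))
        os.mkdir(os.path.join(root, "new"))
        src = os.path.join(root, "old", "post.txt")
        with open(src, "w") as f:
            f.write("hello\n")
        original = inode_of(src)

        # A local move within one filesystem is a rename: same inode.
        moved = os.path.join(root, "new", "post.txt")
        os.rename(src, moved)
        survives_rename = inode_of(moved) == original

        # A delete-plus-add creates a new file first, then removes the old one.
        added = os.path.join(root, "old", "post-again.txt")
        shutil.copy2(moved, added)
        survives_copy = inode_of(added) == original
        os.remove(moved)

        return survives_rename, survives_copy
```

On a typical local filesystem the first value should come back true and the second false, which is the situation on the server after an rsync.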
Finally, of course, there are any number of options which involve generating sequential UIDs and storing them in the plugin cache somewhere. That is doable but really rather complicated, and would imply some kind of item-to-UID mapping that would then need to be maintained internally and synced up to the server. Ick.
Keep it simple
At this point, I gave up and decided that posting-name itself would
probably cut it. This implies that I’m imposing on myself the
requirement that story names be unique across the entire blog, but
that’s not so terribly bad. And it means that if I do move a file, I
don’t rename it and I have to ensure I preserve mtime. But that’s
easy, because that’s exactly what mv does! And mving makes perfect
sense for a Blosxom blog because that’s how you recategorize something.
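The claim that a move leaves the modification time alone is easy to check. A minimal Python sketch, assuming a temporary directory on a single filesystem; the `mtime_after_move` helper and the `newcat` directory are hypothetical, and `shutil.move` stands in for the shell's mv.

```python
import os
import shutil
import tempfile


def mtime_after_move(stamp: float) -> float:
    """Create a file with mtime `stamp`, move it to another directory,
    and return the mtime observed at the new location."""
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "entry.txt")
        with open(src, "w") as f:
            f.write("a story\n")
        # Pin both access and modification time to a known value.
        os.utime(src, (stamp, stamp))

        new_category = os.path.join(root, "newcat")
        os.mkdir(new_category)
        dst = os.path.join(new_category, "entry.txt")
        shutil.move(src, dst)

        return os.stat(dst).st_mtime


# 2005-05-13 00:00:00 UTC as seconds since the epoch.
posted = 1115942400.0
unchanged = mtime_after_move(posted) == posted
```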
So, the RSS entry for this story now contains
<guid isPermaLink="false">reverse-understanding-rss20</guid>
which is fine. What I wanted was a GUID that would survive category
moves. The file’s basename is good enough.
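As a rough Python sketch of the idea (not the actual rss20 plugin or flavour template, which are not shown here): derive the guid from the basename alone, and check that names really are unique across the tree. The function names `guid_element` and `duplicate_names` are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from xml.sax.saxutils import escape


def guid_element(entry_path: str) -> str:
    """Render a <guid> element from the entry's basename, ignoring its category."""
    name = PurePosixPath(entry_path).stem
    return f'<guid isPermaLink="false">{escape(name)}</guid>'


def duplicate_names(entry_paths):
    """Map each basename used by more than one entry to the paths that share it."""
    seen = defaultdict(list)
    for path in entry_paths:
        seen[PurePosixPath(path).stem].append(path)
    return {name: paths for name, paths in seen.items() if len(paths) > 1}


# The guid is identical wherever the file lives (tech/rss is a made-up category).
same = (guid_element("meta/blosxom/reverse-understanding-rss20.txt")
        == guid_element("tech/rss/reverse-understanding-rss20.txt"))
```

Running something like `duplicate_names` over the whole entry list before publishing would catch the one way this scheme can go wrong: two stories with the same filename in different categories.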
[Yes, there is the obvious problem of stale links floating around out there in people’s readers and on Planets and whatnot, but having decided on unique filenames, a 404 page which goes hunting for the file within the blog tree and redirects to the new location accordingly will be easy to implement]
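The lookup half of such a 404 page might look something like this in Python. It is a sketch only: `locate_entry` is a hypothetical name, the `txt` extension is an assumption about how the entries are stored, and issuing the actual HTTP redirect is left to whatever serves the blog.

```python
import os
from pathlib import Path
from typing import Optional


def locate_entry(datadir: str, posting_name: str, extension: str = "txt") -> Optional[str]:
    """Search the blog tree for posting_name and return its current
    category path (e.g. "/tech/rss/some-story"), or None if not found."""
    target = f"{posting_name}.{extension}"
    for dirpath, _dirnames, filenames in os.walk(datadir):
        if target in filenames:
            rel = os.path.relpath(os.path.join(dirpath, target), datadir)
            # Drop the extension and normalize to URL-style separators.
            return "/" + Path(rel).with_suffix("").as_posix()
    return None
```

Because filenames are unique across the blog, the first match is the only match, so the handler can redirect to it without having to choose between candidates.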
I’ve updated my rss20 flavour to do the guids this way.
AfC
Sorry to anyone whose reader (correctly) interprets this as if all my
posts are new entries — that’s what I get for mucking with the guid
field. It won’t happen again … well, not until someone tells me a better way
to generate a GUID for a Blosxom feed :)

