Operational Dynamics

The nuts and bolts of getting information online


Fri, 13 May 2005

Whither GUIDs?

I’ve been trying to understand the impact of updating or moving a blog entry within Blosxom.

The RSS 2.0 schema provides two fields for each article that relate to links and identity. First is the <link> field which, reasonably enough, is expected to be a link to an online (presumably HTML) version of the article. The second is the <guid> field, intended to be a globally unique string which can be used by a reader to identify an article.

<guid> has one attribute, isPermaLink, defaulting to true. If true, the field can be interpreted as a usable permanent URL to the story.
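To make the default concrete, here is a minimal sketch of how a reader might interpret the field. The function name `guid_info` is made up for illustration and this is not how any particular feed reader is known to work; it only encodes the rule described above, that a missing isPermaLink attribute counts as true.

```python
import xml.etree.ElementTree as ET

def guid_info(item_xml):
    """Return (guid_text, is_permalink) for one RSS 2.0 <item>.

    Hypothetical helper: returns (None, False) when the item has no <guid>.
    """
    item = ET.fromstring(item_xml)
    guid = item.find("guid")
    if guid is None:
        return None, False
    # isPermaLink defaults to "true" when the attribute is absent.
    is_permalink = guid.get("isPermaLink", "true").strip().lower() == "true"
    return (guid.text or "").strip(), is_permalink
```

So `<guid>http://example.org/a.html</guid>` would be treated as a usable URL, while `<guid isPermaLink="false">abc</guid>` is just an opaque string.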

If the date changes but GUID stays the same, my reader marks the article as updated, which is cool. And on a planet, well, it just gets reslotted and the post shows up wherever its new modification time indicates it should be. So updates are all good.
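As a rough sketch of the behaviour I'm describing (my guess at the logic, not taken from any real reader's source), a reader only needs to remember the last date it saw for each GUID. The names `classify` and `seen` are hypothetical.

```python
def classify(seen, guid, date):
    """Classify a feed item against what the reader has already seen.

    `seen` is a dict mapping GUID -> last known date; it is updated in place.
    """
    if guid not in seen:
        seen[guid] = date
        return "new"
    if seen[guid] != date:
        seen[guid] = date
        return "updated"
    return "unchanged"
```

Same GUID with a different date comes out as "updated"; a GUID the reader has never seen comes out as "new", no matter how familiar the content is.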

But what happens if I ever want to move entries around to make a more sensible category tree?

Recategorizing entries

Until now I’ve been doing what everyone else seems to do, that is, sticking the link in as the guid and allowing the isPermaLink="true" default to work.

One of the problems of using a hierarchical blog store (i.e. a set of files in a directory hierarchy) is that category is defined by what directory you’re in. That’s exactly what you want, but it creates a problem when you’re trying to figure out what a good GUID would be. Binding the GUID (and implicitly the permalink) to the file’s name and position in the directory hierarchy means that if you move the file up or down, or rename it (regardless of whether you preserve the modification time), you change the GUID, making readers think the article is new. That’s no good.
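A small sketch of why a path-derived GUID is fragile. The base URL and the exact URL layout here are assumptions for illustration, not necessarily what a given Blosxom install emits.

```python
from pathlib import PurePosixPath

BASE_URL = "http://blog.example.org"  # hypothetical site address

def permalink_guid(entry_path):
    """Derive a permalink-style GUID from an entry's path within the blog tree."""
    page = PurePosixPath(entry_path).with_suffix(".html")
    return f"{BASE_URL}/{page}"

before = permalink_guid("meta/whither-guids.txt")
after = permalink_guid("tech/rss/whither-guids.txt")  # same post, recategorized
```

`before` and `after` are different strings, so a reader keyed on the GUID sees the recategorized post as a brand new article.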

Since I want to be able to recategorize postings at some point in the future, the alternative would seem to be changing my rss20 plugin and flavour template to say <guid isPermaLink="false"> and use something actually unique for the globally unique identifier. But what?

Generate persistent and unique IDs

Anyone who has ever done any work in the enterprise application server or core database world has come across the design problem of generating primary keys. There are several approaches to try. The AUTO_INCREMENT or SEQUENCE mechanisms in relational databases are an obvious source of sequential numbers, but this runs into problems when there are multiple redundant masters which have to co-ordinate handing out unique IDs. Blogging is certainly a far simpler case, but we still have to find a way to form GUIDs that will be stable.

The approach that Blosxom takes, as shipped, is to use a permalink of the form /2005/03/08#posting-name (which it serves as a virtual and autogenerated category). The trouble with using this string for the GUID field (even assuming isPermaLink="false") is “what happens when I make an update to the post and the date has changed?”. The whole point is to have a GUID field that survives updates.

Another idea would be to use the inode number, as that would survive a file move. Inode number would actually be perfect except that, unfortunately, I publish my blog by rsyncing my local draft copy to our R&D server where the feed is hosted. rsync regards a moved file as a combination of a delete of the original and an addition of a file in a new location, and a new file means a new inode number server side. (Damn)
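Here is a local sketch of the inode behaviour, assuming a POSIX filesystem. It doesn't run rsync at all; the copy step merely stands in for "a new file gets created on the other side", which is the part that hurts. The function name `inode_demo` is made up.

```python
import os
import shutil
import tempfile

def inode_demo():
    """Return (original, after_move, after_copy) inode numbers for one entry."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "meta"))
        os.makedirs(os.path.join(root, "tech"))
        src = os.path.join(root, "meta", "post.txt")
        with open(src, "w") as f:
            f.write("an entry\n")
        original = os.stat(src).st_ino

        # A move within one filesystem is a rename: same inode.
        moved = os.path.join(root, "tech", "post.txt")
        os.rename(src, moved)
        after_move = os.stat(moved).st_ino

        # A freshly created copy is a different file: different inode.
        copied = os.path.join(root, "tech", "post-copy.txt")
        shutil.copy2(moved, copied)
        after_copy = os.stat(copied).st_ino
        return original, after_move, after_copy
```

The first two numbers match and the third doesn't, which is exactly why the inode works as an ID locally and falls apart once the file is recreated on the server.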

Finally, of course, there are any number of options which involve generating sequential UIDs and storing them in the plugin cache somewhere. That is do-able but is really rather complicated, and would imply some kind of item-to-uid mapping that would then need to be maintained internally, and synced up to the server. Ick.

Keep it simple

At this point, I gave up and decided that posting-name itself would probably cut it. This implies that I’m imposing on myself the requirement that story names be unique across the entire blog, but that’s not so terribly bad. And it means that if I do move a file, I don’t rename it and I have to ensure I preserve mtime. But that’s easy, because that’s exactly what mv does! And mving makes perfect sense for a Blosxom blog because that’s how you recategorize something.
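If you want to convince yourself about the mtime part, here is a quick sketch. It uses Python's `shutil.move` as a stand-in for mv; the directory names and the timestamp are arbitrary.

```python
import os
import shutil
import tempfile

def mtime_after_move(mtime=1_000_000_000):
    """Create an entry with a known mtime, move it, and return the mtime afterwards."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "old-category"))
        os.makedirs(os.path.join(root, "new-category"))
        src = os.path.join(root, "old-category", "posting-name.txt")
        with open(src, "w") as f:
            f.write("an entry\n")
        os.utime(src, (mtime, mtime))  # set atime and mtime to a known value

        dst = os.path.join(root, "new-category", "posting-name.txt")
        shutil.move(src, dst)  # a plain rename when both paths share a filesystem
        return os.stat(dst).st_mtime
```

The value that comes back is the one that went in, so the entry keeps its date when it changes category.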

So, the RSS entry for this story now contains <guid isPermaLink="false">reverse-understanding-rss20</guid> which is fine. What I wanted was a GUID that would survive category moves. The file’s basename is good enough.
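In sketch form, the scheme comes down to the following. This is an illustration rather than the code from my rss20 plugin, and the helper names are invented. The second function checks the constraint I've just imposed on myself, that names are unique across the whole blog.

```python
import os
from collections import defaultdict

def basename_guid(path):
    """GUID = the entry's file name without directory or extension."""
    return os.path.splitext(os.path.basename(path))[0]

def find_duplicate_guids(paths):
    """Map each GUID that is used by more than one entry to the offending paths."""
    by_guid = defaultdict(list)
    for path in paths:
        by_guid[basename_guid(path)].append(path)
    return {guid: found for guid, found in by_guid.items() if len(found) > 1}
```

`basename_guid("meta/reverse-understanding-rss20.txt")` and `basename_guid("tech/rss/reverse-understanding-rss20.txt")` give the same string, which is the whole point.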

[Yes, there is the obvious problem of stale links floating around out there in people’s readers and on Planets and whatnot, but having decided on unique filenames, a 404 page which goes hunting for the file within the blog tree and redirects to the new location accordingly will be easy to implement]
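A rough sketch of that 404 lookup, assuming unique file names and assuming the handler is given a list of the entry paths currently in the tree. The function name and the `.html` URL shape are hypothetical; I haven't written the real thing yet.

```python
import posixpath

def redirect_target(requested_url_path, entry_paths):
    """Find where a stale link's entry now lives, or None if it is gone.

    `entry_paths` are paths relative to the blog root, e.g. "tech/rss/foo.txt".
    """
    wanted = posixpath.splitext(posixpath.basename(requested_url_path))[0]
    for entry in entry_paths:
        stem, _ext = posixpath.splitext(entry)
        if posixpath.basename(stem) == wanted:
            return "/" + stem + ".html"
    return None
```

A request for `/meta/whither-guids.html` against a tree that now holds `tech/rss/whither-guids.txt` would be redirected to `/tech/rss/whither-guids.html`.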

I’ve updated my rss20 flavour to do the guids this way.

AfC

Sorry to anyone whose reader (correctly) interprets this as if all my posts are new entries — that’s what I get for mucking with the guid field. It won’t happen again … well, not until someone tells me a better way to generate a GUID for a Blosxom feed :)

Sun, 08 May 2005

Getting Blosxom to work…

So I’ve finally joined the blogosphere. About time. I kept putting it off, being relatively content with the posts I was sending to email lists.

I began to notice that about once a day I found myself writing some really long email (or hammering away at length in some IRC channel) about a topic which I inevitably wanted to refer to again. But forwarded emails are usually a pain, and having to rummage around in your own email folders in order to then hunt down the message in an email archive so as to then send someone a link gets really old in a hurry. So, why not. Might as well blog the stuff up.

Picking a blog tool took a lot of thought. WordPress is awesome, especially the multi-categories feature. But what I really wanted was something that would work off of simple text files that I could work at offline. I tend to spend a lot of time in places with no internet coverage (airplanes, coffee shops by the beach, banana republics, that sort of thing) and so anything that was a web app was really a non-starter. I needed something I could publish by running rsync.

People had recommended pyBlosxom, but I couldn’t quite get it running, and in any case I don’t speak Python so that wasn’t terribly appealing. I’d come across Blosxom of course, and seen it used for a number of blog pages that I’d been following for a while, and on that basis alone it seemed like it might be alright. Blosxom is in Perl so I figured I’d have no problem getting it to behave.

Boy, was I in for a shock. First things first, however…

Text markup

Five or six years ago there was something called Simple Document Format which was blindingly simple to use, and from simple text documents generated man pages and web pages. Yeah, sure, so does LaTeX, but this was text. When wikis appeared on the scene a few years back I wasn’t perhaps as impressed as I might have been, other than “well, it’s about time.” But I quickly noted that the lack of any coherent driver to standardize wiki markup meant that as each person created their own new wiki, they came up with a syntax that they liked. Which sucks for users because we have to learn yet another wiki markup language each time we sign into one.

I came across Markdown, and at last found a text markup tool that I liked. It’s awesome. Writing about the philosophy behind the design, the author, John Gruber, states,

“Markdown is intended to be as easy-to-read and easy-to-write as is feasible. Readability, however, is emphasized above all else… the single biggest source of inspiration for Markdown’s syntax is the format of plain text email.

To this end, Markdown’s syntax is comprised entirely of punctuation characters, which punctuation characters have been carefully chosen so as to look like what they mean. E.g., asterisks around a word actually look like emphasis. Markdown lists look like, well, lists. Even blockquotes look like quoted passages of text, assuming you’ve ever used email.

Markdown is not a replacement for HTML, or even close to it. Its syntax is very small, corresponding only to a very small subset of HTML tags. The idea is not to create a syntax that makes it easier to insert HTML tags. In my opinion, HTML tags are already easy to insert. The idea for Markdown is to make it easy to read, write, and edit prose. HTML is a publishing format; Markdown is a writing format.”

Reminded me of SDF. Combined with Blosxom, which had the premise of being simple and text-file based, I felt sure I’d have my blog up in no time.

Yeah, right. Who was I kidding?

Blosxom Plugins

You quickly realize that to do anything with Blosxom you’ll be using a variety of plugins. That didn’t seem like it would pose a problem as there was a well organized plugin directory to work from.

And then you realize that there are 4 different plugins that all purport to do the same thing. The one you decide to try turns out to have been written over two years ago and hasn’t been touched since. Never a good sign. And then you try to download it, only to discover that the site that it was posted on is long gone.

Sadly, this is not an uncommon story in the Open Source world. Just look at your average search on SourceForge or Freshmeat - let alone the endless shady PHP classes debacle. But I kept coming across people who were using Blosxom as their blog tool, and so I figured it couldn’t be that bad, and persisted.

I ended up using the following plugins:

They all had POD documentation in their files, but it was usually rather sparse, and just getting to the point where you knew whether or not a plugin was even running took quite a while.

In proper Free Software fashion, I finally figured out what was going on when I delved into the source. Yes, I’d read about the difference between head.flavour and story.flavour, but it didn’t quite sink in until I was trying to use the username from the whoami plugin in the blog header, and finally clued in that no, that wasn’t available yet because the $whoami::username variable that whoami populated was based on the individual item, not “the page as a whole” even though it was “my” blog. Fair enough, but like I said, a bit of a learning curve.

Syndication

Blosxom has a MultiViews notion it calls flavours. You ask for index.html, get a web page; ask for index.rss and ta-da you get a different stylesheet view of the blog data that’s an RSS feed. So I figured that getting an RSS feed out of it was going to be a snap.
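The idea, in a very reduced sketch: the extension on the request picks a content type and a template, and the same entry data gets poured into whichever one was asked for. This is not Blosxom's code (Blosxom is Perl and keeps its flavours in template files); the `FLAVOURS` table and `render` function are invented for illustration.

```python
import posixpath
from xml.sax.saxutils import escape

# Hypothetical flavour table: extension -> (content type, story template)
FLAVOURS = {
    "html": ("text/html", "<h1>{title}</h1>"),
    "rss": ("text/xml", "<item><title>{title}</title></item>"),
}

def render(request_path, title, default="html"):
    """Pick a flavour from the requested extension and render one story with it."""
    ext = posixpath.splitext(request_path)[1].lstrip(".") or default
    content_type, template = FLAVOURS[ext]
    return content_type, template.format(title=escape(title))
```

Asking for `index.html` and `index.rss` gives two views of the same story, each with its own content type.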

Except for one thing: the rss flavour built into Blosxom doesn’t validate, at least not without fudging the content type and taking out all the HTML. (props to Chris Deigan for pointing me at that tool).

But hang on - everyone else is using HTML in their feeds. So I thought “Maybe it’s because my feed is RSS 0.91”, and tried upgrading to RSS 1.0. I had a hint something was wrong when I came across these plugins:

Why should I need a plugin to fix the RSS flavour to be 1.0 compliant? Whatever. I grabbed the plugin, figured out how to use it… and it still wouldn’t validate. So I started looking at it, and discovered that RSS 1.0 was nothing even remotely close to 0.9 (it’s full of namespaces on attributes. Yuk!) and that Blosxom’s idea of RSS 1.0 wasn’t remotely close to validating either. So I hacked on the templates, and eventually got an RSS 1.0 feed that passed.

Except for one thing. It didn’t grok HTML in descriptions either. WTF?

So then I tried Atom, had to hack it a touch to validate, and finally I had a valid feed with HTML in it. My feed reader liferea grokked it perfectly as well. Yeay!

The coming of wisdom

At that point, exhausted, I figured that RSS 2.0 would be a further nightmare, but being too punch drunk to know when to stop, I glanced at the spec anyway. And to my amazement, it was Really Simple. Imagine. With all I’d learned, I had a valid feed in no time, creating my own plugin and flavour template.

But wait, you say, embedded XHTML isn’t a valid part of RSS 2.0! Well, you’re right. In the end I finally discovered the technique of wrapping the description content with <![CDATA[ ... ]]>. Slap that around the body text and you escape any and all non-RSS entities. Damn and other comments. If I’d thought of that earlier I could have saved myself a lot of trouble.
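For the record, a small sketch of the CDATA wrapping, written in Python rather than as a Blosxom flavour, with made-up function names. The one wrinkle worth knowing is that the body itself must not contain the CDATA terminator, so the sketch splits any occurrence across two sections.

```python
import xml.etree.ElementTree as ET

def cdata(text):
    """Wrap text in a CDATA section, splitting any embedded terminator."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

def description_element(html):
    """Build an RSS <description> whose HTML body is carried as CDATA."""
    return "<description>" + cdata(html) + "</description>"

def parse_description(xml):
    """What an XML parser hands back to the reader: the original HTML string."""
    return ET.fromstring(xml).text
```

Round-tripping `<p>Hello <em>world</em></p>` through `description_element` and `parse_description` returns the markup unchanged, without any of it being read as part of the feed's own XML.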

Download

You can download my fixes to the atomfeed plugin (it’s self contained), or grab my rss20 plugin and rss20 flavour files, which have been working really well.

Sigh. Look at me. RSS guru. Yup. You know those little boxes on government forms that ask you how long it took to complete the process? … You don’t want to know.

AfC




Material on this site copyright © 2005-2008 Operational Dynamics Consulting Pty Ltd, unless otherwise noted. All rights reserved. Not for redistribution or attribution without permission in writing.

We make this service available to our staff in order to promote the discourse of ideas especially as relates to the development of Open Source worldwide. Blog entries on this site, however, are the musings of the authors as individuals and do not represent the views of Operational Dynamics. All times UTC.