This section:
The nuts and bolts of getting information online
Fri, 13 May 2005
Whither GUIDs?
I’ve been trying to understand the impact of updating or moving a blog entry within Blosxom.
The RSS 2.0 schema provides two fields for each article
that relate to links and identity.
First is the <link> field which,
reasonably enough, is expected to be a link to an online (presumably
HTML) version of the article. The second is the <guid> field,
intended to be a globally unique string which can be used by a reader
to identify an article.
<guid> has one attribute, isPermaLink, defaulting to true. If true, the field
can be interpreted as a usable permanent URL to the story.
If the date changes but the GUID stays the same, my reader marks the article as updated, which is cool. And on a planet, well, it just gets reslotted and the post shows up wherever its new modification time indicates it should be. So updates are all good.
But what happens if I ever want to move entries around to make a more sensible category tree?
Recategorizing entries
Until now I’ve been doing what everyone else seems to do, that is,
sticking the link in as the guid and allowing the
isPermaLink="true" default to work.
One of the problems of using a hierarchical blog store (i.e. a set of
files in a directory hierarchy) is that category is defined by what
directory you’re in. That’s exactly what you want, but it creates a
problem when you’re trying to figure out what a good GUID would be:
binding the GUID (and implicitly the permalink) to the file’s name and
position in the directory hierarchy means that if you move the file up
or down, or rename it (regardless of whether you preserve the
modification time), you change the guid, making readers think the
article is new. That’s no good.
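To make the problem concrete, here is a minimal Python sketch (not Blosxom code; the base URL, the category paths and the `permalink_for` helper are all made up for illustration) of a GUID derived from the file's position in the tree. Move the file and the GUID moves with it.

```python
from pathlib import PurePosixPath

# Hypothetical base URL, for illustration only.
BASE_URL = "http://example.org/blog"

def permalink_for(entry_path: str) -> str:
    """Derive a permalink (and hence a guid) from where the entry file sits."""
    p = PurePosixPath(entry_path)
    # The category is the directory; the story name is the file name minus ".txt".
    return f"{BASE_URL}/{p.parent.as_posix()}/{p.stem}.html"

# The same article before and after being recategorized.
before = permalink_for("tech/rss/whither-guids.txt")
after = permalink_for("internet/syndication/whither-guids.txt")

print(before)
print(after)
# Same article, different guid: a reader will treat it as a new item.
print(before == after)
```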
Since I want to be able to recategorize postings at some point in the
future, the alternative would seem to be changing my rss20
plugin and flavour template to say
<guid isPermaLink="false"> and use something actually unique for the
globally unique identifier. But what?
Generate persistent and unique IDs
Anyone who has ever done any work in the enterprise application server or core database world has come across the design problem of generating primary keys. There are several approaches to try. The AUTO_INCREMENT or SEQUENCE mechanisms in relational databases are an obvious source of sequential numbers, but this runs into problems when there are multiply redundant masters which have to co-ordinate handing out unique IDs. Blogging is certainly a far simpler case, but we still have to find a way to form GUIDs that will be stable.
The approach that Blosxom takes, as shipped, is to use a permalink of
the form /2005/03/08#posting-name (which it serves as a virtual and
autogenerated category). The trouble with using this string for the
GUID field (even assuming isPermaLink="false") is: what happens when
I make an update to the post and the date changes? The whole
point is to have a GUID field that survives updates.
Another idea would be to use the inode number as that would survive a
file move. Inode number would actually be perfect except that,
unfortunately, I publish my blog by rsyncing my local draft copy to
our R&D server where the feed is hosted. rsync regards a moved file
as a combination of a delete of the original and an addition of a
file in new location, and a new file means a new inode number server
side. (Damn)
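The inode behaviour is easy to see locally. The sketch below is only an approximation: it does not run rsync at all, it just imitates "write a fresh file at the new location, delete the old one" with a copy, and compares that against a plain rename on the same filesystem. The directory and file names are invented.

```python
import os
import shutil
import tempfile

def inode_of(path: str) -> int:
    """Return the inode number the filesystem assigned to path."""
    return os.stat(path).st_ino

with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "old"))
    os.mkdir(os.path.join(root, "new"))
    src = os.path.join(root, "old", "story.txt")
    with open(src, "w") as f:
        f.write("Title\nBody\n")
    original = inode_of(src)

    # A local move within one filesystem is a rename: the inode is kept.
    moved = os.path.join(root, "new", "story.txt")
    os.rename(src, moved)
    inode_after_move = inode_of(moved)

    # Imitating the server side of a sync: a brand new file is written
    # while the old one still exists, so it must get a different inode.
    copied = os.path.join(root, "new", "story-synced.txt")
    shutil.copy2(moved, copied)
    inode_after_copy = inode_of(copied)
    os.remove(moved)

print("rename kept inode:", inode_after_move == original)
print("copy kept inode:  ", inode_after_copy == original)
```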
Finally, of course, there are any number of options which involve generating sequential UIDs and storing them in the plugin cache somewhere. That is do-able but is really rather complicated, and would imply some kind of item-to-uid mapping that would then need to be maintained internally, and synced up to the server. Ick.
Keep it simple
At this point, I gave up and decided that posting-name itself would
probably cut it. This implies that I’m imposing on myself the
requirement that story names be unique across the entire blog, but
that’s not so terribly bad. And it means that if I do move a file, I
don’t rename it and I have to ensure I preserve mtime. But that’s
easy, because that’s exactly what mv does! And mving makes perfect
sense for a Blosxom blog because that’s how you recategorize something.
So, the RSS entry for this story now contains
<guid isPermaLink="false">reverse-understanding-rss20</guid>
which is fine. What I wanted was a GUID that would survive category
moves. The file’s basename is good enough.
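As a rough check of that reasoning, here is a Python sketch (my rss20 plugin is Perl; this is not it). It assumes entries are `.txt` files, and the category names and the `guid_element` helper are hypothetical. It builds the guid from the basename and confirms that a move within one filesystem leaves both the guid and the mtime alone.

```python
import os
import shutil
import tempfile
from xml.sax.saxutils import escape

def guid_element(entry_path: str) -> str:
    """Build a non-permalink <guid> from the entry's basename, minus extension."""
    name = os.path.splitext(os.path.basename(entry_path))[0]
    return f'<guid isPermaLink="false">{escape(name)}</guid>'

with tempfile.TemporaryDirectory() as root:
    old_dir = os.path.join(root, "tech")
    new_dir = os.path.join(root, "internet", "syndication")
    os.makedirs(old_dir)
    os.makedirs(new_dir)

    old_path = os.path.join(old_dir, "reverse-understanding-rss20.txt")
    with open(old_path, "w") as f:
        f.write("Whither GUIDs?\nBody text\n")
    # Backdate the file (arbitrary timestamp) so the mtime check means something.
    os.utime(old_path, (1_000_000_000, 1_000_000_000))

    guid_before = guid_element(old_path)
    mtime_before = os.stat(old_path).st_mtime

    # Recategorize: on one filesystem this is a rename, just like `mv`.
    new_path = shutil.move(old_path, new_dir)

    guid_after = guid_element(new_path)
    mtime_after = os.stat(new_path).st_mtime

print(guid_after)
print("guid unchanged: ", guid_before == guid_after)
print("mtime unchanged:", mtime_before == mtime_after)
```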
[Yes, there is the obvious problem of stale links floating around out there in people’s readers and on Planets and whatnot, but having decided on unique filenames, a 404 page which goes hunting for the file within the blog tree and redirects to the new location accordingly will be easy to implement]
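The hunting part of such a 404 handler could look something like this. It is a sketch only, in Python rather than as a Blosxom plugin, and it assumes `.txt` entry files, `.html` story URLs and blog-wide unique story names; `find_entry` and `redirect_target` are invented names.

```python
import os
import tempfile
from typing import Optional

def find_entry(blog_root: str, story_name: str, ext: str = ".txt") -> Optional[str]:
    """Walk the blog tree and return the current URL path of the named story."""
    wanted = story_name + ext
    for dirpath, _dirs, files in os.walk(blog_root):
        if wanted in files:
            category = os.path.relpath(dirpath, blog_root)
            parts = [] if category == "." else category.split(os.sep)
            return "/" + "/".join(parts + [story_name]) + ".html"
    return None  # genuinely gone: serve a real 404

def redirect_target(blog_root: str, stale_url_path: str) -> Optional[str]:
    """Map a stale link to the story's new location, if the story still exists."""
    last_segment = stale_url_path.rstrip("/").rsplit("/", 1)[-1]
    story_name = os.path.splitext(last_segment)[0]
    return find_entry(blog_root, story_name) if story_name else None

# Small demonstration against a throwaway blog tree.
with tempfile.TemporaryDirectory() as root:
    new_home = os.path.join(root, "internet", "syndication")
    os.makedirs(new_home)
    with open(os.path.join(new_home, "whither-guids.txt"), "w") as f:
        f.write("Whither GUIDs?\n")

    found = redirect_target(root, "/tech/whither-guids.html")
    missing = redirect_target(root, "/tech/no-such-story.html")

print(found)
print(missing)
```

Walking the whole tree on every miss is fine for a small blog; anything bigger would want the name-to-path lookup cached.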
I’ve updated my rss20 flavour to do the guids this way.
AfC
Sorry to anyone whose reader (correctly) interprets this as if all my
posts are new entries — that’s what I get for mucking with the guid
field. It won’t happen again … well, not until someone tells me a better way
to generate a GUID for a Blosxom feed :)
Sun, 08 May 2005
Getting Blosxom to work…
So I’ve finally joined the blogosphere. About time. I kept putting it off, being relatively content with the posts I was sending to email lists.
I began to notice that about once a day I found myself writing some really long email (or hammering away at length in some IRC channel) about a topic which I inevitably wanted to refer to again. But forwarded emails are usually a pain, and having to rummage around in your own email folders in order to then hunt down the message in an email archive so as to then send someone a link gets really old in a hurry. So, why not. Might as well blog the stuff up.
Picking a blog tool took a lot of thought. WordPress is awesome, especially the multi-categories feature. But what I really wanted was something that would work off of simple text files that I could work at offline. I tend to spend a lot of time in places with no internet coverage (airplanes, coffee shops by the beach, banana republics, that sort of thing) and so anything that was a web app was really a non-starter. I needed something I could publish by running rsync.
People had recommended pyBlosxom, but I couldn’t quite get it running, and in any case I don’t speak Python so that wasn’t terribly appealing. I’d come across Blosxom of course, and seen it used for a number of blog pages that I’d been following for a while, and on that basis alone it seemed like it might be alright. Blosxom is in Perl so I figured I’d have no problem getting it to behave.
Boy, was I in for a shock. First things first, however…
Text markup
Five or six years ago there was something called Simple Document Format which was blindingly simple to use, and from simple text documents generated man pages and web pages. Yeah, sure, so does LaTeX, but this was text. When wikis appeared on the scene a few years back I wasn’t perhaps as impressed as I might have been, other than “well, it’s about time.” But I quickly noted that the lack of any coherent driver to standardize wiki markup meant that as each person created their own new wiki, they came up with a syntax that they liked. Which sucks for users because we have to learn yet another wiki markup language each time we sign into one.
I came across Markdown, and at last found a text markup tool that I liked. It’s awesome. Writing about the philosophy behind the design, the author, John Gruber, states,
“Markdown is intended to be as easy-to-read and easy-to-write as is feasible. Readability, however, is emphasized above all else… the single biggest source of inspiration for Markdown’s syntax is the format of plain text email.
To this end, Markdown’s syntax is comprised entirely of punctuation characters, which punctuation characters have been carefully chosen so as to look like what they mean. E.g., asterisks around a word actually look like emphasis. Markdown lists look like, well, lists. Even blockquotes look like quoted passages of text, assuming you’ve ever used email.
Markdown is not a replacement for HTML, or even close to it. Its syntax is very small, corresponding only to a very small subset of HTML tags. The idea is not to create a syntax that makes it easier to insert HTML tags. In my opinion, HTML tags are already easy to insert. The idea for Markdown is to make it easy to read, write, and edit prose. HTML is a publishing format; Markdown is a writing format.”
Reminded me of SDF. Combined with Blosxom, which had the premise of being simple and text file based, I felt sure I’d have my blog up in no time.
Yeah, right. Who was I kidding?
Blosxom Plugins
You quickly realize that to do anything with Blosxom you’ll be using a variety of plugins. That didn’t seem like it would pose a problem as there was a well organized plugin directory to work from.
And then you realize that there are 4 different plugins that all purport to do the same thing. The one you decide to try turns out to have been written over two years ago, and hasn’t been touched since. Never a good sign. And then you try to download it, only to discover that the site that it was posted on is long gone.
Sadly, this is not an uncommon story in the Open Source world. Just look at your average search on SourceForge or Freshmeat - let alone the endless shady PHP classes debacle. But I kept coming across people who were using Blosxom as their blog tool, and so I figured it couldn’t be that bad, and persisted.
I ended up using the following plugins:
- breadcrumbs
- config
- whoami
- calendar
- file
- page_titler
They all had POD documentation in their files, but it was usually rather sparse, and just getting to the point where you knew whether or not a plugin was even running took quite a while.
In proper Free Software fashion, I finally figured out what was going
on when I delved into the source. Yes I’d read about the difference
between head.flavour and story.flavour, but it didn’t quite sink in
until I was trying to use the username from the whoami plugin in the
blog header, and finally clued in that no, that wasn’t available yet
because the $whoami::username variable that whoami populated was
based on the individual item, not “the page as a whole” even
though it was “my” blog. Fair enough, but like I said, a bit of a
learning curve.
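A toy model of that ordering, in Python. This is a deliberately simplified picture and not how Blosxom is actually implemented; the templates, the `render` function and the user name are all invented. The point is only that the head gets rendered before any story has been looked at, so a per-story variable has nothing in it yet.

```python
from string import Template

# Hypothetical flavour templates, standing in for head.html and story.html.
HEAD = Template("<h1>Blog by $username</h1>")
STORY = Template("<h2>$title</h2><p>posted by $username</p>")

def render(stories):
    """Render the head once, then each story, the way a flavour is laid out."""
    variables = {}  # nothing per-story is known yet
    # The head is rendered first. safe_substitute leaves unknown names in place;
    # a real template engine might print an empty string instead.
    out = [HEAD.safe_substitute(variables)]
    for story in stories:
        # A whoami-like step: per-story variables are set just before each story.
        variables = {"title": story["title"], "username": story["owner"]}
        out.append(STORY.safe_substitute(variables))
    return out

page = render([{"title": "Whither GUIDs?", "owner": "someuser"}])
for chunk in page:
    print(chunk)
```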
Syndication
Blosxom has a MultiViews notion it calls flavours. You ask for
index.html, get a web page; ask for index.rss and ta-da you get a
different stylesheet view of the blog data that’s an RSS
feed. So I figured that getting an RSS feed out of it was going to be a
snap.
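The flavour idea boils down to "the extension on the URL picks the templates". A minimal sketch of that dispatch, in Python, with made-up templates and content types rather than anything shipped with Blosxom:

```python
import os
from xml.sax.saxutils import escape

# Hypothetical flavours: a content type plus a per-story template for each.
FLAVOURS = {
    "html": ("text/html", "<h2>{title}</h2>\n"),
    "rss": ("text/xml", "<item><title>{title}</title></item>\n"),
}

def render(url_path, titles, default_flavour="html"):
    """Pick the flavour from the requested extension and render every story with it."""
    extension = os.path.splitext(url_path)[1].lstrip(".")
    flavour = extension or default_flavour
    content_type, story_template = FLAVOURS[flavour]
    body = "".join(story_template.format(title=escape(t)) for t in titles)
    return content_type, body

titles = ["Whither GUIDs?", "Getting Blosxom to work"]
html_type, html_body = render("/index.html", titles)
rss_type, rss_body = render("/index.rss", titles)

print(html_type)
print(html_body)
print(rss_type)
print(rss_body)
```

Same data, two views; only the templates differ.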
Except for one thing: the rss flavour built into Blosxom doesn’t validate, at least not without fudging the content type and taking out all the HTML (props to Chris Deigan for pointing me at the validator).
But hang on - everyone else is using HTML in their feeds. So I thought “Maybe it’s because my feed is RSS 0.91”, and tried upgrading to RSS 1.0. I had a hint something was wrong when I came across these plugins:
- rss10
- atomfeed
Why should I need a plugin to fix the RSS flavour to be 1.0 compliant? Whatever. I grabbed the plugin, figured out how to use it… and it still wouldn’t validate. So I started looking at it, and discovered that RSS 1.0 was nothing even remotely close to 0.9 (it’s full of namespaces on attributes. Yuk!) and that Blosxom’s idea of RSS 1.0 wasn’t remotely close to validating either. So I hacked on the templates, and eventually got an RSS 1.0 feed that passed.
Except for one thing. It didn’t grok HTML in descriptions either. WTF?
So then I tried Atom, had to hack it a touch to validate, and finally I had a valid feed with HTML in it. My feed reader liferea grokked it perfectly as well. Yeay!
The coming of wisdom
At that point, exhausted, I figured that RSS 2.0 would be a further nightmare, but being too punch drunk to know when to stop, I glanced at the spec anyway. And to my amazement, it was Really Simple. Imagine. With all I’d learned, I had a valid feed in no time, creating my own rss20 plugin and flavour template.
But wait, you say, embedded XHTML isn’t a valid part of RSS 2.0! Well,
you’re right. In the end I finally discovered the technique of wrapping
the description content with <![CDATA[ ... ]]>. Slap that around
the body text and you escape any and all non RSS entities. Damn and
other comments. If I’d thought of that earlier I could have saved
myself a lot of trouble.
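For the record, a CDATA section in XML opens with `<![CDATA[` and closes with `]]>`. Here is a small Python sketch of the wrapping (the `cdata` helper and the sample item are invented, and this is not the flavour template itself). The one catch is body text that itself contains `]]>`, which would end the section early, so the helper splits it across two sections.

```python
import xml.etree.ElementTree as ET

def cdata(text: str) -> str:
    """Wrap text in a CDATA section so markup inside it is not parsed as XML."""
    # "]]>" would close the section early; split it across two sections.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

body = "<p>Hello <em>world</em> &amp; friends</p>"
item = f"<item><title>Example</title><description>{cdata(body)}</description></item>"

# An XML parser hands the HTML back as plain text, untouched.
parsed = ET.fromstring(item)
print(parsed.find("description").text == body)

# The awkward case: the body contains the CDATA terminator itself.
tricky = ET.fromstring("<description>" + cdata("a]]>b") + "</description>")
print(tricky.text == "a]]>b")
```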
Download
You can download my fixes to the atomfeed plugin (it’s self contained), or grab my rss20 plugin and rss20 flavour files, which have been working really well.
Sigh. Look at me. RSS guru. Yup. You know those little boxes on government forms that ask you how long it took to complete the process? … You don’t want to know.
AfC

