<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<?xml-stylesheet href="http://research.operationaldynamics.com/blogs/atom.css" type="text/css"?>

<feed version="0.3" xmlns="http://purl.org/atom/ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">

<title type="text/plain">Andrew Cowie</title>
<tagline type="text/plain">Meta: blog postings about ... blog posting with Blosxom!</tagline>
<link rel="alternate" type="text/html" href="http://research.operationaldynamics.com/blogs" />
<id>tag:research.operationaldynamics.com,2008:/andrew/meta/blosxom</id>
<generator url="http://www.blosxom.com/" version="2.0">Blosxom</generator>
<modified>2008-11-18T01:36:00Z</modified>

<entry>
<id>tag:research.operationaldynamics.com,2005:/andrew/meta/blosxom/reverse-understanding-rss20</id>
<link rel="alternate" type="text/html" href="http://research.operationaldynamics.com/blogs/andrew/meta/blosxom/reverse-understanding-rss20.html" />
<title type="text/plain">Whither GUIDs?</title>
<dc:subject>/andrew/meta/blosxom</dc:subject>
<issued>2005-05-13T15:52:00Z</issued>
<modified>2005-05-13T15:52:00Z</modified>
<author>
  <name>Andrew Cowie</name>
</author>
<content type="application/xhtml+xml" xml:base="http://research.operationaldynamics.com/blogs" xml:lang="en" xml:space="preserve" mode="xml">
<div xmlns="http://www.w3.org/1999/xhtml"><p>I&#8217;ve been trying to understand the impact of updating or moving a blog
entry within Blosxom.</p>

<p>The <a href="http://blogs.law.harvard.edu/tech/rss">RSS 2.0</a> schema provides two fields for each article 
that relate to links and identity.
First is the <code>&lt;link&gt;</code> field which,
reasonably enough, is expected to be a link to an online (presumably
HTML) version of the article. The second is the <code>&lt;guid&gt;</code> field,
intended to be a globally unique string which can be used by a reader
to identify an article.</p>

<p><code>&lt;guid&gt;</code> has one attribute, <code>isPermaLink</code>, defaulting to true. If true, the field
can be interpreted as a usable permanent URL to the story.</p>
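
<p>A rough sketch of how a reader might apply that rule, in Python purely
for illustration. The sample item and the <code>permalink_for</code> helper are
made up for this example and are not taken from any particular reader.</p>

<pre><![CDATA[
```python
import xml.etree.ElementTree as ET

# Hypothetical item, invented for illustration; not from a real feed.
ITEM = """
<item>
  <title>Example</title>
  <link>http://example.com/blog/example.html</link>
  <guid isPermaLink="false">example-posting</guid>
</item>
"""


def permalink_for(item):
    """Return the URL a reader could treat as the item's permanent link."""
    guid = item.find("guid")
    if guid is not None and guid.get("isPermaLink", "true") == "true":
        # Attribute absent or "true": the guid itself is the permalink.
        return (guid.text or "").strip()
    # Otherwise the guid is only an opaque identity string; use the link.
    link = item.find("link")
    return (link.text or "").strip() if link is not None else None


item = ET.fromstring(ITEM.strip())
print(permalink_for(item))
```
]]></pre>

<p>With <code>isPermaLink="false"</code> the helper falls back to <code>&lt;link&gt;</code>;
drop the attribute and the <code>guid</code> text itself would be returned.</p>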

<p>If the date changes but the GUID stays the same, my <a href="http://liferea.sourceforge.net/">reader</a>
marks the article as updated, which is cool. And on a planet, well, it
just gets reslotted and the post shows up wherever its new
modification time indicates it should be. So updates are all good.</p>

<p>But what happens if I ever want to move entries
around to make a more sensible category tree?</p>

<h2>Recategorizing entries</h2>

<p>Until now I&#8217;ve been doing what everyone else seems to do, that is,
sticking the <code>link</code> in as the <code>guid</code> and allowing the
<code>isPermaLink="true"</code> default to work.</p>

<p>One of the problems of using a hierarchical blog store (i.e. a set of
files in a directory hierarchy) is that category is defined by what
directory you&#8217;re in. That&#8217;s exactly what you want, but it creates a
problem when you&#8217;re trying to figure out what a good GUID would be -
binding the GUID (and implicitly the permalink) to the file&#8217;s name and
position in the directory hierarchy means that if you move the file up
or down, or rename it, (regardless of whether you preserve the
modification time) you change the <code>guid</code>, making readers think the
article is new. That&#8217;s no good.</p>

<p>Since I want to be able to recategorize postings at some point in the
future, the alternative would seem to be changing my rss20
<a href="http://research.operationaldynamics.com/projects/blosxom/rss20/rss20">plugin</a> and <a href="http://research.operationaldynamics.com/projects/blosxom/rss20">flavour</a> template to say
<code>&lt;guid isPermaLink="false"&gt;</code> and use something actually unique for the
globally unique identifier. But what?</p>

<h2>Generate persistent and unique IDs</h2>

<p>Anyone who has ever done any work in the enterprise application server
or core database world has come across the design problem of
generating primary keys. There are several approaches to try. The
AUTO_INCREMENT or SEQUENCE mechanisms in relational databases are an
obvious source of sequential numbers, but this runs into problems when
there are multiply redundant masters which have to co-ordinate handing
out unique IDs. Blogging is certainly a far simpler case, but we
still have to find a way to form GUIDs that will be stable.</p>

<p>The approach that Blosxom takes, as shipped, is to use a permalink of
the form <code>/2005/03/08#posting-name</code> (which it serves as a virtual and
autogenerated category). The trouble with using this string for the
GUID field (even assuming <code>isPermaLink="false"</code>) is &#8220;what happens when
I make an update to the post and the date has changed?&#8221;. The whole
point is to have a GUID field that survives updates. </p>

<p>Another idea would be to use the inode number as that would survive a
file move. Inode number would actually be perfect except that,
unfortunately, I publish my blog by rsyncing my local draft copy to
our R&amp;D server where the feed is hosted. <code>rsync</code> regards a moved file
as a combination of a delete of the original and an addition of a
file in a new location, and a new file means a new inode number server
side. (Damn)</p>

<p>Finally, of course, there are any number of options which involve generating
sequential UIDs and storing them in the plugin cache somewhere. That
is do-able but is really rather complicated, and would imply some kind
of item-to-uid mapping that would then need to be maintained
internally &#8212; and synced up to the server. Ick.</p>

<h2>Keep it simple</h2>

<p>At this point, I gave up and decided that <code>posting-name</code> itself would
probably cut it. This implies that I&#8217;m imposing on myself the
requirement that story names be unique across the entire blog, but
that&#8217;s not so terribly bad. And it means that if I do move a file, I
must not rename it and I have to preserve its mtime. But that&#8217;s
easy, because that&#8217;s exactly what <code>mv</code> does! And <code>mv</code>ing makes perfect
sense for a Blosxom blog because that&#8217;s how you recategorize something.</p>
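
<p>The claim about <code>mv</code> is easy to check. A minimal sketch in Python,
assuming both directories live on the same filesystem, where a move is
just a rename; the directory and file names are invented for the example.</p>

<pre><![CDATA[
```python
import os
import tempfile


def mtime_survives_move():
    """Move a file between two directories and compare its mtime."""
    with tempfile.TemporaryDirectory() as root:
        old_dir = os.path.join(root, "old-category")
        new_dir = os.path.join(root, "new-category")
        os.makedirs(old_dir)
        os.makedirs(new_dir)

        src = os.path.join(old_dir, "some-post.txt")
        dst = os.path.join(new_dir, "some-post.txt")
        with open(src, "w") as f:
            f.write("Title\n\nBody\n")

        # Backdate the file to an arbitrary moment so that "now" cannot
        # accidentally match.
        os.utime(src, (1000000000, 1000000000))
        before = os.stat(src).st_mtime

        # Within one filesystem, mv boils down to this rename.
        os.rename(src, dst)
        return os.stat(dst).st_mtime == before


print(mtime_survives_move())
```
]]></pre>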

<p>So, the RSS entry for this story now contains
<code>&lt;guid isPermaLink="false"&gt;reverse-understanding-rss20&lt;/guid&gt;</code>
which is fine. What I wanted was a GUID that would survive category
moves. The file&#8217;s basename is good enough.</p>
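
<p>A minimal sketch of the idea, in Python rather than the plugin&#8217;s Perl,
and not the plugin&#8217;s actual code. The <code>.txt</code> extension and the second
category path are assumptions made for illustration.</p>

<pre><![CDATA[
```python
import os.path
from xml.sax.saxutils import escape


def guid_element(story_path):
    """Build an RSS 2.0 guid element from a story file's basename."""
    # Drop the directory (the category) and the extension, keep the name.
    name = os.path.splitext(os.path.basename(story_path))[0]
    return '<guid isPermaLink="false">%s</guid>' % escape(name)


# The same story before and after a hypothetical recategorization.
before = guid_element("andrew/meta/blosxom/reverse-understanding-rss20.txt")
after = guid_element("andrew/syndication/reverse-understanding-rss20.txt")
print(before)
print(before == after)
```
]]></pre>

<p>Because the directory never enters the string, the two calls produce
the same element, which is the whole point.</p>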

<p>[Yes, there is the obvious problem of stale links floating around out
there in people&#8217;s readers and on Planets and whatnot, but having
decided on unique filenames, a 404 page which goes hunting for the
file within the blog tree and redirects to the new location
accordingly will be easy to implement]</p>
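
<p>A sketch of the lookup such a handler would need, in Python and
assuming stories are <code>.txt</code> files under a single blog root that are
served as <code>.html</code>. The function names and paths are hypothetical.</p>

<pre><![CDATA[
```python
import os
import tempfile


def find_story(blog_root, name, extension=".txt"):
    """Return the path of name+extension relative to blog_root, or None."""
    wanted = name + extension
    for dirpath, _dirnames, filenames in os.walk(blog_root):
        if wanted in filenames:
            full = os.path.join(dirpath, wanted)
            return os.path.relpath(full, blog_root)
    return None


def redirect_target(blog_root, requested_url_path):
    """Map a stale URL path to the story's current location, if any."""
    # "/old/category/posting-name.html" -> "posting-name"
    name = os.path.splitext(os.path.basename(requested_url_path))[0]
    found = find_story(blog_root, name)
    if found is None:
        return None  # genuinely missing: serve the ordinary 404
    return "/" + os.path.splitext(found)[0].replace(os.sep, "/") + ".html"


# Demonstration against a throwaway tree with made-up names.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "new", "category"))
    with open(os.path.join(root, "new", "category", "some-post.txt"), "w") as f:
        f.write("Title\n\nBody\n")
    print(redirect_target(root, "/old/category/some-post.html"))
    print(redirect_target(root, "/old/category/no-such-post.html"))
```
]]></pre>

<p>This only works because basenames are unique across the blog; with
duplicates the walk would simply return whichever copy it met first.</p>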

<p>I&#8217;ve updated my <a href="http://research.operationaldynamics.com/projects/blosxom/rss20">rss20</a> flavour to do the <code>guid</code>s this way.</p>

<p>AfC</p>

<p><em>Sorry to anyone whose reader (correctly) interprets this as if all my
posts are new entries &#8212; that&#8217;s what I get for mucking with the <code>guid</code>
field. It won&#8217;t happen again &#8230; well, not until someone tells me a better way
to generate a GUID for a Blosxom feed</em> <code>:)</code></p>
</div>
</content>
</entry>

<entry>
<id>tag:research.operationaldynamics.com,2005:/andrew/meta/blosxom/blosxom-colophon</id>
<link rel="alternate" type="text/html" href="http://research.operationaldynamics.com/blogs/andrew/meta/blosxom/blosxom-colophon.html" />
<title type="text/plain">Getting Blosxom to work&#8230;</title>
<dc:subject>/andrew/meta/blosxom</dc:subject>
<issued>2005-05-08T09:45:00Z</issued>
<modified>2005-05-08T09:45:00Z</modified>
<author>
  <name>Andrew Cowie</name>
</author>
<content type="application/xhtml+xml" xml:base="http://research.operationaldynamics.com/blogs" xml:lang="en" xml:space="preserve" mode="xml">
<div xmlns="http://www.w3.org/1999/xhtml"><p>So I&#8217;ve finally joined the blogosphere. About time. I kept putting it
off, being relatively content with the posts I was sending to email lists.</p>

<p>I began to notice that about once a day I found myself
writing some really long email (or hammering away at length in some IRC
channel) about a topic which I inevitably wanted to refer to again.
But forwarded emails are usually a pain, and having to rummage around
in your own email folders in order to then hunt down the message in an
email archive so as to then send someone a link gets really old in a
hurry. So, why not. Might as well blog the stuff up.</p>

<p>Picking a blog tool took a lot of thought. <a href="http://wordpress.org/">WordPress</a> is awesome,
especially the multi-categories feature. But what I
really wanted was something that would work off of simple text files that
I could work on offline. I tend to spend a lot of time in places with
no internet coverage (airplanes, coffee shops by the beach, banana
republics, that sort of thing) and so anything that was a web app was
really a non-starter. I needed something I could publish by running
rsync.</p>

<p>People had recommended pyBlosxom, but I couldn&#8217;t quite get it running,
and in any case I don&#8217;t speak Python so that wasn&#8217;t terribly
appealing. I&#8217;d come across <a href="http://www.blosxom.com/">Blosxom</a> of course, and seen it
used for a number of blog pages that I&#8217;d been following for a while,
and on that basis alone it seemed like it might be alright.
Blosxom is in Perl so I figured I&#8217;d have no problem getting it to behave. </p>

<p>Boy, was I in for a shock. First things first, however&#8230;</p>

<h2>Text markup</h2>

<p>Five or six years ago there was something called Simple Document
Format which was blindingly simple to use, and from simple text
documents generated man pages and web pages. Yeah, sure, so does LaTeX,
but this was <em>text</em>. When wikis appeared on the scene a few years
back I wasn&#8217;t perhaps as impressed as I might have been, other than
&#8220;well, it&#8217;s about time.&#8221; But I quickly noted that the lack of any
coherent driver to standardize wiki markup meant that as each
person created their own new wiki, they came up with a syntax that
they liked. Which sucks for users because we have to learn yet
<em>another</em> wiki markup language each time we sign into one.</p>

<p>I came across <a href="http://daringfireball.net/projects/markdown/">Markdown</a>, and at last found a text markup
tool that I liked. It&#8217;s awesome. Writing about the philosophy behind
the design, the author, John Gruber, states,</p>

<blockquote>
  <p>&#8220;Markdown is intended to be as easy-to-read and easy-to-write as is
  feasible. Readability, however, is emphasized above all else&#8230; the
  single biggest source of inspiration for Markdown&#8217;s syntax is the
  format of plain text email.</p>
  
  <p>To this end, Markdown&#8217;s syntax is comprised entirely of punctuation
  characters, which punctuation characters have been carefully chosen so
  as to look like what they mean. E.g., asterisks around a word actually
  look like <strong>emphasis</strong>. Markdown lists look like, well, lists. Even
  blockquotes look like quoted passages of text, assuming you&#8217;ve ever
  used email.</p>
  
  <p>Markdown is not a replacement for HTML, or even close to it. Its
  syntax is very small, corresponding only to a very small subset of
  HTML tags. The idea is not to create a syntax that makes it easier
  to insert HTML tags. In my opinion, HTML tags are already easy to
  insert. The idea for Markdown is to make it easy to read, write, and
  edit prose. HTML is a publishing format; Markdown is a writing
  format.&#8221;</p>
</blockquote>

<p>Reminded me of SDF. Combined with Blosxom, which had the premise of
being simple and text-file based, I felt sure I&#8217;d have my blog up in no
time.</p>

<p>Yeah, right. Who was I kidding?</p>

<h2>Blosxom Plugins</h2>

<p>You quickly realize that to do anything with Blosxom you&#8217;ll be using a
variety of plugins. That didn&#8217;t seem like it would pose a problem as
there was a well organized <a href="http://www.blosxom.com/plugins/">plugin directory</a> to work from.</p>

<p>And then you realize that there are 4 different plugins that all
purport to do the same thing. The one you decide to try turns out to
have been written over two years ago, and it hasn&#8217;t been touched
since. Never a good sign. And then you try to download it, only to
discover that the site that it was posted on is long gone. </p>

<p>Sadly, this is not an uncommon story in the Open Source
world. Just look at your average search on SourceForge or Freshmeat -
let alone the endless shady PHP classes debacle. But I kept coming
across people who were using Blosxom as their blog tool, and so I
figured it couldn&#8217;t be that bad, and persisted.</p>

<p>I ended up using the following plugins:</p>

<ul>
<li>breadcrumbs</li>
<li>config</li>
<li>whoami</li>
<li>calendar</li>
<li>file</li>
<li>page_titler</li>
</ul>

<p>They all had POD documentation in their files, but it was usually
rather sparse, and just getting to the point where you knew whether or
not a plugin was even running took quite a while. </p>

<p>In proper Free Software fashion, I finally figured out what was going
on when I delved into the source. <em>Yes</em> I&#8217;d read about the difference
between head.flavour and story.flavour, but it didn&#8217;t quite sink in
until I was trying to use the username from the whoami plugin in the
blog header, and finally clued in that no, that wasn&#8217;t available <em>yet</em>
because the <code>&#036;whoami::username</code> variable that whoami populated was
based on the individual <strong>item</strong>, not &#8220;the page as a whole&#8221; even
though it was &#8220;my&#8221; blog. Fair enough, but like I said, a bit of a
learning curve.</p>

<h2>Syndication</h2>

<p>Blosxom has a MultiViews notion it calls flavours. You ask for
<code>index.html</code>, get a web page; ask for <code>index.rss</code> and <em>ta-da</em> you get a
different stylesheet view of the blog data that&#8217;s an RSS
feed. So I figured that getting an RSS feed out of it was going to be a
snap.</p>

<p>Except for one thing: the rss flavour built into Blosxom <strong>doesn&#8217;t</strong>
<a href="http://feedvalidator.org/">validate</a>, at least not without fudging the content type
and taking out all the HTML.
<em>(props to <a href="http://ctd.id.au/">Chris Deigan</a> for pointing me at that
tool)</em>.</p>

<p>But hang on - everyone else was using HTML in their feeds. So I
thought &#8220;Maybe it&#8217;s because my feed is RSS 0.91&#8221;, and tried upgrading
to RSS 1.0. I had a hint something was wrong when I came across these
plugins:</p>

<ul>
<li>rss10 </li>
<li>atomfeed</li>
</ul>

<p>Why should I need a plugin to fix the RSS flavour to be 1.0 compliant?
Whatever. I grabbed the plugin, figured out how to use it&#8230; and it
<em>still</em> wouldn&#8217;t validate. So I started looking at it, and discovered
that RSS 1.0 was nothing even remotely close to 0.91 (it&#8217;s full of
namespaces <em>on attributes</em>. Yuk!) and that Blosxom&#8217;s idea of RSS 1.0
wasn&#8217;t remotely close to validating either. So I hacked on the
templates, and eventually got an RSS 1.0 feed that passed.</p>

<p>Except for one thing. It didn&#8217;t grok HTML in descriptions either. WTF?</p>

<p>So then I tried Atom, had to hack <strong>it</strong> a touch to validate, and
<em>finally</em> I had a valid feed with HTML in it. My feed reader 
<a href="http://liferea.sourceforge.net/">liferea</a> grokked it perfectly as well. Yay!</p>

<h2>The coming of wisdom</h2>

<p>At that point, exhausted, I figured that RSS 2.0 would be a further
nightmare, but being too punch drunk to know when to stop, I
glanced at the spec anyway. And to my amazement, it was
<em>Really Simple</em>. Imagine. With all I&#8217;d learned, I had a valid feed in
no time, creating my own </p>

<ul>
<li>rss20</li>
</ul>

<p>plugin and flavour template. </p>

<p>But wait, you say, embedded XHTML isn&#8217;t a valid part of RSS 2.0! Well,
you&#8217;re right. In the end I discovered the technique of wrapping
the description content with <code>&lt;![CDATA[ ...  ]]&gt;</code>. Slap that around
the body text and the parser treats everything inside as plain character
data, so any and all non-RSS markup and entities pass through untouched. Damn, and
other comments. If I&#8217;d thought of that earlier I could have saved
myself a lot of trouble.</p>
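
<p>For the record, a CDATA section opens with <code>&lt;![CDATA[</code> and closes
with <code>]]&gt;</code>. A small sketch of the wrapping in Python (not the plugin&#8217;s
Perl), with a made-up <code>cdata</code> helper that also copes with the one
sequence a CDATA section cannot contain, namely its own terminator.</p>

<pre><![CDATA[
```python
import xml.etree.ElementTree as ET

# The terminator is assembled from two pieces so that this listing can
# itself be embedded in a CDATA section without ending it early.
CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]" + ">"


def cdata(text):
    """Wrap text in a CDATA section, splitting any embedded terminator."""
    # A terminator inside the body would close the section too soon, so
    # end the section after "]]" and start a new one before ">".
    safe = text.replace(CDATA_CLOSE, "]]" + CDATA_CLOSE + CDATA_OPEN + ">")
    return CDATA_OPEN + safe + CDATA_CLOSE


body = "<p>Caf&eacute; &amp; <em>cake</em></p>"
item = "<item><description>" + cdata(body) + "</description></item>"

# The parser hands the markup back as ordinary character data.
print(ET.fromstring(item).find("description").text == body)
```
]]></pre>

<p>Note that the entity reference in the sample body would be a parse
error outside the CDATA section, since plain XML does not define it.</p>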

<h2>Download</h2>

<p>You can download my fixes to the
<a href="http://research.operationaldynamics.com/projects/blosxom/atom/atomfeed">atomfeed</a> plugin (it&#8217;s self
contained), or grab my <a href="http://research.operationaldynamics.com/projects/blosxom/rss20/rss20">rss20 plugin</a>
and <a href="http://research.operationaldynamics.com/projects/blosxom/rss20/">rss20 flavour files</a>, which have been
working really well.</p>

<p><em>Sigh</em>. Look at me. RSS guru. Yup. You know those little boxes on
government forms that ask you how long it took to complete the
process? &#8230; You don&#8217;t want to know.</p>

<p>AfC</p>
</div>
</content>
</entry>


</feed>
