Operational Dynamics

The Parchment File Format

Download

Parchment File Format spec, version 5.0.

Abstract

The on-disk file format used by the Quill what-you-see-is-what-you-need document editor and presented by the Parchment what-you-get-is-what-you-want rendering engine is a little unconventional: it's text-based, so you can use version control. The schema is extraordinarily simple, which means it can be worked with programmatically. More importantly, it can be fixed by humans when it breaks. And finally, the format is actually designed to support technical writing and technical writers.

Not to worry; this is not the dawn of the millennium; the program will probably crash and eat your data. All is as it should be.

But if you're interested in the design decisions that led to the Parchment File Format being what it is, then you will find here some background along with our documentation of the "Manuscript Schema" (found in the .parchment files), and the "Quack Schema" (the chapters in .xml files that make up those manuscripts).

Summary

The Parchment File Format is made up of the Quack Schema for chapter bodies, and a Manuscript Schema for the containers which specify the document as a whole. These are fully described in a document written with Quill (it's self-hosting! :) You can download and read ParchmentFileFormat.pdf [19 pages, 168.0 kB] if you're interested.

The sources for the spec are in the source tree at doc/format/ParchmentFileFormat.parchment.

What! You're not working in ODF?

That's right. One of the major aims of Quill was to make a document processor that uses an on-disk format that is line-oriented text, so we can store it in a proper version control system like Bazaar and in turn use that for collaborating with others. ODF is a horribly complex format and, just as importantly, is stored as a compressed binary blob. You can't use VCS tools with something like that.
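To make the version control point concrete, here is a minimal sketch in Python using the standard library's difflib. The two revisions are hypothetical plain-text stand-ins, not real Quill output, and the `line_diff` helper is a name invented for this illustration. It shows only that a one-line change to line-oriented text produces a one-line diff, which is what makes review and merging practical.

```python
import difflib


def line_diff(old_text: str, new_text: str) -> list[str]:
    """Return a unified diff between two revisions of a text file."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile="before",
        tofile="after",
        lineterm="",
    ))


# Two hypothetical revisions of a plain-text document (not real Quill output).
before = "First paragraph.\nSecond paragraph.\nThird paragraph.\n"
after = "First paragraph.\nSecond paragraph, revised.\nThird paragraph.\n"

for line in line_diff(before, after):
    print(line)
```

Only the changed line shows up as a removal and an addition. The same edit inside a compressed archive would change the bytes of the whole blob, leaving a VCS nothing meaningful to show.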

So what are you using?

LaTeX is often cited when people think about content vs presentation separation in document processing. We chose to steer clear of LaTeX; as an on-disk format it is horribly ugly to work with, and you can't easily validate a document written in it. Most of all, the content is buried underneath a mountain of markup, and in practice the semantic markup tags are far outnumbered by tags that are used to manipulate what the presentation will look like. So LaTeX was out.

The subset of DocBook XML that is used for O'Reilly books was the inspiration for the editor's internal data model, so not surprisingly Quill started life as something that worked in that format, or at least tried to.

We had to keep in mind, though, that we're not trying to write something that can handle arbitrary input; after all, having to parse and render arbitrarily complex HTML pages is why web browsers are such huge pieces of software. Likewise, we're not trying to write a generic DocBook editor; we're trying to write an editor that just happens to use something.

Examples

Some of everything

In developing Quill and Parchment we have of course been working with an example document which has a little bit of just about everything you can do with the program. Naturally, it makes a good reference for what the file format looks like on disk.

Chapter bodies are stored in the Quack Schema and it looks like this: SomeOfEverything.xml.
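Because chapters are ordinary XML, general-purpose tools can inspect them without knowing anything about Quill. The sketch below tallies element names using Python's xml.etree.ElementTree. The sample markup is a hypothetical stand-in: its element names are not taken from the Quack Schema (the spec is the authority on those), and `tally_elements` is a helper invented for this illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tally_elements(xml_text: str) -> Counter:
    """Count how often each element name occurs in an XML document."""
    root = ET.fromstring(xml_text)
    # iter() walks the root element and every descendant.
    return Counter(element.tag for element in root.iter())


# Hypothetical stand-in markup; these names are NOT from the Quack Schema.
sample = "<chapter><para>One</para><para>Two</para><code>x = 1</code></chapter>"

for name, count in sorted(tally_elements(sample).items()):
    print(name, count)
```

To try it on a real chapter, read the file's text and pass it to `tally_elements`. If the file declares an XML namespace, ElementTree reports each name in the form `{namespace-uri}name`.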

The container that describes a document as a whole is the Manuscript Schema and is very simple, as can be seen here: ExampleDocument.parchment.

Want to know more? Print the spec and have a read. :)

Conference paper

Quill and Parchment were designed in no small part to handle conference papers and other technical memoirs. Surviving Change is a paper we have presented to numerous audiences worldwide and is typical of the sort of writing we do. Ensuring the Parchment File Format could support everything that was needed to represent this document was a major goal for much of the past year.

Here's what it looks like in Nautilus:

There you see the document wrapper SurvivingChange.parchment (that's the one in Manuscript format that you can "open") and the chapter files, in the Quack Schema, that make up its content: TheoryPractice.xml, ProfessionOfArms.xml, ScientificMethod.xml, and Reference.xml. A number of images that are included in the document are visible too. SurvivingChange.dic is the document word list, and of course SurvivingChange.pdf is the rendered paper.
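The layout just described is easy to work with from a script. As a rough sketch, the Python below groups the file names listed above by the role their extension suggests. The mapping from extension to role follows the description here; the image extensions are an assumption, since the image file names are not given, and `classify` is a helper invented for this illustration.

```python
from pathlib import PurePath

# Roles follow the description above; the image extensions are an assumption.
ROLES = {
    ".parchment": "manuscript",
    ".xml": "chapter",
    ".dic": "word list",
    ".pdf": "rendered output",
    ".png": "image",
    ".jpg": "image",
    ".svg": "image",
}


def classify(filenames):
    """Group file names by the role their extension suggests."""
    groups = {}
    for name in filenames:
        role = ROLES.get(PurePath(name).suffix.lower(), "other")
        groups.setdefault(role, []).append(name)
    return groups


listing = [
    "SurvivingChange.parchment",
    "TheoryPractice.xml",
    "ProfessionOfArms.xml",
    "ScientificMethod.xml",
    "Reference.xml",
    "SurvivingChange.dic",
    "SurvivingChange.pdf",
]

for role, names in classify(listing).items():
    print(f"{role}: {', '.join(names)}")
```

Guessing by extension is only a convenience. The Manuscript file itself is what specifies the document as a whole.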



These documents copyright © 2008-2012 Operational Dynamics Consulting, Pty Ltd unless otherwise noted. Copyright of work by other authors is retained by those authors. Most material in this site is made available under an Open Source or Open Content licence; see the top level LICENCE file in that project's source repository for details
