Operational Dynamics
This section:

Blog postings by Operational Dynamics partners and staff

Please note the disclaimer at the bottom of this page.

Fri, 25 Jan 2008

The arcane secrets of hash-bang

I’ve been working for a while now prototyping various domain specific approaches to modelling software configuration information. Most of these involve putting the configuration data in the body of an executable script. To that end, I’ve been digging in to how interpreted scripts actually work on Linux and other Unix-like operating systems.

#! interpreter

Anyone who has ever written a Shell script, Perl program, or Python program is familiar with #! lines:

    #! /bin/sh
    #
    # A program to do something very special.
    #

    echo "Hello World"

and

    #! /usr/bin/perl
    #
    # Another program to do something very special.
    #

    while (<>) {
        print "Hello World\n";
    }

etc. The program mentioned after the magic #! characters is the program that will interpret the script. There are many gotchas with that (notably portability concerns owing to the fact that some idiotic flavours of Unix don’t put Perl in /usr/bin, that sort of thing).

I always figured that the script file got piped by the OS into the interpreter on stdin. A reasonable guess given the way that most of the tools we use work, but it turns out it is nothing of the sort. Every time I tried to write an interpreter (in C) I got stuck.

What threw me off was that cat works as an “interpreter”:

    #! /bin/cat
    This is the script body
    which will happily be sent
    to stdout

If you put that in a file called script and run that from your terminal, then sure enough,

$ ./script
#! /bin/cat
This is the script body
which will happily be sent
to stdout
$

which of course is exactly what would happen if you did:

$ cat < script
#! /bin/cat
This is the script body
which will happily be sent
to stdout
$

and that’s what totally had me on the wrong track. I figured that the interpreter on the #! line was being fed the body of the executing file on stdin. Nope.

Seek and ye shall find, sort of

I thought that I might be able to find out what was going on by reading the code of an interpreter program. I started by looking at the sources for /sbin/runscript (which is on the #! line for all of Gentoo Linux’s RC scripts), expecting that to be quite simple. It was simple. Too simple. All it does is some environment filtering and then fires off bash to run /sbin/runscript.sh (in other words, it’s largely a workaround for the fact that you can’t actually make a shell script itself an interpreter). Nothing at all in there about reading stdin. So then I looked at the source code for Perl (Whoa, there’s a beast). Nothing obvious there either. Lots of stuff about reading from stdin but nothing about that being the origin of the script to be executed. A lot of messing around with argument signatures though.

#! is not exactly the easiest term to put into a search engine. I did, however, happen to know that one of the ways #! is pronounced is “hash bang” (being two common names for the respective characters, though lots of old suspender-snapping sandal-wearing bearded Unix freaks would, I’m sure, tell you with great passion that it has to be pronounced some other way). Searching on “hash bang” brought up lots of arcana, including something that led me to an obscure article by one Andries Brouwer on the parameter signature at invocation, wherein I discovered that there is a calling convention for how arguments are passed to the interpreter program being invoked.

It’s a bit complicated, since you can have command line arguments for both the interpreter and for the script being run. It goes something like this. Let’s say you have a script that begins with the following:

    #! /path/to/program -v -d

(with -v perhaps meaning “verbose” and -d perhaps meaning “debug”) and you have it in a file called ./script, then running it will actually cause program to execute. The trick is, with what arguments? Check this out. If you do:

$ ./script -p -r

(with -p and -r, for the sake of illustration, perhaps having the same meanings as they do for cp, that is “preserve” and “recursive” respectively) then when our interpreter program is executed, it will be invoked with the following arguments:

/path/to/program -v -d ./script -p -r

The mapping is a bit obscure. It’s actually:

argv0             argi     argn      args…
argv[0]           argv[1]  argv[2]   argv[3]  argv[4]
/path/to/program  -v -d    ./script  -p       -r

(to use the terminology in the above link). This all shed a little light on what I’d seen in runscript.c and perl.c, but still not a single mention of the script being fed in on stdin. So I pondered that for a while longer, until finally the light bulb went off.

Eureka & Company

The reason I couldn’t find any mention of ./script being fed in on stdin is because it is not fed in on stdin. You don’t need it to be: you’ve got the name of the script file fed to you in your interpreter’s argument list (from the above example, it’s in argv[2], one happy looking string containing “./script”). So read it already!

    FILE* body;

    body = fopen(argv[2], "r");
    ...

and ta da, that’s where you get your script’s program body from. Now you can at last get on with parsing your script, and running it.

Most big programs spend lots of time munging the argument list, dealing with the fact that argv[1] could be full of all sorts of stuff jammed into it, or nothing, etc. The whole thing goes from elegant to clumsy when you discover that if there are no arguments to the interpreter on the #! line then the script file will be in argv[1], and it goes to nightmare level when you look at the list of variations in behaviour across different operating systems, compiled by one Sven Mascheck. Nonetheless, the interpreter is your program, and presumably you can recognize, parse, and skip over zero, one or more arguments to yourself before deciding you’ve reached the name of the script. Judicious use of argv++; argc--; is your friend here, apparently.

Anyway, this all explains why my cat example was working but my own efforts were not. cat is not reading data being fed to it on stdin (which is cat’s behaviour if you run it without any arguments), it’s being executed with an argument, namely ./script as argv[1], ie exactly the same as:

$ cat ./script
#! /bin/cat
This is the script body
which will happily be sent
to stdout
$

But now that I know what’s going on, I can write my own interpreter.c:

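    #include <stdio.h>
    #define LEN 128

    int main(int argc, char** argv) {
        char buf[LEN];
        FILE* body;

        /* With nothing else on the #! line, the script's name is in argv[1] */
        if (argc < 2) {
            fprintf(stderr, "usage: interpreter script\n");
            return 1;
        }

        body = fopen(argv[1], "r");
        if (body == NULL) {
            perror(argv[1]);
            return 1;
        }

        /* Copy the whole file, #! line included, to stdout */
        while (fgets(buf, LEN, body) != NULL) {
            printf("%s", buf);
        }
        fclose(body);

        return 0;
    }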

and if I compile that to interpreter, then I can write a domain specific language that is interpreted by this program, say:

    #! ./interpreter
    This is a test of the Emergency Broadcast System

in a file called script, then, at last,

$ ./script
#! ./interpreter
This is a test of the Emergency Broadcast System
$

Yeay!

Ok, so that’s cat, but cat is the Hello World of input/output. :) The real point is that running script caused interpreter to be executed, and interpreter got at the body of the script that was “run”, and was able to do something with it. Onwards at last.

AfC


Comments

Julio Merino Vidal wrote in suggesting:

Take a look at NetBSD’s script(7) manual page for some more details
about how that is supposed to work and some things you must consider
for portability (such as being able to feed a single argument to the
interpreter through the #! line, or the maximum length of it).

Updates

Quite by accident, I just came across the related information for Linux; see the execve(2) man page for a succinct treatment of both exec()’ing in general, and the topic of interpreting scripts.

Thu, 24 Jan 2008

Free Range Software

Jon Hall writes of his experience in a restaurant talking with its owner about “Free Range Eggs”:

“… but we have to charge money for our eggs. People who don’t acknowledge that just do not want to understand the term ‘Free Range’ for what it really means … better eggs, and changing the term will not help that.”

The fact that the discussion started because of maddog’s suggestion that maybe they should be called “open range eggs” to eliminate the confusion is not the point (“that’s silly” the owner said, “everyone calls them free range eggs”). The term we use, Free Software, has a bigger problem. Consider the difference between:

Clearly, the term is “Free Range”, and applies as an adjective to “Eggs”, whereas the latter really does mean “free eggs”. Now consider this:

There’s something missing, and so the term free gets taken as having to do with price.

No, I’m not about to say that we should call it Free Range Software [and while “let it run free!” is a lovely metaphor, I don’t quite think we want to be associating our work with chicken farming :)]. Perhaps someone will come up with an intermediate word that will do the trick. To be honest, though, I’ve pretty much given up on the term Free Software; I write Open Source software, and the cause I advocate is Software Freedom.

And when people still stare at you blankly, you can say “you know, like Linux” and watch as comprehension dawns. To be sure, they probably still don’t get it, but chances are you’ve got more important things to talk about, and getting on with it is going to do you — and logiciel libre — a lot more good than getting lost because of the insufficiencies of the English language.

AfC

Thanks to Atul Chitnis for having passed on the link.

Wed, 16 Jan 2008

Reusing Experience

I came across an interesting comment yesterday:

The documentation doesn’t help you much though. First, it is not sufficient and second (and important) you do not learn much from the documentation.

Thankfully, you have the source code and I really appreciate the source code of Eclipse is open. That’s because only from source you can learn as much as necessary. And this fact leads me to think more and more that open source is not about “reusing software” (commercial companies do that as well) but about “reusing experience” which is hidden inside the source.

I have the strong belief that people who write the software are more important than the software itself and by looking to the source code you can gain at least part of experience of people who wrote it. That’s the power of open source, that’s “reusing experience” concept at work!

This observation was written a few years ago by one Alexander Dymo who was expressing his amazement at the whole Software Freedom thing, especially the hands-on side of it. And as for his “people are more important than software”, well, hey; I couldn’t agree more. Enlightened organizations that want to preserve their strategic advantage understand that the people who are capable of working with such code are their competitive advantage. They are the ones who can reuse experience and leverage the power of the most wide reaching and enabling social phenomenon we have ever seen: the global openness movement.

Towards a technical definition of Open Source

I’m on a bit of a kick at the moment working to elucidate a practical technical definition of Open Source (ie, complementary to the necessary legal foundation which enables Free Software and the requisite social interaction which is at the heart of global Open Source communities). Comments like the one above are helping refine my thoughts on the topic: yes the interaction of licence and copyright law gives us structure, and yes communication between people distributed the world over is the genesis of the sense of community, belonging, and accomplishment which is the rich social fabric within which our software development takes place, but there is also a pragmatic aspect: can you actually work with the source code, get right into it, experiment with it, break it, and do crazy things with it?

That the four GNU freedoms stipulate that this must be “permitted” doesn’t really change the fact that there are practical prerequisites. Can you easily get the sources under development? Do those sources actually build? Is there a clear mechanism for contributing source back to the project and are they actively facilitating such contributions?

The source tarball as primary release artifact

The biggest give-away is whether the primary release artifact is source or an opaque binary.¹ Especially in the Java world but elsewhere as well, there is a surprising amount of activity for which, despite the fact that it may legally be Free Software, and despite its loud proclamations of being Open Source, it remains clear that they just don’t get it: there are a huge number of projects and products which only do their releases in binary form. Bad sign when you start calling it a “product”, I think. In a frustratingly large number of cases, if you try to actually work with their code you will discover that it doesn’t actually compile! In extreme cases, there are statements like:

We distribute source but never claimed that you can build it out of the box. We don’t have time for such things.

I didn’t believe it when I saw it, but one of the hackers of a supposedly Open Source project actually said that in response to a bug report. Astonishing.

Being able to duplicate the result is a rigour that goes far beyond software; indeed it has been the bedrock of science — and human progress — since the dawn of the age of reason. Back to the present day, it has suddenly become obvious to me that the fundamental technical difference between proprietary Unix from the commercial vendors (not to mention proprietary operating systems from Microsoft and Apple) and Linux is that in our world (taken to mean the entire continuum of FOSS communities) the primary release artifact for all upstream projects is a source release (these days it’s typically in a .tar.bz2, but whether it is that or .zip or whatever else isn’t terribly relevant) that you can build. Not a tarball full of already built .class files and .jar files. Not a “source .zip” for Eclipse. Not a self-extractable full of .exes and .dlls. No. A source release is source code that someone else can build, right out of the tarball. THAT, ladies and gentlemen, is the technical definition of Open Source.

I’ve started to realize that this area is a big stumbling block. People I collaborate with in (upstream) global projects like GNOME, Free Java, Bazaar and elsewhere take the efficacy and primacy of source releases entirely for granted. But a number of clients that my firm is working with to enable Open Source often just don’t get why they should be doing source releases — and resist it.

Whether you buy into Software Freedom or not is a different topic (and your decision, not mine to impose on you), but if you’ve got a software project for which you’ve decided to free the source, you want it to be successful, right? Success is a project which inspires user enthusiasm, which the major distributions can package and ship without hassle, and around which a vibrant community of developers grows, ultimately fostering new contributions. There are a lot of steps along the way, but a buildable source release is, in our view, the technical bar you’ve got to reach. Otherwise you’re not making releases of the software, you’re abandoning it. And your users.

AfC

¹Note that this is different from a Linux or Unix distro shipping binary packages — no matter if it was Fedora, Debian, Gentoo, FreeBSD, OpenSolaris or anyone else, they should still have been able to build their package from source. It matters little whether that compilation took place in a build farm somewhere or on the user’s desktop (which is what happens in the “binary” distros when someone wants to work on the package or the project itself) — the key point remains that we can build the software if we want to, and are not forced into relying on someone else’s proprietary (and more to the point, unavailable) tooling. If you can’t build it, it’s not Open Source.

Fri, 11 Jan 2008

Fascinating thread: FOSS Quality Control

A long-time critic of things Open Source showed up on the Classpath project’s mailing list and asked some rather provoking questions in a thread titled “Quality control and FOSS rant”. He at least ended with: “I suppose this is more of a troll than a criticism, sorry about that.”

Despite the flame bait, the thread contained some surprisingly insightful replies. It’s always great to hear some of the top software developers in the world noting their motivations and why they believe what they do works.

From Roman Kennke:

Both approaches (closed and open) apparently tend to produce relatively high quality code (or really crappy code, happens in both camps), where with the closed approach the developers (or vendors) have to take over 100% responsibility (because the end user has no way to interact with the development), which usually makes things very formal and slow, where the open approach relies very much on the end users reporting problems. In most active projects these are fixed really quickly, giving both the developers and the end users a warm fuzzy feeling ;-)

From Andrew John Hughes:

There’s a lot to be said for feedback and interaction with your users that’s often overlooked. All the ideas of complicated quality control processes in the world is not going to make a user feel as loved as seeing someone responding quickly to their bug and fixing it in a short space of time.

From Mark Wielaard, a remark on the complex administrative process used by the project to review contributed code:

We do have a flow chart that people have to follow when contributing… It is all very formal really: http://gnu.wildebeest.org/~mark/patch.png

and from Archie Cobbs, a reminder about the track record of a certain formerly proprietary process in fixing a bug whose resolution is desperately desired by its user community:

The number #1 voted bug in their bug database has been unfixed for over 5 YEARS!

The comments on that bug make for hilarious reading, but the bigger point is this: the identities of the people making the decisions about the relevance of the issue are hidden. That sort of thing doesn’t inspire much hope for people on the outside. It’s not like we’re talking about national security or the future of western democracy; it’s a bug report that turned into a feature request for a piece of software that many, many people depend on! No one likes to be fed the line that their problem is so Top Secret that they won’t be told when (or even if) the problem will be addressed. The cloak of anonymity strikes again.

Fascinating thread.

AfC

Don’t feed the locals

Spent a few weeks hiking in Tasmania over New Year’s…

Hazzard beach, looking south to Mt Freycinet and Mt Graham in Freycinet National Park, Tasmania, Australia.

No water here (or anywhere else in the park, for that matter). It’s only 600m up, but it’s a fairly steep climb. Lucky to have had gorgeous weather, but whoa was it hot. Packed in 10L of water. Heavy. Still ran low; should have brought even more. I used to think those pocket desalinators were gimmicks, but I’m having second thoughts now.

Freycinet Peninsula
The view east from Mt Graham, Freycinet National Park, Tasmania, Australia.

I like cooking pancakes when I’m trekking and make sure to bring maple syrup along, of course. Apparently, the wallabies at Wineglass Bay also like pancakes. The pot lid kept this one out of the batter, but didn’t prevent it from feeling free to lick the spoon.

The morning, the tent, the pancakes, and the wallaby
Breakfast at Wineglass Bay in Freycinet National Park, Tasmania, Australia. Photo by Katrina Ross


There are some astoundingly beautiful National Parks in Tasmania, although you have to drive for ever to get to any of them — forget about public transit; you’re renting a car so you can park it for a week when you get there. Great.

Worth it for the views, though. You get to missing mountains, sometimes.

Southwest from Mount Rufus
Looking southwest from Mt Rufus in Lake St Clair National Park, Tasmania, Australia.

You can also forget about seeing any old growth forest; while the National Parks in the state cover an impressive amount of ground, the formation of these reserves appears to have come long after the bulk of the big timber was removed, and logging continues to happen in protected areas. From high up on the mountains you can plainly see the clear-cut areas, which is a shame, because properly managed forests can be a renewable resource. The trouble, however, is that it takes on the order of 70 years before a plot is ready for harvesting. Most people aren’t really that patient, and clear cutting is often “easier”.

What really gets me, though, are the “state forests” which are marked as “multi-use”. Funny how there aren’t many trees left. Same thing happens all over - Canada’s “National Parks” are “multi-use”; take a drive through the Rockies from Calgary to Vancouver and you’ll keep coming to National Parks that are “temporarily closed for logging, no camping” and clear-cut. Then there is the activity of the American federal “Forestry Service” (which is in the business of building roads so that it is easier for logging companies to mow down said forests). Australia, it seems, is no different.

And this from a guy who is otherwise pro-logging. Lumber is an essential construction material, and paper will remain the essence of recording and disseminating information for a long time to come. An immense number of jobs come from their production. But if we want to have those jobs in the future, and if we want the forests to continue to be viable and not have all the soil wash away in the next hurricane, the forest industry must be incented to contribute to the sustainment of the land. It’s pretty simple: no soil, no new growth of any kind, period. The only thing I can think of is something like a reverse carbon tax: for every large tree of a certain diameter, etc they can prove they didn’t cut down, they’d get a tax credit. Somehow, though, I’m guessing that all that would result in is plenty more bureaucracy but not a whole lot more in the way of sensible land management.

Anyway, I digress. The things you think about when you’re walking.

Trail approaching Russell Falls
Approaching Russell Falls in Mt Field National Park, Tasmania, Australia.

Beautiful.

AfC



Material on this site copyright © 2005-2008 Operational Dynamics Consulting Pty Ltd, unless otherwise noted. All rights reserved. Not for redistribution or attribution without permission in writing.

We make this service available to our staff in order to promote the discourse of ideas especially as relates to the development of Open Source worldwide. Blog entries on this site, however, are the musings of the authors as individuals and do not represent the views of Operational Dynamics. All times UTC.