<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<?xml-stylesheet href="http://research.operationaldynamics.com/blogs/atom.css" type="text/css"?>

<feed version="0.3" xmlns="http://purl.org/atom/ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">

<title type="text/plain">Andrew Cowie</title>
<tagline type="text/plain">Blog postings by Andrew Cowie about Open Source
and Software Development. This section is about the systems used to build
applications, run tests, and deploy to production, be they small standalone
programs or huge e-commerce platforms.</tagline>
<link rel="alternate" type="text/html" href="http://research.operationaldynamics.com/blogs" />
<id>tag:research.operationaldynamics.com,2008:/andrew/software/build-systems</id>
<generator url="http://www.blosxom.com/" version="2.0">Blosxom</generator>
<modified>2008-11-18T01:36:00Z</modified>

<entry>
<id>tag:research.operationaldynamics.com,2008:/andrew/software/build-systems/hash-bang</id>
<link rel="alternate" type="text/html" href="http://research.operationaldynamics.com/blogs/andrew/software/build-systems/hash-bang.html" />
<title type="text/plain">The arcane secrets of hash-bang</title>
<dc:subject>/andrew/software/build-systems</dc:subject>
<issued>2008-01-25T11:40:00Z</issued>
<modified>2008-01-25T11:40:00Z</modified>
<author>
  <name>Andrew Cowie</name>
</author>
<content type="text/html" xml:base="http://research.operationaldynamics.com/blogs" xml:lang="en" xml:space="preserve" mode="escaped">
<![CDATA[<p>I&#8217;ve been working for a while now prototyping various domain-specific approaches to modelling software configuration information. Most of these involve putting the configuration data in the body of an executable script. To that end, I&#8217;ve been digging into how interpreted scripts actually work on Linux and other Unix-like operating systems.</p>

<h2><code>#!</code> interpreter</h2>

<p>Anyone who has ever written a Shell script, Perl program, or Python program is familiar with <code>#!</code> lines:</p>

<pre><code>    #! /bin/sh
    #
    # A program to do something very special.
    #

    echo "Hello World"
</code></pre>

<p>and</p>

<pre><code>    #! /usr/bin/perl
    #
    # Another program to do something very special.
    #

    while (&lt;&gt;) {
        print "Hello World\n";
    }
</code></pre>

<p>etc. The program mentioned after the magic <code>#!</code> characters is the program that will interpret the script. There are <em>many</em> gotchas with that (notably portability concerns owing to the fact that some idiotic flavours of Unix don&#8217;t put Perl in <code>/usr/bin</code>, that sort of thing).</p>

<p>I always figured that the script file got piped by the OS into the interpreter on <code>stdin</code>. A reasonable guess given the way that most of the tools we use work, but it turns out it is nothing of the sort. Every time I tried to write an interpreter (in C) I got stuck.</p>

<p>What threw me off was that <code>cat</code> works as an &#8220;interpreter&#8221;:</p>

<pre><code>    #! /bin/cat
    This is the script body
    which will happily be sent
    to stdout
</code></pre>

<p>If you put that in a file called <code>script</code> and run that from your terminal, then sure enough,</p>

<pre style="background: black; color: white; padding-left: 15px; padding-top: 10px; padding-bottom: 10px; width: 60%; min-width: 400px;" ><code>$ ./script
#! /bin/cat
This is the script body
which will happily be sent
to stdout
$
</code></pre>

<p>which of course is exactly what would happen if you did:</p>

<pre style="background: black; color: white; padding-left: 15px; padding-top: 10px; padding-bottom: 10px; width: 60%; min-width: 400px;" ><code>$ cat &lt; script
#! /bin/cat
This is the script body
which will happily be sent
to stdout
$
</code></pre>

<p>and that&#8217;s what totally had me on the wrong track. I figured that the interpreter on the <code>#!</code> line was being fed the body of the executing file on <code>stdin</code>. Nope.</p>

<h2>Seek and ye shall find, sort of</h2>

<p>I thought that I might be able to find out what was going on by reading the code of an interpreter program. I started by looking at the sources for <code>/sbin/runscript</code> (which is on the <code>#!</code> line for all of Gentoo Linux&#8217;s RC scripts), expecting that to be quite simple. It was simple. Too simple. All it does is some environment filtering and then fires off <code>bash</code> to run <code>/sbin/runscript.sh</code> (in other words, it&#8217;s largely a workaround for the fact that you can&#8217;t actually make a shell script itself an interpreter). Nothing at all in there about reading <code>stdin</code>. So then I looked at the source code for Perl (Whoa, there&#8217;s a beast). Nothing obvious there either. Lots of stuff about reading from <code>stdin</code> but nothing about that being the origin of the script to be executed. A lot of messing around with argument signatures though.</p>

<p><code>#!</code> is not exactly the easiest term to put into a search engine. I did, however, happen to know that one of the ways <code>#!</code> is pronounced is &#8220;<strong>hash bang</strong>&#8221; (being two common names for the respective characters, though lots of old suspender-snapping sandal-wearing bearded Unix freaks would, I&#8217;m sure, tell you with great passion that it has to be pronounced some other way). Searching on &#8220;hash bang&#8221; brought up lots of arcana, including something that led me to an obscure article by one Andries Brouwer on the <a href="http://homepages.cwi.nl/~aeb/std/hashexclam-1.html#ss1.3">parameter signature at invocation</a> wherein I discovered that there is a <em>calling convention</em> for how arguments are passed to the interpreter program being invoked.</p>

<p>It&#8217;s a bit complicated, since you can have command line arguments for both the interpreter and for the script being run. It goes something like this. Let&#8217;s say you have a script that begins with the following:</p>

<pre><code>    #! /path/to/program -v -d
</code></pre>

<p>(with <code>-v</code> perhaps meaning &#8220;verbose&#8221; and <code>-d</code> perhaps meaning &#8220;debug&#8221;) and you have it in a file called <code>./script</code>, then running it will actually cause <code>program</code> to execute. The trick is, <em>with what arguments</em>? Check this out. If you do:</p>

<pre class="terminal"><code>$ ./script -p -r
</code></pre>

<p>(with <code>-p</code> and <code>-r</code>, for the sake of illustration, perhaps having the same meanings as they do for <code>cp</code>, that is &#8220;preserve&#8221; and &#8220;recursive&#8221; respectively) then when our interpreter <code>program</code> is executed, it will be invoked with the following arguments:</p>

<pre><code>/path/to/program -v -d ./script -p -r
</code></pre>

<p>The mapping is a bit obscure. It&#8217;s actually:</p>

<table border="0" cellpadding="5">
<tr>
<td><i>argv0</i></td> <td><i>argi</i></td> <td><i>argn</i></td> <td><i>args&#8230;</i></td> <td></td>
</tr>
<tr>
<td><code>argv[0]</code></td> <td><code>argv[1]</code></td> <td><code>argv[2]</code></td> <td><code>argv[3]</code></td> <td><code>argv[4]</code></td>
</tr>
<tr>
<td><code>/path/to/program</code></td> <td><code>-v -d</code></td> <td><code>./script</code></td> <td><code>-p</code></td> <td><code>-r</code></td>
</tr>
</table>

<p>(to use the terminology in the above link). This all shed a little light on what I&#8217;d seen in <code>runscript.c</code> and <code>perl.c</code>, but still not a single mention of the script being fed in on <code>stdin</code>. So I pondered that for a while longer, until finally the light bulb went off.</p>

<h2>Eureka &amp; Company</h2>

<p>The reason I couldn&#8217;t find any mention of <code>./script</code> being fed in on <code>stdin</code> is because it is <em>not</em> fed in on <code>stdin</code>. You don&#8217;t need it to be: <strong>you&#8217;ve got the name of the script file</strong> fed to you in your interpreter&#8217;s argument list (from the above example, it&#8217;s in <code>argv[2]</code>, one happy looking string containing &#8220;<code>./script</code>&#8221;). So read it already!</p>

<pre><code>    FILE* body;

    body = fopen(argv[2], "r");
    ...
</code></pre>

<p>and ta da, that&#8217;s where you get your script&#8217;s program body from. Now you can at last get on with parsing your script, and running it.</p>

<p>Most big programs spend lots of time munging the argument list, dealing with the fact that <code>argv[1]</code> could have all sorts of stuff jammed into it, or nothing, etc. The whole thing goes from elegant to clumsy when you discover that if there are no arguments to the interpreter on the <code>#!</code> line then the script file will be in <code>argv[1]</code>, and it goes to nightmare level when you look at the list of <a href="http://www.in-ulm.de/~mascheck/various/shebang/">variations in behaviour across different operating systems</a>, compiled by one Sven Mascheck. Nonetheless, the interpreter is your program, and presumably you can recognize, parse, and skip over zero, one or more arguments to yourself before deciding you&#8217;ve reached the name of the script. Judicious use of <code>argv++; argc--;</code> is your friend here, apparently.</p>

<p>Anyway, this all explains why my <code>cat</code> example was working but my own efforts were not. <code>cat</code> is <em>not</em> reading data being fed to it on <code>stdin</code> (which is <code>cat</code>&#8217;s behaviour if you run it <em>without</em> any arguments), it&#8217;s being executed <em>with</em> an argument, namely <code>./script</code> as <code>argv[1]</code>, i.e. exactly the same as:</p>

<pre style="background: black; color: white; padding-left: 15px; padding-top: 10px; padding-bottom: 10px; width: 60%; min-width: 400px;" ><code>$ cat ./script
#! /bin/cat
This is the script body
which will happily be sent
to stdout
$
</code></pre>

<p>But now that I know what&#8217;s going on, I can write my own <code>interpreter.c</code>:</p>

<pre><code>    #include &lt;stdio.h&gt;
    #define LEN 128

    int main(int argc, char** argv) {
        char buf[LEN];
        FILE* body;

        body = fopen(argv[1], "r");
        while (fgets(buf, LEN, body) != NULL) {
            printf("%s", buf);
        }
        fclose(body);

        return 0;
    }
</code></pre>

<p>and if I compile that to <code>interpreter</code>, then I can write a domain specific language that is interpreted by this program, say:</p>

<pre><code>    #! ./interpreter
    This is a test of the Emergency Broadcast System
</code></pre>

<p>in a file called <code>script</code>, then, at last,</p>

<pre style="background: black; color: white; padding-left: 15px; padding-top: 10px; padding-bottom: 10px; width: 60%; min-width: 400px;" ><code>$ ./script
#! ./interpreter
This is a test of the Emergency Broadcast System
$
</code></pre>

<p>Yeay!</p>

<p>Ok, so that&#8217;s <code>cat</code>, but <code>cat</code> is the Hello World of input/output. <code>:)</code> The real point is that running <code>script</code> caused <code>interpreter</code> to be executed, and <code>interpreter</code> got at the body of the script that was &#8220;run&#8221;, and was able to do something with it. Onwards at last.</p>

<p>AfC</p>

<hr/>

<p><strong>Comments</strong></p>

<p>Julio Merino Vidal wrote in suggesting:</p>

<blockquote>
  <p>Take a look at NetBSD&#8217;s <a href="http://netbsd.gw.com/cgi-bin/man-cgi?script+7+NetBSD-current"><code>script(7)</code></a> manual page for some more details <br/>
  about how that is supposed to work and some things you must consider <br/>
  for portability (such as being able to feed a single argument to the <br/>
  interpreter through the <code>#!</code> line, or the maximum length of it).</p>
</blockquote>

<p><strong>Updates</strong></p>

<p>Quite by accident, I just came across the related information for Linux; see the <a href="man:execve"><code>execve(2)</code></a> man page for
a succinct treatment of both <code>exec()</code>&#8217;ing in general, and the topic of interpreting scripts.</p>
]]>
</content>
</entry>

<entry>
<id>tag:research.operationaldynamics.com,2006:/andrew/software/build-systems/cargo-hierarchy</id>
<link rel="alternate" type="text/html" href="http://research.operationaldynamics.com/blogs/andrew/software/build-systems/cargo-hierarchy.html" />
<title type="text/plain">Understanding Cargo</title>
<dc:subject>/andrew/software/build-systems</dc:subject>
<issued>2006-01-21T08:52:00Z</issued>
<modified>2006-01-21T08:52:00Z</modified>
<author>
  <name>Andrew Cowie</name>
</author>
<content type="application/xhtml+xml" xml:base="http://research.operationaldynamics.com/blogs" xml:lang="en" xml:space="preserve" mode="xml">
<div xmlns="http://www.w3.org/1999/xhtml"><p>One of my clients has me working on revamping the infrastructure they use to build their products and run functional tests across them. They&#8217;re a Java shop, and so it&#8217;s no surprise that their product, a rather large web application, is built in Java Servlets and JSP; since they target a wide range of enterprise customers they need to test their app in as many application server &#8220;containers&#8221; as possible.</p>

<p>Not terribly unusual, but when you&#8217;re trying to run <em>automated</em> tests, it gets tricky. Although in theory one should be able to interchangeably use different app-servers, the different vendors (be they open source or commercial) who have implemented the Servlet, JSP, and J2EE specs all have their quirks. Even assuming the thing you are testing doesn&#8217;t use vendor specific extensions, you still have to deal with the problem of setting up, starting, and stopping the app-server containers themselves. And as you&#8217;d expect, each different app-server has a rather significantly different way of being configured and run.</p>

<p>Enter <a href="http://cargo.codehaus.org/">Cargo</a>. It&#8217;s pretty slick! They have figured out how to configure, start, and stop a wide range of different containers and in some cases can control them during runtime. This is all important if you&#8217;re trying to do automated testing of a Servlet based web application, because you need to have the container running and your app deployed into it before you can start doing functional tests against it. Their API is primarily meant to be used from within Java but they&#8217;ve also made <code>ant</code> tasks and <code>maven</code> plugins.</p>

<p>There are a few examples on the Cargo website, but figuring it out took some doing. Cargo has about three different ways to do any given task &#8212; you can set something up using the fully derived strongly typed implementing classes, or you can use one of two factory methods. Presenting all of this at the same time is confusing to say the least. Cargo consists of several very steep class hierarchies with parallel naming conventions. The terms &#8220;Local&#8221;, &#8220;Remote&#8221;, &#8220;Existing&#8221;, &#8220;Standalone&#8221; are used in permutation with &#8220;Container&#8221;, &#8220;Configuration&#8221; and &#8220;Deployer&#8221;; for instance you have a <code>WebLogic8xLocalContainer</code> and a <code>Resin3xStandaloneLocalConfiguration</code>. Gets confusing when you&#8217;re trying to learn the API for the first time, and makes code assist completion really hard (typing &#8220;Tomcat&#8221; and hitting assist, you have the joy of selecting from <code>Tomcat3xLocalContainer</code>, <code>Tomcat3xStandaloneLocalConfiguration</code>, <code>Tomcat4xLocalContainer</code>, <code>Tomcat4xRemoteContainer</code>, <code>Tomcat4xStandaloneLocalConfiguration</code>, <code>Tomcat5xExistingLocalConfiguration</code>, <code>Tomcat5xLocalContainer</code>, <code>Tomcat5xStandaloneLocalConfiguration</code>, <code>TomcatCopyingLocalDeployer</code>, <code>TomcatLocalDeployer</code>, <code>TomcatRemoteDeployer</code>, &#8230; you get the idea. Pretty crazy).</p>

<p>The point of me looking through all this was to be able to help my client make a decision about whether to use Cargo in their build &amp; test architecture. I battled my way through their documentation and javadoc trying to duplicate a simple example. Once I started drawing some simple diagrams of the class hierarchies, though, it began to come together. I did up my notes in OpenOffice Draw to make them presentable. Here&#8217;s the result:</p>

<p>First, Cargo has the notion of a &#8220;configuration&#8221; being a particular setup for an instance of a running app-server container. This takes a brief moment to grok because normally one installs WAR/EAR files directly into the appropriate directory within (under) wherever the server is installed &#8212; and since one normally only ever has one instance of a server on a given box, you don&#8217;t tend to think about it. It turns out (always something to learn) that the Servlet and Enterprise Application Server specs state that the information and data to be used in a running instance can all be bundled together and put &#8230; wherever. Cargo calls it a <code>Configuration</code>. It&#8217;s a steep hierarchy, but the important bits seem to be as follows:</p>

<p><img src="/blogs/andrew/software/build-systems/cargo_Configuration-800.png" alt="Configuration class hierarchy" title=""/></p>

<p>Once you&#8217;ve got a configuration, then you bring up the app-server container with it. Cargo&#8217;s <code>Container</code> hierarchy is like this:</p>

<p><img src="/blogs/andrew/software/build-systems/cargo_Container-800.png" alt="Container class hierarchy" title=""/></p>

<p>As you can see, there are considerable variations on the core theme - did you set it up before you ran the app-server, or after; is the app-server <code>Local</code> where Cargo can get at it, or <code>Remote</code> where it can&#8217;t&#8230;</p>

<p>Once you get the sense of it, though, it&#8217;s a very powerful tool. To accommodate the wide range of containers that Cargo does is a remarkable achievement, and watching it Do The Right Thing (tm) in each different case is impressive.</p>

<p>AfC</p>
</div>
</content>
</entry>


</feed>
