<?xml version="1.0"?>

<rss version="2.0">
  <channel>
    <title>Andrew Cowie</title>
    <link>http://research.operationaldynamics.com/blogs/andrew/software/version-control/</link>
    <description>3rd generation distributed version control systems</description>
    <language>en</language>
    <copyright>Copyright (c) 2008 Operational Dynamics Consulting Pty Ltd. All rights reserved. Not for redistribution or attribution without permission in writing.</copyright>

    <image>
      <url>http://research.operationaldynamics.com/images/andrew_Hackergotchi.png</url>
      <title>Operational Dynamics Research</title>
      <link>http://research.operationaldynamics.com/blogs/andrew/software/version-control/</link>
      <width>80</width>
      <height>86</height>
    </image>

    <lastBuildDate>Sun, 05 Oct 2008 13:05:00 GMT</lastBuildDate>

    <item>
      <title>A Bazaar branch of GTK</title>
      <link>http://research.operationaldynamics.com/blogs/andrew/software/version-control/bzr-branch-of-gtk.html</link>
      <description><![CDATA[<p>Converting a project from one version control system to another is always painful. Doing so is a rather momentous change to contemplate. With organizational &amp; community inertia being what it is, it is not something easily rushed into. Worse is when you attempt to use a modern tool as a better front end for an old one.</p>

<p>One of the neat things about the 3rd generation distributed version control system projects (ie Bazaar, Git, or Mercurial) is their plugins which allow you to continue to work with a Subversion upstream project but use the modern tool as a front end locally. For those of us who have been very productively using DVCS for some time, this is a godsend.</p>

<p>Using one of these plugins to evaluate the usability or performance of a DVCS is not really the best approach one could take; none of them were built with using Subversion as a peer in mind, and lord knows Subversion is not built to store revisions with such complex relationships. Thus the idea has turned out to be a <em>very</em> challenging problem space for everyone, but most have had their tools mature to the point where Subversion is the bottleneck. And despite my misgiving that starting out your experience with a 3rd generation DVCS by first using it to access some legacy project will not show your new shiny tool in its best light, the reality remains that most people need to get on with being productive, and doing a full blown conversion (or starting out with a brand new project) is less likely to be your circumstance.</p>

<p>Having been using <a href="http://www.bazaar-vcs.org/">Bazaar</a> (known best by the abbreviation <code>bzr</code>) for well over 18 months and overall being very happy with the experience, I would like to encourage other GNOME hackers to try it; likewise I have some patches I&#8217;d like to contribute to GTK so I thought I&#8217;d start by using <code>bzr-svn</code> to get a branch of GTK as a starting point (and to likewise give Bazaar&#8217;s Subversion interface a workout). So I created a Bazaar branch of GTK.</p>

<h2>Initial import was very slow</h2>

<p>The first step was to create a Bazaar branch that represents the foreign Subversion repository. This was painful.</p>

<p>I originally attempted to do this while I was at the GTK Hackfest we had in Berlin; that turned out to be a disaster because (as one does at a conference or meeting) we were moving around every few hours connecting and disconnecting from the &#8216;net, and I was busy trying to do this from my laptop rather than a server somewhere. (Unfortunately <code>bzr-svn</code> relies on a memory fix that is not yet in the released version of Subversion, and not every distro has their Subversion patched with it). So that fell flat on its face.</p>

<p>I was able to restart the process a week or two later and leave it running overnight. Along the way I&#8217;d learned a technique for doing a few hundred revisions at a time (do <code>bzr pull -r &#036;i</code> in a loop and increment the variable by a hundred or whatever each cycle) and so was able to cope with the need to disconnect periodically. Which is just as well, because it took <strong>2 days</strong> to do the import.</p>
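
<p>The loop described above might look something like the following. This is a minimal sketch, not the script I actually ran: the function names, the <code>FIRST</code>/<code>LAST</code>/<code>STEP</code> arguments and the <code>PULL_CMD</code> override are all made up for illustration, and the only Bazaar command it relies on is the <code>bzr pull -r</code> already mentioned.</p>

```shell
#!/bin/sh
# Print the revisions to stop at: FIRST, FIRST+STEP, ... and finally LAST.
# All three arguments are illustrative; pick whatever suits your connection.
revision_steps() {
    first=$1
    last=$2
    step=$3
    i=$first
    while [ "$i" -lt "$last" ]; do
        echo "$i"
        i=$((i + step))
    done
    echo "$last"
}

# Pull up to each stopping point in turn, giving up at the first failure
# (a dropped connection, say) so the run can be restarted later.
# PULL_CMD is a hypothetical override; set it to "echo" for a dry run.
pull_in_chunks() {
    for rev in $(revision_steps "$1" "$2" "$3"); do
        ${PULL_CMD:-bzr pull -r} "$rev" || return 1
    done
}
```

<p>Run from inside the branch, <code>pull_in_chunks 100 15000 100</code> would step through in hundreds. After an interruption, start again with a higher first argument rather than from the beginning.</p>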

<p>While I was at first appalled by this, I really can&#8217;t blame Bazaar. The GTK codebase is, of course, rather large. First preserved commits were in 1997; there are over 15,000 revisions; a working tree (excluding VCS metadata) is 83 MB in size. The bigger problem however, is that Subversion is <em>not</em> fast at accessing historical data (the time taken to process revisions started at over 32 minutes per 100 but eventually dropped to about 9 minutes per 100 as it got to the most recent history; interesting).</p>
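
<p>As a rough sanity check on those figures, here is the arithmetic. It is a sketch only: it assumes a round 15,000 revisions and a constant rate throughout, neither of which was true in practice, and <code>import_hours</code> is just a name invented for the example.</p>

```shell
#!/bin/sh
# Whole hours needed to import REVISIONS at a constant MINUTES_PER_100 rate.
# Shell integer arithmetic, so the result is rounded down.
import_hours() {
    revisions=$1
    minutes_per_100=$2
    echo $(( revisions * minutes_per_100 / 100 / 60 ))
}

import_hours 15000 32   # at the slowest rate observed
import_hours 15000 9    # at the fastest rate observed
```

<p>That prints 80 and 22, so the roughly 48 hours the import took sits between the two extremes, as you would expect.</p>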

<p>Clearly, I could have picked an easier example for my experiments with <code>bzr-svn</code>, but it&#8217;s done now.</p>

<h2>Usage is fine once branch created</h2>

<p>I wasn&#8217;t entirely sure whether the DVCS property that branches are all peers would hold for one that had started life backed by a Subversion repository, but things work just as they are supposed to. Creating a new branch to work on a feature with <code>bzr branch</code>, or using a <code>bzr checkout</code> along with <code>bzr switch</code> to do change-in-place, both worked fine. So I&#8217;m all set.</p>

<p>The real point, though, was to encourage more GNOME hackers to give Bazaar a try. If everyone had to put up with an endless import process like the one mentioned above then no one would touch it. But now that the branch is created (and up to date as of yesterday or so), anyone can use it.</p>

<p>With Olav&#8217;s concurrence, I put my branch up in my GNOME directory, so you can branch from it as follows:</p>

<pre style="background: black; color: white; margin: 10px; padding: 12px;">
$ bzr init-repo --rich-root-pack gtk+
$ cd gtk+/
$ bzr branch http://www.gnome.org/~afcowie/bzr/gtk+/trunk/
</pre>

<p>You do the first step to create a &#8220;shared repository&#8221; (in Bazaar parlance), which allows the various branches under that directory to share all the revision data for the project. I need to warn you, though. Modern DVCS tools give you the full history of the project, but for a large project that can be pretty big. If you grab the above branch you&#8217;ll be downloading about 180 MB and it takes about 5 minutes. <em>Yes</em> 5 minutes is a long time, but <em>no</em> there&#8217;s no way around it* and in any case 180 MB is not unreasonable for such a mature project. Don&#8217;t be too worried if it seems like it is taking a bit to do the transfer. You only need to do it once, and look on the bright side. It took me 2 days. Just let it run.</p>

<h2>Workin&#8217;</h2>

<p>Personally, I always do my work in a working branch separate from my mirror of upstream, allowing me to easily compare against the last update of upstream that I have done. So:</p>

<pre style="background: black; color: white; margin: 10px; padding: 12px;">
$ bzr branch trunk working
$ cd working/
$ ./autogen.sh
$ make
</pre>

<p>and away we go. Branching takes like 3 seconds. Nice and fast, no problem.</p>

<p>I haven&#8217;t got a script running religiously updating my GTK branch; I do a <code>bzr pull</code> locally once every few days in my copy of <code>'trunk'</code> and have been pushing that up to <code>http://www.gnome.org/~afcowie/gtk+/trunk/</code>. But now that you&#8217;ve got the branch, you don&#8217;t have to rely on me anymore; this whole distributed thing kicks in and you can do it yourself. Assuming you have a <code>svn.gnome.org</code> account, then you can update with:</p>

<pre style="background: black; color: white; margin: 10px; padding: 12px;">
$ cd ../trunk/
$ bzr pull --remember svn+ssh://username@svn.gnome.org/svn/gtk+/trunk
</pre>

<p>(use the <code>--remember</code> argument the first time to change where it updates from by default so you&#8217;re not relying on my branch anymore) and push with:</p>

<pre style="background: black; color: white; margin: 10px; padding: 12px;">
$ bzr push svn+ssh://username@svn.gnome.org/svn/gtk+/trunk
</pre>

<p>The very first time you update from Subversion <code>bzr-svn</code> will have to build some caches and whatnot locally, but it&#8217;ll be pretty fast from there on.</p>

<p>I&#8217;m ignoring the pushing and pulling of revisions between your mirror of &#8216;<code>trunk</code>&#8217; and your local <code>'working'</code> (or whatever) branch; I&#8217;m sure you can figure that part out yourself, but here&#8217;s a hint: do your work, test it, and commit it, all in <code>'working'</code>; repeat; then use, say:</p>

<pre style="background: black; color: white; margin: 10px; padding: 12px;">
$ bzr diff -r ancestor:../trunk/
</pre>

<p>to consider what your current patch looks like, then shuttle it to <code>'trunk'</code> with <code>bzr pull</code> or <code>bzr merge</code> (run from <code>'trunk'</code>; which one depends on whether you need to merge or not) and then <code>bzr push</code> to <code>svn.gnome.org</code>. Hope it works for you.</p>

<p>If you need help, poke your head into <code>#bzr</code> on FreeNode or write to their mailing list, or ping me and I&#8217;ll try and lend a hand. Make sure you&#8217;re using the latest releases; I doubt it will work otherwise and so recommend that you use <code>bzr &gt;= 1.4.0</code> and <code>bzr-svn &gt;= 0.4.10</code>. Older versions do not contain critical bug fixes. I also encourage you to grab <code>bzr-gtk &gt;= 0.93.0</code> as the <code>visualize</code> command it adds:</p>

<pre style="background: black; color: white; margin: 10px; padding: 12px;">
$ bzr viz
</pre>

<p>is really amazing.</p>

<p>Good luck!</p>

<p>AfC</p>

<p><strong>Update</strong>: <br/>
You need to have PycURL installed (it&#8217;s an optional dependency; used at runtime if detected). There is a <a href="https://bugs.launchpad.net/bugs/229076">bug</a> that will prevent you doing the branch from me if you don&#8217;t. Debian and Ubuntu package <code>python-pycurl</code>; Gentoo <code>USE=curl</code> pulls in package <code>dev-python/pycurl</code>; Fedora package is also <code>python-pycurl</code>.</p>

<div style="font-size: small">
* Interestingly, the Bazaar hackers are working their way towards a &#8220;shallow branch&#8221; capability, which will be really cool; really, we <i>don&#8217;t</i> need 15,000 revisions of history just to tweak some project; we need the last 100 or so, so that we can branch from it, build, fix something, commit, and then submit the resultant revision upstream. Full history is wonderful and frequently useful, but this will be handy to help reduce the data transfer barrier to casual contribution in cases where bandwidth is at a premium.
</div>

<hr/>

<p><em>At this very moment the GNOME systems administration team is working through the consequences of the Debian SSH key vulnerability; they have for the moment very properly <a href="http://mail.gnome.org/archives/devel-announce-list/2008-May/msg00002.html">locked down</a> all access to the services provided to GNOME hackers, and that knocks out access to the Subversion server, meaning people can&#8217;t work with their source code properly. If there was ever a clearer demonstration of the inadequacy of old 1st generation centralized version control systems, I cannot think of it.</em></p>
]]></description>
      <author>andrew@operationaldynamics.com (Andrew Cowie)</author>
      <category>/andrew/software/version-control</category>
      <pubDate>Thu, 15 May 2008 05:35:00 GMT</pubDate>
      <guid isPermaLink="false">bzr-branch-of-gtk</guid>
    </item>

    <item>
      <title>Bazaar 0.91 released!</title>
      <link>http://research.operationaldynamics.com/blogs/andrew/software/version-control/bzr-use-smart-server.html</link>
      <description><![CDATA[<p><center>
<a href="http://www.bazaar-vcs.org/"><img alt="bzr logo" src="http://bazaar-vcs.org/LogoOptions?action=AttachFile&amp;do=get&amp;target=Bazaar+Logo+2006-07-27.png" border="0"/></a>
</center></p>

<p>Congratulations to the Bazaar team on the <a href="http://bazaar-vcs.org/">release</a> of <code>0.91</code>!</p>

<hr/>

<h2>Stop using dumb protocols</h2>

<p>Probably long overdue, but we set a <code>bzr serve</code> daemon running on our R&amp;D site.</p>

<p>Hackers who had write access to create their own branches there used the <code>bzr</code> server automatically since they <code>bzr push bzr+ssh://...</code> and can also use it for <code>branch</code>ing and <code>pull</code>ing. Most other people, however, were doing an initial checkout via <code>http://</code> because that&#8217;s the URL we published. Which wasn&#8217;t very nice of us: initial checkout time for java-gnome using the dumb <code>http://</code> protocol used to be 20 minutes (serious round-trip penalty when going from Australia to Europe. Ouch).</p>

<p>Switching to <code>bzr://</code> dropped the time to clone a branch to <strong>1 minute 46 seconds</strong>&#8230; Sweet! And that&#8217;s even before the monster revision streaming and repository format performance improvements that are likely to land in the <code>0.92</code> - <code>0.93</code> time frame! Excuse me while I rush off to update our <a href="http://java-gnome.sourceforge.net/4.0/README.html"><code>README</code></a> and <a href="http://java-gnome.sourceforge.net/4.0/HACKING.html"><code>HACKING</code></a> files <code>:)</code>.</p>

<p><em>Yes, dumb protocol time was abysmal, but <code>http</code> is a convenience for when all else fails, and has the nice side effect of allowing people to <a href="http://research.operationaldynamics.com/bzr/java-gnome/mainline/">surf</a> your public branch&#8217;s code. The problem is that people don&#8217;t necessarily get around to putting the server in place (most people could if they tried).</em></p>

<p>AfC</p>
]]></description>
      <author>andrew@operationaldynamics.com (Andrew Cowie)</author>
      <category>/andrew/software/version-control</category>
      <pubDate>Wed, 26 Sep 2007 07:43:00 GMT</pubDate>
      <guid isPermaLink="false">bzr-use-smart-server</guid>
    </item>

    <item>
      <title>Incremental performance improvement</title>
      <link>http://research.operationaldynamics.com/blogs/andrew/software/version-control/bzr-initial-commit-time.html</link>
      <description><![CDATA[<p>In passing in <code>#bzr</code> this afternoon, <a href="http://www.robertcollins.net/">Robert Collins</a> mentioned that he had dropped commit time for the example case he&#8217;s working on from 29.5 seconds to 18.8 seconds. Nice!</p>

<p>I asked him if this was towards improving the time taken to import new (large) projects into <a href="http://bazaar-vcs.org/">Bazaar</a>? He said yes, but pointed out that some of this particular bit of optimization also impacts normal incremental commits and merges elsewhere.</p>

<p>It&#8217;s nice to see smart people working on things that benefit in not just one but many places. I wondered, though, how to quantify the sort of improvement that Robert is working on. The obvious target is the time taken to do the task being optimized. The broader question of knock on effects would rely on a broader performance measurement suite. It&#8217;s not a simple matter of &#8220;just make it run faster&#8221; &#8212; you have to be concerned that fixing one thing doesn&#8217;t make other important cases perform poorly as a result. </p>

<p>That reminded me that this is one of the things that db4objects is really good at. They continuously use their &#8220;PolePosition&#8221; benchmark system to evaluate the impact of changes and [hoped for] optimizations across a wide range of scenarios (read performance, write performance, writing in highly contended environments, etc) as evidenced by <a href="http://developer.db4o.com/blogs/carl/">Carl Rosenberger</a>&#8217;s recent work on improving <a href="http://developer.db4o.com/blogs/product_news/archive/2007/09/15/embedded-client-server-performance-improvements.aspx">transactional performance</a> when <strong><code>db4o</code></strong> is being used in an embedded (single VM) environment.</p>

<p>Obviously, the cost of premature optimization is lost developer time spent working on things that don&#8217;t matter (if you&#8217;re lucky) and the risk of wrong architectural choices (which have much greater consequence). But design choices are unavoidable; you face them every day and then have to live with the consequences.</p>

<p><em>This isn&#8217;t &#8220;bad&#8221;; you make such decisions for good reasons. Some of the projects we&#8217;re doing for clients at the moment involve assessing the complexity of different architectural options and what impact those choices have on developer effectiveness, future maintainability, present performance and how hard improving it will be down the road. It&#8217;s interesting work.</em></p>

<p>This is one of the things I really respect about the team of people hacking on <strong><code>bzr</code></strong>. They took the time to map out a really sound architecture, and have worked hard on developing really robust algorithms capable of dealing with the complex corner cases that arise when doing merges and working with repositories with wildly different performance characteristics  (branching locally versus branching from a repository half way around the world; are you using the dumb <code>http://</code> protocol, or are you streaming at wire speed using the <code>bzr://</code> protocol, etc). They took some heat early on because the consequence of their focus on architecture and robustness meant that some operations were rather slow, but they have been improving by leaps and bounds as they turn their attention from internals to performance. It works, it works <em>correctly</em>, and increasingly, it works fast.</p>

<!--
Congratulations to the Bazaar hackers for having just released `bzr` 0.91!
-->

<p>AfC</p>
]]></description>
      <author>andrew@operationaldynamics.com (Andrew Cowie)</author>
      <category>/andrew/software/version-control</category>
      <pubDate>Thu, 20 Sep 2007 07:05:00 GMT</pubDate>
      <guid isPermaLink="false">bzr-initial-commit-time</guid>
    </item>

    <item>
      <title>Git depends on RCS</title>
      <link>http://research.operationaldynamics.com/blogs/andrew/software/version-control/git-uses-rcs.html</link>
      <description><![CDATA[<p>Somehow, that says it all.</p>

<p><em>Apparently Git depended on the RCS package to get at certain merge functions. This is ironic, because RCS is a primitive 1st generation version control tool (sort of) that predates CVS.</em></p>

<p>AfC</p>

<p>Update:</p>

<ul>
<li>Dave O&#8217;Neill just pointed out to me that this dependency was actually removed in January &#8216;06. I looked around further and discovered that it&#8217;s the Debian
&#8220;etch&#8221; <a href="http://packages.debian.org/etch/git-core">package</a> of Git 1.4.x that depends on RCS. This has been superseded in &#8220;lenny&#8221; which presents 1.5.x.</li>
</ul>
]]></description>
      <author>andrew@operationaldynamics.com (Andrew Cowie)</author>
      <category>/andrew/software/version-control</category>
      <pubDate>Thu, 13 Sep 2007 02:26:00 GMT</pubDate>
      <guid isPermaLink="false">git-uses-rcs</guid>
    </item>

    <item>
      <title>Creating orthogonal patches</title>
      <link>http://research.operationaldynamics.com/blogs/andrew/software/version-control/bzr-orthogonal-patches.html</link>
      <description><![CDATA[<p>All open source projects encourage people to submit patches, but in order to review them you want them to be <em>orthogonal</em>, that is, nice and clean, carrying just the changes relating to one bugfix or enhancement.</p>

<p>The old way of doing things is that I, being someone with commit access to the upstream project, take the bits that I am willing to accept and commit them. But if I commit a patch for someone, it looks like I did the work. No. We don&#8217;t have to live with that inequity any longer. One of the hallmarks of the modern distributed revision control systems is that the <em>submitter</em>&#8217;s name shows up in the revision history and they get full credit and ownership of their contribution, so I am very adamant that contributors make their own commits. They then bundle up the revision or revisions that make up the contribution they want to make, and send it along.</p>

<p>So in addition to being clean and isolated we also want a contribution to actually be a revision carrying the metadata that makes up a commit &#8212; but not depend on unrelated revisions.</p>

<p>It&#8217;s this last criterion that is tricky. All of the people working on our projects are proceeding merrily on their way creating numerous or infrequent revisions as is their taste. The problem is that an individual branch history is, in essence, linear.  From a sequence A,B,C,D,E someone who wants to contribute can&#8217;t just easily pull C out and send C to the upstream project because C depends on A,B.</p>

<p>This is one definition of the <strong>cherry picking</strong> problem, that is, pulling out only the change set you want and not everything else.</p>

<p><em>The only tool that ever got this right was <a href="http://www.darcs.net/">Darcs</a>, and it has a magnificent user interface for doing so. Unfortunately Darcs doesn&#8217;t scale &#8212; add too many files and you&#8217;ll hit the <a href="http://research.operationaldynamics.com/blogs/andrew/software/version-control/red-giant-bugs.html">red giant bug</a> which does tend to put a crimp into most development styles. This is not the only definition of cherry picking, mind you, but in general the &#8220;manually pick one revision out&#8221; thing has been tricky. Since the developers of <a href="http://www.bazaar-vcs.org/">the version control tool of choice</a> know and highly respect Darcs, I took an off hand statement along the lines of &#8220;no, no Darcs-like UI to compose changesets yet&#8221; to mean that the cherry picking of revisions wasn&#8217;t there either. It turns out that this wasn&#8217;t quite correct.</em></p>

<p>So how to create orthogonal patches?</p>

<p>This has had me stuck for a while. I&#8217;ve been trying to figure out how to help new participants create clean orthogonal patch bundles, but getting those contributions up to scratch has been somewhat trying. </p>

<p>The obvious quick answer to creating orthogonal patches was:</p>

<h2>&#8220;Individual micro branches for everything&#8221;</h2>

<p>which at first makes sense; after all, in the 3rd generation distributed version control tools, branches are cheap. So every time you want to work on a new feature, just create a new branch:</p>

<pre><code>&#036; bzr clone mainline fix-the-broken-juicer
&#036; cd fix-the-broken-juicer/
</code></pre>

<p>then do the necessary work, make a commit, and finish by creating a patch for submission:</p>

<pre><code>&#036; bzr commit
&#036; bzr bundle ../mainline &gt; fix-the-broken-juicer.patch
</code></pre>

<p>resulting in a nice clean orthogonal bundle containing just the revisions needed for that fix which you can happily email off to the maintainer.</p>

<p>But this isn&#8217;t really practical, right? The whole point is that we get on with stuff. Nobody thinks in so compartmentalized a way. By the time you even think of fixing this bug you&#8217;ve already added two new features, refactored a bunch of stuff, and improved the build system. You&#8217;ve made lots of commits already, and you depend on that build system fix to even get the project to compile! But you don&#8217;t want to send all that stuff in, not yet anyway. You&#8217;re still working on some of it. And even if it was ready to go, it&#8217;d be a monster patch highly likely to have <em>something</em> wrong with it resulting in its rejection.</p>

<p>So how to submit code?</p>

<p>On the maintainer&#8217;s side, I don&#8217;t want to lower our standards, but if I reject their contributions too many times they&#8217;ll lose their enthusiasm and leave the project. Sure, there is lots of give and take, usually resulting in me having to clean up their patches in the end. Initially I was ok with that; it&#8217;s a reasonable effort to help someone get the idea of what was required (and is itself collaboration). But there is often a time lapse between them creating a bit of code and their submitting it, my reviewing it, and then getting back to them with comments, and in the intervening time they&#8217;ve no doubt raced ahead adding more revisions, making it all the harder for them to rewind and try to keep everything straight in perfect little micro branches.</p>

<p>The solution turns out to be:</p>

<h2>&#8220;You can cherry pick after all&#8221;</h2>

<p>One new contributor, Nat Pryce, wrote in saying:</p>

<blockquote>
  <p>&#8230; the only way I can think of to avoid this is to manually copy the
  files from my branch into a mainline checkout, commit, generate a
  bundle, and then discard the mainline checkout again.</p>
</blockquote>

<p>I agreed with him that this sounded horrible, and was thinking that this wouldn&#8217;t
have the desired effect of &#8220;transporting the revisions&#8221; we want to
convey.</p>

<p>It would seem, however, that we&#8217;re all missing the point.</p>

<p>The object of the exercise is to get code into the repository. <em>What the
revision graph looks like is irrelevant</em>. That part I more or less knew
already. But the more important parts are:</p>

<ul>
<li><p>it doesn&#8217;t matter which revision gets a bit of code into mainline so long as it has the contributor&#8217;s name on it; and</p></li>
<li><p>if multiple revisions have the same code fragment, <strong>THAT&#8217;S OK</strong>.</p></li>
</ul>

<p>When the two branches containing these various revisions eventually merge, they
will be MERGED and (oops, smack myself in the head) they will not
conflict because (duh) they&#8217;re the same code!</p>

<p>In other words, where we end up is that we need to create revisions X
and perhaps Z that convey code to the project maintainer out of a stream
of revisions A,B,C,D,E,F.</p>

<p>If X = B [in the java <code>equals()</code> sense], and if Z = D..E, it&#8217;s <em>no problem</em>,
because when the day comes that X and Z are merged to some branch which
happens to be the public mainline, and you later pull from that public
branch, then X and B will &#8220;contend&#8221; and merge without conflict, and
likewise Z will merge without conflict. <strong>It&#8217;s all the same text</strong>. Mission accomplished.</p>
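
<p>The reasoning above boils down to the ordinary three-way merge rule, which can be sketched as follows. To be clear, this is an illustration of the principle applied to whole files, not Bazaar&#8217;s merge engine, and <code>merge3</code> is a name invented for the example.</p>

```shell
#!/bin/sh
# Whole-file three-way merge rule, for illustration only: given a common
# BASE and two descendants MINE and THEIRS, print which version the result
# takes, or "conflict". Real merge engines apply the same idea hunk by hunk.
merge3() {
    base=$1
    mine=$2
    theirs=$3
    if cmp -s "$mine" "$theirs"; then
        echo "same"        # both sides carry the identical change
    elif cmp -s "$base" "$mine"; then
        echo "theirs"      # only THEIRS changed
    elif cmp -s "$base" "$theirs"; then
        echo "mine"        # only MINE changed
    else
        echo "conflict"    # both changed, and differently
    fi
}
```

<p>With B in your working branch and X arriving via mainline, both carrying the same text, the first test succeeds and there is nothing to resolve.</p>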

<p>Aside: it turns out there <em>is</em> a good reason to merge X and Z now into your local working branch (ie, what will be) A,B,C,D,E,F,X,Z,G,H,I,&#8230; rather than waiting for it to turn up via the mainline branch revision stream: you&#8217;ll be giving the merge resolution algorithms a hand. Beyond the text not conflicting you&#8217;ll now already have the revision locally which will save you from pulling it over the wire later, which is actually back to what we started with in terms of trying to ship revisions around. On the other hand if you had already diverged and there was a conflict, then you&#8217;ll already have had to resolve it locally, which is good. And if upstream changed something in that code, say X,X&#8217; and Z,Z&#8217;,Z&#8221; the merge delta to be computed when you go to merge them in is small. If the result is a conflict, well that&#8217;s ok; indeed, resolving those cases is ultimately what it&#8217;s all about, but in general they won&#8217;t conflict and we&#8217;re done.</p>

<h2>Support from Bazaar</h2>

<p>So back to our quandary, that manually copying files in</p>

<blockquote>
  <p>create branch, manually copy file, commit, create bundle &amp; send</p>
</blockquote>

<p>would be ugly.</p>

<p>It turns out that Bazaar supports this with a slight twist.</p>

<p>You are still going to have to create a [momentary] branch, make a
commit, and bundle that off. But using the capabilities built into bzr&#8217;s
merge command, we can quickly create branches containing just what we
want to submit.</p>

<p>First create a pristine branch of a recent upstream:</p>

<pre><code>    &#036; ls
    mainline/ working/
    &#036; bzr clone mainline plumbing-fix
    &#036; cd plumbing-fix
</code></pre>

<p>Now either a) select a few specific revisions, perhaps:</p>

<pre><code>    &#036; bzr merge -r 292..294 ../working
    &#036; bzr diff
</code></pre>

<p>or more interestingly, b) select specific files or directories, for
example:</p>

<pre><code>    &#036; bzr merge ../working/src/bindings/org/gnome/gtk/Plumbing.java
    &#036; bzr merge --force ../working/src/bindings/some/other/file.txt
    &#036; bzr diff
</code></pre>

<p>No matter that you might have made 14 commits to get there. When you
create this branch with its single additional revision, you can make a
proper commit message. And if there&#8217;s stuff from pending work that
shouldn&#8217;t be in there, you can clean it up before you commit. But either way, using the merge command means that you do a <em>merge</em>, and any irreconcilable collisions that you&#8217;ve introduced relative to the branch you&#8217;re merging into will be dealt with as such.</p>

<p>Finally,</p>

<pre><code>    &#036; bzr commit
    &#036; bzr bundle ../mainline &gt; john-plumbing-fix.patch
</code></pre>

<p>Email the patch, and you&#8217;re done.</p>

<p>Optionally, you can of course:</p>

<pre><code>    &#036; cd ..
    &#036; rm -r plumbing-fix/
</code></pre>

<p>although you may want to keep the branch around in case any further changes need to be made; should that be necessary you can then just commit the fix-ups in the existing branch. Or you can create a new patch from scratch. Or you can always just reapply your bundle and recreate the branch. Lots of choices.</p>

<p>Eventually your patch will be merged to mainline, and when subsequently
the revision that carried it is pulled down from mainline to your local
copy of it and you are merging mainline into your primary working
branch, there won&#8217;t be a conflict because that code was in your working
branch all along <em>(unless you&#8217;ve further changed something, of course; but that&#8217;s why it is advantageous to merge your contributed branch|patch back into your working branch(es) early; then it&#8217;s not just the code text that was once
common, but now there are recent revision(s) for common ancestors, and the
merge engine will slurp it right up)</em>.</p>

<h2>Conclusion</h2>

<p>This dramatically improves matters, I think. Now I don&#8217;t have to feel
so bad about rejecting patches that aren&#8217;t quite there yet. When I do
so I know I&#8217;m not inflicting as much pain as I was worried I was.</p>

<p>Contributors meanwhile can create all the private revisions they ever wanted, and don&#8217;t have to forecast which are going to be publicly visible and which
aren&#8217;t. So they can leave off worrying about proper commit messages, etc,
until they actually go to create and submit an orthogonal patch.</p>

<p>But best of all, it restores the flexibility for people to do whatever they want. &#8220;No one can tell you no&#8221;; you might not get your change into upstream, but no one can keep it out of <em>your</em> code. And that&#8217;s what software freedom is all about.</p>

<p>AfC</p>

<p><em>Thanks to Robert Collins for having reviewed and commented on this essay.</em></p>

<hr/>

<p>Notes:</p>

<ol>
<li>The <code>bzr merge --force filename</code> one-at-a-time thing is a bit cumbersome. I had a word with some of the Bazaar hackers about it, and they&#8217;re going to see what they can do. Don&#8217;t forget there&#8217;s also the <code>bzr merge -r A..B</code> form.</li>
<li>Actually, the whole process is still a bit cumbersome. Someday, perhaps, we&#8217;ll have a UI to automate much of the changeset selection and bundle creation. Until then, though, this works, and it accomplishes the aim, which is facilitating contribution. That&#8217;s important to keep in mind.</li>
<li>Most of this discussion is characteristic of the 2nd and 3rd generation distributed revision control tools as a class, although obviously command syntax will differ. The key to all this holding up at scale, however, is ultimately the quality of the merge algorithms in the tool you&#8217;re using and its ability to deal with corner cases.</li>
<li>At the end of it all, though, if you <em>can</em> create a new branch and just do the work there in an isolated fashion resulting in an orthogonal patch, that <em>is</em> easier. <code>:)</code> It depends on the nature of the project you&#8217;re working on, the size of the change you&#8217;re contemplating contributing, and the way you work. In our case, the insane refactoring power of an IDE like Eclipse tends to mean that we get a lot of unrelated changes in flight at the same time, thus exacerbating the original question of creating orthogonal patches.</li>
</ol>

<p><strong>Updates</strong>:</p>

<ol>
<li>As of Bazaar 0.90, there will be a <code>bzr send -r</code> command which, when supplied with a revision, will create an orthogonal patch in a single step!</li>
</ol>
]]></description>
      <author>andrew@operationaldynamics.com (Andrew Cowie)</author>
      <category>/andrew/software/version-control</category>
      <pubDate>Tue, 14 Aug 2007 10:22:00 GMT</pubDate>
      <guid isPermaLink="false">bzr-orthogonal-patches</guid>
    </item>

    <item>
      <title>Version control that doesn&#8217;t make your eyes bleed</title>
      <link>http://research.operationaldynamics.com/blogs/andrew/software/version-control/git-is-like-cvs.html</link>
      <description><![CDATA[<p>At the GUADEC closing, <a href="http://primates.ximian.com/~federico/news.html">Federico</a> remarked:</p>

<blockquote>
  <p>trying to learn <code>git</code> felt just like trying to learn <code>cvs</code> the first time 10 years ago felt &#8212; why is this so hard?</p>
</blockquote>

<p>That was <em>exactly</em> how I felt about <code>git</code> for the 4-5 months I spent dogfooding it &#8212; despite my respect for the underlying notions in Git, I was constantly perplexed at why basic things seemed so complicated &#8212; and in the back of my mind was &#8220;I&#8217;ve been at this for a while; how on earth is someone new to Linux supposed to learn this?&#8221;</p>

<p>And then I tried <code>bzr</code>!</p>

<p><center>
<a href="http://www.bazaar-vcs.org/"><img alt="bzr logo" src="http://bazaar-vcs.org/LogoOptions?action=AttachFile&amp;do=get&amp;target=Bazaar+Logo+2006-07-27.png" border="0"/></a>
</center></p>

<p>We chose <code>bzr</code> (Bazaar-NG, as it was briefly known) as the VCS for the new Java bindings for GNOME because of Bazaar&#8217;s relative straightforwardness and our faith in the ethos and extreme competence of its developers. Anyone experienced with the old world 1st generation centralized VCS tools like CVS or Subversion will be able to make sense of it, and you can learn from there. Bazaar is constantly improving in performance terms, has a vibrant developer community, and is widely portable. Most of all, the fact that its developers actually follow test-driven-development practices to keep themselves honest (the unit test suite has over 7266 tests) biases us in their favour.</p>

<p>In production use for the last year, we have found Bazaar to be reliable, amazingly easy for newcomers to the Open Source world to learn, and a big contributor to our goal of reducing barriers to entry. One of my most treasured emails this year has been when someone wrote me saying:</p>

<blockquote>
  <p>Nice <a href="http://java-gnome.sourceforge.net/4.0/HACKING.html"><code>HACKING</code></a> instructions, really easy to follow, even though I never
  used <code>bzr</code> before. Attached are 2 small fixes for the configure script&#8230; builds on OpenSolaris now.</p>
</blockquote>

<p>Newcomer to contributed patch in less than an hour. Wow.</p>

<p>Some people have argued that they should wait to see which 3rd generation decentralized version control system will &#8220;win&#8221; before migrating away from <code>svn</code> or god forbid <code>cvs</code>. This is, in my view, foolish &#8212; none of Bazaar, Git or Mercurial are going to go away; the determining factor, rather, is usability &#8212; and that makes the choice an easy one.</p>

<p>Congratulations to the Bazaar team on the release of <code>0.18</code>!</p>

<p>AfC</p>
]]></description>
      <author>andrew@operationaldynamics.com (Andrew Cowie)</author>
      <category>/andrew/software/version-control</category>
      <pubDate>Fri, 20 Jul 2007 15:56:00 GMT</pubDate>
      <guid isPermaLink="false">git-is-like-cvs</guid>
    </item>

    <item>
      <title>Pushing a bzr branch with rsync</title>
      <link>http://research.operationaldynamics.com/blogs/andrew/software/version-control/bzr-repository-rsync.html</link>
      <description><![CDATA[<p>For some reason <code>bzr rspush</code>, a plugin from the &#8220;bzrtools&#8221; unstable command collection, doesn&#8217;t know how to push a branch that&#8217;s in a <a href="http://bazaar-vcs.org/">Bazaar</a> repository.</p>

<h2>Repository layouts</h2>

<p>What&#8217;s a Bazaar repository? Good question. It&#8217;s <em>not</em> what you get if you do <code>bzr init</code>. You have to do <code>bzr init-repo --trees .</code> in a directory, then within <em>that</em> directory create and copy branches. Apparently this then allows the branches to share revision delta files in common. </p>

<p><em>Why oh why can&#8217;t they just be immutable like Git does and thus hard-linkable?</em></p>

<p>Anyway, I&#8217;ve ended up with a directory structure as follows. There&#8217;s a top level Bazaar repository here:</p>

<pre><code>~/src/andrew/java-gnome
</code></pre>

<p>The <strong><code>bzr</code></strong> data actually lives in, as you might expect,</p>

<pre><code>~/src/andrew/java-gnome/.bzr
</code></pre>

<p>But that repository is <strong>not</strong> a branch and <strong>not</strong> a working directory. In the repository directory, I create branches and clone them:</p>

<pre><code>~/src/andrew/java-gnome/codegen
~/src/andrew/java-gnome/equivalence-fixes
~/src/andrew/java-gnome/mainline
~/src/andrew/java-gnome/primary
~/src/andrew/java-gnome/website
</code></pre>

<p>And so on. Each one <strong>also</strong> has a <code>.bzr</code> directory:</p>

<pre><code>~/src/andrew/java-gnome/mainline/.bzr
~/src/andrew/java-gnome/website/.bzr
</code></pre>

<p>Apparently the deal is that if the actual revision data isn&#8217;t in the proximate branch&#8217;s <code>.bzr</code>, it&#8217;ll look up one to see if there&#8217;s a <code>.bzr</code> there (and finding itself in a repository, there is, and so the data it needs is in <code>../.bzr/repository</code>).</p>
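
<p>To make that lookup concrete, here is a minimal shell sketch of the behaviour as I understand it. It is not Bazaar&#8217;s actual code, and <code>find_repository</code> is a name made up for the illustration: starting from a branch directory, it walks up one parent at a time until it finds a <code>.bzr/repository</code> directory.</p>

```shell
#!/bin/sh
# Illustration only: mimics the lookup described above, not Bazaar's own code.
# find_repository is a hypothetical helper name.

# Print the path of the nearest .bzr/repository at or above the directory
# given as the first argument; fail if the filesystem root is reached first.
# The function body is a subshell, so the cd calls do not leak to the caller.
find_repository() (
    cd "$1" || exit 1
    while :; do
        if [ -d .bzr/repository ]; then
            printf '%s\n' "$PWD/.bzr/repository"
            exit 0
        fi
        if [ "$PWD" = / ]; then
            exit 1
        fi
        cd ..
    done
)
```

<p>Pointed at one of the branch directories in a layout like the one above, I would expect this to report the <code>.bzr/repository</code> one level up rather than anything inside the branch itself.</p>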

<p>Since I used the <code>--trees</code> option to <code>bzr init-repository</code>, I get working copies in each of those directories. I can then point a symlink to it from the place where my Eclipse IDE expects my working directory to be,</p>

<pre><code>~/workspace/BindingsPrototype -&gt; ~/src/andrew/java-gnome/primary
</code></pre>

<p>and finally I can get to work. [You go to a lot of effort to configure Eclipse to work on a given Java project in a specific place. Switching the basedir of that path from one place to another is a non-starter, except by doing a full refactoring. So, the symlink]</p>

<p><em>I&#8217;m liking <strong><code>bzr</code></strong> well enough, but doesn&#8217;t this strike you as a bit complicated? Granted <strong><code>git</code></strong> is a usability disaster zone, but the core concepts of repository storage and branching seem really rock solid. I <strong>really</strong> miss the create-a-branch-in-place and switch-between-branches-in-place aspects of Git. Those were killer features for me. Bazaar has evolved in a few too many parallel directions, and they need a bit of good old GNOME 2.0 style refactoring to get out some of the cruft. Still, though, they have unit tests. That&#8217;s worth a lot in my book.</em></p>

<h2>Pushing the wall</h2>

<p>So now it&#8217;s time to publish your work. My experience trying to <code>bzr push sftp://</code> to a server half way around the world was that the sftp implementation is brutally slow. Using their &#8220;Smart Server&#8221; with a <code>bzr push bzr+ssh://</code> command was not much better.</p>

<p>Obviously this sort of thing is crying out for a little <strong><code>rsync</code></strong> lovin&#8217;, but unfortunately the <code>bzr rspush</code> command which supposedly does just that doesn&#8217;t know how to push a branch that&#8217;s in a Bazaar repository. <em>Oh, great.</em> So I had to do it manually.</p>

<p>I have a script that does a bunch of switching to figure what I want to send (source, website, photos, whatever) to where. For source code, I use it something like this:</p>

<pre><code>upload source &lt;project&gt; &lt;branch&gt;
</code></pre>

<p>You get the idea. The trick is to figure out how to tell rsync to include the repository <code>.bzr</code>, the branch directory (and its <code>.bzr</code>), and exclude everything else. I could have just done two separate SOURCEs, but I like to see all the traffic reported relative to a single root, so I use rsync&#8217;s include/exclude syntax instead. That stuff is notoriously voodoo; this time was no exception.</p>

<p>Here&#8217;s the magic smoke I used to get <strong><code>rsync</code></strong> to push a Bazaar branch within a Bazaar repository:</p>

<pre><code>    PROJECT="&#036;2"
    BRANCH="&#036;3"

    # note trailing slash
    SOURCE="/home/andrew/src/andrew/&#036;PROJECT/"
    DEST="andrew@centauri.lhr.operationaldynamics.com:/export/web/com/operationaldynamics/research/bzr/&#036;PROJECT"

    # for some reason the escaping necessary eluded me; workaround:
    s='*'

    # now go to the trouble of including each directory in the chain
    # (otherwise exclude * nukes them) and the three things we care
    # about in &lt;project&gt;: /.bzr, /&lt;branch&gt;, and /&lt;branch&gt;/.bzr
    # (rsync acts on the first rule that matches, so the two excludes
    # under the branch must come before the branch's catch-all include)

    OPTIONS="--include=/
        --include=/.bzr
        --include=/&#036;BRANCH
        --include=/.bzr/&#036;s&#036;s
        --exclude=/&#036;BRANCH/tmp
        --exclude=/&#036;BRANCH/.config
        --include=/&#036;BRANCH/&#036;s&#036;s
        --exclude=&#036;s"

    shift 3
</code></pre>

<p>&#8230;</p>

<pre><code>    exec nice rsync \
        --verbose \
        --recursive \
        --update \
        --links \
        --hard-links \
        --perms \
        --times \
        --sparse \
        -e /usr/bin/ssh \
        --partial \
        --progress \
        --compress \
        &#036;OPTIONS \
        "&#036;@" \
        &#036;SOURCE &#036;DEST
</code></pre>

<p>So with a nice simple:</p>

<pre><code>upload source java-gnome mainline
</code></pre>

<p>I publish (only) the &#8220;<code>mainline</code>&#8221; branch in my <code>~/src/andrew/java-gnome</code> repository.</p>
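
<p>For what it&#8217;s worth, the escaping that eluded me can probably be sidestepped by building the argument list with <code>set --</code> and quoting each pattern, rather than packing everything into one string. The following is only a sketch under that assumption: <code>push_branch</code> and the <code>RSYNC</code> override are names invented for the illustration, and only a few of the rsync options are repeated. It also relies on rsync acting on the first filter rule that matches, which is why the specific excludes sit ahead of the branch&#8217;s catch-all include.</p>

```shell
#!/bin/sh
# Sketch only: push_branch and RSYNC are hypothetical names, not part of
# the upload script shown above.

push_branch() {
    source_dir="$1"   # repository root, note trailing slash
    dest="$2"         # rsync destination
    branch="$3"       # branch directory inside the repository
    shift 3           # whatever is left is passed through to rsync

    # Quoting keeps the shell away from the wildcards, so no s='*' trick.
    # rsync uses the first rule that matches, so the specific excludes
    # come before the branch's catch-all include.
    set -- \
        --include=/ \
        --include=/.bzr \
        --include='/.bzr/**' \
        --include="/$branch" \
        --exclude="/$branch/tmp" \
        --exclude="/$branch/.config" \
        --include="/$branch/**" \
        --exclude='*' \
        "$@" \
        "$source_dir" "$dest"

    # RSYNC defaults to the real thing; set RSYNC=echo to preview the call.
    ${RSYNC:-rsync} --recursive --links --times "$@"
}
```

<p>Setting <code>RSYNC=echo</code> before calling <code>push_branch</code> prints the command line instead of running it, which is a cheap way to check the rule order before anything goes over the wire.</p>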

<p>Hopefully that will be of help to someone; I can&#8217;t be the only person who has found themselves wanting to publish a Bazaar branch from within a project repository. What would be great, though, would be a nice and simple <code>bzr push ssh://</code> that just assumes rsync-over-ssh and does all this for you. Shouldn&#8217;t be too hard for someone to hack in.</p>

<p>AfC</p>
]]></description>
      <author>andrew@operationaldynamics.com (Andrew Cowie)</author>
      <category>/andrew/software/version-control</category>
      <pubDate>Sun, 10 Dec 2006 14:12:00 GMT</pubDate>
      <guid isPermaLink="false">bzr-repository-rsync</guid>
    </item>

    <item>
      <title>Gotta hate those Red Giant bugs</title>
      <link>http://research.operationaldynamics.com/blogs/andrew/software/version-control/red-giant-bugs.html</link>
      <description><![CDATA[<p>In a recent thread on the <a href="http://www.abridgegame.org/pipermail/darcs-users/2005-June/007829.html">darcs-users</a>
mailing list, this response from David
Roundy (the original author of <code>darcs</code>) came along:</p>

<blockquote>
  <p>What you&#8217;ve run into is the infamous O(2^N) behavior of darcs when it
  encounters certain sorts of conflicts.  The code should eventually complete,
  but it&#8217;s possible that our sun will become a red giant before that happens
  (which would most likely cause darcs to fail).  :(</p>
</blockquote>

<p>I have to admit that the red giant date is indeed a bit beyond my planning
horizon.  Of course, I can chuckle at this since <em>I</em> haven&#8217;t been burned by
this bug, whatever it is :)</p>

<p><em><a href="http://www.darcs.net/">Darcs</a> is a 2nd generation distributed version control tool. I&#8217;d been
rather happy with it for a few months. It&#8217;s surprisingly easy to use, very
powerful, has brilliantly straightforward cherry picking of patches, and seems
to Just Work (tm) in all the use cases that I have for it. The way Darcs
facilitates casual users to quickly create and submit patches is awesome.</em></p>

<p>AfC</p>

<p><strong>Update</strong>: Spoke too soon; hit what appeared to be the Red Giant bug in Jan
2006. Decided not to wait around to find out and stopped using Darcs for new
work. Switched to evaluating 3rd generation tools. Tried <a href="http://git.or.cz/">Git</a> for six
months. Hated it &#8212; its user interface is worse than GPG&#8217;s if you can believe
such a thing is even possible. Tried <a href="http://www.bazaar-vcs.org/">Bazaar</a> aka <code>bzr</code>. Great usability,
Just Works on incredibly complex corner cases. Awesome developer and user
community. Using it for all projects as of Nov 2006.</p>
]]></description>
      <author>andrew@operationaldynamics.com (Andrew Cowie)</author>
      <category>/andrew/software/version-control</category>
      <pubDate>Fri, 29 Jul 2005 02:58:00 GMT</pubDate>
      <guid isPermaLink="false">red-giant-bugs</guid>
    </item>


  </channel>
</rss>
