Notes for Andy Oram and Greg Wilson, editors Beautiful Code: Leading Programmers Explain How They Think
Key concepts: Quicksort, regular expressions, streaminess, Subversion.
Related theorists: Jon Bentley, Jim Blandy, Karl Fogel, C.A.R. Hoare, Brian Kernighan, Stephen Kleene, Donald Knuth, Rob Pike, Ken Thompson.
Students learning programming are not encouraged to study masterful code.
are taught to look at buildings, and composers study one another's
scores, but programmers—they look at each other's work only when
there's a bug to fix: even the, they try to look at as little as
possible. We tell students to use sensible variable names, introduce
them to some basic design patterns, and then wonder why so much of
what they write is so ugly.
(xv) In May 2006, I asked some well-known (and not so well-known) software designers to dissect and discuss the most beautiful piece of code they knew.
Experts were solicited to contribute insights about beautiful code, with proceeds of the book sales going to Amnesty International.
(xvii) Beautiful Code was conceived by Greg Wilson in 2006 as a way to elicit insights from leading software developers and computer scientists. Together, he and his co-editor, Andy Oram, approached experts with diverse backgrounds from all over the world. They received a flood of responses, partly because royalties from the book are being donated to Amnesty International.
A Regular Expression Matcher
Kernighan notes regular expressions as special-purpose pattern matching language invented by Kleene, implemented by Thompson for rapid text pattern matching in one of the first software patents.
(1-2) Regular expressions are notations for describing patterns of text and, in effect, make up a special-purpose language for pattern matching. . . . Stephen Kleene invented regular expressions in the mid-1950s as a notation for finite automata; in fact, they are equivalent to finite automata in what they represent. They first appeared in a programm setting in Ken Thompson's version of the QED text editor in the mid-1960s. In 1967, Thompson applied for a patent on a mechanism for rapid text matching based on regular expressions. The patent was granted in 1971, one of the very first software patents. Regular expressions moved from QED to the Unix editor ed, and then to the quintessential Unix tool grep, which Thompson created by performing radical surgery on ed. These widely used programs helped regular expressions become familiar throughout the early Unix community.
Kernighan explains Thompson pattern match fast because it generated machine instructions on the fly and carried forward all possible matches.
(2) Thompson's original matcher was very fast because it combined two independent ideas. One was to generate machine instructions on the fly during matching so that it ran at machine speed rather than by interpretation. The other was to carry forward all possible matches at each stage, so it did not have to backtrack to look for alternative potential matches.
The Practice of Programming
Kernighan presents implementation of simple regular expression matching algorithm by Rob Pike as beautiful code exemplifying recursion and C pointers.
(3) Rob [Pike]'s implementation itself is a superb example of beautiful code: compact, elegant, efficient, and useful. It's one of the best examples of recursion that I have ever seen, and it shows the power of C pointers. Although at the time we were most interested in conveying the important role of good notation in making a program easier to use (and perhaps easier to write as well), the regular expression code has also been an excellent way to illustrate algorithms, data structures, testing, performance enhancement, and other important topics.
Bulding on It
Kernighan describes proven use of pattern matching code teaching programming, good for exploring performance, class extensions, and testing techniques.
The purpose of The
Practice of Programming
to teach good programming. At the time the book was written, Rob and
I were still at Bell Labs, so we did not have firsthand experience of
how the book would be best used in a classroom. It has been
gratifying to discover that some of the material does work well in
classes. I have used this code since 2000 as a vehicle for teaching
important points about programming.
(6) First, it shows how recursion is useful and leads to clean code in a novel setting; it's not yet another version of Quicksort (or factorial!), nor is it some kind of tree walk.
(6-7) It's also a good example for performance experiments. . . . On the other hand, it is also a fine illustration of the importance of a good algorithm. If a pattern includes several .* sequences, the straightforward implementation requires a lot of backtracking, and , in some cases, will run very slowly indeed.
(7) Extensions to the regular expression class can form the basis of a variety of assignments.
(8) I've also used this code extensively to explore testing techniques.
Subverion's Delta Editor: Interface as Ontology
Fogel discusses beauty of Subversion delta editor interface, which is defined as a C structure.
Here, with the luxury of several pages to work in, I'd like to talk
about a larger sort of beauty—not necessarily the kind that would
strike a passing reader immediately, but the kind that programmers
experience with the problem domain. My example is not an algorithm,
but an interface: the programming interface used by the open source
version control system Subversion to express the difference between
two directory trees, which is also the interface used to transform
one tree into the other. In Subersion, its formal name is the C type
svn_delta_editor_t, but it is known colloquially as the delta
(12) It breaks down the problem along boundaries so natural that anyone designing a new feature for Subversion can easily tell when to call each function, and for what purpose. . . . Perhaps most important, the design has proved very resilient during enhancements and updates.
Fogel notes Subversion delta editor designed by a single very experienced person over a short time period, confirming a suspicion about good designs proposed by Brooks among others.
(12) And as if to confirm suspicions about the origins of good design, the delta editor was created by a single person over the course of a few hours (although that person was very familiar with the problem and the case base).
Version Control and Tree Transformation
Fogel explains Subversion repository based on directory versioning.
(12) As a version control system, one of Subversion's goals is to
track revisions to directory structures as well as individual file
contents. In fact, Subversion's server-side repository is
fundamentally designed around directory versioning. A repository is
simply a series of snapshots of a directory tree as that tree
transforms over time.
(14) Thus each revision in the repository points to the root node of a unique tree, and the difference between that tree and the preceding one is the change that was committed in the new revision. To trace the changes, a program walks down both trees simultaneously, noting where entries point to different places.
Fogel explains that maintaining the working copy on the Subversion client side is the major design challenge.
(15) If Subversion were only a repository, this would be the end of the story. However, there's a client side, too: the working copy, which is a user's checked-out copy of some revision tree plus whatever local edits the user has made but not yet committed.
Expressing Tree Differences
Fogel recounts visit to Jim Blandy for design guidance on how to express Subversion tree differences, from which the delta editor was born.
Humorous account by Fogel of discussing Subversion design with Jim Blandy, only to be sent away so he could think.
The most common action in Subversion is to transmit changes between
the two sides: from the repository to the working copy when doing an
update to receive others' changes, and from the working copy to the
repository when committing one's own changes. Expressing the
difference between two trees is also key to many other common
operations—e.g., diffing, switching to a branch, merging changes
from one branch to another, and so on.
(16) In frustration, I drove with another developed, Ben Collins-Sussman, from Chicago down to Bloomington, Indiana, to seek the advice of Jim Blandy, who had invented Subversion's repository model in the first place, and who has, shall we say, strong opinions about design. Jim listened quiety as we described the various avenues we'd explored for transmitting tree differences, his expression growing grimmer and grimmer as we talked. When we reached the end of the list, he sat for a moment and then politely asked us to scram for a while so he could think. I put on my jogging shoes and went running; Ben stayed behind and read a book in another room or something. So much for collaborative development.
(16) After I returned from my run and showered, Ben and I went back into Jim's den, and he showed us what he'd come up with. It is essentially what's in Subversion today; there have been various changes over the years, but none to its fundamental structure.
The Delta Editor Interface
Fogel provides commentary on large segments of Subversion delta editor interface source code that goes for a number of pages.
(17) The interface starts with an introduction, to put a reader of the code in the right frame of mind. This text is almost unchanged since Jim Blandy wrote it in August of 2000, so the general concept has clearly weathered well. . . . Then comes a long, formal documentation comment, followed by the interface itself, which is a callback table whose invocation order is partially constrained.
But Is It Art?
Fogel argues Subversion delta editor interface is art for its use of constraints, carrying context by means of batons, and providing clear boundaries between suboperations, which he calls streaminess.
(23) The first thing that strikes one
about the delta editor is that it chooses
even though there is no philosophical requirement that tree edits be
done in depth-first order (or indeed in any order at all), the
interface enforces depth-firstness anyway, by means of the baton
relationship. This makes the interface's usage and behavior more
(23) The second thing is that an entire operation unobtrusively carries its context with it, again by means of the batons.
(23) The third important feature is that the interface provides clear boundaries between the various suboperations involved in expressing a tree change. . . . These boundaries are a consequence of the interface's dedication to streaminess, as expressed in its introductory comment: “instead of representing the tree delta explicitly, we define a standard way for a consumer to process each piece of a tree delta as soon as the producer creates it.” It would have been tempting to stream only the largest chunks (that is, the file diffs), but the delta editor interface goes the whole way and streams the entire tree delta, thus giving both producer and consumer fine-grained control over memory usage, progress reporting, and interruptibility.
(23) It was only after we began throwing the new delta editor at various problems that these features began to show their value.
Abstraction As a Spectator Sport
(25) Our next
indication of the delta editor's flexibility came when we needed to
do two or more distinct things in the same tree edit. One of the
earliest such situations was the need to handle cancellations.
(26) As you can tell, I still miss editor composition for its sheer elegance. But in the end it was more abstraction than we needed. Much of the functionality that we initially implemented using composed editors, we later rewrote to use custom callbacks passed to the editor-creation routines.
To Fogel the Subversion API guides thinking by providing a model to follow, especially for future project members, providing an example of practical finesse by relieiving development community of need for certain debates.
(27) The real strength of this API,
and, I suspect, of any good API, is that it guides one's thinking.
All operations involving tree modifications in Subversion are now
shaped roughly the same way. This not only reduces the amount of time
newcomers must spend learning existing code, it gives new code a
clear model to follow, and developers have taken the hint.
(28) Not only does having the right API reduce learning time, it also relieves the development community of the need to have certain debates: design discussions that would have spawned long and contentious mailing list threads simply do not come up. That may not be quite the same thing as pure technical or aesthetic beauty, but in a project with many participants and a constant turnover rate, it's a beauty you can use.
The Most Beautiful Code I Never Wrote
Bentley writes about runtime of classic Quicksort program to talk about beautiful code that is not there.
(29) In software,
the most beautiful code, the most beautiful functions, and the most
beautiful programs are sometimes not there at all.
(29) It is, of course, difficult to talk about things that aren't there. This chapter attempts this daunting task by presenting a novel analysis of the runtime of the classic Quicksort program.
The Most Beautiful Code I Ever Wrote
Bentley wrote thesis on divide-and-conquer algorithms, and admits tiptoeing around innermost loop in famous Quicksort algorithm by Hoare.
(30) I wrote my thesis on divide-and-conquer algorithms, and found that C.A.R. Hoare's Quicksort is undeniably the granddaddy of them all. It is a beautiful algorithm for a fundamental problem that can be implemented in elegant code. I loved the algorithm, but I always tiptoed around its innermost loop. I once spent two days debugging a complex program that was based on that loop, and for years I carefully copied that code whenever I needed to perform a similar task. It solved my problems, but I didn't really understand it.
More and More with Less and Less
Bentley developed an experimental approach to analyzing the Quicksort algorithm when he noted undergraduates could not get the gist of its mathematical proof.
(31) Hoare's analysis of this question is beautiful, but unfortunately over the mathematical heads of many programmers. When I taught Quicksort to undergraduates, I was frustrated that many just didn't “get” the proof, even after sincere effort. We'll now attack that problem experimentally.
Bentley comes around to Knuth summing factor technique after experimenting with probes in the algorithm code.
(37) Knuth uses the mathematical technique of a “summing factor” to achieve the solution . . . . Thus we have smoothly progressed from experimenting on a program by augmenting it with probes to a completely mathematical analysis of its behavior.
A Bonus Analysis
What Is Writing?
Bentley provides a number of philosophical aphorisms to express theme of simplicity, elegance, and concision in the Quicksort algorithm study.
(39) This chapter has concentrated on the beauty conferred by
simplicity, elegance, and concision. The following aphorisms all
express this overarching theme.
(40) I find that one of the most useful places to practice is on small code fragments, say of just one or two dozen lines.
Bray claims search is a primary occupation for users and programmers alike.
Oram, Andy and Greg Wilson, eds. Beautiful Code: Leading Programmers Explain How They Think. Sebastopol, CA: O'Reilly, 2007. Print.