Notes for Andy Oram and Greg Wilson, editors Beautiful Code: Leading Programmers Explain How They Think

Key concepts: Quicksort, regular expressions, streaminess, Subversion.


Related theorists: Jon Bentley, Jim Blandy, Karl Fogel, C.A.R. Hoare, Brian Kernighan, Stephen Kleene, Donald Knuth, Rob Pike, Ken Thompson.


Foreword
Greg Wilson

Students learning programming are not encouraged to study masterful code.

(xv) Architects are taught to look at buildings, and composers study one another's scores, but programmers—they look at each other's work only when there's a bug to fix: even the, they try to look at as little as possible. We tell students to use sensible variable names, introduce them to some basic design patterns, and then wonder why so much of what they write is so ugly.
(xv) In May 2006, I asked some well-known (and not so well-known) software designers to dissect and discuss the most beautiful piece of code they knew.


Preface

Experts were solicited to contribute insights about beautiful code, with proceeds of the book sales going to Amnesty International.

(xvii) Beautiful Code was conceived by Greg Wilson in 2006 as a way to elicit insights from leading software developers and computer scientists. Together, he and his co-editor, Andy Oram, approached experts with diverse backgrounds from all over the world. They received a flood of responses, partly because royalties from the book are being donated to Amnesty International.


CHAPTER ONE
A Regular Expression Matcher
Brian Kernighan

Kernighan notes regular expressions as special-purpose pattern matching language invented by Kleene, implemented by Thompson for rapid text pattern matching in one of the first software patents.

(1-2) Regular expressions are notations for describing patterns of text and, in effect, make up a special-purpose language for pattern matching. . . . Stephen Kleene invented regular expressions in the mid-1950s as a notation for finite automata; in fact, they are equivalent to finite automata in what they represent. They first appeared in a programm setting in Ken Thompson's version of the QED text editor in the mid-1960s. In 1967, Thompson applied for a patent on a mechanism for rapid text matching based on regular expressions. The patent was granted in 1971, one of the very first software patents. Regular expressions moved from QED to the Unix editor ed, and then to the quintessential Unix tool grep, which Thompson created by performing radical surgery on ed. These widely used programs helped regular expressions become familiar throughout the early Unix community.

Kernighan explains Thompson pattern match fast because it generated machine instructions on the fly and carried forward all possible matches.

(2) Thompson's original matcher was very fast because it combined two independent ideas. One was to generate machine instructions on the fly during matching so that it ran at machine speed rather than by interpretation. The other was to carry forward all possible matches at each stage, so it did not have to backtrack to look for alternative potential matches.

The Practice of Programming

Kernighan presents implementation of simple regular expression matching algorithm by Rob Pike as beautiful code exemplifying recursion and C pointers.

(3) Rob [Pike]'s implementation itself is a superb example of beautiful code: compact, elegant, efficient, and useful. It's one of the best examples of recursion that I have ever seen, and it shows the power of C pointers. Although at the time we were most interested in conveying the important role of good notation in making a program easier to use (and perhaps easier to write as well), the regular expression code has also been an excellent way to illustrate algorithms, data structures, testing, performance enhancement, and other important topics.

Implementation

Discussion

Alternatives

Bulding on It

Kernighan describes proven use of pattern matching code teaching programming, good for exploring performance, class extensions, and testing techniques.

(6) The purpose of The Practice of Programming was to teach good programming. At the time the book was written, Rob and I were still at Bell Labs, so we did not have firsthand experience of how the book would be best used in a classroom. It has been gratifying to discover that some of the material does work well in classes. I have used this code since 2000 as a vehicle for teaching important points about programming.
(6) First, it shows how recursion is useful and leads to clean code in a novel setting; it's not yet another version of Quicksort (or factorial!), nor is it some kind of tree walk.
(6-7) It's also a good example for performance experiments. . . . On the other hand, it is also a fine illustration of the importance of a good algorithm. If a pattern includes several .* sequences, the straightforward implementation requires a lot of backtracking, and , in some cases, will run very slowly indeed.
(7) Extensions to the regular expression class can form the basis of a variety of assignments.
(8) I've also used this code extensively to explore testing techniques.

Conclusion


CHAPTER TWO
Subverion's Delta Editor: Interface as Ontology
Karl Fogel

Fogel discusses beauty of Subversion delta editor interface, which is defined as a C structure.

(11) Here, with the luxury of several pages to work in, I'd like to talk about a larger sort of beauty—not necessarily the kind that would strike a passing reader immediately, but the kind that programmers experience with the problem domain. My example is not an algorithm, but an interface: the programming interface used by the open source version control system Subversion to express the difference between two directory trees, which is also the interface used to transform one tree into the other. In Subersion, its formal name is the C type svn_delta_editor_t, but it is known colloquially as the delta editor.
(12) It breaks down the problem along boundaries so natural that anyone designing a new feature for Subversion can easily tell when to call each function, and for what purpose. . . . Perhaps most important, the design has proved very resilient during enhancements and updates.

Fogel notes Subversion delta editor designed by a single very experienced person over a short time period, confirming a suspicion about good designs proposed by Brooks among others.

(12) And as if to confirm suspicions about the origins of good design, the delta editor was created by a single person over the course of a few hours (although that person was very familiar with the problem and the case base).

Version Control and Tree Transformation

Fogel explains Subversion repository based on directory versioning.

(12) As a version control system, one of Subversion's goals is to track revisions to directory structures as well as individual file contents. In fact, Subversion's server-side repository is fundamentally designed around directory versioning. A repository is simply a series of snapshots of a directory tree as that tree transforms over time.
(14) Thus each revision in the repository points to the root node of a unique tree, and the difference between that tree and the preceding one is the change that was committed in the new revision. To trace the changes, a program walks down both trees simultaneously, noting where entries point to different places.

Fogel explains that maintaining the working copy on the Subversion client side is the major design challenge.

(15) If Subversion were only a repository, this would be the end of the story. However, there's a client side, too: the working copy, which is a user's checked-out copy of some revision tree plus whatever local edits the user has made but not yet committed.

Expressing Tree Differences

Fogel recounts visit to Jim Blandy for design guidance on how to express Subversion tree differences, from which the delta editor was born.

Humorous account by Fogel of discussing Subversion design with Jim Blandy, only to be sent away so he could think.

(16) The most common action in Subversion is to transmit changes between the two sides: from the repository to the working copy when doing an update to receive others' changes, and from the working copy to the repository when committing one's own changes. Expressing the difference between two trees is also key to many other common operations—e.g., diffing, switching to a branch, merging changes from one branch to another, and so on.
(16) In frustration, I drove with another developed, Ben Collins-Sussman, from Chicago down to Bloomington, Indiana, to seek the advice of Jim
Blandy, who had invented Subversion's repository model in the first place, and who has, shall we say, strong opinions about design. Jim listened quiety as we described the various avenues we'd explored for transmitting tree differences, his expression growing grimmer and grimmer as we talked. When we reached the end of the list, he sat for a moment and then politely asked us to scram for a while so he could think. I put on my jogging shoes and went running; Ben stayed behind and read a book in another room or something. So much for collaborative development.
(16) After I returned from my run and showered, Ben and I went back into Jim's den, and he showed us what he'd come up with. It is essentially what's in Subversion today; there have been various changes over the years, but none to its fundamental structure.

The Delta Editor Interface

Fogel provides commentary on large segments of Subversion delta editor interface source code that goes for a number of pages.

(17) The interface starts with an introduction, to put a reader of the code in the right frame of mind. This text is almost unchanged since Jim Blandy wrote it in August of 2000, so the general concept has clearly weathered well. . . . Then comes a long, formal documentation comment, followed by the interface itself, which is a callback table whose invocation order is partially constrained.

But Is It Art?

Fogel argues Subversion delta editor interface is art for its use of constraints, carrying context by means of batons, and providing clear boundaries between suboperations, which he calls streaminess.

(23) The first thing that strikes one about the delta editor is that it chooses constraint: even though there is no philosophical requirement that tree edits be done in depth-first order (or indeed in any order at all), the interface enforces depth-firstness anyway, by means of the baton relationship. This makes the interface's usage and behavior more predictable.
(23) The second thing is that an entire operation unobtrusively carries its context with it, again by means of the batons.
(23) The third important feature is that the interface provides clear boundaries between the various suboperations involved in expressing a tree change. . . . These boundaries are a consequence of the interface's dedication to
streaminess, as expressed in its introductory comment: “instead of representing the tree delta explicitly, we define a standard way for a consumer to process each piece of a tree delta as soon as the producer creates it.” It would have been tempting to stream only the largest chunks (that is, the file diffs), but the delta editor interface goes the whole way and streams the entire tree delta, thus giving both producer and consumer fine-grained control over memory usage, progress reporting, and interruptibility.
(23) It was only after we began throwing the new delta editor at various problems that these features began to show their value.

Abstraction As a Spectator Sport

(25) Our next indication of the delta editor's flexibility came when we needed to do two or more distinct things in the same tree edit. One of the earliest such situations was the need to handle cancellations.
(26) As you can tell, I still miss editor composition for its sheer elegance. But in the end it was more abstraction than we needed. Much of the functionality that we initially implemented using composed editors, we later rewrote to use custom callbacks passed to the editor-creation routines.

Conclusions

To Fogel the Subversion API guides thinking by providing a model to follow, especially for future project members, providing an example of practical finesse by relieiving development community of need for certain debates.

(27) The real strength of this API, and, I suspect, of any good API, is that it guides one's thinking. All operations involving tree modifications in Subversion are now shaped roughly the same way. This not only reduces the amount of time newcomers must spend learning existing code, it gives new code a clear model to follow, and developers have taken the hint.
(28) Not only does having the right API reduce learning time, it also relieves the development community of the need to have certain debates: design discussions that would have spawned long and contentious mailing list threads simply do not come up. That may not be quite the same thing as pure technical or aesthetic beauty, but in a project with many participants and a constant turnover rate, it's a beauty you can use.


CHAPTER THREE
The Most Beautiful Code I Never Wrote
Jon Bentley

Bentley writes about runtime of classic Quicksort program to talk about beautiful code that is not there.

(29) In software, the most beautiful code, the most beautiful functions, and the most beautiful programs are sometimes not there at all.
(29) It is, of course, difficult to talk about things that aren't there. This chapter attempts this daunting task by presenting a novel analysis of the runtime of the classic Quicksort program.

The Most Beautiful Code I Ever Wrote

Bentley wrote thesis on divide-and-conquer algorithms, and admits tiptoeing around innermost loop in famous Quicksort algorithm by Hoare.

(30) I wrote my thesis on divide-and-conquer algorithms, and found that C.A.R. Hoare's Quicksort is undeniably the granddaddy of them all. It is a beautiful algorithm for a fundamental problem that can be implemented in elegant code. I loved the algorithm, but I always tiptoed around its innermost loop. I once spent two days debugging a complex program that was based on that loop, and for years I carefully copied that code whenever I needed to perform a similar task. It solved my problems, but I didn't really understand it.

More and More with Less and Less

Bentley developed an experimental approach to analyzing the Quicksort algorithm when he noted undergraduates could not get the gist of its mathematical proof.

(31) Hoare's analysis of this question is beautiful, but unfortunately over the mathematical heads of many programmers. When I taught Quicksort to undergraduates, I was frustrated that many just didn't “get” the proof, even after sincere effort. We'll now attack that problem experimentally.

Perspective

Bentley comes around to Knuth summing factor technique after experimenting with probes in the algorithm code.

(37) Knuth uses the mathematical technique of a “summing factor” to achieve the solution . . . . Thus we have smoothly progressed from experimenting on a program by augmenting it with probes to a completely mathematical analysis of its behavior.

A Bonus Analysis

What Is Writing?

Conclusion

Bentley provides a number of philosophical aphorisms to express theme of simplicity, elegance, and concision in the Quicksort algorithm study.

(39) This chapter has concentrated on the beauty conferred by simplicity, elegance, and concision. The following aphorisms all express this overarching theme.
(40) I find that one of the most useful places to practice is on small code fragments, say of just one or two dozen lines.


CHAPTER FOUR
Finding Things
Tim Bray

Bray claims search is a primary occupation for users and programmers alike.

(41)






Oram, Andy and Greg Wilson, eds. Beautiful Code: Leading Programmers Explain How They Think. Sebastopol, CA: O'Reilly, 2007. Print.