Notes for Lou Burnard, Katherine O'Brien O'Keeffe, and John Unsworth, editors, Electronic Textual Editing

Key concepts: autopoietic functions, concurrency problem, critical versus non-critical editing, diplomatic transcription, documentary editing, double-editing, linkeme, multiple hierarchies, normalized transcription, readerly discovery, SGML, text, TEI, transclusive flexibility.

Claim in foreword is computer as tool does not fundamentally alter reading or subjectivity. Markup as highly reflexive act, oscillating indeterminacy like self-organizing systems. In line with videogame studies, electronic literature. Many contributors emphasize using non-proprietary formats and open policies for scholarly editions comparable to GPL, and existential imperative to build devices (McGann's theory-as-poiesis). Compare repeatability of analysis to software quality assurance methodologies. Researching social textualities via user logs dovetails nicely with software studies and projects for future digital humanities scholars. Sizable examples of working code to illustrate “the devil's bargain with HTML.” What is a text? Ontology matters for noncritical operations, such as transcription, especially if it turns out to be non-nesting, non-hierarchical. Problem of multiple hierarchies as the major challenge to text encoding. I develop comparison to software revision control systems. RCS commits as speech acts? Still problem of non-nesting information of BNF-style grammars to keep theorists busy. Encoding becomes a form of noncritical close reading. Discussion of copyright and contracts has implications for transmedia events such as sounds in virtual realities that are generated from copyrighted text via text-to-speech synthesis. Division of intellectual capital that externalizes program and interface dissolves with inclusion of digitally native, electronic literature as the subject of scholarly editions. CCS link: cultural bias in encoding recommends ASCII entity references over direct Unicode.
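The entity-reference recommendation is easy to demonstrate: XML defines ASCII-only numeric character references, and a parser treats them identically to raw Unicode characters. A minimal sketch with Python's standard library:

```python
import xml.etree.ElementTree as ET

# The same accented character encoded two ways: as a raw Unicode
# code point and as an ASCII-only numeric character reference.
direct = ET.fromstring('<w>caf\u00e9</w>')
referenced = ET.fromstring('<w>caf&#x00E9;</w>')

# Both parse to identical text content.
print(direct.text == referenced.text)  # True
```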

Related theorists: Kirschenbaum, Landow, Lessig, Manovich, McGann, Tanselle.

(1) Of course, whether e-books are as convenient to handle and use as the codices that preceded them is not an irrelevant matter, but convenience is not entirely a function of developing technology, for what one becomes accustomed to is a fundamental factor.
(2) An increasingly populated field of literary study has made scholars acutely aware of what psychologists, in their different way, have long understood: people may think that “any book” will do for reading a given work, but all the details of graphic design, which are likely to vary from printing to printing (and from e-book to e-book), do affect readers' responses.

Tanselle in the foreword argues that the computer as tool does not fundamentally alter reading or subjectivity, whereas Manovich, Hayles, and others disagree; he seems not to consider digitally native electronic texts, only electronic versions of texts originally composed in prior media forms.

(3) But when the excitement leads to the idea that the computer alters the ontology of texts and makes possible new kinds of reading and analysis, it has gone too far. The computer is a tool, and tools are facilitators; they may create strong breaks with the past in the methods for doing things, but they are at the service of an overriding continuity, for they do not change the issues that we have to cope with.
(3) When we create and use an electronic text, we still have to ponder the mode of existence of the linguistic medium; we still have to think about the relations among mental, audible, and visible texts; we still have to consider whether it is meaningful to pursue authorially intended texts or whether the documentary texts that survive from the past (perhaps purged of their obvious errors) are the only ones we should study; we still have to decide how to present the results of our textual research to other readers.
(4) The idea that electronic texts encourage a new kind of reading has also been overstated. . . . Such aids to radial reading can be well or poorly constructed whether the means of presentation is printed or electronic.
(4) Surely the richest kind of reading depends on having at hand, for constant reference, the information that textual scholars have amassed. The real issue is how best to provide guidance for readers.
(5) It is a distinct advantage, of course, for readers to be able to choose the points of entry they wish to use; but to engage in radial reading effectively, they need the editor's assistance in the form of comments on the textual history of the work, organized records of variants in the relevant documentary texts, and the like. They also need editorially emended texts in order to see how the mass of evidence has been used in reading by scholars who have made themselves expert in the textual history of the work. . . . This point shows, once again, that the advantages of the electronic form are maximized when one recognizes that the technology contributes to the process of building on past accomplishments.
(6) Whether or not we wish to claim an ontological distinction between ink and pixels, the concept of text has obviously shifted its meaning between the two sentences. In the first, a text is a physical thing, an arrangement of words and punctuation in a particular visible form. In the second, it is the sequence of words and punctuation itself, an abstraction that can be given any number of concrete renderings (in the same medium or different media). . . . The philosophical conundrum as to where a text resides is exactly the same as it always was.

What is a text? Where does it reside? No room for new media practices? See comment in last paragraph of introduction.

(6) We will be spared some drudgery and inconvenience, but we still must confront the same issues that editors have struggled with for twenty-five hundred years.


(9) The Text Encoding Initiative's
Guidelines for Electronic Text Encoding and Interchange was first published in April 1994 as two substantial green volumes known as TEI P3 (P for “public”). In 2001, the Text Encoding Initiative was reestablished as a membership consortium, jointly hosted by two United States and two European universities. Its first act was to sponsor a revision of the TEI guidelines. This edition, known as TEI P4, was a maintenance release, bringing the guidelines up-to-date with changes in the technical infrastructure—most notably the use of the W3C's extensible markup language (XML) as its means of expression rather than the ISO standard, SGML, used by earlier editions. TEI P4 was published in 2002 under the imprint of the University of Virginia Press and forms the current reference standard.
(9) A first release of TEI P5 appeared in January 2005; see for its current status.
(10) In January 2005, the preparation of TEI P5 moved to a new level with the decision to make the source text of the new edition of the guidelines available under an open-source license. For this new edition, it was decided to replace the SGML/XML DTD version of the TEI scheme with a version that could be expressed in any of the three schema languages now in wide use: DTD, W3C Schema, and RelaxNG. The current state of this work is now accessible to all from the TEI's repository at
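A minimal TEI-style document (content invented; the namespace and header elements follow TEI P5) can be checked for well-formedness with any XML parser:

```python
import xml.etree.ElementTree as ET

# A minimal TEI P5 document: a header describing the source,
# followed by the transcribed text itself.
doc = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample transcription</title></titleStmt>
      <publicationStmt><p>Unpublished draft.</p></publicationStmt>
      <sourceDesc><p>Transcribed from a printed exemplar.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body><p>Whan that Aprille with his shoures soote</p></body></text>
</TEI>"""

tei = "{http://www.tei-c.org/ns/1.0}"
root = ET.fromstring(doc)  # raises ParseError if not well-formed
first_line = root.find(f".//{tei}body/{tei}p").text
print(first_line)  # Whan that Aprille with his shoures soote
```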

(11) Ever since the invention of the codex, the long and distinguished history of textual editing has been intimately involved in the physique of the book. . . . The scholarly debates over what sort of editions to produce—whether favoring the textual object, the author of the text, or the text's reception history—were driven as much by economics as by ideology. Quite simply, one could not have it all.
(11-12) The rapid spread of computing facilities and developments in digital technology in the 1980s and 1990s offered the possibility of circumventing a number of practical (both physical and economic) limitations posed by the modern printed codex.
(12) Coincident with the spread of computing facilities, and with their adoption as the basic means of communication among academics at all levels, has been an extraordinary democratization in the production of textual editions. . . . The democratization of publishing through access to the Internet has not brought with it, however, a concomitant broadening in the reliability of such editions. . . . The challenge is to make available to prospective editors—either to those approaching the task for the first time or to seasoned veterans of print—the kinds of information they must have to engage with electronic textual editing at the level of needed knowledge, conceptual and practical.


(14-15) An international and interdisciplinary standards project, the Text Encoding Initiative (TEI) was established in 1987 to develop, maintain, and promulgate hardware- and software-independent methods for encoding humanities data in electronic form.
(15) The TEI guidelines, like the CSE guidelines, outline a set of best practices, but they also embody them in a formal and computable expression, originally constructed using standard generalized markup language (SGML) and since the fourth revision of the guidelines (published in 2002) expressible in extensible markup language (XML) as well (Sperberg-McQueen and Burnard,
Guidelines for Electronic Text Encoding).
(15) The TEI guidelines today take the form of a 1,300-page reference manual, documenting and defining some six hundred elements that can be combined and modified in a variety of ways for particular purposes. Each such combination can be expressed formally as a kind of document grammar, technically known as a document type definition (DTD).
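The idea of a document grammar can be made concrete with a toy DTD (element names invented for illustration, not the TEI's actual declarations): each declaration states which children an element may contain and in what order.

```dtd
<!-- A toy document grammar: an edition holds a title and one
     or more witnesses; each witness holds one or more pages. -->
<!ELEMENT edition (title, witness+)>
<!ELEMENT title   (#PCDATA)>
<!ELEMENT witness (page+)>
<!ELEMENT page    (#PCDATA | emph)*>
<!ELEMENT emph    (#PCDATA)>
<!ATTLIST witness siglum CDATA #REQUIRED>
```

A validating parser checks a document against such a grammar much as a compiler checks a program against a BNF grammar.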

(16) There is work for a generation or more of textual editors in the transmission of our cultural heritage from print to electronic media, but if that work is to be done, then a rising generation of scholars must receive professional credit for doing it. For that credit to be given, tenure and promotion committees will need to evaluate work of this kind. . . . The need for such a volume is immediate: there are currently few manuals, summer courses, or self-guided tutorials that would help even trained textual editors transfer their skills from print to electronic works.

MLA CSE guidelines a goldmine of work for a future generation of humanities scholars.

(17) First in the volume, we provide a complete revision of the MLA's CSE Guidelines for Editors of Scholarly Editions. . . . The twenty-six essays that follow are grouped under two headings: material and theoretical approaches (“Sources and Orientations”) and actual practices and procedures.
(19) Anne Mahoney addresses digitization projects in Greek and Latin inscriptions and discusses the extent to which it is possible to preserve both information and its interpretation in such a context.
(19-20) Next, Greg Crane explains the inner workings of the
Perseus Digital Library, one of the oldest and largest collections of electronic editions. Perseus—originally focused on, and still best known for, editions of classical-era texts—has for nearly two decades grappled with changes in language technology. Christian Wittern explains in lucid detail where those technologies stand today, shows how text encoding is built on character encoding, and demonstrates the importance to editors of understanding how character encoding actually works.
(20-21) For that matter, even if we start the history of electronic scholarly editions with Father Busa's punch-card Aquinas in 1949, we are not many decades into developing an understanding of how to make and use electronic documents in general, let alone electronic scholarly editions in particular. It took five hundred years to naturalize the book and a hundred and fifty years to develop the conventions of the scholarly edition in print. Those schedules reflect the time required for social, not technological, change, and while the acceleration of technological change in this case may rush the social evolution of rhetoric for digital editions of print and manuscript sources, it will still be generations before the target of this volume stops moving. Even before that happens, as
Matthew Kirschenbaum has pointed out, we will soon be grappling with the problem of editing primary sources that are themselves digital—a problem with entirely new practical and theoretical dimensions (“Editing”).

1.1. Principles

(23) The scholarly edition's basic task is to present a reliable text: scholarly editions make clear what they promise and keep their promises. Reliability is established by accuracy, adequacy, appropriateness, consistency, explicitness—accuracy with respect to texts, adequacy and appropriateness with respect to documenting editorial principles and practice, consistency and explicitness with respect to methods.

1.2. Sources and Orientations
1.2.1 Considerations with Respect to Source Material
1.2.2 The Editor's Theory of a Text
1.2.3 Medium (or Media) in Which the Edition Will Be Published

I. Basic Materials, Procedures, and Conditions
II. Textual Essay
III. Apparatus and Extratextual Materials
IV. Matters of Production

(32) 20.0 If the edition—whether print or electronic—is prepared in electronic files, are those files encoded in an open, nonproprietary format (e.g., TEI XML rather than
Microsoft Word or WordPerfect)?
V. Electronic Editions
(34) 27.3 If any software has been uniquely developed for this edition, is source code for the software available and documented?




(54) Scholarly editing is grounded in two procedural models: facsimile editing of individual documents and critical editing of a set of related documentary witnesses.
(55) The critical editor's working premise is that textual transmission involves a series of translations.
(55) A key device for pursuing such a goal is stemmatic analysis.
(56) Three important variations on the two basic approaches to scholarly editing are especially common: best-text editions, genetic editions, and editions with multiple versions.
(57) Genetic editing procedures were developed in order to deal with the dynamic character of an author's manuscript texts.

(58) For example, one can now design and build scholarly editions that integrate the functions of the two great editorial models, the facsimile and the critical edition. . . . In short, digital tools permit one to conceive of an editorial environment incorporating materials of many different kinds that might be physically located anywhere.
(59) Scholars whose work functions within the great protocols of the codex—one of the most amazing inventions of human ingenuity—appear to think that the construction of a Web site fairly defines digital scholarship in the humanities.

(60) Traditional text—printed, scripted, oral—is regularly taken, in its material instantiations, as self-identical and transparent. It is taken for what it appears to be: nonvolatile.
(60-61) Any explicit feature of a text can be conceived as a mark. We may thus say that digital text is marked by the linear ordering of the string of coded characters that constitutes it as a data type, for the string shows explicitly its own linear structure.
(61) When we mark up a text with TEI or XML code, we are actually marking the pre-existent bibliographic markup and not the content, which has already been marked in the bibliographic object.
(61) With the introduction of declarative markup languages, such as SGML and its humanities derivative TEI, tags came to be used as “structure markers” (Joloboff 87).
(62) Text can thus be conceived as an “ordered hierarchy of content objects” (DeRose, Durand, Mylonas, and Renear 6; this is the OHCO textual thesis). But can textual content be altogether modeled as a mere set of hierarchically ordered objects?
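The challenge to the OHCO thesis can be shown concretely: verse lines and page breaks overlap, and XML's nesting rule rejects the encoding unless one hierarchy is demoted to empty "milestone" elements. A small Python sketch:

```python
import xml.etree.ElementTree as ET

# Two hierarchies over the same words: the poem's line structure
# (<l>) and the physical page structure (<pb>). A page break falls
# mid-line, so the element spans overlap and cannot both nest.
overlapping = '<text><l>one <pb>two</l> three</pb></text>'
try:
    ET.fromstring(overlapping)
    well_formed = True
except ET.ParseError:
    well_formed = False
print(well_formed)  # False: overlapping spans are not well-formed XML

# The common workaround: demote one hierarchy to empty milestone
# elements that mark points in the text rather than spans.
milestones = '<text><l>one <pb n="2"/>two</l> <l>three</l></text>'
page_break = ET.fromstring(milestones).find('.//pb')
print(page_break.get('n'))  # 2
```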
(62) In principle, markup must therefore be able to make evident all implicit and virtual structural features of the text. . . . Textual structure is not bound, in general, to structural features of the expression of the text.
(63) In computational terms, it describes data structures but does not provide a data model or a semantics for data structures and an algebra that can operate on their values.
(63) The crucial problem for digital text representation and processing lies therefore in the ability to find consistent ways of relating a markup scheme to a knowledge-representation scheme and to its data model.
(63) In this approach, the problem to solve consists precisely in relating the scheme that describes the format of the documents to the scheme that describes their content. The first would be an XML schema, “a document that describes the valid format of an XML dataset” (Stuart), and the second would be a metadata schema such as the resource description framework (RDF) being developed for the Semantic Web.
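The proposed link between a format schema and a content schema can be sketched by deriving RDF-style triples from markup (the vocabulary here is invented for illustration, not an actual RDF schema):

```python
import xml.etree.ElementTree as ET

# A toy mapping from document format to document content: each
# <person> element in the markup yields RDF-style
# (subject, predicate, object) triples.
doc = ET.fromstring(
    '<letter><person id="dgr" role="author">Rossetti</person></letter>'
)

triples = []
for p in doc.iter('person'):
    subject = p.get('id')
    triples.append((subject, 'hasRole', p.get('role')))
    triples.append((subject, 'hasName', p.text))

print(triples)
# [('dgr', 'hasRole', 'author'), ('dgr', 'hasName', 'Rossetti')]
```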

(64) Diacritical signs are self-describing expressions of this kind, and markup can be viewed as a sort of diacritical mark. . . . Markup, therefore, can be seen either as a metalinguistic description of a textual feature or as a new kind of construction that extends the expressive power of the object language and provides a visible sign of some implicit textual content.
(65) Marks of this kind, viewable either way, behave just as Ludwig Wittgenstein's famous duck-rabbit picture (
Philosophical Investigations 2.11).

Buzzetti and McGann discuss the insufficiency of the OHCO thesis: it misses structural mobility, assumes meaning is embedded in syntactic form, and assumes coincidence between syntactic and semantic forms.

(66) The OHCO thesis about the nature of the text is radically insufficient, because it does not recognize structural mobility as an essential property of the textual condition. . . . A digital text representation need not assume that meaning can be fully represented in a syntactic logical form. . . . A formal representation of textual information does not require an absolute coincidence between syntactic and semantic logical form. In this respect, the role of markup can be of paramount importance in bringing their interconnections to the fore.

Buzzetti and McGann see markup as highly reflexive act, oscillating indeterminacy like self-organizing systems; in line with videogame studies, electronic literature.

(67) Diacritical ambiguity, then, enables markup to provide a suitable type of formal representation for the phenomena of textual instability. . . . Markup should be conceived, instead, as the expression of a highly reflexive act, a mapping of text back onto itself: as soon as a (marked) text is (re)marked, the metamarkings open themselves to indeterminacy.
(68) The continual oscillation and interplay between indetermination and determination of the physical and the informational parts of the text renders its dynamic instability very similar to the functional behavior of self-organizing systems. Texts can thus be thought of as a simulation machine for sense-organizing of an autopoietic kind. Text works as a self-organizing system inasmuch as its expression, taken as a value, enacts a sense-defining operation, just as its sense or content, taken as a value, enacts an expression-defining operation. Text provides an interpreter with a sort of prosthetic device to perform autopoietic operations of sense communication and exchange.
(69) Louis Kauffman's and Francisco Varela's extension of Spencer-Brown's calculus of indications accounts more specifically for the “dynamic unfoldment” of self-organizing systems and may therefore be consistently applied to an adequate description of textual mobility.


Buzzetti and McGann invoke pragmatistic, existential imperative to build devices.

(69) Scholarly editions are a special, highly sophisticated type of self-reflexive communication, and the fact is that we now must build such devices in digital space. This necessity is what Charles Sanders Peirce would call a “pragmatistic” fact: it defines a kind of existential (as opposed to a categorical) imperative that scholars who wish to make these tools must recognize and implement.
(70) In fact one can transform the social and documentary aspects of a book into computable code. . . . We were able to build a machine that organizes for complex study and analysis, for collation and critical comparison, the entire corpus of Rossetti's documentary materials, textual as well as pictorial.

Interesting suggestion by Buzzetti and McGann for researching autopoietic functions of social textualities via user logs dovetails nicely with software studies and projects for future digital humanities scholars.

(71) The autopoietic functions of the social text can also be computationally accessed through user logs. This set of materials—the use records, or hits, automatically stored by the computer—has received little attention by scholars who develop digital tools in the humanities. Formalizing its dynamic structure in digital terms will allow us to produce an even more complex simulation of social textualities. Our neglect of this body of information reflects, I believe, an ingrained commitment to the idea of the positive text or material document.
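A minimal sketch of what formalizing user logs might involve (the log format and field layout are invented): aggregate the stored hits to see which parts of an edition readers actually visit.

```python
from collections import Counter

# Hypothetical access-log lines: timestamp, user, document viewed.
log = [
    "2004-03-01T10:02 userA wife-of-bath/page-3",
    "2004-03-01T10:05 userA wife-of-bath/page-4",
    "2004-03-01T11:17 userB hengwrt/facsimile-12",
    "2004-03-02T09:40 userA wife-of-bath/page-3",
]

# Count views per document: a crude first formalization of the
# social use of the edition.
views = Counter(line.split()[2] for line in log)
print(views.most_common(1))  # [('wife-of-bath/page-3', 2)]
```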
(71) We are interested in documentary evidence precisely because it encodes, however cryptically at times, the evidence of the agents who were involved in making and transmitting the document.
(72) Our aim is not to build a model of one made thing, it is to design a system that can simulate the system's realizable possibilities—those that are known and recorded as well as those that have yet to be (re)constructed.
(72) McKenzie's central idea, that bibliographic objects are social objects, begs to be realized in digital terms and tools, begs to be realized by those tools and by the people who make them.

(74) In this essay, I use my experience with the
Canterbury Tales Project, with which I have been involved since its beginnings in 1989, to explore five such propositions.
(75) It is a material help too that all these texts are comprehensively out of copyright (though rights in manuscript images do need to be negotiated with the individual owners). Medieval authors are safely dead, and so too are any relatives who might have any copyright interest.

Proposition 1: The use of computer technology in the making of a particular edition takes place in a particular research context.

Robinson: propositions reached from the Canterbury Tales Project digital edition: specificity of research context, inclusion of full-text transcription, restoration of exhaustive historical criticism, editing and reading altered, adoption of an open transcription policy.

(77) Until the late 1980s, a few experiments and articles appeared to suggest that a combination of the computer, with its ability to absorb and reorder vast amounts of information, and new methods of analysis being developed in computer science (in the form of sophisticated relational databases) and in mathematics and in other sciences might be able to make sense of the many millions of pieces of information in a complex collation and provide a historical reconstruction of the development of the tradition.

Proposition 2: A digital edition should be based on full-text transcription of original texts into electronic form, and this transcription should be based on explicit principles.
(78-79) We were fortunate indeed in the time and place. In time: the three-year project began just after the inception of the Text Encoding Initiative (TEI), and in those three years the first steps were being made toward electronic publishing, first on CD-ROM and later over the Internet as the Web began to take shape. In place: Oxford, where the project began, was intensely involved in the TEI through Hockey and Lou Burnard. . . . This close link with the TEI became crucial because of something to which I had not given much thought before: the need for a stable and rich encoding scheme both to record the transcripts of the original texts we were to make and to hold the record of variation created by the collation program.
(79) We have been able to use readily available commercial SGML/XML software (first
DynaText, later Anastasia) to achieve excellent results. . . . Many years on, the gap between the programs and the transcribers has narrowed but persists (emacs is still the tool of choice for many).
(79-80) But what is good for interchange is not necessarily good for capture, where an efficient and focused system is required for the transcribers. Nor may it be good for programming, as attempted by the XSLT (extensible stylesheet language transformation) and similar initiatives.
(80) Encoding the words and letters in a printed text can be quite simple: just establish the characters used by the printer and allocate a computer sign to each. . . . But in a manuscript, where the range of marks that can be made by a scribe is limitless, matters are not so simple. The transcriber must decide which of these marks is meaningful and then which of the range of signs available on a given computer system best represents that meaning.
(82) The collation tool we had developed by then had the ability to regularize as we collated, thereby shifting the responsibility for deciding exactly what a variant was to the editor from the transcriber. We decided therefore to adopt a graphemic system: as transcribers, we would represent individual spellings but not (normally) the individual letter shapes. We also included in our transcriptions sets of markers to represent nonlinguistic features, such as varying heights of initial capitals, different kinds of scribal emphasis, and the like: what Jerome McGann calls bibliographic codes (
Textual Condition 52).
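The graphemic-transcription-plus-regularization workflow can be sketched minimally (the spelling table is invented): variant spellings are collapsed before collation so that only substantive differences register as variants.

```python
# A toy regularization table mapping scribal spellings to
# standard forms, applied before collation.
regularized = {"whan": "when", "aprille": "april", "soote": "sweet"}

def regularize(word: str) -> str:
    """Return the regularized form of a word, or the word itself."""
    return regularized.get(word.lower(), word.lower())

witness_a = "Whan that Aprille".split()
witness_b = "Whan that aprille".split()

collated_a = [regularize(w) for w in witness_a]
collated_b = [regularize(w) for w in witness_b]
print(collated_a == collated_b)  # True: spelling variants collapse
```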

Proposition 3: The use of computer-assisted analytic methods may restore historical criticism of large textual traditions as a central aim for scholarly editors.
(84) So we formed an alliance: Cambridge would purchase
DynaText, and we would work out how to use it both to publish our own CD-ROMs with Cambridge and help Cambridge publish other CD-ROMs.
(84) Accordingly, in 1996 the first of our CD-ROMs was published: the Wife of Bath's Prologue on CD-ROM (
Wife). It included all transcripts of the fifty-eight witnesses, images of all pages of the text in these manuscripts, the spelling databases we had developed as a by-product of the collation, collation in both regularized spelling and original spelling forms, and various descriptive and discursive materials. It presents a mass of materials such as an editor might use in the course of preparing an edition.
(84-85) There is an obvious analogy between the processes of copying and descent we might hypothesize for manuscript copying and those of replication and evolution underlying biological sciences: both appear instances of “descent with modification,” to use Darwin's phrase. . . . we were able to show that
phylogenetic software developed for biological sciences gave useful results when applied to manuscript traditions.
(85) As to the first question, our experiments suggest that such programs may indeed produce representations of relations among manuscripts that correspond with historical sequences of copying.
(85) As to the second question (Are such reconstructions useful for editors?), where these techniques show a group of manuscripts as apparently descended from a single exemplar in the tradition, one should be able to deduce just what readings were introduced by the exemplar.
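The phylogenetic analogy can be made concrete with a toy example (witnesses and readings invented; the project itself used phylogenetic packages, not this code): treat each witness as a sequence of readings, compute pairwise distances, and group the closest witnesses, as a tree-building method would.

```python
from itertools import combinations

# Toy witnesses: each a sequence of readings at the same points
# in the text. Distance = number of differing readings.
witnesses = {
    "Hg": ["whan", "that", "aprille"],
    "El": ["whan", "that", "aprill"],
    "Cp": ["whanne", "that", "aueril"],
}

def distance(a, b):
    """Count positions where two witnesses disagree."""
    return sum(x != y for x, y in zip(a, b))

pairs = {
    (p, q): distance(witnesses[p], witnesses[q])
    for p, q in combinations(witnesses, 2)
}
closest = min(pairs, key=pairs.get)
print(closest, pairs[closest])  # ('Hg', 'El') 1
```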

Proposition 4: The new technology has the power to alter both how editors edit and how readers read.

Robinson: inspires comparing repeatability of analysis to software quality-assurance methodologies, exemplified by the development of the Anastasia software tool for JavaScript rendition of XML-encoded files.

(86) Because of the electronic publication mode, we were also able to include the actual software and all the data we used for the analysis, with exercises that allowed readers to run the software themselves, so that they might confirm, extend, or deny the hypotheses suggested in the article.
(87) We developed a new software tool,
Anastasia, specifically to offer a bridge between the XML, into which we now decanted all our files, and the new JavaScript-and-HTML interfaces now appearing. Our first publication to make use of this combination was our third CD-ROM, the Hengwrt Chaucer Digital Facsimile (Stubbs). For this we had a new aspiration: it should be beautiful. . . . For us, this was an opportunity to give practical expression to what was becoming a core belief of the project, that we could use the new tools and our materials to change the way people experience a text.

Proposition 5: Editorial projects generating substantial quantities of transcribed text in electronic form should adopt, from the beginning, an open transcription policy.
(88-89) The
Canterbury Tales Project is not a legal entity and so cannot own anything, including copyright. Copyright in the transcripts varies, belonging either to the individuals who did them or to the institutions in which those individuals were based. . . . We would like others to take them, reuse them, elaborate on them (e.g., they could include the graphetic information we rejected), and republish them: exactly the means of scholarship promoted by the fluid electronic medium. But if future scholars must go through a process of increasingly lengthy, multisided negotiation, then the transcripts will become unusable, walled from the world by legal argument.

Robinson: inspires comparing open transcription policy to four freedoms enshrined in GPL.

(89) The answer to this problem, we can now see, is an open transcription policy, modeled on the copyright licensing arrangements developed by the Open Software Foundation (now part of the Open Group). . . . What it does mean is that the copyright holders assert that the transcripts may be freely downloaded, used, altered, and republished subject to certain conditions (basically, republication must be under the same conditions, all files must retain a notice with them to this effect, and permission must still be obtained for any paid-for-publication).
(89) This massive burst of activity across all the traditional domains of medieval philology puts one in mind of the grand editorial projects of the nineteenth century.


Rosenberg recounts the development of the Edison Papers, a major scholarly editing project, including its technological evolution.

(92) The Edison Papers is working to combine images and text, and I hope that a careful examination of some avenues and lessons learned in that process will be helpful to anyone fortunate enough and bold enough to undertake such a task.
(93) A second unusual aspect of the Edison corpus [after its size] is the central importance of drawings and even physical artifacts to an understanding of its subject's work, which is a direct consequence of Edison's being an inventor and fresh territory for documentary editing.

(93) It had been impressed on the project organizers that the only way to control a collection of this size was with an electronic database. Fortunately, the Joseph Henry Papers Project had already started blazing a trail into that mysterious territory. Using that experience and knowledge as a foundation, the Edison Papers created a database that would prove two decades later to be the heart of their electronic edition.
(93) The first incarnation of the database lived on a university mainframe and was written by a hired programmer in Fortran 77. The main table had twenty-four data fields.

(96) Increasing the bit depth of the images allowed us to scan them at a relatively modest resolution of 200 dpi.
(96) When it was done, we had nearly 1,500 CDs holding a terabyte of data.
(97) With the help of an outside programmer (and a 21-inch screen), we created an interface that displayed successive digitized images on one side and database information on the other.
(97) The result is an online image edition that allows the user to sample or assemble the documents in a number of ways—name, date, document type, editorial organization—and to view as a group documents scattered across many reels of microfilm.

(98) But there was no existing symbol for an artifact, nor was there one for technical notes or notebook entries, both of which we had in abundance. So we created new ones: M (model) for physical objects and X for technical materials. . . . What stumped us was the question of authorship raised by such documents. . . . Finally we cut the Gordian knot and declared that documents of type X had indeterminate authors, even when Edison wrote them. This solution turned out to reflect the way work was pursued as well as the way many of Edison's coworkers felt about the work. They realized that they were active participants, but they also recognized that when Edison was not in the laboratory, work slowed after a couple of days, and that in fact the work would not have existed without him to drive it.
(99) As Edison's designs stretch the notion of artifact to include his electric central stations and the Ogdensburg ore-milling plant of the 1890s, modes of presentation will doubtless adapt to them.
(99) It was planned from the beginning of the electronic edition that the text of the print volumes would be included, marked up with SGML (later XML) to take full advantage of the capabilities of live digital text.
(100) The work done by David Chesnutt, Michael Sperberg-McQueen, and Susan Hockey for the Model Editions Partnership (MEP) was immensely helpful, as was the DTD they developed.
(100-101) We had the advantage of long familiarity with our word processor, which allowed us to write fairly complex macros that greatly simplified much of the tagging.
(101) Embedding editorial material proved challenging. The first big decision concerned the index.
(102-103) The other complex editorial decision involved references and is still very much in process. . . . What in the book are haphazard strings of frame numbers must become a new type of online document, an artifice that allows the user to see the relevant notebook pages as the assemblage we intend. Just as we made notebooks, account books, and scrapbooks browsable by creating a new data table, these compound references will be a creation of the database.

(105) The tasks of learning how to encode, how to adopt or create a DTD (document type definition) sufficiently complex to account for all the poems and manuscripts that will be a part of the edition, how to imagine the overall editorial environment the edition will provide, how to ensure the stability and portability of the edition over time, and how to make deliverable over the Web (if desired) the finished edition can be daunting.
(106) All scholarly editing requires the editor to pay attention to texts in more than one way, to what Jerome McGann has called the “concurrent structures” that divide the editor's attention between, on the one hand, bibliographic codes for design and presentation and, on the other hand, linguistic codes for structural and semantic communication (“Editing” 90).
(106) Because poetry, with its enhanced self-consciousness of the physique of texts, expresses itself inextricably through particular interfaces, any editor of poetic texts in the digital medium must be centrally concerned with interface, with matters of textual display and appearance. . . . It seems to us as well that any serious editor of electronic texts must pay attention to an even wider field for such questions, looking outward to the “contextual” relation of multiple individual texts and other materials on the Net as a whole and within hyperlinked clusters, paying attention to the poem and the network.

Fraistat and Jones: give sizable examples of working code to illustrate the devil's bargain with HTML as well as creative layout of text and critical apparatus.

(107) What is at stake in the devil's bargain with HTML is perhaps best illustrated in one of our very early texts, Shelley's broadside ballad of 1812, “The Devil's Walk.” . . . The pragmatic limitations of HTML markup are clear, here, to anyone with an elementary knowledge of encoding, including the then-necessary but inelegant use of the nonbreaking space tag (“&nbsp;”) to create indentation.
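What that inelegance amounts to can be shown in miniature: the indentation exists only as literal characters in the text stream, with nothing recording it as structure. The verse fragment below is illustrative, not Fraistat and Jones's actual markup.

```python
# Sketch of the presentational hack: indentation faked with
# non-breaking-space entities rather than expressed as structure.
from html import unescape

line = "&nbsp;&nbsp;&nbsp;&nbsp;And whisper&#8217;d to his neighbour:"
rendered = unescape(line)  # entities become literal U+00A0 characters

# The "indentation" survives only as characters; no markup records
# that this is a structurally indented verse line.
```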
(108) One's ability in HTML to divide a screen window into separate frames allowed us to think creatively about how to display the textual apparatus of the “Devil's Walk” edition in relation to the text proper.
(109) Along with the Web design community, Romantic Circles has since moved away from the use of frames wherever possible, but they served a purpose for a time in the wide-area protocols of the Web, and they illustrate the kinds of provisional editorial solutions the larger context and infrastructure of the network sometimes require.

Fraistat and Jones: TEI for encoding poetic text at level of structure, describing in ordered hierarchy.

(111) If we wish to encode a poetic text at the level of its structure, to describe (not format) its components—stanzas, parts of stanzas, lines, and so on, for search, retrieval, analysis, and recombination by a computer—we must turn to SGML proper and the guidelines developed by the Text Encoding Initiative (TEI). . . . It now seems likely that both the HTML 3.0 and SGML (TEI Lite) versions of “The Devil's Walk” will in the near future need to be made available in XML (or the Web-ready standard it has created, XHTML).
(112) By nesting multiple sets of tags of this sort, it becomes possible logically to mark the portions of a stanza—octet, sestet, quatrain, couplet—such that software recognizing the document type could parse, search, and manipulate the text in complex ways. To put it in computer terms, we focus on the text's content objects as they can be described in an ordered hierarchy.
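A minimal sketch of "content objects in an ordered hierarchy": a stanza encoded with nested TEI-style line groups, which software can then address structurally rather than typographically. The sample encoding is invented for illustration, not taken from the edition.

```python
# Nested <lg> (line group) and <l> (line) elements form an ordered
# hierarchy that a parser can query by structure.
import xml.etree.ElementTree as ET

stanza = ET.fromstring("""
<lg type="sonnet">
  <lg type="octave">
    <l>I met a traveller from an antique land</l>
    <l>Who said: Two vast and trunkless legs of stone</l>
  </lg>
  <lg type="sestet">
    <l>And on the pedestal these words appear:</l>
  </lg>
</lg>""")

# Address parts of the poem by their structural role, not their layout.
octave_lines = stanza.findall('./lg[@type="octave"]/l')
all_lines = stanza.findall('.//l')
```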
(113) All this data and metadata will be marked in the text itself, not in a separate file, and will then be carried with the edition in a form that will survive across various platforms and delivery systems.
(113) In general, XML now promises to overcome the crudest form of the binary opposition between structural and display markup, which is very good news for electronic editions of poetry.

Fraistat and Jones: dynamic collation of Graver and Tetreault hints at single sourcing and RCS features.

(114-115) To represent that multiplicity of versions in meaningful ways, Graver and Tetreault replace the standard apparatus criticus with what they call dynamic collation, a script that allows for comparative viewing of textual cruxes in their original contexts (fig. 2).

Fraistat and Jones: MOOzymandias virtual reality experiment enacts the autopoietic functions of social texts envisioned by Buzzetti and McGann, demonstrating similarities between editing and programming.

(116) More recently, we have moved beyond the Web page and HTML as such in MOOzymandias, an ambitious collaborative experiment in editing that situates Shelley's sonnet “Ozymandias” in a text-based multiuser virtual-reality environment, making the edition, its text and apparatus, more like a game or theatrical space than a letterpress artifact. MOOzymandias was created to attempt what no existing markup scheme can really do well yet: deal with the multidirectional, spatialized, phenomenological effects of poetic language—and the multilayered complexity with which poems mean, in terms of both their presentational and structural features and in terms of the contextual editorial environments constructed by every edition through its acts of annotation and interpretation.
(116) Textual editors should be among those attempting such important and innovative experiments in electronic environments. There are, for instance, interesting possibilities for using 3-D editorial environments to interrelate text and apparatus, as suggested by Matthew Kirschenbaum (“Lucid Mapping”). We could even imagine future editions or archives structured as databases that could be customized to the needs and interests of individual users: first in response to a user's electronic registration form indicating those interests, then by the distribution of relevant information to users based on their behavior while interacting with the edition or archive, much as Amazon.com tailors itself to the behavior of customers.
(117) Schreibman's editor as literary encoder has already at some point learned how to grapple with a logic native to computer programmers. . . . In time, it is likely that tools will be developed that allow editors to produce simple markup by uploading the text and then filling in fields in an online form. . . . For now, editors planning to use XML markup would do well to purchase a software editor, such as XMetaL or oXygen, that can facilitate uniform and valid encoding throughout the edition.

(123) The initial version of the electronic side will contain all the print edition in digital form as well as the complete old-spelling texts and image facsimiles of all the early print and manuscript witnesses, a full census of those witnesses, life and court-masque primary archives, performance calendars, a reconstruction of Jonson's library, and a diverse collection of other materials that might help us better understand these important works. Once the basic electronic archive is completed, the developmental strategy will shift from traditional to innovative, from compiling and organizing the essential evidence to investigating and analyzing the complex possible interactions among the various elements. . . . Hypertext theorists refer to this complex and unpredictable texture as rhyzomatic, a term that comes from the tangled root structure beneath a field of grass, a non-hierarchical mass of ever-growing links between and among tufts.
(124) Many electronic archives and editions rely on relatively simple frameworks that structure material according to rigorous hierarchies branching from a central core. . . . Unlike such rigid e-text collections, the CEWBJ seeks to explore the vision of electronic textuality imagined by Jerome McGann in his influential “The Rationale of HyperText.”
(124) The CEWBJ derives its core texts from manual and keyboarded transcriptions of the source witnesses, employing either copies of the early quartos and folios owned by editors or the UMI Early English Books series of microfilm facsimiles, along with on-site transcriptions of manuscripts held in research archives.

(125) Considered structurally, drama consists of spoken language presented in soliloquial or dialogic form. These speeches are usually organized into a sequence of scenes, which in the Western tradition can also be grouped into acts. Typographically the representation of this structure on the printed page has changed very little since the first publication of interludes in the early sixteenth century.
(126) Beyond speech, however, printed drama can also contain a variety of components that interpret the theatrical circumstances of the work for the reader—character lists, stage directions and descriptions, acting notes, and details of real or ideal performance.
(126) Again, early printed drama provides numerous examples of how printers learned to use format to distinguish among the textual components.
(127) Once printers started using roman and italic fonts instead of black letter in the 1580s, they could differentiate speech prefix and speech typographically; this differentiation by font became the model for the next four centuries. By the late sixteenth or seventeenth century, authors began to bring an awareness of format and design elements to their work, sometimes providing in their holograph manuscripts a template for the printed book.

(128) From the outset the Text Encoding Initiative recognized the special textual and material requirements of theatrical works, including in its guidelines a set of encoding strategies designed specifically for drama.
(128-129) Speech, the component common to almost every dramatic work throughout history, is encoded with the <sp> element. . . . Application of the constant who attribute allows to interrogate the linguistic aspects of a character across an entire play or, in play cycles, across multiple works.
(129) The contents of a speech usually consist of prose or verse text in a variety of arrangements; a prose paragraph is encoded as <p>, while a line of verse appears within the <l> tag. Furthermore, verse organized in stanzas, verse paragraphs, or other poetic structures can be encoded in linegroup <lg> tags.
(131) The <stage> tag is used to mark the nonverbal stage directions included in a piece of dramatic text and employs the type attribute, while the <move> tag signals the actual movement of a character or characters on, off, or around the stage.
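The payoff of the constant who attribute can be sketched in a few lines: once every speech carries it, a program can collect a character's speeches across a whole scene or play. The fragment below is an invented miniature, not the CEWBJ's markup (TEI practice would normally use pointer values for who).

```python
# Interrogating one character's speeches via the who attribute on <sp>.
import xml.etree.ElementTree as ET

scene = ET.fromstring("""
<div type="scene">
  <sp who="Volpone"><speaker>VOLP.</speaker>
    <l>Good morning to the day; and next, my gold!</l></sp>
  <stage type="business">He opens a chest.</stage>
  <sp who="Mosca"><speaker>MOS.</speaker>
    <l>You are not like the thresher that doth stand...</l></sp>
  <sp who="Volpone"><l>Open the shrine, that I may see my saint.</l></sp>
</div>""")

# Every speech assigned to one character, regardless of position.
volpone = [sp for sp in scene.findall('sp') if sp.get('who') == 'Volpone']
```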
(132) While Jonson's plays employ a vertical hierarchy, his masques and entertainments are much more horizontally structured, consisting of a mixture of speeches, songs, dances, and prose commentary.

Gants: detailed examples of tag usage and working code examples for encoding drama, fleshing out the problem of multiple hierarchies as the major challenge to text encoding.

(134-135) Each of the above examples presents a textual unit organized in a fairly hierarchical fashion, an arrangement ideal for the structural nesting principle at the heart of XML's design. But in practice literary works rarely conform to vertical hierarchies for very long, instead evolving sophisticated linguistic patterns that overlap and overlay in complex ways. The TEI guidelines offer a number of solutions to the problem of multiple hierarchies (for example, using a lattice of pointers and targets or linking elements with location ladders), although none are completely satisfactory. When dealing with performance works in which intersecting structures are part of the fabric of the text, marking the individual pieces in an aggregate fashion and employing the <join> element to coordinate them all has proved especially useful. . . . Each section of the letter is assigned a unique identifier from a1 through a10, and any program designed to process the text can reconstruct the letter as a single unit by using the information in the <join> element.
(135-136) A strategy similar to <join> is used when representing simultaneous speech and action in a stage play. Each component receives a unique id and the overlapping relation is declared with the corresp attribute to the <stage> element. . . . The corresp attribute provides the processing instructions needed to reconstruct the performance circumstances in whatever format is required.
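A sketch of how a processor might use such a <join> to reassemble a unit whose pieces are interleaved with other material, as Gants describes; the identifiers and content here are invented for illustration, and real TEI pointer values would normally be prefixed references.

```python
# Reconstructing a dispersed textual unit from the ids listed in <join>.
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<text>
  <seg id="a1">Most noble lord,</seg>
  <stage>Music plays.</stage>
  <seg id="a2">your humble servant writes</seg>
  <seg id="a3">in haste.</seg>
  <join targets="a1 a2 a3" result="letter"/>
</text>""")

# Index the fragments by id, then follow the join's target list.
by_id = {seg.get('id'): seg.text for seg in doc.findall('seg')}
join = doc.find('join')
letter = ' '.join(by_id[t] for t in join.get('targets').split())
```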

(138) To manage the problem of sheer scale, the anthology, like the canon, exercises a strategic simplification. But with the digital anthology this same strategic purchase can be achieved not through exclusion and brevity but through the intelligence of the data itself, which can enable the reader to discover the thematic subcollections within a larger assembly of texts.

Flanders: consider this notion of readerly discovery promoted by Flanders for software and critical code studies.

(139) This emphasis on readerly discovery is part of a crucial shift that has shaped the digital collection and its editorial assumptions.
(140) If one result of these developments has been a tendency to view a digital collection in the spirit of an archive—as a body of source material on which may be built a superstructure of metadata, retrieval and analysis tools, and editorial decisions—the corollary has been an almost ironic interest in the materiality of the text.

(141) Although it might seem absurd to imagine such a text substituting for a visit to the physical archive, our goal was to represent all the linguistic detail that a view of the original would provide and to capture all the document's contents, even where they were almost unconnected with the main work (e.g., advertisements).
(141-142) In transcribing the text, we preserve the readings of the original text, whether or not they seem correct, explicable, or intended by the author or printer. Our premise here is, first, that errors may be significant, whatever their source: they are part of the information that circulated to readers when the text was first published and are part of the evidence that literary researchers may wish to view. Second, in many cases (particularly in earlier texts) it may be difficult to say with confidence that a given reading is an error.
(142) We treat authors' specific intentions with respect to literary meaning as not only largely unknowable but also beside the point: what we wish to represent is a cultural document, a piece of historical currency whose modern readers may or may not find in it insight into the author's mind.
(143) Most important, the header for each file includes identification of each participant—author, editor, publisher, printer, and potentially many others—together with the possibility of demographic information on each.
(143) By using a vocabulary for describing genre and textual structure that locates the particular instance within a larger framework, we not only allow for comparisons across the collection but also potentially between this collection and others similarly prepared.

(144) As we currently present it, the text is displayed in a manner that preserves the most significant details of its original formatting. . . . Very shortly we will also be able to offer a display that shows the original readings and offers the ability to switch between views.
(144) Most significant, we do not capture any of the graphic features of the text such as illustrations and ornaments. Our transcription includes placeholders for such features, and for figures (images with representational content) we encode a detailed description of the illustration and a transcription of any words it may contain.
(145) Because we currently do not represent more than one copy of a given text, we do not have any apparatus representing textual variants. But in some texts we do need to represent manuscript deletions and revisions.

(146) Providing page images is not practicable for us, but our full-text transcriptions and metadata are encoded in XML following the TEI guidelines with a degree of detail that is unusual (perhaps even unmatched) among projects of this sort. We are also unusual, though not unique, in providing a detailed account of our editorial and transcriptional methods to the reader as part of our site documentation.
(146) The WWP prefers to capture any emendation using XML encoding rather than make silent alterations; as a result our approach may lend itself more than others to offering readers alternative versions of the text (concerning the treatment of details like typographic errors or abbreviations).

(147) Consistency of encoding is the most difficult to achieve, particularly with a complex system like the TEI guidelines. . . . Like most digitization projects, we rely first of all on careful and extensive documentation that our encoders use both during training and as a reference while they are transcribing.
(147) The WWP's example illustrates a few trade-offs that are particularly significant in the transition to digital editing. . . . By capturing the text so as to represent its variability as a data structure, we are able to create a distinct editorial space that stands apart from the source transcription and from any final editorial result. This space is accessible to us as editors—it is where the editing proper can occur—but it is also accessible to readers, enabling them to inspect the decisions that have been made and choose different strategies if they wish.

(150) This long and complex genesis with more than twenty versions makes Stirrings Still a particularly interesting example to discuss the scholarly editing of bilingual writings.

(152) While it may seem remarkable that an edition advises its own reproduction, the idea of a working copy is essential and underlies the interface design of a digital equivalent of the face-to-face representation, to be consulted at all times.
(153) Beckett's act of self-translation has the paradoxical effect of fixing a text by reproducing it in another language. . . . Since he did not necessarily stick to one version to make his translations, the idea of an original or source text is problematized and cannot serve as a general principle to choose the base text.
(154) It is remarkable that neither the Guardian publication nor the text in the Beckett Shorts (vol. 11) published by John Calder mentions the dedication “For Barney Rosset.” It was Beckett's concern for Rosset's situation that brought about the publication.
(154) As a consequence, the choice of the limited edition “for Barney Rosset” as the base text is inspired primarily by this social circumstance: not because it is deluxe but because it was meant to help a friend. This case shows that authorial intention and social orientation are not mutually exclusive.


Van Hulle: argues for the transclusive flexibility afforded not only by the digital format but by a nonproprietary format, so that the material can be machine processed in new ways.

(155) The transcription of the documents in Reading is encoded in TEI-compliant XML. The advantage of this nonproprietary format is the resulting transclusive flexibility of the textual material. Depending on the user's focus, the draft material can be rearranged in several ways: (1) in a documentary approach, based on the catalog numbers; (2) in chronological order; (3) by language; (4) with a focus on translation; (5) in retrograde direction, starting from the published texts.

Van Hulle: proposes Vanhoutte's linkable unit, the linkeme, as a basic concept of electronic texts (see if Landow covers).

(156) Every paragraph in the reading text can be linked to and compared with other versions of it. Vanhoutte has called this linkable unit a linkeme, “the smallest unit of linking in a given paradigm” (“Linkemic Approach”).
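The linkemic link can be sketched as locating the same unit in two versions by a shared identifier so the readings can be set side by side; identifiers and draft text below are invented, not from the Beckett edition.

```python
# A linkeme at the paragraph level: the same unit fetched from two
# versions by a shared identifier for comparison.
import xml.etree.ElementTree as ET

v1 = ET.fromstring('<text><p n="p1">Stirrings still.</p></text>')
v2 = ET.fromstring('<text><p n="p1">Stirrings still and faint.</p></text>')

def linkeme(version, n):
    """Fetch the linkable unit (here a paragraph) with identifier n."""
    return version.find(f'p[@n="{n}"]').text

# The pair of readings for one linkeme across two versions.
pair = (linkeme(v1, "p1"), linkeme(v2, "p1"))
```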

(157) Traditionally the notion of variants applies to variation either between copies of an ancient or medieval document by scribes or between different editions of the same work. When dealing with modern texts, a distinction must be made between transmission variants and genetic (or composition) variants. The edition of a bilingual work requires an extra category of translation variants.
(158) The edition offers users the chance to adapt the size of the textual unit they wish to compare (large, medium, small), that is, the unit of the section <div>, the paragraph <p>, or the sentence <seg>, which is already a refined form of versioning. But it is possible to go further and make the edition into a critical genetic edition, where the editor indicates the genetic variants explicitly.

Van Hulle: discusses versions and variants, comparable to source control systems and versioning in word processors, as a way to deal with the self-generative, algorithmic character of traditional text (McGann).

(158) Except for the first extant version, a previous version can always serve as a temporary invariant against which the genetic variants can be measured, even if the writing was eventually aborted and never published.
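Measuring genetic variants against a previous version used as a temporary invariant is, in spirit, a diff, much as revision control systems compute one. A minimal sketch with invented draft wording:

```python
# The earlier draft serves as the temporary invariant; the diff
# opcodes isolate the genetic variants against it.
import difflib

previous = "one night as he sat at his table head on hands".split()
current  = "one night as he sat at his table head in hands".split()

matcher = difflib.SequenceMatcher(a=previous, b=current)
variants = [(op, previous[i1:i2], current[j1:j2])
            for op, i1, i2, j1, j2 in matcher.get_opcodes()
            if op != 'equal']
```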
(160) Beckett was well aware of what McGann calls “the algorithmic character of traditional text” (Radiant Textuality 151): text generates text, and for Beckett translation played a crucial role in the exploitation of this self-generative power. Authorial translations give evidence of an enhanced textual awareness. As a consequence, their textual examination and scholarly editing are a crucial part of their critical interpretation.

(162) To allow a functional debate on editing and editions in the electronic paradigm, editors should provide an explicit definition of an electronic edition as well as the kind of scholarly edition they are presenting in electronic form. . . . To avoid confusion among different meanings and types of edition, I sketch out my definition of an electronic scholarly edition in the first section of this essay and formulate six requirements that editors could embrace to ensure that their edition is treated as such.

(163) By electronic edition, I mean an edition (1) that is the immediate result or some kind of spin-off product from textual scholarship; (2) that is intended for a specific audience and designed according to project-specific purposes; (3) that represents at least one version of the text or the work; (4) that has been processed from a platform-independent and nonproprietary basis, that is, it can both be stored for archival purposes and also be made available for further research (Open Source Policy); (5) whose creation is documented as part of the edition; and (6) whose editorial status is explicitly articulated.
(163) What Tanselle and Shillingsburg here seem to overlook is that the practice of creating an edition with the use of text encoding calls for explicit ontologies and theories of the text that do generate new sets of theoretical issues.

Editorial Principles and Markup

(165) The aim of the [De teleurgang van den Waterhoek] editorial project was to explore different ways to deal with textual instability, textual variation, the genetic reconstruction of the writing process, and the constitution of a critically restored reading text.
(166-167) For sociological reasons, the edition presents two critically edited texts: the versions of the first and second print editions. . . . In constituting these texts, we applied the principles of the German (authorial) editorial tradition, which allows only justified corrections of manifest mistakes. In these two critical texts, the emendations were documented by the use of the <corr> element, containing a correction, and a sic attribute, whose value documents the original reading.
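The double documentation this markup provides can be sketched in a few lines: both the critical and the documentary reading remain recoverable from one encoded text. The sample sentence is invented, not from the Waterhoek edition.

```python
# <corr> holds the editor's correction; its sic attribute preserves
# what the witness actually reads, so neither reading is lost.
import xml.etree.ElementTree as ET

p = ET.fromstring('<p>The <corr sic="watre">water</corr> rose quickly.</p>')

corr = p.find('corr')
critical_reading = corr.text           # the emended reading
documentary_reading = corr.get('sic')  # the original reading
```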

A Linkemic Approach to Textual Variation
(167) In order for users of the edition to be able to evaluate what they see, the facsimiles are accompanied by a full account of the imaging procedure, including the documentation on the software and hardware (and settings) used in the project, which I believe is an essential requirement.

Vanhoutte: addresses new aspects of documenting how the edition was created, such as linkeme methodology, and new ways of reading provided by automagic of sed and awk, tracing cultural boundaries between digital humanities scholarship and IT, which are foregrounded by emphasizing noncritical operations.

(168) Instead of linking the orientation text to a variorum apparatus, the editors opted for what I have called a linkemic approach to textual variation. I define a linkeme as the smallest unit of linking in a given paradigm. This unit can be structural (word, verse, sentence, stanza, etc.) or semantic. In the glossary provided with the orientation text, the linkeme is of a semantic class that can be defined as the unit of language that needs explanation. . . . The architecture was automagically generated from the digital archive by a suite of sed and awk scripts. The linkemic approach provides the user with enough contextual information to study the genetic history of the text, and it introduces new ways of reading the edition.


Vanhoutte: dossier genetique treats the internal creative process as internal monologue of the author, and thus a form of speech; points to the attempt to reconstruct a final software product from a long history of revisions and contested negotiations.

(168-169) Despite its strengths, this practice is problematic for a genetic edition based on modern manuscript material. . . . As an alternative I suggest that further research on a methodology and practice of noncritical editing or transcription of modern manuscript material may result in markup strategies that can be applied to the constitution, reading, and analysis of a so-called dossier genetique. My approach to the manuscript as a filtered materialization of an internal creative process, one that is comparable with the process of internal monologue or dialogue and that thus can be considered a form of speech, might be helpful in this respect.

Observation 1: (Non)Critical Editing
(169) The essential, difficult, and time-consuming step of the transcription of primary textual sources is not explicitly mentioned in this outline.
(170) Electronic noncritical editing is concerned with the twofold transformation from one format into another: first the transformation from the text of a physical document to the transcription of that text; second the transformation from one medium, the manuscript, to another, the machine-readable transcription.
(170) The reason for the neglect of noncritical editing in the theory and practice of textual criticism, however, is frequently the lack of a satisfactory ontology of the text on which a methodology of noncritical editing can be modeled.
(170) So the chapter that seemingly deals with noncritical editing in the TEI guidelines addresses issues that are central in critical editing and includes in its DTD subset tags to encode them. . . . Whereas they emphasize the unimportance of noncritical editing in their theories, the French school of critique genetique mainly works with noncritical representations of the documents under study.

Observation 2: Methodology

Vanhoutte: answering what is a text, ontology matters for noncritical operations, such as transcription, especially if it turns out to be non-nesting, non-hierarchical.

(171) Only when a project has a clear argument on the ontology of the text can a methodology for text transcription be developed.

Vanhoutte: argues modern texts often feature non-nesting problems from time and overlapping hierarchies.

(171) The transcription of modern manuscript material using TEI proves to be more problematic because of at least two essential characteristics of such complex source material: time and overlapping hierarchies.
(172) Therefore, the structural unit of a modern manuscript is not the paragraph, page, or chapter but the temporal unit of writing. These units form a complex network that often is not bound to the chronology of the page.

Vanhoutte: explores need for better ways to handle temporal elements via spatialization in digital data (Castells), and the choice to use only digital facsimiles acknowledges the limits of TEI, and is punting; multiple versions model of text lends itself to encoding via revision control system as well as TEI.

(172) The current inability to encode these temporal and genetic features of the manuscript and the overlapping hierarchies with a single, elegant encoding scheme forces an editor to make choices that result in impoverished and partial representations of the complex documentary source. . . . Therefore, in the electronic edition of De teleurgang van den Waterhoek, we opted to represent the complex documentary sources by means of digital facsimiles only, preserving in that way the genetic context of the author's dynamic writing process.
(173) This fear of testing existing transcription systems with modern manuscript material of a complicated nature in several projects may signal the fact that a coherent system or methodology for the transcription of modern material still must be developed and tested and that an ontology of the text must be agreed on.

Genetic Criticism—Critique Genetique

Vanhoutte: three categories of genetic criticism are transversal, horizontal, vertical.

(174) Therefore, critique genetique does not aim to reconstitute the optimal text of a work and is interested not in the text but in the dynamic writing process, which can be reconstructed by close study of the extant drafts, notebooks, and so on. . . . Rather than produce editions, the geneticiens put together a dossier genetique by localizing and dating, ordering, deciphering, and transcribing all pre-text witnesses. Only then can they read and interpret the dossier genetique. But the publication of genetic editions is still possible.

Putting Time Back in Manuscripts
(175) These four complexities [process, scriptorial pauses, nonverbal elements, sub-chronological segmentation] are exactly what the TEI guidelines consider “distinctive features of speech.”
(176) If we consider any holograph witness as a filtered materialization of an internal creative process (thinking) that can be roughly compared to an internal dialogue between the author and the biographical person, we may have a basis on which to build a methodology for the transcription of modern manuscript material. By combining the TEI DTD subsets for the transcription of primary sources, the encoding of the critical apparatus, and the transcription of speech, we could try to transcribe a manuscript and analyze it with tools for the manipulation of corpora of spoken language. It is interesting in this respect to observe how critique genetique describes authorial interventions like deletions, additions, Sofortkorrektur or currente calamo, substitutions, and displacements in terms of material or intellectual gestures, as if they were kinesic (nonverbal, nonlexical) phenomena.
(176) This approach does not do away with the essential problem of non-nesting information, which is an inescapable fact of textual life and even results from a one-way analysis.
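The non-nesting problem and its standard workaround can be sketched in a few lines. One hierarchy (here, paragraphs) keeps its element structure while the other (pages) is flattened into empty milestone markers and reconstructed by software; the `<pb/>` element name follows common TEI practice, but the reconstruction code is my own illustration, not from the book.

```python
# Sketch of the milestone workaround for non-nesting hierarchies:
# paragraphs are real elements; page breaks, which cut across them,
# are empty <pb/> milestones recovered by a tree walk.
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<text>"
    "<p>First paragraph, which begins on one page "
    "<pb n='2'/>and ends on the next.</p>"
    "<p>Second paragraph, entirely on page 2.</p>"
    "</text>"
)

def text_by_page(root):
    """Recover the page hierarchy by walking the element hierarchy."""
    pages = {}
    current = "1"
    def walk(elem):
        nonlocal current
        if elem.tag == "pb":
            current = elem.get("n")       # milestone switches the page
        if elem.text:
            pages.setdefault(current, []).append(elem.text)
        for child in elem:
            walk(child)
            if child.tail:
                pages.setdefault(current, []).append(child.tail)
    walk(root)
    return {n: "".join(parts) for n, parts in pages.items()}

pages = text_by_page(doc)
```

The point of the sketch is that the page hierarchy exists only implicitly, as a computation over the paragraph hierarchy; the two structures never have to nest.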

Vanhoutte: even if RCS commits are treated as speech acts, the non-nesting information that BNF-style grammars cannot express remains a problem to keep theorists busy; encoding becomes a form of noncritical close reading, as in Greg Crane's account of how the PDLS lookup files mitigate the problem.

(176-177) Creating a noncritical edition-transcription of such a text with the use of encoding is the closest kind of reading one can do. . . . Paradoxically, existent and extant manuscripts generate, by their resistance to current systems of text encoding, new ontologies of the text and new approaches toward that encoding.

Wittgenstein's Nachlass: The Bergen Electronic Edition was published at Oxford University Press in 2000. This electronic edition is the first publication of the Austrian philosopher Ludwig Wittgenstein's complete philosophical Nachlass. It contains more than 20,000 searchable pages of transcription and a complete color facsimile.


Huitfeldt: argues the Wittgenstein manuscripts present almost every imaginable complication for textual markup and require keen awareness of the nuances of diplomatic reproduction.

(182) Like many modern manuscripts, Wittgenstein's writings contain deletions, overwritings, interlinear insertions, marginal remarks and annotations, substitutions, counterpositions, shorthand abbreviations, as well as orthographic errors and slips of the pen. . . . Moreover, Wittgenstein had his own peculiar editorial conventions, such as an elaborate system of section marks, cross-outs, cross-references, marginal marks and lines, and various distinctive types of underlining.
(182) These inter- and intratextual relations, although complicated and by no means fully known, are of interest to scholars studying the development of Wittgenstein's thought.


Wittgenstein's Nachlass: The Bergen Electronic Edition has three main components: a facsimile, a diplomatic transcription, and a normalized transcription, each providing an interrelated but independent view of the Nachlass.
(183) The
diplomatic version records faithfully not only every letter and word but also details relating to the original appearance of the text. One might say it acknowledges that our understanding of the text derives in no small part from the visual appearance of material on the page.
(183) The
normalized version, on the other hand, presents the text in its thematic and semantic aspect.


Huitfeldt: the Wittgenstein Nachlass was a forty man-year project that, like a modern videogame or other software application, exceeded the capability of any single individual to produce; see Hayles on collaborative aspects of electronic literature.

(186) The Wittgenstein Archives at the University of Bergen spent altogether forty man-years (including text transcription and editing, management, administration, systems development and maintenance, and all other tasks related to the project), to give an average throughput of two pages per person per day, which is high compared with other editorial projects.


(187) But we decided not to use SGML for this project. Instead, a special code syntax was developed for the Wittgenstein Archives, and software that allowed for flexible conversion to other formats was developed. This syntax and software were called a multielement code system (MECS).

(187) That the TEI guidelines provide various alternative mechanisms for the encoding of many (or even most) textual phenomena is one of the strengths of the guidelines and one of the reasons why they are found applicable to a large number of widely different projects involved in text encoding. At the same time, their openness and flexibility create the danger of inconsistency.
(188) For example, abbreviations may be encoded in basically two different ways according to the TEI guidelines.
(189) The Wittgenstein Archives decided to make a distinction between standard and nonstandard abbreviations and to represent both as element content.
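The two encoding options for abbreviations can be made concrete. The element names `<abbr>` and `<expan>` follow TEI P4 usage (abbreviation as element content with the expansion in an attribute, or the reverse); the helper functions and example text are my own sketch.

```python
# Sketch of the two TEI-style options: option A keeps the abbreviation
# as element content, option B keeps the expansion as element content.
import xml.etree.ElementTree as ET

option_a = ET.fromstring('<p>See <abbr expan="Doctor">Dr.</abbr> Smith.</p>')
option_b = ET.fromstring('<p>See <expan abbr="Dr.">Doctor</expan> Smith.</p>')

def diplomatic(p):
    """Render the text as it appears on the page."""
    parts = [p.text or ""]
    for el in p:
        if el.tag == "abbr":
            parts.append(el.text)          # abbreviation is the content
        elif el.tag == "expan":
            parts.append(el.get("abbr"))   # abbreviation is the attribute
        parts.append(el.tail or "")
    return "".join(parts)

def normalized(p):
    """Render the text with abbreviations expanded."""
    parts = [p.text or ""]
    for el in p:
        if el.tag == "abbr":
            parts.append(el.get("expan"))
        elif el.tag == "expan":
            parts.append(el.text)
        parts.append(el.tail or "")
    return "".join(parts)
```

Both encodings yield identical diplomatic and normalized views, which is exactly why the guidelines' flexibility invites inconsistency: a project must choose one convention and enforce it, as the Wittgenstein Archives did.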


(190) One might say that the aim of a diplomatic representation is to get every letter of the original right, whereas the aim of a normalized representation is to get every word and every reading right.
(190) The Wittgenstein Archives decided to use a less strict definition: a diplomatic reproduction should reproduce the original, grapheme by grapheme; contain indication of indentation and the relative spatial positioning of text elements on the page; and include information about deletion, interlinear insertion, and a number of different kinds of underlining. It was not considered necessary to indicate every line break or allograph variation.
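The relation between the two versions can be sketched from a single encoded source: the `<del>` and `<add>` element names follow TEI usage, but the bracketed rendering convention for the diplomatic view is an assumption of mine, not the Bergen edition's.

```python
# Sketch: one encoded source, two views. The diplomatic view preserves
# deletions and interlinear insertions; the normalized view gives the
# final reading only.
import xml.etree.ElementTree as ET

line = ET.fromstring(
    '<l>The <del>quick</del><add place="supralinear">swift</add> fox.</l>'
)

def render(elem, view):
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "del":
            if view == "diplomatic":
                parts.append("[del:" + child.text + "]")
            # the normalized view silently drops deleted text
        elif child.tag == "add":
            if view == "diplomatic":
                parts.append("[add ^:" + child.text + "]")
            else:
                parts.append(child.text)
        parts.append(child.tail or "")
    return "".join(parts)
```

A usage check: `render(line, "normalized")` gives the reading text, while `render(line, "diplomatic")` retains the genetic layer of deletion and insertion.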



(194) An intriguing aspect of editing philosophical texts is that the editorial work itself exemplifies a number of classical philosophical problems, such as the relations between representation and interpretation, the subjective and the objective.



Crane: overview of design and programming considerations going into Perseus Digital Library System (PDLS) crosses humanities scholarship into philosophical programming.

(277) The PDLS [Perseus Digital Library System] is significant in that it shows concretely which functions one evolving group of humanists felt were valuable and feasible.

Adding Reference Metadata to a TEI File
Displaying the Contents of the TEI File


Basic Display and Browsing

(281) Submit a URL and return an unformatted, well-formed fragment, allowing a third-party system to format or analyze the XML source. We consider this feature to be critical, since it makes it possible for multiple systems to apply a wide range of analytic and visualization techniques to the data that we manage.

Processing Data Files
Convert SGML to XML. . . . Extract core metadata from the XML file. . . . Aggregate the metadata for the PDL. . . . Generate the lookup table.
(282) We can use the lookup tables to support overlapping hierarchies, addressing a well-known drawback of
BNF-style grammars such as SGML/XML.
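The lookup-table idea can be sketched simply (the data, names, and spans are illustrative, not PDLS code): instead of forcing both hierarchies into one tree, each scheme is indexed as spans over the flat text, and spans are intersected on demand.

```python
# Sketch: two independent hierarchies mapped onto one character stream.
text = "Sing, O goddess, the anger of Achilles son of Peleus..."

by_citation = {"Hom. Il. 1.1": (0, 55)}            # citation hierarchy
by_page     = {"p. 1": (0, 30), "p. 2": (30, 55)}  # physical hierarchy

def overlap(span_a, span_b):
    """Do two half-open character spans share any text?"""
    return span_a[0] < span_b[1] and span_b[0] < span_a[1]

# which pages does the citation fall on?
pages = [p for p, span in by_page.items()
         if overlap(by_citation["Hom. Il. 1.1"], span)]
```

Because neither table needs to nest inside the other, the overlapping-hierarchy question becomes a span intersection rather than a grammar problem.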

From Citations to Bidirectional Links

Crane: argues persistent linking schemes for print citations exemplify pre-digital solutions, whereas PDLS developed abstract bibliographic object concept from which bidirectional links can be generated.

(283) Their monodirectional nature makes the Web a directed graph and has profound implications for its topology. In digital libraries, however, having greater control over content, we can track links between documents. More important, long before computers were invented, many formal publications developed canonical schemes that gave print citations persistent value: there are various ways to abbreviate Homer and Odyssey, but Hom.Od.4.132 described the same basic chunk of text in 1880 and 1980.
(283) Persistent citation schemes are fuzzy, and this fuzziness gives them flexibility. The PDLS uses the concept of an abstract bibliographic object (ABO) to capture the fact that a single work may appear in many editions.
(284) ABOs are arguably most exciting when they allow us to convert individual citations into bidirectional, many-to-many links. . . . Clearly, this service raises interesting problems of filtering and customization as annotations encrust heavily studied canonical texts, but we view such problems as necessary challenges and the clusters of annotations on existing texts as opportunities to study the problems of managing annotations.
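The ABO idea can be sketched as a small registry (the edition identifiers and data structures here are my own illustration, not Perseus's actual scheme): a canonical citation resolves against every edition of the work, and annotations recorded against the citation supply the reverse direction of the link.

```python
# Sketch of an abstract bibliographic object (ABO) registry.
abos = {
    "Hom.Od.": ["murray-1919", "dimock-1995"],  # one work, many editions
}
annotations = {}  # canonical citation -> documents that cite it

def annotate(citation, source_doc):
    """Record a link from source_doc to the cited passage."""
    annotations.setdefault(citation, []).append(source_doc)

def resolve(citation):
    """Expand a canonical citation into one target per edition."""
    for abo, editions in abos.items():
        if citation.startswith(abo):
            passage = citation[len(abo):]       # e.g. "4.132"
            return [(ed, passage) for ed in editions]
    return []

def backlinks(citation):
    """The reverse direction: who cites this passage?"""
    return annotations.get(citation, [])

annotate("Hom.Od.4.132", "commentary-on-menelaus")
targets = resolve("Hom.Od.4.132")
```

The fuzziness the chapter praises lives in `resolve`: "Hom.Od.4.132" names a chunk of text, not a byte range in any particular file, so the same citation keeps working as editions are added.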

Crane: invitation to studying problems of managing annotations encrusting heavily studied texts; suggests lexicon can become a commentary.

(284) In an online environment, however, the lexicon can become a commentary: that is, the readers of a text can see the words that the lexicon comments on.

Indexing Textual Links in Perseus

(285) Information extraction seeks to automate the process of identifying people, places, things, and the relations among them.
(286) The output of automatic parsing is imperfect and will vary from corpus to corpus, but imperfect scalable analysis of large bodies of data can reveal significant patterns.

Crane: extracting place and date information to identify events hints at Manovich big data analysis.

(286) Information extraction tends to be domain-specific.
(286-287) The generalized architecture for text engineering (GATE), developed at Sheffield, provides one model of how to integrate complementary information extraction modules and may point the way for digital library systems that incorporate these functions as a matter of course.

Extracting Places
(287) We scan all XML files for possible place-names.
(287) We currently combine geospatial data that
Perseus collected for Greco-Roman sites with TGN [Getty Thesaurus of Geographic Names] data.

Extracting Dates
(287) We scan all XML files for dates. In practice, dates have proved much easier to identify than place names.

Using Places and Dates to Identify Events
(288) Once lists of places and dates are available, it is possible to look for associations between the two to identify significant events.
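The association step can be sketched with a toy heuristic (the gazetteer, the date pattern, and the sentence-level co-occurrence rule are all my own simplifications, not the Perseus extractor): tag known place-names and simple year expressions, then treat their co-occurrence in a sentence as a candidate event.

```python
# Sketch: imperfect but scalable extraction of candidate events.
import re

gazetteer = {"Athens", "Syracuse", "Salamis"}
date_pat = re.compile(r"\b(\d{3,4})\s*(BC|BCE|AD|CE)\b")

text = ("The fleet sailed from Athens in 415 BC. "
        "The defeat at Syracuse followed in 413 BC.")

events = []
for sentence in text.split(". "):
    places = [p for p in gazetteer if p in sentence]
    dates = date_pat.findall(sentence)
    if places and dates:            # co-occurrence -> candidate event
        events.append((sorted(places), dates[0][0] + " " + dates[0][1]))
```

As the chapter notes, output like this is imperfect and domain-specific, but even a crude pass over a large corpus surfaces patterns no hand-built index would.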
(289) The structures that we add to our documents reflect elaborate (if often unconscious) cost-benefit decisions not only about the interests of our audience but also about how future systems will shape and enable those interests.

(346) A contract often requires the author to transfer all copyrights to the publisher and to guarantee that the work does not infringe the copyright of others. The author must also agree to indemnify and reimburse the publisher for expenses incurred if a claim is made that the author has infringed a copyrighted work and the publisher is sued. The obligation to indemnify normally exists whether the claim is frivolous or not. The contract also requires the author to obtain permission for uses of works that go beyond fair use and to supply copies of each permission to the publisher.

A Brief Review of Copyright Law

(347) As soon as the original expression of a creator is fixed in any medium, it is protected by copyright. Covered by copyright are literary works; musical works; dramatic works; pantomimes and choreographic works; pictorial, graphic, and sculptural works; motion pictures and other audiovisual works; sound recordings; and architectural works (sec. 102a). Copyright protection does not extend to facts, ideas, concepts, procedures, and so on (sec. 102b) or to works of the government of the United States (sec. 105).
(348) The most relevant to the creation of a scholarly edition is section 107, which addresses fair use.
(348-349) In general, for works published from 1978 on, copyright protection now lasts for the life of the author plus seventy years (sec. 302a). . . . Copyright term has expired for works published in and before 1922, which means they are now in the public domain.
(349) The Digital Millennium Copyright Act (DMCA) allows copyright owners to control access to their works through the use of such technological protection measures as passwords and encryption (sec. 1201).

Implications for Scholarly Editions
(350) For works published between 1923 and 1978, the rules become more complex. But many more of these works are in the public domain than one might think.
(350) For works published outside the United States, obtaining legal advice may be a wise investment.
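The clear-cut cases in these rules can be summarized as a toy decision function; only the two unambiguous rules the chapter states (pre-1923 expiration, life plus seventy for 1978 onward) are encoded, the function name and the 2006 reference year are my assumptions, and the complicated 1923-1978 middle period deliberately falls through to the chapter's own advice.

```python
# Toy sketch of the simplest US copyright-term cases as the chapter
# summarizes them. Not legal advice; the 1923-1978 rules are complex
# and are intentionally left undecided here.
def public_domain_status(pub_year, author_death_year=None, current_year=2006):
    if pub_year <= 1922:
        return "public domain"               # term expired (sec. 302)
    if pub_year >= 1978 and author_death_year is not None:
        # life of the author plus seventy years (sec. 302a)
        return ("public domain" if current_year > author_death_year + 70
                else "protected")
    return "unclear: seek legal advice"      # 1923-1978, or foreign works
```

For example, a work published in 1900 is safely public domain, while a 1980 work by an author who died in 1950 remains protected until 2020 under the life-plus-seventy rule.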

Case and Green: concern raised that extensive monitoring capabilities will make it harder for scholars to secure permissions from publishers; imagine when such monitoring reaches into real-time, perspectival virtual worlds.

(351) Because copyright owners now use technological means to search the Web to find unauthorized uses of their content, a publisher may be unwilling to expose itself to the cost of responding to potential claims, whether it believes the use is fair use or not.

Case and Green: SCO UNIX case is a prime example of difference between transfer of ownership of object versus copyright.

(352) Copyright law (sec. 202) provides for the distinction between ownership of a material object and ownership of its copyright. The transfer of ownership of an object does not convey ownership of the copyright unless the copyright is explicitly transferred as a part of the agreement.

Editors and Contracts
(353) As soon as this editorial work or added content is “fixed in a tangible medium,” it is protected by copyright.
(353) Publishers of scholarly works tend to request the exclusive and complete transfer of copyright, but there is no legal reason that an author has to accept this condition.
(353) Contracts are used in turn by publishers in making electronic works available to users, whether to individuals or libraries. . . . Since the courts are unsettled about whether licenses can preempt copyright law, libraries have actively negotiated with publishers to modify licenses to allow fair uses that would support education and research.

Identifying the Copyright Owner

Case and Green: lack of authoritative search for ownership and rights from Library of Congress further complicates transmedia events such as sounds in virtual realities that are generated from copyrighted text via text to speech synthesis.

(354) Because the Library of Congress catalogs do not include entries for assignment or other recorded documents, they cannot be used authoritatively for searches involving the ownership of rights.
Audio. Should permission be required to use audio material, the editor should be aware of the possible need for several layers of permissions.

Seeking Permission
(356) You do not have to get permission in writing, but if you get verbal permission, make sure again that you carefully describe exactly what your use of the material will be and document your conversation with the rights holder.
(356) Seeking permission can be a lengthy and complex process, so consider copyright-related issues early in your project planning.

(359) What are editors, publishers, and librarians to do with the conundrum of preserving for scholars of tomorrow the fluid text of today?

Preserving Traditional Editions
(359) In the analog world, at one level of abstraction, the physical format—the carrier—
is the edition and all that libraries and librarians need to know in order to collect and preserve it. They are concerned, in Kathryn Sutherland's terminology, with the vehicular rather than the incarnational form of the edition (23).

Preserving Editions in a Digital Context

(362) Instability of citation is a critical problem; research and scholarship are based on a fundamental principle of reproducibility.
(362) When planning electronic editions, one should establish standards and working practices that make them interoperable: able to exchange data at some level with other systems.

Preserving Digital Data
(363) The preservation of digital data has two main components: preserving the integrity of the bits and bytes and preserving the information that they represent. . . . As the software that created the information becomes obsolete, the information becomes more and more difficult to access unless it is stored in some future-proof format or reformatted.

Deegan: Fedora project flexible extensible digital object repository architecture proposes new ways of reasoning based on behaviors rather than essential nature; compare to Tanaka-Ishii study of object-oriented programming methodologies.

(364) A new approach to the preservation of complex digital data is being explored by the University of Virginia and Cornell University, together with other academic partners: the Fedora project (flexible extensible digital object repository architecture), one of a number of repository architectures that have been proposed for use in digital libraries. . . . Fedora is of particular interest, because it proposes new ways of reasoning about digital data, based on data objects and their behaviors rather than on the essential nature of the data.
(364) A number of projects are looking at the problems of preserving information on Web sites.
(365) Many of the experiences in the preservation of Web sites offer insights into the preservation of networked editions.

What Are Editors to Do?

Deegan: program and interface are the least durable parts of electronic editions; however, this division of intellectual capital that externalizes program and interface dissolves with the inclusion of digitally native electronic literature as the subject of scholarly editions.

(365-366) We might expand the distinction to a fivefold one: data, metadata, links, program, and interface. The first three contain the intellectual capital in an edition; the last two are (should be?) external. However important the programs used to create and deliver the edition and however important the interface through which it is accessed, scholars must always remember that these parts of any electronic edition are the least durable.

Creating Preservable Assets
(366) One relatively straightforward approach is to produce a fixed edition on some stable medium at regular stages in its life, as is being done for
The Cambridge Edition of the Works of Ben Jonson, described by David Gants.
(367) While this volume deals primarily with electronic textual editions, such is the power of the medium that other media can be included, which all must be created according to standards and stored in a format that is nonproprietary and well supported. Eaves observes that editors of multimedia editions should
double-edit: edit first in discrete, then in integrated media.

Applying Data Standards

Deegan: CCS link of cultural bias in encoding recommends ASCII entity references over direct Unicode; see Case and Gee.

(367) For text, the ASCII standard should always be used, with markup added that is also in ASCII. There has been great progress in the presentation of special characters through the Unicode standard, but it is preferable that characters be encoded as entity references that can be displayed in Unicode than encoded as Unicode itself.
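The recommendation is easy to demonstrate: Python's standard `xmlcharrefreplace` error handler rewrites any non-ASCII character as a numeric character reference, so the stored file is pure ASCII while a Unicode-aware renderer still displays the intended character. The example string is mine.

```python
# Store non-ASCII characters as numeric character references so the
# file itself survives in any ASCII-safe environment.
import html

raw = "G\u00f6del's L\u00f6wenheim"   # contains o-umlaut (U+00F6)
ascii_form = raw.encode("ascii", "xmlcharrefreplace").decode("ascii")

# a Unicode-aware reader resolves the references back to characters
restored = html.unescape(ascii_form)
```

Here `ascii_form` is `G&#246;del's L&#246;wenheim`: every byte is plain ASCII, yet no information about the original characters is lost.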
(368) Image data should be captured at the best quality possible to reveal all significant information about the original, then stored in a nonproprietary file format using only lossless compression (if compression is used at all).
(369) The long-term prospects of electronic editions are also affected by the naming conventions used. . . . Work is being done on alternatives to URLs. Uniform resource names (URNs) identify a piece of information independent of its location: if the location changes, the information can still be found. One type of identifier that has been adopted by a number of publishers is the digital object identifier (DOI). DOIs are persistent names that link to some form of redirection.
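The redirection idea behind URNs and DOIs can be sketched as a toy resolver (the DOI string and URLs are invented for illustration; the real system is the distributed Handle/DOI infrastructure): the persistent name never changes, and when content moves only the resolver's table is updated.

```python
# Toy resolver: persistent names indirect through a mutable table.
resolver = {"doi:10.9999/edition.42": "http://old-host.example/edition"}

def resolve(name):
    """Look up the current location of a persistently named object."""
    return resolver[name]

# the edition moves hosts; citations in print keep working because
# only the table entry changes, never the name itself
resolver["doi:10.9999/edition.42"] = "http://new-host.example/edition"
target = resolve("doi:10.9999/edition.42")
```

This is why a DOI in a footnote outlives a URL in one: the level of indirection absorbs every future change of location.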
(369) The problem with using these specifications [XLink and XPointer] is that there is more of a learning curve when linking is done through pointing and clicking in various kinds of authoring software. But for long-term security, links must be separated from programs.

Unsworth, John, Katherine O’Brien O’Keeffe, and Lou Burnard. Electronic Textual Editing. New York: Modern Language Association of America, 2006. Print.