Notes for Lou Burnard, Katharine O'Brien O'Keeffe, and John Unsworth, editors of Electronic Textual Editing
Key concepts: autopoietic functions, concurrency problem, critical versus non-critical editing, diplomatic transcription, documentary editing, double-editing, linkeme, multiple hierarchies, normalized transcription, readerly discovery, SGML, text, TEI, transclusive flexibility.
Claim in foreword is computer as tool does not fundamentally alter reading or subjectivity. Markup as highly reflexive act, oscillating indeterminacy like self-organizing systems. In line with videogame studies, electronic literature. Many contributors emphasize using non-proprietary formats and open policies for scholarly editions comparable to GPL, and existential imperative to build devices (McGann's theory-as-poiesis). Compare repeatability of analysis to software quality assurance methodologies. Researching social textualities via user logs dovetails nicely with software studies and projects for future digital humanities scholars. Sizable examples of working code to illustrate “the devil's bargain with HTML.” What is a text? Ontology matters for noncritical operations, such as transcription, especially if it turns out to be non-nesting, non-hierarchical. Problem of multiple hierarchies as the major challenge to text encoding. I develop comparison to software revision control systems. RCS commits as speech acts? Still problem of non-nesting information of BNF-style grammars to keep theorists busy. Encoding becomes a form of noncritical close reading. Discussion of copyright and contracts has implications for transmedia events such as sounds in virtual realities that are generated from copyrighted text via text-to-speech synthesis. Division of intellectual capital that externalizes program and interface dissolves with inclusion of digitally native, electronic literature as the subject of scholarly editions. CCS link: cultural bias in encoding recommends ASCII entity references over direct Unicode.
Related theorists: Kirschenbaum, Landow, Lessig, Manovich, McGann, Tanselle.
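The comparison above of textual transmission to software revision control can be given a minimal sketch with Python's difflib: each revision of a transcription is stored as a delta against the previous state, much as RCS or Git store successive commits. The transcription states below are invented for illustration.

```python
import difflib

# Hypothetical successive states of a transcription (readings invented
# for illustration; not from any actual witness).
rev1 = ["Whan that Aprille with his shoures soote",
        "The droghte of March hath perced to the roote"]
rev2 = ["Whan that Aprill with his shoures sote",
        "The droghte of March hath perced to the roote"]

# A unified diff records the second state as a delta against the first,
# the same data a revision control system would commit.
delta = list(difflib.unified_diff(rev1, rev2, lineterm=""))
changed = [line for line in delta
           if line.startswith(("+", "-"))
           and not line.startswith(("+++", "---"))]
```

Whether each such commit can be read as a speech act is the theoretical question the note raises; the delta itself is just the record of the act.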
FOREWORD
G. THOMAS TANSELLE
(1) Of course, whether e-books are as convenient
to handle and use as the codices that preceded them is not an
irrelevant matter, but convenience is not entirely a function of
developing technology, for what one becomes accustomed to is a
fundamental factor.
(2) An increasingly populated field of
literary study has made scholars acutely aware of what psychologists,
in their different way, have long understood: people may think that
“any book” will do for reading a given work, but all the details
of graphic design, which are likely to vary from printing to printing
(and from e-book to e-book), do affect readers' responses.
Tanselle in the Foreword argues that the computer as tool does not fundamentally alter reading or subjectivity, whereas Manovich, Hayles, and others disagree; he seems not to consider digitally native electronic texts, only electronic versions of texts originally composed in prior media forms.
(3) But when the excitement leads to the idea that the computer
alters the ontology of texts and makes possible new kinds of reading
and analysis, it has gone too far. The computer is a tool, and tools
are facilitators; they may create strong breaks with the past in the
methods for doing things, but they are at the service of an
overriding continuity, for they do not change the issues that we have
to cope with.
(3) When we create and use an electronic text, we
still have to ponder the mode of existence of the linguistic medium;
we still have to think about the relations among mental, audible, and
visible texts; we still have to consider whether it is meaningful to
pursue authorially intended texts or whether the documentary texts
that survive from the past (perhaps purged of their obvious errors)
are the only ones we should study; we still have to decide how to
present the results of our textual research to other readers.
(4)
The idea that electronic texts encourage a new kind of reading has
also been overstated. . . . Such aids to radial reading can be well
or poorly constructed whether the means of presentation is printed or
electronic.
(4) Surely the richest kind of reading depends on
having at hand, for constant reference, the information that textual
scholars have amassed. The real issue is how best to provide guidance
for readers.
(5) It is a distinct advantage, of course, for
readers to be able to choose the points of entry they wish to use;
but to engage in radial reading effectively, they need the editor's
assistance in the form of comments on the textual history of the
work, organized records of variants in the relevant documentary
texts, and the like. They also need editorially emended texts in
order to see how the mass of evidence has been used in reading by
scholars who have made themselves expert in the textual history of
the work. . . . This point shows, once again, that the advantages of
the electronic form are maximized when one recognizes that the
technology contributes to the process of building on past
accomplishments.
(6) Whether or not we wish to claim an
ontological distinction between ink and pixels, the concept of text
has obviously shifted its meaning between the two sentences. In the
first, a text is a physical thing, an arrangement of words and
punctuation in a particular visible form. In the second, it is the
sequence of words and punctuation itself, an abstraction that can be
given any number of concrete renderings (in the same medium or
different media). . . . The philosophical conundrum as to where a
text resides is exactly the same as it always was.
What is a text? Where does it reside? No room for new media practices? See comment in last paragraph of introduction.
(6) We will be spared some drudgery and inconvenience, but we still must confront the same issues that editors have struggled with for twenty-five hundred years.
ACKNOWLEDGMENTS
NOTE ON THE CD
(9)
The Text Encoding Initiative's Guidelines
for Electronic Text Encoding and Interchange was
first published in April 1994 as two substantial green volumes known
as TEI P3 (P for “public”). In 2001, the Text Encoding Initiative
was reestablished as a membership consortium, jointly hosted by two
United States and two European universities. Its first act was to
sponsor a revision of the TEI guidelines. This edition, known as TEI
P4, was a maintenance release, bringing the guidelines up to date with
changes in the technical infrastructure—most notably the use of the
W3C's extensible markup language (XML) as its means of expression
rather than the ISO standard, SGML, used by earlier editions. TEI P4
was published in 2002 under the imprint of the University of Virginia
Press and forms the current reference standard.
(9) A first
release of TEI P5 appeared in January 2005; see www.tei-c.org/P5/
for its current status.
(10) In January 2005, the preparation of
TEI P5 moved to a new level with the decision to make the source text
of the new edition of the guidelines available under an open-source
license. For this new edition, it was decided to replace the SGML/XML
DTD version of the TEI scheme with a version that could be expressed
in any of the three schema languages now in wide use: DTD, W3C
Schema, and RelaxNG. The current state of this work is now accessible
to all from the TEI's repository at http://tei.sf.net.
INTRODUCTION
(11)
Ever since the invention of the codex, the long and distinguished
history of textual editing has been intimately involved in the
physique of the book. . . . The scholarly debates over what sort of
editions to produce—whether favoring the textual object, the author
of the text, or the text's reception history—were driven as much by
economics as by ideology. Quite simply, one could not have it
all.
(11-12) The rapid spread of computing facilities and
developments in digital technology in the 1980s and 1990s offered the
possibility of circumventing a number of practical (both physical and
economic) limitations posed by the modern printed codex.
(12)
Coincident with the spread of computing facilities, and with their
adoption as the basic means of communication among academics at all
levels, has been an extraordinary democratization in the production
of textual editions. . . . The democratization of publishing through
access to the Internet has not brought with it, however, a
concomitant broadening in the reliability of such editions. . . . The
challenge is to make available to prospective editors—either to
those approaching the task for the first time or to seasoned veterans
of print—the kinds of information they must have to engage with
electronic textual editing at the level of needed knowledge,
conceptual and practical.
THE COMMITTEE ON SCHOLARLY EDITIONS
THE TEXT ENCODING INITIATIVE
(14-15)
An international and interdisciplinary standards project, the Text
Encoding Initiative (TEI) was established in 1987 to develop,
maintain, and promulgate hardware- and software-independent methods
for encoding humanities data in electronic form.
(15) The TEI
guidelines, like the CSE guidelines, outline a set of best practices,
but they also embody them in a formal and computable expression,
originally constructed using standard generalized markup language
(SGML) and since the fourth revision of the guidelines (published in
2002) expressible in extensible markup language (XML) as well
(Sperberg-McQueen and Burnard, Guidelines
for Electronic Text Encoding).
(15)
The TEI guidelines today take the form of a 1,300-page reference
manual, documenting and defining some six hundred elements that can
be combined and modified in a variety of ways for particular
purposes. Each such combination can be expressed formally as a kind
of document grammar, technically known as a document type definition
(DTD).
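A document grammar of the DTD kind can be sketched as a table of allowed child elements, checked recursively over a parsed tree. The element names below echo TEI, but the grammar itself is a toy invented for illustration; the real TEI scheme defines some six hundred elements.

```python
import xml.etree.ElementTree as ET

# Toy "document grammar": which child elements each element may contain.
# A real DTD also constrains order and repetition; this sketch checks
# membership only.
GRAMMAR = {
    "TEI": {"teiHeader", "text"},
    "teiHeader": {"fileDesc"},
    "fileDesc": set(),
    "text": {"body"},
    "body": {"p"},
    "p": set(),
}

def valid(elem):
    """Return True if every element's children are allowed by GRAMMAR."""
    allowed = GRAMMAR.get(elem.tag, set())
    return all(child.tag in allowed and valid(child) for child in elem)

doc = ET.fromstring(
    "<TEI><teiHeader><fileDesc/></teiHeader>"
    "<text><body><p>Whan that Aprille...</p></body></text></TEI>")
bad = ET.fromstring("<TEI><body/></TEI>")  # <body> not allowed directly
```

Each TEI customization is, in effect, a different GRAMMAR table over the same element inventory.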
ELECTRONIC TEXTUAL EDITING
(16)
There is work for a generation or more of textual editors in the
transmission of our cultural heritage from print to electronic media,
but if that work is to be done, then a rising generation of scholars
must receive professional credit for doing it. For that credit to be
given, tenure and promotion committees will need to evaluate work of
this kind. . . . The need for such a volume is immediate: there are
currently few manuals, summer courses, or self-guided tutorials that
would help even trained textual editors transfer their skills from
print to electronic works.
MLA CSE guidelines are a goldmine of work for a future generation of humanities scholars.
(17)
First in the volume, we provide a complete revision of the MLA's CSE
Guidelines for Editors of Scholarly Editions. . . . The twenty-six
essays that follow are grouped under two headings: material and
theoretical approaches (“Sources and Orientations”) and actual
practices and procedures.
(19) Anne Mahoney addresses digitization
projects in Greek and Latin inscriptions and discusses the extent to
which it is possible to preserve both information and its
interpretation in such a context.
(19-20) Next, Greg Crane
explains the inner workings of the Perseus
Digital Library,
one of the oldest and largest collections of electronic editions.
Perseus—originally
focused on, and still best known for, editions of classical-era
texts—has for nearly two decades grappled with changes in language
technology. Christian Wittern explains in lucid detail where those
technologies stand today, shows how text encoding is built on
character encoding, and demonstrates the importance to editors of
understanding how character encoding actually works.
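Wittern's point that text encoding is built on character encoding can be made concrete. A minimal Python sketch (the choice of thorn, common in medieval English manuscripts, is illustrative) separates the three layers an editor must keep apart: abstract code point, concrete byte encoding, and ASCII-safe character reference, the trade-off behind the note above on ASCII entity references versus direct Unicode.

```python
# U+00FE LATIN SMALL LETTER THORN at three layers of representation.
ch = "\u00fe"                       # þ, the abstract character
codepoint = ord(ch)                 # 254: its Unicode code point
utf8 = ch.encode("utf-8")           # b'\xc3\xbe': two bytes on disk
latin1 = ch.encode("latin-1")       # b'\xfe': one byte in Latin-1
ncr = "&#x{:X};".format(codepoint)  # "&#xFE;": survives in pure ASCII
```

The same character thus has one identity but several encodings; a transcription that records bytes without recording which encoding was meant records less than it appears to.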
(20-21) For
that matter, even if we start the history of electronic scholarly
editions with Father Busa's punch-card Aquinas in 1949, we are not
many decades into developing an understanding of how to make and use
electronic documents in general, let alone electronic scholarly
editions in particular. It took five hundred years to naturalize the
book and a hundred and fifty years to develop the conventions of the
scholarly edition in print. Those schedules reflect the time required
for social, not technological, change, and while the acceleration of
technological change in this case may rush the social evolution of
rhetoric for digital editions of print and manuscript sources, it
will still be generations before the target of this volume stops
moving. Even before that happens, as Matthew
Kirschenbaum has
pointed out, we will soon be grappling with the problem of editing
primary sources that are themselves digital—a problem with entirely
new practical and theoretical dimensions (“Editing”).
GUIDELINES FOR EDITORS OF SCHOLARLY EDITIONS
COMMITTEE ON SCHOLARLY EDITIONS, MLA
1. GUIDELINES FOR EDITORS OF SCHOLARLY EDITIONS
1.1. Principles
(23) The scholarly edition's basic task is to present a reliable text: scholarly editions make clear what they promise and keep their promises. Reliability is established by accuracy, adequacy, appropriateness, consistency, explicitness—accuracy with respect to texts, adequacy and appropriateness with respect to documenting editorial principles and practice, consistency and explicitness with respect to methods.
1.2. Sources and Orientations
1.2.1 Considerations with Respect to Source Material
1.2.2 The Editor's Theory of a Text
1.2.3 Medium (or Media) in Which the Edition Will Be Published
2. GUIDING QUESTIONS FOR VETTERS OF SCHOLARLY EDITIONS
I. Basic Materials, Procedures, and Conditions
II. Textual Essay
III. Apparatus and Extratextual Materials
IV. Matters of Production
(32) 20.0 If the
edition—whether print or electronic—is prepared in electronic
files, are those files encoded in an open, nonproprietary format
(e.g., TEI XML rather than Microsoft
Word or
Word
Perfect)?
V. Electronic Editions
(34)
27.3 If any software has been uniquely developed for this edition, is
source code for the software available and documented?
3. GLOSSARY OF TERMS USED IN THE GUIDING QUESTIONS
4. ANNOTATED BIBLIOGRAPHY: KEY WORKS IN THE THEORY OF TEXTUAL EDITING
A SUMMARY OF PRINCIPLES
PART I
SOURCES AND ORIENTATIONS
CRITICAL EDITING IN A DIGITAL HORIZON
DINO BUZZETTI AND JEROME McGANN
CODEX-BASED SCHOLARSHIP AND CRITICISM
(54)
Scholarly editing is grounded in two procedural models: facsimile
editing of individual documents and critical editing of a set of
related documentary witnesses.
(55) The critical editor's working
premise is that textual transmission involves a series of
translations.
(55) A key device for pursuing such a goal is
stemmatic analysis.
(56) Three important variations on the two
basic approaches to scholarly editing are especially common:
best-text editions, genetic editions, and editions with multiple
versions.
(57) Genetic editing procedures were developed in order
to deal with the dynamic character of an author's manuscript texts.
TEXTUAL AND EDITORIAL SCHOLARSHIP WITH DIGITAL TOOLS
(58)
For example, one can now design and build scholarly editions that
integrate the functions of the two great editorial models, the
facsimile and the critical edition. . . . In short, digital tools
permit one to conceive of an editorial environment incorporating
materials of many different kinds that might be physically located
anywhere.
(59) Scholars whose work functions within the great
protocols of the codex—one of the most amazing inventions of human
ingenuity—appear to think that the construction of a Web site
fairly defines digital scholarship in the humanities.
MARKING AND STRUCTURING DIGITAL TEXT REPRESENTATIONS
(60)
Traditional text—printed, scripted, oral—is regularly taken, in
its material instantiations, as self-identical and transparent. It is
taken for what it appears to be: nonvolatile.
(60-61) Any explicit
feature of a text can be conceived as a mark. We may thus say that
digital text is marked by the linear ordering of the string of coded
characters that constitutes it as a data type, for the string shows
explicitly its own linear structure.
(61) When we mark up a text
with TEI or XML code, we are actually marking the pre-existent
bibliographic markup and not the content, which has already been
marked in the bibliographic object.
(61) With the introduction of
declarative markup languages, such as SGML and its humanities
derivative TEI, tags came to be used as “structure markers”
(Joloboff 87).
(62) Text can thus be conceived as an “ordered
hierarchy of content objects” (DeRose, Durand, Mylonas, and Renear
6; this is the OHCO textual thesis). But can textual content be
altogether modeled as a mere set of hierarchically ordered
objects?
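The question can be shown failing on the smallest possible case: a page break falling mid-line means the page hierarchy and the metrical hierarchy cannot both be containers in one tree. A minimal sketch of the usual TEI workaround, flattening one hierarchy into empty milestone elements (here <pb/>), with invented sample text:

```python
import xml.etree.ElementTree as ET

# Two verse lines with a page break mid-line. The verse hierarchy keeps
# its container elements; the page boundary is demoted to an empty
# milestone element <pb/>, so pages no longer exist as containers.
doc = ET.fromstring(
    '<body><l>Whan that Aprille with his</l>'
    '<l>shoures <pb n="2"/>soote the droghte</l></body>')

lines = doc.findall("l")       # the verse hierarchy survives as nesting
breaks = doc.findall(".//pb")  # the page hierarchy survives only as marks
```

Recovering page 2 as an object then requires processing the milestones back into a second, virtual hierarchy, which is exactly the structural information the OHCO model cannot hold directly.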
(62) In principle, markup must therefore be able to make
evident all implicit and virtual structural features of the text. . .
. Textual structure is not bound, in general, to structural features
of the expression of the text.
(63) In computational terms, it
describes data structures but does not provide a data model or a
semantics for data structures and an algebra that can operate on
their values.
(63) The crucial problem for digital text
representation and processing lies therefore in the ability to find
consistent ways of relating a markup scheme to a
knowledge-representation scheme and to its data model.
(63) In
this approach, the problem to solve consists precisely in relating
the scheme that describes the format of the documents to the scheme
that describes their content. The first would be an XML schema, “a
document that describes the valid format of an XML dataset”
(Stuart), and the second would be a metadata schema such as the
resource description framework (RDF) being developed for the Semantic
Web.
MARKUP AND THE GENERAL THEORY OF TEXTUALITY
(64)
Diacritical signs are self-describing expressions of this kind, and
markup can be viewed as a sort of diacritical mark. . . . Markup,
therefore, can be seen either as a meta linguistic description of a
textual feature or as a new kind of construction that extends the
expressive power of the object language and provides a visible sign
of some implicit textual content.
(65) Marks of this kind,
viewable either way, behave just as Ludwig Wittgenstein's famous
duck-rabbit picture (Philosophical
Investigations 2.11).
Buzzetti and McGann discuss the insufficiency of the OHCO thesis: it misses structural mobility, assumes meaning is embedded in syntactic form, and assumes coincidence between syntactic and semantic forms.
(66) The OHCO thesis about the nature of the text is radically insufficient, because it does not recognize structural mobility as an essential property of the textual condition. . . . A digital text representation need not assume that meaning can be fully represented in a syntactic logical form. . . . A formal representation of textual information does not require an absolute coincidence between syntactic and semantic logical form. In this respect, the role of markup can be of paramount importance in bringing their interconnections to the fore.
Buzzetti and McGann see markup as highly reflexive act, oscillating indeterminacy like self-organizing systems; in line with videogame studies, electronic literature.
(67) Diacritical ambiguity, then, enables markup to provide a
suitable type of formal representation for the phenomena of textual
instability. . . . Markup should be conceived, instead, as the
expression of a highly reflexive act, a mapping of text back onto
itself: as soon as a (marked) text is (re)marked, the metamarkings
open themselves to indeterminacy.
(68) The continual oscillation
and interplay between indetermination and determination of the
physical and the informational parts of the text renders its dynamic
instability very similar to the functional behavior of
self-organizing systems. Texts can thus be thought of as a simulation
machine for sense-organizing of an autopoietic kind. Text works as a
self-organizing system inasmuch as its expression, taken as a value,
enacts a sense-defining operation, just as its sense or content,
taken as a value, enacts an expression-defining operation. Text
provides an interpreter with a sort of prosthetic device to perform
autopoietic operations of sense communication and exchange.
(69)
Louis Kauffman's and Francisco Varela's extension of Spencer-Brown's
calculus of indications accounts more specifically for the “dynamic
unfoldment” of self-organizing systems and may therefore be
consistently applied to an adequate description of textual mobility.
FROM TEXT TO WORK: A NEW HORIZON FOR SCHOLARSHIP
Buzzetti and McGann invoke pragmatistic, existential imperative to build devices.
(69)
Scholarly editions are a special, highly sophisticated type of
self-reflexive communication, and the fact is that we now must build
such devices in digital space. This necessity is what Charles Sanders
Peirce would call a “pragmatistic” fact: it defines a kind of
existential (as opposed to a categorical) imperative that scholars
who wish to make these tools must recognize and implement.
(70)
In fact one can transform the social and documentary aspects of a
book into computable code. . . . We were able to build a machine that
organizes for complex study and analysis, for collation and critical
comparison, the entire corpus of Rossetti's documentary materials,
textual as well as pictorial.
Interesting suggestion by Buzzetti and McGann for researching autopoietic functions of social textualities via user logs dovetails nicely with software studies and projects for future digital humanities scholars.
(71) The autopoietic functions of the social text can also be
computationally accessed through user logs. This set of
materials—the use records, or hits, automatically stored by the
computer—has received little attention by scholars who develop
digital tools in the humanities. Formalizing its dynamic structure in
digital terms will allow us to produce an even more complex
simulation of social textualities. Our neglect of this body of
information reflects, I believe, an ingrained commitment to the idea
of the positive text or material document.
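The simplest formalization of the "use records" the passage points to is counting hits per document from server logs. A minimal sketch, with invented log lines and paths, of the kind of aggregation a study of social textualities would start from:

```python
from collections import Counter

# Hypothetical Web-server log lines for an electronic edition
# (addresses and paths invented for illustration).
log = [
    "203.0.113.5 GET /rossetti/damozel.html",
    "203.0.113.5 GET /rossetti/damozel.html",
    "198.51.100.7 GET /rossetti/jenny.html",
]

# Hits per document: the raw material for modeling how readers
# actually traverse the edition.
hits = Counter(line.split()[-1] for line in log)
top = hits.most_common(1)[0]
```

Formalizing the dynamic structure of such records, rather than merely tallying them, is the harder task the essay assigns to future tool builders.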
(71) We are interested
in documentary evidence precisely because it encodes, however
cryptically at times, the evidence of the agents who were involved in
making and transmitting the document.
(72) Our aim is not to build
a model of one made thing, it is to design a system that can simulate
the system's realizable possibilities—those that are known and
recorded as well as those that have yet to be (re)constructed.
(72)
McKenzie's central idea, that bibliographic objects are social
objects, begs to be realized in digital terms and tools, begs to be
realized by those tools and by the people who make them.
THE CANTERBURY TALES AND OTHER MEDIEVAL TEXTS
PETER ROBINSON
(74) In this essay, I use my experience with the
Canterbury
Tales Project,
with which I have been involved since its beginnings in 1989, to
explore five such propositions.
(75) It is a material help too
that all these texts are comprehensively out of copyright (though
rights in manuscript images do need to be negotiated with the
individual owners). Medieval authors are safely dead, and so too are
any relatives who might have any copyright interest.
Proposition 1: The use of computer technology in the making of a particular edition takes place in a particular research context.
Robinson: propositions reached from Canterbury Tales project digital edition: specificity of research context, inclusion of full-text transcription, restoring exhaustive historical criticism, editing and reading altered, adopt open transcription policy.
(77) Until the late 1980s, a few experiments and articles appeared to suggest that a combination of the computer, with its ability to absorb and reorder vast amounts of information, and new methods of analysis being developed in computer science (in the form of sophisticated relational databases) and in mathematics and in other sciences might be able to make sense of the many millions of pieces of information in a complex collation and provide a historical reconstruction of the development of tradition.
Proposition 2: A digital edition should be based on
full-text transcription of original texts into electronic form, and
this transcription should be based on explicit principles.
(78-79)
We were fortunate indeed in the time and place. In time: the three-year
project began just after the inception of the Text Encoding
Initiative (TEI), and in those three years the first steps were being
made toward electronic publishing, first on CD-ROM and later over the
Internet as the Web began to take shape. In place: Oxford, where the
project began, was intensely involved in the TEI through Hockey and
Lou Burnard. . . . This close link with the TEI became crucial
because of something to which I had not given much thought before:
the need for a stable and rich encoding scheme both to record the
transcripts of the original texts we were to make and to hold the
record of variation created by the collation program.
(79) We have
been able to use readily available commercial SGML/XML software
(first DynaText,
later Anastasia)
to achieve excellent results. . . . Many years on, the gap between
the programs and the transcribers has narrowed but persists (emacs is
still the tool of choice for many).
(79-80) But what is good for
interchange is not necessarily good for capture, where an efficient
and focused system is required for the transcribers. Nor may it be
good for programming, as attempted by the XSLT (extensible stylesheet
language transformation) and similar initiatives.
(80) Encoding
the words and letters in a printed text can be quite simple: just
establish the characters used by the printer and allocate a computer
sign to each. . . . But in a manuscript, where the range of marks
that can be made by a scribe is limitless, matters are not so simple.
The transcriber must decide which of these marks is meaningful and
then which of the range of signs available on a given computer system
best represents that meaning.
(82) The collation tool we had
developed by then had the ability to regularize as we collated,
thereby shifting the responsibility for deciding exactly what a
variant was to the editor from the transcriber. We decided therefore
to adopt a graphemic system: as transcribers, we would represent
individual spellings but not (normally) the individual letter shapes.
We also included in our transcriptions sets of markers to represent
nonlinguistic features, such as varying heights of initial capitals,
different kinds of scribal emphasis, and the like: what Jerome McGann
calls bibliographic codes (Textual
Condition 52).
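The division of labor described above, graphemic transcription by transcribers and regularization at collation time by the editor, can be given a minimal sketch. The regularization table, witness sigla, and readings below are invented for illustration, not taken from the project's actual data.

```python
# Graphemic transcriptions keep each witness's spelling; a table maps
# spellings to one regular form at collation time, so deciding what
# counts as a variant rests with the editor, not the transcriber.
REGULARIZE = {"aprille": "april", "aprill": "april", "soote": "sote"}

def regularized(word):
    return REGULARIZE.get(word.lower(), word.lower())

def collate(reading_a, reading_b):
    """Return word positions where two aligned readings still differ
    after regularization, i.e. the substantive variants."""
    return [i for i, (a, b) in enumerate(zip(reading_a, reading_b))
            if regularized(a) != regularized(b)]

hg = ["Whan", "that", "Aprille", "soote"]  # invented witness "Hg"
el = ["Whan", "that", "Aprill", "sote"]    # invented witness "El"
```

Under this table the two witnesses collate as identical; change the table and the set of variants changes with it, which is precisely the editorial responsibility the passage describes.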
Proposition 3: The use of computer-assisted analytic methods
may restore historical criticism of large textual traditions as a
central aim for scholarly editors.
(84)
So we formed an alliance: Cambridge would purchase DynaText,
and we would work out how to use it both to publish our own CD-ROMs
with Cambridge and help Cambridge publish other CD-ROMs.
(84)
Accordingly, in 1996 the first of our CD-ROMs was published: the Wife
of Bath's Prologue on CD-ROM (Wife).
It included all transcripts of the fifty-eight witnesses, images of
all pages of the text in these manuscripts, the spelling databases we
had developed as a by-product of the collation, collation in both
regularized spelling and original spelling forms, and various
descriptive and discursive materials. It presents a mass of materials
such as an editor might use in the course of preparing an
edition.
(84-85) There is an obvious analogy between the processes
of copying and descent we might hypothesize for manuscript copying
and those of replication and evolution underlying biological
sciences: both appear instances of “descent with modification,”
to use Darwin's phrase. . . . we were able to show that phylogenetic
software developed
for biological sciences gave useful results when applied to
manuscript traditions.
(85) As to the first question, our
experiments suggest that such programs may indeed produce
representations of relations among manuscripts that correspond with
historical sequences of copying.
(85) As to the second question
(Are such reconstructions useful for editors?), where these
techniques show a group of manuscripts as apparently descended from a
single exemplar in the tradition, one should be able to deduce just
what readings were introduced by the exemplar.
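The "descent with modification" analogy rests on pairwise distances between witnesses over aligned readings. The project used actual phylogenetic software; a toy sketch of the underlying distance matrix, with invented witnesses and readings:

```python
from itertools import combinations

# Invented witnesses, each a sequence of aligned readings.
witnesses = {
    "A": ["april", "sote", "roote"],
    "B": ["april", "sote", "rote"],
    "C": ["averil", "swote", "rote"],
}

def distance(x, y):
    """Number of aligned positions at which two witnesses differ."""
    return sum(a != b for a, b in zip(x, y))

pairs = {(p, q): distance(witnesses[p], witnesses[q])
         for p, q in combinations(sorted(witnesses), 2)}
closest = min(pairs, key=pairs.get)  # the pair most likely to share an exemplar
```

A phylogenetic program builds its tree of descent from exactly such a matrix; the editorial question is then whether the tree's internal nodes correspond to real exemplars whose introduced readings can be deduced.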
Proposition 4: The new technology has the power to alter both how editors edit and how readers read.
Robinson: inspires comparing repeatability of analysis to software quality assurance methodologies, exemplified by the development of the Anastasia software tool for JavaScript rendition of XML-encoded files.
(86)
Because of the electronic publication mode, we were also able to
include the actual software and all the data we used for the
analysis, with exercises that allowed readers to run the software
themselves, so that they might confirm, extend, or deny the
hypotheses suggested in the article.
(87) We developed a new
software tool, Anastasia,
specifically to offer a bridge between the XML, into which we now
decanted all our files, and the new JavaScript-and-HTML interfaces
now appearing. Our first publication to make use of this combination
was our third CD-ROM, the Hengwrt
Chaucer Digital Facsimile (Stubbs).
For this we had a new aspiration: it should be beautiful. . . . For
us, this was an opportunity to give practical expression to what was
becoming a core belief of the project, that we could use the new
tools and our materials to change the way people experience a text.
Proposition 5: Editorial projects generating substantial
quantities of transcribed text in electronic form should adopt, from
the beginning, an open transcription policy.
(88-89)
The Canterbury
Tales Project
is not a legal entity and so cannot own anything, including
copyright. Copyright in the transcripts varies, belonging either to
the individuals who did them or to the institutions in which those
individuals were based. . . . We would like others to take them,
reuse them, elaborate on them (e.g., they could include the graphetic
information we rejected), and republish them: exactly the means of
scholarship promoted by the fluid electronic medium. But if future
scholars must go through a process of increasingly lengthy,
multisided negotiation, then the transcripts will become unusable,
walled from the world by legal argument.
Robinson: inspires comparing open transcription policy to four freedoms enshrined in GPL.
(89) The answer to this problem, we can now see, is an open
transcription policy, modeled on the copyright licensing arrangements
developed by the Open Software Foundation (now part of the Open
Group). . . . What it does mean is that the copyright holders assert
that the transcripts may be freely downloaded, used, altered, and
republished subject to certain conditions (basically, republication
must be under the same conditions, all files must retain a notice
with them to this effect, and permission must still be obtained for
any paid-for-publication).
(89) This massive burst of activity
across all the traditional domains of medieval philology puts one in
mind of the grand editorial projects of the nineteenth century.
DOCUMENTARY EDITING
BOB ROSENBERG
Rosenberg: recounts the development of a major scholarly editing project, the Edison Papers, including its technological evolution.
(92) The Edison Papers is working to combine images and text, and I
hope that a careful examination of some avenues and lessons learned
in that process will be helpful to anyone fortunate enough and bold
enough to undertake such a task.
(93) A second unusual aspect of
the Edison corpus [after its size] is the central importance of
drawings and even physical artifacts to an understanding of its
subject's work, which is a direct consequence of Edison's being an
inventor and fresh territory for documentary editing.
DATABASE
(93)
It had been impressed on the project organizers that the only way to
control a collection of this size was with an electronic database.
Fortunately, the Joseph Henry Papers Project had already started
blazing a trail into that mysterious territory. Using that experience
and knowledge as a foundation, the Edison Papers created a database
that would prove two decades later to be the heart of their
electronic edition.
(93) The first incarnation of the database
lived on a university mainframe and was written by a hired programmer
in Fortran 77. The main table had twenty-four data fields.
IMAGES
(96)
Increasing the bit depth of the images allowed us to scan them at a
relatively modest resolution of 200 dpi.
(96) When it was done, we
had nearly 1,500 CDs holding a terabyte of data.
(97) With the
help of an outside programmer (and a 21-inch screen), we created an
interface that displayed successive digitized images on one side and
database information on the other.
(97) The result is an online
image edition that allows the user to sample or assemble the
documents in a number of ways—name, date, document type, editorial
organization—and to view as a group documents scattered across many
reels of microfilm.
TEXT
(98)
But there was no existing symbol for an artifact, nor was there one
for technical notes or notebook entries, both of which we had in
abundance. So we created new ones: M (model) for physical objects and
X for technical materials. . . . What stumped us was the question of
authorship raised by such documents. . . . Finally we cut the Gordian
knot and declared that documents of type X had indeterminate authors,
even when Edison wrote them. This solution turned out to reflect the
way work was pursued as well as the way many of Edison's coworkers
felt about the work. They realized that they were active
participants, but they also recognized that when Edison was not in
the laboratory, work slowed after a couple of days, and that in fact
the work would not have existed without him to drive it.
(99) As
Edison's designs stretch the notion of artifact to include his
electric central stations and the Ogdensburg ore-milling plant of
the 1890s, modes of presentation will doubtless adapt to them.
(99)
It was planned from the beginning of the electronic edition that the
text of the print volumes would be included, marked up with SGML
(later XML) to take full advantage of the capabilities of live
digital text.
(100) The work done by David Chesnutt, Michael
Sperberg-McQueen, and Susan Hockey for the Model Editions Partnership
(MEP) was immensely helpful, as was the DTD they developed.
(100-101)
We had the advantage of long familiarity with our word processor,
which allowed us to write fairly complex macros that greatly
simplified much of the tagging.
(101) Embedding editorial material
proved challenging. The first big decision concerned the
index.
(102-103) The other complex editorial decision involved
references and is still very much in process. . . . What in the book
are haphazard strings of frame numbers must become a new type of
online document, an artifice that allows the user to see the relevant
notebook pages as the assemblage we intend. Just as we made
notebooks, account books, and scrapbooks browsable by creating a new
data table, these compound references will be a creation of the
database.
THE
POEM AND THE NETWORK: EDITING POETRY ELECTRONICALLY
NEIL
FRAISTAT AND STEVEN JONES
(105) The tasks of learning how to
encode, how to adopt or create a DTD (document type definition)
sufficiently complex to account for all the poems and manuscripts
that will be a part of the edition, how to imagine the overall
editorial environment the edition will provide, how to ensure the
stability and portability of the edition over time, and how to make
deliverable over the Web (if desired) the finished edition can be
daunting.
(106) All scholarly editing requires the editor to pay
attention to texts in more than one way, to what Jerome McGann has
called the “concurrent structures” that divide the editor's
attention between, on the one hand, bibliographic codes for design
and presentation and, on the other hand, linguistic codes for
structural and semantic communication (“Editing” 90).
(106)
Because poetry, with its enhanced self-consciousness of the physique
of texts, expresses itself inextricably through particular
interfaces, any editor of poetic texts in the digital medium must be
centrally concerned with interface, with matters of textual display
and appearance. . . . It seems to us as well that any serious editor
of electronic texts must pay attention to an even wider field for
such questions, looking outward to the “contextual” relation of
multiple individual texts and other materials on the Net as a whole
and within hyperlinked clusters, paying attention to the poem and
the
network.
Fraistat and Jones: give sizable examples of working code to illustrate the devil's bargain with HTML, as well as creative layout of text and critical apparatus.
(107)
What is at stake in the devil's bargain with HTML is perhaps best
illustrated in one of our very early texts, Shelley's broadside
ballad of 1812, “The Devil's Walk.” . . . The pragmatic
limitations of HTML markup are clear, here, to anyone with an
elementary knowledge of encoding, including the then-necessary but
inelegant use of the nonbreaking space tag (“&nbsp;”) to
create indentation.
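The trade-off the quotation describes can be sketched briefly: in presentational HTML, verse indentation had to be faked with runs of the &nbsp; entity rather than described structurally. The helper below is a minimal illustration, not the Romantic Circles markup itself; the two lines are adapted from the ballad's opening.

```python
# Sketch: faking poem indentation with &nbsp; entities, as early
# presentational HTML required. Helper and usage are illustrative.

def indent_with_nbsp(line: str) -> str:
    """Replace each leading space with a non-breaking space entity."""
    stripped = line.lstrip(" ")
    pad = len(line) - len(stripped)
    return "&nbsp;" * pad + stripped

lines = [
    "Once, early in the morning,",
    "    Beelzebub arose,",
]
html = "<br>\n".join(indent_with_nbsp(l) for l in lines)
print(html)
```

The indentation survives only as display hackery; nothing in the output tells software that the second line is metrically subordinate, which is the pragmatic limitation at stake.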
(108) One's ability in HTML to divide a screen
window into separate frames allowed us to think creatively about how
to display the textual apparatus of the “The Devil's Walk”
edition in relation to the text proper.
(109) Along with the Web
design community, Romantic
Circles has
since moved away from the use of frames wherever possible, but they
served a purpose for a time in the wide-area protocols of the Web,
and they illustrate the kinds of provisional editorial solutions the
larger context and infrastructure of the network sometimes require.
Fraistat and Jones: TEI for encoding poetic text at level of structure, describing in ordered hierarchy.
(111) If we wish to encode a poetic text at the level of its
structure, to describe (not format) its components—stanzas, parts
of stanzas, lines, and so on, for search, retrieval, analysis, and
recombination by a computer—we must turn to SGML proper and the
guidelines developed by the Text Encoding Initiative (TEI). . . . It
now seems likely that both the HTML 3.0 and SGML (TEI Lite) versions
of “The Devil's Walk” will in the near future need to be made
available in XML (or the Web-ready standard it has created,
XHTML).
(112) By nesting multiple sets of tags of this sort, it
becomes possible logically to mark the portions of a stanza—octet,
sestet, quatrain, couplet—such that software recognizing the
document type could parse, search, and manipulate the text in complex
ways. To put it in computer terms, we focus on the text's content
objects as they can be described in an ordered hierarchy.
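The ordered-hierarchy idea above can be made concrete with a small sketch: a TEI-style stanza (the lines and attribute values are invented, not from the edition) parsed with Python's standard library so that software addresses content objects by structure rather than appearance.

```python
import xml.etree.ElementTree as ET

# Sketch: a TEI-style fragment marked up as an ordered hierarchy of
# line groups <lg> and lines <l>. Content is illustrative only.
tei = """
<lg type="stanza">
  <lg type="quatrain">
    <l>First line of the quatrain</l>
    <l>Second line</l>
    <l>Third line</l>
    <l>Fourth line</l>
  </lg>
  <lg type="couplet">
    <l>First line of the couplet</l>
    <l>Second line of the couplet</l>
  </lg>
</lg>
"""

root = ET.fromstring(tei)
# Because the markup is structural, a processor can address parts of
# the stanza directly, e.g. pull out just the couplet.
couplet = root.find("lg[@type='couplet']")
print(len(root.findall(".//l")))                 # all verse lines
print([l.text for l in couplet.findall("l")])    # the couplet's lines
```

This is exactly what descriptive (as opposed to formatting) markup buys: search, retrieval, and recombination by unit of verse structure.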
(113)
All this data and metadata will be marked in the text itself, not in
a separate file, and will then be carried with the edition in a form
that will survive across various platforms and delivery
systems.
(113) In general, XML now promises to overcome the
crudest form of the binary opposition between structural and display
markup, which is very good news for electronic editions of poetry.
Fraistat and Jones: dynamic collation of Graver and Tetreault hints at single sourcing and RCS features.
(114-115) To represent that multiplicity of versions in meaningful ways, Graver and Tetreault replace the standard apparatus criticus with what they call dynamic collation, a script that allows for comparative viewing of textual cruxes in their original contexts (fig. 2).
Fraistat and Jones: MOOzymandias virtual reality experiment enacts the autopoietic functions of social texts envisioned by Buzzetti and McGann, demonstrating similarities between editing and programming.
(116)
More recently, we have moved beyond the Web page and HTML as such in
MOOzymandias,
an ambitious collaborative experiment in editing that situates
Shelley's sonnet “Ozymandias” in a text-based multiuser
virtual-reality environment, making the edition, its text and
apparatus, more like a game or theatrical space than a letterpress
artifact. MOOzymandias
was
created to attempt what no existing markup scheme can really do well
yet: deal with the multidirectional, spatialized, phenomenological
effects of poetic language—and the multilayered complexity with
which poems mean, in terms of both their presentational and
structural features and in terms of the contextual editorial
environments constructed by every edition through its acts of
annotation and interpretation.
(116) Textual editors should be
among those attempting such important and innovative experiments in
electronic environments. There are, for instance, interesting
possibilities for using 3-D editorial environments to interrelate
text and apparatus, as suggested by Matthew Kirschenbaum (“Lucid
Mapping”). We could even imagine future editions or archives
structured as databases that could be customized to the needs and
interests of individual users: first in response to a user's
electronic registration form indicating those interests, then by the
distribution of relevant information to users based on their behavior
while interacting with the edition or archive, much as Amazon.com
tailors itself to the behavior of customers.
(117) Schreibman's
editor as literary encoder has already at some point learned how to
grapple with a logic native to computer programmers. . . . In time,
it is likely that tools will be developed that allow editors to
produce simple markup by uploading the text and then filling in
fields in an online form. . . . For now, editors planning to use XML
markup would do well to purchase a software editor, such as XmetaL
or
OXygen,
that can facilitate uniform and valid encoding throughout the
edition.
DRAMA
CASE STUDY: THE CAMBRIDGE EDITION OF THE
WORKS OF BEN JONSON
DAVID
GANTS
(123) The initial version of the electronic side will
contain all the print edition in digital form as well as the complete
old-spelling texts and image facsimiles of all the early print and
manuscript witnesses, a full census of those witnesses, life and
court-masque primary archives, performance calendars, a
reconstruction of Jonson's library, and a diverse collection of other
materials that might help us better understand these important works.
Once the basic electronic archive is completed, the developmental
strategy will shift from traditional to innovative, from compiling
and organizing the essential evidence to investigating and analyzing
the complex possible interactions among the various elements. . . .
Hypertext theorists refer to this complex and unpredictable texture
as rhizomatic,
a term that comes from the tangled root structure beneath a field of
grass, a non-hierarchical mass of ever-growing links between and
among tufts.
(124) Many electronic archives and editions rely on
relatively simple frameworks that structure material according to
rigorous hierarchies branching from a central core. . . . Unlike such
rigid e-text collections, the CEWBJ
seeks
to explore the vision of electronic textuality imagined by Jerome
McGann in his influential “The Rationale of HyperText.”
(124)
The CEWBJ
derives
its core texts from manual and keyboarded transcriptions of the
source witnesses, employing either copies of the early quartos and
folios owned by editors or the UMI Early English Books series of
microfilm facsimiles, along with on-site transcriptions of
manuscripts held in research archives.
EDITING
DRAMA
(125)
Considered structurally, drama consists of spoken language presented
in soliloquial or dialogic form. These speeches are usually organized
into a sequence of scenes, which in the Western tradition can also be
grouped into acts. Typographically the representation of this
structure on the printed page has changed very little since the first
publication of interludes in the early sixteenth century.
(126)
Beyond speech, however, printed drama can also contain a variety of
components that interpret the theatrical circumstances of the work
for the reader—character lists, stage directions and descriptions,
acting notes, and details of real or ideal performance.
(126)
Again, early printed drama provides numerous examples of how printers
learned to use format to distinguish among the textual
components.
(127) Once printers started using roman and italic
fonts instead of black letter in the 1580s, they could differentiate
speech prefix and speech typographically; this differentiation by
font became the model for the next four centuries. By the late
sixteenth or seventeenth century, authors began to bring an awareness
of format and design elements to their work, sometimes providing in
their holograph manuscripts a template for the printed book.
ENCODING
DRAMA
(128)
From the outset the Text Encoding Initiative recognized the special
textual and material requirements of theatrical works, including in
its guidelines a set of encoding strategies designed specifically
for drama.
(128-129) Speech, the component common to almost every
dramatic work throughout history, is encoded with the <sp>
element. . . . Application of the constant who attribute allows one
to interrogate the linguistic aspects of a character across an
entire play or, in play cycles, across multiple works.
(129) The
contents of a speech usually consist of prose or verse text in a
variety of arrangements; a prose paragraph is encoded as <p>,
while a line of verse appears within the <l>
tag.
Furthermore, verse organized in stanzas, verse paragraphs, or other
poetic structures can be encoded in linegroup <lg>
tags.
(131)
The <stage>
tag
is used to mark the nonverbal stage directions included in a piece of
dramatic text and employs the type
attribute,
while the <move>
tag
signals the actual movement of a character or characters on, off, or
around the stage.
(132) While Jonson's plays employ a vertical
hierarchy, his masques and entertainments are much more horizontally
structured, consisting of a mixture of speeches, songs, dances, and
prose commentary.
Gants: gives detailed examples of tag usage and working code for encoding drama, fleshing out the problem of multiple hierarchies as the major challenge to text encoding.
(134-135)
Each of the above examples presents a textual unit organized in a
fairly hierarchical fashion, an arrangement ideal for the structural
nesting principle at the heart of XML's design. But in practice
literary works rarely conform to vertical hierarchies for very long,
instead evolving sophisticated linguistic patterns that overlap and
overlay in complex ways. The TEI guidelines offer a number of
solutions to the problem
of multiple hierarchies—for
example, using a lattice of pointers and targets or linking elements
with location ladders—although none are completely satisfactory.
When dealing with performance works in which intersecting structures
are part of the fabric of the text, marking the individual pieces in
an aggregate fashion and employing the <join>
element
to coordinate them all has proved especially useful. . . . Each
section of the letter is assigned a unique identifier from a1
through
a10,
and any program designed to process the text can reconstruct the
letter as a single unit by using the information in the <join>
element.
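The <join> strategy above can be sketched as follows: each fragment is encoded where it occurs, carries a unique identifier, and a <join> element lists the identifiers in reading order so a processor can reassemble the whole. Identifiers, element names, and content here are illustrative (plain id attributes rather than a full TEI xml:id setup).

```python
import xml.etree.ElementTree as ET

# Sketch of aggregate encoding plus <join>: letter sections appear
# out of order in the document; the join element's targets attribute
# records the order in which to reassemble them.
doc = """
<text>
  <seg id="a2">my lord,</seg>
  <seg id="a1">Most humbly sheweth,</seg>
  <seg id="a3">your servant.</seg>
  <join targets="a1 a2 a3" result="letter"/>
</text>
"""

root = ET.fromstring(doc)
by_id = {seg.get("id"): seg.text for seg in root.findall("seg")}
join = root.find("join")
# Reconstruct the letter as a single unit from the join information.
letter = " ".join(by_id[t] for t in join.get("targets").split())
print(letter)  # → Most humbly sheweth, my lord, your servant.
```

The pieces stay nested wherever the dominant hierarchy puts them; the intersecting structure lives only in the pointer list, which is what makes this a workable answer to overlap.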
(135-136)
A strategy similar to <join>
is
used when representing simultaneous speech and action in a stage
play. Each component receives a unique id
and
the overlapping relation is declared with the corresp
attribute
to the <stage>
element.
. . . The corresp
attribute
provides the processing instructions needed to reconstruct the
performance circumstances in whatever format is required.
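A minimal sketch of the corresp strategy, under the same simplifying assumptions as before (plain id attributes, invented content): the stage direction declares which speech it overlaps, and a processor pairs them to recover simultaneity.

```python
import xml.etree.ElementTree as ET

# Sketch: simultaneous speech and stage action linked via corresp.
scene = """
<div>
  <sp id="sp1"><l>A song sung while the dance proceeds.</l></sp>
  <stage id="st1" type="business" corresp="sp1">They dance.</stage>
</div>
"""

root = ET.fromstring(scene)
stage = root.find("stage")
# Follow the corresp pointer back to the overlapping speech.
overlapped = root.find(f"sp[@id='{stage.get('corresp')}']")
print(stage.text, "overlaps speech", overlapped.get("id"))
```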
THE
WOMEN WRITERS PROJECT: A DIGITAL ANTHOLOGY
JULIA
FLANDERS
(138) To manage the problem of sheer scale, the
anthology, like the canon, exercises a strategic simplification. But
with the digital anthology this same strategic purchase can be
achieved not through exclusion and brevity but through the
intelligence of the data itself, which can enable the reader to
discover the thematic subcollections within a larger assembly of
texts.
Flanders: consider this notion of readerly discovery for software and critical code studies.
(139) This emphasis on readerly discovery is part of a crucial shift
that has shaped the digital collection and its editorial
assumptions.
(140) If one result of these developments has been a
tendency to view a digital collection in the spirit of an archive—as
a body of source material on which may be built a superstructure of
metadata, retrieval and analysis tools, and editorial decisions—the
corollary has been an almost ironic interest in the materiality of
the text.
THE WWP CASE STUDY: GENERAL POINTS
(141)
Although it might seem absurd to imagine such a text substituting for
a visit to the physical archive, our goal was to represent all the
linguistic detail that a view of the original would provide and to
capture all the document's contents, even where they were almost
unconnected with the main work (e.g., advertisements).
(141-142)
In transcribing the text, we preserve the readings of the original
text, whether or not they seem correct, explicable, or intended by
the author or printer. Our premise here is, first, that errors may be
significant, whatever their source: they are part of the information
that circulated to readers when the text was first published and are
part of the evidence that literary researchers may wish to view.
Second, in many cases (particularly in earlier texts) it may be
difficult to say with confidence that a given reading is an
error.
(142) We treat authors' specific intentions with respect to
literary meaning as not only largely unknowable but also beside the
point: what we wish to represent is a cultural document, a piece of
historical currency whose modern readers may or may not find in it
insight into the author's mind.
(143) Most important, the header
for each file includes identification of each participant—author,
editor, publisher, printer, and potentially many others—together
with the possibility of demographic information on each.
(143) By
using a vocabulary for describing genre and textual structure that
locates the particular instance within a larger framework, we not
only allow for comparisons across the collection but also potentially
between this collection and others similarly prepared.
TEXT REPRESENTATION
(144)
As we currently present it, the text is displayed in a manner that
preserves the most significant details of its original formatting. .
. . Very shortly we will also be able to offer a display that shows
the original readings and offers the ability to switch between
views.
(144) Most significant, we do not capture any of the
graphic features of the text such as illustrations and ornaments. Our
transcription includes placeholders for such features, and for
figures (images with representational content) we encode a detailed
description of the illustration and a transcription of any words it
may contain.
(145) Because we currently do not represent more than
one copy of a given text, we do not have any apparatus representing
textual variants. But in some texts we do need to represent
manuscript deletions and revisions.
WWP IN THE CONTEXT OF OTHER PROJECTS
(146)
Providing page images is not practicable for us, but our full-text
transcriptions and metadata are encoded in XML following the TEI
guidelines with a degree of detail that is unusual (perhaps even
unmatched) among projects of this sort. We are also unusual, though
not unique, in providing a detailed account of our editorial and
transcriptional methods to the reader as part of our site
documentation.
(146) The WWP prefers to capture any emendation
using XML encoding rather than make silent alterations; as a result
our approach may lend itself more than others to offering readers
alternative versions of the text (concerning the treatment of details
like typographic errors or abbreviations).
PRACTICAL PROCEDURES
(147)
Consistency of encoding is the most difficult to achieve,
particularly with a complex system like the TEI guidelines. . . .
Like most digitization projects, we rely first of all on careful and
extensive documentation that our encoders use both during training
and as a reference while they are transcribing.
(147) The WWP's
example illustrates a few trade-offs that are particularly
significant in the transition to digital editing. . . . By capturing
the text so as to represent its variability as a data structure, we
are able to create a distinct editorial space that stands apart from
the source transcription and from any final editorial result. This
space is accessible to us as editors—it is where the editing proper
can occur—but it is also accessible to readers, enabling them to
inspect the decisions that have been made and choose different
strategies if they wish.
AUTHORIAL TRANSLATION: SAMUEL BECKETT'S STIRRINGS
STILL / SOUBRESAUTS
DIRK
VAN HULLE
(150) This long and complex genesis with more than
twenty versions makes Stirrings
Still a
particularly interesting example to discuss the scholarly editing of
bilingual writings.
EDITORIAL
PRINCIPLES
(152)
While it may seem remarkable that an edition advises its own
reproduction, the idea of a working copy is essential and underlies
the interface design of a digital equivalent of the face-to-face
representation, to be consulted at all times.
(153) Beckett's act
of self-translation has the paradoxical effect of fixing a text by
reproducing it in another language. . . . Since he did not
necessarily stick to one version to make his translations, the idea
of an original or source text is problematized and cannot serve as a
general principle to choose the base text.
(154) It is remarkable
that neither the Guardian
publication
nor the text in the Beckett
Shorts (vol.
11) published by John Calder mentions the dedication “For Barney
Rosset.” It was Beckett's concern for Rosset's situation that
brought about the publication.
(154) As a consequence, the choice
of the limited edition “for Barney Rosset” as the base text is
inspired primarily by this social circumstance: not because it is
deluxe but because it was meant to help a friend. This case shows
that authorial intention and social orientation are not mutually
exclusive.
METHODS OF TEXT REPRESENTATION
Van Hulle: argues transclusive flexibility afforded by not only digital format but nonproprietary format so that it can be machine processed in new ways.
(155) The transcription of the documents in Reading is encoded in TEI-compliant XML. The advantage of this nonproprietary format is the resulting transclusive flexibility of the textual material. Depending on the user's focus, the draft material can be rearranged in several ways: (1) in a documentary approach, based on the catalog numbers; (2) in chronological order; (3) by language; (4) with a focus on translation; (5) in retrograde direction, starting from the published texts.
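The five orderings Van Hulle lists can be sketched as sorts over the same encoded records, which is the point of transclusive flexibility: one nonproprietary source, many machine-generated arrangements. The records below are invented placeholders, not catalog data from the Beckett archive.

```python
# Sketch: one set of encoded draft records, rearranged to suit the
# user's focus. Catalog numbers, dates, and languages are invented.
drafts = [
    {"catalog": "MS-2", "date": "1984-05", "lang": "fr"},
    {"catalog": "MS-1", "date": "1983-11", "lang": "en"},
    {"catalog": "MS-3", "date": "1985-02", "lang": "en"},
]

documentary   = sorted(drafts, key=lambda d: d["catalog"])   # (1) by catalog number
chronological = sorted(drafts, key=lambda d: d["date"])      # (2) in order of composition
by_language   = sorted(drafts, key=lambda d: d["lang"])      # (3) grouped by language
retrograde    = sorted(drafts, key=lambda d: d["date"],      # (5) backward from the
                       reverse=True)                         #     published texts

print([d["catalog"] for d in chronological])
```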
Van Hulle: proposes Vanhoutte linkable unit linkeme a basic concept of electronic texts (see if Landow covers).
(156) Every paragraph in the reading text can be linked to and compared with other versions of it. Vanhoutte has called this linkable unit a linkeme, “the smallest unit of linking in a given paradigm” (“Linkemic Approach”).
FORM
OF TEXTUAL APPARATUS
(157)
Traditionally the notion of variants applies to variation either
between copies of an ancient or medieval document by scribes or
between different editions of the same work. When dealing with modern
texts, a distinction must be made between transmission variants and
genetic (or composition) variants. The edition of a bilingual work
requires an extra category of translation variants.
(158) The
edition offers users the chance to adapt the size of the textual unit
they wish to compare (large, medium, small)--i.e., the unit of the
section <div>,
the paragraph <p>,
or the sentence <seg>,
which is already a refined form of versioning. But it is possible to
go further and make the edition into a critical genetic edition,
where the editor indicates the genetic variants explicitly.
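The resizable comparison unit described above can be sketched with the standard library: collate two versions at a chosen granularity (here the sentence) and report only the variant units. The sentences are invented, not Beckett's text.

```python
import difflib

# Sketch: comparing two versions at a user-chosen unit of collation.
# Swapping in paragraphs or sections instead of sentences changes
# only what goes into the two lists.
v1 = ["He stirred.", "All still.", "He rose."]
v2 = ["He stirred.", "All quite still.", "He rose."]

sm = difflib.SequenceMatcher(a=v1, b=v2)
for op, i1, i2, j1, j2 in sm.get_opcodes():
    if op != "equal":
        # Report each variant unit against its counterpart.
        print(op, v1[i1:i2], "->", v2[j1:j2])
```

A critical genetic edition would go further and label such differences explicitly as genetic variants; this sketch only locates them.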
Van Hulle: discusses versions and variants comparable to source control systems and versioning in word processors to deal with self-generative, algorithmic character of traditional text (McGann).
(158)
Except for the first extant version, a previous version can always
serve as a temporary invariant against which the genetic variants can
be measured, even if the writing was eventually aborted and never
published.
(160) Beckett was well aware of what McGann calls “the
algorithmic
character of traditional text” (Radiant
Textuality 151):
text generates text, and for Beckett translation played a crucial
role in the exploitation of this self-generative power. Authorial
translations give evidence of an enhanced textual awareness. As a
consequence, their textual examination and scholarly editing are a
crucial part of their critical interpretation.
PROSE
FICTION AND MODERN MANUSCRIPTS: LIMITATIONS AND POSSIBILITIES OF TEXT
ENCODING FOR ELECTRONIC EDITIONS
EDWARD
VANHOUTTE
(162) To allow a functional debate on editing and
editions in the electronic paradigm, editors should provide an
explicit definition of an electronic edition as well as the kind of
scholarly edition they are presenting in electronic form. . . . To
avoid confusion among different meanings and types of edition, I
sketch out my definition of an electronic scholarly edition in the
first section of this essay and formulate six requirements that
editors could embrace to ensure that their edition is treated as
such.
DEFINITION
AND AIMS OF AN ELECTRONIC EDITION
(163)
By electronic edition, I mean an edition (1) that is the immediate
result or some kind of spin-off product from textual scholarship; (2)
that is intended for a specific audience and designed according to
project-specific purposes; (3) that represents at least one version
of the text or the work; (4) that has been processed from a
platform-independent and nonproprietary basis, that is, it can both
be stored for archival purposes and also be made available for
further research (Open Source Policy); (5) whose creation is
documented as part of the edition; and (6) whose editorial status is
explicitly articulated.
(163) What Tanselle and Shillingsburg here
seem to overlook is that the practice of creating an edition with the
use of text encoding calls for explicit ontologies and theories of
the text that do generate new sets of theoretical issues.
A
CASE STUDY
Editorial Principles and Markup
(165)
The aim of the [De
teleurgang van den Waterhoek]
editorial project was to explore different ways to deal with textual
instability, textual variation, the genetic reconstruction of the
writing process, and the constitution of a critically restored
reading text.
(166-167) For sociological reasons, the edition
presents two critically edited texts: the versions of the first and
second print editions. . . . In constituting these texts, we applied
the principles of the German (authorial) editorial tradition, which
allows only justified corrections of manifest mistakes. In these two
critical texts, the emendations were documented by the use of the
<corr>
element,
containing a correction, and a sic
attribute,
whose value documents the original reading.
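The <corr>/sic mechanism just quoted can be sketched in a few lines: the element carries the emended reading as content and preserves the original reading in its sic attribute, so a processor can offer the reader either text. The sentence is invented.

```python
import xml.etree.ElementTree as ET

# Sketch: recovering either the corrected or the original reading
# from a <corr> element whose sic attribute documents the source.
passage = '<p>The <corr sic="seperate">separate</corr> editions.</p>'

root = ET.fromstring(passage)
corr = root.find("corr")
print("emended reading:", corr.text)        # what the editors print
print("original reading:", corr.get("sic")) # what the witness says
```

Because the emendation is encoded rather than silent, a critical view and a diplomatic view can be generated from the same file.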
A
Linkemic Approach to Textual Variation
(167)
In order for users of the edition to be able to evaluate what they
see, the facsimiles are accompanied by a full account of the imaging
procedure, including the documentation on the software and hardware
(and settings) used in the project, which I believe is an essential
requirement.
Vanhoutte: addresses new aspects of documenting how the edition was created, such as linkeme methodology, and new ways of reading provided by automagic of sed and awk, tracing cultural boundaries between digital humanities scholarship and IT, which are foregrounded by emphasizing noncritical operations.
(168) Instead of linking the orientation text to a variorum apparatus, the editors opted for what I have called a linkemic approach to textual variation. I define a linkeme as the smallest unit of linking in a given paradigm. This unit can be structural (word, verse, sentence, stanza, etc.) or semantic. In the glossary provided with the orientation text, the linkeme is of a semantic class that can be defined as the unit of language that needs explanation. . . . The architecture was automagically generated from the digital archive by a suite of sed and awk scripts. The linkemic approach provides the user with enough contextual information to study the genetic history of the text, and it introduces new ways of reading the edition.
MODERN MANUSCRIPTS
Vanhoutte: the dossier genetique treats the internal creative process as an internal monologue of the author, and thus a form of speech; compare the attempt to reconstruct a final software product from a long history of revisions and contested negotiations.
(168-169) Despite its strengths, this practice is problematic for a genetic edition based on modern manuscript material. . . . As an alternative I suggest that further research on a methodology and practice of noncritical editing or transcription of modern manuscript material may result in markup strategies that can be applied to the constitution, reading, and analysis of a so-called dossier genetique. My approach to the manuscript as a filtered materialization of an internal creative process, one that is comparable with the process of internal monologue or dialogue and that thus can be considered a form of speech, might be helpful in this respect.
Observation
1: (Non)Critical Editing
(169)
The essential, difficult, and time-consuming step of the
transcription of primary textual sources is not explicitly mentioned
in this outline.
(170) Electronic noncritical editing is concerned
with the twofold transformation from one format into another: first
the transformation from the text of a physical document to the
transcription of that text; second the transformation from one
medium, the manuscript, to another, the machine-readable
transcription.
(170) The reason for the neglect of noncritical
editing in the theory and practice of textual criticism, however, is
frequently the lack of a satisfactory ontology of the text on which a
methodology of noncritical editing can be modeled.
(170) So the
chapter that seemingly deals with noncritical editing in the TEI
guidelines addresses issues that are central in critical editing and
includes in its DTD subset tags to encode them. . . . Whereas they
emphasize the unimportance of noncritical editing in their theories,
the French school of critique
genetique mainly
works with noncritical representations of the documents under study.
Observation 2: Methodology
Vanhoutte: answering what is a text, ontology matters for noncritical operations, such as transcription, especially if it turns out to be non-nesting, non-hierarchical.
(171) Only when a project has a clear argument on the ontology of the text can a methodology for text transcription be developed.
Vanhoutte: argues modern texts often feature non-nesting problems from time and overlapping hierarchies.
(171)
The transcription of modern manuscript material using TEI proves to
be more problematic because of at least two essential
of such complex source material: time
and overlapping hierarchies.
(172)
Therefore, the structural unit of a modern manuscript is not the
paragraph, page, or chapter but the temporal unit of writing. These
units form a complex network that often is not bound to the
chronology of the page.
Vanhoutte: explores need for better ways to handle temporal elements via spatialization in digital data (Castells), and the choice to use only digital facsimiles acknowledges the limits of TEI, and is punting; multiple versions model of text lends itself to encoding via revision control system as well as TEI.
(172)
The current inability to encode these temporal and genetic features
of the manuscript and the overlapping hierarchies with a single,
elegant encoding scheme forces an editor to make choices that result
in impoverished and partial representations of the complex
documentary source. . . . Therefore, in the electronic edition of De
teleurgang van den Waterhoek,
we opted to represent the complex documentary sources by means of
digital facsimiles only, preserving in that way the genetic context
of the author's dynamic writing process.
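Vanhoutte's multiple-versions model invites a revision-control analogy. A minimal sketch (hypothetical snapshot data, not the De teleurgang edition's actual workflow) treats each temporal unit of writing as a dated committed state and diffs adjacent states to recover the writing process:

```python
import difflib

# Hypothetical temporal units of writing: each snapshot is a dated
# state of one manuscript passage, ordered by time of writing rather
# than by position on the page.
snapshots = [
    ("1925-03-01", "De schuit gleed over het water."),
    ("1925-03-04", "De oude schuit gleed traag over het water."),
]

# Diff adjacent snapshots, as a revision control system would, to
# recover the authorial interventions between two states.
for (d1, a), (d2, b) in zip(snapshots, snapshots[1:]):
    delta = difflib.ndiff(a.split(), b.split())
    changes = [tok for tok in delta if tok[0] in "+-"]
    print(d1, "->", d2, changes)  # additions prefixed '+', deletions '-'
```

The point of the analogy is that the unit of structure becomes the commit (the temporal unit), not the page or chapter.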
(173)
This fear of testing existing transcription systems with modern
manuscript material of a complicated nature in several projects may
signal the fact that a coherent system or methodology for the
transcription of modern material still must be developed and tested
and that an ontology of the text must be agreed on.
Genetic Criticism—Critique Genetique
Vanhoutte: three categories of genetic criticism are transversal, horizontal, vertical.
(174) Therefore, critique genetique does not aim to reconstitute the optimal text of a work and is interested not in the text but in the dynamic writing process, which can be reconstructed by close study of the extant drafts, notebooks, and so on. . . . Rather than produce editions, the geneticiens put together a dossier genetique by localizing and dating, ordering, deciphering, and transcribing all pre-text witnesses. Only then can they read and interpret the dossier genetique. But the publication of genetic editions is still possible.
Putting Time Back in Manuscripts
(175)
These four complexities [process, scriptorial pauses, nonverbal
elements, sub-chronological segmentation] are exactly what the TEI
guidelines consider “distinctive features of speech.”
(176) If
we consider any holograph witness as a filtered materialization of an
internal creative process (thinking) that can be roughly compared to
an internal dialogue between the author and the biographical person,
we may have a basis on which to build a methodology for the
transcription of modern manuscript material. By combining the TEI DTD
subsets for the transcription of primary sources, the encoding of the
critical apparatus, and the transcription of speech, we could try to
transcribe a manuscript and analyze it with tools for the
manipulation of corpora of spoken language. It is interesting in this
respect to observe how critique
genetique describes
authorial interventions like deletions, additions, Sofortkorrektur
or
currente
calamo,
substitutions, and displacements in terms of material or intellectual
gestures, as if they were kinesic (nonverbal, nonlexical)
phenomena.
(176) This approach does not do away with the essential
problem of non-nesting information, which is an inescapable fact of
textual life and even results from a one-way analysis.
Vanhoutte: even considering RCS commits as speech acts, the problem of non-nesting information in BNF-style grammars remains to keep theorists busy; encoding becomes a form of noncritical close reading, for example Greg Crane describing how the PDLS lookup files mitigate it.
(176-177) Creating a noncritical edition-transcription of such a text with the use of encoding is the closest kind of reading one can do. . . . Paradoxically, existent and extant manuscripts generate, by their resistance to current systems of text encoding, new ontologies of the text and new approaches toward that encoding.
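One common workaround for the non-nesting problem Vanhoutte names is standoff annotation: features are recorded as offset ranges over the base text rather than as containing elements, so overlapping spans never have to nest. A sketch with invented example data:

```python
# Standoff annotation: each feature is an (type, start, end) range over
# the base text, sidestepping the containment that SGML/XML element
# nesting requires. Text and spans below are illustrative only.
text = "The quick brown fox jumps"

annotations = [
    ("deletion", 4, 15),   # "quick brown" marked for deletion
    ("line", 0, 9),        # a physical line ends mid-deletion
    ("line", 10, 25),      # overlaps the deletion span: non-nesting
]

def spans_overlap(a, b):
    """True if two annotations overlap without either nesting in the other."""
    (_, s1, e1), (_, s2, e2) = a, b
    return s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1

# The deletion and the second physical line partially overlap, so no
# well-formed element hierarchy could hold both.
print(spans_overlap(annotations[0], annotations[2]))  # True
```

The trade-off is exactly the one the chapter describes: the offsets preserve the information, but the markup no longer performs the close reading in place.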
PHILOSOPHY CASE STUDY
CLAUS
HUITFELDT
(181) Wittgenstein's
Nachlass:
The Bergen Electronic Edition
was
published by Oxford University Press in 2000. This electronic edition
is the first publication of the Austrian philosopher Ludwig
Wittgenstein's complete philosophical Nachlass.
It contains more than 20,000 searchable pages of transcription and
a complete color facsimile.
WITTGENSTEIN'S NACHLASS
Huitfeldt: argues Wittgenstein's manuscripts provide almost every imaginable complicating variation for textual markup, requiring keen awareness of the nuances of diplomatic reproduction.
(182) Like many modern manuscripts, Wittgenstein's writings contain
deletions, overwritings, interlinear insertions, marginal remarks and
annotations, substitutions, counterpositions, shorthand
abbreviations, as well as orthographic errors and slips of the pen. .
. . Moreover, Wittgenstein had his own peculiar editorial
conventions, such as an elaborate system of section marks,
cross-outs, cross-references, marginal marks and lines, and various
distinctive types of underlining.
(182) These inter- and
intratextual relations, although complicated and by no means fully
known, are of interest to scholars studying the development of
Wittgenstein's thought.
WHY A DOCUMENTARY EDITION?
CONTENTS
OF THE EDITION
(183)
Wittgenstein's
Nachlass:
The Bergen Electronic Edition
has
three main components: a facsimile, a diplomatic transcription, and a
normalized transcription, each providing an interrelated but
independent view of the Nachlass.
(183)
The diplomatic
version records
faithfully not only every letter and word but also details relating
to the original appearance of the text. One might say it acknowledges
that our understanding of the text derives in no small part from the
visual appearance of material on the page.
(183) The normalized
version,
on the other hand, presents the text in its thematic and semantic
aspect.
SOME KEY NUMBERS
Huitfeldt: Wittgenstein Nachlass a forty man-year project, like a modern videogame or other software application, exceeds the capability of any single individual to produce; see Hayles on collaborative aspects of electronic literature.
(186) The Wittgenstein Archives at the University of Bergen spent altogether forty man-years (including text transcription and editing, management, administration, systems development and maintenance, and all other tasks related to the project), to give an average throughput of two pages per person per day, which is high compared with other editorial projects.
AIMS OF THE EDITION
PRIMARY
FORMAT
(187)
But we decided not to use SGML for this project. Instead, a special
code syntax was developed for the Wittgenstein Archives, and software
that allowed for flexible conversion to other formats was developed.
This syntax and software were called a multielement code system
(MECS).
CONSISTENCY
OF ENCODING
(187)
That the TEI guidelines provide various alternative mechanisms for
the encoding of many (or even most) textual phenomena is one of the
strengths of the guidelines and one of the reasons why they are found
applicable to a large number of widely different projects involved in
text encoding. At the same time, their openness and flexibility
create the danger of inconsistency.
(188) For example,
abbreviations may be encoded in basically two different ways
according to the TEI guidelines.
(189) The Wittgenstein Archives
decided to make a distinction between standard and nonstandard
abbreviations and to represent both as element content.
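The two TEI mechanisms Huitfeldt alludes to can be shown side by side: the abbreviation as element content with the expansion as an attribute, or the reverse. A small normalizer (illustrative, using the TEI P4-style `expan`/`abbr` attributes) shows why a project must pick one convention to keep queries consistent:

```python
import xml.etree.ElementTree as ET

# Two TEI encodings of the same abbreviation. A project that mixes
# them cannot reliably extract either the diplomatic or the
# normalized form with a single rule.
way1 = '<p>See <abbr expan="Doctor">Dr.</abbr> Smith.</p>'
way2 = '<p>See <expan abbr="Dr.">Doctor</expan> Smith.</p>'

def normalized(xml):
    """Return the paragraph text with every abbreviation expanded."""
    root = ET.fromstring(xml)
    parts = [root.text or ""]
    for el in root:
        if el.tag == "abbr":
            parts.append(el.get("expan", el.text))  # expansion in attribute
        else:
            parts.append(el.text)  # <expan>: content is the expansion
        parts.append(el.tail or "")
    return "".join(parts)

print(normalized(way1))  # See Doctor Smith.
print(normalized(way2))  # See Doctor Smith.
```

The Wittgenstein Archives' decision to put both standard and nonstandard abbreviations in element content is precisely a choice of one such convention, made once, project-wide.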
CONSISTENCY OF EDITING AND INTERPRETATION
BASIC
REQUIREMENTS
(190)
One might say that the aim of a diplomatic representation is to get
every letter of the original right, whereas the aim of a normalized
representation is to get every word and every reading right.
(190)
The Wittgenstein Archives decided to use a less strict definition: a
diplomatic reproduction should reproduce the original, grapheme by
grapheme; contain indication of indentation and the relative spatial
positioning of text elements on the page; and include information
about deletion, interlinear insertion, and a number of different
kinds of underlining. It was not considered necessary to indicate
every line break or allograph variation.
FORMALIZATION AND OPERATIONALIZATION
TRANSCRIPTION METHOD
REPRESENTATION
AND INTERPRETATION
(194)
An intriguing aspect of editing philosophical texts is that the
editorial work itself exemplifies a number of classical philosophical
problems, such as the relations between representation and
interpretation, the subjective and the objective.
PART
II
PRACTICES AND PROCEDURES
DOCUMENT
MANAGEMENT AND FILE NAMING
GREG
CRANE
Crane: overview of design and programming considerations going into Perseus Digital Library System (PDLS) crosses humanities scholarship into philosophical programming.
(277) The PDLS [Perseus Digital Library System] is significant in that it shows concretely which functions one evolving group of humanists felt were valuable and feasible.
PREPARING A
TEI TEXT FOR THE PDLS
Adding Reference Metadata to a TEI
File
Displaying the Contents of the TEI File
ADDING A NEW FILE TO AN EXISTING COLLECTION
PROCESSING
A DOCUMENT IN THE PDLS
Basic Display and Browsing
(281)
Submit a URL and return an unformatted, well-formed fragment,
allowing a third-party system to format or analyze the XML source. We
consider this feature to be critical, since it makes it possible for
multiple systems to apply a wide range of analytic and visualization
techniques to the data that we manage.
Processing
Data Files
(281-282)
Convert
SGML to XML.
. . . Extract
core metadata from the XML file.
. . . Aggregate
the metadata for the PDL.
. . . Generate
the lookup table.
(282)
We can use the lookup tables to support overlapping hierarchies,
addressing a well-known drawback of BNF-style
grammars such
as SGML/XML.
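The lookup-table idea can be sketched in a few lines: each citable unit, from whichever hierarchy, maps to offsets into the same base text, so the page hierarchy and the chapter hierarchy may overlap freely. Offsets below are invented for illustration, not PDLS data:

```python
# A lookup table keyed by citation maps every hierarchy onto offsets
# into one base text; the offsets, not the markup, carry the
# structure, so hierarchies are free to overlap.
lookup = {
    "chapter:1": (0, 5200),
    "chapter:2": (5200, 11000),
    "page:3":    (4800, 6400),   # straddles the chapter boundary
}

def overlaps(key_a, key_b):
    """True if the two cited units share any part of the base text."""
    (s1, e1), (s2, e2) = lookup[key_a], lookup[key_b]
    return s1 < e2 and s2 < e1

# Page 3 overlaps both chapters -- impossible to express as element
# nesting in a single SGML/XML hierarchy.
print(overlaps("page:3", "chapter:1"), overlaps("page:3", "chapter:2"))
```

This is the same standoff move Vanhoutte's chapter gestures at: the grammar stays single-hierarchy, and the table absorbs the concurrency.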
From Citations to Bidirectional Links
Crane: argues persistent linking schemes for print citations exemplify pre-digital solutions, whereas PDLS developed abstract bibliographic object concept from which bidirectional links can be generated.
(283)
Their monodirectional nature makes the Web a directed graph and has
profound implications for its topology. In digital libraries,
however, having greater control over content, we can track links
between documents. More important, long before computers were
invented, many formal publications developed canonical schemes that
gave print citations persistent value: there are various ways to
abbreviate Homer
and
Odyssey,
but Hom.Od.4.132 described the same basic chunk of text in 1880 and
1980.
(283) Persistent citation schemes are fuzzy, and this
fuzziness gives them flexibility. The PDLS uses the concept of an
abstract bibliographic object (ABO) to capture the fact that a single
work may appear in many editions.
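A minimal sketch of the ABO idea: parse a canonical citation such as Hom.Od.4.132 into a work identifier plus a passage reference, so that any edition registered under the same ABO can resolve it. The abbreviation table and ABO id below are assumptions for illustration, not the PDLS's actual data:

```python
# Hypothetical mapping from canonical abbreviations to abstract
# bibliographic object (ABO) identifiers. One ABO stands for the work
# itself, across all of its editions.
ABBREVS = {"Hom.Od.": "abo:tlg,0012,002"}  # illustrative id for the Odyssey

def parse_citation(cite):
    """Split a canonical citation into (ABO id, passage reference)."""
    for abbrev, abo in ABBREVS.items():
        if cite.startswith(abbrev):
            return abo, cite[len(abbrev):]
    raise ValueError(f"unknown citation scheme: {cite}")

print(parse_citation("Hom.Od.4.132"))
```

Because the passage reference is edition-independent, the same parsed citation can be turned around into a bidirectional link: every edition of the ABO that contains 4.132 becomes a target.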
(284) ABOs are arguably most
exciting when they allow us to convert individual citations into
bidirectional, many-to-many links. . . . Clearly, this service raises
interesting problems of filtering and customization as annotations
encrust heavily studied canonical texts, but we view such problems as
necessary challenges and the clusters of annotations on existing
texts as opportunities to study the problems of managing annotations.
Crane: invitation to studying problems of managing annotations encrusting heavily studied texts; suggests lexicon can become a commentary.
(284) In an online environment, however, the lexicon can become a commentary: that is, the readers of a text can see the words that the lexicon comments on.
Indexing Textual Links in Perseus
INFORMATION
EXTRACTION: PLACES AND DATES
(285)
Information extraction seeks to automate the process of identifying
people, places, things, and the relations among them.
(286) The
output of automatic parsing is imperfect and will vary from corpus to
corpus, but imperfect scalable analysis of large bodies of data can
reveal significant patterns.
Crane: extracting place and date information to identify events hints at Manovich big data analysis.
(286) Information extraction tends to be domain-specific.
(286-287)
The generalized architecture for text engineering (GATE), developed
at Sheffield, provides one model of how to integrate complementary
information extraction modules and may point the way for digital
library systems that incorporate these functions as a matter of
course.
Extracting Places
(287)
We scan all XML files for possible place-names.
(287) We currently
combine geospatial data that Perseus
collected
for Greco-Roman sites with TGN
[Getty
Thesaurus of Geographic Names]
data.
Extracting
Dates
(287)
We scan all XML files for dates. In practice, dates have proved much
easier to identify than place names.
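Dates are easier than place names because they follow a small number of surface patterns. A minimal sketch for year-style dates in English prose (the patterns are illustrative, not Perseus's actual extraction rules):

```python
import re

# Two illustrative surface patterns: a year with an era marker
# (356 BC) and a bare four-digit modern year (1869). Place names have
# no comparably regular surface form, hence the asymmetry.
DATE = re.compile(r"\b(\d{1,4})\s*(BC|BCE|AD|CE)\b|\b(1\d{3})\b")

text = "The temple burned in 356 BC and was excavated in 1869."
matches = [m.group(0) for m in DATE.finditer(text)]
print(matches)  # ['356 BC', '1869']
```

As the chapter notes, such scalable extraction is imperfect, but even a few patterns expose significant chronological structure in a large corpus.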
Using
Places and Dates to Identify Events
(288)
Once lists of places and dates are available, it is possible to look
for associations between the two to identify significant
events.
(289) The structures that we add to our documents reflect
elaborate (if often unconscious) cost-benefit decisions not only
about the interests of our audience but also about how future systems
will shape and enable those interests.
RIGHTS
AND PERMISSIONS IN AN ELECTRONIC EDITION
MARY
CASE AND DAVID GREEN
(346) A contract often requires the author to
transfer all copyrights to the publisher and to guarantee that the
work does not infringe the copyright of others. The author must also
agree to indemnify and reimburse the publisher for expenses incurred
if a claim is made that the author has infringed a copyrighted work
and the publisher is sued. The obligation to indemnify normally
exists whether the claim is frivolous or not. The contract also
requires the author to obtain permission for uses of works that go
beyond fair use and to supply copies of each permission to the
publisher.
COPYRIGHT
AND CONTRACTS
A Brief Review of Copyright Law
(347)
As soon as the original expression of a creator is fixed in any
medium, it is protected by copyright. Covered by copyright are
literary works; musical works; dramatic works; pantomimes and
choreographic works; pictorial, graphic, and sculptural works; motion
pictures and other audiovisual works; sound recordings; and
architectural works (sec. 102a). Copyright protection does not extend
to facts, ideas, concepts, procedures, and so on (sec. 102b) or to
works of the government of the United States (sec. 105).
(348) The
most relevant to the creation of a scholarly edition is section 107,
which addresses fair use.
(348-349) In general, for works
published from 1978 on, copyright protection now lasts for the life
of the author plus seventy years (sec. 302a). . . . Copyright term
has expired for works published in and before 1922, which means they
are now in the public domain.
(349) The Digital Millennium
Copyright Act (DMCA) allows copyright owners to control access to
their works through the use of such technological protection measures
as passwords and encryption (sec. 1201).
Implications
for Scholarly Editions
(350)
For works published between 1923 and 1978, the rules become more
complex. But many more of these works are in the public domain than
one might think.
(350) For works published outside the United
States, obtaining legal advice may be a wise investment.
Case and Green: concern raised that extensive monitoring capabilities will make it harder for scholars to secure permissions from publishers; imagine when their reach extends into real-time, perspectival virtual worlds.
(351) Because copyright owners now use technological means to search the Web to find unauthorized uses of their content, a publisher may be unwilling to expose itself to the cost of responding to potential claims, whether it believes the use is fair use or not.
Case and Green: SCO UNIX case is a prime example of difference between transfer of ownership of object versus copyright.
(352) Copyright law (sec. 202) provides for the distinction between ownership of a material object and ownership of its copyright. The transfer of ownership of an object does not convey ownership of the copyright unless the copyright is explicitly transferred as a part of the agreement.
Editors
and Contracts
(353)
As soon as this editorial work or added content is “fixed in a
tangible medium,” it is protected by copyright.
(353) Publishers
of scholarly works tend to request the exclusive and complete
transfer of copyright, but there is no legal reason that an author
has to accept this condition.
(353) Contracts are used in turn by
publishers in making electronic works available to users, whether to
individuals or libraries. . . . Since the courts are unsettled about
whether licenses can preempt copyright law, libraries have actively
negotiated with publishers to modify licenses to allow fair uses that
would support education and research.
THE
PERMISSIONS PROCESS
Identifying the Copyright Owner
Case and Green: lack of authoritative search for ownership and rights from Library of Congress further complicates transmedia events such as sounds in virtual realities that are generated from copyrighted text via text to speech synthesis.
(354)
Because the Library of Congress catalogs do not include entries for
assignment or other recorded documents, they cannot be used
authoritatively for searches involving the ownership of rights.
(355)
Audio.
Should permission be required to use audio material, the editor
should be aware of the possible need for several layers of
permissions.
Seeking Permission
(356)
You do not have to get permission in writing, but if you get verbal
permission, make sure again that you carefully describe exactly what
your use of the material will be and document your conversation with
the rights holder.
(356) Seeking permission can be a lengthy and
complex process, so consider copyright-related issues early in your
project planning.
COLLECTION AND PRESERVATION OF AN ELECTRONIC EDITION
MARILYN
DEEGAN
THE PRESERVATION OF EDITIONS
(359)
What are editors, publishers, and librarians to do with the conundrum
of preserving for scholars of tomorrow the fluid text of today?
Preserving Traditional Editions
(359)
In the analog world, at one level of abstraction, the physical
format—the carrier—is
the
edition and all that libraries and librarians need to know in order
to collect and preserve it. They are concerned, in Kathryn
Sutherland's terminology, with the vehicular rather than the
incarnational form of the edition (23).
Preserving Editions in a Digital Context
WHAT
ARE THE ISSUES?
(362)
Instability of citation is a critical problem; research and
scholarship are based on a fundamental principle of
reproducibility.
(362) When planning electronic editions, one
should establish standards and working practices that make them
interoperable: able to exchange data at some level with other
systems.
Preserving
Digital Data
(363)
The preservation of digital data has two main components: preserving
the integrity of the bits and bytes and preserving the information
that they represent. . . . As the software that created the
information becomes obsolete, the information becomes more and more
difficult to access unless it is stored in some future-proof format
or reformatted.
Deegan: Fedora project flexible extensible digital object repository architecture proposes new ways of reasoning based on behaviors rather than essential nature; compare to Tanaka-Ishii study of object-oriented programming methodologies.
(364) A new approach to the preservation of complex digital data is
being explored by the University of Virginia and Cornell University,
together with other academic partners: the Fedora project
(flexible extensible digital object repository architecture), one of
a number of repository architectures that have been proposed for use
in digital libraries. . . . Fedora is of particular interest, because
it proposes new ways of reasoning about digital data, based on data
objects and their behaviors rather than on the essential nature of
the data.
(364) A number of projects are looking at the problems
of preserving information on Web sites.
(365) Many of the
experiences in the preservation of Web sites offer insights into the
preservation of networked editions.
What Are Editors to Do?
Deegan: program and interface least durable parts of electronic editions, however this division of intellectual capital that externalizes program and interface dissolves with inclusion of digitally native, electronic literature as the subject of scholarly editions.
(365-366) We might expand the distinction to a fivefold one: data, metadata, links, program, and interface. The first three contain the intellectual capital in an edition; the last two are (should be?) external. However important the programs used to create and deliver the edition and however important the interface through which it is accessed, scholars must always remember that these parts of any electronic edition are the least durable.
Creating Preservable Assets
(366)
One relatively straightforward approach is to produce a fixed edition
on some stable medium at regular stages in its life, as is being done
for The Cambridge Edition
of the Works of Ben Jonson,
described by David Gants.
(367) While this volume deals primarily
with electronic textual editions, such is the power of the medium
that other media can be included, which all must be created according
to standards and stored in a format that is nonproprietary and well
supported. Eaves observes that editors of multimedia editions should
double-edit:
edit first in discrete, then in integrated media.
Applying Data Standards
Deegan: CCS link of cultural bias in encoding recommends ASCII entity references over direct Unicode; see Case and Gee.
(367) For text, the ASCII standard should always be used, with markup
added that is also in ASCII. There has been great progress in the
presentation of special characters through the Unicode standard, but
it is preferable that characters be encoded as entity references that
can be displayed in Unicode than encoded as Unicode itself.
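Deegan's recommendation amounts to keeping the stored file pure ASCII and deferring Unicode to display time. A one-function sketch of the conversion (the function name is mine, not from the chapter):

```python
# Replace every non-ASCII character with a numeric character
# reference, so the file itself survives in plain ASCII while a
# Unicode-aware renderer can still display the original characters.
def to_entity_refs(s):
    return "".join(c if ord(c) < 128 else f"&#x{ord(c):X};" for c in s)

print(to_entity_refs("Wittgenstein übte Kritik"))
# Wittgenstein &#xFC;bte Kritik
```

The design choice mirrors the chapter's preservation logic: ASCII plus references is the more future-proof carrier, even if Unicode is the better presentation.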
(368)
Image data should be captured at the best quality possible to reveal
all significant information about the original, then stored in a
nonproprietary file format using only lossless compression (if
compression is used at all).
(369) The long-term prospects of
electronic editions are also affected by the naming conventions used.
. . . Work is being done on alternatives to URLs. Uniform resource
names (URNs) identify a piece of information independent of its
location: if the location changes, the information can still be
found. One type of identifier that has been adopted by a number of
publishers is the digital object identifier (DOI). DOIs are
persistent names that link to some form of redirection.
(369) The
problem with using these specifications [Xlink and Xpointer] is that
there is more of a learning curve when linking is done through
pointing and clicking in various kinds of authoring software. But for
long-term security, links must be separated from programs.
Unsworth, John, Katherine O’Brien O’Keeffe, and Lou Burnard. Electronic Textual Editing. New York: Modern Language Association of America, 2006. Print.