Notes for Roberto A. Busa “Foreword: Perspectives on the Digital Humanities”

Key concepts: automatic translation, computational linguistics, documentaristic, hermeneutic informatics, humanities computing, informatics, universal language.


Related theorists: Billy Locke, Antonio Zampolli.


It would have been ironic if the IBM punch card machinery that started digital humanities had been reappropriated from occupied Europe or had tabulated for the USBSS.

(xvi) During World War II, between 1941 and 1946, I began to look for machines for the automation of the linguistic analysis of written texts. I found them, in 1949, at IBM in New York City.

Gives credit to Zampolli, whose work maps plateaus in computational linguistics, which is Busa's sense of humanities computing.

(xvi) It [Companion to Digital Humanities] continues, underlines, completes, and recalls the previous Survey of Antonio Zampolli (produced in Pisa, 1997), who died before his time on August 22, 2003. In fact, the book gives a panoramic vision of the status artis. It is just like a satellite map of the points to which the wind of the ingenuity of the sons of God moves and develops the contents of computational linguistics, i.e., the computer in the humanities.

Ties humanities computing to the discourse of written texts while admitting machines into the discussion; this is only a step away from admitting programming texts in future perspectives.

(xvi) Humanities computing is precisely the automation of every possible analysis of human expression (therefore, it is exquisitely a “humanistic” activity), in the widest sense of the word, from music to the theater, from design and painting to phonetics, but whose nucleus remains the discourse of written texts.

Three perspectives experienced over sixty years; unclear whether they are sequential epochs.

(xvi) I will summarize the three different perspectives that I have seen and experienced in these sixty years.

Technological “Miniaturization”

The perspective of technological miniaturization is akin to the progression from eight-bit to sixty-four-bit address widths, here applied to the phases of his project: punched cards, then magnetic tape, and finally CD-ROM.

(xvi-xvii) According to the perspective of technological miniaturization, the first perspective I will treat, the Index Thomisticus went through three phases. The first one lasted less than 10 years. I began, in 1949, with only electro-countable machines with punched cards.

Hilarious thankfulness for invention of magnetic tapes.

All computers used through first two epochs of technological miniaturization of his project were IBM equipment.

(xvii) In His mercy, around 1955, God led men to invent magnetic tapes. . . . I used all the generations of the dinosaur computers of IBM at that time.

Latest version of the project fits on a single CD-ROM compressed with the Huffman method.

(xvii) The third phase began in 1987 with the preparations to transfer the data onto CD-ROM. The first edition came out in 1992, and now we are on the threshold of the third. The work now consists of 1.36GB of data, compressed with the Huffman method, on one single disk.
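
A minimal sketch of the Huffman method named in the quotation, to show why frequent symbols end up with short codes. It is not Busa's implementation; the function names and the sample string are assumptions made for illustration.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a {symbol: bitstring} prefix code from symbol frequencies."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

def encode(text, codes):
    return "".join(codes[ch] for ch in text)

def decode(bits, codes):
    inverse = {code: sym for sym, code in codes.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:       # prefix-free, so the first match is the symbol
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

# Hypothetical sample built from particles the foreword mentions.
sample = "et non et ratio et non"
codes = huffman_codes(sample)
bits = encode(sample, codes)
```

Decoding `bits` with the same table returns `sample`, and the bit string is shorter than eight bits per character because the space and the letters of "et" and "non" dominate the counts.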

Textual Informatics

Following the first perspective of technological miniaturization (making gadgets), the second perspective, textual informatics, has three branches: documentaristic, which includes media production; editorial, from which critical editions arise in media production; and finally hermeneutic, for philosophies of computing.

(xvii) The second perspective is textual informatics, and it has branched into three different currents.

The editorial current is where most digital humanities of the dumbest generation sticks; keep in mind that Busa experienced that technological era along with the dumbest generation.

Busa names the first current of textual informatics documentaristic; it covers the computing-center phenomena I refer to as collective intelligence.

(xvii) I call the first current “documentaristic” or “documentary,” in memory of the American Documentation Society, and of the Deutsche Gesellschaft für Dokumentation in the 1950s. It includes databanks, the Internet, and the World Wide Web, which today are the infrastructures of telecommunications and are in continuous ferment. The second current I call “editorial.” This is represented by CDs and their successors, including the multimedia ones, a new form of reproduction of a book, with audio-visual additions.

Hermeneutics associated with linguistic analysis; potentially territorializable by philosophy of computer programming.

(xvii) I call the third current “hermeneutic” or interpretative, that informatics most associates with linguistic analysis and which I would describe as follows.

Posthumous project of artificial intelligence based on cleverness of human intelligence programmed and built into machines; living writing.

Ultimate critical programming project extending current functionality of Index Thomisticus in current Lessico Tomistico Biculturale, hinting at ultimate communication with AI, calling for attachment to ensoniment projects.

Point that there could be common words as thoughts linking the ancients to modern technically informed experts; does this continuity confound the uniqueness argument that Johnson associates with the standard account of computer ethics?

Only in a computer could the computation operations Busa describes for his LTB digital humanities project be carried out.

The project Busa describes imagines and instantiates forms of what the ancients called living writing, imagined here as tested by the LTB apparatus.

Imagines himself able to think as Thomas to inspire computing project engineering by sharing expressions of genetic DNA programming.

(xvii-xviii) At the moment, I am trying to get another project under way, which will obviously be posthumous, the first steps of which will consist in adding to the morphological encoding of each single separate word of the Thomistic lexicon (in all there are 150,000, including all the particles, such as et, non, etc.), the codes that express its syntax (i.e., its direct elementary syntactic correlations) within each single phrase in which it occurs. This project is called Lessico Tomistico Biculturale (LTB). Only a computer census of the syntactic correlations can document what concepts the author wanted to express with that word. Of a list of syntactic correlations, the “conceptual” translation can thus be given in modern languages. . . . To give one example, in the mind of St. Thomas ratio seminalis meant then what today we call genetic programming. Obviously, St. Thomas did not know of either DNA or genes, because at the time microscopes did not exist, but he had well understood that something had to perform their functions.
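
A rough sketch of what a census of syntactic correlations could look like once each phrase has been hand-encoded. Only ratio seminalis comes from the foreword; the other lemmas, the relation labels, and the triple format are assumptions for illustration, not the LTB encoding scheme.

```python
from collections import Counter, defaultdict

# Hypothetical encoding: each phrase is a list of
# (head lemma, relation label, dependent lemma) triples.
phrases = [
    [("ratio", "adj", "seminalis"), ("esse", "subj", "ratio")],
    [("ratio", "adj", "seminalis"), ("habere", "obj", "ratio")],
    [("ratio", "adj", "naturalis")],
]

def census(phrases):
    """Count, for every lemma, each direct correlation it enters into."""
    table = defaultdict(Counter)
    for phrase in phrases:
        for head, rel, dep in phrase:
            table[head][(rel, "->", dep)] += 1   # lemma governs dep
            table[dep][(rel, "<-", head)] += 1   # lemma is governed by head
    return table

table = census(phrases)
for correlation, count in table["ratio"].most_common():
    print(correlation, count)
```

The per-lemma list of correlations is the kind of evidence from which, on Busa's account, a "conceptual" translation of the word could then be proposed.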

Hermeneutic Informatics

Claims his computing project, the Index Thomisticus, was an intentional act establishing hermeneutic informatics, and links it to IBM through Watson providing a highly engineered solution.

Riddles and gaps in the state of the art leading to his ultimate project include mother tongue epistemology, universal language grammar function, and implementation of living writing by operating upon both natural human and machine languages, as well as engineering philosophy problems of generating real virtualities.

(xviii) This third sort of informatics was the first to come into being, with the Index Thomisticus project, in 1949. . . . First, everyone knows how to use his own mother tongue, but no one can know “how,” i.e., no one can explain the rules and no one can list all the words of the lexicon that he uses (the active lexicon) nor of that which he understands but never uses (the passive lexicon).

Automatic abstracting along with eventual ensoniment or display tasks for programmed computing machinery answering questions of hermeneutic informatics.

(xviii) Second, there is still no scientific grammar of any language that gives, in a systematized form, all the information necessary to program a computer for operations of artificial intelligence that may be currently used on vast quantities of natural texts, at least, e.g., for indexing the key words under which to archive or summarize these texts to achieve “automatic indexing – automatic abstracting.”

Reformulating traditional aspects of every language to make them computable, for which many terms have been proposed.

Grammars of human languages formed for centuries by sampling.

(xviii) Third, it is thus necessary for the use of informatics to reformulate the traditional morphology, syntax, and lexicon of every language. In fact all grammars have been formed over the centuries by nothing more than sampling.

Project description in geek speak of his time; answers to schematism of perceptibility describing its programming design; probability index at core resembles Socrates discussion of ideal rhetoric in Phaedrus.

(xviii) Schematically, this implies that, with integral censuses of a great mass of natural texts in every language, in synchrony with the discovered data, methods of observation used in the natural sciences should be applied with the apparatus of the exact and statistical sciences, so as to extract categories and types and, thus, to organize texts in a general lexicological system, each and all with their probability index, whether great or small.
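
One way to read "each and all with their probability index" is as a relative frequency computed over an integral census of the texts. The sketch below assumes that reading, along with whitespace tokenization and a two-line toy corpus; Busa does not specify a formula.

```python
from collections import Counter

def probability_index(texts):
    """Map each word form to its relative frequency across all texts."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

# Hypothetical toy corpus.
texts = ["et non et", "ratio et non"]
index = probability_index(texts)

# List entries from the greatest index to the smallest.
for word, p in sorted(index.items(), key=lambda item: -item[1]):
    print(f"{word}\t{p:.3f}")
```

A full lexicological system would attach such indices to categories and types as well as to word forms; this sketch covers only the counting step.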

Alpac Report convinced biopower to suspend hermeneutic informatics; may have snuffed out projects that may have continued cards from Nazi systems.

Recall automatic translation mentioned by Black, as if Watson fed it to Busa: whether the programming project can survive, and whether ancient code revisions should remain extant, forms an ethical and philosophical question.

Globalization of LTB project rekindled by invention of GPL, linking Stallman and digital humanities through an unlikely common cause.

Automatic translation projects of 1950s may have continued efforts from Nuremberg trials, a global show before the Internet: yes we could if so perverted imagine such games and virtual realities.

The MIT-launched magazine Mechanical Translation is offered as digital humanities study content, along with the IEEE Annals.

(xviii) Hermeneutic informatics hinges on the Alpac Report (Washington, DC, 1966) and, now, this perspective is perhaps awaiting its own globalization. . . . Shortly afterwards, in the early 1950s, if I am correct, the move toward automatic translation started. The magazine MT – Mechanical Translation was started at MIT, launched, I think, by Professor Billy Locke and others.

Not worth trying to build shows material conditions of code; the surprisingly forgotten fact that cathedral builders and users had limits too.

Ironically, the Alpac Report canceled machine translation funding not because of technological (hardware) limitations but because of ontological deficits that doomed the projects, foreshadowing OOP in software engineering; these are ultimately philosophy of computing territories, suggesting a Socratic question that deeply tangles synaptogenesis and technogenesis.

Busa notes shortcomings of philology implying tracy of electracy of his time lacking.

(xix) Unfortunately, in 1966, as a result of the Alpac Report, the Pentagon cut off all funding. This was not because computers at that time did not have sufficient memory capability or speed of access, but precisely because the information on the categories and their linguistic correspondences furnished by the various branches of philology were not sufficient for the purpose. The “machine” required greater depth and more complex information about our ways of thinking and modes of expression!

Future Perspectives

Invokes Delphic know thyself in call for comprehensive global collective cognition heavily afforded by directed informatics, Engelbart Type C activity, rather than using old tools.

Tension between totality as global research and collectively as thoughtful computing system design, doing more than saving time doing the same old things.

(xix) We are far from having exhausted the precept inscribed on Apollo's temple at Delphi, “Know thyself.” It seems, therefore, that the problem must be attacked: in its totality – with comprehensive, i.e., global, research; collectively – by exploiting informatics with its enormous intrinsic possibilities, and not by rushing, just to save a few hours, into doing the same things which had been done before, more or less in the same way as they were done before.

Notes lean funding for his sort of digital humanities projects in renewed war on terrorism; many other projects likely continue by intelligence collection and analysis computing centers, and we could wonder about the neutrality or evil inherent in either group.

(xix) It seems that the attack on the Twin Towers on New York City on September 11, 2001, has brought in an unforeseen season of lean kine.

A Proposal

Nominates as spiritual testament his Strasburg conference presentation, another item of philosophy of computing discourse networks.

Whether prophecy or utopia, a common acknowledgment of humility.

(xx) I should like to summarize the formula of a global solution to the linguistic challenge that I presented at the above-mentioned conference at Strasburg, much as if it were my spiritual testament, although I am uncertain whether to call it prophecy or utopia.

The AntiBabel system employs, in part, what is the primary programming method used by Boltanski and Chiapello, responding to humanities questioning by employing Index Thomisticus algorithms.

(xx) I suggest that – care of, for example, the European Union – for every principal language . . . there should be extracted its integral “lexicological system” (with the help of the instruments tested in the Index Thomisticus).

AntiBabel universal language virtual reality proposed as a common interlingual system used only by the machines to mediate human communication via disciplined basic languages, the mother tongue of human machine collective intelligence to be theorized as post postmodern network dividual cyborgs.

Telematic use of the computer as the purpose for developing disciplined basic languages seems like the perverse corruption of human being that Heidegger feared.

(xx) Thus there would be on the computer a common interlingual system consisting solely of strings of bits and bytes with correspondence links both between convergences and divergences in themselves and between each other. It would be a sort of universal language, in binary alphabet, “antiBabel,” still virtual reality. . . . The number of such correspondences thus extracted (lexicon and grammar) would be a set of “disciplined” basic languages, to be adopted for the telematic use of the computer, to be also printed and then updated according to experience.

Native disciplined language now largely GUI expressions, taking into account forms beyond symbolic.

Input as disciplined native language, output various translations leveraging single sourcing.

(xx) In input, therefore, everybody could use their own native disciplined language and have the desired translations in output. The addressee could even receive the message both in their own language, in that of the sender, and in others.
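
A toy sketch of the input/output flow just quoted: a message in one disciplined language is mapped to shared binary codes and then rendered in several target languages. The three tiny lexicons and the code values are invented for illustration, and word-for-word substitution stands in for the lexicon-and-grammar correspondences Busa actually calls for.

```python
# Hypothetical disciplined lexicons sharing one set of binary concept codes.
LEXICON = {
    "en": {"water": 0b0001, "bread": 0b0010, "and": 0b0011},
    "it": {"acqua": 0b0001, "pane": 0b0010, "e": 0b0011},
    "la": {"aqua": 0b0001, "panis": 0b0010, "et": 0b0011},
}

def to_interlingua(message, lang):
    """Encode a message as concept codes; unknown words raise KeyError,
    which is the sense in which the input language is 'disciplined'."""
    return [LEXICON[lang][word] for word in message.split()]

def from_interlingua(codes, lang):
    inverse = {code: word for word, code in LEXICON[lang].items()}
    return " ".join(inverse[code] for code in codes)

def translate(message, source, targets):
    codes = to_interlingua(message, source)
    return {lang: from_interlingua(codes, lang) for lang in targets}

result = translate("bread and water", "en", ["it", "la", "en"])
print(result)
# {'it': 'pane e acqua', 'la': 'panis et aqua', 'en': 'bread and water'}
```

The addressee receives the message in their own language, in the sender's, and in others from one encoded source, which is the single-sourcing effect noted above.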

Philosophical insight influenced from decades spent doing directed, if not yet critical programming in his programs for Latin.

Programs for Latin can be extended to all languages, forming as network effect universal collective language imagined as AntiBabel; notes phonetic script scope does not include ideogram or pictogram based languages, and I wonder if they therefore include procedural machine languages.

(xx-xxi) These thoughts have formed gradually in my mind over the years, starting from the realization that my programs for Latin, which I always wanted broken up from monofunctional use, could be applied with the same operative philosophy to more than twenty other languages (all in a phonetic script), even those that do not descend from Latin, such as Arabic and Hebrew, which are written from right to left. I had only to transfer elements from one table to another, changing the length of fields, or adding a field. (However, I cannot say anything about languages written in ideograms or pictograms.)
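
The remark about tables and field lengths suggests table-driven programs in which the record layout is data rather than code. A minimal sketch under that assumption follows; the field names, widths, and morphological code are hypothetical and not taken from the Index Thomisticus.

```python
def parse_record(line, layout):
    """Slice a fixed-width record using a layout of (field name, width) pairs."""
    record, pos = {}, 0
    for name, width in layout:
        record[name] = line[pos:pos + width].strip()
        pos += width
    return record

# Hypothetical layout for one language.
LAYOUT_A = [("form", 12), ("lemma", 12), ("morph", 4)]
# Adapting to another language: longer fields plus one added field.
LAYOUT_B = [("form", 16), ("lemma", 16), ("morph", 4), ("root", 6)]

line_a = "rationes".ljust(12) + "ratio".ljust(12) + "NPF3"
line_b = "rationes".ljust(16) + "ratio".ljust(16) + "NPF3" + "rat".ljust(6)

record_a = parse_record(line_a, LAYOUT_A)
record_b = parse_record(line_b, LAYOUT_B)
```

The parsing function is unchanged between the two layouts; only the table differs, which is one plausible reading of programs "broken up from monofunctional use."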

Conclusion

Textual hermeneutics summarized descriptively by three periods, from Index Thomisticus to Alpac fragmentation envisioning global collaborative universal language programming; an emergent branch of philosophy.

Digital humanities have FOSS hopes to also address the parcelization in progress of free research, per my published work and projectively in my dissertation.

(xxi) In conclusion, I will therefore summarize the third perspective, that of textual hermeneutics, as follows. The first period began with my Index Thomisticus and ended, though not for me, with the Alpac Report. The second, after the Alpac Report, is the parcelization in progress of free research. The third would begin if and when comparative global informatics begins in the principal languages, more or less in the sense I have tried to sketch here. Those who live long enough will see whether these roses (and thorns), which today are merely thoughts, will come to flower, and so will be able to tell whether they were prophecy or dream.



Busa, Roberto A. “Foreword: Perspectives on the Digital Humanities.” A Companion to Digital Humanities. Eds. Susan Schreibman, Ray Siemens, and John Unsworth. Malden, MA: Blackwell, 2004. xvi-xxi. Print.