Notes for J.D. Applen and Rudy McDaniel, The Rhetorical Nature of XML: Constructing Knowledge in Networked Environments
Related theorists: Michael Albers, Berners-Lee, Birkerts, Bogost, Bowker, John Seely Brown, Buchanan, Saul Carliner, Locke Carter, Derrida, Duguid, Ellul, Foucault, W.C. Howell, McGann, Michael Hughes, Johnson-Eilola, Kuhn, Nardi, O'Day, Polanyi, Robert Reich, Ann Rockley, Selber, Star, Joe D. Williams.
XML, Knowledge Management, and Rhetoric
(1) The semantic power of XML, coupled with an understanding of knowledge management and the rhetorical situation, is something that can be harnessed in order to create more humanistic and compelling frameworks for information and knowledge exchange.
(4) So, while it is important to recognize and understand the intricacies of knowledge representation through XML, the distributional and transactional aspects of XML are also important to consider. By distributional, we mean the function performed by software that allows XML data to be distributed from one computer to another, or from a content author to her audience(s) in a computer-mediated fashion. We use the term transactional to refer to the process by which XML content is packaged, processed, and transmitted using collections of logical rules and conditional processing.
1 Knowledge Management and Society: Evaluating the Convergence of Knowledge and Technology
Understanding Technology and Information
(7) Expansive communication is a recursive process that takes the “user, designer, technology and context” into consideration. Not to see communication technology in this recursive perspective suggests “technological determinism” where technology can only be used to send and receive one kind of message regardless of the needs of the people involved in the process (Johnson-Eilola and Selber 121).
(7) If technical communicators do not understand this and continue to document practices as mere translators, the ability of the user to employ the technology in an expansionist sense will be undermined (Johnson-Eilola 247).
XML for symbolic-analysts doing critical reverse engineering.
(8) Echoing Robert Reich, . . . that technical writers work to promote themselves as . . . their skills include the manipulation of information, which requires a greater understanding of it in the abstract. . . . We feel that one way of describing the work of symbolic-analysis is: 1) identifying what constitutes relevant and meaningful information, 2) breaking this information down into specific elements, 3) providing names for these elements, and 4) contextualizing these elements of information to best meet the rhetorical needs of their audiences. While XML technology by itself does not perform this work, it does serve as a robust tool that would allow for the storage and transmission of this kind of work performed by symbolic-analysts, and we will be demonstrating this in the chapters that follow.
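Illustration (mine, not the book's): a minimal Python sketch of the four steps of symbolic-analysis applied to one sentence of raw instruction. The element and attribute names (step, action, method, duration, audience) are hypothetical choices, not ones the authors propose.

```python
import xml.etree.ElementTree as ET

# Raw information as it might arrive from an engineer:
#   "Reset the router by holding the power button for 10 seconds."

# Steps 1-3: identify what matters, break it into elements, and name them.
# Step 4: the audience attribute records the rhetorical context.
step = ET.Element("step", audience="beginner")
ET.SubElement(step, "action").text = "Reset the router"
ET.SubElement(step, "method").text = "hold the power button"
ET.SubElement(step, "duration", unit="seconds").text = "10"

xml_text = ET.tostring(step, encoding="unicode")
print(xml_text)
```

The XML stores the result of the analysis; the naming decisions themselves remain the writer's work.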
(8) Michael Hughes describes how technical writers can have a greater role in an organization by engaging in “critical reverse engineering.” . . . By “critical reverse engineering” a product, the technical communicator could raise some questions and even challenge some of the software designer's assumptions. These questions might also better allow the designer to more clearly explain the design of the software so it can be better documented for the end user.
(8-9) If technical communicators are engaged in this kind of process, they become part of the overall design team that works towards a final product. . . . This socially constructed knowledge can ultimately lead to a better product for the consumer, the end user.
(9) Wick (2000) challenges technical communicators to claim their role in the knowledge management game by emphasizing their considerable theoretical understanding of rhetoric and ability to communicate within, between, and across different sectors of an organization.
Social Construction and Paradigms
(10) For Kuhn, each scientific paradigm shared by a scientific community is based on four socially constructed elements: . . . “symbolic generalizations” . . . “shared commitments” . . . “shared values” . . . “shared exemplars.”
(11) However, Kuhn draws our attention to these elements above as they often go unchallenged, thus producing an invisible set of practices that channel the way scientists think.
(11) Normal science gets in the way when we cling to it and fail to realize that from time to time we are going to have to reexamine our assumptions and modify our theoretical approaches.
(11-12) Similarly, the writers cited in the first section of this chapter ask technical writers to examine the unchallenged assumptions they adhere to as they use communication tools and methods of documentation that explain communication technologies and products to others. For example, Selfe and Selfe have suggested that IBM's DOS system leads us to believe that hierarchical ways of organizing knowledge are better than more intuitive methods. . . . If technical writers assume the role of symbolic-analytic workers, they can better understand how what they are told to believe is true is in fact a social construction.
Social construction examples of DOS hierarchy and Microsoft business practices influencing technical tools, beliefs about them, and relationship to tacit knowledge.
(12) Analogously, communication professionals should know how the business practices of Microsoft executives have allowed their Windows operating system and applications to gain a near monopoly as the tools used in government, education, and industry, and why they are now the primary tools used by technical communicators. They should also understand how tacit knowledge can be either captured or excluded by these tools.
(13) Michael Polanyi believes that much of the valuable knowledge we possess is tacit knowledge, which is a kind of knowledge that we cannot really convey to others in totality and perhaps it is a kind of knowledge we take for granted.
(13) Tacit knowledge comes about from the process of indwelling, where we engage ourselves in a problem and cull from a large body of information and sensory stimuli what we feel we need to know about it.
(13) Metis is that type of knowledge that is constantly shifting and ambiguous, a knowledge that comes from a source that is not readily identifiable.
(14) As an instance of metis in action, Baumard cites the Intelligence Newsletter, which has published “difficult-to-obtain insights” about the world of espionage. Without the resources of the CIA, this small French publication bases its research on a careful examination of major newspapers and press releases from government officials throughout the world.
(15) The concept of ba [of Von Krogh, Ichijo, and Nonaka] unifies mental, virtual, and physical spaces and differs from “ordinary” interactions in that it allows for the potential generating “individual and or collective knowledge creation.”
(15) One of the key features of an enabling context is that a non-competitive atmosphere is created.
(16) They [Nonaka and Takeuchi] discuss four different kinds of knowledge transfer: tacit knowledge to tacit knowledge through socialization, tacit knowledge to explicit knowledge through externalization, explicit knowledge to explicit knowledge through combination, and explicit knowledge to tacit knowledge through internalization.
Tacit Knowledge to Tacit Knowledge by Socialization
(16) Experience is how we acquire tacit knowledge, and socialization is the process by which we pass on ideas or shared mental constructs and technical know-how.
Tacit Knowledge to Explicit Knowledge by Externalization
(17) Externalization is where we take the tacit knowledge that we possess and convert it into explicit forms that are expressed in forms such as metaphors, concepts, and models. Perhaps the most common method of externalization is writing.
(18) Both deductive and inductive reasoning methods can inform this process of externalization.
(19) Much of tacit knowledge can be turned into explicit knowledge through the use of figurative language such as analogies and metaphors, and Nonaka and Takeuchi see the progression from metaphor to analogy to a model as the most effective way to achieve this.
Explicit Knowledge to Explicit Knowledge by Combination
(20) Taking explicit knowledge—knowledge that is already known and written down or recorded in some other way—and reconfiguring it for different purposes into other forms of explicit knowledge so it can be more readily used, is what Nonaka and Takeuchi refer to as combination.
Explicit Knowledge to Tacit Knowledge by Internalization
(21) To enable an organization to have the most robust knowledge management culture, all tacit knowledge gained and that which resides in people needs to be passed on to others through internalization.
Technical Communication and Knowledge
Learning progresses from tacit ignorance, explicit ignorance, explicit knowledge, reaching tacit knowledge; technical communicator must harvest information from SMEs to explain for beginners.
(22) There are four major states in the learning process according to [W.C.] Howell: “Unconscious incompetence” . . . “Conscious incompetence” . . . “Conscious competence” . . . “Unconscious competence.”
(22) At one end of the spectrum, tacit ignorance, we have some end users or customers who buy products who do not even know where to start learning about something or what really needs to be known, and technical writers are often challenged to derive information from experts who are at the other end of the spectrum, tacit knowledge, who really know how to do something, but really do not understand that they know this. These subject matter experts just take it for granted that what they know is simple and that everyone else knows it, and it is the job of technical communicators to harvest this information and explain it for beginners.
(23) In this last example, Hughes suggests that if technical communicators produce a style template for documenting knowledge, this template can be adopted by other members in the organization who are not technical communicators.
Deploying Knowledge Management: Intranets and Extranets
(23) An intranet can be likened to an Internet that is deployed within an organization.
(23) An extranet is an intranet that has its protective firewall set up so some people who are not organization members have access to some or all of the content on the corporation's intranet.
Corporate-Wide KM Systems
(26) In their book Information Ecology, Nardi and O'Day expand this concept into complex systems of information in a way to frame the relationship between people with different skills, technology, values, and the individual human relationships that people have with the IT they use every day. What is key here is that all of these elements have an effect on the uses of technology and that they cannot be studied independently to understand how an ecosystem of information functions.
(26) The use of the word tool for a technology is important as it challenges designers of the tools to imagine how the end users might use them when they design them.
(27) Nardi and O'Day point out some of the concerns of Jacques Ellul, who is troubled by the institution of technology.
Individual subjectivity diminished in network environment according to Birkerts.
(27) Birkerts has posited the idea that individual subjectivity, being aware of ourselves and our thinking relative to others, is diminished in this kind of network environment.
Teachers of new technology are keystone species in information ecology; many connections to other entities.
(28) The metaphor of information ecology asks that we see that everyone in a system is of value, but there are certain keystone species that are needed to make the system work. One example of a keystone species, according to Nardi and O'Day, would be the teachers who train employees how to use a newly implemented technology.
(28) Buchanan further refines the notion of keystone species by noting that they are usually connected to many of the other entities in an ecosystem.
(29) When contrasted to the often used term “community,” Nardi and O'Day assert that the ecological metaphor allows them to better illustrate how a technology is employed in an organization.
(30) The going back and forth from client to search technologies, and then back to the client, exemplifies how the information ecology of these organizations works.
(31) Chun Wei Choo describes in greater detail the process through which information professionals can work with subject matter experts to assist in adding to the knowledge of an organization.
Information and its Integration into Social Systems
(33) Echoing Nardi and O'Day, Brown and Duguid are interested not so much in the growth of technology, but rather ask that we pay more attention to how technology is deployed in its social context before we design it, thus making it more useful.
(35) Brown and Duguid see the shift from an information economy to a knowledge economy as one that is a shift towards people or knowers, something from which the process model of business, in seeking greater efficiency, was moving away.
(35) Several Xerox engineers worked to implement a database system, the Eureka project, which sought to expand the best practices developed by the technicians in their local communities.
(36) If the tip was deemed of value, the validator worked with the technician to make sure that all the relevant information was captured and explained properly (Bobrow and Whalen 51). In the following chapters, we will reveal how communication professionals can construct document type definitions (DTDs) and schema, both essential XML technologies, so that data entered into a content management system (CMS) better meets specific and valid standards.
(37) Following this success, Eureka II was implemented, which was a worldwide deployment of this system via the Internet.
2 Introduction to XML
A Primer on the eXtensible Markup Language
The Value of XML
(41) Essentially, when one is using HTML, one is acting like a typesetter and layout artist of traditional printing mediums like newspapers, books, posters, and pamphlets.
(41) The other great thing about HTML is the way it allows us to make links from one document to another document with such ease.
Introduction to XML resembling tutorial marks push for humanities scholarship towards technical competence, beginning with differentiation between HTML and XML.
(42) Unlike HTML, XML allows us to design our own markup tags. . . . XML is about designating content, and HTML is about how that content is to be arranged and how it looks.
(43) You should start an XML document with an XML declaration.
(43) Elements are the basic coding units for XML documents. An element consists of tags that describe the actual information itself.
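Illustration (mine): a minimal Python sketch of a document that opens with an XML declaration and uses self-designed tags. The tag names book, title, and author are hypothetical; the parser is Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# The first line is the XML declaration; the tag names are our own invention.
document = b"""<?xml version="1.0" encoding="UTF-8"?>
<book>
  <title>The Rhetorical Nature of XML</title>
  <author>J.D. Applen</author>
  <author>Rudy McDaniel</author>
</book>"""

root = ET.fromstring(document)
title = root.find("title").text
authors = [a.text for a in root.findall("author")]
print(title, authors)
```

Nothing here says how the title should look on screen; the tags only designate what each piece of content is.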
Root Elements and Hierarchies of Elements
(44) XML is written in a tree form where there is a hierarchy of elements with some elements nested within other elements.
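Illustration (mine): a short Python sketch that walks a nested document and reports each element's depth, which makes the tree form visible. The library/shelf/book nesting is a hypothetical example.

```python
import xml.etree.ElementTree as ET

catalog = ET.fromstring(
    "<library><shelf><book><title>Information Ecology</title></book></shelf></library>"
)

def outline(element, depth=0):
    """Return (depth, tag) pairs for an element and everything nested in it."""
    rows = [(depth, element.tag)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows

levels = outline(catalog)
for depth, tag in levels:
    print("  " * depth + tag)
```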
Other Software for Displaying XML
Writing and Viewing an XML Document
Using Attributes in Elements
(48) We can even give our elements greater specificity by adding attributes within an element.
(49) One advantage of doing this is that we can reduce the complexity of our element tree.
(50) As a general rule, attributes are best for self-referencing metadata, or metadata about particular elements.
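Illustration (mine): a Python sketch coding the same fact once as a child element and once as an attribute, then counting nodes to show how the attribute version flattens the tree. The manual and language names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The same metadata coded two ways.
as_elements = ET.fromstring(
    "<manual><language>en</language><body>Press start.</body></manual>"
)
as_attribute = ET.fromstring('<manual language="en">Press start.</manual>')

# iter() visits the root and every descendant element.
nodes_with_elements = len(list(as_elements.iter()))
nodes_with_attribute = len(list(as_attribute.iter()))
language = as_attribute.get("language")
print(nodes_with_elements, nodes_with_attribute, language)
```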
(50) Empty elements are used when you want to include information that does not have any text between the element tags but still needs to be included in the XML code to make it valid.
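Illustration (mine): a Python sketch of an empty element that carries its information in an attribute rather than in text between tags. The entry, term, and image names and the file name are hypothetical.

```python
import xml.etree.ElementTree as ET

record = ET.fromstring('<entry><term>metis</term><image file="owl.png"/></entry>')
image = record.find("image")

# An empty element has no text and no children, yet still holds a place
# in the tree and can carry attributes.
is_empty = image.text is None and len(image) == 0
print(is_empty, image.get("file"))
```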
(52) When the organization of your XML elements conforms to the structure of the DTD you or someone else defined, it is considered “valid.” . . . To be well formed, an XML coded document needs to conform to the basic syntactic rules of element design that we have already described.
(52) Valid XML code contains all the designated elements in the order required by the DTD. It forbids tagged elements that are not declared in the DTD, and a DTD describes the hierarchical structure that XML elements must adhere to so it can be read by an XML parser.
(53) DTDs declare which elements must be in a document and the order in which these tags need to be embedded inside one another.
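Illustration (mine): a Python sketch separating the two tests. Python's standard-library parser checks only well-formedness and does not validate against a DTD, so follows_order below is a toy stand-in for a content model such as <!ELEMENT memo (to, from)>, not real validation; a validating parser would be needed for that. The memo structure is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """True when the text obeys basic XML syntax (one root, matched tags)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

def follows_order(element, expected):
    """Toy validity check: children must be exactly these tags, in order."""
    return [child.tag for child in element] == expected

good = "<memo><to>Staff</to><from>Editor</from></memo>"
bad = "<memo><to>Staff</from></memo>"          # mismatched closing tag
swapped = "<memo><from>Editor</from><to>Staff</to></memo>"

print(is_well_formed(good), is_well_formed(bad))
print(follows_order(ET.fromstring(swapped), ["to", "from"]))
```

The swapped memo is well formed but would not be valid under the assumed content model, which is the distinction the passage draws.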
Rules for Designing DTDs
(55) In the lines of the DTD above that do not have (#PCDATA) included, textual information cannot be included; these lines only describe how elements should be organized.
Combining Document Type Declarations, Document Type Definitions, and XML Coded Information
External DTDs on a Website Server
External DTD formation a type of procedural rhetoric, as is use of fixed XML attributes.
(61) The advantage of this external file approach is that you can have easy access to the DTD, and anyone who might also be working with you can access this DTD by just referring to a website where it was stored. We have already pointed out that making a DTD available to a group of people allows for an organization to have a model for a well-designed document.
Including Simple Attributes and Attributes with Unique Values
Another procedural rhetoric.
(64) Additional uses of fixed attributes might include XML code written for a company intranet that would state the name of the company and/or the division within the company.
Defining Attributes with Unique Values using ID Attributes
(66) Relational database systems handle the potential problems of data duplication by using what they define as a “primary key” for each individual row or record in a database. In XML, we can use a similar approach. To do this, we would add “ID” to reference a particular value that we do not want to repeat in the database, one that is unique.
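Illustration (mine): a Python sketch of the uniqueness rule that an ID attribute expresses. The standard-library parser does not enforce ID declarations, so the check is done by hand here; the staff and employee names and the id values are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

staff = ET.fromstring(
    "<staff>"
    '<employee id="e1">Ana</employee>'
    '<employee id="e2">Ben</employee>'
    '<employee id="e1">Cy</employee>'
    "</staff>"
)

# Count how often each id value occurs; anything above one breaks uniqueness.
counts = Counter(e.get("id") for e in staff.iter("employee"))
duplicates = sorted(i for i, n in counts.items() if n > 1)
print(duplicates)
```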
(68) IDREF attributes allow us to see metadata of some of the relationships that might not show up in the tree structure when we view the document through XML software.
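Illustration (mine): a Python sketch of following an IDREF-style attribute to recover a relationship that the tree structure does not show. The manager attribute and the person records are hypothetical.

```python
import xml.etree.ElementTree as ET

org = ET.fromstring(
    "<org>"
    '<person id="p1"><name>Ana</name></person>'
    '<person id="p2" manager="p1"><name>Ben</name></person>'
    "</org>"
)

# Index every person by id, then resolve each manager reference.
by_id = {p.get("id"): p for p in org.iter("person")}
reports_to = {
    p.findtext("name"): by_id[p.get("manager")].findtext("name")
    for p in org.iter("person")
    if p.get("manager") is not None
}
print(reports_to)
```

In the tree the two person elements are siblings; only the reference reveals that one reports to the other.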
(72) To declare a namespace, we need to start with the xmlns attribute which is easy to remember as the “ns” stands for “namespace.” Then we give it a value with a Uniform Resource Identifier (URI). The convention is to use the Uniform Resource Locator (URL) of the organization for this as it is convenient and we can be sure that there is only one of them.
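Illustration (mine): a Python sketch of two departments using the same tag name, kept apart by namespaces declared with xmlns. The example.com URIs and the hr and it prefixes are placeholders.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<report xmlns:hr="http://example.com/hr" xmlns:it="http://example.com/it">'
    "<hr:title>Director</hr:title>"
    "<it:title>Server Migration Plan</it:title>"
    "</report>"
)

# The prefix map tells the search which URI each prefix stands for.
ns = {"hr": "http://example.com/hr", "it": "http://example.com/it"}
job_title = doc.findtext("hr:title", namespaces=ns)
doc_title = doc.findtext("it:title", namespaces=ns)
print(job_title, "|", doc_title)
```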
Internal General Entities
(75) The value of this technique (using entities) is that it allows us to more easily represent information that needs to show up on every document.
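Illustration (mine): a Python sketch of an internal general entity declared once in the document type declaration and expanded wherever it is referenced. It assumes the standard expat-based parser behind xml.etree.ElementTree; the entity name company and its value are hypothetical.

```python
import xml.etree.ElementTree as ET

memo = ET.fromstring(
    '<!DOCTYPE memo [<!ENTITY company "Example Corporation">]>'
    "<memo><footer>Property of &company;</footer></memo>"
)

# The parser replaces &company; with the declared text.
footer = memo.findtext("footer")
print(footer)
```

Changing the single declaration would change every document that references the entity.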
External General Entities
(76) The power of XML in part derives from the fact that XML documents can be coded in such a way where we can select from many different files that might be in one or more databases to produce a larger document composed of these files. This is one of the real values of XML; we can build from external general entities.
DTDs for Documents with General Entities
Document Type Declaration
Using an External DTD and External Entities on a Website
(86) The advantage of this practice is that it allows members of a large organization to share and extract or point to DTDs and entities with greater ease.
(88) If we had a long list of entities or DTDs that were stored in our hard drives or on an external server, we could pick and choose just what modules we would need to produce a specific document for our needs. This shows how entities can be employed by technical communicators who are drawing from previously written documents, a practice known as single sourcing.
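Illustration (mine): a Python sketch of picking stored modules to build a specific document. It imitates the effect of external entities with a dictionary of fragments rather than using the entity mechanism itself, since the standard-library parser does not fetch external entities. Module names and content are hypothetical.

```python
import xml.etree.ElementTree as ET

# Stand-ins for separately stored files on a drive or server.
modules = {
    "warranty": "<section><title>Warranty</title></section>",
    "safety": "<section><title>Safety</title></section>",
    "setup": "<section><title>Setup</title></section>",
}

def build(doc_tag, wanted):
    """Assemble one document from the chosen stored modules, in order."""
    doc = ET.Element(doc_tag)
    for name in wanted:
        doc.append(ET.fromstring(modules[name]))
    return doc

quick_start = build("guide", ["setup", "safety"])
titles = [t.text for t in quick_start.iter("title")]
print(titles)
```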
Parsed and Unparsed Entities
3 Semantics and Classification Systems
Single Sourcing and Methods for Knowledge Managers
(95) In this chapter, we will explain how communication professionals can apply theories of semantics to better describe how we can name and arrange XML elements relative to one another.
The Semantic Web
(96) But we could have an even more robust Semantic Web if we included specific metadata that better allowed for more precise and far reaching searches into the databases of organizations that were willing to make their information accessible.
(96) Berners-Lee sees XML as the universal coding language that allows us to search for XML-tagged information based on the semantics used to tag the information, tags such as <barometric_pressure>.
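Illustration (mine): a Python sketch of searching by the semantics of a tag rather than by matching words in running text. Only the barometric_pressure tag comes from the passage; the station data and units are invented.

```python
import xml.etree.ElementTree as ET

weather = ET.fromstring(
    "<observations>"
    "<station name='A'><barometric_pressure unit='hPa'>1013</barometric_pressure></station>"
    "<station name='B'><temperature unit='C'>21</temperature></station>"
    "<station name='C'><barometric_pressure unit='hPa'>998</barometric_pressure></station>"
    "</observations>"
)

# Collect every pressure reading, wherever it sits in the tree.
readings = [float(p.text) for p in weather.iter("barometric_pressure")]
print(readings)
```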
Importance of rhetorical choices about naming and arranging.
(97) It is not just about the technologies; it is about how humans make rhetorical choices about naming and arranging the things they named between each other.
(98) They [cognitive scientists] have created their own term—schema—to apply to this type of template-based model of knowledge storage. Prototype theory follows this same idea. . . . The decisions we make when we design classification systems are often based on prototypes.
(99) Language neatly parses or arranges the universe in a way that gives us the feeling of control over it. Foucault thus sees language as a repressive institution that keeps us from understanding all the things that it cannot adequately convey given its present net of words and the syntactical structures that arrange these words into identifiable constructs of meaning.
(99) All structures of thought—myth, religion, science, philosophy—rely on the ability of language to name and order things. These structures of thought are based on what Foucault calls an “episteme,” the prevailing foundation of language or discourse that propels the “thinking” of an age and extends this foundation into other systems of order.
(100) In conventional terms, an ideal classification is a “spatial, temporal, or spatio-temporal segmentation of the world” that arranges information according to the following criteria (Bowker and Star 10,11): 1. The rules of classification are consistent. . . . 2. There is no overlap between separate categories. . . . 3. A system of classification is designed to cover everything.
(102-103) Classification schemes that have this effect are invisible systems that do not draw attention to their subjective nature because their description of reality is more likely to become uncritically accepted as “true,” but it might exclude ideas that are also of value.
(104) In contrast to the examples above where one group of stakeholders insist on one way of classifying or naming something for their epidemiological or political needs, the creation of boundary objects, objects classified under different categorical headings, enables different communities of stakeholders to use the same piece of data for their own specific purposes (Star “Structure” 50).
(104) Designation or naming practices also contribute to the spin inherent in invisible systems.
(105-106) Bowker and Star refer to these sorting and classifying phenomena as the principle of convergence, the “double process by which information artifacts and social worlds are fitted to each other and come together,” or converge (82). . . . This concept of convergence can be likened to Thomas Kuhn's theories about how the language of science comes into being and shapes the discourse scientists use to explain their ideas and understand nature.
(106) 1. . . To be effective we need to allow for some “ambiguity” in our designations.
(106) 2. It is important to be aware of “residual” categories, categories that are often referred to in classification schemes as “other.”
(106) 3. We need to remember who initially contributed to the system and what might have been some of the initial uncertainties, political tensions, and tradeoffs that went in to them.
(107) The physical repositories of information and data entry artifacts of infrastructures—the hardware, software, GUIs, ledgers, and forms—channel the values and philosophies of organizations.
(107) For an infrastructure to work, it must foster a sense of a democratic community of producers and consumers of information. For this to happen, the minimum technical threshold for its use must be acquired and within the abilities of the audiences who will be asked to use them (Star and Ruhleder 125).
(107) A classification strategy tells a story that identifies its architect(s). Likewise the genres of organization communication indicate the organizing practices of the individuals who use them.
(108) Single sourcing is the practice of using one document for different outputs.
Paradigm shift from document-centered to object-oriented conception of information, demonstrated by four levels of single sourcing.
(109) This paradigm shift means that we are getting away from thinking about single sourcing as cutting and pasting from legacy documents that have already been in use before. Instead, we are moving from a document-centered to an object-oriented way of thinking about information (Williams 321). Now, information can be coded in XML and thus packaged in modular elements that reside in databases, like those we discussed in Chapter 2. These elements can be drawn and arranged as needed with the use of . . .
(110) According to [Ann] Rockley, there are four levels of single sourcing that help illustrate this paradigm shift. Level one single sourcing, “Identical Content, Multimedia,” describes the practice of using content from one medium, say a printed manual, and then taking the same content and putting it in another medium such as a PDF file or an online Web page.
(110) Level two single sourcing, “Static Customized Content,” is the practice of taking single sourced content and then modifying it for different platforms or audiences.
(111) “Dynamic Customized Content” describes level three single sourcing. In this scheme, each user has the content customized for her own needs.
(111) Level four single sourcing is built on an electronic performance support system (EPSS) that builds on level three technologies by providing users with support material such as usage questions and training manuals that are tailored to their needs “before they know they need it” (191).
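Illustration (mine): a Python sketch of the first two levels only. One XML source is rendered to two media (level one) and filtered for an audience (level two). The audience attribute, its values, and the procedure content are hypothetical and are not Rockley's own markup.

```python
import xml.etree.ElementTree as ET

source = ET.fromstring(
    "<procedure>"
    "<step audience='all'>Open the case.</step>"
    "<step audience='expert'>Check the jumper settings.</step>"
    "<step audience='all'>Close the case.</step>"
    "</procedure>"
)

def as_text(steps):
    """Render steps as a numbered plain-text list."""
    return "\n".join(f"{i}. {s.text}" for i, s in enumerate(steps, 1))

def as_html(steps):
    """Render the same steps as an HTML ordered list."""
    return "<ol>" + "".join(f"<li>{s.text}</li>" for s in steps) + "</ol>"

all_steps = list(source.iter("step"))
novice_steps = [s for s in all_steps if s.get("audience") == "all"]

print(as_text(novice_steps))
print(as_html(all_steps))
```

Levels three and four would make the same selection per user at request time, which this sketch does not attempt.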
Primary and Secondary Modules
Ament definition lists, glossaries, procedures, processes, topics.
(112) The first step in producing a single-sourced array of content is to identify the specific modules, both primary and secondary, that will constitute the larger document.
Information Product and Element Models
(114) Rockley uses the phrase “information product model” to describe the basic features of an organization's document.
(114) “Semantic information,” according to Rockley, is information that describes the “specific meaning” of information such as “website address” or “product description.”
Rockley semantic and generic information.
(115) For Rockley, “generic information” refers to descriptions of information that do not tell us anything about the content of the information like “semantic information” does. Rather, generic information tells us about the information's basic form.
Organizing Knowledge as Knowledge Managers
(118-119) The knowledge that a healthy organization possesses can be divided into four categories (Zack 25-7): Declarative . . . Procedural . . . Causal . . . Relational.
(119) In an attempt to bring together employees in large organizations with specific skill sets, Microsoft has employed a “knowledge map” (Davenport and Prusak 75-7).
(121) Malhotra (“Deciphering”) also asks professionals to recognize the value of tacit knowledge, human creativity, and imagination.
(121) We believe that if technical writers and other communication professionals take on the role of knowledge managers, they need to be mindful of the way explicit and tacit knowledge is identified and named within and between different divisions of organizations.
Knowledge Management and Information Science
(122) XML can then be used to directly support knowledge management practices, since we can think of knowledge as data with content.
(122) Information scientists and archivists have understood for some time that the best way to organize data and concepts in information storage and retrieval systems (ISARs) is by indexing and abstracting them.
(124) Moreover, breaking information down into its elementary parts and then asking themselves if they are in fact “objects” also challenges professionals to more critically frame their use of object-oriented languages such as XML (Price 71).
XDA three layer analysis.
(125) One method for using XML efficiently is to create an architecture that represents how an organization does business. To do this, knowledge managers need to see an XML Document Design Architecture (XDA) as something that is set up in three layers (Simon 130): a conceptual layer, a logical layer, and a physical layer.
(125) To develop a conceptual layer, knowledge managers need to acquire all of the documents that are currently in use throughout their organization.
(126) To understand the logical layer, knowledge managers need to determine the data that their documents commonly contain, and define how these pieces of information can be set in document data element types.
(126) The physical layer would consist of DTDs that the knowledge managers develop. These DTDs would designate which elements are to be used for each application and how they relate to one another.
4 The Visual Rhetoric of XML
Using CSS and XSL to Format and Display XML Projects
(131) CSS is a style sheet language that is compatible with both HTML and XML, while XSL is a family of style sheet and transformation languages used specifically and exclusively with XML data. These languages are important to understand because they allow one to separate metadata describing the data itself (or its content) and the shape that data should take (or its form) into two different logical files or locations.
(132) The semantic XML tags describe what and the CSS and XSL technologies describe how.
Rhetoric, Imagery, and XML
(133) At the very minimum, style is an element of electronic rhetoric that we cannot forget about as it influences everything from how corporate ethos is presented to how trust is established and maintained with online consumers.
(134) Knowledge managers must be aware that visual content is another powerful layer that can be used (or misused) to communicate with audiences and facilitate information exchange within an organization.
(134) What is particularly fascinating about the Scott and Vargas study is that even abstract images were found to communicate specific features and elicit certain emotional responses from participants.
(135) Another interesting rhetorical dimension of visual design is found when considering the cultural differences of one's audience.
Addressing Visual Complexity
Visual basis of authority learned from web usage; note change from early days of Web 1.0 comparing to attention to visual appearance of print materials (Drucker and McVarish).
(136) Our prior experiences with well-designed and visually appealing websites have trained us to be more willing to engage with and see authority in new online resources with attractive visual styles. For this reason, it is important that we consider some of the supportive visual technologies of XML.
(136) CSS are powerful textual documents used for specifying the layout and formatting of Internet documents. In contrast to XML, which describes only the data, CSS is used to describe how that data should appear when rendered in a Web browser.
(138-139) This class definition contains two CSS triplets that are each made up of a selector, property, and value. In fact, CSS documents are composed entirely of these triplets, which are collectively referred to as the “rules” of the document. Each rule specifies how a given unit of information, such as the color of the font, should be formatted upon encountering a selector of that type within the hypertext document.
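Illustration (mine): a Python sketch that splits a simple class definition into the selector, property, and value triplets the passage describes. It is a toy splitter for flat rules only (no comments or nested at-rules), and the .warning class is a hypothetical example rather than the one in the book.

```python
def triplets(css):
    """Split simple CSS rules into (selector, property, value) tuples."""
    result = []
    for block in css.split("}"):
        if "{" not in block:
            continue
        selector, body = block.split("{", 1)
        for declaration in body.split(";"):
            if ":" in declaration:
                prop, value = declaration.split(":", 1)
                result.append((selector.strip(), prop.strip(), value.strip()))
    return result

sheet = ".warning { color: red; font-weight: bold; }"
rules = triplets(sheet)
for rule in rules:
    print(rule)
```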
Color Codes and CSS Properties
What is the ontic status of RGB color: universal, anthropocentric, culturally specific? Will RGB and 32-bit color be arcane one day, like primitive art?
(142) The RGB color model used by HTML allows a designer to specify precise combinations of reds, greens, and blues in order to generate millions of different colors (assuming the person viewing your site has a 32-bit video card).
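Illustration (mine): a Python sketch of how a hexadecimal color code breaks into red, green, and blue values, and where the "millions of different colors" figure comes from when each channel has 8 bits. The sample color is arbitrary.

```python
def hex_to_rgb(code):
    """Convert a color such as '#FF8000' to a (red, green, blue) tuple."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))

orange = hex_to_rgb("#FF8000")
total_colors = 256 ** 3  # 256 levels for each of the three channels
print(orange, total_colors)
```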
(143) Ems are textual units that use the size of the surrounding text as a reference point to adjust the property applied to a selector.
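A short arithmetic sketch of how an em resolves against the surrounding text. The 16-pixel base size is an assumption chosen for illustration; the passage does not give one.

```python
def em_to_px(em_value, surrounding_px):
    """Resolve an em measurement against the size of the surrounding text."""
    return em_value * surrounding_px


body_px = 16.0                           # assumed base size, not from the source
heading_px = em_to_px(1.5, body_px)      # 1.5em inside 16px text
caption_px = em_to_px(0.5, heading_px)   # 0.5em inside that heading

print(heading_px, caption_px)
```

Because the reference point is always the surrounding text, nested em values compound rather than referring back to the base size.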
(147) This combinatory feature is what makes CSS so powerful. It is the cascading part of CSS.
(147) The important thing to remember is this: any original properties that are not redefined further downstream will remain in place when the bottom of the waterfall is reached. Only subsequent CSS properties applied later in the CSS definition will alter the appearance and layout of elements already imbued with formatting instructions. In this way, document-wide consistency can be maintained at the beginning of the document and further customization is made possible as sub-selectors, classes, and special elements pick up their customized and individualized instructions further down the waterfall, or further down the CSS hierarchy.
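The waterfall described here can be approximated as an ordered merge of property sets. This is a deliberately simplified sketch: real CSS also weighs selector specificity and style sheet origin, and the three rule sets below are hypothetical.

```python
def cascade(*rule_sets):
    """Merge property dictionaries in order; later rules override earlier ones."""
    computed = {}
    for rules in rule_sets:
        computed.update(rules)
    return computed


# Hypothetical rules, listed from the top of the waterfall to the bottom.
document_wide = {"font-family": "Georgia", "color": "black", "font-size": "1em"}
heading_class = {"color": "navy", "font-size": "2em"}
special_element = {"font-size": "3em"}

final_style = cascade(document_wide, heading_class, special_element)
print(final_style)
```

The font family is never redefined downstream, so it survives to the bottom, while color and font size take the last value applied to them.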
DIV and SPAN for Web 2.0 Applications
(149) DIV and SPAN elements are useful for designing “Web 2.0” XML applications that continuously refresh the browser in the background, without explicit instructions from the user.
(150) In order to use positioning in CSS, one must pair the position property with offset values for left, right, top, or bottom margins.
(153) If this video game enthusiast were to encode her video game collection using XML, she could develop a series of elements such as GAME, TITLE, DEVELOPER, and GENRE to better classify and organize her collection by building facets important to her own informational needs. She could then develop XML tags for these elements along with an external style sheet named “gamestyle.css” that she could gradually edit in order to influence how the information looked when it was retrieved from her XML database and displayed on her computer screen.
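A minimal sketch of the collection described above, built with Python's standard library rather than by hand. GAME, TITLE, DEVELOPER, GENRE, and gamestyle.css come from the passage; the COLLECTION root element and the sample values are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

# COLLECTION is a hypothetical root element; the passage names only its children.
collection = ET.Element("COLLECTION")
game = ET.SubElement(collection, "GAME")
ET.SubElement(game, "TITLE").text = "Example Quest"
ET.SubElement(game, "DEVELOPER").text = "Example Studio"
ET.SubElement(game, "GENRE").text = "Adventure"

body = ET.tostring(collection, encoding="unicode")

# The xml-stylesheet processing instruction links the data to the external style sheet.
document = '<?xml-stylesheet type="text/css" href="gamestyle.css"?>\n' + body
print(document)
```

Because the style sheet lives in a separate file, it can be edited gradually without touching the classified data.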
Gradual editing evolving data display by modifying style sheets after initial classification and organization by definition of XML tags; relate to McGann making intellectual discoveries through iterations of structure of archive.
(161) Several rhetorical questions can be generated from this side-by-side comparison [of visual presentations].
(162) Although using CSS with XML will work just fine, it is also useful to learn to use XSL as that technology was developed from the ground up by the W3C specifically to provide an accompanying style sheet language for XML.
(162) Unfortunately, the same process in XML is slightly more cumbersome, if only for the number of acronyms and abbreviations one must be familiar with in order to implement XML style sheets: XPath, XPointer, XSL, XSLT, and eXtensible Style Sheet Language Formatting Objects (XSL-FO) are just a few of the most important ones we discuss here.
(162) XPATH is a language that is used to access and describe certain parts of an XML document.
(162) XSL-FO is to XML what CSS is to HTML. XSL-FO . . . is now simply known as XSL.
(163) XSLT is perhaps the most powerful subset of XSL because it deals with transformations.
(164) We transform XML documents by creating an XSL document that searches our XML content for patterns and then applies a template to replace XML tags with HTML tags.
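The pattern-then-template idea can be sketched without an XSLT processor. The code below is not XSLT; it is a plain Python stand-in that matches element names and swaps them for HTML tags, and the tag mapping is a hypothetical template.

```python
import xml.etree.ElementTree as ET

source = ET.fromstring(
    "<GAME><TITLE>Example Quest</TITLE><GENRE>Adventure</GENRE></GAME>"
)

# Hypothetical template: which HTML tag replaces which XML tag.
template = {"GAME": "div", "TITLE": "h1", "GENRE": "p"}


def transform(element):
    """Return a new tree with each XML tag swapped for its HTML counterpart."""
    html = ET.Element(template.get(element.tag, "span"))
    html.text = element.text
    for child in element:
        html.append(transform(child))
    return html


html_output = ET.tostring(transform(source), encoding="unicode")
print(html_output)
```

A real XSL document expresses the same mapping declaratively, with XPath patterns selecting nodes and templates describing the output.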
Additional Online Resources
Advanced Concepts in XML
Namespaces, Schemas, XLink, XPath, XPointer, DITA, and DocBook
(174) DITA [Darwin Information Typing Architecture] and DocBook are used for authoring, organizing, and delivering structured technical information. They operate using DTDs or schema for validation and are useful in single sourcing applications and for building CMS.
Chunking information into discrete units using DITA and DocBook represents an alternative form of writing that requires developing appropriate rhetorical skills; needs to be distinguished from Bogost unit operations.
(174-175) DITA and DocBook also require authors to chunk information into discrete units; this practice helps to develop the types of rhetorical skills necessary for this alternative form of writing and the technical skills necessary for working in structured writing environments.
(175) Recognition and collision problems both contribute to a sense of rhetorical ambiguity in that they obfuscate meaning and complicate the process of working with combined documents.
Interesting contrast to Derridean play of ambiguities and collision problems that are avoided using namespaces, questions of dissemination for traveling XML documents, and involvement of working groups evolving specific RFC standards for imposing structural constraints on the language.
Without a means of linking a traveling XML document to its original starting point, the original context of meaning from which that document emerged is impossible to recognize.
(177) In the case of an ambiguous reference, the XML namespace definition defines the context in which a particular element or attribute exists.
(177-178) The value of this xmlns attribute will be a URI, which is defined by The Internet Society Network Working Group in the RFC 3986 standards document as a sequence of characters that identifies an online or physical resource.
(178) While these details are important to document experts working on Internet technologies or network protocols, most professionals will do well simply to remember that URI is a more general form of URL, or that a URL is a more specific instance of a URI that includes a mechanism for network location.
(178) The W3C Recommendation of August 16, 2006 specifies that XML namespaces are composed of the xmlns attribute, a namespace prefix, and a URI value.
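A small sketch of the three parts named here (the xmlns attribute, a prefix, and a URI value) resolving a collision between two elements that share the name "title". Both prefixes and both URIs are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Both URIs and both prefixes are hypothetical placeholders.
xml_text = """
<catalog xmlns:game="http://example.com/ns/games"
         xmlns:book="http://example.com/ns/books">
  <game:title>Example Quest</game:title>
  <book:title>Example Quest: The Novel</book:title>
</catalog>
"""

root = ET.fromstring(xml_text.strip())
prefixes = {
    "game": "http://example.com/ns/games",
    "book": "http://example.com/ns/books",
}

game_title = root.find("game:title", prefixes).text
book_title = root.find("book:title", prefixes).text
qualified_tag = root[0].tag  # the parser stores the URI, not the prefix

print(game_title)
print(book_title)
print(qualified_tag)
```

The parser expands each prefix to its URI, so the context of each element travels with the document even when the local names are identical.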
Ideally, machine readable documentation.
(179) Ideally, a URI would lead to a page with documentation describing the allowable actions a parser can take and the rules it must follow when executing a transformation on an XML database.
(179) The XML Schema Language, also known as the XML Schema Definition (XSD), provides a means for describing the structure and required contents for an XML document.
XSD uses the same XML syntax to describe data, a sort of reflexivity.
(180) Unlike DTDs, however, schemas are unique in that they are written in the same XML syntax used by the data they describe.
Simple and Complex Types
(180) XML schemas differentiate between complex types, which are elements that contain other elements, and simple types, which are elements that do not contain other elements.
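Following the definition quoted above (and setting aside attributes, which a full XSD treatment would also consider), the distinction can be sketched as a check for child elements. The sample element is hypothetical.

```python
import xml.etree.ElementTree as ET

game = ET.fromstring(
    "<GAME><TITLE>Example Quest</TITLE><GENRE>Adventure</GENRE></GAME>"
)


def type_kind(element):
    """Classify an element by whether it contains other elements."""
    return "complex" if len(element) > 0 else "simple"


kinds = {element.tag: type_kind(element) for element in game.iter()}
print(kinds)
```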
Complex and simple typing and enforcing sequencing imposes structural constraints on XML-based texts (DOMs), supporting or engendering OCHO hypothesis; also apparent that basic XML syntax is based on English, for example minOccurs.
(182) Following this same example, we next use the <xs:sequence> tag to specify the sequential sub-elements (or children) associated with that parent element. These are listed in the order in which they must appear in the XML document.
Enforcing Data Types
Validation using Schema
(185) Cardinality, or the number of times in which an element may occur, is enforced using the minOccurs and maxOccurs attributes.
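Python's standard library has no XSD validator, so the following is a hand-rolled sketch of the two constraints just described: children in a fixed sequence, each with a minimum and maximum count. The content model for GAME is a hypothetical example, and None stands in for maxOccurs="unbounded".

```python
import xml.etree.ElementTree as ET

# Hypothetical content model, in required order: (child tag, minOccurs, maxOccurs).
game_sequence = [("TITLE", 1, 1), ("DEVELOPER", 1, 1), ("GENRE", 0, None)]


def follows_sequence(element, sequence):
    """Check child order and cardinality against a sequence-style model."""
    tags = [child.tag for child in element]
    position = 0
    for tag, min_occurs, max_occurs in sequence:
        count = 0
        while position < len(tags) and tags[position] == tag:
            count += 1
            position += 1
        if count < min_occurs:
            return False
        if max_occurs is not None and count > max_occurs:
            return False
    return position == len(tags)  # no unexpected children left over


valid = ET.fromstring("<GAME><TITLE>A</TITLE><DEVELOPER>B</DEVELOPER></GAME>")
wrong_order = ET.fromstring("<GAME><DEVELOPER>B</DEVELOPER><TITLE>A</TITLE></GAME>")
two_titles = ET.fromstring(
    "<GAME><TITLE>A</TITLE><TITLE>A2</TITLE><DEVELOPER>B</DEVELOPER></GAME>"
)

print(follows_sequence(valid, game_sequence))
print(follows_sequence(wrong_order, game_sequence))
print(follows_sequence(two_titles, game_sequence))
```

Only the first document passes: the second breaks the required order and the third exceeds the maximum count for TITLE.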
(186) Another powerful feature of XML schemas is found in their ability to inherit from primitive data types (such as strings) and then extend these data types into new forms by adding restrictions or additional parameters.
Schema transformation via XSL represents another inroad for machine cognition into textual tasks performed by humans.
(189) Another useful side benefit of using a native XML format is that XSL transformations can now be applied to schema, making it easy to transform one schema into another or even to display the schema using HTML elements such as tables or lists.
(188) XPath refers to the elements and attributes within an XML file as nodes. . . . In the XPath data model (XDM), there are seven different kinds of nodes that can be used to classify data. These seven types are called document, element, attribute, comment, namespace, text, and processing nodes.
(188) In addition to nodes, XPath uses components known as axis specifiers to specify the direction in which an XML document should be traversed.
(189) This representation of XML elements in a hierarchical form is also known as the Document Object Model, or DOM.
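A brief sketch of walking such a hierarchy with path expressions. Python's ElementTree supports only a limited subset of XPath, which is enough to show child steps, descendant traversal, and an attribute test; the sample collection is hypothetical.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<COLLECTION>"
    "<GAME genre='rpg'><TITLE>Example Quest</TITLE></GAME>"
    "<GAME genre='puzzle'><TITLE>Example Blocks</TITLE></GAME>"
    "</COLLECTION>"
)

# ".//" searches all descendants; "/" steps down to direct children.
all_titles = [node.text for node in root.findall(".//TITLE")]

# A predicate in square brackets filters element nodes by an attribute node.
rpg_titles = [node.text for node in root.findall("./GAME[@genre='rpg']/TITLE")]

genres = [game.get("genre") for game in root.findall("GAME")]

print(all_titles)
print(rpg_titles)
print(genres)
```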
Linking in XML
(191) The XML Linking Language, or XLink, is a set of standards that defines rules for linking XML elements. . . . This ability to link to unforeseen document media types as well as to an enormous amount of existing textual documents is what makes hypertext so powerful.
Linking in HTML
Relative and Absolute Linking
Linking Images and Multimedia
(194) In addition to the simple links supported by HTML, XLink supports extended links for tying multiple resources together.
(195) Unfortunately, due to some political problems between XML developers and XML enthusiasts and problems with backwards compatibility, XLink has not made as much progress as other associated XML technologies like XSD and XPath.
(195) The XML Pointer Language, or XPointer, is a notation system for XML that is even more specific than XLink. It allows one to access nodes that may be buried deep within XML databases, using an addressing system with precise syntax.
(197) XPointer is the equivalent of a named anchor, but for XML documents.
XLink and XPointer still nascent technology due to cultural, political, and legal issues, including a Sun patent; compare to Engelbart hyperscope.
(200) A 2001 article by Leigh Dodds suggests that a Sun Microsystems patent is partly to blame for the slow implementation of XPointer (online). This legal issue is an excellent example of the complex rhetorical space constantly being negotiated between open source web technologies and large technical corporations.
DocBook and DITA
(201) Perhaps most importantly, both technologies have contributed to the progress of structured writing as a viable communication strategy in software documentation and technical writing.
(202) One of the primary benefits of using DocBook or DITA is that you can take advantage of standardized tags at a more general level while still having some amount of flexibility and customization available at a more specific level. Often this is accomplished through inheritance relationships.
(203) The DITA is an XML-based architecture named in part after the famous natural scientist Charles Darwin. Darwin used many classification techniques over his long career as a scientist and author of biological and geological texts. DITA, developed by IBM in early 2001 and originally introduced as a series of technical articles, is used for writing and managing information using a predefined set of information structures that are broken down into topic types named tasks, concepts, and references. . . . Specialization is DITA's answer to the process of inheritance as it allows one to inherit base elements and then specify new elements according to particular informational needs.
(203) An important feature of DITA is the standardization of XML element names.
(203) Elements within DITA topics are further specialized into three different types: concepts, tasks, and references. In each of these data structures, the root element (formerly named topic) is renamed to concept, task, or reference.
(208) Once a collection of DITA topics has been authored, it is arranged using a DITA map. DITA maps contain nested lists of topicrefs, which are links to DITA topics.
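The nested-list structure of a map can be sketched by reading topicref links recursively. The map and topicref element names and the href attribute follow DITA; the three file names are hypothetical.

```python
import xml.etree.ElementTree as ET

# File names are hypothetical placeholders.
ditamap = ET.fromstring("""<map>
  <topicref href="installing.dita">
    <topicref href="requirements.dita"/>
  </topicref>
  <topicref href="troubleshooting.dita"/>
</map>""")


def outline(element, depth=0):
    """Collect (nesting depth, linked topic) pairs in document order."""
    entries = []
    for ref in element.findall("topicref"):
        entries.append((depth, ref.get("href")))
        entries.extend(outline(ref, depth + 1))
    return entries


toc = outline(ditamap)
for depth, href in toc:
    print("  " * depth + href)
```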
(208) DocBook is a validation tool (available in both DTD and schema format) maintained by the Organization for the Advancement of Structured Information Standards (OASIS), a not for profit consortium that is involved with advancing several sets of standards related to XML and XML languages.
(209) DocBook uses elements and organizational strategies derived from printed text, so it is a popular tool for authoring books or documentation projects with complex content (such as software language manuals or computer hardware reference books).
(209) The full specification of DocBook can be overwhelming for beginners. . . . This version, known as the Simplified DocBook Document Type, contains a smaller number of elements and was originally designed to have the same number of tags and the same expressive power as HTML (OASIS online).
(210) Like DITA, DocBook has numerous XSLT style sheets already created for it, so translating DocBook documents into other formats such as XHTML or PDF is common practice.
Machine transformations by XSLT connect familiar human textual practices with automation and computer programming, representing a point at which software takes command of language in a very literal sense by replacing pattern matching and transformation operations done by humans in the textual production process, a parallel to the original takeover of basic arithmetic operations by the first nonhuman computers. Example is schema transformation via XSL.
(211) By using such tools as namespaces, schema, XLink, XPath, and XPointer, we are giving computers the same tools for recognition, seeking, searching, and verification that we ourselves use to evaluate the credibility and accessibility of our information sources.
Strategy and justification of programming custom parsers as tutor texts.
(211) To explore the concepts behind XML processing, we will build some basic XML applications using a custom parser written in an open source Internet scripting language. Though this process is more cumbersome than using a prepackaged XML parser, a standard Web browser, or existing validating architectures, such as DocBook or DITA, it ultimately gives the designer even more control and flexibility when using XML for a specific purpose.
Additional Online Resources
Using PHP to Design Custom Parsers for XML Projects
(215) This chapter discusses XML parsers and then introduces three examples of XML parsers that can be used to process and act upon particular XML elements and attributes as they are scanned from a file. The first and simplest example uses existing XML files—structured as Really Simple Syndication (RSS) newsfeeds—as source content for a basic news display page that is used to update website visitors about news or events in a streaming fashion.
(215) When we discuss our second parser example, we outline a process for creating a CMS for keeping track of digital assets.
(215) The final parser example involves building a single sourcing system for a software documentation project. . . . this is an example of what Ann Rockley calls a level three, or “Dynamic Customized Content” single sourcing system.
(215-216) In order to create a useful XML parser, we must understand the general problem space as well as the rhetorical implications of our informational decisions. In addition, to truly understand the informational context of a metadata system, we must work on both sides of the equation: as information designers, producers, or information architects; and as hypothetical consumers or end users of the system.
Cookbook approach embracing dual scope of producer and consumer involves substantial working code; different type of digital literacy beyond reading code is writing code for machines, for example XML parsers. Admits challenge of reading code, offers commented version in appendices. Thus examples are designed to require only modification of a few variables to extend to other types of projects.
(216) Like the aesthetic dimension we discussed in Chapter 4, the programmatic demands of building custom parsers often require a different type of thinking than we are used to. For instance, since machines are now an audience we must serve, we need to figure out how to write XML data rather than just how to read it.
(216-217) While we believe that studying these coded examples will provide much insight into the computational side of the interactive cycle between human users and XML databases, we also want to point out that the coded examples in this chapter will be challenging for readers without a background in computer programming. . . . Rather than reducing the complexity of these examples, we chose to include them, and also to provide significant commentary in areas which might be difficult for some readers to understand. The unabridged code listings in Appendices C and D are also heavily commented (marked up with additional explanatory text) throughout. . . . Our intent is for this chapter to serve as a cookbook of sorts for the theorist-practitioner or symbolic-analyst who wishes to take advantage of XML techniques in order to advance and improve knowledge management techniques in his or her own organization.
(217) We choose to discuss examples of custom-defined parsers in order to demonstrate the flexibility and customization possible when one builds their own non-validating or validating XML parser.
(217-218) The parser is vitally important for an information designer to understand because this alone determines what the end users, readers, or audiences actually see based on their interface decisions. . . . As Michael Albers notes in his introduction to the edited collection Content and Complexity, the best interface is the one that disappears, leaving the information clearly defined without distractions from the interface (6).
Comparing parser to ancient Greek rhetor, which means that sensitivity must be built into the design.
(218-219) By specifying how, when, and under what circumstances data can be extracted from elements and attributes, the parser is analogous to the ancient Greek rhetor. . . . Parsers specify the expressive and rhetorical potential of XML documents. More robust and flexible parsers have more rhetorical potential.
(219) We include these three different strategies in order to demonstrate the unique pairings that emerge when different applied technologies are paired with different rhetorical perspectives.
Traditional contractive versus process intensive communication on both sender and receiver roles; meaningful examples of Derrida comparison of good and bad writing as theme of Phaedrus.
(219-220) Though it is overly reductive and simplifies the complex social nature of information, the traditional communicative paradigm is a useful construct for understanding the role of the parser in a rhetorical act. The model described here uses a “contractive” view of technology wherein an information “receiver” is seen in a relatively passive role and information itself is chunked into discrete and unambiguous units. When communicating using XML, an information “sender” is responsible for thinking carefully and logically about how to structure data in a fashion that facilitates the extraction of useful information from a data source. The message itself then resides as potential within the XML document that this sender creates.
(220) Because of its affordability, availability, relative ease of use, support for XML, and popularity, we chose to use the server-side Internet embedded scripting language known as PHP for the three project exercises in this chapter.
(222) The installation program we will be using to set up our development computer is called XAMPP, which is an integrated installation package developed by a group of individuals calling themselves the Apache Friends Network.
(227) PHP is an Internet-embedded scripting language. PHP evolved from a language known as PHP-FI, which was written by Rasmus Lerdorf in 1995 (Php.net online). PHP-FI originally stood for Personal Home Page Tools/Forms Interpreter.
(227-228) As of PHP version 5, the SimpleXML application programming interface (API) was introduced. SimpleXML makes formerly tedious XML tasks much easier to accomplish, and, as a result, we will use this API for the first XML parser we build.
Strategy for incorporating substantial amount of working code in a humanities oriented text is judicious choice of PHP and extensible sample code.
(228) We have designed the examples to be modular and portable; they should only require the modification of a few variables and XML data sources in order to be implemented and extended for additional types of projects. For this reason, we do not spend a great deal of time discussing the programming syntax of PHP.
(229) With up to four (or more) languages residing in a single document, there needs to be some mechanism for differentiating between them so that the Apache software can properly process the document files. This is why delimiters are so important.
(229) One must be very careful with semicolons as missing semicolons are the source of much frustration for beginning programmers.
(230) Programmers do this because spaces are not allowed in PHP variable names, but multiword variable names are more descriptive and easier to understand.
(230) Since PHP uses what is called dynamic typing, the types of data that are assigned to specific variables are determined by the symbolic composition of the data themselves.
Arrays and Loops
(230) In PHP, you can mix data of different types and store them all in the same array.
Functions, Arguments, and
(233) Variable scope, or the areas of a script in which variables can be “seen” and accessed, is an important concept to understand when thinking about functions and variables.
Functions and Default Arguments
Calling Functions with Default Arguments
PHP and XML
SimpleXML and Object-Oriented
(238) SimpleXML is a popular extension to PHP which provides a toolkit for mapping XML documents into objects that can be directly manipulated by the PHP scripting language. . . . Objects are particularly handy for us as document designers because, as we mentioned in Chapter 3, it has been predicted that we will be moving from document-centered to object-oriented ways of thinking about information and information design (Williams 321).
(241) Regardless of the level of technical difficulty, each project requires a similar amount of rhetorical consideration in both the preproduction (planning) phase as well as in the postproduction (revision and fine tuning) stage.
Project 1: RSS Parser
Ad hoc rhetorical approach for first project using a questionnaire form and personas to answer them for imaginary information context.
(241-242) While it can be useful to begin this rhetorical inquiry
from a particular perspective, perhaps by considering the classical
rhetorical canons or using a rhetorician's theoretical model as a
starting point, it can also be advantageous to simply take a step
back and consider the informational context based on one's prior
(242) A general list of rhetorical considerations is outlined in the Ad Hoc Rhetorical Analysis of XML (RAX) form included in Appendix B.
(242) For this project, we are going to create personas to help us visualize an imaginary information context. Personas are fictitious characters that we can create in order to help us visualize the demographic characteristics and informational needs of a typical user. In this exercise, we will create personas for both the designer and for his audience.
(243) Based on Joe's responses, we can extract the following four design parameters from this rhetorical analysis exercise.
(244) In terms of technical implementations, the first thing we will do for this project is build an XML file that can be used for testing.
(245) Though this is a perfectly acceptable XML file, it does not yet meet the schema requirements of a valid RSS document. RSS is a particular type of XML file that uses a collection of specialized tags to present information in a standardized format.
(246) Next, we need to build the parser in PHP.
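The book builds this parser in PHP with SimpleXML. As a rough sketch of the same idea in Python, the code below reads a minimal RSS 2.0 feed and pulls out each news item. The feed contents and URLs are hypothetical, and this is not the book's listing.

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical RSS 2.0 feed used only for testing the parser.
rss_text = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://example.com/</link>
    <description>Hypothetical site updates</description>
    <item>
      <title>First update</title>
      <link>http://example.com/news/1</link>
      <description>Placeholder text for the first item.</description>
    </item>
    <item>
      <title>Second update</title>
      <link>http://example.com/news/2</link>
      <description>Placeholder text for the second item.</description>
    </item>
  </channel>
</rss>"""


def parse_feed(text):
    """Return one dictionary per item element in the feed's channel."""
    channel = ET.fromstring(text).find("channel")
    return [
        {
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "description": item.findtext("description"),
        }
        for item in channel.findall("item")
    ]


news = parse_feed(rss_text)
headlines = [entry["title"] for entry in news]
print(headlines)
```

A display page would loop over the returned items and wrap each one in whatever HTML the design calls for.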
(251) The first component that is necessary to add XML writing capability is an HTML form, which will be composed of text input fields and a mechanism for sending data from those fields to a processing page.
(252) After the HTML form is designed, we need to build a script that will take data from the form's text input fields and store these values as variables.
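The step from form fields to stored XML can be sketched as follows. This is an illustration in Python rather than the book's PHP script; the field names and submitted values are hypothetical, and a dictionary stands in for the data a real form handler would receive.

```python
import xml.etree.ElementTree as ET

# Hypothetical field names, matching the child elements of an RSS item.
FIELDS = ("title", "link", "description")


def append_item(channel, form_values):
    """Store submitted form values as a new item element under the channel."""
    item = ET.SubElement(channel, "item")
    for field in FIELDS:
        # Missing fields become empty elements; stray whitespace is trimmed.
        ET.SubElement(item, field).text = form_values.get(field, "").strip()
    return item


channel = ET.Element("channel")
submitted = {"title": "  New release  ", "link": "http://example.com/news/3"}
append_item(channel, submitted)

xml_out = ET.tostring(channel, encoding="unicode")
print(xml_out)
```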
(253-254) If one uses the POST method, these variables are not appended to the URL and are instead passed to the script behind the scenes. GET is handy when the inner page of a website needs to be bookmarked for later use, or when the variables being passed to an XML parser should be transparent. . . . POST is useful when the page should not be able to be bookmarked, or when large amounts of data need to be passed from a form to a script or parser.
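The difference is easy to see by encoding the same variables both ways. The script name and field values are hypothetical; the sketch only builds the request pieces and does not send anything.

```python
from urllib.parse import urlencode

# Hypothetical form fields and script name.
form_fields = {"title": "First update", "author": "Joe"}
encoded = urlencode(form_fields)

# GET: the variables are appended to the URL, so the page can be bookmarked.
get_url = "http://example.com/write_xml.php?" + encoded

# POST: the URL stays clean and the same variables travel in the request body.
post_url = "http://example.com/write_xml.php"
post_body = encoded.encode("ascii")

print(get_url)
print(post_url, post_body)
```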
(256) Unfortunately, there is no easy way to preserve formatting using the SimpleXML API.
(258-259) Andreas Mauthe and Peter Thomas describe a CMS as one that bundles both the essence of the data, or the raw data itself, and the metadata that describes that content (4-5). . . . Oftentimes, CMSs will combine human knowledge with technological support systems.
(259) The particular type of CMS we will build for this project is a digital asset management system.
(259-260) This system can be implemented in a distributed corporate setting. Rather than storing files separately on employee computers, individuals working on various projects can use such a system to store all assets in a centralized location with meaningful metadata to facilitate location and retrieval. The metadata can also be used to allow various groups within the organization to recognize the different types of information produced by each group and to better understand ways in which information can be exchanged between units.
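A minimal sketch of metadata-driven retrieval from a centralized store. The element name, attribute names, and sample assets are all assumptions made for illustration; the notes do not record the book's actual asset schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical asset records; attribute names are illustrative only.
assets = ET.fromstring("""<assets>
  <asset file="logo.png" group="design" type="image"/>
  <asset file="manual.doc" group="documentation" type="text"/>
  <asset file="banner.png" group="marketing" type="image"/>
</assets>""")


def find_assets(root, **metadata):
    """Return the file names of assets whose metadata matches every criterion."""
    return [
        asset.get("file")
        for asset in root.findall("asset")
        if all(asset.get(key) == value for key, value in metadata.items())
    ]


images = find_assets(assets, type="image")
design_images = find_assets(assets, type="image", group="design")
print(images)
print(design_images)
```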
Rhetorical analysis in second project using Carliner physical, cognitive, affective information design framework.
For the purposes of building a CMS, we can apply a model that has
been specifically developed for information design. Saul Carliner's
physical, cognitive, and affective framework is well-known for
breaking information design problems down into three
(261) Carliner's framework is useful here because it can be used in a generative fashion to produce rhetorical questions related to the ways in which users physically interact with, think about, and feel about information. These questions can then be used as guides to help a designer make informed decisions about how the XML parser will function.
Physical Design Dimension
(261) Since the physical dimension is concerned with characteristics such as page layout and design, this dimension is mostly focused on the ways in which the user will move through the asset management system.
Cognitive Design Dimension
(262) Here, paying careful attention to best practices from usability research will help us to create a more intuitive and less cognitively demanding CMS.
(263) First, designers must analyze needs. . . . Our use case diagram therefore provides a detailed and unambiguous sketch of which tasks are likely to be performed by which users.
(263-264) These goals can be business or content related and should include an evaluation component to ensure that these goals are being met through the information system.
(264) After goals have been set, we must choose the form of our communication project. . . . Although genre is something traditionally associated with literary canons or stylistic conventions, we can also have genres associated with locations and products.
(265) The next step in the cognitive design process is to prepare the design of our communication project. . . . For the CMS, we will devise an information map that shows the relationship between different files and the audiences that will need to use these files.
(265) To prepare our information map, we should consider our audiences.
(265) Finally, we must set project and product guidelines. Carliner notes that product guidelines include editorial guidelines, production specifications, and technical specifications, while project guidelines include questions of schedule, budget, and staff (51).
Affective Design Dimension
(266) Affective design elements deal with issues such as motivation, attention, and satisfaction. In other words, even if the information is available and accessible, will users feel like using it?
(266) In order to address these affective issues, we should design our interface to provide clear instructions. . . . Additionally, when our parser eventually sorts through our XML file and creates a navigable list of assets, it should clearly demonstrate the value added by having such a content repository.
(267) For real world applications of this system, graphic designers need to be involved early in the design process.
Preproduction Design Tasks
(267) Since we are building a working CMS to store typical production assets, we need to move from an abstract and conceptual idea of our CMS to a more applied blueprint by using the information from our rhetorical analysis. We will use this information to construct a rudimentary Web form that allows us to gather data from our production archivists.
(267) As in the prior example, we can fashion a rough algorithm to help us with the sequencing and design of our XML writer and parser.
(268) Now that we have precisely defined the steps needed for the storage and retrieval of assets, we can concentrate on additional preproduction tasks by specifying how various pages will be constructed and making production decisions concerning our file and directory structures. Here we will combine guidelines for the physical dimension with the cognitive dimension by crafting a table which specifies parameters for the project. . . . These files are listed in roughly the same order in which they will be presented by first the XML writer and then the XML parser.
(268) Based on this table, we see that we will have a total of eleven files, three of which are dynamically generated. We can now add logic to these files by writing our scripts in PHP.
Building the Interface
(272) The idea is to construct a running buffer file that can then be accessed incrementally when it is time to create the final XML document.
(277) After it is created, this XML file can then be directly transformed using an XSL transformation (Figure 6.21) or parsed using our more robust CMS parser (Figure 6.22).
(277) In fact, it is useful to know that much of the hard work involved with building a customized parser can be minimized by using an XSL transformation as we do here. . . . The primary advantage of using our own custom parser in this instance is that we can take advantage of PHP's built-in functions to perform additional error checking and validation on our data.
Project 3: Single Sourcing System
(280) [Joe D.] Williams writes that single sourcing is “using a single document source to generate multiple types of document outputs; workflows for creating multiple outputs from a document or database” (321).
(281) Like Rockley, Locke Carter agrees that single sourcing can provide benefits in terms of cutting costs, boosting revenue, creating more efficient means of production and distribution, and adding flexibility to the document design process.
Single sourcing may disrupt traditional craftsman process of earlier media practices as noted in third project, whose bottom-up rhetorical approach seems system-centric rather than task-oriented design.
(281) On the other hand, Carter also warns that document designers
must be cautious of single sourcing technology because this process
disrupts the traditional craftsman process of designing documents
individually, for a specific context and audience, from start to
(281) XML makes it simple to precisely define which modules of data can be reused across documents and how that data should appear in each individual document. Our primary goal in single sourcing is to write our source content once, then provide access to different configurations of this content using customized views.
(284) In a bottom-up approach, we start with the data rather than the predicted informational needs of our audience. This type of rhetorical analysis is more concerned with finding the appropriate level of granularity with which to surround a unit of text and finding the means of combining and repurposing these textual units in a manner that is compatible with our informational needs.
(293) Note how the informational needs of a beginning user are different from those of an advanced user and how the system attempts to anticipate and meet the needs of both users according to our bottom-up rhetorical analysis.
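Writing content once and serving customized views can be sketched with a single source file whose modules are tagged by audience. The module element, the audience attribute, and the sample text are hypothetical, and the code is Python rather than the book's PHP.

```python
import xml.etree.ElementTree as ET

# One source document; each module is tagged with a hypothetical audience value.
source = ET.fromstring("""<manual>
  <module audience="all">Start the program from the desktop icon.</module>
  <module audience="beginner">A wizard will guide you through setup.</module>
  <module audience="advanced">Edit the configuration file directly.</module>
</manual>""")


def view_for(root, audience):
    """Select the shared modules plus those written for one audience."""
    return [
        module.text
        for module in root.findall("module")
        if module.get("audience") in ("all", audience)
    ]


beginner_view = view_for(source, "beginner")
advanced_view = view_for(source, "advanced")
print(beginner_view)
print(advanced_view)
```

The shared module appears in both views, which is the reuse that single sourcing is meant to provide, while the granularity of each module reflects the bottom-up analysis.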
(293-294) By building these custom parsers, we were able to put to use both the ideas about knowledge management and rhetoric we have been discussing in the first half of the book as well as some of the technical skills we discussed in the second half. . . . Though we considered both top-down rhetorical strategies (Projects 1 and 2) and a bottom-up rhetorical strategy (Project 3), in reality, most designers will find that the best approach will incorporate both of these strategies into the design process.
(294) The theorist-practitioner model was stressed throughout our three examples because it is so important to the professional communicator working with XML technologies. On the theoretical side, one must recognize that the humanistic elements of information management and design are often overlooked for the sake of technical efficiency or simplicity. . . . Humans are emotional, fallible, and oftentimes unpredictable, and our software programs need to be designed to take these factors into account.
The theorist-practitioner model combines technical and humanities competencies, with an emphasis on leveraging custom code to explore and meet overall requirements derived from rhetorical analysis.
On the practitioner side, we need to recognize that by relying on pre-existing parsers, our creative potential and expressive capacities are limited by the designs of other companies or other . . . by immersing ourselves in the low-level programming of XML parsers can we truly design an interactive system for dealing with XML code in exactly the way we want.
(294-295) they also must pay attention to the larger rhetorical context of the information transfer process. . . . The document designer or information architect with skills in audience analysis and other forms of rhetorical acumen will inevitably find themselves in greater demand as new Internet technologies increasingly push us closer to Berners-Lee's vision of a distributed, yet integrated, Semantic Web.
Sample Group Project: Hi-Tek Inc.
Additional Online Resources
7 XML and Your Career
XML and Knowledge Management at Work in Interdisciplinary Contexts
XML and Your Career
Technical communicator, technical editor, digital media practitioner, library scientist, and interdisciplinary professional or researcher are professions in which XML is likely to be important.
(304) We interviewed a college professor, two technical communicators, a software engineer, and one individual who is both a professor and a software developer for a large video game development company.
(304) The ten interview questions asked respondents to reply with information about ways in which they have personally used XML to solve problems for particular types of tasks.
Source Code for CMS
Source code does not contain any copyright or license declaration, even in the final copyright credits section.
Source Code for Single Sourcing Demonstration
Applen, J. D., and Rudy McDaniel. The Rhetorical Nature of XML: Constructing Knowledge in Networked Environments. New York: Routledge, 2009. Print.