David Ness
Mind / Matter

Text: De- and Re-Construction

By David Ness
Tuesday, February 19, 2002

Composing Documents

To an increasing extent, documents these days may well be presented in different kinds of media. They are also made up of a number of different kinds of components. Of these the most important type is probably text. Much of the text that we write these days appears, perhaps in somewhat altered states, in these various forms.

An Example

Different pieces of technology may well come into play in order to produce these documents. For example, if a document is to be presented on a screen, then it is quite likely that we need to produce our text surrounded by HTML so that it can be displayed via a browser. On the other hand, if we want to prepare the text to be printed on a PostScript printer, then we might use TeX to handle the job.


This causes some problems. While we will consider these problems in detail elsewhere, we can mention some here to provide context. A simple example is the representation of bold or italic text. In HTML, text is marked by begin and end markers. For example, Bold text may appear as <b>Bold</b> when you are marking it up for HTML display, while it might look like {\bf Bold} if it is being marked up for TeX processing into PostScript.

Relationship to MarkUp

So there is a problem with how markup is represented. There can even be a problem with how the scope of the markup is delimited. This was also demonstrated in our very simple example. Markup in HTML tends to be delimited by begin and end tags that are similar, except that the end marker generally has a slash in addition to the string that denoted the beginning. In TeX, on the other hand, most markup is delimited by curly braces.
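The difference in delimiters can be made concrete with a small sketch. It assumes a neutral, target-independent representation of styled text (the `Span` class and the `DELIMITERS` table are hypothetical names invented for this illustration, and the italic entries are an assumed extension of the bold example) and renders the same content for either target. It does not escape characters that are special in HTML or TeX.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """A run of text with one style (hypothetical neutral representation)."""
    text: str
    style: str = "plain"  # "plain", "bold", or "italic"


# Each target wraps a styled span in its own opening and closing delimiters.
DELIMITERS = {
    "html": {"bold": ("<b>", "</b>"), "italic": ("<i>", "</i>")},
    "tex": {"bold": ("{\\bf ", "}"), "italic": ("{\\it ", "}")},
}


def render(spans, target):
    """Concatenate spans, wrapping each styled one for the given target."""
    pieces = []
    for span in spans:
        # Plain spans (or unknown styles) get no delimiters at all.
        opener, closer = DELIMITERS[target].get(span.style, ("", ""))
        pieces.append(opener + span.text + closer)
    return "".join(pieces)


sentence = [Span("This is "), Span("Bold", "bold"), Span(" text.")]
print(render(sentence, "html"))  # This is <b>Bold</b> text.
print(render(sentence, "tex"))   # This is {\bf Bold} text.
```

Keeping the delimiters in a table, rather than in the text itself, is one way to let the same text serve several display targets.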

Most of the discussion of how markup can be handled is left for other places, however. Here, all we need to do is to recognize that we need to be able to handle the markup that occurs in the text that we deconstruct and reconstruct.
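As a rough illustration of handling markup during deconstruction, the sketch below recognizes the bold example in either its HTML or its TeX form and reduces both to the same neutral list of (style, text) pairs. The function name and the two patterns are assumptions made for this example; the patterns handle only simple, un-nested bold spans and are not a general parser for either language.

```python
import re

# Simple patterns for un-nested bold spans (assumed forms, not full parsers).
HTML_BOLD = re.compile(r"<b>(.*?)</b>", re.DOTALL)
TEX_BOLD = re.compile(r"\{\\bf\s+(.*?)\}", re.DOTALL)


def deconstruct_bold(text, pattern):
    """Split text into (style, text) pairs using one bold pattern."""
    pairs = []
    position = 0
    for match in pattern.finditer(text):
        # Anything between the previous match and this one is plain text.
        if match.start() > position:
            pairs.append(("plain", text[position:match.start()]))
        pairs.append(("bold", match.group(1)))
        position = match.end()
    # Trailing plain text after the last bold span.
    if position < len(text):
        pairs.append(("plain", text[position:]))
    return pairs


from_html = deconstruct_bold("This is <b>Bold</b> text.", HTML_BOLD)
from_tex = deconstruct_bold("This is {\\bf Bold} text.", TEX_BOLD)
print(from_html == from_tex)  # True: both reduce to the same neutral form
```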

Categories of Purpose

There are several different categories of purpose for text de-/re-construction. The most obvious, already described above, is that of display. But, in addition, there is the problem of managing text in order to edit it. There are also other processes through which we might want to pass our text. Let's look at some of these.


The display problem has been discussed to some extent above. Markup in TeX differs markedly from the markup for HTML display. There are also lots of other forms of display, and each of them may have its own driver language. So there are many possible construction/deconstruction problems here.


Editing is another context. Here the technology that is used can vary all the way from rather elaborate "What you see is what you get" editors down to very straightforward ASCII-based technology which allows the very direct and simple manipulation of the text in the text bases.

Other Processes

There are other processes which we might want to run on our text as well. One example might be a spell checker. Or a concordance builder. Or a program which will scan the text and automatically add information (looking up and adding telephone numbers might be an easy one to picture).
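To give a feel for how small such a process can be, here is a minimal sketch of a concordance builder. It assumes a concordance is simply a map from each word to the line numbers on which it appears, and it uses a deliberately crude definition of a word (runs of letters and apostrophes); the function name is invented for this example.

```python
import re
from collections import defaultdict


def build_concordance(text):
    """Map each lowercased word to the sorted line numbers where it occurs."""
    index = defaultdict(set)
    for line_number, line in enumerate(text.splitlines(), start=1):
        # Crude tokenizer: a word is a run of letters or apostrophes.
        for word in re.findall(r"[A-Za-z']+", line):
            index[word.lower()].add(line_number)
    return {word: sorted(lines) for word, lines in sorted(index.items())}


sample = (
    "Text is composed into documents.\n"
    "Documents are deconstructed into text."
)
concordance = build_concordance(sample)
print(concordance["text"])      # [1, 2]
print(concordance["composed"])  # [1]
```

A process like this wants plain text as input, which is one reason to strip display markup before running it.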

Elements of Text Composition

The task of composing documents from text is essentially one of pulling the text into a structure that is appropriate for the document. How this task is performed depends on the purpose for which we are constructing the document.

The simplest of these structures is probably the one needed to prepare text for ASCII editing. This is particularly simple because, for the most part, we do not need to perform any of the more elaborate transformations that might be needed by the more complicated problems.

Rules for Construction

Let's look at what kind of technology we can build to handle the simplest of these problems, that of preparing the text for ASCII editing. This is a particularly nice problem to look at because it is the most straightforward of the problems, and it doesn't require that we spend much time setting things up in an elaborate fashion.

ASCII editing generally requires that our text be constructed into a simple linear sequence, and the only substantial complication has to do with how the hierarchical relationship of the text should be represented so that we can operate on it in effective and efficient ways. For the most part this is a pretty simple problem.
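One possible way to represent the hierarchy in a linear sequence is indentation, sketched below. This is only an assumption about how it might be done, not a description of the approach used here: a tree of (title, children) pairs is flattened into indented lines for editing and then rebuilt from those lines. The function names and the two-space indent are invented for the example, and titles are assumed not to begin with a space.

```python
def linearize(node, depth=0, indent="  "):
    """Flatten a (title, children) tree into indented lines, depth first."""
    title, children = node
    lines = [indent * depth + title]
    for child in children:
        lines.extend(linearize(child, depth + 1, indent))
    return lines


def rebuild(lines, indent="  "):
    """Recover the (title, children) tree from lines made by linearize."""
    root = None
    stack = []  # stack[d] is the most recent node seen at depth d
    for line in lines:
        stripped = line.lstrip(" ")
        depth = (len(line) - len(stripped)) // len(indent)
        node = (stripped, [])
        if depth == 0:
            root = node
        else:
            # Attach to the most recent node one level up.
            stack[depth - 1][1].append(node)
        # Forget deeper nodes from earlier branches, then record this one.
        del stack[depth:]
        stack.append(node)
    return root


document = (
    "Composing Documents",
    [
        ("An Example", []),
        ("Categories of Purpose", [("Other Processes", [])]),
    ],
)
lines = linearize(document)
print("\n".join(lines))
print(rebuild(lines) == document)  # True: the hierarchy survives the round trip
```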

Drop-in Blocks

Structure and Glue

Construction Technology

Rules for Deconstruction

Structure Stripping

Recognizing Similarities

Signature / Privacy / Encryption Issues

PGP for Signature / Encryption

PGP provides us with some effective technology that allows us to support both signature and encryption. This technology has two particularly nice characteristics: it is mostly open-source and it is generally free. It is also supported by a rather substantial pre-existing infrastructure that we can make use of without much trouble.

Preparing for PGP Use

Integrating PGP with Our Technology


Deconstruction Problems

David Ness' summary of work can be found at http://mywebpages.comcast.net/dness

These days we have lots of options about how and where we publish documents. One effect of this is that our text can now be composed into a number of different documents which may bear some strong similarities to one another, and yet differ in other important ways.

If we treat the problem of producing these documents as constructing them out of text data bases we may find some of our tasks eased. In doing so, however, we also create the problem of deconstructing them back into the text data bases.
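The round trip described above can be sketched in a few lines. Everything in the sketch is an assumption made for illustration: the text data base is modeled as a dictionary from block names to text, and each block in the constructed document is preceded by a hypothetical marker line (`%% block: name`) so that an edited document can be split back into its blocks. Block text is assumed never to contain a line that starts with the marker.

```python
MARKER = "%% block: "  # hypothetical marker line, invented for this sketch


def construct_document(text_base, block_ids):
    """Build one editable document from the named blocks, in order."""
    lines = []
    for block_id in block_ids:
        lines.append(MARKER + block_id)
        lines.append(text_base[block_id])
    return "\n".join(lines)


def deconstruct_document(document):
    """Split an edited document back into a {block_id: text} mapping."""
    recovered = {}
    current_id = None
    for line in document.split("\n"):
        if line.startswith(MARKER):
            current_id = line[len(MARKER):]
            recovered[current_id] = []
        elif current_id is not None:
            recovered[current_id].append(line)
    return {block_id: "\n".join(body) for block_id, body in recovered.items()}


text_base = {
    "intro": "Documents appear in many media.",
    "signing": "Signatures protect text.",
    "unused": "Not part of this document.",
}
document = construct_document(text_base, ["intro", "signing"])
edited = document.replace("many media", "many different media")

# Write the edited blocks back; blocks not in the document are untouched.
text_base.update(deconstruct_document(edited))
print(text_base["intro"])  # Documents appear in many different media.
```

The marker lines are the price of deconstruction: without some trace of where each block came from, the edited document could not be written back to the text data base.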