Super-Literate Programming

By David Ness
Friday, December 7, 2001

Copyright © David Ness, 8 November 2001

This note is about 2/3 `done'. Many of the ideas have been worked out, but there remains a lot of clean up to be done down near the end.

Purpose

11/8/2001 2:19 AM

The purpose of this note is to begin discussion of some of the problems associated with presenting information in several different visual forms. The note introduces a concept called Super-Literate Programming, an extension of the Literate Programming ideas advanced by Knuth and many others.

The Problem

The problem considered here has both specific and general characteristics. The general aspects of the problem relate to any programming language. In addition there are some particular characteristics of J that present some special problems.

In addition, there are at least two distinct problem domains. One is a rather broad category of problems that people use J to discuss. J is, for example, used to develop course material in several different problem areas that could be called computational mathematics. In these domains we are probably particularly concerned with the way that J output might be displayed.

The other is the presentation of J code itself, for both expository and documentation purposes. Here we are probably less concerned with how J output is presented and more concerned with how J syntax is presented.

General

Increasingly, these days, we want to present information not only on paper, but also on a computer screen. The problem is that this is a more complicated proposition than it might appear to be at first glance.

The complications have two principal causes:

There can be a huge difference between the bandwidth of paper and that of screens; and
Documents can be interactive.

Each of these imposes some burden on a design that tries to let us accomplish many different objectives without a lot of extra work.

Bandwidth

It may not be conventional to think of paper and screens in terms of bandwidth, but it is instructive. A typical page of typescript is about 100 square inches, and each square inch, printed on a medium grade laser printer, will contain about 100,000 bits, assuming that we are only concerned with black and white images (no gray-scale). This is about 10,000,000 pixels per page.

By contrast, a typical computer screen is about the same size but has only about 1/4 as many pixels in each dimension, or roughly 1/16 as many in total. In addition, there are some contexts where very small computer screens are used to display information. Palm devices, for example, have screens that are only 160x160 pixels (about 25,000 pixels, or 1/4 of 1% of a full printed page).
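The arithmetic behind these figures can be checked with a short calculation. The sketch below uses Python purely for illustration; the numbers are the round figures quoted above, and the variable names are assumptions of mine rather than anything standard.

```python
# Round figures taken from the text; all names here are illustrative.
PAGE_AREA_SQ_IN = 100       # a typical page of typescript
BITS_PER_SQ_IN = 100_000    # medium grade laser printer, black and white

page_pixels = PAGE_AREA_SQ_IN * BITS_PER_SQ_IN

# A screen of similar size with about 1/4 the pixels in each dimension
# has about 1/16 as many pixels overall.
screen_pixels = page_pixels // 16

# A 160x160 Palm screen, and its share of a full printed page.
palm_pixels = 160 * 160
palm_share = palm_pixels / page_pixels

print(page_pixels)           # 10000000
print(screen_pixels)         # 625000
print(palm_pixels)           # 25600
print(f"{palm_share:.2%}")   # 0.26%
```

The last figure is the "1/4 of 1%" mentioned above: the Palm screen carries a few thousandths of the information of a printed page.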

Differences of this magnitude imply more than just a change in degree. They actually imply a change in kind. And yet we would like to be able to have our information presented in many different ways without having to make a special adaptation for each different presentation medium.

Interactivity

Tutorials are sometimes best presented in action. And computers afford us the opportunity to manage this interaction in an effective and instructive way. However, even in circumstances where we have access to interactive tutorial technology, we may still want to be able to make some static presentations. And we would like to be able to construct paper or static screen representations without having to do a massive amount of hand adaptation.

Random Access

Documents of any type may be accessed in other than a front-to-back fashion. While such random (or, perhaps better, non-sequential) access may be rare in novels and other prose, it is not uncommon in tutorial circumstances. The potential for non-sequential access may have some influence on how we expose a particular set of issues in any document, but it becomes a particularly complex issue when executable code is involved.

No Automation

The computational milieu also affords us the opportunity to automate processes. This is something that conventional printed documents don't allow. However, worrying about this would introduce so much complexity at this stage that this opportunity will be neglected for the purposes of this document.

Extensions

Structured documents could also be used to generate other kinds of visual presentations. Slide presentations come immediately to mind, but there are probably other kinds as well. In this early stage of consideration we will not explicitly consider this kind of presentation, but we should keep this, and other possible alternative forms of display, in mind as we structure the approach.

Specific to J

J presents some special opportunities and special problems. J is an easy language to typeset, perhaps as a response to the trouble that typesetting caused for APL, the predecessor from which J takes many of its ideas.

J Boxed Output

However, J's use of boxed output presents some challenges. The fact that this is a problem area for J may be signaled by the fact that there are two modes of J execution as far as output is concerned. They are called ASCII and Linedraw. In ASCII mode the boxes that are sometimes needed in output displays are created out of conventional ASCII characters: hyphens, vertical bars, plus signs, etc. In Linedraw mode the box-drawing characters that have been a part of the computer milieu since the early days of PCs are used to draw much nicer, cleaner looking boxes.

The tradeoff is straightforward. I can't imagine anyone would prefer the look and feel of the ASCII characters, but they exist in virtually every computational environment and are supported by all operating systems. Thus if ASCII characters are chosen, displays are guaranteed to be adequate, from an appearance standpoint, so long as the font chosen is monospaced. On the other side, the linedraw characters are very attractive, but they don't exist in many fonts, and the output produced is particularly bizarre looking if a font happens to get called into play that has other characters in the key positions. In many fonts these characters are the accented vowels, and obviously output is very odd looking indeed if they happen to appear instead of the line characters.
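To make the tradeoff concrete, here is a minimal sketch that draws the same row of boxes in both styles. It is written in Python for illustration and is not J's own display code; the function name `boxed_row` and the `STYLES` table are hypothetical, and the "linedraw" set assumes Unicode box-drawing characters rather than the font-specific code positions discussed above.

```python
# Corner, edge, and junction characters for two box styles.
# The "linedraw" set assumes the Unicode box-drawing block.
STYLES = {
    "ascii":    {"h": "-", "v": "|", "tl": "+", "tm": "+", "tr": "+",
                 "bl": "+", "bm": "+", "br": "+"},
    "linedraw": {"h": "─", "v": "│", "tl": "┌", "tm": "┬", "tr": "┐",
                 "bl": "└", "bm": "┴", "br": "┘"},
}

def boxed_row(cells, style="ascii"):
    """Render one row of boxed cells as a three-line string."""
    s = STYLES[style]
    texts = [str(c) for c in cells]
    # One horizontal bar per cell, as wide as the cell's text.
    bars = [s["h"] * len(t) for t in texts]
    top = s["tl"] + s["tm"].join(bars) + s["tr"]
    mid = s["v"] + s["v"].join(texts) + s["v"]
    bot = s["bl"] + s["bm"].join(bars) + s["br"]
    return "\n".join([top, mid, bot])

print(boxed_row([1, 2, 3], "ascii"))
print(boxed_row([1, 2, 3], "linedraw"))
```

Both renderings depend on a monospaced font. The second also depends on the font actually containing the box-drawing characters, which is exactly the risk described above.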

J Syntax Presentation

Presenting the syntax of J code should not be a particularly complicated problem as J is a relatively simple language. One beneficial side-effect of using literate programming tools is that it is easy to regularize the presentation of code fragments, as the parsing rules enforced in the code presentation process can be made to conform to standard.

J Tutorial Facilities

J also has a tutorial mechanism. This facility is integrated into the IDE, and allows for the easy execution of fragments of J code. However, its facilities for text display are quite limited, and very simplistic in comparison with those available in some other languages.

An Approach

Some of the problems of information display, particularly in contexts associated with computer programming, have been treated under the name Literate Programming following the lead of Donald Knuth.

Knuth invented Literate Programming to deal with the presentation of his code magnum opus, TeX, a system designed to help specify the typesetting of mathematics.

TeX is a very complex program, and since Knuth is interested not only in mathematics and typesetting, but computer programming as well, he was concerned with describing aspects of his complex program in a way that would allow them to be used not only for their primary purpose, but also in a `tutorial' role as exemplary computer programs. And they are exemplary programs indeed. In order to solve this problem, Knuth invented a particular programming style that has been successfully applied in several different circumstances. It has also been taken up, and taught, in a number of different places.

Literate Programming

Literate Programming is a `style' of recording both computer code and its documentation in one single document. The fundamental construct of literate programming is a unit, called here a paragraph, which consists of one (or more) paragraphs of descriptive text followed by a block of code. This is a slight oversimplification of the actual situation, but not in any way which is material to the discussion here.

Central to the notion of literate programming is breaking away from the ordering of the document that is normally imposed when computer code is involved. Most code needs to be presented in some fairly carefully managed order. This order may be a reasonable one for expository purposes, but it need not be. Knuth built his Web out of blocks which can be ordered for expository purposes, but nevertheless processed into an order appropriate for the execution of the code. (Now that the word `Web' has become common because of the Internet, there is often confusion between the two uses of the term. Knuth's use well pre-dated the Internet use, and the two have essentially nothing to do with one another.)

The Role of Tangle

Knuth calls the processor that produces executable code from a web Tangle. This processor removes the expository sections of a Web and structures the code, assuming of course that the Web is good code, into a legal program. This program can then be handed to the appropriate compiler to produce an executable computer program.

The Role of Weave

The other processor that can be applied to a Web is called Weave. This processor takes a Web and produces readable documentation by typesetting the descriptive paragraphs and carefully composing the code into a standard, readable, form.
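The two processors can be illustrated with a toy example. The sketch below is in Python and is far simpler than Knuth's actual tools: the `WEB` list, the `tangle` and `weave` functions, and the `<<name>>` reference notation (borrowed from noweb-style tools) are all assumptions made for illustration, not Knuth's WEB syntax. It also does no checking for missing or circular references.

```python
import re

# A toy web: (chunk name, descriptive text, code) in expository order.
# The <<name>> notation is an assumption of this sketch.
WEB = [
    ("main", "The program greets the user.",
     "<<define greeting>>\nprint(greeting)"),
    ("define greeting", "The greeting is a plain string.",
     "greeting = 'hello'"),
]

# A line consisting only of a chunk reference, possibly indented.
REF = re.compile(r"^(\s*)<<(.+?)>>\s*$")

def tangle(web, root):
    """Drop the prose and expand chunk references into program order."""
    code = {name: src for name, _, src in web}

    def expand(name, indent=""):
        lines = []
        for line in code[name].splitlines():
            m = REF.match(line)
            if m:
                # Replace the reference with the chunk it names.
                lines.extend(expand(m.group(2), indent + m.group(1)))
            else:
                lines.append(indent + line)
        return lines

    return "\n".join(expand(root))

def weave(web):
    """Keep the expository order: each chunk's prose, then its code."""
    parts = []
    for name, text, src in web:
        listing = "\n".join("    " + line for line in src.splitlines())
        parts.append(f"{name}\n\n{text}\n\n{listing}")
    return "\n\n".join(parts)

print(tangle(WEB, "main"))
print(weave(WEB))
```

Note that the web presents `main` first, because that is the natural place to start an explanation, while the tangled program defines `greeting` before using it.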

Managing a Web

Knuth's concept of a Web which can be processed in two essentially very different ways is somewhat unusual in programming. Of course, there is no such thing as a free lunch, so there is no magic to this approach, but it does allow the information about a computer program to be collected in a particularly effective way. It is quite natural to divide code into units which are small enough to be comprehensible, but also large enough to accomplish something significant enough to mention. The paragraphs of text and the blocks of code are the appropriate size to manage for this kind of purpose.

Super-Literate Programming

The idea of Super-Literate Programming is to extend this concept into a slightly broader domain. Not only are we concerned with descriptive text and code, we are also interested in presenting executions of the code along with the corresponding output. This is not a problem in the context that Knuth has used for Literate Programming. Nevertheless the concepts are tantalizingly close to one another, close enough to suggest that there may be something to be gained by at least considering how the basic concept of literate programming might be extended.
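The extra step, executing a piece of code and keeping its output next to it, might look something like the following. This is only a sketch: Python stands in for J, the helper name `run_and_capture` is hypothetical, and a real system would presumably hand the input to a J session rather than call `exec`.

```python
import contextlib
import io

def run_and_capture(source):
    """Execute a code fragment and return whatever it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})   # fresh namespace: no state shared between runs
    return buffer.getvalue()

code = "print(sum(range(1, 5)))"
output = run_and_capture(code)

# Present the input followed by the output it generated.
print(code)
print(output, end="")
```

Running each fragment in a fresh namespace sidesteps the question of what earlier code a fragment depends on, which is the difficulty raised under Random Access above.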

The Document

The main body of the document that drives this process consists of at least four parts, each of which may occur many times. They are the name, descriptive body, input segment and output display. We will call each occurrence of each one of these a fragment. The overall document consists of a document header followed by any number of fragments. Each fragment consists of a name, descriptive body, input segment and output display, at least one of which is not null. In my current view it would be unusual for a fragment not to have a name, but at this stage I wouldn't want to rule anything out until there is a practical rendering of these ideas.
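One way to picture this structure is as a pair of record types. The sketch below is in Python; the class and field names are my own assumptions, since nothing above fixes a concrete representation. It treats an empty string the same as null when checking that a fragment has at least one part.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Fragment:
    name: Optional[str] = None
    descriptive_body: Optional[str] = None
    input_segment: Optional[str] = None
    output_display: Optional[str] = None

    def __post_init__(self):
        # At least one of the four parts must be non-null
        # (an empty string counts as null here).
        parts = [self.name, self.descriptive_body,
                 self.input_segment, self.output_display]
        if not any(parts):
            raise ValueError("a fragment needs at least one non-null part")

@dataclass
class Document:
    header: str
    fragments: List[Fragment] = field(default_factory=list)

doc = Document(
    header="Super-Literate Programming",
    fragments=[
        Fragment(name="Sum",
                 descriptive_body="Add the first four integers.",
                 input_segment="+/ 1 2 3 4",
                 output_display="10"),
        # Unusual, but not ruled out: a fragment without a name.
        Fragment(descriptive_body="A note with no code attached."),
    ],
)
print(len(doc.fragments))
```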

Name

The name of a fragment is a title for the fragment (paragraph, input code, output).

Hierarchical Structure

Descriptive Body

Visual Element

Input Segment

Code Context

Program Input

Output Display

Generated by Input

Handling the Problems

Next, we need to describe just how the structure of the document that has been proposed can be used to solve the problems presented by each of our target areas. This involves figuring out how to process the source documents into the appropriate kind of object documents. It also requires that some decisions be made about what tool set is going to be used to build the necessary processor functions.

Producing Output

[Note: This section will require some thinking and working out.] In particular the way that this might all fit into the domain of Blogs will require some special hard thinking.

Producing Tutorials

Producing Blogs

Producing Wikis

Producing Screens

Producing Paper

Producing an Outline

Tools for Production

J

Perl

REBOL

Related Worlds

REBOL

Almost Free Text

Resources

Some literate programming resources are:

Literate Programming by Donald E. Knuth (Stanford, California: Center for the Study of Language and Information, 1992), xvi+368pp. (CSLI Lecture Notes, no. 27.) ISBN 0-937073-80-6 Japanese translation by Makoto Arisawa, Bungeiteki Programming (Tokyo: ASCII Corporation, 1994), 463pp

Nelson Beebe's bibliography http://www.math.utah.edu:8080/pub/tex/bib/index-table-l.html

TeX and LaTeX: Drawing and Literate Programming (book and disk) by Eitan M. Gurari (McGraw-Hill Programming Tools for Scientists & Engineers, hardcover, December 1993)

Weaving a Program: Literate Programming in WEB by Wayne Sewell

Computational character processing: character coding, input, output, synthesis, ordering, conversion, text compression, encryption, display hashing, literate programming: bibliography by Conrad Sabourin

David Ness' summary of work can be found at http://mywebpages.comcast.net/dness
This starts with a notion that was important to Don Knuth as he developed TeX. While working on this large project he became concerned with the question of how TeX's code was to be documented, and he developed the idea of Literate Programming to deal with this task.
