|Mind / Matter|
By David Ness
Tuesday, December 11, 2001
J and K
J and K share some common heritage. Both are, at least in some sense, reconsiderations of APL. The languages thus share in the space that might, at least loosely, be called array processing languages because both of the languages perform highly efficient array operations. However, as we will see in a moment, each of the languages comes at this computational domain from a rather different viewpoint. Perhaps we can best begin to explain some of that distinction by making an oversimplification that might contribute to our understanding.
J has a fundamental underlying structure that treats data elements as arrays of arbitrary dimension. Each element of an array can be a number (there are several kinds), a character, a symbol or a boxed entity. Scalars, vectors and matrixes of arbitrary dimension can be boxed, and thus placed in arrays of boxes. In J the elements within a box must be homogeneous, but boxing effectively `hides' the nature of the elements within (until they are unboxed, that is) so they can thus be put into arrays.
K has a fundamental underlying structure that treats elements as numbers, symbols or characters which can be aggregated into lists, which in turn can be aggregated into higher orders of list to an arbitrary depth. K detects, and treats as a special case, the situation where a list happens to be rectangular (i.e. where each item on some list contains the same number of sub-items). There are common operations which can be performed on rectangular arrays which have no significance on irregular arrays (transpose is probably the most straightforward example).
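To make the idea of a list that `happens to be rectangular' concrete, here is a minimal sketch written in Python rather than K. The helper names `is_rectangular` and `transpose` are invented for illustration and are not K primitives. A list of lists counts as rectangular when every item has the same number of sub-items, and only then does a transpose make sense.

```python
def is_rectangular(rows):
    """True when every item of a list of lists has the same number of sub-items."""
    return len({len(row) for row in rows}) <= 1


def transpose(rows):
    """Swap rows and columns; only meaningful for a rectangular list."""
    if not is_rectangular(rows):
        raise ValueError("transpose needs a rectangular list")
    return [list(col) for col in zip(*rows)]


regular = [[1, 2, 3], [4, 5, 6]]
ragged = [[1, 2, 3], [4, 5]]

print(is_rectangular(regular))  # True
print(transpose(regular))       # [[1, 4], [2, 5], [3, 6]]
print(is_rectangular(ragged))   # False
```

The ragged list is still a perfectly good list; it simply has no sensible transpose.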
Similarities
Let's first describe some of the ways in which the languages are quite alike one another.
First, and most obviously, for dealing with homogeneous arrays both of the languages are quite alike and quite natural. And for matrixes with conforming dimensions, both of the languages express the basic mathematical operations almost identically, the central notion of both languages being that a statement like a+b implies a matrix operation if the elements involved are matrixes.
Both of the languages also, in some circumstances, extend the dimension of arguments in order to make them conformable. The details of how this is accomplished are different in the two languages, but not in such a way as to influence choice between them much.
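The general idea of item-by-item arithmetic with extension of a scalar argument can be sketched as follows. This is Python, not J or K, and the function `add` is a hypothetical helper; the actual conformability rules of the two languages differ in their details and are not reproduced here.

```python
def add(a, b):
    """Add two nested lists item by item, extending a scalar to match a list."""
    a_is_list, b_is_list = isinstance(a, list), isinstance(b, list)
    if not a_is_list and not b_is_list:
        return a + b                       # two scalars: ordinary addition
    if a_is_list and b_is_list:
        if len(a) != len(b):
            raise ValueError("length error: arguments do not conform")
        return [add(x, y) for x, y in zip(a, b)]
    if a_is_list:
        return [add(x, b) for x in a]      # scalar b is extended across a
    return [add(a, y) for y in b]          # scalar a is extended across b


m = [[1, 2], [3, 4]]
print(add(m, m))    # [[2, 4], [6, 8]]
print(add(m, 10))   # [[11, 12], [13, 14]]
```

In both languages the point is that the programmer writes a+b and the looping is implied.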
Both languages are also pug ugly.
Well, I know everyone's taste in beauty is different. And some find beauty in the smoke pouring out of the stacks in a coal shrouded Pittsburgh of the late 1940s. But, while some---and I am often included in the group---will like and enjoy reading J and K, it would stretch credulity to regard as beautiful any language that has as a major element the occurrence of an unbalanced right parenthesis in column one of source text (as J does). And K is similarly superficially unattractive, having been compared to the odd hieroglyphics that you get when the spam you receive from the Far East encounters a western computer with no Kanji or Hiragana character sets.
But both languages are also generally terse. This has two major advantages in my view:
- You get a lot of code on a page, and it's easy to get an overview of a page. It's hard to `spot' anything in a program that is 50 pages long. You at least stand a chance if something significant fits on a page.
- Both languages are interpreted. And for an interpreter, less is more. Terseness contributes to speed.
Both languages make it easy to use monadic and dyadic operations (operations on one variable and operations on two variables). The way function arguments are handled is, as we will see in a moment, quite different. But both languages make dealing with monadic and dyadic operations quite straightforward.
Simple character strings are also treated in much the same way in both systems. Basic character strings are seen as vectors of (scalar) single characters. Once one gets beyond the simple treatment of character strings, however, into topics like insertion of sub strings or dealing with conventional ASCII files with lines of variable length, things begin to get different.
Symbols (as a data type) are structured much the same way in each of the languages. As we will see again, in a moment, symbols in K have quite a different implementation effect than symbols in J, but their structure and the concepts behind them (a quickly searchable form of information, for example) are quite similar.
Both J and K have on-line help facilities that are, in my experience, leagues better than those available in any other system I have used over the past several years. Most on-line help is both depressing and idiotic. If it's not Windows' infamous `dancing paper clip' leering at you from the corner of your screen, it's some `idiot box' telling you that `Input File' is `The name of the file of input'.
Both J and K have help which is, remarkably enough, actually helpful. However, the help is of quite different kinds and magnitudes, and these differences belong more appropriately in the next section of this note, so we will take the topic up again there.
And, both languages make heavy use of special methods internally. By this I mean that there are circumstances where the interpreter `notices' something about how its intermediate results are going to be used and this may allow it to take substantial short cuts in the computational activity. An example (using J here) might be when long integers are raised to a huge power in modular arithmetic. Instead of calculating by repetitive simple multiplication (which would involve intermediate results with a horrendous number of digits), various mathematical rules can be imposed and the calculations shortened dramatically.
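The kind of shortcut described here can be illustrated with the standard square-and-multiply technique, sketched below in Python. This is not a claim about what the J interpreter does internally; `modpow` is a hypothetical name, and Python's built-in three-argument `pow` is used only as a cross-check. The point is that reducing by the modulus after every step keeps intermediate results small, and squaring cuts the number of multiplications from the size of the exponent to roughly the number of its binary digits.

```python
def modpow(base, exponent, modulus):
    """Square-and-multiply: reduce modulo `modulus` after every step."""
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:                   # this binary digit of the exponent is set
            result = (result * base) % modulus
        base = (base * base) % modulus     # square for the next binary digit
        exponent >>= 1
    return result


print(modpow(2, 10, 1000))  # 24, since 2**10 = 1024
print(modpow(7, 10**18, 10**9 + 7) == pow(7, 10**18, 10**9 + 7))  # True
```

Computing 7 to the power 10**18 by repeated multiplication and reducing only at the end would be hopeless; the loop above runs about sixty times.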
Another example of special methods might have to do with sorting or table lookup. For sorting, as an example, different sorting technologies are appropriate for different scales of problem and differing types of argument. For example the technology that might sort a binary vector containing 1,000,000 elements is probably different from that to sort three strings.
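As a toy illustration of choosing a method from the nature of the argument, the Python sketch below sorts a vector of 0s and 1s by counting instead of comparing, and falls back to a general sort otherwise. The names `sort_bits` and `sort_any` are invented here, and nothing is implied about how J or K actually dispatch.

```python
def sort_bits(bits):
    """Sort a vector of 0s and 1s by counting, without comparing elements."""
    ones = sum(bits)
    return [0] * (len(bits) - ones) + [1] * ones


def sort_any(items):
    """Pick a method from the nature of the argument (a toy dispatch)."""
    if all(x in (0, 1) for x in items):
        return sort_bits(items)      # one counting pass is enough
    return sorted(items)             # general comparison sort


print(sort_any([1, 0, 1, 1, 0]))             # [0, 0, 1, 1, 1]
print(sort_any(["pear", "apple", "fig"]))    # ['apple', 'fig', 'pear']
```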
Enough of `What's similar?'. Let's turn to the differences, as it is there that our choice of language is likely to be determined.
Differences
It seems to me that the most important difference between J and K is at the root of their data structures. I think it is fair to say that the natural data structure of J is based on arrays while that of K is based on lists. This difference has important implications for some classes of very common problems.
For example, ASCII data files usually consist of ragged lines. By this I mean that most such files have lines that are marked by the character(s) that end each line, and that the lines themselves are often of differing lengths. Standard text usually has this property. For K this causes no problem: it is quite happy to load the file into a data structure (list) that has each line stored as one element.
For J the choice is not quite as clear. One way of loading the data would be to `box' each line and then create a vector of boxes to represent all of the data in the file. This works fine, but whenever we want to work with the characters within a line, we'd have to unbox it before doing so. An alternative would be to load the file as a character array, but this would necessitate `squaring up' the data, padding each of the lines out to match the length of the longest single line, thus producing a rectangular matrix. While either of these choices could be made to work, they generally seem, to me at least, to be somewhat cumbersome in comparison with K's much more straightforward treatment.
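The two representations can be contrasted with a small Python sketch (Python standing in for both languages; the sample text is made up). The list form keeps each line at its own length, while the array form pads every line with blanks to the length of the longest one.

```python
text = "J and K\nshare some\ncommon heritage\n"

# List approach (the K-like view): one item per line, lengths may differ.
lines = text.splitlines()

# Array approach (one of the J-like options): pad every line with blanks
# to the length of the longest line, giving a rectangular character table.
width = max(len(line) for line in lines)
padded = [line.ljust(width) for line in lines]

print([len(line) for line in lines])   # [7, 10, 15]
print([len(row) for row in padded])    # [15, 15, 15]

# Getting the ragged lines back means stripping the padding again,
# which would also strip any trailing blanks that were in the original.
restored = [row.rstrip() for row in padded]
print(restored == lines)               # True
```

The padding is harmless for three short lines, but for a file with one very long line every other row pays for it.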
Another dramatic difference between J and K has to do with their output. With K you can produce crude but effective graphs and simple tables with almost bizarre ease. If you have data you want plotted, you can give it the attribute `chart' and then show it and you'll see an unattractive but remarkably effective simple plot of the data. This is extraordinarily useful when you want to just probe around in some data with understanding being the goal rather than producing plots to impress somebody else.
J, on the other hand, does a wonderful job of outputting tabular data in a very nice form. If you are able to use linedraw characters, that is. I have to make this qualification because there has been a move afoot in the J community to de-commit from linedraw characters to a character representation which is more universally supportable across operating systems (particularly, in J's case, to Unix). While this is understandable from a cross-system compatibility point of view, it also destroys one of J's principal advantages over K and other languages.
K also generalizes the valence of its functions, while J does not. J functions are monadic or dyadic, and are `infix' operations, the operator occurring between the arguments if it is a dyad. K functions can take any number of arguments and (except for the conventional built-in operations) are generally invoked with a more conventional function followed by (bracketed) argument list notation.
K is also very flexible about binding arguments to functions. For example, it is easy in K to take a five variable function and bind two arguments to particular values in order to produce a function that requires three variables. This isn't as straightforward in J, at least from an appearance standpoint.
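The binding described here, fixing two of five arguments to leave a function of three, looks like this when sketched in Python. The function `blend` and the helper `bind_b_d` are hypothetical, and this shows only the idea, not K's own notation for it.

```python
def blend(a, b, c, d, e):
    """A made-up five-argument function."""
    return a * b + c * d + e


def bind_b_d(f, b, d):
    """Fix the second and fourth arguments of f, leaving a function of three."""
    return lambda a, c, e: f(a, b, c, d, e)


blend3 = bind_b_d(blend, 10, 100)

print(blend3(1, 2, 3))  # 1*10 + 2*100 + 3 = 213
```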
Locale / Dictionary
The two languages also treat the all important issue of `scope' rather differently. I should perhaps say at the outset of this discussion, that I am not a heavy user of these facilities in either of the two systems. It is not that such considerations are unimportant, but rather that they are really very important, but mostly so in large complex problems, larger and more complex than I usually encounter in my work. As a result my description here should be taken with a grain of salt as I may well misunderstand some important details.
In J, variables can be created in a locale. This allows us, for example to mix my code with your code without much concern for problems such as conflict of variable names so long as my work is put in one locale and your work is put in another. We need only come to an agreement about things that are imported into and exported out from each of our locales.
In K, the notion of dictionary allows similar freedom. However, in K's implementation dictionary really means a great deal more, as the entire contents of a dictionary are easily available as a data structure which can be manipulated just like any other data structure. All dictionaries have a consistent structure and behave the same way. Thus one doesn't need special internal functions to access elements in the system's dictionary; they are quite available to conventional operations just as a matter of course.
J and K have rather different ways of acquiring session input. J has its own IDE. Although I personally do not like IDEs much, I would say that J's IDE is about as good as they can get. It is easy to cut and paste into J's IDE, and it is generally fairly easy to extract code from an interactive session if you want to save it for some later re-investigation.
K takes a different approach. No IDE. In fact just about nothing, other than the suggestion that you might want to run K on a console inside something like Emacs. Strangely, it works for me. Incredibly low overhead. No new learning about the IDE necessitated. Old tricks and habits remain good tricks. And a relatively stable world, not subject much to creeping featurism.
J and K have rather a different look. Once you get past a few of the odd quirks of J (right parenthesis in column one of source text to end functions, for example) it looks pretty much like a lot of conventional programming languages. I mean there are `if' statements, `break' statements, `while' statements, etc. So if you are used to reading C or perl you may have to clean your glasses, but you won't have to go out and buy a new pair.
K is a completely `other' matter. Symbols are hugely overloaded. Depending on context, for example, a colon can stand for at least assignment, return or if. Mostly the language is symbols, although there are a few words and some visually disconcerting (at least I find them so) `function calls' such as `_jd' (converts YYYYMMDD into a `Julian' date).
I find I have to be more careful reading K, but then K is enough shorter, even than J, that I'm not sure that overall I spend more time and energy trying to understand it. Over time I have found that I don't have a profound preference for either language on this score. Rather I like them both a good deal better than just about anything else I use---with the exception of perl, I suppose, which I like a lot for other reasons.
There are considerable differences between the two worlds with respect to the subject of the kinds of numbers that are easily represented and dealt with.
Most important, I don't think it is being terribly unfair to K to say that it just about doesn't deal with numbers other than floating point. While this is not quite literally true, K pretty much only condescends to deal with integers when they might be used in the context of subscripting a reference in a data array, and while it seems to be happy reporting that 2^30 is 1,073,741,824 it coughs on 2^31 (reporting it to be 0N, a representation of `integer null'). Needless to say it is quite happy, however, to deal with floating point (even exact floating point) numbers that are much larger than that, so in reality this causes little trouble.
J, on the other hand, allows you to deal directly and earnestly with an almost bewildering number of different numeric data types. First, there is a built-in understanding of complex numbers. With no announcement, declaration or special ado, you can say 1j1^2 (J's way of saying `(1 + i) squared') and it will respond with 0j2 (2i).
Then there are also rational numbers, J's way of dealing with exact rationals. If you type 1r2+1r3 (one half plus one third) you'll get 5r6 (five sixths). This is wonderful for some problems.
And there are exact integers of arbitrary length. Typing 2^1000x produces an exact 302 digit long number representing two taken to the one-thousandth power. This is wonderful for working on public key encryption and areas related to that.
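For readers without J at hand, the three results quoted above can be checked with Python's built-in complex numbers, its `fractions` module, and its arbitrary-length integers. This is only a cross-check of the arithmetic, not a rendering of J's notation.

```python
from fractions import Fraction

# (1 + i) squared, written as a product so the arithmetic is exact.
z = 1 + 1j
print(z * z)                              # 2j

# One half plus one third, as exact rationals.
print(Fraction(1, 2) + Fraction(1, 3))    # 5/6

# Two to the one-thousandth power, as an exact integer.
print(len(str(2 ** 1000)))                # 302
```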
It seems to me that, in general, K is dramatically faster than J. I have only run very simple-minded benchmarks of these products, but in areas where both are fairly natural ways to attack a particular problem, K usually seems to do it faster. To be fair I should point out, however, that the last time I ran direct benchmarks was some time ago, and I am under the impression that the innards of J may well have been tweaked more than the innards of K (to the outside world it looks as if lots of K effort has been being invested in the data base system).
I should point out, though, that most of my problems are so small that speed is not an important consideration for me in my choice of solution vehicle.
J also has some special deals for you if you happen to have some particular problems. If you type p: 10^8x you'll get 2,038,074,751, the prime at index 100,000,000 (p: counts from zero, so p: 0 is 2). You can also factor numbers into prime factors (within reason). And there are some facilities for dealing with polynomials that are built into the system as well. Since I haven't used these facilities to any real extent I am too ignorant to discuss them here, but mention them because if you have a problem involving any of these things, you definitely should take a look at J to see if it is of help.
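What a prime-lookup and a factoring primitive compute can be sketched with plain trial division in Python. The names `nth_prime` and `factor` are invented here; the sketch assumes the J convention that p: counts from zero, and it makes no attempt at the speed a built-in would offer (it would be far too slow for an index like 10^8).

```python
def nth_prime(n):
    """Return the prime at index n, counting from zero (index 0 is 2)."""
    found = []
    candidate = 2
    while len(found) <= n:
        # candidate is prime if no smaller prime up to its square root divides it
        if all(candidate % p for p in found if p * p <= candidate):
            found.append(candidate)
        candidate += 1
    return found[n]


def factor(n):
    """Prime factors of n (n >= 2) by trial division, smallest first."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)   # whatever remains is itself prime
    return factors


print(nth_prime(0))   # 2
print(nth_prime(4))   # 11
print(factor(360))    # [2, 2, 2, 3, 3, 5]
```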
The special deals built into K's world are nowhere near as profound. What you are more likely to encounter, however, may be some subroutines that deal with mundane, but very important, things like counting the number of working days between any two dates, which becomes a complicated deal when different countries and different holidays are involved. In the financial world such things can be very important, and huge deals may swing on mis-calculations associated with the precise terms of financial instruments.
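A bare-bones version of the working-day count might look like the Python sketch below. It is not one of the K subroutines mentioned, the function name `working_days` is hypothetical, and the holiday calendar is made up; a real one would have to be supplied per country, which is exactly where the complication lies.

```python
from datetime import date, timedelta


def working_days(start, end, holidays=frozenset()):
    """Count weekdays in [start, end) that are not in `holidays`."""
    count = 0
    day = start
    while day < end:
        if day.weekday() < 5 and day not in holidays:   # Monday is 0, Friday is 4
            count += 1
        day += timedelta(days=1)
    return count


# Hypothetical holiday calendar, for illustration only.
made_up_holidays = {date(2001, 12, 12)}

monday = date(2001, 12, 10)
next_monday = date(2001, 12, 17)
print(working_days(monday, next_monday))                    # 5
print(working_days(monday, next_monday, made_up_holidays))  # 4
```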
I don't find either J or K to be profoundly good at handling strings, particularly in regular expression or string substitution contexts. I almost always end up doing those kinds of tasks in perl, although lately I will confess I have been intermixing K and perl to handle tasks where K seems to be particularly effective at pulling the original data apart into searchable pieces and then handing off to perl to do its work searching and re-forming the pieces, only to return to K for dust up and clean up.
The size of the modules, particularly in an age so dominated by the diarrheic code that gushes out of Redmond, is impressively small (for J) and incredibly impressively small (for K). J's core is about 1.5mb, and the current distribution, along with its extensive support materials, comes in a 3.8mb ZIP file.
K is even more startling. The whole of KDB, along with K comes in a 270KB (yes, that's right, KB not MB) ZIP file. K's DLL is 177K.
So both qualify on the easy on your storage dimension. K is way smaller than J, but both are so small that it probably doesn't matter much in today's world of very cheap storage.
J makes a serious attempt to provide a lot of information in its on-line help facilities. This information is very effective at providing immediate help. Whether it is actually helpful to have all of the documents, which at some point and for some people may become distracting, is something that is argued from time to time.
K's help is incredibly terse, but---a bit to my surprise---I also find it incredibly helpful. It has been a help to me from day one of my use of K, and every time I use it I find I end up learning a tiny bit more of K because while this help is terse, it is also profoundly deep.
J has a lot of tutorial material that helps teach a wide range of topics. J's tutorials range from discussions of particular topics in mathematics through tutorials on how to use J's plotting and graphing sub environments all the way to highly instructive tutorials on how to use J's socket functions to communicate with the Internet.
K doesn't have any tutorial materials, although there are some quite informative examples of K available for download.
Characteristic      J                 K
Irregular Data      Not as Good       Very Good
Rectangular Data    Just Fine         Just Fine
Simple IO           Good for Tables   Good for Plots
Rational Numbers    Strong            Weak
Integers Involved   Strong            Weak
Data Base           Weak              Strong
Speed               Not so Fast       Fast
Primes              Special Deal      Weak
Choice of Language for Problem
Some of the choices are easy, and others are pretty much a toss-up. Let's treat them one by one.
First, integer work strongly suggests J be chosen. K has only very limited integer capabilities, so there isn't much question about this choice. Examples of integer work might include public key encryption or digital signature calculations where exact integer calculations of at least several hundred digits in length are commonplace.
Second, on the other side, data base work is pretty much a natural for K. K is a base language for KDB, a full-scale data base language. Indeed, the major market for K has been in the financial industry where it is used to work with huge data bases that often consist of stock market transaction (`tick') data.
Third, while I am happy with either J or K if my data is naturally rectangular array oriented, I am much happier with K for irregular data (and that counts typical ASCII files as irregular data).
Fourth, this leads to (perhaps it's a corollary) the notion that I find K very useful for text work. Indeed, I sometimes think of K, in this mode, as a sort of string APL. In other words it is a very convenient interactive language for working with text data.
Fifth, in the (admittedly relatively rare) circumstances where my natural problem formulation would lead me to functions of more than two variables, I find K more comfortable than J.
Sixth, in the (again admittedly rare) circumstances I would need to work on complex or rational data, I'd choose J in a heartbeat over K. Practically speaking, however, I don't have much experience with this as I don't have much of this sort of problem.
Finally, if simple graphing and plotting are involved, I invariably choose K. Since I am only rarely in a situation where I need to impress others, I generally only graph and plot to aid in my own understanding, and K seems to me to be much simpler for this purpose.
Closing: Benchmarks and Costs
I am at a bit of a loss as to how to wrap this note up. Perhaps talking a little about benchmarks and costs might summarize matters.
First, benchmarks are important, but they are only important to you if computation time is, itself, important. Most of my jobs are so small that they seem to perform instantaneously in either language. And so whether one takes one blink of the eye, while the other takes two is not of much consequence to me.
That said, in my experience K seems to generally run faster than J, so if I do have a problem where time matters, I am tempted to at least consider K first. As time passes I may decide to add some actual benchmark figures to this note, but I wanted to release the note before waiting to do all of that work. And I should be clear that the last time I ran serious head-to-head comparisons of the languages was some years ago, and each of the languages has evolved from that time.
Finally, there's cost. [Costs for J changed as of September 30, 2002: Consult the J site for current costs] For my personal use the cost of K is: free. J used to be free, but now would cost me $100/year if I were to use it. It's not worth it to me. If I were going to use a language in a commercial setting, then the entry cost of J appears to be substantially less than the entry cost of K, but I am loath to state this too definitively as this area is replete with special deals that are often negotiated with suppliers to introduce software and get it absorbed into an organization. I would recommend that anyone interested in the application of either language should talk with the suppliers in order to clarify their needs and in order to get reasonable figures for deployment costs. That said, the last time I looked at published prices, J was $100 for individual and $600 for the Commercial version, while K doesn't have `published' prices but appears to cost many thousands of dollars.
David Ness' summary of work can be found at http://mywebpages.comcast.net/dness
J is the direct descendant of APL, developed by Ken Iverson and Roger Hui. It is available for $100/year non-professional use from JSoftware Inc. It is available for commercial use at $600 / year.
K is Arthur Whitney's language. It is sold by Kx Systems, Inc. It expands on many APL-like ideas, and carries them forward particularly in the direction of large scale data base management. A (somewhat) restricted version of K is available free for non-commercial use. K is expensive for commercial use (tens of thousands of dollars).