Monday, August 29, 2016

Interpreted programming languages

Last night I drifted off while reading a Lisp book.
xkcd 224.

It's finally registered on me that much of the past half century of misunderstandings and confusions about the semantics of Lisp, of quotation, and, yes, of fexprs, can be accounted for by failure to recognize there is a theoretical difference between an interpreted programming language and a compiled programming language.  If we fail to take this difference into account, our mathematical technology for studying compiled programming languages will fail when applied to interpreted languages, leading to loss of coherence in language designs and a tendency to blame the language rather than the theory.

Technically, a compiler translates the program into an executable form to be run thereafter, while an interpreter figures out what the program says to do and just does it immediately.  Compilation allows higher-performance execution, because the compiler takes care of reasoning about source-code before execution, usually including how to optimize the translation for whatever resources are prioritized (time, space, peripherals).  It's easy to suppose this is all there is to it; what the computer does is an obvious focus for attention.  One might then suppose that interpretation is a sort of cut-rate alternative to compilation, a quick-and-dirty way to implement a language if you don't care about performance.  I think this misses some crucial point about interpretation, some insight to be found not in the computer, but in the mind of the programmer.  I don't understand that crucial insight clearly enough — yet — to focus a whole blog post on it; but meanwhile, there's this theoretical distinction between the two strategies which also isn't to be found in the computer's-eye view, and which I do understand enough about to focus this blog post on.

It's not safe to say the language is the same regardless of which way it's processed, because the language design and the processing strategy aren't really independent.  In principle a given language might be processed either way; but the two strategies provide different conceptual frameworks for thinking about the language, lending themselves to different language design choices, with different purposes — for which different mathematical properties are of interest and different mathematical techniques are effective.  This is a situation where formal treatment has to be considered in the context of intent.  (I previously blogged about another case where this happens, formal grammars and term-rewriting calculi, which are both term-rewriting systems but have nearly diametrically opposite purposes; over t‍ha‍r.)

I was set onto the topic of this post recently by some questions I was asked about Robert Muller's M-Lisp.  My dissertation mentions Muller's work only lightly, because Muller's work and mine are so far apart.  However, because they seem to start from the same place yet lead in such different directions, one naturally wants to know why, and I've struck on a way to make sense of it:  starting from the ink-blot of Lisp, Muller and I both looked to find a nearby design with greater elegance — and we ended up with vastly different languages because Muller's search space was shaped by a conceptual framework of compilation while mine was shaped by a conceptual framework of interpretation.  I will point out, below, where his path and mine part company, and remark briefly on how this divergence affected his destination.

The mathematical technology involved here, I looked at from a lower-level perspective in an earlier post.  It turns out, from a higher-level perspective, that the technology itself can be used for both kinds of languages, but certain details in the way it is usually applied only work with compiled languages and, when applied to interpreted languages, result in the trivialization of theory noted by Wand's classic paper, "The Theory of Fexprs is Trivial".

Universal program
Half a century's worth of misunderstandings and confusions

I'll explore this through a toy programming language which I'll then modify, starting with something moderately similar to what McCarthy originally described for Lisp (before it got unexpectedly implemented).

This is a compiled programming language, without imperative features, similar to λ-calculus, for manipulating data structures that are nested trees of atomic symbols.  The syntax of this language has two kinds of expressions:  S-expressions (S for Symbolic), which don't specify any computation but merely specify constant data structures — trees of atomic symbols — and M-expressions (M for Meta), which specify computations that manipulate these data structures.

S-expressions, the constants of the language, take five forms:

S   ::=   s | () | #t | #f | (S . S)   .
That is, an S-expression is either a symbol, an empty list, true, false, or a pair (whose elements are called, following Lisp tradition, its car and cdr).  A symbol name is a sequence of one or more upper-case letters.  (There should be no need, given the light usage in this post, for any elaborate convention to clarify the difference between a symbol name and a nonterminal such as S.)

I'll assume the usual shorthand for lists, (S ...) ≡ (S . (...)), so for example (FOO BAR QUUX) ≡ (FOO . (BAR . (QUUX . ()))); but I won't complicate the formal grammar with this shorthand since it doesn't impact the abstract syntax.

The forms of M-expressions start out looking exactly like λ-calculus, then add on several other compound forms and, of course, S-expressions which are constants:

M   ::=   x | [λx.M] | [M M] | [if M M M] | [car M] | [cdr M] | [cons M M] | [eq? M M] | S   .
The first form is a variable, represented here by nonterminal x.  A variable name will be a sequence of one or more lower-case letters.  Upper- versus lower-case letters is how McCarthy distinguished between symbols and variables in his original description of Lisp.

Variables, unlike symbols, are not constants; rather, variables are part of the computational infrastructure of the language, and any variable might stand for an arbitrary computation M.

The second form constructs a unary function, via λ; the third applies a unary function to a single argument.  if expects its first argument to be boolean, and returns its second if the first is true, third if the first is false.  car and cdr expect their argument to be a pair and extract its first and second element respectively.  cons constructs a pair.  eq? expects its two arguments to be S-expressions, and returns #t if they're identical, #f if they're different.

Compound unary functions, constructed by λ, are almost first-class.  They can be returned as the values of expressions, and they can be assigned to variables; but as the grammar is set out, they cannot appear as elements of data structures.  A pair expression is built up from two S-expressions, and a compound unary function is not an S-expression.  McCarthy's original description of Lisp defines S-expressions this way; his stated purpose was only to manipulate trees of symbols.  Trained as I am in the much later Lisp culture of first-class functions and minimal constraints, it felt unnatural to follow this narrower definition of S-expressions; but in the end I had to define it so.  I'm trying to reproduce the essential factors that caused Lisp to come out the way it did, and, strange to tell, everything might have come out differently if not for the curtailed definition of S-expressions.

(One might ask, couldn't we indirectly construct a pair with a function in it using cons?  A pair with a function in it would thus be a term that can't be represented directly by a source expression.  This point likely doesn't matter directly to how things turned out; but fwiw, I suspect McCarthy didn't have in mind to allow that, no.  It's entirely possible he also hadn't really thought about it yet at the preliminary stage of design where this detail had its impact on the future of Lisp.  It's the sort of design issue one often discovers by playing around with a prototype — and in this case, playing around with a prototype is how things got out of hand; more on that later.)


Somehow we want to specify the meanings of programs in our programming language.  Over the decades, a number of techniques for formal PL semantics have been entertained.  One of the first things tried was to set up a term-rewriting machine modeled roughly on λ-calculus, that would perform small finite steps until it reached a final state; that was Peter Landin's notion, when Christopher Strachey set him onto the problem in the 1960s.  It took some years to get the kinks out of that approach, and meanwhile other techniques were tried — such as denotational semantics, meta-circular evaluation — but setting up a term-rewriting calculus has been quite a popular technique since the major technical obstacles to it were overcome.  Of the three major computational models tied together in the 1930s by the Church-Turing thesis, two of them were based on term-rewriting:  Turing machines, which were the convincing model, the one that lent additional credibility to the others when they were all proven equivalent; and λ-calculus, which had mathematical elegance that the nuts-and-bolts Turing-machine model lacked.  The modern "small-step" rewriting approach to semantics (as opposed to "big-step", where one deduces how to go in a single step from start to finish) does a credible job of combining the strengths of Turing-machines and λ-calculus; I've a preference for it myself, and it's the strategy I'll use here.  I described the technique in somewhat more depth in my previous post on this material.

Small-step semantics applies easily to this toy language because every intermediate computational state of the system is naturally represented by a source-code expression.  That is, there is no obvious need to go beyond the source-code grammar we've already written.  Some features that have been omitted from the toy language would make it more difficult to limit all computational states to source expressions; generally these would be "stateful" features, such as input/output or a mutable store.  Landin used a rewriting system whose terms (computational states) were not source-code expressions.  One might ask whether there are any language features that would make it impossible to limit computational states to source expressions, and the answer is essentially yes, there are — features related not to statefulness, but to interpretation.  For now, though, we'll assume that all terms are source expressions.

We can define the semantics of the language in reasonably few rewriting schemata.

[[λx.M1] M2]   →   M1[x ← M2]
[if #t M1 M2]   →   M1
[if #f M1 M2]   →   M2
[car (S1 . S2)]   →   S1
[cdr (S1 . S2)]   →   S2
[cons S1 S2]   →   (S1 . S2)
[eq? S S]   →   #t
[eq? S1 S2]   →   #f     if the Sk are different.

Rewriting relation → is the compatible closure of these schemata; that is, for any context C and terms Mk, if M1 → M2 then C[M1] → C[M2].  Relation → is also Church-Rosser:  although a given term M may be rewritable in more than one way, any resulting difference can always be eliminated by later rewriting.  That is, the reflexive transitive closure →* has the diamond property:  if M1 →* M2 and M1 →* M3, then there exists M4 such that M2 →* M4 and M3 →* M4.

Formal equality of terms, =, is the symmetric closure of →* (thus, the reflexive symmetric transitive compatible closure of the schemata, which is to say, the least congruence containing the schemata).

Another important relation is operational equivalence, ≅.  Two terms are operationally equivalent just if replacing either by the other in any possible context preserves the observable result of the computation.  M1 ≅ M2 iff for every context C and S-expression S,  C[M1] ↦* S  iff  C[M2] ↦* S.  (Fwiw, relation ↦ is what the computation actually does, versus → which is anything the rewriting calculus could do; → is compatible Church-Rosser and therefore nice to work with mathematically, but ↦ is deterministic and therefore we can be sure it does what we meant it to.  Another way of putting it is that → has the mathematical character of λ-calculus while ↦ has the practical character of Turing-machines.)

Operational equivalence is exactly what must be guaranteed in order for an optimizing compiler to safely perform a local source-to-source transformation:  as long as the two terms are operationally equivalent, the compiler can replace one with the other in any context.  The rewriting calculus is operationally sound if formal equality implies operational equivalence; then the rewriting calculus can supply proofs of operational equivalence for use in optimizing compilation.

Before moving on, two other points of interest about operational equivalence:

  • Operational equivalence of S-expressions is trivial; that is, S1 and S2 are operationally equivalent only if they are identical.  This follows immediately by plugging the trivial context into the definition of operational equivalence, C[Sk] ≡ Sk.  Thus, in every non-trivial operational equivalence M1 ≅ M2, at least one of the Mk is not an S-expression.

  • All terms in the calculus — all M-expressions — are source expressions; but if there were any terms in the calculus that were not source expressions, they would be irrelevant to a source-to-source optimizer; however, if these non-source terms could be usefully understood as terms in an intermediate language used by the compiler, an optimizer might still be able to make use of them and their formal equalities.

Universal program

McCarthy's Lisp language was still in its infancy when the project took an uncontrollable turn in a radically different direction than McCarthy had envisioned going with it.  Here's what happened.

A standard exercise in theory of computation is to construct a universal Turing machine, which can take as input an encoding of an arbitrary Turing machine T and an input w to T, and simulate what T would do given input w.  This is an extremely tedious exercise; the input to a Turing machine looks nothing like the control device of a Turing machine, so the encoding is highly intrusive, and the control device of the universal machine is something of an unholy mess.  McCarthy set out to lend his new programming language mathematical street cred by showing that not only could it simulate itself with a universal program, but the encoding would be much more lucid and the logic simpler in contrast to the unholy mess of the universal Turing machine.

The first step of this plan was to describe an encoding of programs in the form of data structures that could be used as input to a program.  That is to say, an encoding of M-expressions as S-expressions.  Much of this is a very straightforward homomorphism, recursively mapping the non-data M-expression structure onto corresponding S-expressions; for our toy language, encoding φ would have

  • φ(x) = symbol s formed by changing the letters of its name from lower-case to upper-case. Thus, φ(foo) = FOO.
  • φ([λx.M]) = (LAMBDA φ(x) φ(M)).
  • φ([M1 M2]) = (φ(M1) φ(M2)).
  • φ([if M1 M2 M3]) = (IF φ(M1) φ(M2) φ(M3)).
  • φ([car M]) = (CAR φ(M)).
  • φ([cdr M]) = (CDR φ(M)).
  • φ([cons M1 M2]) = (CONS φ(M1) φ(M2)).
  • φ([eq? M1 M2]) = (EQ φ(M1) φ(M2)).

(This encoding ignores the small detail that certain symbol names used in the encoding — LAMBDA, IF, CAR, CDR, CONS, EQ — must not also be used as variable names, if the encoding is to behave correctly.  McCarthy seems not to have been fussed about this detail, and nor should we be.)

For a proper encoding, though, S-expressions have to be encoded in a way that unambiguously distinguishes them from the encodings of other M-expressions.  McCarthy's solution was

  • φ(S) = (QUOTE S).

Now, in some ways this is quite a good solution.  It has the virtue of simplicity, cutting the Gordian knot.  It preserves the readability of the encoded S-expression, which supports McCarthy's desire for a lucid encoding.  The main objection one could raise is that it isn't homomorphic; that is, φ((S1 . S2)) is not built up from φ(S1) and φ(S2).

As McCarthy later recounted, they had expected to have plenty of time to refine the language design before it could be implemented.  (The FORTRAN compiler, after all, had been a massive undertaking.)  Meanwhile, to experiment with the language they began hand-implementing particular functions.  The flaw in this plan was that, because McCarthy had been so successful in demonstrating a universal Lisp function eval with simple logic, it wasn't difficult to hand-implement eval; and, because he had been so successful in making the encoding lucid, this instantly produced a highly usable Lisp interpreter.  The sudden implementation precipitated a user community and substantial commitment to specifics of what had been a preliminary language design.

All this might have turned out differently if the preliminary design had allowed first-class functions to be elements in pairs.  A function has to be encoded, homomorphically, which would require a homomorphic encoding of pairs, perhaps φ((M1 . M2)) = (CONS φ(M1) φ(M2)); once we allow arbitrary M-expressions within the pair syntax, (M1 . M2), that syntax itself becomes a pair constructor and there's really no need for a separate cons operator in the M-language; then CONS can encode the one constructor.  One might then reasonably restrict QUOTE to base cases; more, self-encode () #t and #f, leaving only the case of encoding symbols, and rename QUOTE to SYMBOL.  The encoding would then be fully homomorphic — but the encodings of large constant data structures would become unreadable.  For example, fairly readable constant structure

would encode through φ as
That didn't happen, though.

The homomorphic, non-QUOTE encoding would naturally tend to produce a universal function with no practical potential for meta-programming.  In theory, one could use the non-homomorphic QUOTE encoding and still not offer any native meta-programming power.  However, the QUOTE-based encoding means there are data structures lying around that look exactly like working executable code except that they happen to be QUOTE‍d.  In practice, the psychology of the notation makes it rather inevitable that various facilities in the language would allow a blurring of the line between data and code.  Lisp, I was told when first taught the language in the 1980s, treats code as data.  Sic:  I was not told Lisp treats data as code, but that it treats code as data.

In other words, Lisp had accidentally become an interpreted language; a profoundly different beast than the compiled language McCarthy had set out to create, and one whose character naturally suggests a whole different set of features that would not have occurred to someone designing a compiled language in 1960.  There were, of course, some blunders along the way (dynamic scope is an especially famous one, and I would rate the abandonment of fexprs in favor of macros as another of similar magnitude); but in retrospect I see all that as part of exploring a whole new design space of interpreted features.  Except that over the past three decades or so the Lisp community seems to have somewhat lost track of its interpreted roots; but, I'll get back to that.

Of interest:

  • In S-expression Lisp, all source expressions are S-expressions.  It is no less true now than before that an operational equivalence M1 ≅ M2 can only be nontrivial if at least one of the Mk is not an S-expression; but now, if the Mk are source expressions, we can be absolutely certain that they are both S-expressions.  So if the operational equivalence relation is restricted to source expressions, it's trivial.  This isn't disastrous; it just means that, in order to have nontrivial theory, we are going to have to have some terms that are not source expressions (as Landin did, though for a different reason); and if we choose to compile the language, we won't be allowed to call our optimizations "local source-to-source" (any given optimization could be one or the other, but not both at once).

  • This is the fork in the road where Muller and I went our separate ways.  Muller's M-Lisp, taking the compiled-language view, supposes that S-expressions are encoded homomorphically, resulting in a baseline language with no native meta-programming power.  He then considers how to add some meta-programming power to the resulting language.  However, practical meta-programming requires the programmer be able to write lucid code that can also be manipulated as data; and the homomorphic encoding isn't lucid.  So instead, meta-programming in Muller's extended M-Lisp uses general M-expressions directly (rather than their encodings).  If an M-expression turns out to be wanted as data, it then gets belatedly encoded — with the drawback that the M-expression can't be rewritten until the rewriting schemata can tell it won't be needed as data.  This causes difficulties with operational equivalence of general M-expressions; in effect, as the burden of meta-programming is shifted from S-expressions to general M-expressions, it carries along with it the operational-equivalence difficulties that had been limited to S-expressions.


McCarthy hadn't finished the details of M-expressions, so S-expression Lisp wasn't altogether an encoding of anything; it was itself, leaving its implementors rather free to invent it as they went along.  Blurring the boundary between quoted data and unquoted code provided meta-programming facilities that hadn't been available in compiled languages (essentially, I suggest, a sort of flexibility we enjoy in natural languages).  In addition to QUOTE itself (which has a rather fraught history entangled with first-class functions and dynamic scope, cf. §3.3.2 of my dissertation), from very early on the language had fexprs, which are like LAMBDA-constructed functions except that they treat their operands as data — as if the operands had been QUOTE‍d — rather than evaluating them as code (which may later be done, if desired, explicitly by the fexpr using EVAL).  In 1963, macros were added — not the mere template-substitution macros found in various other languages, but macros that treat their operands as data and perform an arbitrary computation to generate a data structure as output, which is then interpreted as code at the point of call.

But how exactly do we specify the meanings of programs in this interpreted S-expression language?  We could resort to the meta-circular evaluator technique; this is a pretty natural strategy since an evaluator is exactly what we have as our primary definition of the language.  That approach, though, is difficult to work with mathematically, and in particular doesn't lend itself to proofs of operational equivalence.  If we try to construct a rewriting system the way we did before, we immediately run into the glaring practical problem that the same representation is used for executable code, which we want to have nontrivial theory, and passive data which necessarily has perfectly trivial theory.  That is, as noted earlier, all source expressions are S-expressions and operational equivalence of S-expressions is trivial.  It's possible to elaborate in vivid detail the theoretical train wreck that results from naive application of the usual rewriting semantics strategy to Lisp with quotation (or, worse, fexprs); but this seems to be mainly of interest if one is trying to prove that something can't be done.  I'm interested in what can be done.

If what you want is nontrivial theory, that could in principle be used to guide optimizations, this is not difficult to arrange (once you know how; cf. my past discussion of profundity index).  As mentioned, all nontrivial operational equivalences must have at least one of the two terms not a source expression (S-expression), therefore we need some terms that aren't source expressions; and our particular difficulty here is having no way to mark a source expression unmistakably as code; so, introduce a primitive context that says "evaluate this source expression".  The new context only helps with operational equivalence if it's immune to QUOTE, and no source expression is immune to QUOTE, so that's yet another way to see that the new context must form a term that isn't a source expression.

Whereas the syntax of the compiled M-language had two kinds of terms — constant S-expressions and computational M-expressions — the syntax of the interpreted S-language will have three kinds of terms.  There are, again, the "constant" terms, the S-expressions, which are now exactly the source expressions.  There are the "computational" terms, which are needed for the actual work of computation; these are collectively shaped something like λ-calculus.  We expect a big part of the equational strength of the rewriting calculus to reside in these computational terms, roughly the same equational strength as λ-calculus itself, and therefore of course those terms have to be entirely separate from the source expressions which can't have nontrivial equational theory.  And then there are the "interpretation" terms, the ones that orchestrate the gradual conversion of source expressions into computational expressions.  The code-marking terms are of this sort.  The rewriting rules involving these "interpretation" terms will amount to an algorithm for interpreting source code.

This neat division of terms into three groups won't really be as crisp as I've just made it sound.  Interpretation is by nature a gradual process whose coordination seeps into other parts of the grammar.  Some non-interpretation terms will carry along environment information, in order to make it available for later use.  This blurring of boundaries is perhaps another part of the flexibility that (I'll again suggest) makes interpreted languages more similar to natural languages.

I'll use nonterminal T for arbitrary terms.  Here are the interpretation forms.

T   ::=   [eval T T] | ⟨wrap T⟩ | e   .
Form [eval T1 T2] is a term that stands for evaluating a term T1 in an environment T2.  This immediately allows us to distinguish between statements such as
S1 ≅ S2
[eval S1 e0] ≅ [eval S2 e0]
∀e, [eval S1 e] ≅ [eval S2 e]   .
The first proposition is the excessively strong statement that S-expressions Sk are operationally equivalent — interchangeable in any context — which can only be true if the Sk are identically the same S-expression.  The second proposition says that evaluating S1 in environment e0 is operationally equivalent to evaluating S2 in environment e0 — that is, for all contexts C and all S-expressions S, C[[eval S1 e0]] ↦* S  iff  C[[eval S2 e0]] ↦* S.  The third proposition says that evaluating S1 in any environment e is operationally equivalent to evaluating S2 in the same e — which is what we would ordinarily have meant, in a compiled language, if we said that two executable code (as opposed to data) expressions Sk were operationally equivalent.

The second form, ⟨wrap T⟩, is a wrapper placed around a function T, that induces evaluation of the operand passed to the function.  If T is used without such a wrapper (and presuming T isn't already a wrapped function), it acts directly on its unevaluated operand — that is, T is a fexpr.

The third form, e, is simply an environment.  An environment is a series of symbol-value bindings, ⟪sk←Tk⟫; there's no need to go into gory detail here (though I did say more in a previous post).

The computational forms are, as mentioned, similar to λ-calculus with some environments carried along.

T   ::=   x | [combine T T T] | ⟨λx.T⟩ | ⟨εx.T⟩   .
Here we have a variable, a combination, and two kinds of function.  Form ⟨λx.T⟩ is a function that substitutes its operand for x in its body T.  Variant ⟨εx.T⟩ substitutes its dynamic environment for x in its body T.

Form [combine T1 T2 T3] is a term that stands for combining a function T1 with an operand T2 in a dynamic environment T3.  The dynamic environment is the set of bindings in force at the point where the function is called; as opposed to the static environment, the set of bindings in force at the point where the function is constructed.  Static environments are built into the bodies of functions by the function constructor, so they don't show up in the grammar.  For example, [eval (LAMBDA X FOO) e0] would evaluate to a function with static environment e0, of the form ⟨wrap ⟨λx.[eval FOO ⟪...⟫]⟩⟩ with the contents of e0 embedded somewhere in the ⟪...⟫.

Putting it all together,

T   ::=   x | [combine T T T] | ⟨λx.T⟩ | ⟨εx.T⟩ | [eval T T] | ⟨wrap T⟩ | e | S   .

The rewriting schemata naturally fall into two groups, those for internal computation and those for source-code interpretation.  (There are of course no schemata associated with the third group of syntactic forms, the syntactic forms for passive data, because passive.)  The computation schemata closely resemble λ-calculus, except with the second form of function used to capture the dynamic environment (which fexprs sometimes need).

[combine ⟨λx.T1⟩ T2 T3]   →   T1[x ← T2]
[combine ⟨εx.T1⟩ T2 T3]   →   [combine T1[x ← T3] T2 T3]   .
The interpretation schemata look very much like the dispatching logic of a Lisp interpreter.
[eval d T]   →   d
        if d is an empty list, boolean, λ-function, ε-function, or environment
[eval s e]   →   lookup(s,e)     if symbol s is bound in environment e
[eval (T1 T2) T3]   →   [combine T1 T2 T3]
[combine ⟨wrap T1⟩ T2 T3]   →   [combine T1 [eval T2 T3] T3]   .
(There would also be some atomic constants representing primitive first-class functions and reserved operators such as if, and schemata specifying what they do.)

Half a century's worth of misunderstandings and confusions

As I remarked earlier, Lisp as we know it might not have happened — at least, not when and where it did — if McCarthy had thought to allow first-class functions to occur in pairs.  The thing is, though, I don't think it's all that much of an "accident".  He didn't think to allow first-class functions to occur in pairs, and perhaps the reason we're likely to think to allow them today is that our thinking has been shaped by decades of the free-wheeling attitude fostered by the language that Lisp became because he didn't think to then.  The actual sequence of events seems less unlikely than one might first suppose.

Researchers trying to set up semantics for Lisp have been led astray, persistently over the decades, by the fact that the primary Lisp constructor of first-class functions is called LAMBDA.  Its behavior is not that of calculus λ, exactly because it's entangled with the process of interpreting Lisp source code.  This becomes apparent when contemplating rewriting calculi for Lisp of the sort I've constructed above (and have discussed before on this blog):  When you evaluate a LAMBDA expression you get a wrapped function, one that explicitly evaluates its operand and then passes the result to a computational function; that is, passes the result to a fexpr.  Scan that:  ordinary Lisp functions do not correspond directly to calculus λ-abstractions, but fexprs do correspond directly to calculus λ-abstractions.  In its relation to Lisp, λ-calculus is a formal calculus of fexprs.

Much consternation has also been devoted to the perceived theoretical difficulty presented by Lisp's quotation operator (and presented in more extreme form by fexprs), because it presents a particular context that can distinguish any two S-expressions placed into it:  (QUOTE S1) and (QUOTE S2) are observably distinct whenever the Sk are distinct from each other.  Yet, this observation only makes sense in a compiled programming language.  Back in the day, it would have been an unremarkable observation that Lisp only has syntax for data structures, no syntax at all for control.  Two syntactically distinct Lisp source expressions are operationally non-equivalent even without any quotation or fexpr context, because they don't represent programs at all; they're just passive data structures.  The context that makes a source expression code rather than data is patently not in the source; it's in what program you send the source expression to.  Conventional small-step operational semantics presumes the decision to compile, along with a trivial interpretive mapping between source expressions and internal computational terms (so the interpretive mapping doesn't have to appear explicitly in the rewriting schemata).  It is true that, without any such constructs as quotation or fexprs, there would be no reason to treat the language as interpreted rather than compiled; but once you've crossed that Rubicon, the particular constructs like quotation or fexprs are mere fragments of, and can be distractions from, the main theoretical challenge of defining the semantics of an interpreted language.

The evolution of Lisp features has itself been a long process of learning how best to realize the flexibility offered by interpreted language.  Fexprs were envisioned just about from the very beginning — 1960 — but were sabotaged by dynamic scope, a misfeature that resulted from early confusion over how to handle symbol bindings in an interpreter.  Macros were introduced in 1963, and unlike fexprs they lend themselves to preprocessing at compile time if one chooses to use a compiler; but macros really ought to be much less mathematically elegant... except that in the presence of dynamic scope, fexprs are virulently unstable.  Then there was the matter of first-class functions; that's an area where Lisp ought to have had a huge advantage; but first-class functions don't really come into their own without static scope (The Art of the Interpreter noted this) — and first-class functions also present a difficulty for compilers (which is why procedures in ALGOL were second-class).  The upshot was that after Lisp 1.5, when Lisp splintered into multiple dialects, first-class functions went into eclipse until they reemerged in the mid-1970s when Scheme introduced static scope into the language.  Fexprs held on longer but, ironically, were finally rejected by the Lisp community in 1980 — just a little ahead of the mainstream adoption of Scheme's static-scope innovation.  So for the next twenty years and more, Lisp had static scope and first-class functions, but macros and no fexprs.  Meanwhile, EVAL — key workhorse of meta-linguistic flexibility — was expunged from the new generation of mainstream Lisps and has had great difficulty getting back in.

The latter half of Lisp's history has been colored by a long-term trend in programming language design as a whole.  I've alluded to this several times above.  I have no specific sources to suggest for this; it's visible in the broad sweep of what languages were created, what research was done, and I've sensed it though my interactions with the Lisp community over the past thirty years.  When I learned Lisp in the mid-1980s, it was from the Revised Maclisp Manual, Saturday Evening Edition (which I can see a few feet away on my bookshelf as I write this, proof that manuals can be well-written).  Maclisp was a product of the mentality of the 1970s.  Scheme too was a product of that mentality.  And what comes through to me now, looking back, isn't the differences between those languages (different though they are), but that those people knew, gut deep, that Lisp is an interpreted language — philosophically, regardless of the technical details of the language processing software.  The classic paper I cited above for the relationship between first-class functions and static scope was one of the "lambda" papers associated with the development of Scheme: "The Art of the Interpreter".  The classic textbook — the Wizard Book — that emerged from the Scheme design is Structure and Interpretation of Computer Programs.

But then things changed.  Compilation had sometimes intruded into Lisp design, yes (with unfortunate results, as I've mentioned), but the intrusion became more systematic.  Amongst Scheme's other achievements it had provided improved compilation techniques, a positive development but which also encouraged greater focus on the challenges of compilation.  We refined our mathematical technology for language semantics of compiled languages, we devised complex type systems for use with compiled languages, more and more we designed our languages to fit these technologies — and as Lisp didn't fit, more and more we tweaked our Lisp dialects to try to make them fit.  Of course some of the indigenous features of Lisp couldn't fit, because the mathematical tools were fundamentally incompatible with them (no pun intended).  And somewhere along the line, somehow, we forgot, perhaps not entirely but enough, that Lisp is interpreted.  Second-class syntax has lately been treated more and more as if it were a primary part of the language, rather than a distraction from the core design.  Whatever merits such languages have, wholeheartedly embracing the interpretation design stance is not among them.

I'm a believer in trying more rather than less.  I don't begrudge anyone their opportunity to follow the design path that speaks to them; but not all those paths speak to me.  Second-class syntax doesn't speak to me, nor recasting Lisp into a compiled language.  I'm interested in compiling Lisp, but want the language design to direct those efforts rather than the other way around.  To me, the potential of interpretation beckons; the exciting things we've already found on that path suggest to me there's more to find further along, and the only way to know is to follow the path and see.  To do that, it seems to me we have to recognize that it is a distinct path, the distinct philosophy of interpretation; and, in company with that, we need to hone our mathematical technology for interpreted languages.

These are your father's parentheses
Elegant weapons
for a more... civilized age.
xkcd 297.


  1. I enjoyed this blog post and I followed the technical developments, but I wouldn't be able to describe concisely and precisely after reading it what is the difference between interpretation and compilation that you are referring to -- if it is a "theoretical" difference I would expect a concise, formal expression of it.

    1. (argh; trouble getting the comments interface to work right.)

      Finding clear explanations is hard. Okay, here's another try:

      In the compiled mindset, the form of the source-code is understood to directly encode the intended computational behavior, and the form of expression has no other significance. The rewriting schemata explicitly express only computation, while translation of source-code to computation is embodied only by the selection of which schema to apply — which must be independent of context if the step relation is to be compatible.

      In the interpreted mindset, the process of translation from source-code to computation is understood to involve behavioral awareness of the form of expression. This requires the rewriting schemata to explicitly address the translation as well as the computation; and, since the significance of source-code is now inherently dependent on context (because of behavioral awareness of form of expression), nontrivial operational equivalence needs the reducible terms to be non-source-code.

  2. I think what you are calling compiled vs. interpreted languages are already known by other names. It's common to call them static vs. dynamic languages, though this gets tangled up with the orthogonal question of type: Scheme is a static language with dynamic types, and I was reading a while back about a dynamic language that has statically typed records. Going further back, static languages are fundamentally non-self-modifying languages running on Harvard machines, whereas dynamic languages are self-modifying languages running on Von Neumann machines.

    1. My first instinct is that the interpreted/compiled distinction I'm after is simpler than dynamic/static, involving less of the machinery of language (and probably being of lower dimensionality). It's a tricky point, though; lots of food for thought there.

  3. "To me, the potential of interpretation beckons; the exciting things we've already found on that path suggest to me there's more to find further along, and the only way to know is to follow the path and see."

    I wholeheartedly agree. To me, one of the main attractions of interpretation is that it puts the focus back on runtime: this is important because, on the Internet, "it's always runtime somewhere". One machine's code is always another machine's data.

    I think Alan Kay understood this deeply with Smalltalk and its view of the 'net as made of small organic objects that can update at any time. And I think this understanding was almost completely lost with C++ and Java and their focus on compilation.

    Really fascinating to see some of this history, though I knew a little of it - in particular, that M-expressions used case to distinguish between symbols and variables. Is that where Prolog inherited that particular (unfortunate) quirk from?

    But I'm confused by this:

    "In S-expression Lisp, all source expressions are S-expressions... if the Mk are source expressions, we can be absolutely certain that they are both S-expressions. So if the operational equivalence relation is restricted to source expressions, it's trivial."

    How can we be 'absolutely certain'? Didn't we just establish that in 'S-expression Lisp', the φ-encoding translates S-expressions (from original M-expression Lisp) into (QUOTE) expressions? And M-expressions into non-QUOTE forms of S-expressions?

    So we're not now dealing with a language of M-expressions and S-expressions - we're dealing with a language of non-QUOTE-expressions and QUOTE-expressions. So shouldn't the triviality rule in S-expression Lisp actually substitute the words "QUOTE-expression" for "S-expression"? As in:

    "Operational equivalence of QUOTE-expressions is trivial; that is, Q1 and Q2 are operationally equivalent only if they are identical."

    And isn't that (at least part of) the source of the unjust banning of FEXPRS this whole time?

    Forgive me if I'm missing something blindingly obvious here. Why do we need the machinery of 'some terms that are not source-expressions'? Don't the source-expressions encode the M/S difference sufficiently?

  4. Ah... is the problem that in S-expression Lisp, the entire program can be itself wrapped in a (QUOTE) expression, and so the φ-encoding step would have to be performed, selectively, on all (QUOTE)-expressions that might be fed to an (EVAL)? And that it's hard to figure out in advance (without some machinery like that of vau-calculus) which blocks of data might end up in an EVAL?

    "The context that makes a source expression code rather than data is patently not in the source; it's in what program you send the source expression to."

    This seems to be the key to everything to me. And working programmers keep something like this in mind all the time, even when we're looking at 'just data'.

    'Operational equivalence' isn't just a property that belongs to 'code'. It belongs to 'raw data' as well. Consider whitespace, and case sensitivity, to take two examples. The idea of whether two chunks of text 'are equal' depends on the context in which we think this text is going to be evaluated. Is it going to a case-insensitive tool? Then all forms of upper and lower case are equivalent. Is it going to be subjected to standard tokenisation and whitespace-stripping? Then the whitespace between tokens can be disregarded for equality purposes.

    Can we tease something out of this programmer's intuition and formalise it?

    1. The entire program could be wrapped in a (QUOTE); yes :-). Some folks have pointed out that, if all you have to worry about is QUOTE, not fexprs, you can still figure out at compile-time when passages are QUOTEd. I think those folks miss something important, though, because if you're serious about metaprogramming, you don't want your ability to reason about things to be banned from ever going inside a QUOTEd expression. If your opotmization is completely defeated by QUOTEd expressions then you'll be highly motivated to avoid them and that will mean avoiding metaprogramming. (Anyway, the trick where you figure out at compile-time which subexpressions are QUOTEd doesn't work with fexprs. In a language with fexprs, determining which operands are to be evaluated is undecidable, like the halting problem.)

  5. This comment has been removed by the author.

  6. (I wish I could edit without deleting).

    I guess what I'm trying to ask is: can we apply some of the thinking that's gone into compilation, and apply it to compilation-like preprocessing of *data*?

    We already have an idea a little like 'compilation of data': 'normal forms', as in SQL. Is there perhaps some middle ground between techniques developed for optimising of code, and techniques databases use for optimising queries and indexes?

    Could some kind of 'data compilation' help sort out some of the mess that parsing is in, where we have lots of ad-hoc parsers just crawling with terrifying security vulnerabilities, wired straight into the Internet?

    An idea that's easy to say but hard to formalise, I guess.

    1. (I agree it's annoying not to be able to edit comments; working on wikis has spoiled me.)

      There are several different ways to get at the trivialization-of-theory effect; I'm glad one of them clicks for you. I do like that one: the difference between code and data expressions is in the program it's sent to, and therefore of course it's not to be found in the expression itself.

      The other way I was getting at, to do with encoding φ. In the M-language, the source expressions are the M-expressions; but in the S-language, the source expressions are the φ-encodings of the M-expressions — and all φ-encodings are S-expressions. Therefore, in the S-language, all source expressions are S-expressions. But nothing has changed about the reasoning that said an operational equivalence between S-expressions can only be trivial (reflexive). So it's formally necessary that, in order to have nontrivial operational equivalence for the S-language, you have to have some terms that aren't source expressions. That's the flip side of the intuitive observation that the difference between code and data isn't in the source expression but rather in the choice of what program to send it to.

      I hadn't thought of it in terms of milder forms of parsing. Interesting; could be something there. Stuff like that is often hard to pin down; myself, I tend to collect not-quite-pinned-down insights like that, writing down my thoughts on each and revisiting them from time to time to see if I can glimpse a bit more in them. (Lately they take the form of partial drafts for blog posts; then I've got lots of older ones that are partial drafts of LaTeX documents.)

    2. For a while now I've wanted to read your uncompleted drafts, even if they inscrutable and disjointed, because I think that there are some partial ideas that get held back by the publication boundary. You yourself said that "Fundamental Concepts in Programming Languages" said slightly different things as lecture notes than it did as a polished paper.

    3. @patchworkZombie: The other thing that annoys me about this interface is how easy it is to reply to the post as a whole when you mean to reply to the last comment; see below. :-P

  7. @patchworkZombie: Sometimes the long-term drafts do get completed and posted, but often the one that gets posted was the most recently created, while older ones get salvaged for parts. There's awkward overlap between some current drafts, as I struggle to wrap my head around some very big ideas. Fwiw, here are some of the more promising working titles; the list started in reverse-alphabetical order by file name, and I kept that order because it makes a surprising amount of sense.

    The twisted hourglass. The imho beautiful side of category theory; heavily visual. My main current interest, being delayed by technical challenge of image files.

    Pitfalls of alternative science. How to pursue alternative ideas without sounding like, or being, a crackpot. Tiny bits of this sometimes show up in posted entries. The whole may never materialize, because it requires discussing my experience some years ago trying to get a paper peer-reviewed on the theory of fexprs, and I still haven't figured out how to talk about it without bitterness (which would defeat the intent of this draft).

    Non-sapience and anti-sapience. What isn't sapient, what sapience doesn't do, what actively suppresses sapience. Aspires to consider how to promote humanity in a sapience-hostile civilization.

    Self-organizing civilization. "How are we, as a civilization, to make decisions?" Aspires to subsume politics and economics.

    Refactor — eliminating lexical nouns and verbs. My other conlang project. I see ways to eliminate nouns/verbs either lexicially or grammatically, but not both as once.

    The quantum approximation. About whether it's mathematically possible to account for quantum mechanics using a physics TOE such as I described in my cohygiene post.

    Nabla. Trying to work out the, er, quintessence of quaternions.

    Markup languages and escaping the macro tarpit. Proposing markup languages should be designed using fexprs instead of macros.

    Lisp, mud, wikis, and reimagining programming. Radically crowd-sourced programming that maximizes the sapient element rather than suppressing it.

    How do we know it's right?. Discusses a "three-fold path": mechanistic reasoning (such as elaborate type systems), testing, and self-evidence.

    Church's 1932/3 logic. I haven't seen a good exposition of it on the internet, so decided to write one. Bogged down over what to do with the battery of axioms whose individual significance is obscure.

    1. "Nabla. Trying to work out the, er, quintessence of quaternions."

      Mmmm. This is relevant to my interests! In a hobbyist sense, anyway.

      I've been wondering for a long time about what a true 4-dimensional "quaternion nabla", in Q rather than relativistic Minkowski-space, would be. I mean, given Q = S + V, I think quaternion nabla is well-known to be:

      (dS/dt - div S) + (dV/dt + curl V + grad S)

      ... but what, if anything, does that physically *mean*? It's easy enough to see that (dS/dt + dV/dt) is a '4-delta'... and it's also easy to see that (-div S + curl V) is the natural result of treating -div+curl as a single operation (as an angle between two vectors reduces, the less curl, the more div, etc - and in fact -div, as 'convergence', was the orginal symbol)... and with a bit of blinking it's also easy to see that (dS/dt + grad S) makes a nice natural '4-grad' operation - take the gradient in three dimensions and then add the *change over time* to get an arrow in 4D pointing into the future or past to see where 'the most stuff' is...

      ... but adding a curl to a grad to a delta? Ugh! Not only is that nasty and unnatural to think about, that's adding a pseudovector to a vector and breaks all our maths. And yet it's there baked into the simple quaternion equation! What's going on?

      "Markup languages and escaping the macro tarpit. Proposing markup languages should be designed using fexprs instead of macros."

      "Lisp, mud, wikis, and reimagining programming. Radically crowd-sourced programming that maximizes the sapient element rather than suppressing it."

      Both of these sound wonderful, and match up with my personal intuition (coming from growing up in the 1980s with all the 1970s microcomputer-hippie magazines) that computing should be about personal mental freedom and _expanding our intelligence_, and that handing more control to the user is almost always the correct design decision.

      I'm very concerned that, globally, the IT industry is heading in exactly the OPPOSITE direction. A reason why I think your research is very important for the future.

    2. Here's an idea I've been playing with recently, which is very small and simple but may have some merit.

      I've been interested for a while in the problem of the simplest possible syntax for representation of arbitrarily-structured data. At the moment, we have formats like XML (very heavyweight, a complex data model, syntactically nasty for humans to parse); JSON (a much simpler data model, but still syntactically too heavyweight if you're trying to build a domain-specific language, and its objects/dictionaries are way too limited because you can't have objects as keys).

      I mention 'syntax' as a problem because, although to most programmers and mathematicians it's not that important, when you're trying to transfer data between different programming languages each with their own custom syntax, you quickly start getting round-trip issues and faults. Even adding just one character to the reserved syntax adds exponential complexity. I'm thinking here about the problem of how to store data for 1,000 years - and transfer it between 1,000 systems.

      (This is a similar kind of problem that Curtis Yarvin's 'Urbit' is trying to address, though I disagree with almost all of his design decisions.)

      S-expressions are really, really nice for a lot of things! Very syntactically lightweight, so great as a transfer serialisation syntax. But because they're only designed to handle lists, not object-like data, they need to be augmented somehow.

      I've come up with an idea that might be a tiny step forward:

      1. Take ordinary S-expressions
      2. Reserve one symbol, say '|', but it could be anything.
      3. In our new modified S-expressions, any list with CAR of | is now a 'logical term' - just like Prolog term structure. The CADR is the 'functor', the CDDR is the 'body'. If the CAR is not |, then it's just an ordinary list.

      What this does is it effectively gives us an 'escape' from nesting level, which lets us make any list become a term - ie, a typed chunk of arbitrary structured data, that can be used exactly like a symbol. In fact, we could recursively define symbols or numbers *as* such terms, all the way down to the bare metal!

      Eg a string "ABC" could be a term like (| string 65 66 67).

      This maps almost exactly onto Prolog term structure - except for two things: 1. we no longer need the separate [] syntax for lists, and 2. we can append or splice a structured term into a list just like we could a number/symbol with dotted-pair notation.

      That last is important because it means we could have a term (| set 1 2 3) representing a set of entries - we could then splice that onto a list as (a b c |
      set 1 2 3) and now we have something that looks like a dictionary. Which is really, really hard to express otherwise in S-expressions without some kind of universal convention like this.

      Effectively this turns Prolog terms inside out, and undoes a small but annoying piece of 'abstraction inversion' - where the Prolog list [A,B,C} is structure like '.(a, .(b, .(c, .())))' , the equivalent expression would literally just be '(a b c)'

      Do you think this idea could be useful?

    3. Eg, could this perhaps be relevant to the M-expression -> S-expression conversion issue?

      We could define M-expressions in this new syntax as any term beginning with (|M ...) Then we wouldn't necessarily need a separate category for S-expressions - they would just be 'all list structure, including terms that don't start with |M'.

      It would then be obvious that the original triviality rule about S-expressions was too hasty - if M-expressions *are a subset of S-expressions* then it cannot be true that 'all S-expressions are trivial' but simply 'we can prove by this rule that all M-expressions are non-trivial; other S-expressions we have not yet proved by this rule to be non-trivial. There may be other non-triviality rules'.

      Similarly, a lambda form might be (|lambda..) rather than (lambda..) - a slight change, but it would mean we could now embed lambdas directly into *pair* structure rather than being limited to being cells within *list* structure. A lambda could then, eg, perhaps act directly as a lazy cons cell - (1 2 3 |lambda...) meaning 'the entire rest of this list is generated by this function'.

      This comes back, I think, to your thoughts about where Lisp might have gone if there'd been time to rethink the M/S expression calculus before release. As you mentioned, we'd immediately lose a lot of the need for CONS and quoting. Maybe not all, but a lot.

    4. I realise that current Lisps give us the ability to put lambdas (or any other non-pair object) in CDR place at runtime using CONS. What I think we would gain from a | syntax is the ability to automatically and transparently _serialise and deserialise_ such structured objects as S-expression structure. That means these objects would no longer be opaque and inexpressible intermediate steps but could be exposed as data and acted on by other language layers. That may be a good thing or a bad thing, but it feels intuitively to me as if it unsnarls quite a few things which are currently tangled.

  8. I also realise that we still do need *something* like quoting - eg, source code has to exist as data somewhere, and if we're Lisp people

    Trying to wrap my head around the idea of "terms not in the source expression", are those the "eval, wrap and e" terms? And is that linked to the idea of markers for explicit evaluation?

    Is the reason why a marker like 'eval' cannot occur in a source expression because it would immediately be acted upon by the interpreter and replaced with something else? Am I on the right track or way off?

    I share your conviction that explicit evaluation is more correct than implicit, but I have quite a mental block about the idea of *where* evaluation occurs in an explict-evaluation model. And how to stop it happening before you want it to.

    I have this vague idea in the back of my head that I've not yet been able to make come forward and fully identify itself: that a nice computing model would be like a spreadsheet, with a lot of 'cells' which have both an interior (eg expression / formula / definition) and an exterior (eg computed value). And that it might be useful to hold BOTH of those 'values' at once, and allow a visitor to decide at runtime (subject to some kind of security access check) whether they want to read the 'inside' or the 'outside' of the cell. As, eg, the user of a spreadsheet can do.

    The main idea, I think, being that this wouldn't be quite a 'term-rewriting' system because the definition-side of the term wouldn't vanish; it would still be there even after it was 'computed'. Probably, though, with all the intermediate steps being thrown away.

    This would be very similar to how we keep 'source files' and 'object files' of all our compiled software - except that we'd do it for ALL computation, at a fine-grained level, not just the large chunks we call 'programs'.

    It would also be similar to how Unix filesystems can have 'symbolic links' (or Windows can have 'shortcut files') : the symbolic link is like a function definition, the file it redirects to is like a computed value. We'd maybe like to store both, and allow the user to read one or the other at will. If we can come up with one universal mechanism for solving this problem - since it recurs in so many contexts - that would be a win.

    I wish I could convince my brain to be clearer about what it means by this, or how to do it.

    1. It's entirely possible for you to be on to something, and at the same time be struggling to get it straight in your own mind; sometimes the fact that it's hard to fully wrap your head around it is a sign it's especially worth pursuing (even /after/ I came up with the vau device it took some serious thinking to convince myself it would really work). I've found playing around with implementing things like this can sometimes be a great aid to thinking about them.

  9. Ug. "Source code has to exist as data somewhere, and if we're Lisp people, we'd like that source code data representation to be homoiconic - as much like its final form as possible."

    1. Really dumb idea, but... what if, in the same way that Lisp storage is made out of CONS cells which have CAR and CDR...

      What if we extended this notion of 'cell' to include LAMBDAs. So we have a CAR-equivalent which reads the definition, and a CDR-equivalent which applies it to an argument and reads the computed value?

      And as we have a (pair?) function which tests if a cell is a pair, similarly we'd have a (lambda?) function which tests if a cell is a lambda.

      And then a program could be entirely in charge of its own computation... do it in small steps or big steps or none at all. When reading something you know to be source code? Read the definition, all the way. When you just don't care and want an answer? Say 'compute it all, recursively, with infinite storage and compute time'. Or maybe you want 'the first 100 values, then give me a lambda that computes the rest', so you can feed that into a GUI display window and wait for the user to click 'next page' before you render the next lot.

      Taking 'explicit eval' to the verge of insanity, perhaps. But... don't we actually do that already? Time-slicing OS schedulers, interrupts, etc already make decisions about how much of a program to execute and when to override even machine code. And cloud hypervisors, etc, etc. What if we cut out all those middlemen and just let a language be its own interpreter/scheduler?

    2. I think it's well worthwhile to think deeply about variants on the Lisp core model, both because of what new things one might discover and because of the insights one might get into Lisp itself. For example: Another draft blog post that I didn't list above, because it got sort-of-abandoned, was something I'd thought up because I was looking for a way to do "keyword parameters" that would appeal to my sense of elegance (I don't like second-class syntax). My idea was that in Lisp, there are just a few data types involved in the evaluation algorithm, each with a dedicated purpose: environments provide symbol bindings, continuations provide behavior, pairs provide syntactic structure. Since keyword parameters seem to be intruding on the syntactic territory of pairs, thus compromising the boundaries between the types, perhaps one could restore balance by making pairs just a special case of environments (so, no longer a primitive type at all). I didn't pursue it too far because I had doubts about losing the orthogonality between namespaces and syntax; but I never deleted the draft, in case someday I wanted to explore it more after all.

    3. "Since keyword parameters seem to be intruding on the syntactic territory of pairs, thus compromising the boundaries between the types, perhaps one could restore balance by making pairs just a special case of environments "

      This is pretty much an idea I've been pursuing for a few years: that if we extend pairs to binary relations, we get a superset of pairs, sets and dictionaries. One data type to rule them all!

      Unfortunately it gets a little hairy precisely *because* Lisp pairs + sets are a proper superset of binary relations. I think it's perfectly sensible myself, but it does mean you end up with a data type that's bigger than functions and bigger even than relations (eg, it includes bare sets of atoms), which gets awkward because maths hasn't really standardised how to handle stuff like that. Eg, you can _model_ a set as a one-place relation, but it's not _actually_ the same thing: if you're using Lisp pairs and atoms as primitives, the set {X} is fairly obviously not the same as the set {(X . nil)}. This sort of fussy off-by-one problem tends to matter in programming. What exactly is the arity of the bare set {X}? It can't be '1' because arity-1 is {(X)}, with that extra nil. Is it arity 0? Can we even apply the idea of arity to a set of things other than pairs? The normal maths answer here is just 'you can't apply that abstraction here' which is unhelpful if you're sitting at an actual computer looking at an actual set of data parsed from the Web, containing a bunch of atoms and a bunch of pairs. Maths or no maths, it exists and now you have to describe it somehow.

      But I like the idea of thinking about a program text as potentially a structured object rather than just a list of tokens. Eg a source file of function definitions might perhaps be an object where the keys were the function names and the values were the definitions.

  10. @Nate Cull:

    "I'm very concerned that, globally, the IT industry is heading in exactly the OPPOSITE direction. A reason why I think your research is very important for the future."

    Yes, it's heading in the opposite direction. Which connects to the draft Non-sapience and anti-sapience. Basically, entities like governments and mega-corporations would find people easier to deal with if people were less sapient; and the role of those entities in deciding the direction of technological development connects to draft Self-organizing civilization. Which is the sort of thing I had in mind when I mentioned struggling to wrap my head around very big ideas: that's four drafts all on aspects of one very big idea.

    (A third annoying thing about this comments interface is that comments can only be nested one layer deep.)

  11. I found an interesting sentence in Teitelman's phd thesis PILOT (1966) page 35
    "In LISP, all data are in the form of symbolic expressions, or S-expressions. S-expressions are of indefinite length and have a branching tree structure in which subexpressions can be readily isolated. LISP computations are also written in the form of S-expressions. This makes LISP especially adaptable for our purposes. Like machine languages, and unlike most other higher level languages, one can write programs in LISP which will generate programs for further execution. Furthermore, it is possible to execute data as programs, and conversely treat programs as data."

    1. Nice. Speaking of "computing should be about personal mental freedom and expanding our intelligence". :-)

  12. I'm baffled by your epsilon abstraction: does it only come into play in the interpreter? In the term

    [combine ⟨εx.T1⟩ T2 T3] → [combine T1[x ← T3] T2 T3]

    does T1[x <- T3] reduce to a lambda abstraction? If not, what is it doing? I don't want to waste your time, I would be obliged to you if you would take the time to explain it to me. Thanks.

    1. Given this schema for epsilon, yes, at some point a lambda will be needed to discharge the combine. I described epsilon more in an earlier post.