Universal Lambda

This universal lambda language is based directly on John Tromp's idea and design with a few minor changes to make it more suitable for golfing while still being extremely simple in nature.

Listed below are all resources needed to learn how to write and run universal lambda code. Download all files here (only runs on old version of ruby, see update below).

Update: Jan 11, 2020: Lynn has pointed out that this code has broken with changes to ruby and possibly Haskell. Lynn has updated the code and is maintaining it here. I defer to Lynn's code rather than maintaining it here.

Last code update: Nov 14, 2008, see readme file for change log. Apologies to Irori and hinoe, because this makes the problems hello world and fizzbuzz much easier after they made good entries in previous version. Your code is still nice though and I hope if you submit over it you alter your name so it does not get removed. Next version (hopefully soon) will be much more efficient at handling church numbers and operations on them.

1.1 Core Language

1.2 Lambda Calculus

1.3 Assembler Tutorial

1.4 Example

1.5 Possible Future Improvements

1.6 Golfing Ideas

Core Language

Lamda Binary files (*.lamb) represent the lambda calculus by acting as a list of bits.

00 denotes λ abstraction
01 denotes application, equivalent to ` in unlambda
1's followed by 0 denotes the argument of the nth outer λ's argument, where n is the number of 1's before the first 0 (De Bruijn index in unary).

In this manner the language is the same as Tromp's except the bits are just packed into bytes.

Input and output are done by treating your program as a function which takes a string (the input), and returns a string (the output). This is very similar to lazy-k, except that nil is used to terminate the strings instead of a char with value 256. Lazy-k is also a lazy language, making it very similar and a useful resource.

Lambda Calculus

See wikipedia's articles on Lambda Calculus and Church encoding to learn this.

Strings are represented as a list of characters. Where a character is just the church numeral corresponding to its ascii value and a list is a nesting of pairs. Here are some useful definitions:

 nil = λa b.b        #end of list
 cons = λa b x.x a b #place a at the front of list b
 true = λa b.a
 false = λa b.b
 head = λl. l true   #first element of a list
 tail = λl. l false  #all but the first element
 null = λl. l (λa b c. false) true #test if a list is empty
 Y = λf. (λx.x x)(λx.f (x x))      #y combinator

Note the slightly non standard choice of nil, chosen by Tromp, because it allows for simpler list processing.

Assembler Tutorial

Since it is cumbersome to code in binary, included is an assembler which accepts a much more readable description of a lambda expression (*.lam).

(M N) = function application, M(N)
\a.M = abstraction, λa->M
a = argument lookup

Application is left associative, so M N O = (M N) O
λ expressions go until ), so \a.a b = (\a.M N), and not (\a.M) N
Multiple arguments can be specified in the same λ expression, which just means to curry them: (\a b c.b a c) = (\a.\b.\c.b a c)
# comments out the rest of the line
Assignment can be used with the = operator and ended with a newline, it can then be used like a bound variable. Keep in mind however that this is purely syntactic sugar. For example:

 a=x y z
 a b a

Is converted to:

 (\a.a b a) (x y z)

Thus all assignments are final, and all assignments have their own scope. This would lead to larger programs if the variables were never used or used only once, but the assembler also attempts some size optimization by looking for applications which can be made at compile time resulting in a shorter size. For large programs it is impossible to make this optimization perfect, but in general it is pretty good with a basic algorithm. Note however that currently the algorithm does no checking to see if rearranging assignments (or arguments in curried functions) would result in shorter size, so place commonly used variables closer in scope to where they are used.

The lamb interpreter takes all command lines as files, concatenating them along with the standard input at the end. It does not distinguish between source file and input; it parses everything until all applies have two arguments, then uses all the remaining bytes as the input. One thing this allows you to do is prepend text to the standard input by placing it at the end of the source file. To do this with a lam file, end your program with a ' or ", and immediately follow it with the text. ' is for raw string, " is to first escape \n and such. For example the hello world program can be written as:

 (\a. a) "Hello, world!\n

Example

Let's make a program which prints out all of the permutations of an input string. Based on this pseudo code:

 perm(pre, s)=
   if s is empty
     output pre, newline
   else
     for i=0 to s.size-1
       perm pre+s[i], s[0...i]+s[i+1..-1]
 perm "",input

First all side effects must be removed, so we must make perm return a string instead of printing it as it goes. The for loop has been moved into its own recursive function with sums the results as it goes:

 inject(init,a,block)=
   if a is empty
     ""
   else
     block(init,a)+inject(init+a[0,1],a[1..-1],block)
 perm(pre,s)=
   if s is empty
     pre+"\n"
   else
     inject "",s, λ left,right . perm(pre+right[0,1],left+right[1..-1]
 output perm("",input)

Now we will start writing in lam notation, defining some useful functions along with the inject function. You can ignore foldr and inject for now, the explanation of perm should cover all the important things in them.

 false=\a b.b
 true=\a b.a
 cons=\a b x.x a b #place a character at the front of a string
 Y=\f.(\x.x x)(\x.f (x x)) #Y combinator
 foldr=\f init.Y(\self list.list (\lhs rhs init.f lhs (self rhs)) init)
 concat=\a b.foldr cons b a #concatenate two strings 
 rcons=\a b.concat a (cons b false) #append a character to a string
 null=\list.list (\a b i.false) true #return true if list is empty
 inc=\n f x.f (n f x) #add 1 to a church numeral
 pow=\a b.b a
 2=(inc \a.a)
 3=inc 2
 10=inc (pow 3 2) #ascii value of newline
 inject=Y \self init a block.
   a (\lhs rhs empty.
     concat (block init a) (self (rcons init lhs) rhs block)
   ) false

Named recursion is impossible since the function would need to pass itself as an argument to itself, which it cannot do since it does not know what itself is. To get around this the Y combinator is used, it takes a function and returns a function, passing the result to the function.

 perm=Y \self pre s.
   null s (rcons pre 10) (
   inject false s \left right.
     self (rcons pre (right true)) (concat left (right false)))
 \input.perm false input

Here self will be the same as perm, allowing the recursion. Important things to note are if statements are simulated by just passing 2 arguments to a boolean function and that the first element of a list is list(true), and the rest is list(false).

Straight compilation of our program results in a 717 bit result, but the assembler finds that it can make things shorter by making some of the function applications now instead of at run time. This results in a 458 bit program. Some more bits can be saved by noticing that (\input.perm false input) is the same as just (perm false), saving 6 more bits. The final result is

 010001000100010001011111000000001010101100000000000100000110010111111011001111
 100101111100010011111001111100010010101011111111000000000010111000000001011111
 111111111001011111011111101111100101011111111001011111111111110111111011101101
 111000001000001010000001011111100101111111101111001100000110010111111111011001
 100000100000100000000111001011110110100000010111101100001011011000001000000101
 111000000101100000000001011011110011111110111011101100001000110100001110011010

Not a pretty sight, but slightly more pretty after trying the disassembler on it:

 > lama perm.lam
 size = 717 bits
 optimized = 458 bits
 > lamd perm.lamb
 (a = (\a.
   (b = (\b.a (b b))
   b b)
 )
 (b = (\b c.a (\d e.e (\f g h i.i f (d g)) c) b)
 (c = (\c d.b c (\e.e d (\f g.g)))
 (\d.a (\e f g.g (\h i j k l.l) (\h i.h) (c f (\h i.h ((j = (\j k.j (j (j k)))
 j (j h)) i))) (a (\h i j k.j (\l m n.b (k i j) (h (c i l) m k)) (\l m.m)) (\h i.i) g (\h i.e (c f (i (\j k.j))) (b h (i (\j k.k)))))) (\e f.f) d))))
 > echo -n 123 | lamb perm.lamb
 123
 132
 213
 231
 312
 321

This result can probably be improve a great deal, but after just this initial attempt its score of 56 bytes is not bad at all, considering first place python is 79 bytes and it doesn't have to build up all its utility functions from scratch, then again it doesn't get to compact instructions down to 2 bits either.

Possible Future Improvements

Write a faster interpreter. Currently it is based off of the shortest beta normalizer and pretty fast, but it could be about 8 times faster if lamb files were converted to Haskell and then compiled, instead of just interpreted. However due to a bug in ghc sometimes the compilation process never terminates.
Make the interpreter deduce when functions are church numerals and use native +, -, *, etc to make math operations on large church numerals plausible. Not sure if is actually possible to make all the deductions needed in a reasonable manner.
If non list of characters sent to output, instead of bad error message, print out a prettied remaining lambda expression to stderr
In assembler size optimizer:
- Check for things of form \a.M a -> M
- Check for changing argument order / scope of arguments
- Increase brute force search depth
- Try applying the greedy search in different orders
Better error reporting in assembler
Allow for infix of operators and such in lam parser
Allow for indentation based parenthesis in lam parser
Minimally parenthesize disassembly output

Golfing Ideas

These are ways to make the code for most problems much shorter, but the language would lose its purity as such a simple language if they were made. For now none will be made.

Predefine a few functions, such as the Y combinator, string to int, int to string, etc. These would be accessed in the same way free variables are, so it is backwards compatible.
Use something like fibonacci coding instead of unary to specify var lookups. This would complement having more predefined functions, but it is not backwards compatible.

If you would like to contribute helping with the improvements or just making suggestions about any ideas (or bugs) email me at: Image:email.png