The story so far.. a small journal about my first compiler. I think it
will be fun to document the intermediate designs, since I anticipate a
lot of rewriting and about-turning. Especially if people read this and
help put me on the right track.. :-)


The idea is to write a compiler that translates Core Erlang (from the
HiPE guys) into sensible-looking Common Lisp. Fun is the name of the
game. This is my crack at immitating the wonderful style of people
like Guy Steele, particularly his _Programming Language Based on
Constraints_ and _Connection Machine Lisp_ -- two great documents
available on the web.

The compiler's basic specification is to take a Core Erlang program as
input and return an equivalent Lisp program as output.

----------------------------------------
Circa March 2003:

Rough goal: Run the factorial program.

Rough plan: implement each Core Erlang construct with a Lisp macro, so
that Core Erlang parse trees can be directly macro-compiled into
Lisp. I'm using CMU Common Lisp for this work.

I implemented this by first writing factorial in Erlang, compiling it
to Core Erlang with "erlc +dcore", and then hand-translating the
resulting code into a convenient sexp syntax - a hypothetical parse
tree. With a few macros, this program was easily compiled to CL and
executed via the shell.

The language support was as minimal as possible for the factorial
program. Only partial pattern matching, and nothing resembling
concurrency.

Notable features:

  An Erlang module compiles into a Common Lisp package, and each
  Erlang function into a defun'd Lisp function.

  Using CL's multiple-values support to match Core Erlang's - a very
  nice fit!

  Atoms are represented as symbols in a special "atom" package.


----------------------------------------
April 2003:

Rough goal: Add a parser, then compile and execute the "real"
factorial program generated by erlc.

Rough plan: Snarf a lex and yacc clone. Use them to automatically
translate core erlang programs into sexps, which can be compiled with
the previously defined macros.

I found a suitable lexer (clex.lisp) and parser (lalr.lisp), and used
them to do an approximate Core Erlang parser. However, I changed my
course while writing the parser: instead of returning a straight parse
tree, the compilation is done directly in the parser actions.

For example, when encountering "foo/1 = fun (X) -> ... end" (a Core
Erlang function definition), the parser might be expected to return
something like:

  (FUNCTION-DEF foo/1 (X) ...)

But I decided to "cut to the chase", and instead make the parsing
action return the translation into Common Lisp:

  (DEFUN foo/1 (X) ...)

And so most of the compilation is done in parser actions. Not all of
it though: pattern matching is written as a macro called MATCH-CASE,
and the parser simply returns (MATCH-CASE ...) forms for later
macro-expansion.

Amusingly, since the parser is bottom-up (LR), this is perhaps a
"recursive ascent" compiler. Having the parser dictate control flow in
the program seems awkward at a glance, but so far it has not been a
problem. The Core Erlang design seems to very neatly permit an
expression to be compiled all by itself without reference to its
parents or children.

So far I'm enjoying this approach of compiling directly in the
parser. I like the fact that my "parse trees" contain straight
readable Common Lisp code, with just enough use of macros where it
improves readability. I don't think it would be as nice with pure
macros: my parse tree would show no details of the compilation, and I
don't know a way to macro-expand "just enough" of a program to really
see how it works. (I don't think there's an easy way..)

Here is an example of a function going from Erlang to Core Erlang to
Lisp:

Erlang:

  fact(N) when N > 0 -> N * fact(N-1);
  fact(0)            -> 1.

Core Erlang (from erlc):

  'fact'/1 =
      fun (_cor0) ->
	  case _cor0 of
	    <N> when ( call 'erlang':'>'
			   (N,
			    0)
		       -| [5] ) ->
		let <_cor1> =
		    ( call 'erlang':'-'
			  (N, 1)
		      -| [5] )
		in  let <_cor2> =
			apply 'fact'/1
			    (_cor1)
		    in  ( call 'erlang':'*'
			      (N, _cor2)
			  -| [5] )
	    <0> when 'true' ->
		1
	    <_cor3> when 'true' ->
		primop 'match_fail'
		    ({'function_clause',_cor3})
	  end

Lisp (from CLAW):

   (DEFUN |fact|::|fact/1| (VAR::|_cor0|)
     (MATCH-CASE VAR::|_cor0|
		 ((VAR::N) (CALL 'ATOM::|erlang| 'ATOM::> VAR::N 0)
		  (MULTIPLE-VALUE-BIND
		      (VAR::|_cor1|)
		      (CALL 'ATOM::|erlang| 'ATOM::- VAR::N 1)
		    (MULTIPLE-VALUE-BIND
			(VAR::|_cor2|)
			(|fact|::|fact/1| VAR::|_cor1|)
		      (CALL 'ATOM::|erlang| 'ATOM::* VAR::N VAR::|_cor2|))))
		 ((0) 'ATOM::|true| 1)
		 ((VAR::|_cor3|) 'ATOM::|true|
		  (PRIMOP "match_fail"
		   #('ATOM::|function_clause| VAR::|_cor3|)))))

The 'fact' module becomes a Lisp package called "fact". Lisp
package-qualified symbol names mean that e.g. the symbol
|fact|::|fact/1| refers to "The symbol 'fact/1' in the 'fact'
package." Similarly, 'ATOM::* means "The symbol '*' in the ATOM
package", and likewise for VAR. The |bars| are to tell Lisp that the
symbol is case-sensitive.

On the one hand we have here clear distinctions between function
names, variables, and atoms. On the other, it's awfully verbose to
look at.

Notable features:

- Now also using a separate Lisp package called VAR for variables
  appearing in Core Erlang code. All Erlang variables are represented
  as symbols in that package, which avoids any risk of naming
  collisions with the code in the CLAW package that the compiler
  generates. This means I can use regular symbols in macro-expansions
  instead of ugly GENSYMs, just so long as the compile is
  careful not to have naming clashes in its own code.

- Merged/rewrote everything into a single file called "claw.lisp".

- Pattern matching syntax:

    (MATCH-CASE E
      ((PATTERN GUARD BODY)
       ...))

  Where PATTERN is a sexp (matched by structure, symbols are
  variables, numbers matched by value, etc.) GUARD can be any
  expression, and BODY is only executed if the guard is true.

----------------------------------------
April 2003

Rough goal: make the generated code more readable.

Rough plan: use reader-macros and pretty-printing to create some
syntactic sugar.

The first step for more readable output is to use some special syntax
for atom and variable references. I took the character @ to mark atoms
(they kinda rhyme), and $ to mark variables. So:

  @FooBar <=> 'ATOM::|FooBar|
  $FooBar <=> VAR::|FooBar|

Note that @atoms are implicitly quoted, and that in each syntasx the
symbol is treated case-sensitively without the need for |bars|. This
is implemented by straight-forward hackery of "reader macros" and the
pretty-printing framework. I have never looked at Common Lisp's pretty
printer before, and I have to say I'm impressed!

The revised factorial function (heck, whole program) looks like this:

 (PROGN
  (DEFPACKAGE "fact" (:USE) (:EXPORT "fact/1" "module_info/0" "module_info/1"))
  (IN-PACKAGE "fact")
  (DEFVAR |fact|::|*attributes*| 'NIL)
  (DEFUN |fact|::|fact/1| ($_cor0)
    (MATCH-CASE $_cor0
		(($N) (CALL @erlang @> $N 0)
		 (MULTIPLE-VALUE-BIND
		     ($_cor1)
		     (CALL @erlang @- $N 1)
		   (MULTIPLE-VALUE-BIND
		       ($_cor2)
		       (|fact|::|fact/1| $_cor1)
		     (CALL @erlang @* $N $_cor2))))
		((0) @true 1)
		(($_cor3) @true
		 (PRIMOP "match_fail" #(@function_clause $_cor3)))))
  (DEFUN |fact|::|module_info/0| ()
    (MATCH-CASE (VALUES)
		(NIL @true NIL)
		(NIL @true (PRIMOP "match_fail" #(@function_clause)))))
  (DEFUN |fact|::|module_info/1| ($_cor0)
    (MATCH-CASE $_cor0
		(($_cor2) @true NIL)
		(($_cor1) @true
		 (PRIMOP "match_fail" #(@function_clause $_cor1)))))
  "fact")

Passing this to EVAL actually defines the program, so long as the
MATCH-CASE pattern-matching macro is defined. The program's exported
functions can then by called directly from the Lisp shell like this:

  * (|fact|:|fact/1| 10)
  3628800

It's not the prettiest code around, but it is improving.

Notables:

- The current program is 595 lines of code, excluding the lex/yacc
  clones that I downloaded.


----------------------------------------
4th of May, 2003

Started trying to compile lists.erl from stdlib, which flushed out a
bunch of bugs straight away. Now it compiles without warnings, and
some functions are working fine:

  * (|lists|:|map/2| #'1+ '(1 2 3))
  (2 3 4)
  * (|lists|:|foreach/2| #'print '(1 2 3))
  1 
  2 
  3 
  ATOM::|ok|

though runtime support is obviously lacking in various other cases:

  * (|lists|:|flatmap/2| #'list '(1 2 3))
  Error in KERNEL:%COERCE-TO-FUNCTION:  the function ++ is undefined.

I also have CLAW generating actual Lisp source files, and slightly
tweaked the pretty printing. Here's the transformations of the
standard lists:foreach/2 function, first in Erlang:

  foreach(F, [Hd|Tail]) ->
      F(Hd),
      foreach(F, Tail);
  foreach(_, []) -> ok.

then Core Erlang:

  'foreach'/2 =
      fun (_cor1,_cor0) ->
	  case <_cor1,_cor0> of
	    <F,[Hd|Tail]> when 'true' ->
		do  apply F
			(Hd)
		    apply 'foreach'/2
			(F, Tail)
	    <_cor4,[]> when 'true' ->
		'ok'
	    <_cor3,_cor2> when 'true' ->
		primop 'match_fail'
		    ({'function_clause',_cor3,_cor2})
	  end

and into Lisp:

  (defun |foreach/2| ($_cor1 $_cor0)
    (match-case (values $_cor1 $_cor0)
		(($F ($Hd . $Tail)) @true
		 (progn (funcall $F $Hd) (|foreach/2| $F $Tail)))
		(($_cor4 nil) @true @ok)
		(($_cor3 $_cor2) @true
		 (primop "match_fail" #(@function_clause $_cor3 $_cor2)))))

