Toward the Bootstrap

 Prev   Top   Next 

As explained in another part of this report, the algol2MMIX compiler was written in a stylized subset of the C language for which it would be possible to (semi-)automatically generate the corresponding Algol68Nix code. This document describes in some more details this process, and reports on what worked and what went wrong when trying to make the algol2MMIX compiler compiling itself.

The basic reason for the use of stylized C code to implement the "pre-egg" version of the compiler is that both C and Algol68Nix are descendants of the Algol68 algorithmic language: hence, there must have been enough commonalities to make the intersection language powerful enough.

Indeed, most of the differences are syntactical, in the sense that the two language use different lexemes for the same token; some differences were more substantial --- like the use of the semicolon as a terminator (in C) rather than as a separator (in Algol68Nix), or the need of the lexeme return in C to specify the value returned by a given function (whereas no special syntax is required in Algol68Nix). Still, such differences were easy to come around (as it was also noticed in class).

On the contrary, two differences needed manual care to be accommodated (and that's why the C-to-Algol68Nix translation is semi-automatic): user-defined mode declarations and nested procedure declarations. In particular, the absence of nested procedure declarations in C conflicted with the prerequisites of modularity and expressiveness of the design of the Algol68Nix language.

The easiest solution out of this conflict was not to make use of non-local variables in writing the compiler (so that the absence of nested procedures couldn't hurt too much), while still allowing for nested procedure and non-local variable access in the implemented language. In other words, nested procedures are supported and correctly handled by the algol2MMIX compiler, but the code for the compiler itself does not make use of this facility, and the .a68 code that is automatically generated from the .C version must be massaged a bit, adjusting user-defined mode declarations and moving all procedure declarations within the body of a unique, "main" procedure (to conform to the rule that an Algol68Nix program consists of a single procedure.)

The building process for the algol2MMIX compiler is carried out by the relative Makefile. In particular, the C versions can be compiled into object files by typing (in a shell):

$ make objects

The corresponding .a68 files can be generated by typing:

$ make a68

Notice, however, that doing this will not produce working Algol68Nix files right away: as explained above, a manual pass is need to correct the user-defined mode declarations, and to move all the procedures and the global variables inside the main. The .a68 files bundled with this report are already up-to-date: generating them from scratch is not suggested, as this manual pass could be annoying to perform.

Once the objects have been built, it is possible to compile the scanner:

$ ./nix-compile nix-scanner

Here, nix-compile is a batch file that calls in turn the scanner, the parser and the analyzer on the file given as argument (without the .a68 extension), and if everything succeeds it creates a file with the same name and .mms extension (the standard extension for MMIX assembly files).

Before getting MMIX object code, it is necessary to assemble the file nix-scanner.mms using mmixal:

$ mmixal nix-scanner.mms

Now it is possible to run the MMIX version of the scanner nix-scanner.mmo on any Algol68Nix file, like nix-scanner.a68 itself:

$ mmix -fnix-scanner.a68 nix-scanner.mmo > nix-scanner.tok

This will produce a stream of tokens for the given .a68 file. To verify that this has been done correctly, it is possible to check the file nix-scanner.tok against the output of nix-scanner:

$ ./nix-scanner < nix-scanner.a68 > nix-scanner.check.tok
$ diff nix-scanner.check.tok nix-scanner.tok

The diff command should now reply with a new command line, indicating that no differences were found. Repeating this on a couple of .a68 files will show that the .mmo version of the scanner is functionally equivalent to the object version produced before --- a partial bootstrap.

Next, the same steps can be repeated on the parser:

$ ./nix-compile nix-parser
$ mmixal nix-parser.mms
$ mmix -fnix-scanner.tok nix-parser.mmo > nix-scanner.tree
$ ./nix-parser < nix-scanner.tok > nix-scanner.check.tree
$ diff nix-scanner.check.tree nix-scanner.tree

This will show that the .mmo version of the parser parses the scanner exactly in the same way as the initial object version of nix-parser. Of course, it is possible to try this again on the file nix-parser.a68 , but it will take considerably longer (the second line will require about 45 minutes!):

$ mmix -fnix-parser.a68 nix-scanner.mmo > nix-parser.tok
$ mmix -fnix-parser.tok nix-parser.mmo > nix-parser.tree
$ ./nix-parser < nix-parser.tok > nix-parser.check.tree
$ diff nix-parser.check.tree nix-parser.tree

After a considerable amount of effort, nix-parser.mmo is able to parse correctly the source code for itself --- another partial bootstrap.

As for the reasons for such long execution times, the problem is twofold: on one hand, executing .mmo files on the MMIX simulator is already pretty slow; on the other hand, the MMIX assembly code produced by the algol2MMIX compiler is sub-optimal (see the next section for possible improvements).

It is now time to consider the last module of the algol2MMIX compiler, i.e. the semantic analyzer and code generator. After compiling and assembling it...

$ ./nix-compile nix-analyzer
$ mmixal nix-analyzer.mms

...it is possible to try it on the parse tree for the file nix-scanner.a68...

$ mmix -fnix-scanner.tree nix-analyzer.mmo > nix-scanner.mms
^C

...but on a 900Mhz Pentium III it kept running for over 24 hours before being killed. It is hard to say whether the analyzer is just incredibly slow, or if something went wrong and it enter an infinite loop.
However, on simple programs like helloWorld.a68 or indices.a68, the bootstrapping version of the analyzer and code generator is able to do its job:

$ mmix -fhelloWorld.a68 nix-scanner.mmo > helloWorld.tok
$ mmix -fhelloWorld.tok nix-parser.mmo > helloWorld.tree
$ mmix -fhelloWorld.tree nix-analyzer.mmo > helloWorld.mms
$ mmixal helloWorld.mms
$ mmix helloWorld.mmo
Hello World!
...
$ mmix -findices.a68 nix-scanner.mmo > indices.tok
$ mmix -findices.tok nix-parser.mmo > indices.tree
$ mmix -findices.tree nix-analyzer.mmo > indices.mms
$ mmixal indices.mms
$ mmix indices.mmo let us see if you know how to revert a list of strings
strings
of
list
a
revert
to
how
know
you
if
see
us
let

Here, helloWorld.a68 is the classic Hello World! program, while indices.a68 is a simple program that reverses its command line arguments.

In conclusion, the algol2MMIX compiler proved to be able to correctly compile non-trivial programs, like a 1,500-line scanner and a 2,700-line recursive descent parser. The algol2MMIX compiler couldn't achieve a complete bootstrap; however, partial bootstrap results were obtained: both the scanner and the parser accomplished their own "little bootstrap". Moreover, the compiler was able to process the nix-analyzer.a68 module, producing a .mmo version that, although not bootstrapping, was still able to correctly generate code for simple programs like helloWorld.a68 and indices.a68.


Last modified: 08/30/02
Antonio Nicolosi