Possible Improvements

 Prev   Top   Next 

The final section of this report discusses some possible improvements that could enhance the performance and quality of the algol2MMIX compiler, without changing its design and internal architecture too much.

A first note is about run-time support for memory management: as the full Algol68 language, Algol68Nix was supposed to be a garbage-collected language. Although the design of the compiler included the subdivision of the memory assigned to the running program in three parts (the text segment for the code, the data segment for the control stack , and pool segment the for the heap), no garbage collector was actually implemented (nor does the language provide instructions like free or dispose to return heap-allocated space to the memory manager). An easy solution could be to implement a simple mark-and-sweep garbage collector to reclaim unused memory when the heap pointer reaches the end of the allocated heap; alternatively, if temporary suspension of program execution is undesirable, it is possible to use a reference counting garbage collector, that incrementally collects the unused space as soon as it becomes garbage. However, using such approach within the current design could lead to inefficient use of the heap space. This is because reference counting garbage collection needs to attach a counter to each heap-allocated object: using less that an octa per counter would create alignment problem (since MMIX requires all memory references to be aligned to proper boundaries), resulting in the necessity of adding some space-wasting padding; on the other hand, allocating an octa (i.e. 8 bytes) per object is an overhead that is probably too high in almost all circumstances.

An important change that could make the compiler more useful would be to add a more sophisticated error reporting facility.
Right now, the compiler halts upon encountering the first syntax error, and reports it by simply notifying the line at which the error was detected. To be more user-friendly, the compiler could try to figure out what token is missing or was mistyped (in the case of syntax errors), and suggest a possible correction. One possible approach to achieve this kind of error diagnostic could be to hard-code into the compiler some knowledge about the most common types of errors that programmers usually do: using syntax from other popular programming languages, placing extra semicolon at the end of a clause, or forgetting to use closed-clause-tokens like FI to end a conditional construct or OD to close a FOR...WHILE...DO loop.
Another direction along the same lines would be to do error recovery, i.e. trying to patch somehow the buggy piece of code and go on processing the rest of the program. While this requires perhaps an even greater amount of work compared to what is needed to do plain error diagnostic, it is not clear how much the programmer would benefit from such facility: more often than not, previous errors completely confuse the compiler, so that subsequently reported errors tend to be spurious, and the programmer doesn't even care to look at them --- he/she simply corrects the very first error and compiles the code again.

A major impact on the performance of the compiler would derive by implementing some kind of code optimization module. Indeed, the code generated by the algol2MMIX compiler is simple-minded, featuring quite a bit of redundancy, which makes it a perfect playground for applying local optimizations, such as peephole optimizations and local common sub-expressions elimination.
Given the RISC design of the MMIX architecture, the assembly code output by the compiler is rather similar to a sequence of quadruples. The main difference is that at assembly level there is no type information around anymore; however, for the kind of optimization mentioned above the availability of type information is not crucial. A code optimizer could thus be built as a separated module, transforming .mms files produced by the algol2MMIX compiler into much shorter, and possibly more efficient, functionally-equivalent version of its input.

First, an easy optimization to do could be to remove NOP's from the assembly code: in a couple of cases, the code generator of the algol2MMIX compiler couldn't figure out exactly what instruction to associate to some label, so that the label was first attached to a SWYM operation (the MMIX opcode for "no operation").
This could be implemented within a more general piece of code handling dead code elimination, i.e. finding out instructions in the .mms code that cannot affect the behavior of the program. Such optimization would prove very effective on typical assembly files generated by the algol2MMIX compiler: quite often a value is stored in memory and then loaded again (see for example the assembly file generated for indices.a68); in other cases the same constant is loaded in the same register multiple times, and then stored in contiguous memory locations (see for example the assembly file generated for helloWorld.a68)

Another optimization that could drastically reduce the length of the assembly code generated by algol2MMIX is to track down copy propagation information: in the same fragment of the assembly file generated for indices.a68 that was highlighted above, the value of the register FP is copied in the register EXTRA, and such value is used before the register FP is changed. In this case, a peephole optimizer would substitute the use of the register EXTRA for FP, and then would eliminate the dead code SET EXTRA, FP.

If common sub-expression are recognized and eliminated, another optimization would be possible again in the same piece of assembly code: the constant 40 is loaded twice in the local registers $0 and $2. The effect of all these optimizations on the fragment of assembly code in question is showed in the table below.

Initial code fragment After copy propagation After dead code elimination After common sub-expression elimination Final optimized code
SET VALUE,$0
SETML $0,0
INCL $0,40
STO VALUE,FP,$0
SET EXTRA,FP
SETML $2,0
INCL $2,40
LDO VALUE,EXTRA,$2
SET VALUE,$0
SETML $0,0
INCL $0,40
STO VALUE,FP,$0
SET EXTRA,FP
SETML $2,0
INCL $2,40
LDO VALUE,FP,$2
SET VALUE,$0
SETML $0,0
INCL $0,40
STO VALUE,FP,$0
              
SETML $2,0
INCL $2,40
LDO VALUE,FP,$2
SET VALUE,$0
SETML $0,0
INCL $0,40
STO VALUE,FP,$0
              
              
LDO VALUE,FP,$0
SET VALUE,$0
SETML $0,0
INCL $0,40
STO VALUE,FP,$0
              


Last modified: 08/30/02
Antonio Nicolosi