Research Overview


Project Goal The main goal of this research is to develop techniques for applying aggressive machine level compiler optimizations to high level, typed languages, such as ML, and Java. SSA based scalar optimizations, global instruction scheduling algorithms, and global register allocation algorithms are considered. All optimizations that we consider are performed in the granularity of individual instructions, and not on pseudo instructions.
Main Problems High level languages, unlike their low level counterparts such as C and C++, require non-trivial runtime system support such as garbage collection. Extensive compiler support is necessary to make the a back end work seamlessly in such an environment. Such support includes ways of communicating semantics information such as types and dependence, and safety information such as code motion constraints, with the back end.

For example, to support garbage collection-aware code generation, a back end has to provide hooks for generating garbage collection code, in the form of explicit garbage collection tests, and/or trace tables for locating live objects.

To support precise garbage collection-safe optimization, the problem is even more intricate, since it is necessary for the front-end to specify the safety constraints on optimizations that perform code motion, and to provide a means of tracking transformations done by optimization phases, so that the root sets at gc safe-points can be determined.

In a traditional framework, the goals of retargetability and extensibility are usually compromised given these difficult requirements. In many existing retargetable optimizers, specific source language features are frequently assumed, either explicitly or implicitly. For example, a C-oriented back end may hardwire the C calling convention assumption into its intermediate representation. This makes it very difficult to reuse existing compiler technologies for languages substantially different than C.

Contribution As part of this research, I have developed an optimizer framework on top of the MLRISC customizable back end, to solves these problems. The framework is language agnostic: the optimizer makes no initial assumption about the source language semantics and the features of the target architectures. But instead it allows various components of the optimizer to be customized to fit the task at hand. The original MLRISC framework has already proven itself as a back end for various languages. My research aims to extend this flexibility to a high performance optimizer framework on top of the MLRISC system.
Strategy MLRISC is implemented on top of the Standard ML language, and by taking full advantage of the module system of Standard ML, the intermediate representation and all aspects the optimization phases can be parameterized. These include architecture information, aliasing, dependence, semantics and type annotations from the front-end, and even the intermediate representation itself.

For example, I use the following machine description driven strategy for implementing portable gc-safe optimizations.

Machine Descriptions Driven Optimizations
The operational semantics of each architecture is described in a uniform manner in terms of a Lambda RTL-like machine specification language. Optimizations such as scalar redundancy eliminations in SSA are driven in terms of these specifications.
Customizable GC Types
Garbage collection constraints are expressed in terms of a client defined type system, and typing rules are defined in terms of the RTL of the machine instructions. GC types are automatically translated into the appropriate form of constraints needed by different optimization phases.

An actual optimizer is obtained from specializing a generic optimizer which respect to (i) the client extensions to the immediate representation, (ii) architecture machine descriptions, and (iii) the gc type system.

Given m architectures and n different sources languages with different sets of garbage collection safety constraints, only m different sets of (MLRISC supplied) machine descriptions, and n different sets of client-supplied typing rules are necessary to construct a multi-source/multi-target optimizer.

Validation The optimizer is currently being developed and validated, as the back end for the SML/NJ compiler.
Related Work Here are some current research that are related to this work.
The gcc/egcs compilers
are multi-source and multi-target compilers. However, the source languages are mainly suited to those with runtime system requirements that do not deviate substantially from C.
The lcc compiler
is a multi-target C compiler. Targets from other languages and representations currently exists.
The National Compiler Infrastructure
also aims to develop a multi-source and multi-target compiler intrastructure suitable for compiler research. It includes various research projects including the SUIF compiler, Machine SUIF, and vpo.
C--
is a portable assembly language suitable for high level languages compilation.
More Information MLRISC documentation

Back to my home page Validate this page