How to Obtain MLRISC
Problem Statement
MLRISC Based Compiler
MLRISC Intermediate Representation
MLRisc Generation
Back End Optimizations
Register Allocation
Machine Description
Garbage Collection Safety
System Integration
Graphical Interface
Line Counts
Systems Using MLRISC
Future Work
Architecture of MLRISC
The MLTREE Language
MLTree Extensions
MLTree Utilities
Instruction Selection
Machine Code Emitters
Delay Slot Filling
Span Dependency Resolution
The Graph Library
The Graph Visualization Library
Basic Compiler Graphs
SSA Optimizations
ILP Optimizations
Optimizations for VLIW/EPIC Architectur...
Register Allocator
Back Ends
The Alpha Back End
The PA RISC Back End
The Sparc Back End
The Intel x86 Back End
The PowerPC Back End
The MIPS Back End
The TI C6x Back End
Basic Types
Client Defined Constants
Client Defined Pseudo Ops
Instruction Streams
Label Expressions

Garbage Collection Safety

Garbage Collection Safety
Safety Framework
Concurrency Safety


High level languages such as SML make use of garbage collectors to reclaim unused storage at runtime. For generality, I assume that a precise, compacting garbage collector is used. In general, low-level optimizations that reorder instructions pass gc safepoints, when applied naively, are not safe. In general, two general classes of safety issues can be identified:
derived values
A derived value x is a value that are dependent on the addresses of one of more heap allocated objects a1,a2,a3,... and/or the recent branch history. When these allocated objects a1,a2,a3,... are moved by the garbage collector, x has to be adjusted accordingly.

For example, inductive variable elimination may transformed an array indexing into a running pointer to the middle of an array object. Such running pointer is a derived value and is dependent on the starting address of the array.

The main difficulty in handling a derived value x during garbage collection is that sometimes it is impossible or counter-productive to recompute from a1,a2,a3,.... For example, when the recent branch history is unknown, or when the precise relationship between x and a1,a2,a3,... cannot be inferred from context. We call these unrecoverable derived values.

incomplete allocation
If heap allocation is performed inlined, then code motion may render some allocation incomplete at a gc safepoint. In general, incomplete allocation has to be completed, or rolled backed and then reexecuted after garbage collection, when the source language semantics allow it.

Typically, two gc safepoints cannot be separated by an unbounded number of allocations, which implies that in general, optimizations that move instructions between basic blocks are unsafe when naively applied, which greatly limits the class of optimizations in such an environment to trivial basic block level optimizations. framework is a necessity.

Safety Framework

MLRISC contains a gc-safety framework for performing aggressive machine level optimizations, including SSA-based scalar optimizations, global instruction scheduling, and global register allocation. Unlike previous work in this area, phases that perform optimizations and phases that maintain and update garbage collection information are completely separate, and the optimizer is constructed in a fully modular manner. In particular, gc-types and safety constraints are parameterizable by the source language semantics, the object representation, and the target architectures.

This framework has the following overall structure:

Garbage collection invariants annotation
The front-end client is responsible for annotating each value in the program with a gc type, which is used to specify the abstract object representation, and the constraints on code motion that may be applied to such a value. The front-end uses an architecture independent RTL language for representing the program, thus this annotation phase is portable between target architectures.
GC constraints propagation
After instruction selection, gc constraint are propagated throughout the machine level program representation. Again, for portability, gc typing rules are specified in terms of the RTL of the machine instructions. In this phase, unsafe code motions which exposes unrecoverable derived values to gc safepoints are automatically identified. (Pseudo) control dependence and anti-control dependence constraints are then added the program representation to prohibit all gc-unsafe code motions.
Machine level optimizations
After constraints propagation, traditional machine level optimizations such as SSA optimizations and/or global scheduling are applied, without regard to gc information. This is safe because all gc safety constraints have been translated into the appropriate code motion constraints.
GC type propagation and gc code generation
GC type inference is performed when all optimizations have been performed. GC safepoints are then identified and the root sets are determined. In addition, compensation code are inserted at gc points for repairing recoverable derived values.

Concurrency Safety

In the presence of concurrency, i.e. multiple threads of control that communicate via a shared heap, the above framework will have to slightly extended. As in before, we assume that context switching can only occur at well-defined safepoints. The crucial aspect is that values that are live at safepoints must be classified as local or global. Local values are only observable from the local thread, while global values are potentially observable and mutable from other threads. The invariants to maintain are as follows:
  • Only local and recoverable derived values may be live at a safepoint,
  • Only local and recoverable allocation may be incomplete at a safepoint

Lal George
Allen Leung
SML/NJ Validate this page
Generated by mltex2html
Last modified: Thu Jan 9 19:38:15 EST 2003 by leunga@slinky