How to Obtain MLRISC
Problem Statement
MLRISC Based Compiler
MLRISC Intermediate Representation
MLRisc Generation
Back End Optimizations
Register Allocation
Machine Description
Garbage Collection Safety
System Integration
Graphical Interface
Line Counts
Systems Using MLRISC
Future Work
Architecture of MLRISC
The MLTREE Language
MLTree Extensions
MLTree Utilities
Instruction Selection
Machine Code Emitters
Delay Slot Filling
Span Dependency Resolution
The Graph Library
The Graph Visualization Library
Basic Compiler Graphs
SSA Optimizations
ILP Optimizations
Optimizations for VLIW/EPIC Architectur...
Register Allocator
Back Ends
The Alpha Back End
The PA RISC Back End
The Sparc Back End
The Intel x86 Back End
The PowerPC Back End
The MIPS Back End
The TI C6x Back End
Basic Types
Client Defined Constants
Client Defined Pseudo Ops
Instruction Streams
Label Expressions


-Control Flow Graph
-Low-level Interface
-Building a CFG
--Directly from Instructions
--Cluster to CFG
--CFG to Cluster
-Basic CFG Transformations
-Dataflow Analysis
-Static Branch Prediction
-Branch Optimizations


In this section we will describe the MLRISC intermediate representation.

Control Flow Graph

The control flow graph is the main view of the IR. A control flow graph satisfies the following signature:
  signature CONTROL_FLOW_GRAPH = sig
    structure I : INSTRUCTIONS
    structure P : PSEUDO_OPS
    structure C : CELLS
    structure W : FIXED_POINT 
       sharing I.C = C
The following structures nested within a CFG:
  • I : INSTRUCTIONS is the instruction structure.
  • P : PSEUDO_OPS is the structure with the definition of pseudo ops.
  • C : CELLS is the cells structure describing the register conventions of the architecture.
  • W : FIXED_POINT is a structure that contains a fixed point type used in execution frequency annotations.

The type weight below is used in execution frequency annotations:

    type weight = W.fixed_point
There are a few different kinds of basic blocks, described by the type block_kind below:
    datatype block_kind = 
      | STOP          
      | NORMAL        
      | HYPERBLOCK   
A basic block is defined as the datatype block, defined below:
    and data = LABEL  of Label.label
             | PSEUDO of P.pseudo_op
    and block = 
       BLOCK of
       {  id          : int,                      
          kind        : block_kind,                 
          name        :,                    
          freq        : weight ref,                
          data        : data list ref,             
          labels      : Label.label list ref,     
          insns       : I.instruction list ref,     
          annotations : Annotations.annotations ref 
Edges in a CFG are annotated with the type edge_info, defined below:
    and edge_kind = ENTRY           
                  | EXIT           
                  | JUMP          
                  | FALLSTHRU     
                  | BRANCH of bool
                  | SWITCH of int 
                  | SIDEEXIT of int 
    and edge_info = 
        EDGE of { k : edge_kind,                 
                  w : weight ref,               
                  a : Annotations.annotations ref
Type cfg below defines a control flow graph:
    type edge = edge_info edge
    type node = block node
    datatype info = 
        INFO of { regmap      : C.regmap,
                  annotations : Annotations.annotations ref,
                  firstBlock  : int ref,
                  reorder     : bool ref
    type cfg = (block,edge_info,info) graph

Low-level Interface

The following subsection describes the low-level interface to a CFG. These functions should be used with care since they do not always maintain high-level structural invariants imposed on the representation. In general, higher level interfaces exist so knowledge of this interface is usually not necessary for customizing MLRISC.

Various kinds of annotations on basic blocks are defined below:

    exception LIVEOUT of C.cellset   
    exception CHANGED of unit -> unit
    exception CHANGEDONCE of unit -> unit
The annotation LIVEOUT is used record live-out information on an escaping block. The annotations CHANGED and CHANGEDONCE are used internally for maintaining views on a CFG. These should not be used directly.

The following are low-level functions for building new basic blocks. The functions newXXX build empty basic blocks of a specific type. The function defineLabel returns a label to a basic block; and if one does not exist then a new label will be generated automatically. The functions emit and show_block are low-level routines for displaying a basic block.

    val newBlock          : int * -> block      
    val newStart          : int -> block              
    val newStop           : int -> block             
    val newFunctionEntry  : int -> block            
    val copyBlock         : int * block -> block   
    val defineLabel       : block -> Label.label  
    val emit              : C.regmap -> block -> unit
    val show_block        : C.regmap -> block -> string 
Methods for building a CFG are listed as follows:
    val cfg      : info -> cfg    
    val new      : C.regmap -> cfg
    val subgraph : cfg -> cfg     
    val init     : cfg -> unit    
    val changed  : cfg -> unit   
    val removeEdge : cfg -> edge -> unit
Again, these methods should be used only with care.

The following functions allow the user to extract low-level information from a flowgraph. Function regmap returns the current register map. Function regmap returns a function that lookups the current register map. Function liveOut returns liveOut information from a block; it returns the empty cellset if the block is not an escaping block. Function fallsThruFrom takes a node id v and locates the block u (if any) that flows into v without going through a branch instruction. Similarly, the function fallsThruTo takes a node id u and locates the block (if any) that u flows into with going through a branch instruction. If u falls through to v in any feasible code layout u must preceed v.

    val regmap    : cfg -> C.regmap
    val reglookup : cfg -> C.register -> C.register
    val liveOut   : block -> C.cellset
    val fallsThruFrom : cfg * node_id -> node_id option
    val fallsThruTo   : cfg * node_id -> node_id option
To support graph viewing of a CFG, the following low-level primitives are provided:
    val viewStyle      : cfg -> (block,edge_info,info)
    val viewLayout     : cfg -> GraphLayout.layout
    val headerText     : block -> string
    val footerText     : block -> string
    val subgraphLayout : { cfg : cfg, subgraph : cfg } -> GraphLayout.layout
Finally, a miscellany function for control dependence graph building.
    val cdgEdge : edge_info -> bool


The MLRISC intermediate representation is a composite view of various compiler data structures, including the control flow graph, (post-)dominator trees, control dependence graph, and loop nesting tree. Basic compiler optimizations in MLRISC operate on this data structure; advance optimizations operate on more complex representations which use this representation as the base layer.
Click to enlarge
This IR provides a few additional functionalities:
  • Edge frequencies -- execution frequencies are maintained on all control flow edges.
  • Extensible annotations -- semantics information can be represented as annotations on the graph.
  • Multiple facets -- Facets are high-level views that automatically keep themselves up-to-date. Computed facets are cached and out-of-date facets are recomputed by demand. The IR defines a mechanism to attach multiple facets to the IR.

The signature of the IR is listed below

  signature MLRISC_IR = sig
    structure I    : INSTRUCTIONS
    structure CFG  : CONTROL_FLOW_GRAPH
    structure Dom  : DOMINATOR_TREE
    structure Loop : LOOP_STRUCTURE
    structure Util : CFG_UTIL
       sharing Util.CFG = CFG
       sharing CFG.I = I 
       sharing Loop.Dom = CDG.Dom = Dom
    type cfg  = CFG.cfg  
    type IR   = CFG.cfg  
    type dom  = (CFG.block,CFG.edge_info, Dom.dominator_tree
    type pdom = (CFG.block,CFG.edge_info, Dom.postdominator_tree
    type cdg  = (CFG.block,CFG.edge_info, CDG.cdg
    type loop = (CFG.block,CFG.edge_info, Loop.loop_structure
    val dom   : IR -> dom
    val pdom  : IR -> pdom
    val cdg   : IR -> cdg
    val loop  : IR -> loop
    val changed : IR -> unit  
    val memo : (IR -> 'facet) -> IR -> 'facet
    val addLayout : string -> (IR -> GraphLayout.layout) -> unit
    val view : string -> IR -> unit      
    val views : string list -> IR -> unit      
    val viewSubgraph : IR -> cfg -> unit 
The following facets are predefined: dominator, post-dominator tree, control dependence graph and loop nesting structure. The functions dom, pdom, cdg, loop are facet extraction methods that compute up-to-date views of these facets.

The following protocol is used for facets:

  • When the IR is changed, the function changed should be called to signal that all facets attached to the IR should be updated.
  • To add a new facet of type F that is computed by demand, the programmer has to provide a facet construction function f : IR -> F. Call the function mem to register the new facet. For example, let val g = memo f. Then the function g can be used to as a new facet extraction function for facet F.
  • To register a graph viewing function, call the function addLayout and provide an appropriate graph layout function. For example, we can say addLayout "F" layoutF to register a graph layout function for a facet called ``F''.

To view an IR, the functions view, views or viewSubgraph can be used. They have the following interpretation:

  • view computes a layout for one facet of the IR and displays it. The predefined facets are called ``dom'', ``pdom'', ``cdg'', ``loop.'' The IR can be viewed as the facet ``cfg.'' In addition, there is a layout named ``doms'' which displays the dominator tree and the post-dominator tree together, with the post-dominator inverted.
  • views computes a set of facets and displays it together in one single picture.
  • viewSubgraph layouts a subgraph of the IR. This creates a picture with the subgraph highlighted and embedded in the whole IR.

Building a CFG

There are two basic methods of building a CFG:
  • convert a sequence of machine instructions into a CFG through the emitter interface, described below, and
  • convert it from a cluster, which is the basic linearized representation used in the MLRISC system.
The first method requires you to perform instruction selection from a compiler front-end, but allows you to bypass all other MLRISC phases if desired. The second method allows you to take advantage of various MLRISC's instruction selection modules currently available. We describe these methods in this section.

Directly from Instructions

Signature CODE_EMITTER below describes an abstract emitter interface for accepting a linear stream of instructions from a source and perform a sequence of actions based on this stream\footnote{Unlike the signature { EMITTER_NEW} or { FLOWGRAPH_GEN}, it has the advantage that it is not tied into any form of specific flowgraph representation.}.

  signature CODE_EMITTER = sig 
    structure I : INSTRUCTIONS
    structure C : CELLS
    structure P : PSEUDO_OPS
       sharing I.C = C
    type emitter =
    {  defineLabel : Label.label -> unit,   
       entryLabel  : Label.label -> unit,   
       exitBlock   : C.cellset -> unit,    
       pseudoOp    : P.pseudo_op -> unit,  
       emitInstr   : I.instruction -> unit, 
       comment     : string -> unit,        
       init        : int -> unit,           
       finish      : unit -> unit   
The code emitter interface has the following informal protocol.
Initializes the emitter and signals that the back-end should allocate space for n bytes of machine code. The number is ignored for non-machine code back-ends.
Defines a new label l at the current position.
Defines a new entry label l at the current position. An entry label defines an entry point into the current flow graph. Note that multiple entry points are allowed
Defines an exit at the current position. The cellset c represents the live-out information
Emits an pseudo op p at the current position
Emits an instruction i at the current position
Changes the block name to b
Emits a comment msg at the current position
Signals that the use of the emitter is finished. The emitter is free to perform its post-processing functions. When this is finished the CFG is built.
The functor ControlFlowGraphGen below can be used to create a CFG builder that uses the CODE_EMITTER interface.
  signature CONTROL_FLOW_GRAPH_GEN = sig
    structure CFG     : CONTROL_FLOW_GRAPH
    structure Emitter : CODE_EMITTER
        sharing Emitter.I = CFG.I
        sharing Emitter.P = CFG.P
    val emitter : CFG.cfg -> Emitter.emitter
  functor ControlFlowGraphGen
     (structure CFG     : CONTROL_FLOW_GRAPH
      structure Emitter : CODE_EMITTER
      structure P       : INSN_PROPERTIES
          sharing CFG.I = Emitter.I = P.I
          sharing CFG.P = Emitter.P
          sharing CFG.B = Emitter.B

Cluster to CFG

The core MLRISC system implements many instruction selection front-ends. The result of an instruction selection module is a linear code layout block called a cluster. The functor Cluster2CFG below generates a translator that translates a cluster into a CFG:
  signature CLUSTER2CFG = sig
    structure CFG : CONTROL_FLOW_GRAPH
    structure F   : FLOWGRAPH
       sharing CFG.I = F.I
       sharing CFG.P = F.P
       sharing CFG.B = F.B
    val cluster2cfg : F.cluster -> CFG.cfg
  functor Cluster2CFG
    (structure CFG : CONTROL_FLOW_GRAPH 
     structure F   : FLOWGRAPH
     structure P   : INSN_PROPERTIES
        sharing CFG.I = F.I = P.I 
        sharing CFG.P = F.P
        sharing CFG.B = F.B
    ) : CLUSTER2CFG 

CFG to Cluster

The basic MLRISC system also implements many back-end functions such as register allocation, assembly output and machine code output. These modules all utilize the cluster representation. The functor CFG2Cluster below generates a translator that converts a CFG into a cluster. With the previous functor, the CFG and the cluster presentation can be freely inter-converted.
  signature CFG2CLUSTER = sig
    structure CFG : CONTROL_FLOW_GRAPH
    structure F   : FLOWGRAPH
       sharing CFG.I = F.I
       sharing CFG.P = F.P
       sharing CFG.B = F.B
    val cfg2cluster : { cfg : CFG.cfg, relayout : bool } -> F.cluster
  functor CFG2Cluster
    (structure CFG  : CONTROL_FLOW_GRAPH
     structure F    : FLOWGRAPH
        sharing CFG.I = F.I
        sharing CFG.P = F.P
        sharing CFG.B = F.B
     val patchBranch : {instr:CFG.I.instruction, backwards:bool} -> 
                          CFG.I.instruction list
When a CFG originates from a cluster, we try to preserve the same code layout through out all optimizations when possible. The function cfg2cluster takes an optional flag that specifies we should force the recomputation of the code layout of a control flow graph when translating a CFG back into a cluster.

Basic CFG Transformations

Basic CFG transformations are implemented in the functor CFGUtil. These transformations include splitting edges, merging edges, removing unreachable code and tail duplication.
    functor CFGUtil
       (structure CFG : CONTROL_FLOW_GRAPH
        structure P   : INSN_PROPERTIES
           sharing P.I = CFG.I
       ) : CFG_UTIL
The signature of CFGUtil is defined below:
  signature CFG_UTIL = sig
     structure CFG : CONTROL_FLOW_GRAPH
     val updateJumpLabel : CFG.cfg -> node_id -> unit
     val mergeEdge       : CFG.cfg -> CFG.edge -> bool
     val eliminateJump   : CFG.cfg -> node_id -> bool
     val insertJump      : CFG.cfg -> node_id -> bool
     val splitEdge  : CFG.cfg -> { edge : CFG.edge, jump : bool }
                       -> { edge : CFG.edge, node : CFG.node }
     val isMerge        : CFG.cfg -> node_id -> bool
     val isSplit        : CFG.cfg -> node_id -> bool
     val hasSideExits   : CFG.cfg -> node_id -> bool
     val isCriticalEdge : CFG.cfg -> CFG.edge -> bool
     val splitAllCriticalEdges : CFG.cfg -> unit
     val ceed : CFG.cfg -> node_id * node_id -> bool
     val tailDuplicate : CFG.cfg -> { subgraph : CFG.cfg, root : node_id } 
                                 -> { nodes : CFG.node list, 
                                      edges : CFG.edge list } 
     val removeUnreachableCode : CFG.cfg -> unit
     val mergeAllEdges : CFG.cfg -> unit
These functions have the following meanings:
  • updateJumpLabel G u. This function updates the label of the branch instruction in a block u to be consistent with the control flow edges with source u. This is an nop if the CFG is already consistent.
  • mergeEdge G e. This function merges edge e == u -> v in the graph G if possible. This is successful only if there are no other edges flowing into v and no other edges flowing out from u. It returns true if the merge operation is successful. If successful, the nodes u and v will be coalesced into the block u. The jump instruction (if any) in the node u will also be elided.
  • eliminateJump G u. This function eliminate the jump instruction at the end of block u if it is feasible.
  • insertJump G u. This function inserts a jump instruction in block u if it is feasible.
  • splitEdge G e. This function split the control flow edge e, and return a new edge e' and the new block u as return values. It addition, it takes as argument a flag jump. If this flag is true, then a jump instruction is always placed in the split; otherwise, we try to eliminate the jump when feasible.
  • isMerge G u. This function tests whether block u is a merge node. A merge node is a node that has two or more incoming flow edges.
  • isSplit G u. This function tests whether block u is a split node. A split node is a node that has two or more outgoing flow edges.
  • hasSideExits G u. This function tests whether a block has side exits G. This assumes that u is a hyperblock.
  • isCriticalEdge G e. This function tests whether the edge e is a critical edge. The edge e == u -> v is critical iff there are u is merge node and v is a split node.
  • splitAllCriticalEdges G. This function goes through the CFG G and splits all critical edges in the CFG. This can introduce extra jumps and basic blocks in the program.
  • mustPreceed G (u,v). This function checks whether two blocks u and v are necessarily adjacent. Blocks u and v must be adjacent iff u must preceed v in any feasible code layout.
  • tailDuplicate.
         val tailDuplicate : CFG.cfg -> { subgraph : CFG.cfg, root : node_id } 
                                     -> { nodes : CFG.node list, 
                                          edges : CFG.edge list } 
    Click to enlarge
    This function tail-duplicates the region subgraph until it only has a single entry root. Return the set of new nodes and new edges. The region is represented as a subgraph view of the CFG. Figure Basic CFG Transformations illustrates this transformation.

  • removeUnreachableCode G. This function removes all unreachable code from the graph.
  • mergeAllEdges G. This function tries to merge all the edges in the flowgraph G. Merging is performed in the non-increasing order of edge frequencies.

Dataflow Analysis

MLRISC provides a simple customizable module for performing iterative dataflow analysis. A dataflow analyzer has the following signature:

  signature DATAFLOW_ANALYZER = sig
    structure CFG : CONTROL_FLOW_GRAPH
    type dataflow_info
    val analyze : CFG.cfg * dataflow_info -> dataflow_info
A dataflow problem is described by the signature DATAFLOW_PROBLEM, described below:
  signature DATAFLOW_PROBLEM = sig
    structure CFG : CONTROL_FLOW_GRAPH
    type domain
    type dataflow_info
    val forward   : bool
    val bot       : domain
    val ==        : domain * domain -> bool
    val join      : domain list -> domain
    val prologue  : CFG.cfg * dataflow_info ->
                        CFG.block node ->
                            { input    : domain,
                              output   : domain,
                              transfer : domain -> domain
    val epilogue  : CFG.cfg * dataflow_info ->
                        { node   : CFG.block node,
                          input  : domain,
                          output : domain
                        } -> unit
This description contains the following items
  • type domain is the abstract lattice domain D.
  • type dataflow_info is where the dataflow information is stored.
  • forward is true iff the dataflow problem is in the forward direction
  • bot is the bottom element of D.
  • == is the equality function on D.
  • join is the least-upper-bound function on D.
  • prologue is a user-supplied function that performs pre-processing and setup. For each CFG node X, this function computes
    • input -- which is the initial input value of X
    • output -- which is the initial output value of X
    • transfer -- which is the transfer function on X.
  • epilogue is a function that performs post-processing. It visits each node X in the flowgraph and return the resulting input and output value for X.

To generate a new dataflow analyzer from a dataflow problem, the functor Dataflow can be used:


Static Branch Prediction

Branch Optimizations

Lal George
Allen Leung
SML/NJ Validate this page
Generated by mltex2html
Last modified: Thu Jan 9 19:38:15 EST 2003 by leunga@slinky