MLRISC
MLRISC
Contributors
Requirements
How to Obtain MLRISC
Overview
Problem Statement
Contributions
MLRISC Based Compiler
MLRISC Intermediate Representation
MLRisc Generation
Back End Optimizations
Register Allocation
Machine Description
Garbage Collection Safety
System Integration
Optimizations
Graphical Interface
Line Counts
Systems Using MLRISC
Future Work
System
Architecture of MLRISC
The MLTREE Language
MLTree Extensions
MLTree Utilities
Instruction Selection
Assemblers
Machine Code Emitters
Delay Slot Filling
Span Dependency Resolution
The Graph Library
The Graph Visualization Library
Basic Compiler Graphs
The MLRISC IR
SSA Optimizations
ILP Optimizations
Optimizations for VLIW/EPIC Architectur...
Register Allocator
Back Ends
The Alpha Back End
The PA RISC Back End
The Sparc Back End
The Intel x86 Back End
The PowerPC Back End
The MIPS Back End
The TI C6x Back End
Basic Types
Annotations
Cells
Cluster
Client Defined Constants
Client Defined Pseudo Ops
Instructions
Instruction Streams
Label Expressions
Labels
Regions
Regmap

The MLTREE Language


The MLTREE Language
The Definitions
-Basic Types
-The Basis
Integer Expressions
-Sign and Zero Extension
-Conditional Move
-Integer Loads
-Miscellaneous Integer Operators
Floating Point Expressions
Condition Expressions
Statements
-Assignments
-Parallel Copies
-Jumps and Conditional Branches
-Calls and Returns
-Stores
-Miscelleneous Statements
Annotations
MLTree is the register transfer language used in the MLRISC system. It serves two important purposes: MLTree
  1. As an intermediate representation for a compiler front-end to talk to the MLRISC system,
  2. As specifications for instruction semantics
The latter is needed for optimizations which require precise knowledge of such; for example, algebraic simplification and constant folding.

MLTree is a low-level typed language: all operations are typed by its width or precision. Operations on floating point, integer, and condition code are also segregated, to prevent accidental misuse. MLTree is also tree-oriented so that it is possible to write efficient MLTree transformation routines that uses SML pattern matching.

Here are a few examples of MLTree statements.

    MV(32,t,
       ADDT(32,
         MULT(32,REG(32,b),REG(32,b)),
         MULT(32,
           MULT(32,LI(4),REG(32,a)),REG(32,c))))
 
computes t := b*b + 4*a*c, all in 32-bit precision and overflow trap enabled; while
    MV(32,t,
       ADD(32,
         CVTI2I(32,SIGN_EXTEND,8,
           LOAD(8,
             ADD(32,REG(32,a),REG(32,i))))))
 
loads the byte in address a+i and sign extend it to a 32-bit value.

The statement

    IF([],CMP(64,GE,REG(64,a),LI 0),
          MV(64, t, REG(64, a)),
          MV(64, t, NEG(64, REG(64, a)))
      )
 
in more traditional form means:
    if a >= 0 then 
       t := a
    else
       t := -a
 
This example can be also expressed in a few different ways:
  1. With the conditional move construct described in Section Conditional Move:
         MV(64, t, 
            COND(CMP(64, GE, REG(64, a)), 
                 REG(64, a), 
                 NEG(64, REG(64, a))))
          
  2. With explicit branching using the conditional branch construct BCC:
          MV(64, t, REG(64, a));
          BCC([], CMP(64, GE, REG(64, a)), L1);
          MV(64, t, NEG(64, REG(64, a)));
          DEFINE L1;
         

The Definitions

MLTree is defined in the signature MLTREE and the functor MLTreeF

The functor MLTreeF is parameterized in terms of the label expression type, the client supplied region datatype, the instruction stream type, and the client defined MLTree extensions.

   functor MLTreeF
     (structure LabelExp : LABELEXP
      structure Region : REGION
      structure Stream : INSTRUCTION_STREAM
      structure Extension : MLTREE_EXTENSION
     ) : MLTREE
 

Basic Types

The basic types in MLTree are statements (stm) integer expressions (rexp), floating point expression (fexp), and conditional expressions (ccexp). Statements are evaluated for their effects, while expressions are evaluated for their value. (Some expressions could also have trapping effects. The semantics of traps are unspecified.) These types are parameterized by an extension type, which we can use to extend the set of MLTree operators. How this is used is described in Section MLTree Extensions.

References to registers are represented internally as integers, and are denoted as the type reg. In addition, we use the types src and dst as abbreviations for source and destination registers.

    type reg = int
    type src = reg
    type dst = reg
 
All operators on MLTree are typed by the number of bits that they work on. For example, 32-bit addition between a and b is written as ADD(32,a,b), while 64-bit addition between the same is written as ADD(64,a,b). Floating point operations are denoted in the same manner. For example, IEEE single-precision floating point add is written as FADD(32,a,b), while the same in double-precision is written as FADD(64,a,b)

Note that these types are low level. Higher level distinctions such as signed and unsigned integer value, are not distinguished by the type. Instead, operators are usually partitioned into signed and unsigned versions, and it is legal (and often useful!) to mix signed and unsigned operators in an expression.

Currently, we don't provide a direct way to specify non-IEEE floating point together with IEEE floating point arithmetic. If this distinction is needed then it can be encoded using the extension mechanism described in Section MLTree Extensions.

We use the types ty and fty to stand for the number of bits in integer and floating point operations.

   type ty  = int
   type fty = int
 

The Basis

The signature MLTREE_BASIS defines the basic helper types used in the MLTREE signature.
 signature MLTREE_BASIS =
 sig
  
   datatype cond = LT | LTU | LE | LEU | EQ | NE | GE | GEU | GT | GTU 
 
   datatype fcond = 
      ? | !<=> | == | ?= | !<> | !?>= | < | ?< | !>= | !?> |
      <= | ?<= | !> | !?<= | > | ?> | !<= | !?< | >= | ?>= |
      !< | !?= | <> | != | !? | <=> | ?<>
 
   datatype ext = SIGN_EXTEND | ZERO_EXTEND
 
   datatype rounding_mode = TO_NEAREST | TO_NEGINF | TO_POSINF | TO_ZERO
 
   type ty = int
   type fty = int
 
 end
 
The most important of these are the types cond and fcond, which represent the set of integer and floating point comparisions. These types can be combined with the comparison constructors CMP and FCMP to form integer and floating point comparisions.
Operator Comparison
LT Signed less than
LTU Unsigned less than
LE Signed less than or equal
LEU Unsigned less than or equal
EQ Equal
NE Not equal
GE Signed greater than or equal
GEU Unsigned greater than or equal
GT Signed greater than
GTU Unsigned greater than
Floating point comparisons can be ``decoded'' as follows. In IEEE floating point, there are four different basic comparisons tests that we can performed given two numbers a and y:
a < b
Is a less than b?
a = b
Is a equal to b?
a > b
Is a greater than to b?
a ? b
Are a and b unordered (incomparable)?
Comparisons can be joined together. For example, given two double-precision floating point expressions a and b, the expression FCMP(64,<=>,a,b) asks whether a is less than, equal to or greater than b, i.e.~whether a and b are comparable. The special symbol ! negates the meaning the of comparison. For example, FCMP(64,!>=,a,b) means testing whether a is less than or incomparable with b.

Integer Expressions

A reference to the ith integer register with an n-bit value is written as REG(n,i). The operators LI, LI32, and LABEL, CONST are used to represent constant expressions of various forms. The sizes of these constants are inferred from context.
  
   REG   : ty * reg -> rexp
   LI    : int -> rexp
   LI32  : Word32.word -> rexp
   LABEL : LabelExp.labexp -> rexp
   CONST : Constant.const -> rexp
 
The following figure lists all the basic integer operators and their intuitive meanings. All operators except NOTB, NEG, NEGT are binary and have the type
   ty * rexp * rexp -> rexp
 
The operators NOTB, NEG, NEGT have the type
   ty * rexp -> rexp
 
ADD Twos complement addition
NEG negation
SUB Twos complement subtraction
MULS Signed multiplication
DIVS Signed division, round to zero (nontrapping)
QUOTS Signed division, round to negative infinity (nontrapping)
REMS Signed remainder (???)
MULU Unsigned multiplication
DIVU Unsigned division
REMU Unsigned remainder
NEGT signed negation, trap on overflow
ADDT Signed addition, trap on overflow
SUBT Signed subtraction, trap on overflow
MULT Signed multiplication, trap on overflow
DIVT Signed division, round to zero, trap on overflow or division by zero
QUOTT Signed division, round to negative infinity, trap on overflow or division by zero
REMT Signed remainder, trap on division by zero
ANDB bitwise and
ORB bitwise or
XORB bitwise exclusive or
NOTB ones complement
SRA arithmetic right shift
SRL logical right shift
SLL logical left shift

Sign and Zero Extension

Sign extension and zero extension are written using the operator CVTI2I. CVTI2I(m,SIGN_EXTEND,n,e) sign extends the n-bit value e to an m-bit value, i.e. the n-1th bit is of e is treated as the sign bit. Similarly, CVTI2I(m,ZERO_EXTEND,n,e) zero extends an n-bit value to an m-bit value. If m <= n, then CVTI2I(m,SIGN_EXTEND,n,e) = CVTI2I(m,ZERO_EXTEND,n,e).

     datatype ext = SIGN_EXTEND | ZERO_EXTEND
     CVTI2I : ty * ext * ty * rexp -> rexp 
 

Conditional Move

Most new superscalar architectures incorporate conditional move instructions in their ISAs. Modern VLIW architectures also directly support full predication. Since branching (especially with data dependent branches) can introduce extra latencies in highly pipelined architectures, condtional moves should be used in place of short branch sequences. MLTree provide a conditional move instruction COND, to make it possible to directly express conditional moves without using branches.
    COND : ty * ccexp * rexp * rexp -> rexp
 
Semantically, COND(ty,cc,a,b) means to evaluate cc, and if cc evaluates to true then the value of the entire expression is a; otherwise the value is b. Note that a and b are allowed to be eagerly evaluated. In fact, we are allowed to evaluate to both branches, one branch, or neither~\footnote{When possible.}.

Various idioms of the COND form are useful for expressing common constructs in many programming languages. For example, MLTree does not provide a primitive construct for converting an integer value x to a boolean value (0 or 1). But using COND, this is expressible as COND(32,CMP(32,NE,x,LI 0),LI 1,LI 0). SML/NJ represents the boolean values true and false as machine integers 3 and 1 respectively. To convert a boolean condition e into an ML boolean value, we can use

    COND(32,e,LI 3,LI 1)
 
Common C idioms can be easily mapped into the COND form. For example,
  • if (e1) x = y translates into MV(32,x,COND(32,e1,REG(32,y),REG(32,x)))
  •       x = e1; 
          if (e2) x = y
        
    translates into MV(32,x,COND(32,e2,REG(32,y),e1))
  • x = e1 == e2 translates into MV(32,x,COND(32,CMP(32,EQ,e1,e2),LI 1,LI 0)
  • x = ! e translates into MV(32,x,COND(32,CMP(32,NE,e,LI 0),LI 1,LI 0)
  • x = e ? y : z translates into MV(32,x,COND(32,e,REG(32,y),REG(32,z))), and
  • x = y < z ? y : z translates into
          MV(32,x,
              COND(32,
                 CMP(32,LT,REG(32,y),REG(32,z)),
                    REG(32,y),REG(32,z)))
        

In general, the COND form should be used in place of MLTree's branching constructs whenever possible, since the former is usually highly optimized in various MLRISC backends.

Integer Loads

Integer loads are written using the constructor LOAD.
    LOAD  : ty * rexp * Region.region -> rexp
 
The client is required to specify a region that serves as aliasing information for the load.

Miscellaneous Integer Operators

An expression of the LET(s,e) evaluates the statement s for its effect, and then return the value of expression e.
   LET  : stm * rexp -> rexp
 
Since the order of evaluation is MLTree operators are unspecified the use of this operator should be severely restricted to only side-effect-free forms.

Floating Point Expressions

Floating registers are referenced using the term FREG. The ith floating point register with type n is written as FREG(n,i).
    FREG   : fty * src -> fexp
 
Built-in floating point operations include addition (FADD), subtraction (FSUB), multiplication (FMUL), division (FDIV), absolute value (FABS), negation (FNEG) and square root (FSQRT).
    FADD  : fty * fexp * fexp -> fexp
    FSUB  : fty * fexp * fexp  -> fexp
    FMUL  : fty * fexp * fexp -> fexp
    FDIV  : fty * fexp * fexp -> fexp
    FABS  : fty * fexp -> fexp
    FNEG  : fty * fexp -> fexp
    FSQRT : fty * fexp -> fexp
 
A special operator is provided for manipulating signs. To combine the sign of a with the magnitude of b, we can write FCOPYSIGN(a,b)\footnote{What should happen if a or b is nan?}.
    FCOPYSIGN : fty * fexp * fexp -> fexp
 
To convert an n-bit signed integer e into an m-bit floating point value, we can write CVTI2F(m,n,e)\footnote{What happen to unsigned integers?}.
    CVTI2F : fty * ty * rexp -> fexp
 
Similarly, to convert an n-bit floating point value e to an m-bit floating point value, we can write CVTF2F(m,n,e)\footnote{ What is the rounding semantics?}.
    CVTF2F : fty * fty * -> fexp
 
   datatype rounding_mode = TO_NEAREST | TO_NEGINF | TO_POSINF | TO_ZERO
   CVTF2I : ty * rounding_mode * fty * fexp -> rexp
 
    FLOAD : fty * rexp * Region.region -> fexp
 

Condition Expressions

Unlike languages like C, MLTree makes the distinction between condition expressions and integer expressions. This distinction is necessary for two purposes:
  • It clarifies the proper meaning intended in a program, and
  • It makes to possible for a MLRISC backend to map condition expressions efficiently onto various machine architectures with different condition code models. For example, architectures like the Intel x86, Sparc V8, and PowerPC contains dedicated condition code registers, which are read from and written to by branching and comparison instructions. On the other hand, architectures such as the Texas Instrument C6, PA RISC, Sparc V9, and Alpha does not include dedicated condition code registers. Conditional code registers in these architectures can be simulated by integer registers.

A conditional code register bit can be referenced using the constructors CC and FCC. Note that the condition must be specified together with the condition code register.

    CC   : Basis.cond * src -> ccexp 
    FCC  : Basis.fcond * src -> ccexp    
 
For example, to test the Z bit of the %psr register on the Sparc architecture, we can used CC(EQ,SparcCells.psr).

The comparison operators CMP and FCMP performs integer and floating point tests. Both of these are typed by the precision in which the test must be performed under.

    CMP  : ty * Basis.cond * rexp * rexp -> ccexp  
    FCMP : fty * Basis.fcond * fexp * fexp -> ccexp
 
Condition code expressions may be combined with the following logical connectives, which have the obvious meanings.
    TRUE  : ccexp 
    FALSE : ccexp 
    NOT   : ccexp -> ccexp 
    AND   : ccexp * ccexp -> ccexp 
    OR    : ccexp * ccexp -> ccexp 
    XOR   : ccexp * ccexp -> ccexp 
 

Statements

Statement forms in MLTree includes assignments, parallel copies, jumps and condition branches, calls and returns, stores, sequencing, and annotation.

Assignments

Assignments are segregated among the integer, floating point and conditional code types. In addition, all assignments are typed by the precision of destination register.

    MV   : ty * dst * rexp -> stm
    FMV  : fty * dst * fexp -> stm
    CCMV : dst * ccexp -> stm
 

Parallel Copies

Special forms are provided for parallel copies for integer and floating point registers. It is important to emphasize that the semantics is that all assignments are performed in parallel.

    COPY  : ty * dst list * src list -> stm
    FCOPY : fty * dst list * src list -> stm
 

Jumps and Conditional Branches

Jumps and conditional branches in MLTree take two additional set of annotations. The first represents the control flow and is denoted by the type controlflow. The second represent control-dependence and anti-control-dependence and is denoted by the type ctrl.

    type controlflow = Label.label list
    type ctrl = reg list
 
Control flow annotation is simply a list of labels, which represents the set of possible targets of the associated jump. Control dependence annotations attached to a branch or jump instruction represents the new definition of pseudo control dependence predicates. These predicates have no associated dynamic semantics; rather they are used to constraint the set of potential code motion in an optimizer (more on this later).

The primitive jumps and conditional branch forms are represented by the constructors JMP, BCC.

    JMP : ctrl * rexp * controlflow  -> stm
    BCC : ctrl * ccexp * Label.label -> stm
 
In addition to JMP and BCC, there is a structured if/then/else statement.
    IF  : ctrl * ccexp * stm * stm -> stm
 
Semantically, IF(c,x,y,z) is identical to
    BCC(c, x, L1)
    z
    JMP([], L2)
    DEFINE L1
    y
    DEFINE L2
 
where L1 and L2 are new labels, as expected.

Here's an example of how control dependence predicates are used. Consider the following MLTree statement:

    IF([p], CMP(32, NE, REG(32, a), LI 0),
         MV(32, b, PRED(LOAD(32, m, ...)), p),
         MV(32, b, LOAD(32, n, ...)))
 
In the first alternative of the IF, the LOAD expression is constrainted by the control dependence predicate p defined in the IF, using the predicate constructor PRED. These states that the load is control dependent on the test of the branch, and thus it may not be legally hoisted above the branch without potentially violating the semantics of the program. For example, semantics violation may happen if the value of m and a is corrolated, and whenever a = 0, the address in m is not a legal address.

Note that on architectures with speculative loads, the control dependence information can be used to guide the transformation of control dependent loads into speculative loads.

Now in constrast, the LOAD in the second alternative is not control dependent on the control dependent predicate p, and thus it is safe and legal to hoist the load above the test, as in

    MV(32, b, LOAD(32, n, ...));
    IF([p], CMP(32, NE, REG(32, a), LI 0),
         MV(32, b, PRED(LOAD(32, m, ...)), p),
         SEQ []
      )
 
Of course, such transformation is only performed if the optimizer phases think that it can benefit performance. Thus the control dependence information does not directly specify any transformations, but it is rather used to indicate when aggressive code motions are legal and safe.

Calls and Returns

Calls and returns in MLTree are specified using the constructors CALL and RET, which have the following types.
    CALL : rexp * controlflow * mlrisc * mlrisc * 
           ctrl * ctrl * Region.region -> stm
    RET  : ctrl * controlflow -> stm
 
The CALL form is particularly complex, and require some explanation. Basically the seven parameters are, in order:
address
of the called routine.
control flow
annotation for this call. This information specifies the potential targets of this call instruction. Currently this information is ignored but will be useful for interprocedural optimizations in the future.
definition and use
These lists specify the list of potential definition and uses during the execution of the call. Definitions and uses are represented as the type mlrisc list. The contructors for this type is:
   CCR : ccexp -> mlrisc
   GPR : rexp -> mlrisc
   FPR : fexp -> mlrisc
 
definition of control and anti-control dependence
These two lists specifies definitions of control and anti-control dependence.
region
annotation for the call, which summarizes the set of potential memory references during execution of the call.

The matching return statement constructor RET has two arguments. These are:

anti-control dependence
This parameter represents the set of anti-control dependence predicates defined by the return statement.
control flow
This parameter specifies the set of matching procedure entry points of this return. For example, suppose we have a procedure with entry points f and f'. Then the MLTree statements
   f:   ...
        JMP L1
   f':  ...
   L1:  ...
        RET ([], [f, f'])
 
can be used to specify that the return is either from the entries f or f'.

Stores

Stores to integer and floating points are specified using the constructors STORE and FSTORE.
    STORE  : ty * rexp * rexp * Region.region -> stm
    FSTORE : fty * rexp * fexp * Region.region -> stm
 
The general form is
    STORE(width, address, data, region)
 
Stores for condition codes are not provided.

Miscelleneous Statements

Other useful statement forms of MLTree are for sequencing (SEQ), defining a local label (DEFINE).
    SEQ    : stm list -> stm
    DEFINE : Label.label -> stm
 
The constructor DEFINE L has the same meaning as executing the method defineLabel L in the stream interface.

Annotations

Annotations are used as the generic mechanism for exchanging information between different phases of the MLRISC system, and between a compiler front end and the MLRISC back end. The following constructors can be used to annotate a MLTree term with an annotation:
    MARK : rexp * Annotations.annotation -> rexp
    FMARK : fexp * Annotations.annotation -> fexp
    CCMARK : ccexp * Annotations.annotation -> ccexp 
    ANNOTATION : stm * Annotations.annotation -> stm
 

Lal George
Allen Leung
SML/NJ Validate this page
Generated by mltex2html
Last modified: Thu Jan 9 19:38:15 EST 2003 by leunga@slinky