The MLTREE Language
MLTree is a lowlevel typed language: all operations are typed by its width or precision. Operations on floating point, integer, and condition code are also segregated, to prevent accidental misuse. MLTree is also treeoriented so that it is possible to write efficient MLTree transformation routines that uses SML pattern matching. Here are a few examples of MLTree statements. MV(32,t, ADDT(32, MULT(32,REG(32,b),REG(32,b)), MULT(32, MULT(32,LI(4),REG(32,a)),REG(32,c))))computes t := b*b + 4*a*c, all in 32bit precision and overflow trap enabled; while MV(32,t, ADD(32, CVTI2I(32,SIGN_EXTEND,8, LOAD(8, ADD(32,REG(32,a),REG(32,i))))))loads the byte in address a+i and sign extend it to a 32bit value. The statement IF([],CMP(64,GE,REG(64,a),LI 0), MV(64, t, REG(64, a)), MV(64, t, NEG(64, REG(64, a))) )in more traditional form means: if a >= 0 then t := a else t := aThis example can be also expressed in a few different ways:
The DefinitionsMLTree is defined in the signature MLTREE and the functor MLTreeFThe functor MLTreeF is parameterized in terms of the label expression type, the client supplied region datatype, the instruction stream type, and the client defined MLTree extensions. functor MLTreeF (structure LabelExp : LABELEXP structure Region : REGION structure Stream : INSTRUCTION_STREAM structure Extension : MLTREE_EXTENSION ) : MLTREE Basic TypesThe basic types in MLTree are statements (stm) integer expressions (rexp), floating point expression (fexp), and conditional expressions (ccexp). Statements are evaluated for their effects, while expressions are evaluated for their value. (Some expressions could also have trapping effects. The semantics of traps are unspecified.) These types are parameterized by an extension type, which we can use to extend the set of MLTree operators. How this is used is described in Section MLTree Extensions.References to registers are represented internally as integers, and are denoted as the type reg. In addition, we use the types src and dst as abbreviations for source and destination registers. type reg = int type src = reg type dst = regAll operators on MLTree are typed by the number of bits that they work on. For example, 32bit addition between a and b is written as ADD(32,a,b), while 64bit addition between the same is written as ADD(64,a,b). Floating point operations are denoted in the same manner. For example, IEEE singleprecision floating point add is written as FADD(32,a,b), while the same in doubleprecision is written as FADD(64,a,b) Note that these types are low level. Higher level distinctions such as signed and unsigned integer value, are not distinguished by the type. Instead, operators are usually partitioned into signed and unsigned versions, and it is legal (and often useful!) to mix signed and unsigned operators in an expression. Currently, we don't provide a direct way to specify nonIEEE floating point together with IEEE floating point arithmetic. If this distinction is needed then it can be encoded using the extension mechanism described in Section MLTree Extensions. We use the types ty and fty to stand for the number of bits in integer and floating point operations. type ty = int type fty = int The BasisThe signature MLTREE_BASIS defines the basic helper types used in the MLTREE signature.signature MLTREE_BASIS = sig datatype cond = LT  LTU  LE  LEU  EQ  NE  GE  GEU  GT  GTU datatype fcond = ?  !<=>  ==  ?=  !<>  !?>=  <  ?<  !>=  !?>  <=  ?<=  !>  !?<=  >  ?>  !<=  !?<  >=  ?>=  !<  !?=  <>  !=  !?  <=>  ?<> datatype ext = SIGN_EXTEND  ZERO_EXTEND datatype rounding_mode = TO_NEAREST  TO_NEGINF  TO_POSINF  TO_ZERO type ty = int type fty = int endThe most important of these are the types cond and fcond, which represent the set of integer and floating point comparisions. These types can be combined with the comparison constructors CMP and FCMP to form integer and floating point comparisions.
Integer ExpressionsA reference to the $$ith integer register with an $$nbit value is written as REG($$n,$$i). The operators LI, LI32, and LABEL, CONST are used to represent constant expressions of various forms. The sizes of these constants are inferred from context.REG : ty * reg > rexp LI : int > rexp LI32 : Word32.word > rexp LABEL : LabelExp.labexp > rexp CONST : Constant.const > rexpThe following figure lists all the basic integer operators and their intuitive meanings. All operators except NOTB, NEG, NEGT are binary and have the type ty * rexp * rexp > rexpThe operators NOTB, NEG, NEGT have the type ty * rexp > rexp
Sign and Zero ExtensionSign extension and zero extension are written using the operator CVTI2I. CVTI2I($$m,SIGN_EXTEND,$$n,$$e) sign extends the $$nbit value $$e to an $$mbit value, i.e. the $$n1th bit is of $$e is treated as the sign bit. Similarly, CVTI2I($$m,ZERO_EXTEND,$$n,$$e) zero extends an $$nbit value to an $$mbit value. If $$m <= n, then CVTI2I($$m,SIGN_EXTEND,$$n,$$e) = CVTI2I($$m,ZERO_EXTEND,$$n,$$e).
datatype ext = SIGN_EXTEND  ZERO_EXTEND CVTI2I : ty * ext * ty * rexp > rexp Conditional MoveMost new superscalar architectures incorporate conditional move instructions in their ISAs. Modern VLIW architectures also directly support full predication. Since branching (especially with data dependent branches) can introduce extra latencies in highly pipelined architectures, condtional moves should be used in place of short branch sequences. MLTree provide a conditional move instruction COND, to make it possible to directly express conditional moves without using branches.COND : ty * ccexp * rexp * rexp > rexpSemantically, COND(ty,cc,$$a,$$b) means to evaluate cc, and if cc evaluates to true then the value of the entire expression is $$a; otherwise the value is $$b. Note that $$a and $$b are allowed to be eagerly evaluated. In fact, we are allowed to evaluate to both branches, one branch, or neither~\footnote{When possible.}. Various idioms of the COND form are useful for expressing common constructs in many programming languages. For example, MLTree does not provide a primitive construct for converting an integer value x to a boolean value (0 or 1). But using COND, this is expressible as COND(32,CMP(32,NE,x,LI 0),LI 1,LI 0). SML/NJ represents the boolean values true and false as machine integers 3 and 1 respectively. To convert a boolean condition $$e into an ML boolean value, we can use COND(32,e,LI 3,LI 1)Common C idioms can be easily mapped into the COND form. For example,
In general, the COND form should be used in place of MLTree's branching constructs whenever possible, since the former is usually highly optimized in various MLRISC backends. Integer LoadsInteger loads are written using the constructor LOAD.LOAD : ty * rexp * Region.region > rexpThe client is required to specify a region that serves as aliasing information for the load. Miscellaneous Integer OperatorsAn expression of the LET($$s,$$e) evaluates the statement $$s for its effect, and then return the value of expression $$e.LET : stm * rexp > rexpSince the order of evaluation is MLTree operators are unspecified the use of this operator should be severely restricted to only sideeffectfree forms. Floating Point ExpressionsFloating registers are referenced using the term FREG. The $$ith floating point register with type $$n is written as FREG($$n,$$i).FREG : fty * src > fexpBuiltin floating point operations include addition (FADD), subtraction (FSUB), multiplication (FMUL), division (FDIV), absolute value (FABS), negation (FNEG) and square root (FSQRT). FADD : fty * fexp * fexp > fexp FSUB : fty * fexp * fexp > fexp FMUL : fty * fexp * fexp > fexp FDIV : fty * fexp * fexp > fexp FABS : fty * fexp > fexp FNEG : fty * fexp > fexp FSQRT : fty * fexp > fexpA special operator is provided for manipulating signs. To combine the sign of $$a with the magnitude of $$b, we can write FCOPYSIGN($$a,$$b)\footnote{What should happen if $$a or $$b is nan?}. FCOPYSIGN : fty * fexp * fexp > fexpTo convert an $$nbit signed integer $$e into an $$mbit floating point value, we can write CVTI2F($$m,$$n,$$e)\footnote{What happen to unsigned integers?}. CVTI2F : fty * ty * rexp > fexpSimilarly, to convert an $$nbit floating point value $$e to an $$mbit floating point value, we can write CVTF2F($$m,$$n,$$e)\footnote{ What is the rounding semantics?}. CVTF2F : fty * fty * > fexp datatype rounding_mode = TO_NEAREST  TO_NEGINF  TO_POSINF  TO_ZERO CVTF2I : ty * rounding_mode * fty * fexp > rexp FLOAD : fty * rexp * Region.region > fexp Condition ExpressionsUnlike languages like C, MLTree makes the distinction between condition expressions and integer expressions. This distinction is necessary for two purposes:
A conditional code register bit can be referenced using the constructors CC and FCC. Note that the condition must be specified together with the condition code register. CC : Basis.cond * src > ccexp FCC : Basis.fcond * src > ccexpFor example, to test the Z bit of the %psr register on the Sparc architecture, we can used CC(EQ,SparcCells.psr). The comparison operators CMP and FCMP performs integer and floating point tests. Both of these are typed by the precision in which the test must be performed under. CMP : ty * Basis.cond * rexp * rexp > ccexp FCMP : fty * Basis.fcond * fexp * fexp > ccexpCondition code expressions may be combined with the following logical connectives, which have the obvious meanings. TRUE : ccexp FALSE : ccexp NOT : ccexp > ccexp AND : ccexp * ccexp > ccexp OR : ccexp * ccexp > ccexp XOR : ccexp * ccexp > ccexp StatementsStatement forms in MLTree includes assignments, parallel copies, jumps and condition branches, calls and returns, stores, sequencing, and annotation.AssignmentsAssignments are segregated among the integer, floating point and conditional code types. In addition, all assignments are typed by the precision of destination register.
MV : ty * dst * rexp > stm FMV : fty * dst * fexp > stm CCMV : dst * ccexp > stm Parallel CopiesSpecial forms are provided for parallel copies for integer and floating point registers. It is important to emphasize that the semantics is that all assignments are performed in parallel.
COPY : ty * dst list * src list > stm FCOPY : fty * dst list * src list > stm Jumps and Conditional BranchesJumps and conditional branches in MLTree take two additional set of annotations. The first represents the control flow and is denoted by the type controlflow. The second represent controldependence and anticontroldependence and is denoted by the type ctrl.
type controlflow = Label.label list type ctrl = reg listControl flow annotation is simply a list of labels, which represents the set of possible targets of the associated jump. Control dependence annotations attached to a branch or jump instruction represents the new definition of pseudo control dependence predicates. These predicates have no associated dynamic semantics; rather they are used to constraint the set of potential code motion in an optimizer (more on this later). The primitive jumps and conditional branch forms are represented by the constructors JMP, BCC. JMP : ctrl * rexp * controlflow > stm BCC : ctrl * ccexp * Label.label > stmIn addition to JMP and BCC, there is a structured if/then/else statement. IF : ctrl * ccexp * stm * stm > stmSemantically, IF($$c,x,y,z) is identical to BCC($$c, $$x, L1) $$z JMP([], L2) DEFINE L1 $$y DEFINE L2where L1 and L2 are new labels, as expected. Here's an example of how control dependence predicates are used. Consider the following MLTree statement: IF([p], CMP(32, NE, REG(32, a), LI 0), MV(32, b, PRED(LOAD(32, m, ...)), p), MV(32, b, LOAD(32, n, ...)))In the first alternative of the IF, the LOAD expression is constrainted by the control dependence predicate p defined in the IF, using the predicate constructor PRED. These states that the load is control dependent on the test of the branch, and thus it may not be legally hoisted above the branch without potentially violating the semantics of the program. For example, semantics violation may happen if the value of m and a is corrolated, and whenever a = 0, the address in m is not a legal address. Note that on architectures with speculative loads, the control dependence information can be used to guide the transformation of control dependent loads into speculative loads. Now in constrast, the LOAD in the second alternative is not control dependent on the control dependent predicate p, and thus it is safe and legal to hoist the load above the test, as in MV(32, b, LOAD(32, n, ...)); IF([p], CMP(32, NE, REG(32, a), LI 0), MV(32, b, PRED(LOAD(32, m, ...)), p), SEQ [] )Of course, such transformation is only performed if the optimizer phases think that it can benefit performance. Thus the control dependence information does not directly specify any transformations, but it is rather used to indicate when aggressive code motions are legal and safe. Calls and ReturnsCalls and returns in MLTree are specified using the constructors CALL and RET, which have the following types.CALL : rexp * controlflow * mlrisc * mlrisc * ctrl * ctrl * Region.region > stm RET : ctrl * controlflow > stmThe CALL form is particularly complex, and require some explanation. Basically the seven parameters are, in order:
The matching return statement constructor RET has two arguments. These are:
StoresStores to integer and floating points are specified using the constructors STORE and FSTORE.STORE : ty * rexp * rexp * Region.region > stm FSTORE : fty * rexp * fexp * Region.region > stmThe general form is STORE($$width, $$address, $$data, $$region)Stores for condition codes are not provided. Miscelleneous StatementsOther useful statement forms of MLTree are for sequencing (SEQ), defining a local label (DEFINE).SEQ : stm list > stm DEFINE : Label.label > stmThe constructor DEFINE L has the same meaning as executing the method defineLabel L in the stream interface. AnnotationsAnnotations are used as the generic mechanism for exchanging information between different phases of the MLRISC system, and between a compiler front end and the MLRISC back end. The following constructors can be used to annotate a MLTree term with an annotation:MARK : rexp * Annotations.annotation > rexp FMARK : fexp * Annotations.annotation > fexp CCMARK : ccexp * Annotations.annotation > ccexp ANNOTATION : stm * Annotations.annotation > stm
