Basic Compiler Graphs
IntroductionIn this section we describe the set of core compiler specific graphs and algorithms implemented in MLRISC. Mostly of these algorithms are parameterized with respect to the actual intermediate representation, and as such they do not provide many facilities that are provided by higher abstraction layers, such as in MLRISC IR, or in SSA.Dominator/Postdominator TreesDominance is a fundamental concept in compiler optimizations. Node $$A $$dominates $$B iff all paths from the start node to $$B intersects A. A dual notion is the concept of $$postdominance: $$A postdominates $$B iff all paths from $$B to the stop node intersects $$A. A (post)dominator tree can be used to summarize the dominance/postdominance relationship.
functor DominatorTree (GraphImpl : GRAPH_IMPLEMENTATION) : DOMINATOR_TREEThe functor implements dominator analysis and creates a dominator/postdominator tree from a graph $$G. A dominator tree is implemented as a graph with the following definition: signature DOMINATOR_TREE = sig exception Dominator datatype 'n dom_node = DOM of { node : 'n, level : int, preorder : int, postorder : int } type ('n,'e,'g) dom_info type ('n,'e,'g) dominator_tree = ('n dom_node,unit,('n,'e,'g) dom_info) graph type ('n,'e,'g) postdominator_tree = ('n dom_node,unit,('n,'e,'g) dom_info) graphWe annotated each node in a dominator tree with three extra fields of information, which is useful for other algorithms:
To create a dominator tree and a postdominator tree from a graph, the following function should be called. val dominator_trees : ('n,'e,'g) graph > ('n,'e,'g) dominator_tree * ('n,'e,'g) postdominator_treeWe use the algorithm of Tarjan and Lengauer, which runs in time $$O(V+Eα(V+E)) where $$α is the functional inverse of the Ackermann function. To perform many common queries on a dominator tree, we first call the function methods to obtain a method object. val methods : ('n,'e,'g) dominator_tree > dominator_methodsThe methods are packed into the following type: type dominator_methods = { dominates : node_id * node_id > bool, immediately_dominates : node_id * node_id > bool, strictly_dominates : node_id * node_id > bool, postdominates : node_id * node_id > bool, immediately_postdominates : node_id * node_id > bool, strictly_postdominates : node_id * node_id > bool, control_equivalent : node_id * node_id > bool, idom : node_id > node_id, $(* ~1 if none *)$ idoms : node_id > node_id list, doms : node_id > node_id list, ipdom : node_id > node_id, $(* ~1 if none *)$ ipdoms : node_id > node_id list, pdoms : node_id > node_id list, dom_lca : node_id * node_id > node_id, pdom_lca : node_id * node_id > node_id, dom_level : node_id > int, pdom_level : node_id > int, control_equivalent_partitions : unit > node_id list list }The query methods are as follows:
Control Dependence GraphGiven two nodes $$A and $$B in a control flow graph $$G, we say that $$B is control dependent on $$A iff
To build a control dependence graph, the functor ControlDependenceGraph can be used: signature CONTROL_DEPENDENCE_GRAPH = sig type ('n,'e,'g) cdg = ('n,'e,'g) graph val control_dependence_graph : ('e > bool) > ('n,'e,'g) dominator_tree * ('n,'e,'g) postdominator_tree > ('n,'e,'g) cdg end functor ControlDependenceGraph (structure Dom : DOMINATOR_TREE structure GraphImpl : GRAPH_IMPLEMENTATION ) : CONTROL_DEPENDENCE_GRAPHThe control depedence graph is a subcomponent of the program dependence graph commonly used in modern compiler optimizations. Dominance FrontiersMany algorithms involving the notion of control dependence or dominance can be rephrased in terms of dominance frontiers. A node $$A is in the dominance frontiers of $$B iff $$B dominates a predecessor of $$A but $$B does not strictlydominate $$A. We denote this as $$A in DF(B). The dual notion of postdominance frontiers can be defined analogously using the postdominator tree\footnote{Control dependence can be defined in terms of postdominance frontiers.}.
functor DominanceFrontiers(Dom : DOMINATOR_TREE) : DOMINANCE_FRONTIERSThe functor DominanceFrontiers can be used to compute all the dominance frontiers of all the nodes in a graph. It has the following signature.
signature DOMINANCE_FRONTIERS = sig structure Dom : DOMINATOR_TREE type dominance_frontiers = node_id list array val DFs : ('n,'e,'g) Dom.dominator_tree > dominance_frontiers end Iterated Dominance FrontiersIterated dominance frontiers (denoted as $$DF^{+}) are defined as the least fixed point of iterating the operation $$DF. Formally, define the dominance frontiers on a set $$S as follows:
functor DJGraph(Dom : DOMINATOR_TREE) : DJ_GRAPHThe functor DJGraph implements this algorithm. It satisfies the signature below: signature DJ_GRAPH = sig structure Dom : DOMINATOR_TREE type ('n,'e,'g) dj_graph = ('n,'e,'g) Dom.dominator_tree val dj_graph : ('n,'e,'g) dj_graph > { DF : node_id > node_id list, IDF : node_id > node_id list, IDFs : node_id list > node_id list } endThe function dj_graph takes a dominator tree and returns three query methods for computing dominance and iterated dominance frontiers. Method DF computes $$DF(v) for a single node $$v. Method IDF computes the $$DF^{+}(v), and method IDFs computes $$DF^{+}(S) when given a set of node ids. The dominator tree must not be updated while these operations are being performed. Sreedhar's original algorithm is phrased in terms of the DJgraph, which is a fusion of the dominator tree with its underlying flowgraph. Our variant operates on the dominator tree and the flowgraph at the same time, without building an intermediate data structure. Iterated dominance frontiers are used in many algorithms that deal with the notion of dominance. For example, our SSA construction algorithm uses iterated dominance frontiers to identify confluent points in the program where $$phifunctions are to be placed. Loop Nesting TreeA natural loop $$L in a graph is a maximal strongly connected component such that all nodes in $$L are dominated by a single node $$h, called the loop header. Loops tend to form good optimization candidates and consequently loop detection is an essential task in a compiler. The functorfunctor structure.sml" target=code>LoopStructure (structure GraphImpl : GRAPH_IMPLEMENTATION structure Dom : DOMINATOR_TREE ) : LOOP_STRUCTURErecognizes all natural loops in a graph and built a loop nesting tree that describes the loop nesting relationship between graphs.
signature structure.sig" target=code>LOOP_STRUCTURE = sig structure Dom : DOMINATOR_TREE datatype ('n,'e,'g) loop = LOOP of { nesting : int, header : node_id, loop_nodes : node_id list, backedges : 'e edge list, exits : 'e edge list } type ('n,'e,'g) loop_info type ('n,'e,'g) loop_structure = (('n,'e,'g) loop,unit, ('n,'e,'g) loop_info) graph val loop_structure : ('n,'e,'g) Dom.dominator_tree > ('n,'e,'g) loop_structure val nesting_level : ('n,'e,'g) loop_structure > node_id array val header : ('n,'e,'g) loop_structure > node_id array endOur algorithm computes the loop nesting tree in time $$O((V+E)α(V+E)). Each node in this tree represents a loop in the flowgraph, except the root of the tree, which represents the entire graph. Given a flowgraph $$G, the root of the loop nesting tree is defined to be the sole vertex in #entry $$G. Other nodes in the tree are indexed by the loop header node ids. Loop detection classifies each loop and for each loop $$L, the following information is obtained:
Static Single AssignmentAn SSA construction algorithm based on[SSA,BriggsSSA,lineartimeIDF] is implemented in the following functor:functor StaticSingleAssignmentForm (Dom : DOMINATOR_TREE) : STATIC_SINGLE_ASSIGNMENT_FORMSSAbased optimizations in MLRISC are actually implemented on top of a highlevel SSA layer described in Section SSA Optimizations. So it is not necessary to use this module directly. Nevertheless, there can be situations in which this module can be specialized in other ways; for example, in the construction of sparse evaluation graphs.
signature STATIC_SINGLE_ASSIGNMENT_FORM = sig structure Dom : DOMINATOR_TREE type var = int type phi = var * var * var list $(* orig def/def/uses *)$ type renamer = {defs : var list, uses: var list} > {defs : var list, uses: var list} type copy = {dst : var list, src: var list} > unit val compute_ssa : ('n,'e,'g) Dom.dominator_tree > { max_var : var, defs : 'n node > var list, is_live : var * int > bool, rename_var : var > var, rename_stmt : {rename:renamer,copy:copy} > 'n node > unit, insert_phi : {block : 'n node, in_edges : 'e edge list, phis : phi list } > unit } > unit endThis module defines the function compute_ssa, which constructs an SSA graph. It requires the following information from the client:
IDEFS/IUSE setsReif and Tarjan define the following useful notions for computing approximate birthpoints for expressions, which in turn can be used to drive other optimizations. Given a node $$X, let $$idom(X) denote the immediate dominator of $$X. Let $$def(X) ($$use(X)) denote all the definitions (uses) in $$X. Given a path $$p == v_{1}... v_{n}, define $$def(p) ($$use(p)) as
signature IDEFS = sig type var = int val compute_idefs : {def_use : 'n Graph.node > var list * var list, cfg : ('n,'e,'g) Graph.graph } > { idefuse : unit > (RegSet.regset * RegSet.regset) Array.array, ipostdefuse : unit > (RegSet.regset * RegSet.regset) Array.array } end structure IDefs : IDEFSStructure IDefs implements the function comput_idefs for computing the $$idef, $$iuse, $$ipostdef and $$ipostuse sets of a control flow graph. It takes as arguments a flowgraph and a function def_use, which takes a graph node and returns the def/use sets of the node. It returns two functions idefuse and ipostdefuse which compute the $$idef/iuse and $$ipostdef/ipostuse sets. These sets are returned as arrays indexed by node ids.
