DOCTORAL DISSERTATION DEFENSE

On Compiling Regular Loops for Efficient Parallel Execution

Candidate: Pei Ouyang

Advisors: Zvi Kedem and Krishna Palem

12:00 p.m., Wednesday, December 18, 1991

Room 101, Warren Weaver Hall

**Abstract**

In this thesis, we study the problem of *mapping* regular loops
onto multiprocessors. We develop mapping schemes that yield very
efficient executions of regular loops on both *shared* and
*distributed* memory architectures, together with novel analysis
techniques for arguing about the efficiency of the resulting
executions. In the distributed memory setting, the quality of the
execution of these regular loops relies heavily on implementing
*cyclic shifts* efficiently: cyclic shifts are used to communicate
results between the individual processors to which interdependent
iterations are assigned. Therefore, to achieve efficient executions
of regular loops on distributed memory architectures, we also develop
and analyze algorithms for solving the cyclic shift problem.

To analyze the executions of regular loops that result from any
specific mapping, we must characterize the parameters that determine
their efficiency. We formally characterize a basic set of such
parameters, which facilitate the analysis of the *memory* and
*processor* requirements of a given execution, as well as its
*running time*. Using these parameters, we analyze a *greedy*
execution scheme in the shared memory model. For example, we can
determine the limit on the number of processors beyond which the
greedy method attains no further speedup for regular loops. The
greedy scheme is of interest because it exploits the maximal possible
parallelism in a natural way.
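As an illustration of the greedy idea (a sketch under our own assumptions, not the thesis's formulation), consider a two-level loop nest whose iterations form a grid with uniform dependence distances. Starting every iteration as soon as its predecessors finish groups independent iterations into "wavefronts", and the widest wavefront bounds the number of processors that can ever be useful:

```python
# Hypothetical sketch: greedy (as-soon-as-possible) scheduling of a regular
# 2-D loop nest with uniform dependence distance vectors. The function name
# and example distances are illustrative only.
from collections import Counter

def greedy_schedule(n, m, distances):
    """Assign each iteration (i, j) the earliest step at which all of its
    dependence predecessors (i - di, j - dj) have completed."""
    step = {}
    for i in range(n):
        for j in range(m):
            preds = [(i - di, j - dj) for di, dj in distances
                     if 0 <= i - di < n and 0 <= j - dj < m]
            step[(i, j)] = 1 + max((step[p] for p in preds), default=0)
    return step

# Example: each iteration depends on its west and north neighbours.
sched = greedy_schedule(4, 4, [(1, 0), (0, 1)])

width = Counter(sched.values())
print(max(sched.values()))   # critical-path length: 7 steps for a 4x4 grid
print(max(width.values()))   # peak parallelism: 4 iterations in one step
```

With these particular distance vectors, the wavefronts are the anti-diagonals of the grid; more than 4 processors would yield no speedup on the 4x4 example, which is the kind of processor limit the parameters above let one determine analytically.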

We then turn to mapping regular loops onto distributed memory machines. We show that finding an optimal mapping is computationally intractable in this setting. To obtain schemes that can actually be applied to regular loops at compile time, we relax the requirement that the resulting executions be optimal; instead, we design a heuristic mapping algorithm and validate it through experiments. This heuristic mapping scheme relies heavily on efficient algorithms for realizing cyclic shifts, so we also study the problem of realizing cyclic shifts on hypercube architectures.
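One common starting point for realizing a cyclic shift on p = 2^d processors (an illustrative decomposition, not necessarily the algorithm developed in the thesis) is to split the shift amount into its binary power-of-two components and perform one communication round per set bit, since a shift by a fixed power of two maps onto a small number of hypercube dimensions:

```python
# Hypothetical sketch: a cyclic shift by s on p = 2^d "processors",
# simulated as a list, decomposed into power-of-two shifts.
def cyclic_shift(values, s):
    """Shift the list cyclically by s, one power of two at a time."""
    p = len(values)
    j = 0
    while s:
        if s & 1:
            step = 1 << j
            # round j: every processor i forwards its value to (i + step) mod p
            values = [values[(i - step) % p] for i in range(p)]
        s >>= 1
        j += 1
    return values

print(cyclic_shift(list(range(8)), 3))  # → [5, 6, 7, 0, 1, 2, 3, 4]
```

At most d = log2(p) rounds are needed for any shift amount, which is why the cost of the elementary power-of-two shifts dominates the analysis.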