Title: SYSTOLIC COMBINING SWITCH DESIGNS
3:00 p.m., Tuesday, March 15, 1994
719 Broadway, 12th floor conference room
High-performance VLSI switches are needed in the interconnection network
of massively parallel shared memory multiprocessors. The switch designs
we consider alleviate the ``hot spot'' problem by adding extra logic
to the switches to combine conventional loads and stores as well as
fetch-and-$\phi$ operations destined for the same memory location.
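As an illustration of what combining means here, the following minimal
sketch takes fetch-and-add as the example operation (a common instance of
the $\phi$ operations mentioned above). It shows two requests for the same
memory location being merged in a switch and the single reply being split
on the way back. The names FetchAndAdd, combine, decombine, the "PE0"/"PE1"
labels, and the dictionary standing in for memory are all hypothetical and
chosen for illustration; this is not the logic of the fabricated switch.

```python
from dataclasses import dataclass


@dataclass
class FetchAndAdd:
    """A fetch-and-add request: return the old value at addr, then add inc."""
    src: str   # hypothetical requester label
    addr: int
    inc: int


def combine(first, second):
    """Merge two requests for the same address into one forwarded request.

    Returns the forwarded request plus a wait-buffer entry that records
    what is needed to split the single reply into two.
    """
    assert first.addr == second.addr
    forwarded = FetchAndAdd("switch", first.addr, first.inc + second.inc)
    wait_entry = (first.src, second.src, first.inc)
    return forwarded, wait_entry


def decombine(reply_value, wait_entry):
    """Split one reply into two, as if `first` had been served before `second`."""
    first_src, second_src, first_inc = wait_entry
    return {first_src: reply_value, second_src: reply_value + first_inc}


def memory_fetch_and_add(memory, req):
    """Stand-in for the memory module: apply the request, return the old value."""
    old = memory.get(req.addr, 0)
    memory[req.addr] = old + req.inc
    return old


# Two processors hit the same "hot" location in the same cycle.
memory = {0x10: 100}
a = FetchAndAdd("PE0", 0x10, 3)
b = FetchAndAdd("PE1", 0x10, 5)

fwd, entry = combine(a, b)                # one request travels onward
old = memory_fetch_and_add(memory, fwd)   # memory sees a single access
replies = decombine(old, entry)           # the switch fans the reply out
```

In this sketch the memory module sees one access instead of two, and the
values returned to the two requesters match some serial order of the
original requests, which is the property that lets combining relieve a
hot spot.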
The performance of three buffered switch architectures was investigated
through probabilistic analysis and simulation: Type A switches, with k
queues, one at each output, each accepting k inputs per cycle; and two
designs built from one-input queues, namely Type B switches, with
output queues, and Type C switches, with k input queues. While the Type
C switch is less expensive, Types A and B perform considerably better.
An efficient CMOS implementation for systolic queue designs was devised.
A non-combining switch containing these systolic queues was fabricated
through MOSIS in 3 micron CMOS. It employs the NORA clocking methodology,
using qualified clocks to distribute global control.
A combining switch was fabricated in 2 micron CMOS for use in the 16 by 16
processor/memory interconnection network of the NYU Ultracomputer
prototype. Details are given about the internal logic of the two component
types used in the network. A design usable in networks of size up to
256 by 256 has been prepared for fabrication by NCR at a smaller feature
size in a higher pincount package. Differences in the logic partitioning
of the two designs are described. We describe the performance of these
designs for systems of up to 1024 PEs obtained through simulation.
Our experience in implementing a combining switch indicates that the cost
of hardware combining is much less than is widely believed. We compare the
cost of a combining switch to that of a non-combining switch and discuss
the scalability of the implemented design to large numbers of processors.
Differences in the capabilities of combining switch architectures are
studied. We describe the implementation of ``two-and-a-half-way''
combining, which promises to avoid network saturation in large networks at
only slightly greater cost than two-way combining. We also discuss
implementation alternatives and performance for a 4 by 4 combining switch.