### V22.0436 - Prof. Grishman

### Lecture 9 - Carry look-ahead

(text, Appendix C.6)
#### Carry look-ahead

- simplest adder is "ripple carry": slow (delay time linear in size of operands)
- add time is usually critical in determining overall cycle time of a machine

We can speed up addition by introducing the notions of "carry generate" and "carry propagate" (here * denotes AND and + denotes OR):

gi = ai * bi

pi = ai + bi

- this can be used to compute carries into each bit position:

c1 = g0 + (p0 * c0)

c2 = g1 + (p1 * g0) + (p1 * p0 * c0)

c3 = g2 + (p2 * g1) + (p2 * p1 * g0) + (p2 * p1 * p0 * c0)

c4 = g3 + (p3 * g2) + (p3 * p2 * g1) + (p3 * p2 * p1 * g0) + (p3 * p2 * p1 * p0 * c0)

- and then we compute each sum bit independently using the carry:

Sumi = (ai ex-or bi) ex-or ci
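
The equations above can be checked with a small bit-level simulation. The following is only a sketch: the function name `cla4` and the use of Python integers to hold single bits are illustrative assumptions, not part of the lecture.

```python
def cla4(a, b, c0=0):
    """Add two 4-bit integers using the carry look-ahead equations.

    Returns (sum, c4), where sum is the 4-bit result and c4 the carry out.
    The name and interface are hypothetical, chosen for this illustration.
    """
    abits = [(a >> i) & 1 for i in range(4)]
    bbits = [(b >> i) & 1 for i in range(4)]
    g = [x & y for x, y in zip(abits, bbits)]  # gi = ai * bi
    p = [x | y for x, y in zip(abits, bbits)]  # pi = ai + bi

    # Every carry depends only on g, p and c0, never on another carry.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))

    c = [c0, c1, c2, c3]
    # Sumi = (ai ex-or bi) ex-or ci, computed independently per bit.
    s = sum(((abits[i] ^ bbits[i]) ^ c[i]) << i for i in range(4))
    return s, c4
```

For example, `cla4(9, 7)` returns `(0, 1)`: the 4-bit sum is 0 and the carry out c4 is 1, since 9 + 7 = 16.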

We get greater savings when we build a 16-bit adder and compute group generate and propagate values for each 4-bit group. Note that group values are designated by capital letters.

P0 = p3 * p2 * p1 * p0

P1 = p7 * p6 * p5 * p4

P2 = p11 * p10 * p9 * p8

P3 = p15 * p14 * p13 * p12

G0 = g3 + (p3 * g2) + (p3 * p2 * g1) + (p3 * p2 * p1 * g0)

G1 = g7 + (p7 * g6) + (p7 * p6 * g5) + (p7 * p6 * p5 * g4)

G2 = g11 + (p11 * g10) + (p11 * p10 * g9) + (p11 * p10 * p9 * g8)

G3 = g15 + (p15 * g14) + (p15 * p14 * g13) + (p15 * p14 * p13 * g12)

- we can use these to compute the carry into each group:

into bit 4 (C1 = c4), into bit 8 (C2 = c8), and into bit 12 (C3 = c12)

C1 = G0 + (P0 * c0)

C2 = G1 + (P1 * G0) + (P1 * P0 * c0)

C3 = G2 + (P2 * G1) + (P2 * P1 * G0) + (P2 * P1 * P0 * c0)

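as well as the carry out of the entire 16-bit addition, C4, which follows the same pattern:

C4 = G3 + (P3 * G2) + (P3 * P2 * G1) + (P3 * P2 * P1 * G0) + (P3 * P2 * P1 * P0 * c0)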

- once we have C1, carries c5, c6, and c7 can be computed from c4 (=C1); similarly, carries c9, c10, and c11 can be computed from c8 (=C2), and carries c13, c14, and c15 can be computed from c12 (=C3)
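
The two-level scheme can be sketched in the same bit-level style. The helper names `carries4` and `cla16` are hypothetical. The sketch relies on two observations: each group generate Gk has the same form as c4 with a carry-in of 0, and the group carries C1..C4 are obtained by applying the same four look-ahead equations to the group values G and P.

```python
def carries4(g, p, cin):
    """Four look-ahead carries from four generate/propagate pairs."""
    c1 = g[0] | (p[0] & cin)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & cin)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & cin))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & cin))
    return [c1, c2, c3, c4]


def cla16(a, b, c0=0):
    """Add two 16-bit integers with 4-bit groups; returns (sum, C4)."""
    x = [(a >> i) & 1 for i in range(16)]
    y = [(b >> i) & 1 for i in range(16)]
    g = [x[i] & y[i] for i in range(16)]  # gi = ai * bi
    p = [x[i] | y[i] for i in range(16)]  # pi = ai + bi

    # Group values: Pk is the AND of the group's pi; Gk has the same
    # form as c4 with the carry-in term removed (cin = 0).
    G, P = [], []
    for k in range(4):
        gk, pk = g[4 * k:4 * k + 4], p[4 * k:4 * k + 4]
        P.append(pk[0] & pk[1] & pk[2] & pk[3])
        G.append(carries4(gk, pk, 0)[3])

    # C[0] = c0, C[1] = c4, C[2] = c8, C[3] = c12, C[4] = carry out.
    C = [c0] + carries4(G, P, c0)

    # Within each group, the remaining three carries come from the
    # group's carry-in C[k] and its own gi, pi.
    c = []
    for k in range(4):
        gk, pk = g[4 * k:4 * k + 4], p[4 * k:4 * k + 4]
        c += [C[k]] + carries4(gk, pk, C[k])[:3]

    s = sum((x[i] ^ y[i] ^ c[i]) << i for i in range(16))
    return s, C[4]
```

For example, `cla16(0xFFFF, 1)` returns `(0, 1)`: the generate from bit 0 propagates through all four groups and out as C4.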

- how long does this all take?
  - 1 gate delay to compute gi, pi
  - 2 gate delays to compute Gi (only 1 for Pi)
  - 2 gate delays for the group carries Ci
  - 2 gate delays for the bit carries ci within each group
  - 1 gate delay (exclusive-or) for Sumi

all together, 8 gate delays to add 16 bits!
- for a 64-bit adder, we would compute generate and propagate on 16-bit super-groups, adding 4 more gate delays

in general, delay time with carry look-ahead is *logarithmic* in the size of the operands
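
The delay count can be generalized in a short sketch. It assumes the counting used in these notes: 1 delay for gi and pi, 2 delays for each level of group generate on the way up, 2 delays for each level of carries on the way down, and 1 delay for the final exclusive-or. The function name `lookahead_delay` and the restriction to operand sizes of at least one full group are assumptions made for illustration.

```python
def lookahead_delay(n_bits, group=4):
    """Gate delays for a hierarchical carry look-ahead adder.

    Assumes n_bits >= group and the gate-delay counting from the notes.
    """
    if n_bits < group:
        raise ValueError("n_bits must be at least one full group")

    # Number of look-ahead levels: smallest L with group**L >= n_bits.
    levels, size = 0, 1
    while size < n_bits:
        size *= group
        levels += 1

    up = 2 * (levels - 1)   # group generates (top level needs none)
    down = 2 * levels       # carries at every level, top to bottom
    return 1 + up + down + 1


for n in (4, 16, 64, 256):
    print(n, lookahead_delay(n))
# 16 bits gives 8 and 64 bits gives 12, matching the counts above.
```

Under these assumptions the total is 4 gate delays per level, so each fourfold increase in operand size adds a constant 4 delays, which is the logarithmic growth described above.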