LECTURE 1
Chee Yap
NUMERICAL ALGEBRAIC COMPUTATION (I)




Introduction



Overview



Preliminary

· Initial algebraic structures: $\mathbb{Z} \subseteq \mathbb{Q} \subseteq \mathbb{R} \subseteq \mathbb{C}$

· Main object of study: polynomials,


$$A(X) = \sum_{i=0}^{m} a_i X^i, \qquad (a_m \neq 0, \quad a_i \in R_0 \subseteq \mathbb{C}) \qquad (1)$$

· What kind of objects are polynomials? Dual nature!

  1. algebraic objects (members of the polynomial ring $R_0[X]$)
  2. analytic objects (functions $A: \mathbb{C} \to \mathbb{C}$, assuming $R_0 \subseteq \mathbb{C}$)



Main Problem



Some Generalizations

 
· $A(X,Y) = 0$: curves
· $A(X,Y,Z) = B(X,Y,Z) = 0$: surfaces
· Multivariate systems of equations
· Ideal-theoretic setting: the ideal membership problem
· Tarski's theory of elementary geometry and algebra



Two Distinct Approaches



Algebraic Numbers


Sylvester Resultants

· Are $\alpha + \beta$ and $\alpha\beta$ algebraic, for algebraic numbers $\alpha, \beta$?

· Two standard approaches: symmetric functions and resultants. We prefer the latter as more constructive.

· The resultant of $A(X)$ and $B(X)$ (of degrees $m$ and $n$) is the determinant of the $(m+n)$-square matrix:


$$S(A,B) := \begin{bmatrix}
a_m & a_{m-1} & \cdots & a_0 & & & \\
 & a_m & a_{m-1} & \cdots & a_0 & & \\
 & & \ddots & & & \ddots & \\
 & & & a_m & a_{m-1} & \cdots & a_0 \\
b_n & b_{n-1} & \cdots & b_1 & b_0 & & \\
 & b_n & b_{n-1} & \cdots & b_1 & b_0 & \\
 & & \ddots & & & \ddots & \\
 & & & b_n & b_{n-1} & \cdots & b_0
\end{bmatrix} \qquad (2)$$

with $n$ shifted rows of $a$'s and $m$ shifted rows of $b$'s (blank entries are zero).
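· To make the construction concrete, here is a minimal sketch (Python with numpy; the helper name sylvester is ours) that builds $S(A,B)$ from coefficient lists and evaluates $\det S(A,B)$ numerically:

    import numpy as np

    def sylvester(a, b):
        """Sylvester matrix of A, B given as coefficient lists, highest degree first."""
        m, n = len(a) - 1, len(b) - 1          # m = deg A, n = deg B
        S = np.zeros((m + n, m + n))
        for i in range(n):                     # n shifted rows of A's coefficients
            S[i, i:i + m + 1] = a
        for i in range(m):                     # m shifted rows of B's coefficients
            S[n + i, i:i + n + 1] = b
        return S

    # A = (X-1)(X-2) and B = (X-1)(X-3) share the root 1, so res(A,B) = 0:
    print(np.linalg.det(sylvester([1, -3, 2], [1, -4, 3])))   # ~ 0.0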



Properties

· Elimination: let $A, B$ be multivariate polynomials, treated as polynomials in $X$. Then in $\mathrm{res}_X(A,B)$, we have ``eliminated'' the variable $X$ from $A, B$.

· This justifies the name ``resultant'':
$$\mathrm{res}(A,B) = 0 \quad\Longleftrightarrow\quad \deg(\gcd(A,B)) > 0.$$

· PROOF:
$$\deg(\gcd(A,B)) > 0 \quad\Longleftrightarrow\quad GA + HB = 0$$
for some non-zero $G, H$ with $\deg G < n$ and $\deg H < m$. [Just choose $G = B/\gcd(A,B)$ and $H = -A/\gcd(A,B)$.]

But $GA + HB = 0$, writing $G = \sum_{i < n} g_i X^i$ and $H = \sum_{i < m} h_i X^i$, is equivalent to:
$$[g_{n-1}, g_{n-2}, \ldots, g_0, h_{m-1}, \ldots, h_0] \cdot S(A,B) \cdot [X^{m+n-1}, X^{m+n-2}, \ldots, X^{m+1}, X^m, X^{m-1}, \ldots, X, 1]^T = 0.$$
But the last equation is equivalent to $[\bar{g}, \bar{h}] \cdot S(A,B) = 0$ for a non-zero vector $[\bar{g}, \bar{h}]$, i.e., $\det(S(A,B)) = 0$. Q.E.D.



Poisson's Formula for Resultants

· Let
$$A = a \prod_{i=1}^{m} (X - \alpha_i), \qquad B = b \prod_{j=1}^{n} (X - \beta_j).$$

· LEMMA: For $a \in \mathbb{C}$,
(a) $\mathrm{res}(a, B) = a^n$.
(b) $\mathrm{res}(A, B) = (-1)^{mn}\, \mathrm{res}(B, A)$.
(c) $\mathrm{res}((X-a) \cdot A, B) = B(a)\, \mathrm{res}(A, B)$.

· THEOREM:
$$\mathrm{res}(A,B) = a^n \prod_{i=1}^{m} B(\alpha_i) = a^n b^m \prod_{i=1}^{m} \prod_{j=1}^{n} (\alpha_i - \beta_j)$$
PROOF: To get the first equality, apply lemma (c) $m$ times, and lemma (a) once, to $\mathrm{res}(a \prod_i (X - \alpha_i), B)$. To get the second equality, note that $B(\alpha_i) = b \prod_j (\alpha_i - \beta_j)$.
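· As a quick sanity check, the product formula can be compared numerically against an exact resultant (a sketch using numpy and sympy; the example polynomials are ours):

    import numpy as np
    from sympy import symbols, resultant

    X = symbols('X')
    A = [2, -3, 1]        # 2X^2 - 3X + 1
    B = [1, 0, -2, 5]     # X^3 - 2X + 5
    a, n = A[0], len(B) - 1
    alphas = np.roots(A)  # numerical roots of A
    poisson = a**n * np.prod([np.polyval(B, al) for al in alphas])
    print(poisson)                                            # ~ 132 numerically
    print(resultant(2*X**2 - 3*X + 1, X**3 - 2*X + 5, X))     # 132 exactly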



Algebraic Numbers form a Field

· THEOREM:
(i) $1/\alpha$ is a root of $X^m A(1/X)$, provided $\alpha \neq 0$.
(ii) $\beta \pm \alpha$ is a root of $C(X) = \mathrm{res}_Y(A(Y), B(X \mp Y))$.
(iii) $\alpha\beta$ is a root of $C(X) = \mathrm{res}_Y(A(Y), Y^n B(X/Y))$.

· PROOF: (i) is immediate. We prove (ii) for $\beta + \alpha$, as (iii) is similar:


$$\mathrm{res}_Y(A(Y), B(X - Y)) = a^n \prod_{i=1}^{m} B(X - \alpha_i) = a^n b^m \prod_{i=1}^{m} \prod_{j=1}^{n} (X - \alpha_i - \beta_j).$$
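· For instance, a defining polynomial of $\sqrt{2} + \sqrt{3}$ can be computed this way (a sketch using sympy's resultant; the symbol names are ours):

    from sympy import symbols, resultant

    X, Y = symbols('X Y')
    A = Y**2 - 2             # defines alpha = sqrt(2)
    B = (X - Y)**2 - 3       # B(X - Y), where B(Z) = Z^2 - 3 defines beta = sqrt(3)
    print(resultant(A, B, Y))   # X**4 - 10*X**2 + 1, which vanishes at sqrt(2)+sqrt(3)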


Degree and Length Bounds

· The above proof yields important information about the degrees and lengths when we add or multiply algebraic numbers.

· Let $\alpha_i$ have degree $\leq d_i$ and length at most $l_i$ ($i = 1, 2$).

· Then the degree and length of $\alpha_1 \alpha_2$ are at most $d_1 d_2$ and
$$l = l_1^{d_2}\, l_2^{d_1}.$$

· The degree and length of $\alpha_1 \pm \alpha_2$ are at most $d_1 d_2$ and
$$l = l_1^{d_2}\, l_2^{d_1}\, 2^{d_1 d_2 + \min\{d_1, d_2\}}.$$



Representing Algebraic Numbers

· How to represent polynomials? Sequence of coefficients.

· Let $A(X)$ be a defining polynomial for $\alpha$.

· ``Symbolic form'': $\alpha \sim (A(X), i)$, $1 \leq i \leq m$, meaning $\alpha$ is the $i$-th root of $A(X)$ in some fixed ordering.

· Isolating Interval: $\alpha \sim (A(X), a, b)$ where $[a,b]$ is an isolating interval for $\alpha$.

· Thom Representation: the sign signature of $r \in \mathbb{R}$ relative to $A(X)$ is the sign sequence of
$$A(r), A'(r), \ldots, A^{(i)}(r), \ldots, A^{(m)}(r).$$

Thom's lemma: the set of numbers $r$ with a given sign sequence is either an isolated point or an open interval. The isolated points are just the roots of $A(X)$. So
$$\alpha \sim (A(X), \mathrm{signature}(\alpha))$$

· Constructive Representation:
- relatively new, but important for us (more below)
- no explicit defining polynomial
- basically DAGs with integers at the leaves and operators at the internal nodes.



Localization of Zeros

· $D(z,r)$ = complex open disk centered at $z \in \mathbb{C}$, radius $r > 0$. Write $D(r)$ if $z = 0$.

· Let $A(X)$ have $k$ distinct roots. Define
$$0 < r_1 \leq r_2 \leq \cdots \leq r_m$$
where $r_i$ is the minimum radius such that $D(r_i)$ contains $\min\{i, k\}$ distinct roots of $A(X)$.

· Some root bound estimations:
- Give an upper bound on $r_m$
- Give an upper bound on $r_1$
- Give a lower bound on $r_1$
- Give a lower bound on $\min_{i=1}^{k-1} |r_{i+1} - r_i|$ (root separation)
- Give upper bounds on $r_m/r_1$ (isolation ratio)


Some Bounds

· (Cauchy)
$$r_1 \geq \frac{|a_0|}{|a_0| + \max\{|a_1|, \ldots, |a_n|\}}, \qquad r_n \leq 1 + \frac{\max\{|a_0|, \ldots, |a_{n-1}|\}}{|a_n|}. \qquad (3)$$
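These two bounds are easy to evaluate; a minimal sketch (the helper name cauchy_bounds is ours):

    import numpy as np

    def cauchy_bounds(coeffs):
        """Cauchy bounds (3). coeffs = [a_n, ..., a_0], highest degree first.
        Returns (lower bound on r_1, upper bound on r_n)."""
        a = np.abs(np.array(coeffs, dtype=float))
        lower = a[-1] / (a[-1] + a[:-1].max())   # max over |a_1|, ..., |a_n|
        upper = 1 + a[1:].max() / a[0]           # max over |a_0|, ..., |a_{n-1}|
        return lower, upper

    # Roots of X^3 - 6X^2 + 11X - 6 are 1, 2, 3:
    print(cauchy_bounds([1, -6, 11, -6]))        # (~0.353, 12.0): brackets [1, 3]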

· (Mahler) For any $k+1$ of the roots of $A(X)$, say $\alpha_0, \ldots, \alpha_k$ ($k = 1, \ldots, m$), reorder them so that
$$|\alpha_0| \geq |\alpha_1| \geq \cdots \geq |\alpha_k|.$$
Then
$$\prod_{i=1}^{k} |\alpha_{i-1} - \alpha_i| > \sqrt{|\mathrm{disc}(A)|} \cdot M(A)^{-m+1} \cdot m^{-m/2} \cdot \left( \frac{\sqrt{3}}{m} \right)^{k}.$$
NOTE: $A(X)$ has multiple roots iff $\mathrm{disc}(A) = 0$. Here $M(A) = |a_m| \prod_i \max\{1, |\alpha_i|\}$ is Mahler's measure. We know that $M(A) \leq \|A\|_2$.

When $k = 1$, we get a root separation bound: if $A$ is square-free with integer coefficients,
$$\mathrm{sep}(A) \geq \|A\|^{-m}\, m^{-(m+2)/2}. \qquad (4)$$


More Bounds

· Let $\lambda_1, \ldots, \lambda_m$ be positive numbers such that
$$\sum_{i=1}^{m} \lambda_i \leq 1.$$

· LEMMA (Henrici) We have $r_1 \geq \beta$ where
$$\beta := \min_{i=1,\ldots,m} \left\{ \left( \left| \frac{a_0}{a_i} \right| \lambda_i \right)^{1/i} \right\} \qquad (5)$$
and the $i$-th term is $\infty$ whenever $a_i = 0$.

· PROOF: Assume $|z| < \beta$. It suffices to prove that $|A(z)| > 0$:
$$|A(z)| = \left| \sum_{i=0}^{m} a_i z^i \right| > |a_0| \left( 1 - \sum_{i=1}^{m} \left| \frac{a_i}{a_0} \right| \beta^i \right) \geq |a_0| \left( 1 - \sum_{i=1}^{m} \lambda_i \right) \qquad \text{(since } \lambda_i \geq \beta^i\, |a_i/a_0| \text{)}$$
$$\geq 0.$$


Applications

· As an application, choose $\lambda_i = 1/2^i$ for $i = 1, \ldots, m$. Then
$$r_1 \geq \frac{1}{2} \min_{i=1}^{m} \left\{ \left| \frac{a_0}{a_i} \right|^{1/i} \right\} \qquad (6)$$

· Another application: if $\beta_1$ is the unique positive solution of the equation
$$|a_0| = \sum_{i=1}^{m} |a_i| X^i$$
then $r_1 \geq \beta_1$.

· PROOF: choose
$$\lambda_i := \left| \frac{a_i}{a_0} \right| \beta_1^i, \qquad (i = 1, \ldots, m).$$
By the definition of $\beta_1$, we have $\lambda_1 + \cdots + \lambda_m = 1$. Also,
$$\beta_1 = \lambda_i^{1/i} \left| \frac{a_0}{a_i} \right|^{1/i}$$
for all $a_i \neq 0$. Thus $\beta_1$ is the same as the $\beta$ defined in the above lemma. Our claim follows from the said lemma.


Related Bounds

· Complementary bounds: upper bounds on $r_m$ and lower bounds on $r_1$ are basically interchangeable.

· This is because $\alpha \neq 0$ is a zero of $A(X)$ iff $1/\alpha$ is a zero of the reversed polynomial $B(X) = X^m A(1/X)$.

· From (6), we immediately get:
$$r_m \leq 2 \max_{i=1}^{m} \left| \frac{a_{m-i}}{a_m} \right|^{1/i}$$

· Similarly, $r_m$ is upper bounded by the unique positive root of the equation
$$|a_m| X^m = \sum_{i=0}^{m-1} |a_i| X^i.$$
This is another bound of Cauchy's.

· Above we considered discs $D(r)$ centered at the origin. We can consider $D(z,r)$ just by shifting the polynomial to $A(X+z)$.


Cauchy Index

· For more precise location of roots, we need Sturm's theory.

· Write $A(X) = a_m \prod_{i=1}^{k} (X - \alpha_i)^{m_i}$ $(1 \leq k \leq m)$

· Key is the behaviour of the logarithmic derivative of $A$,
$$f(X) = \frac{A'(X)}{A(X)} = \sum_{i=1}^{k} \frac{m_i}{X - \alpha_i}$$

· The Cauchy index of a real function $f(x)$ at $x = r$:
$I_f(r) = +1$ if $r$ is a pole of $f$ and $f(x)$ switches from $-\infty$ to $+\infty$ as $x$ increases through $r$.
$I_f(r) = -1$ if the sign changes from $+\infty$ to $-\infty$.
Else, $I_f(r) = 0$.

· For $a \leq b$, let $I_f[a,b]$ be the sum of $I_f(r)$ over all $a \leq r \leq b$. E.g., when $f(X) = A'(X)/A(X)$, $I_f[a,b]$ is the number of distinct real roots of $A(X)$ in $[a,b]$.


Sturm Sequence

· Given a sequence $(x_1, \ldots, x_n)$ of real numbers, define its sign variation $\mathrm{Var}(x_1, \ldots, x_n)$ to be the number of sign changes in the signs of the $x_i$, ignoring zero signs.

· E.g.,
$$\mathrm{Var}(1, 0, -2, 1.3, 3.2, 0, 0, -0.5) = 3.$$

· The Sturm sequence for a pair of real functions $A_0(X), A_1(X)$ over an interval $[a,b]$ is a sequence $\bar{A} = (A_0, A_1, \ldots, A_k)$

of real functions such that
(i) $A_0(a) A_0(b) \neq 0$,
(ii) $A_k(x)$ is non-zero for $x \in [a,b]$,
(iii) for $i = 0, \ldots, k-1$ and $a \leq x \leq b$: if $A_i(x) = 0$, then either $i = 0$ and $A_1(x) \neq 0$, or $A_{i-1}(x) A_{i+1}(x) < 0$.

· For any real $a$, let $V_{\bar{A}}(a)$ be the sign variation of $\bar{A}(a) = (A_0(a), A_1(a), \ldots, A_k(a))$. Omit $\bar{A}$ if understood.

· THEOREM (Sturm)

If $\bar{A}$ is a Sturm sequence for $A_0, A_1$ over the interval $[a,b]$, then $I_f[a,b] = V(a) - V(b)$ where $f = A_1/A_0$.
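· For polynomials, sympy can build the classical Sturm chain $(A, A', \ldots)$ directly; a small sketch counting distinct real roots in an interval (the helper name var_at is ours):

    from sympy import symbols, sturm, Poly

    X = symbols('X')
    chain = sturm(Poly(X**3 - 3*X + 1, X))   # Sturm sequence of A, A'

    def var_at(chain, t):
        """Sign variations of the chain at t, ignoring zeros."""
        signs = [p.eval(t) for p in chain if p.eval(t) != 0]
        return sum(1 for u, v in zip(signs, signs[1:]) if u*v < 0)

    # X^3 - 3X + 1 has two distinct real roots in [0, 2]:
    print(var_at(chain, 0) - var_at(chain, 2))   # 2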


Connection to PRS

· How do we construct Sturm sequences efficiently? Via polynomial remainder sequences (PRS), a very well-studied topic.

· Focus on computing in $D[X]$ where $D$ is a unique factorization domain (UFD). Typically $D = \mathbb{Z}$ or $D = \mathbb{Z}[X_1, \ldots, X_{n-1}]$. There is a big complexity difference between working over $D$ and over its quotient field.

· Let $A, B \in D[X]$. Write $A \sim B$ (similarity) if $aA = bB$ for some non-zero $a, b \in D$.

· The content of $A$ is the GCD of its coefficients. Call $A$ primitive if its content is 1. Gauss' lemma: $A, B$ primitive implies $AB$ primitive.

· Pseudo-remainder of $A(X), B(X) \in D[X]$: $\mathrm{prem}(A,B) = \mathrm{rem}(\mathrm{lead}(B)^{\delta} A, B)$ where $\delta = \deg(A) - \deg(B) + 1$.

· A sequence $(A_0, A_1, \ldots, A_h)$ is called a PRS (over $D$) if $A_{i+1} \sim \mathrm{prem}(A_{i-1}, A_i)$ for $i = 1, \ldots, h-1$, and $\mathrm{prem}(A_{h-1}, A_h) = 0$.

· E.g., pseudo-Euclidean PRS: just let $A_{i+1} = \mathrm{prem}(A_{i-1}, A_i)$. Problem: the coefficients (easily) grow exponentially.

· E.g., primitive PRS: take the primitive part at each step. Problem: a content GCD computation at every step.


Subresultant PRS

· Solution of Collins (1967), with subsequent simplifications by Brown:

· Maintain recursively an element $\beta_i \in D$ and let $A_{i+1} = \mathrm{prem}(A_{i-1}, A_i)/\beta_i$.

· For $i = 0, \ldots, k-1$, we do the following. Assume that $A_{i+1}$ has just been computed. Then set
$$d_i := \deg(A_i) - \deg(A_{i+1}), \qquad a_i := \mathrm{lead}(A_i).$$

Then let
$$\beta_{i+1} := \begin{cases} (-1)^{d_0+1} & \text{if } i = 0, \\ (-1)^{d_i+1} (\psi_i)^{d_i}\, a_i & \text{if } i = 1, \ldots, k-2, \end{cases}$$
where $(\psi_0, \ldots, \psi_{k-1})$ is an auxiliary sequence given by $\psi_0 = 1$ and
$$\psi_{i+1} := \psi_i \left( \frac{a_{i+1}}{\psi_i} \right)^{d_i} = \frac{(a_{i+1})^{d_i}}{(\psi_i)^{d_i - 1}},$$
for $i = 0, \ldots, k-2$.
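· The coefficient growth is easy to observe experimentally. A sketch contrasting the pseudo-Euclidean PRS with sympy's built-in subresultant PRS, on Knuth's classic example (the choice of example is ours):

    from sympy import symbols, prem, subresultants

    X = symbols('X')
    A = X**8 + X**6 - 3*X**4 - 3*X**3 + 8*X**2 + 2*X - 5
    B = 3*X**6 + 5*X**4 - 4*X**2 - 9*X + 21

    # Pseudo-Euclidean PRS: A_{i+1} = prem(A_{i-1}, A_i); coefficients explode.
    seq = [A, B]
    while True:
        R = prem(seq[-2], seq[-1], X)
        if R == 0:
            break
        seq.append(R.expand())
    print([max(abs(c) for c in p.as_poly(X).all_coeffs()) for p in seq])

    # Subresultant PRS: same degree sequence, but coefficients stay small.
    print(subresultants(A, B, X))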


Correctness and Complexity

· Why is this correct?

· Why is this polynomial time?

· Theory of subresultant sequences will explain all these. See book.

· Complexity. It is easy to see that the algorithm takes $O(n^2 \log n)$ operations in $D$.

· Improvement: Schwartz [10] applied the Half-GCD idea to get an $O(n \log^2 n)$ bound, provided we only compute the sequence of partial quotients and similarity coefficients
$$(Q_1, a_1, b_1), \ldots, (Q_{k-1}, a_{k-1}, b_{k-1})$$
where $a_i A_{i+1} = b_i A_{i-1} + A_i Q_i$.


The Half-GCD Technique

· For simplicity, we assume the Euclidean case (so D is actually a field).

Initial idea: if
$$A_{i+1} = A_{i-1} - Q_i A_i, \qquad (i = 1, \ldots, h-1),$$
we should focus on the quotient sequence $(Q_1, \ldots, Q_{h-1})$ instead of the remainders.

JUSTIFICATIONS: (1) We can reconstruct the remainder sequence in $O(n \log^2 n)$ time.
(2) Storing the quotient sequence uses linear, not quadratic, space.

· Write in matrix notation:
$$V_i := \begin{bmatrix} A_{i-1} \\ A_i \end{bmatrix} = \begin{bmatrix} Q_i & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} A_i \\ A_{i+1} \end{bmatrix} =: \begin{bmatrix} Q_i & 1 \\ 1 & 0 \end{bmatrix} V_{i+1}.$$
Note that the matrix here has determinant $-1$. More concisely:
$$V_i \xrightarrow{\ Q_i\ } V_{i+1}.$$

· The HGCD Algorithm, on input $A, B$ ($\deg A > \deg B > 0$), will return the quotient sequence $Q$ so that
$$V = \begin{bmatrix} A \\ B \end{bmatrix} \xrightarrow{\ Q\ } V' = \begin{bmatrix} A' \\ B' \end{bmatrix}$$
such that they straddle the mid-point:
$$\deg(A') \geq \deg(A)/2 > \deg(B').$$

· SPEEDUP IDEA: Suppose we replace $A, B$ by $A' := A \mathbin{\mathrm{div}} X^{m/2}$ and $B' := B \mathbin{\mathrm{div}} X^{m/2}$. Let us recursively call $Q' := \mathrm{HGCD}(A', B')$. What happens if we now apply $Q'$ to reduce $(A, B)$? More precisely, let
$$\begin{bmatrix} A \\ B \end{bmatrix} \xrightarrow{\ Q'\ } \begin{bmatrix} A'' \\ B'' \end{bmatrix}$$
and we ask: what are $\deg(A'')$ and $\deg(B'')$? Intuitively, $\deg(A'') \geq \frac{3}{4}\deg(A) \geq \deg(B'')$. Thus we managed to go down ``a quarter''.

A second recursive call on $\mathrm{HGCD}(A'', B'')$ should get us to the half-way point! Problem: what if we get too lucky?

· If HGCD takes time $O(n \log^2 n)$, then we can reduce the GCD problem to HGCD so that the overall running time remains $O(n \log^2 n)$.

· Brief history of the half-GCD technique: the ideas apply to both polynomial GCD and integer GCD. The latter is harder! Knuth (1971), Schönhage, Moenck, Aho-Hopcroft-Ullman, Brent-Gustavson-Yun, Thull-Yap.


From PRS to Sturm

· It is easy to see: if you negate the remainder at each step of the polynomial remainder computation, you get a Sturm sequence.

· But suppose we are given a PRS, where
$$b_i A_{i+1} = a_i A_{i-1} + Q_i A_i, \qquad i = 1, \ldots, h-1.$$
How can we get the Sturm sequence? E.g., in the subresultant PRS, we know $b_i, a_i$.

· Let $\sigma_i = -\mathrm{sign}(a_i b_i)$.

· Our goal is to compute signs $(s_0, \ldots, s_h)$, with $s_0 = s_1 = 1$, such that $(s_0 A_0, \ldots, s_h A_h)$ is a Sturm sequence.

· At a root $x$ of $A_i$, the PRS relation gives $b_i A_{i+1}(x) = a_i A_{i-1}(x)$; for a Sturm sequence, $(s_{i-1} A_{i-1})(x)$ and $(s_{i+1} A_{i+1})(x)$ must have opposite signs. So we need
$$\mathrm{sign}(a_i b_i\, s_{i-1} s_{i+1}) = -1, \quad \text{i.e.,} \quad s_{i-1} s_{i+1} = \sigma_i.$$

· Multiplying together these equations (each factor below equals 1),
$$(s_0 \sigma_1 s_2)(s_2 \sigma_3 s_4)(s_4 \sigma_5 s_6) \cdots (s_{2j-2} \sigma_{2j-1} s_{2j}) = 1.$$

· Telescoping (using $s_{2i}^2 = 1$ and $s_0 = 1$), we obtain the desired formula for $s_{2j}$:
$$s_{2j} = \prod_{i=1}^{j} \sigma_{2i-1}.$$

Similarly, for the odd indices,
$$s_{2j+1} = \prod_{i=1}^{j} \sigma_{2i}.$$

· Thus the sequence $(s_2, \ldots, s_h)$ of signs splits into two alternating subsequences whose computations depend on two disjoint subsets of $\{\sigma_1, \ldots, \sigma_{h-1}\}$.

· PRS computation is amenable to fast parallel computation. It is nice to observe that the $s_i$'s can be computed in $O(\log n)$ parallel steps, using the parallel prefix algorithm, as in the sketch below.
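· Sequentially, the sign computation is a two-line recurrence; the even- and odd-indexed subsequences are independent prefix products, which is exactly what makes parallel prefix applicable (the helper name sturm_signs is ours):

    def sturm_signs(sigma):
        """sigma[i-1] = -sign(a_i * b_i) for i = 1, ..., h-1 (entries +/-1).
        Returns (s_0, ..., s_h) so that (s_i * A_i) is a Sturm sequence."""
        s = [1, 1]                    # s_0 = s_1 = 1
        for sg in sigma:
            s.append(sg * s[-2])      # s_{i+1} = sigma_i * s_{i-1}
        return s

    print(sturm_signs([-1, 1, -1]))   # [1, 1, -1, 1, 1]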


Domination

· There is one detail we forgot (:-). The last term $A_h$ in the PRS need not be a constant. Recall the definition.

Evaluating a Sturm sequence at a root $r$ of $A_h$ causes all terms to vanish. We call $r$ a degenerate value.

· For instance, if $A_0, A_1$ are relatively prime, then $A_h$ is a constant, so there are no degenerate values. Otherwise, the standard response is: use the depressed sequence $A_i' = A_i/A_h$ for $i = 0, \ldots, h$.

· Actually, it is known that this is unnecessary. But we want to understand this phenomenon in general.

· DEFINITION: let $m(A, r)$ denote the multiplicity of $r$ as a root of $A(X)$. N.B. $m(A, r) = 0$ if $r$ is not a root of $A$.

We say $A$ dominates $B$ if at each root $\alpha$ of $A$, we have
$$m(A, \alpha) \geq m(B, \alpha).$$

· Thus, $A$ dominates $B$ in the following situations:
(a) $B$ is a derivative of $A$.
(b) $A$ and $B$ are relatively prime.
(c) $B$ is square-free.
(d) $B$ divides $A$.


A General Sturm Theorem

· Let us fix $A, B$ and form a Sturm sequence of $A, B$. For any $a < b$, let
$$V_{A,B}[a, b] = V(a) - V(b)$$
be the difference in sign variations at $a$ and at $b$.

· Different forms of ``Sturm theories'' amount to the question: what does $V[a,b]$ count?

· THEOREM: Assume $A$ dominates $B$ and $A(a) A(b) \neq 0$. Then
$$V[a,b] = \sum_{\alpha} \mathrm{sign}\!\left( A^{(r)}(\alpha)\, B^{(s)}(\alpha) \right) \qquad (7)$$
where $\alpha$ ranges over all roots of $A$ in the interval $[a,b]$ for which $r + s$ is odd, with $r = m(A,\alpha)$ and $s = m(B,\alpha)$.

· REMARK: this theorem is only proved for polynomials, not real functions A, B.


Corollaries

· Let V[a,b] be the difference of sign variation for A, B, as in the last theorem.

· 1. STURM'S THEOREM:

If $B = A'$ (derivative), then $V[a,b]$ counts the number of distinct roots of $A$ in $[a,b]$.

PROOF: In the above (7), we have $r + s$ odd for every root $\alpha$ of $A$ in $[a,b]$. Moreover, at each $\alpha$, we have $A^{(r)}(\alpha) = B^{(s)}(\alpha)$, so that each summand is $+1$.

N.B. We have thus justified our ``oversight'' of not dividing by $A_h$.

· 2. THEOREM (Schwartz-Sharir) If $A, B$ are square-free, then
$$V[a,b] = \sum_{\alpha} \mathrm{sign}(A'(\alpha) B(\alpha))$$
where $\alpha$ ranges over the roots of $A(X)$ in $[a,b]$.

PROOF: Let $\alpha$ be a root of $A(X)$ in $[a,b]$. If $B(\alpha) = 0$, then it adds nothing to (7), since $r + s$ is then even. If $B(\alpha) \neq 0$, then $r = 1$ and $s = 0$ in (7). Thus the sum in this theorem is equal to the sum in (7).

· 3. THEOREM (Sylvester, also: Ben-Or, Kozen, Reif)

Let $B = A'C$ for some polynomial $C$, with $A, C$ relatively prime. Then
$$V[a,b] = \sum_{\alpha} \mathrm{sign}(C(\alpha))$$
where $\alpha$ ranges over the roots of $A$ in $[a,b]$.

PROOF: Note that $A$ dominates $B$. Then $V[a,b]$ is the same sum as in the previous theorem. However, the summands $\mathrm{sign}(A'(\alpha) B(\alpha))$ are now equal to
$$\mathrm{sign}(A'(\alpha) A'(\alpha) C(\alpha)) = \mathrm{sign}(C(\alpha)).$$

N.B. This result is used in the famous BKR algorithm.

· 4. THEOREM (Cauchy Index)

Let $f(X) = B(X)/A(X)$ where $A, B$ are relatively prime. Then $V[a,b]$ equals the Cauchy index $I_f[a,b]$.

PROOF: In the summation of $V[a,b]$, we must have $s = 0$ and $r$ odd. This means
$$\mathrm{sign}(A^{(r)}(\alpha)) = \frac{\mathrm{sign}(A(\alpha^+)) - \mathrm{sign}(A(\alpha^-))}{2},$$
$$\mathrm{sign}(A^{(r)}(\alpha) B^{(0)}(\alpha)) = \frac{\mathrm{sign}(A(\alpha^+) B(\alpha^+)) - \mathrm{sign}(A(\alpha^-) B(\alpha^-))}{2} = \frac{\mathrm{sign}(f(\alpha^+)) - \mathrm{sign}(f(\alpha^-))}{2},$$
since $\mathrm{sign}(AB) = \mathrm{sign}(B/A)$. Summing the last equation over all $\alpha$, the left-hand side equals $V[a,b]$, and the right-hand side equals $I_f[a,b]$, since the poles of $f$ are exactly the roots of $A$.


Sign Determination of Algebraic Numbers

· Problem A:

How do we compare two algebraic numbers $\alpha : \beta$?

· Assume $\alpha \sim (A, a, b)$ and $\beta \sim (B, a, b)$ are two numbers represented by isolating intervals. It is assumed that $A, B$ are square-free.

N.B. It is wlog to assume that they have the same interval.

· Consider the function $B(X)$ in the interval $[a,b]$. We have:
$$\alpha \geq \beta \quad\Longleftrightarrow\quad B(\alpha) \cdot B'(\beta) \geq 0.$$
Note that equality is attained on both sides at the same time.

· But the sign of $B'(\beta)$ is that of
$$B(b) - B(a)$$
and the sign of $B(\alpha)$ is given by the theorem of Sylvester (and Ben-Or, Kozen, Reif):
$$V_{A, A'B}[a,b].$$

· Problem B:

Suppose $\alpha \sim (A, a, b)$ as above. We want to compute in the field $\mathbb{Q}(\alpha)$. Suppose $\beta = B(\alpha)$ where $B$ is any polynomial. What is the sign of $\beta$?

· By the theorem of Schwartz-Sharir, we first compute
$$V_{A,B}[a,b] = \mathrm{sign}(A'(\alpha) B(\alpha)).$$
But $\mathrm{sign}(A'(\alpha)) = \mathrm{sign}(A(b) - A(a))$. Hence $\mathrm{sign}(B(\alpha))$ can be determined.
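· The Sylvester query itself is just a sign-variation difference over a generalized Sturm sequence. A sketch determining the sign of $B(\alpha)$ for $\alpha = \sqrt{2}$ (the helper names neg_prs, var are ours):

    from sympy import symbols, rem, degree, Rational

    X = symbols('X')

    def neg_prs(A0, A1):
        """Generalized Sturm sequence: negate the remainder at each step."""
        chain = [A0.expand(), A1.expand()]
        while degree(chain[-1], X) > 0:
            r = rem(chain[-2], chain[-1], X)
            if r == 0:
                break
            chain.append((-r).expand())
        return chain

    def var(chain, t):
        signs = [c.subs(X, t) for c in chain]
        signs = [s for s in signs if s != 0]
        return sum(1 for u, v in zip(signs, signs[1:]) if u * v < 0)

    # alpha = sqrt(2), isolated by A = X^2 - 2 on [1, 2]; B = X - 7/5.
    A = X**2 - 2
    B = X - Rational(7, 5)
    chain = neg_prs(A, (A.diff(X) * B).expand())    # query V_{A, A'B}[1, 2]
    print(var(chain, 1) - var(chain, 2))            # +1: sqrt(2) > 7/5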


Theorem of Routh-Hurwitz

· PROBLEM: Given $F(Z) \in \mathbb{C}[Z]$, we want to count the number of its complex roots that lie in the upper half of the complex plane.

· Let $F(Z) = F_0(Z) + i F_1(Z)$ where $F_0, F_1 \in \mathbb{R}[Z]$.
- We may assume $F_0 F_1 \neq 0$. Otherwise $F$ or $iF$ is a real polynomial, and this case is easy.
- $\alpha$ is a real root of $F(Z)$ iff it is a root of $\gcd(F_0, F_1)$.
- We may assume $F_0, F_1$ are relatively prime (so $F$ has no real roots).
- We may assume $\deg(F_0) \geq \deg(F_1)$ (otherwise, replace $F$ by $iF$).

· THEOREM (Routh-Hurwitz): The number of roots of $F(Z)$ in the upper half plane is given by
$$\frac{1}{2} \left( \deg(F) - V_{F_0, F_1}[-\infty, +\infty] \right).$$
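· The sign variations at $\pm\infty$ are read off from leading coefficients, so the count is effectively computable. A sketch (helper names are ours; the negated-remainder sequence is as in the PRS sections):

    from sympy import symbols, rem, degree, LC, I, re, im, expand

    X = symbols('X', real=True)

    def neg_prs(A0, A1):
        chain = [expand(A0), expand(A1)]
        while degree(chain[-1], X) > 0:
            r = rem(chain[-2], chain[-1], X)
            if r == 0:
                break
            chain.append(expand(-r))
        return chain

    def V_inf(chain, sgn):
        # sign of A_i at +oo is sign(LC); at -oo it is sign(LC)*(-1)^deg
        signs = [LC(c, X) * sgn**degree(c, X) for c in chain]
        signs = [s for s in signs if s != 0]
        return sum(1 for u, v in zip(signs, signs[1:]) if u * v < 0)

    # F(Z) = (Z - i)(Z - 1 - i): both roots lie in the upper half-plane.
    F = expand((X - I) * (X - 1 - I))
    F0, F1 = re(F), im(F)                    # F = F0 + i*F1 with F0, F1 real
    chain = neg_prs(F0, F1)
    V = V_inf(chain, -1) - V_inf(chain, 1)   # V_{F0,F1}[-oo, +oo]
    print((degree(F, X) - V) / 2)            # 2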


Complex Root Isolation Algorithm

· We show how to count roots in any box (i.e., axes-aligned rectangle). This is basically Pinkert's approach (1970).

Name the 4 quadrants of the complex plane (I), (II), (III), (IV).

· By simple transformations of $F(Z)$ we can also solve the following problems:
(1) Counting roots in the lower half-plane: use the above procedure on $G(Z) := F(-Z)$.
(2) Counting roots to the right of the imaginary axis: $\alpha$ is to the right of the imaginary axis iff $i\alpha$ is above the real axis. So, use $G(Z) := F(Z/i) = F(-iZ)$.
(3) Counting roots in either (I) or (III): $\alpha$ is in (I) or (III) iff $\alpha^2$ is in the upper half plane. The polynomial $f(Z) := F(Z) F(-Z)$ has no odd-degree terms: $f(Z) = G(Z^2)$ for some $G(Z)$, and the roots of this $G(Z)$ are precisely the squares of the roots of $F(Z)$. Use this $G(Z)$!
$$G(Z) := f(\sqrt{Z}) = F(\sqrt{Z})\, F(-\sqrt{Z}) \qquad (8)$$

(4) Similarly, we can count roots in either (II) or (IV).

(5) Counting roots in (I): This is given by
$$\#(I) = \frac{1}{2} \left[ \#(I{+}II) + \#(I{+}IV) - \#(II{+}IV) \right].$$

(6) Counting roots in a translated quadrant $t + (I)$: $\alpha$ is in $t + (I)$ iff $\alpha - t$ is in (I). So let $G(Z) = F(Z + t)$. Then $F(\alpha) = 0$ iff $G(\alpha - t) = 0$. We can count such $\alpha - t$'s by applying (5) to $G(Z)$.
(7) Counting in any finite box $B$: translate (I) to each corner of the box, and call the translated quadrants NW, SW, NE, SE. Then we have
$$\#(B) = \#(SW) - \#(NW) - \#(SE) + \#(NE).$$

· Once we can count in a box, we can use any bisection method (or quadtree subdivision) to converge to every root.


Multidimensional Sturm

· Hermite (1852, 1853, 1880) gave the generalization to 2-D, with a method for computing the relevant sequences using symmetric functions.

· Pedersen [8] (NYU thesis, 1991) gave the generalization to all dimensions. He provided sequences for counting zeros in boxes, simplices and arbitrary shapes.

· Philip Milne [5] (Bath thesis, 1990) independently gave another formulation based on a volume function, which does not fall under Pedersen's class of sequences. Although simple, this computational tool has not been fully developed.


Survey of Zero Finding Algorithms

· Follow Henrici [3] (with updates) by citing a list of attributes.


Variations and Complexity Model

Formulations of the root finding problem:

· Complex Root Location: given $P(X)$ of degree $n$ and a precision $b > 1$, determine $m$ discs $D_1, \ldots, D_m$ such that the diameter of each $D_i$ is at most $2^{-b}$, and there is a bijection $f$ from the $m$ roots of $P$ to the $D_i$'s such that $f(z_j) = D_i$ implies $z_j \in D_i$.

REMARK: the discs can overlap (even coincide).

· Complex Root Isolation: like the above, except that the disks are either disjoint or identical.

· Complex Root Factorization: (Schönhage) compute $n$ linear factors $L_i(X) := a_i X + b_i$ ($a_i, b_i \in \mathbb{C}$) such that $|P - L_1 \cdots L_n| < 2^{-b}$. Here $|\cdot|$ can be any polynomial norm.

· Complexity Model: the Algebraic Model, where we just count the number of algebraic operations (typically $+, -, \times, \div$ and possibly radical extraction).

The Boolean Model, where we reduce complexity to counting bit operations. This can be uniform (e.g., Turing machines) or non-uniform (Boolean circuits).

· The complexity of root finding is a function $T(n, b)$ in an appropriate model. Focus of theoretical research: bound $T(n,b)$ tightly. DISCUSSION: another parameter may be appropriate.

· The algebraic model is preferred by most papers (it is simplest). We can generally transform a bound in the algebraic model into a bit-model bound by multiplying by $M(n(n+b))$, where $M(n) = O(n \log n \log\log n)$ is the complexity of multiplication in the Boolean model.


Weyl's Exclusion Algorithm

· Weyl's (1924) constructive proof of the Fundamental Theorem of Algebra is considered the first ``algorithm'' for complex root finding [9]. [Also L.E.J. Brouwer (1924).]

· Pan [7] observed that several reworkings of Weyl's algorithm, by Henrici and Gargantini (1969), Renegar (1987) and Pan (1987), each implied new record bounds for the complexity $T(n,b)$.

· Weyl's method is based on the notion of Exclusion Tests relative to a polynomial $F(Z)$: given a region $R \subseteq \mathbb{C}$ of a suitable type, the region either passes or fails the test.

· Exclusion property: if $R$ passes the test, then $R$ contains no zeros.

· Typical regions are disks or boxes (actually squares). Each region has a center. Assume there is a natural way to cover each region R by a fixed number of smaller regions. E.g., for boxes, use a quadtree subdivision.

· The ``Weyl scheme'' for complex root finding. Stage 0: begin with a region containing all roots. Stage $i+1$: let $S_i$ be all the regions that failed the test at the previous stage. For each $R \in S_i$, apply the test to each subregion spawned from $R$. Halt when the regions are sufficiently small.

· Simple exclusion tests: we know (see (3) or (6)) how to compute lower bounds $d$ with $d \leq r_1$. Given a box $B$ with center $z$, we determine $d$ such that the disk $D(z,d)$ contains no zeros. The box passes the test iff $B \subseteq D(z,d)$, as in the sketch below.
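· A minimal sketch of this scheme in Python/numpy, assuming the simple exclusion test above: the exclusion radius comes from applying (6) to the shifted polynomial $A(X+z)$, whose coefficients are the Taylor coefficients $A^{(i)}(z)/i!$ (helper names exclusion_radius, weyl are ours):

    import numpy as np

    def exclusion_radius(p, z):
        """Lower bound (6) on the distance from z to the nearest root of p."""
        a0 = p(z)                       # constant Taylor coefficient at z
        best = np.inf
        q, fact = p, 1.0
        for i in range(1, p.order + 1):
            q = q.deriv()
            fact *= i
            ai = q(z) / fact            # i-th Taylor coefficient at z
            if ai != 0:
                best = min(best, abs(a0 / ai) ** (1.0 / i))
        return 0.5 * best

    def weyl(p, z, w, eps):
        """Quadtree subdivision of the square with center z, half-width w."""
        if w * np.sqrt(2) <= exclusion_radius(p, z):
            return []                   # box passes: it contains no root
        if 2 * w <= eps:
            return [(z, w)]             # small enough: report as candidate
        out = []
        for dx in (-w/2, w/2):
            for dy in (-w/2, w/2):
                out += weyl(p, z + complex(dx, dy), w/2, eps)
        return out

    p = np.poly1d([1, 0, 0, -1])        # Z^3 - 1
    # Initial box from Cauchy's bound (3): all roots have modulus <= 2.
    for z, w in weyl(p, 0j, 2.0, 0.1):
        print(z)                        # centers cluster near the cube roots of 1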


Improvements: Proximity Tests

· Proximity Tests are exclusion tests with an additional property: for some $\rho > 1$, if a region $B$ fails the test then the ``expanded region'' $\rho B$ must contain a root.

REMARK: $\rho B$ denotes the expansion (or contraction) of $B$ about its center. Call $\rho$ the ratio or relative error of the test.

· Let the roots of $F(Z) = \sum_{i=0}^{m} a_i Z^i$ be $\alpha_1, \ldots, \alpha_m$.

· Following van der Sluis [2], we shift the origin to the center of gravity $C = -a_{m-1}/(m\, a_m)$ of the roots, and consider $G_0(Z) = F(Z + C) = \sum_{i=0}^{m} b_i Z^i$. Then
$$b^* \sqrt{2/m} \;\leq\; \max_i |\alpha_i - C| \;\leq\; 1.63\, b^*, \qquad \left( b^* = \max_{i < m} |b_i/b_m|^{1/(m-i)} \right).$$
This gives us a proximity test with ratio $\rho = 1.63 \sqrt{m/2}$. Using arguments from above, we get the weaker $\rho = 2m$.

· Recall from (8) that we have a transformation $F(Z) \to G(Z)$ (Graeffe's root-squaring) such that the zeros of $G(Z)$ are the squares of the zeros of $F(Z)$. Let $G_k(Z)$ denote the $k$-th iterate of this transformation applied to $G_0$.

· Applying the proximity test to $G_k(Z)$, we obtain ratio $\rho = (1.63 \sqrt{m/2})^{1/2^k}$. Choose $k = \Theta(\log\log m)$ to get a ratio $\rho \leq 2$, say.

· P. Turán (1968, 1975, 1984) gave non-trivial bounds on the roots based on power sums of the roots. To apply these bounds, we compute the power sums of the roots of $G_k$; this can be done using Newton's identities in $O(m \log m)$ steps. This leads to further improvement of Weyl's method.


Eigenvalue Approach

· Theoretical versus Practical Methods: this classification is almost co-extensive with classifying algorithms into ``recursive versus iterative methods''. The asymptotically best algorithms do not work in practice.

· Bini and Fiorentino (1999) implemented a very fast solver, mpsolve, based on Aberth's method, a Newton-like simultaneous iteration.

· Another approach to iterative algorithms is via eigenvalue computation. Fortune (2000) implemented an eigenvalue-based algorithm, eigensolve, that is very competitive with mpsolve.

· Recall: $\lambda$ is an eigenvalue of a (square) matrix $M$ with associated eigenvector $v$ if $Mv = \lambda v$. The characteristic polynomial of $M$ is $p(x) = \det(xI - M)$. So, the $\lambda$'s are the zeros of $p(x)$.

· Goal: given $p(x)$, find an associated $M$. Thus, we reduce root finding to matrix eigenvalues.

· Let $D = \mathrm{diag}(d_1, \ldots, d_n)$ be a diagonal matrix, and $R = (r_{ij})$ be a rank 1 matrix (i.e., $R = u^T v$ where $u, v$ are $n$-vectors).


Reduction to Eigensolving

· LEMMA:
$$\det(D + R) = \prod_{i=1}^{n} d_i + \sum_{i=1}^{n} r_{ii} \prod_{j \neq i} d_j.$$

PROOF: by induction on $n$, and
$$\det(D + R) = \det(D[1] \mid (D+R)[2{:}n]) + \det(R[1] \mid (D+R)[2{:}n])$$
where $M[i{:}j]$ is the submatrix with columns $i$ to $j$, and $M[i] := M[i{:}i]$.

· COROLLARY (with $\mathrm{char}(M) = \det(XI - M)$ as above; apply the lemma to $(XI - D) + (-R)$):
$$\mathrm{char}(D + R) = \prod_{i=1}^{n} (X - d_i) - \sum_{i=1}^{n} r_{ii} \prod_{j \neq i} (X - d_j).$$

· Let $S = (s_1, \ldots, s_n)$ be distinct complex numbers, and $P(X)$ a monic polynomial of degree $n$. Also let $Q(X) = Q_S(X) := \prod_{i=1}^{n} (X - s_i)$. Define the Lagrange coefficients
$$l_i := \frac{P(s_i)}{Q_S'(s_i)}, \qquad (i = 1, \ldots, n).$$

· The generalized companion matrix $C(P,S)$ is
$$C(P,S) := \mathrm{diag}(s_1, \ldots, s_n) - \begin{bmatrix}
l_1 & l_2 & \cdots & l_n \\
l_1 & l_2 & \cdots & l_n \\
\vdots & \vdots & & \vdots \\
l_1 & l_2 & \cdots & l_n
\end{bmatrix}.$$


Root Solving via Eigenvalue

· THEOREM (Smith, 1970): The eigenvalues of the matrix $C(P,S)$ are precisely the roots of $P$.

PROOF: This is equivalent to saying that $\mathrm{char}(C(P,S)) = P$. By the COROLLARY (applied with $D = \mathrm{diag}(s_1, \ldots, s_n)$ and $r_{ii} = -l_i$),
$$\mathrm{char}(C(P,S)) = \prod_{i=1}^{n} (X - s_i) + \sum_{i=1}^{n} l_i \prod_{j \neq i} (X - s_j).$$
The second term on the RHS is the Lagrange interpolating polynomial of degree $\leq n-1$ with value $P(s_i)$ at each $s_i$ (since $l_i\, Q_S'(s_i) = P(s_i)$). Thus the LHS agrees with $P$ at each $s_i$. Since both are monic of degree $n$, they must be equal.

· Let C(P) be the usual companion matrix of P. We assume the existence of a floating point eigenvalue subroutine called eig.

· Here is Fortune's algorithm:

    Procedure eigensolve(P)
    1. S = eig(fl(C(P)));        // fl converts to floating point
    2. while (!converged(S))
    3.     S = eig(fl(C(P,S)));
    4. return S;
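· A minimal numerical sketch of one such refinement loop, using numpy's eigensolver (the helper gen_companion and the use of np.roots for the initial estimates are ours):

    import numpy as np

    def gen_companion(P, S):
        """Generalized companion matrix C(P,S); P monic, coeffs high to low."""
        lam = np.array([np.polyval(P, s) / np.prod(s - np.delete(S, i))
                        for i, s in enumerate(S)])     # l_i = P(s_i)/Q'(s_i)
        return np.diag(S) - np.tile(lam, (len(S), 1))  # every row is (l_1..l_n)

    P = np.array([1.0, -6, 11, -6])                    # (X-1)(X-2)(X-3)
    S = np.roots(P) + 0.1                              # crude initial estimates
    for _ in range(3):
        S = np.linalg.eigvals(gen_companion(P, S))
    print(np.sort_complex(S))                          # ~ [1, 2, 3]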

· It is an open problem to prove convergence of this algorithm in general. Fortune has proved conditional convergence.


Durand-Kerner Method

· Durand-Kerner Method. Recent implementation by Shankar Krishnan. Works quite well in practice.

· Let $F(X)$ be a monic polynomial of degree $n$. Let $X^{(k)} = (x_1^{(k)}, x_2^{(k)}, \ldots, x_n^{(k)})$ be the simultaneous approximation to all the roots at the $k$-th iteration. Then:
$$x_i^{(k+1)} = x_i^{(k)} - \frac{F(x_i^{(k)})}{\prod_{j \neq i} \left( x_i^{(k)} - x_j^{(k)} \right)}$$

· Local convergence is known. The initial guess $X^{(0)}$ is obtained from $n$ equidistributed points on a circle about the origin; improved initial guesses are used by Krishnan. See the sketch below.
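· A compact sketch of the iteration (the helper name durand_kerner and the classic starting points $(0.4 + 0.9i)^k$ are conventional choices, not from the text):

    import numpy as np

    def durand_kerner(coeffs, iters=100, tol=1e-12):
        """coeffs = [1, a_{n-1}, ..., a_0] of a monic polynomial, high to low."""
        n = len(coeffs) - 1
        x = (0.4 + 0.9j) ** np.arange(n)     # distinct points on a spiral
        for _ in range(iters):
            xn = np.array([xi - np.polyval(coeffs, xi)
                           / np.prod(xi - np.delete(x, i))
                           for i, xi in enumerate(x)])
            if np.max(np.abs(xn - x)) < tol:
                return xn
            x = xn
        return x

    print(np.sort_complex(durand_kerner([1, -6, 11, -6])))   # ~ [1, 2, 3]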

· OTHER METHODS: Bairstow's method is applicable to real polynomials when we want to avoid complex arithmetic. It extracts only quadratic or linear factors with real coefficients.


Newton Iteration

· Newton iteration is still a remarkably lively subject. Several papers of Brent (ca. 1976) give the definitive complexity-theoretic treatment. Recent work of Smale investigates the average behavior.

· Well-known: Newton is locally quadratically convergent. The Newton basin of a root is the region where Newton's method converges to that root. Friedman (1989) gave a lower bound on the radius of the basin.

· We ask another question: how close must one be to a root before the quadratic convergence behavior of Newton appears? (Call this the ``quadratic Newton basin''.) Smale gives implicit conditions for this, but it is desirable to know a priori bounds.

· THEOREM: If $A(X) \in \mathbb{Z}[X]$ is square-free then Newton will converge quadratically when the initial point is within distance
$$\delta = m^{-3m-9} \left( 2 + \|A\|_\infty \right)^{-6m}$$
of a root.

PROOF: See book.
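· The quadratic regime is easy to see numerically: the error is roughly squared at each step (a tiny illustration on $A(X) = X^2 - 2$; the example is ours):

    import numpy as np

    A = np.poly1d([1, 0, -2])       # X^2 - 2
    dA = A.deriv()
    x = 1.5
    for _ in range(5):
        x = x - A(x) / dA(x)        # Newton step
        print(abs(x - np.sqrt(2)))  # errors ~2e-3, 2e-6, 2e-12, then float precision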


A New Computational Mode

· Traditional applications of algebraic computation: as seen in Maple or Mathematica.

· We are increasingly seeing a new mode of algebraic computation that does not fit this model. What is it?

· Consider the problem of the Euclidean shortest path (ESP) between two points, amidst a collection of polygonal obstacles. Assume the input numbers are integers.

· We can use Dijkstra's algorithm, which is worst-case quadratic. Hershberger and Suri improved this to $O(n \log n)$.

· Regardless, all algorithms must make comparisons of the form $\lambda : 0$ where
$$\lambda = \sum_{i=1}^{n} n_i \sqrt{a_i}, \qquad (n_i \in \mathbb{Z},\ a_i \in \mathbb{N})$$
· Note that $\deg(\lambda) = 2^n$, and $n$ may be linear in the number of input corners. Is ESP impractical?

· Yes, because of two main characteristics of the said computational mode:
(1) Constructive: $\lambda$ is a ``constructive algebraic number'', not given as the root of a polynomial.
(2) Incremental: the numerical precision needed should be adaptive.


Requirements of the Constructive Algebraic Mode

· (1) ``Traditional'' modes of algebraic computation are monolithic, e.g., solving the Stewart platform configuration. The new mode is small and incremental (extract one square root at a time), but the ultimate buildup may be non-trivial, e.g., $\lambda$. A platform like Maple is suitable for the monolithic problems, but unsuitable for the latter.

· Most geometric algorithms that have significant algebraic components will fall under this mode. E.g., Voronoi diagrams of a set of lines in 3-space, or computing the arrangement of a set of surfaces.

Other examples are incremental geometric constructions (ruler-and-compass type). E.g., the Cinderella system.

· (2) The algebraic computation is intermixed with other, non-algebraic computation (e.g., constructing the shortest path graph). Thus, we need general programming language support, including compilers.

· (3) The precision needed for each step cannot be easily predicted in advance. Thus, the new mode of computation has to dynamically increase the precision of numbers.

· (4) Achieving exact comparison is a critical part of the correctness of our computation. This poses the problem of Constant Sign Determination (see next).

· (5) We need facilities to handle constructive algebraic numbers. There are currently 2 systems that handle constructive algebraic numbers: LEDA's real class, and our Core Library.


The Constant Sign Problem

· To support the above mode of computation, we need to solve a class of computational problems.

· Let $\Omega$ be a set of algebraic operators over $\mathbb{C}$.

NOTE: constants and variables are operators too. E.g., $\Omega = \{+, -, \sin(), 1, \pi, X_i : i \in \mathbb{N}\}$.

· $\mathrm{Expr}(\Omega)$ is the set of expressions constructed from the operators in $\Omega$.

· Assume $\Omega$ has no variables. Each $E \in \mathrm{Expr}(\Omega)$ has a unique interpretation, $I(E) \in \mathbb{C}$ or $I(E) = \uparrow$ (undefined).

· The Constant Zero Determination (CZD) problem for $\Omega$ is: given $E \in \mathrm{Expr}(\Omega)$, determine if $I(E) = 0$ (or if it is undefined).

· If the operators of $\Omega$ are real, we also have the Constant Sign Determination (CSD) problem for $\Omega$: given $E \in \mathrm{Expr}(\Omega)$, determine the sign of $I(E)$ (or if it is undefined).

· FACT: If CSD($\Omega$) is decidable, so is CZD($\Omega$).

· Expressions are assumed to be given as a DAG or a straight-line program.


Algebraic and Beyond

· Consider the following sets of operators:
$$\Omega_0 = \{+, -, \times, 1\}$$
$$\Omega_1 = \Omega_0 \cup \{\div\}$$
$$\Omega_2 = \Omega_1 \cup \{\sqrt[k]{\cdot} : k \geq 2\}$$
$$\Omega_3 = \Omega_2 \cup \{\mathrm{Root}_{k,i}(\cdot) : 1 \leq i \leq k,\ k \in \mathbb{N}\}$$
$$\Omega_4 = \Omega_0 \cup \{\pi, \sin()\}$$

· THEOREM: The problem CZD($\Omega_3$) is decidable in exponential time. When viewed as real operators, CSD($\Omega_3$) is also decidable in exponential time.

· The theorem is a consequence of constructive root bounds (see Mehlhorn's lecture in this workshop).

· OPEN PROBLEM: Is CZD($\Omega_4$) decidable?

· Discussion: Exact Geometric Computation (EGC) is possible for algebraic computation, but this is unknown for all non-algebraic cases. We could replace $\sin()$ by any other elementary function.


Incremental Complexity of Algebraic Numbers

· Since root refinement is a critical problem in the constructive mode of computation, we pose two problems in this area.

· The first problem is practical: Brent's work has shown that the asymptotic complexity of approximating algebraic numbers and of computing transcendental functions is $O(M(n) \log n)$. This work remains to be made practical.

· The second problem is more theoretical: how much does it cost to get one additional bit of information about $\alpha$?

· More precisely, given that $\alpha$ is the $i$-th largest real root in $[a,b]$, how hard is it to halve the range of uncertainty?

· Three ranges of complexity:
(1) Sturm range: we perform one ``Sturm query''. Should the cost of computing the Sturm sequence be counted? It can be amortized over this range. This stage goes from $b - a = r_n$ down to the root separation bound.
(2) Bisection range: this occurs when $[a,b]$ is an isolating interval, i.e., when $b - a$ is smaller than the root separation bound.
(3) Newton range: when $b - a$ is smaller than the radius of the ``quadratic Newton basin'' we are in this range. This is amortized complexity.

· OPEN QUESTION: How inherent are these observations?


Conclusions

· The Constructive Algebraic Mode of computation is represented by two current systems: LEDA's real number type, and our Core Library. See Project Home.

· Traditional concerns of computer algebra do not adequately address the needs of this mode.

· What are the fundamental questions to be investigated in this setting?

· How can we deal with non-algebraic computations in the face of the open question?


References

[1]
H. Cohen. A Course in Computational Algebraic Number Theory. Springer-Verlag, 1993.

[2]
A. van der Sluis. Upper bounds for roots of polynomials. Numer. Math., 15:250-262, 1970.

[3]
P. Henrici. Applied and Computational Complex Analysis. John Wiley & Sons, New York, 1974.

[4]
A. S. Householder. Numerical Treatment of a Single Non-Linear Equation. McGraw-Hill, New York, 1970.

[5]
P. S. Milne. On the algorithms and implementation of a geometric algebra system. PhD thesis, University of Bath, 1990.

[6]
V. Y. Pan. Sequential and parallel complexity of approximate evaluation of polynomial zeros. Comput. Math. Applic., 14(8):591-622, 1987.

[7]
V. Y. Pan. Solving a polynomial equation: some history and recent progress. SIAM Review, 39(2):187-220, 1997.

[8]
P. Pedersen. Counting Real Zeros. PhD thesis, Courant Institute, New York University, 1991.

[9]
A. Schönhage. Equation solving in terms of computational complexity. Proc. International Congress of Mathematicians, pages 131-153, 1986. Berkeley, California.

[10]
J. T. Schwartz and M. Sharir. On the piano movers' problem: II. General techniques for computing topological properties of real algebraic manifolds. Advances in Appl. Math., 4:298-351, 1983.

[11]
C. K. Yap. Fundamental Problems in Algorithmic Algebra. Oxford University Press, 2000. A preliminary version is available at URL ftp://cs.nyu.edu/pub/local/yap/algebra-bk.



