DEPARTMENT OF COMPUTER SCIENCE

DOCTORAL DISSERTATION DEFENSE

Candidate: Ramesh Hariharan

Advisor: Richard Cole

DOCTORAL DISSERTATION DEFENSE

Candidate: Ramesh Hariharan

Advisor: Richard Cole

**Designing Pattern Matching Algorithms by Exploiting
Structural Pattern Properties**

5 p.m., Friday, May 13, 1994

12th fl Conference Rm, 719 Broadway

Abstract

This thesis presents algorithms and, in some cases, lower bounds for some fundamental pattern matching problems. In all cases, the algorithms are obtained by understanding and strongly exploiting structural pattern properties. The following results are obtained.

**Exact Complexity of String Matching:**
We consider the question of how many
character comparisons are
needed to find all occurrences of a pattern string of length
*m* in a text string of length *n*.
We show an almost tight upper bound of the form
*n*+*O*(*n*/*m*) character comparisons, following preprocessing.
Specifically, we show an upper bound of
*n* + 8/(3(*m*+1)) (*n*-*m*) character comparisons.
The following lower bounds are also shown:
for on-line algorithms, a bound of
*n* + 9 / (4(*m*+1)) (*n*-*m*) character comparisons
for *m*=35+36*k*, for any integer *k* >= 1,
and for general algorithms, a bound of
*n*+2(*n*-*m*) / (*m*+3) character comparisons,
for *m*=2*k*+1, for any integer *k* >= 1.

**Parallel Two-Dimensional Pattern Matching:**
We give the first time, space and work optimal
common CRCW-PRAM algorithm for finding all occurrences of a two-dimensional
pattern of size
*m*_{1} * *m*_{2} in a two-dimensional text of size *n*_{1} * *n*_{2}.
Our algorithm runs in *O*(1) time performing
*O*(*n*_{1} * *n*_{2}) work, following preprocessing of the pattern.
A major portion of the preprocessing step is the computation of witnesses for the
pattern.
We show how to compute witnesses for the pattern in
*O*(*log log m*_{2})time and
*O*(*m*_{1} * *m*_{2}) work when
*m*_{2} >= *m*_{1}.
In the process of designing the above algorithm,
we also obtain some new periodicity properties of two-dimensional
patterns.

**Parallel Suffix Tree Construction:**
We consider the problem of constructing the suffix tree of a given string
*s* of length *m* in parallel.
An *O*(*m*)-work, *O*(*m*)-space,
*O*(*log*^{4} *m*)-time CREW-PRAM algorithm for constructing
the suffix tree of *s* is obtained when *s* is drawn from any fixed
alphabet set.
This is the first work and space optimal parallel algorithm known for this problem.
It can be generalized to construct the suffix tree of a string *s*drawn from any general alphabet set to perform in
*O*(*log*^{4} *m*) time, ` $O(m\log |\Sigma|)$ `

work, and ` $O(m\log |\Sigma|)$ `

space,
after the characters in *s* have been sorted alphabetically;
here `$|\Sigma|$ `

is the number of distinct characters in *s*.
In this case too, the algorithm is work optimal.