Searching for Strings and Searching in Presence of Errors

Advisors: Dr. Krishna Palem
  Prof. Joel Spencer

This dissertation deals with two classes of searching problems. The first class consists of pattern matching problems, and the second class comprises combinatorial searching problems in presence of errors in response to the queries. Our results are as follows.

Standard Stringology. Standard Stringology is the study of pattern matching problems in which a text location matches one in the pattern provided the associated symbols are identical. The basic problem here is the string matching problem of detecting all occurrences of a pattern string in a text string. This naturally generalizes to the dictionary matching problem of finding all occurrences of a set of patterns, rather than a single pattern, in a given text. Very fast optimal parallel algorithms exist for string matching in the PRAM model. These algorithms rely on structural properties of the strings. Unfortunately these structural properties are not useful for solving the dictionary matching problem. We have obtained the fastest and the most work-efficient algorithms known for this problem and a number of its variants by introducing and using a new technique called shrink-and-spawn.

Non-Standard Stringology. In problems from Non-Standard Stringology, an arbitrary many-to-many matching relation holds between the text and pattern locations. An example is string matching with ``don't cares'' where the position in the text that has a ``don't care'' symbol matches every pattern position. The inherent complexity and structure of such non-standard string matching problems is not well understood. Our main results are inherent complexity bounds for these problems, characterized in terms of algebraic convolutions. Traditionally structure in pattern matching has meant repetitions in patterns, but this work exposes a novel graph-theoretic structure in these problems.

Searching in presence of errors. Given a set of items containing one or more distinguished items, the generic combinatorial search problem is to determine the distinguished item(s) using detection tests on groups of items. Motivated by fault-tolerance issues, we consider the scenario when some tests get incorrect responses. We have developed a strategy to solve the generic problem above using at most one test more than that necessary, even under adversarial placement of incorrect responses to the tests.