Which patterns are hard to find

R. Cole, R. Hariharan, M. Paterson, U. Zwick

Israeli Symposium on Theoretical Computer Science, 1993. SIAM Journal on Computing, 24(1995), 30-45.

The paper considers the exact number of character comparisons needed to find all occurrences of a pattern of length m in a text of length n using on-line and general algorithms. For on-line algorithms, a lower bound of about (1 + 9/(4(m+1))).n character comparisons is obtained. For general algorithms, a lower bound of about (1 + 2/(m+3)).n character comparisons is obtained. These lower bounds complement an on-line upper bound of about (1 + 8/(3(m+1)).n comparisons obtained recently by Cole and Hariharan. The lower bounds are obtained by finding patterns with interesting combinatorial properties. It is also shown that for some patterns off-line algorithms can be more efficient than on-line algorithms.

postscript of full paper