Lecture 13: Running Times of Algorithms: Order of Magnitude, Worst Case, Asymptotic Analysis

The toString() method for linked lists

There is one method that runs slowly in all our linked list definitions; namely, toString().

To illustrate, run the code at TestSlowToString.java.
This uses the definition of singly linked lists in MyList1.java.

If you run this with argument 10000, the sum prints out immediately but there is a noticeable delay before it computes toString(). With argument 20000 there is a delay of several seconds, and with argument 100,000 it takes a long time. Meanwhile the sum continues to print out immediately (as you would expect; adding 100,000 numbers on a 1 GHz machine takes about 100 microseconds). What is the problem with toString()?

To answer that we have to look under the hood, as they say. A Java String is immutable: every concatenation allocates a brand-new String and copies every character accumulated so far. Building the answer one element at a time therefore does about 1 + 2 + ... + N copies, which is O(N^2).
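Here is a sketch of what a naive String-concatenation toString() looks like. (The names Node and slowToString are illustrative, not the actual MyList1.java code.) Each += copies everything built so far, so the total work is 1 + 2 + ... + N = O(N^2) character copies.

```java
// Sketch of a quadratic toString() for a singly linked list.
// Node and slowToString are hypothetical names, not the MyList1 code.
public class SlowToString {
    static class Node {
        int value;
        Node next;
        Node(int value, Node next) { this.value = value; this.next = next; }
    }

    // Each += allocates a new String and copies everything built so far,
    // so the k-th iteration copies about k characters: O(N^2) in total.
    static String slowToString(Node head) {
        String s = "(";
        for (Node a = head; a != null; a = a.next) {
            s = s + a.value + " ";   // copies the whole of s every time
        }
        return s + ")";
    }

    public static void main(String[] args) {
        Node list = new Node(1, new Node(2, new Node(3, null)));
        System.out.println(slowToString(list));  // (1 2 3 )
    }
}
```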

The solution (Azam pointed this out to me) is to use a StringBuffer rather than a String. A StringBuffer is a modifiable array of characters that may be up to half empty; it keeps a count of the number of characters being used in the array.

Here's the code: TestFastToString.java.

When you execute the statement S.append(A.getValue().toString()); it just copies the characters of A.getValue().toString() into the empty space in S, as long as there is room. When you run out of room, the system allocates an array twice as long for S and copies the old contents of S into it. That is a slow operation, but if you end up with an array of C characters, the doubling only has to be done about log_2 C times. For N = 100,000, the number of characters C is about 400,000, so the doubling only has to be done log_2 400,000, or about 19, times. Other than at the doubling steps, each character is copied only once, into the array holding S. It turns out that the total time is proportional to C, which is about N * log_10 N.
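The linear-time version can be sketched as follows (again with illustrative names rather than the actual TestFastToString.java code). Apart from the occasional doubling, append() copies each character exactly once into the StringBuffer's spare capacity.

```java
// Linear-time toString() using a StringBuffer (sketch; names are
// hypothetical, not the actual TestFastToString.java code).
public class FastToString {
    static class Node {
        int value;
        Node next;
        Node(int value, Node next) { this.value = value; this.next = next; }
    }

    // append() writes into the buffer's spare capacity; except at the
    // rare doubling steps each character is copied once, so this is O(C).
    static String fastToString(Node head) {
        StringBuffer s = new StringBuffer("(");
        for (Node a = head; a != null; a = a.next) {
            s.append(a.value);
            s.append(' ');
        }
        s.append(')');
        return s.toString();
    }

    public static void main(String[] args) {
        Node list = new Node(1, new Node(2, new Node(3, null)));
        System.out.println(fastToString(list));  // (1 2 3 )
    }
}
```

(In single-threaded code StringBuilder has the same interface and is slightly faster, but the growth behavior is the same.)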

Running time for algorithms

How do you measure how long an algorithm takes to run?

Experimentally

Program the algorithm. Put together a collection of typical sample problems. Run the program, and measure the time.

Difficulties: The measured time depends on the machine, the language and compiler, and the choice of sample problems; and the algorithm must be fully implemented before it can be measured at all.

Theoretically

Consider how the computation time grows as a function of the size of the problem.

Standard measure for algorithm A

For any N, consider all problems of size N. Let f(N) be the time that A takes on the problem of size N on which it runs slowest. Describe the growth rate of f as a function of N.

"Order of magnitude" --- The general growth rate, ignoring constant factors.
"asymptotic" --- as N gets large.
"worst case" --- the worst problem of size N.
analysis.
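As a concrete instance of this definition (a hypothetical example, not from the lecture code), consider linear search in an array of size N: the slowest problem of size N is one where the target is absent, so f(N) = N comparisons, and the worst-case running time is O(N).

```java
// Worst-case analysis example: linear search.
// The slowest problem of size N is an absent target: f(N) = N comparisons.
public class LinearSearch {
    static int comparisons;  // comparisons made by the most recent search

    static int find(int[] a, int target) {
        comparisons = 0;
        for (int i = 0; i < a.length; i++) {
            comparisons++;
            if (a[i] == target) return i;   // lucky cases return early
        }
        return -1;  // worst case: we compared against all N elements
    }

    public static void main(String[] args) {
        int[] a = {4, 8, 15, 16, 23, 42};
        find(a, 99);                        // worst case: target absent
        System.out.println(comparisons);    // 6, i.e. f(N) = N
    }
}
```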

Advantage: Same for all programming languages, all compilers, (practically) all machines (abstracting away finite memory), (practically) all computational models.

Exceptions: Models with arbitrary amounts of parallelism. Quantum computers.

Mathematical notation.
Assume f(n) and g(n) are functions that are always positive.

f(n) is O(g(n))
means: There is a constant c such that f(n) <= c*g(n) for all n>=1. Examples:

if f(n) = 100n^2 and g(n) = 2n^2, then f(n) <= c*g(n) for c = 50 or higher, so f(n) is O(g(n)).

if f(n) = 100,000n and g(n) = n^3, then f(n) <= 100,000*g(n) for all n >= 1, so f(n) is O(g(n)).

if f(n) = n^3 and g(n) = 100,000n, then f(n) is not O(g(n)). Proof: Choose any value of c. Then f(n)/g(n) = n^2/100,000, which exceeds c for every n greater than the square root of 100,000c; so there is no constant c with f(n) <= c*g(n) for all n >= 1.
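You can also see numerically why no constant c can work: the ratio f(n)/g(n) = n^2/100,000 grows without bound. A quick check (an illustration, not a proof):

```java
public class RatioDemo {
    // f(n) = n^3, g(n) = 100,000 n, so f(n)/g(n) = n^2 / 100,000.
    static double ratio(long n) {
        return (double) n * n / 100_000.0;
    }

    public static void main(String[] args) {
        // The ratio eventually passes any fixed c, so f(n) is not O(g(n)).
        for (long n = 1_000; n <= 100_000_000L; n *= 100) {
            System.out.println("n = " + n + "   f(n)/g(n) = " + ratio(n));
        }
    }
}
```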

Rules: If f or g is a sum ignore all but the fastest growing term. Ignore any constant factor.
Example: If f(n) = 5n^3 + 2n^2 + 242, just treat it as n^3.

Powers of n go like the exponent.
Example: n^{1/2} is O(n^2) but not vice versa.

Exponentials go like the base.
Example: 2^n is O(3^n) but not vice versa.

Logarithms and powers of logarithms grow more slowly than any power of n. Example: (log n)^2 is O(n^{1/2}) but not vice versa.

n log n is between n and n^2 (and much closer to n).
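A quick table makes the "much closer to n" claim concrete. This small program (an illustration of the growth rates, not lecture code) prints n, n log_2 n, and n^2 side by side:

```java
public class GrowthTable {
    // n * log2(n), computed with the change-of-base formula.
    static double nLogN(long n) {
        return n * (Math.log(n) / Math.log(2));
    }

    public static void main(String[] args) {
        // n log n stays within a modest factor of n (the factor is log2 n,
        // about 30 even at n = 10^9), while n^2 pulls away rapidly.
        System.out.printf("%12s %18s %22s%n", "n", "n log2 n", "n^2");
        for (long n = 1_000; n <= 1_000_000_000L; n *= 1_000) {
            System.out.printf("%12d %18.0f %22d%n", n, nLogN(n), n * n);
        }
    }
}
```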

Used in the form "< Running time > is O( < mathematical function > )"

Examples.

What is the "size" of a problem?

Computation theory: The number of bits.

Usual usage: Some reasonable, relevant measure of size. The length of a linked list. The size of a set. The length of a string. etc.

There may be more than one size parameter.
E.g. The time to compute the intersection of two ordered lists L and M is O(|L|+|M|).
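The intersection example can be sketched as a single simultaneous scan of the two lists. (A sketch assuming the lists are sorted arrays of ints; the method name intersect is hypothetical.) Each step of the loop advances at least one of the two indices, so there are at most |L| + |M| steps.

```java
import java.util.ArrayList;
import java.util.List;

public class SortedIntersection {
    // Intersection of two sorted lists by a single simultaneous scan:
    // every iteration advances i, j, or both, so the time is O(|L| + |M|).
    static List<Integer> intersect(int[] l, int[] m) {
        List<Integer> result = new ArrayList<>();
        int i = 0, j = 0;
        while (i < l.length && j < m.length) {
            if (l[i] < m[j]) i++;            // l[i] cannot be in m; skip it
            else if (l[i] > m[j]) j++;       // m[j] cannot be in l; skip it
            else { result.add(l[i]); i++; j++; }  // common element
        }
        return result;
    }

    public static void main(String[] args) {
        int[] l = {1, 3, 5, 7, 9};
        int[] m = {3, 4, 5, 9, 11};
        System.out.println(intersect(l, m));  // [3, 5, 9]
    }
}
```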