Assignment II

Due date: Sep. 29, 2003

In this assignment you will use operations on arrays that are relevant to work in computational biology.

As you no doubt know, genetic information is encoded in DNA using an alphabet of four characters: A, C, G, and T. A strand of DNA consists of thousands or millions of these letters. Within this very long string, certain combinations of letters have a special meaning (for example, they act as punctuation marks that indicate where a gene begins and ends).

For this assignment, the input consists of two strings: one long string that corresponds to a DNA fragment, and one short string that describes a marker. You can assume that the marker is fewer than 10 characters long.

The output of the program must include the following:

a) The percentages of A, C, G, and T in the original strand (i.e. count the number of occurrences of each letter, divide by the length of the strand, and multiply by 100).

b) The positions at which the marker occurs in the original strand.

c) Assuming that every piece that appears between two consecutive markers is a gene, the position of the longest gene in the strand, and the gene itself.
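
As an illustration only, here is a minimal Python sketch of the three computations in a), b), and c). It is not the required solution, and the language used in class may differ. The function names (base_percentages, marker_positions, longest_gene) are hypothetical, and several choices are assumptions not fixed by the assignment: positions are counted from 0, overlapping marker occurrences are all reported, the marker is non-empty, and a gene is the text strictly between the end of one marker and the start of the next.

```python
def base_percentages(strand):
    """Return the percentage of A, C, G, and T in the strand."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    for base in strand:
        if base in counts:  # ignore any unexpected character
            counts[base] += 1
    n = len(strand)
    # Guard against an empty strand to avoid dividing by zero.
    return {b: (100.0 * c / n if n else 0.0) for b, c in counts.items()}


def marker_positions(strand, marker):
    """Return every 0-based index at which the marker starts (assumption: 0-based)."""
    m = len(marker)
    positions = []
    for i in range(len(strand) - m + 1):
        if strand[i:i + m] == marker:
            positions.append(i)
    return positions


def longest_gene(strand, marker):
    """Return (start, gene) for the longest piece between two consecutive
    markers, or None if the marker occurs fewer than twice."""
    m = len(marker)
    positions = marker_positions(strand, marker)
    best = None
    for left, right in zip(positions, positions[1:]):
        start = left + m  # the gene begins right after the left marker
        if right < start:
            continue  # the two marker occurrences overlap: no gene between them
        gene = strand[start:right]
        if best is None or len(gene) > len(best[1]):
            best = (start, gene)
    return best


if __name__ == "__main__":
    strand = "ACGTTAGACCGGTAGAATAG"  # made-up example data
    marker = "TAG"
    print(base_percentages(strand))
    print(marker_positions(strand, marker))
    print(longest_gene(strand, marker))
```

For the made-up strand above, the marker TAG starts at positions 4, 12, and 17, so the two candidate genes are ACCGG (starting at position 7) and AA (starting at position 15), and the first is the longest. If two genes have the same length, this sketch keeps the first one; the assignment does not say which to prefer.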

We will discuss in class how to read the input and go from strings to arrays.
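
Until then, the following is one possible way to turn an input string into an array of characters, sketched in Python. The method presented in class may differ. The function name to_array is hypothetical, and removing whitespace and converting to upper case are assumptions about how the input should be cleaned up.

```python
def to_array(text):
    """Convert an input string into a list of single-character strings.

    Assumption: whitespace (including line breaks) is not part of the
    strand, and lower-case letters should be treated as upper case.
    """
    cleaned = "".join(text.split())  # drop all whitespace
    return list(cleaned.upper())     # one list element per letter


if __name__ == "__main__":
    strand = to_array(input("DNA fragment: "))
    marker = to_array(input("Marker: "))
    print(len(strand), "letters in the strand;", len(marker), "in the marker")
```

Once the strand is stored this way, each letter can be reached by its index (strand[0] is the first letter), which is what the array operations in this assignment rely on.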