Pairwise Comparison between Genomic Sequences and Optical-Maps
Candidate: Bing Sun
Advisor: Bud Mishra


With the development and improvement of high throughput experimental technologies, massive amount of biological data including genomic sequences and optical-maps have been collected for various species. Comparative techniques play a central role in investigating the adaptive significance of organismal traits and revealing evolutionary relations among organisms by comparing these biological data. This dissertation presents two efficient comparative analysis tools used in comparative genomics and comparative optical-map study, respectively.

A complete genome sequence of an organism can be viewed as its ultimate genetic map, in the sense that the heritable information are encoded within the DNA and the order of nucleotides along chromosomes is known. Comparative genomics can be applied to find functional sites by comparing genetic maps. Comparing vertebrate genomes requires efficient cross-species sequence alignment programs. The first tool introduced in this thesis is COMBAT (Clean Ordered Mer-Based Alignment Tool), a new mer-based method which can search rapidly for highly similar translated genomic sequences using the stable-marriage algorithm (SM) as an alignment filter. In experiments COMBAT is applied to comparative analysis between yeast genomes, and between the human genome and the recently published bovine genome. The homologous blocks identified by COMBAT are comparable with the alignments produced by BLASTP and BLASTZ.

When genetic maps are not available, other genomic maps, including optical-maps, can be constructed. An optical map is an ordered enumeration of the restriction sites along with the estimated lengths of the restriction fragments between consecutive restriction sites. CAPO (Comparative Analysis and Phylogeny with Optical-Maps), introduced as a second technique in this thesis, is a tool for inferring phylogeny based on pairwise optical map comparison and bipartite graph matching. CAPO combines the stable matching algorithm with either the Unweighted Pair Group Method with Arithmetic Averaging (UPGMA) or the Neighbor-Joining (NJ) method for constructing phylogenetic trees. This new algorithm is capable of constructing phylogenetic trees in logarithmic steps and performs well in practice. Using optical maps constructed in silico and in vivo, our work shows that both UPGMA-flavored trees and the NJ-flavored trees produced by CAPO share substantial overlapping tree topology and are biologically meaningful.