April, 2014
Applications of Subgraph Isomorphism Algorithm
Complexes Search in Biological Networks
The CORUM dataset contains protein complexes from several species. Each complex extracted from a species is known to be highly conserved in different species. That is, proteins of a complex are high conserved in the other species via protein sequence similarity obtained from the SIMAP database. This conservation does not necessarily imply that the molecular interactions within a complex in a species are also present in the other species. We test whether the implication nevertheless does apply. We labeled the protein interaction networks used in the datasets PPI using the GO annotations taken from BioDBNet. We extract the interactions from the species where some complexes of interest originated, and we look for all occurrences of such complexes graphs (patterns) in the other species (targets) using their known functional annotations and the similarity map computed with SIMAP. Using DAVID we extracted the most relevant GO terms associated with the subunits of the complexes. The similarity map is used to select among the results the most similar to the complex patterns.
The complex 20S proteasome from Mus musculus.
Let us use as example the complex 20S proteasome from Mus musculus. It has 14 vertices and 182 edges. It is an essential component of the ATP-dependent proteolytic pathway in eukaryotic cells and is responsible for the degradation of most cellular proteins. The 20S proteasome in our PPI dataset is high conserved in Homo sapiens, Rattus norvegicus, Bos taurus, Xenopus tropicalis, Danio rerio, and Takifugu rubipres and medium conserved in Caenorhabditis elegans, Drosophila melanogaster and Saccaromyces cerevisiae. Figure shows a screen shot of CORUM phylogenetic conservation of the 20S proteasome complex.
CORUM phylogenetic conservation of the 20S proteasome complex.
We found that the 20S proteasome complex interactions are preserved in all tests networks except in the network for Xenopus tropicalis. This may due to the lack of pattern edges or relevant vertices in the target networks. In general we observed a large number of matches in networks related to complex organisms (for example in Homo sapiens we found that the complex also matches with subunits specific of the 26S proteasome complexes). Figure shows a match of the 20S proteasome complex in Takifugu rubipres and Rattus norvegicus, respectively.
The 20S proteasome complex occurrence in Takifugu rubipres.
The 20S proteasome complex occurrence in Rattus norvegicus.
[I Dunger-Kaltenbach, G Fobo, C Frishman, G Montrone, and HW Mewes. Corum: the comprehensive resource of mammalian
protein complexes. Nucleic Acids Res., 2010.]
[T Rattei, P Tischler, S Gotz, J Jehl, MA Hoser, R Arnold, A Conesa, and MewesHW. Simapa comprehensive database of
pre-calculated protein sequence similarities, domains, annotations and clusters. Nucleic Acids Res., 2010.]
[U Mudunuri, A Che, M Yi, and MR Stephens. biodbnet: the biological database network. Bioinformatics, 25(4):555–556,
2009.]
[BT Sherman, DW Huang, Q Tan, Y Guo, S Bour, D Lui, R Stephens, MW Baseler, C Lane, and RA Lempicki. David
knowledgebase: a gene-centered database integrating heterogeneous gene annotation resources to facilitate high-throughput
gene functional analysis. BMC Bioinformatics, 2007.]
[LA Elenich, D Nandi, AE Kent, TS McCluskey, M Cruz, MN Iyer, EC Woodward, CW Conn, AL Ochoa, DB Ginsburg, and
Monaco JJ. The complete primary structure of mouse 20s proteasomes. Immunogenetics, 1999.]
Searching molecules
Cheminformatics computational tools solve chemistry problems such as exploration of structural and chemical or biological data. Daylight Chemical Information Systems is a leader company in this area.
Systems such as Daylight collect a set of molecules represented in two dimensions. Then, given a subgraph (pattern), they may apply a subgraph isomorphism algorithm to determine how many times and where the subgraphs occur in each molecule (target) of the collection. The aim of the above work is to predict or increase the functionality of new or known molecules.
The following example shown the occurrences in the drug Dobutamine and Oxaprozin of a carbon ring. We use to generate this images the tool Depictmatch provided by Daylight at http://www.daylight.com/daycgi_tutorials/depictmatch.cgi. We point to the website of Daylight for more examples of its use.
![]() |
![]() |
The carbon ring pattern graph. | The occurrences of the pattern graph (carbon ring) in Dobutamine and Oxaprozin molecules. |
Querying biological Networks
Biological networks represent systems that arise from complex interactions between components (people, organisms, cells, proteins, DNA, RNA and small molecules). Such networks are naturally modeled as large graphs, which can be analyzed using graph theoretical techniques. Locating subgraphs matching a specific topology is useful to find higher-order connectivity motifs of networks that may have functional relevance in the modeled biological system. In cell biology, it may be of interest to see whether the connectivity of genes of one functional type is similar to some characteristic shape, like a feed-forward loop. In epidemiology, acquaintance graphs between people may characterize specific patterns of disease outbreaks and can be used to optimize vaccine delivery. The results are subgraphs of the original graph (target) connected in the same way as the pattern graph.
For example NetMatch is a Network Querying Tool that uses an efficient subgraph isomorphism algorithm like RI. NetMatch is a Cytoscape plugin. An example of use of NetMatch is the following "Find all feed-forward loops in galFiltered.xgmml network".
Given the pattern graph it finds by using a subgraph isomorphism algorithm all occurrences in the target graph as shown below.
![]() |
![]() |
The pattern graph. | The occurrences of the pattern graph in the target graph. |
[Alfredo Ferro, Rosalba Giugno, Giuseppe Pigola, Alfredo Pulvirenti, Dmitry Skripin, G. D. Bader, Dennis Shasha: NetMatch: a Cytoscape plugin for searching biological networks. Bioinformatics 23(7): 910-912 (2007)]
http://ferrolab.dmi.unict.it/netmatch.html
http://baderlab.org/Software/NetMatch