GraphBlast

Rosalba Giugno
Universita' di Catania
Dipartimento di Matematica e Informatica
giugno@dmi.unict.it

Dennis Shasha
Courant Institute of Mathematical Sciences
Department of Computer Science
New York University
shasha@cs.nyu.edu

1. GraphGrepSX Installation and Usage
2. GraphBlast
3. GraphGrep1.0,1.1

# What is GraphGrep?

GraphGrep  is a software package to search for a query graph in a database of graphs. Given a collection of graphs and a pattern graph, GraphGrep finds all the occurrences of the pattern in each graph. The pattern is a subgraph and it can be also a tree, a path, or  a node. The  pattern is expressed as a list of nodes and a list of edges.

The components of GraphGrep are the following:

1. graphbuild, to index the graphs in the collection and store their abstract representation;
2. graphgrep, a graph query language;
2.1 algebra,  a collection of operations to manipulate and select data.
GraphGrep manipulates heterogeneous graph collections and does not depend on a specific application.

Why GraphGrep? Many applications in industry, science, and engineering depend on efficient information retrieval methods from structured or semistructured  databases. These databases contain items with a complex structure where the relationships between the  components  (of the items) have to be maintained. The items are typically  represented by graphs. For example, shapes, images (computer vision),  XML or OEM data (web exchange data model), molecules and proteins (computational biology), graphical sentences and grammars (visual languages) are represented by graphs. By using graphs the problem of information retrieval on a database  is transformed to that of finding  subgraphs  in a collection of graphs. This problem is well known to be NP  complete. Although graph-to-graph matching  algorithms can be used, efficiency reasons dictate the use of special  techniques in order to reduce the search space and the time complexity.

Home

# Compare GraphGrep

To use any version of GraphGrep you have to: (1) create a dataset file; (2) preprocess the collection of graphs; and (3) run a query. The output of "graphgrep" contains all the occurrences of the query in the collection of graphs.

GraphGrep2.0 is implemented in C++, it uses an efficient storage system, it is fast and accurate. Compared to GraphGrep1.0 or GraphGrep1.1, GraphGrep2.0 is faster but it does not allow inexact matchings.

 Input Data Format Input Query Format Output Format Language Storage System Filtering System Wildcards GraphGrep2.1 LNE LNE, Glide List of corresponding nodes C++ Berkeley DB Compacted Hashed Paths Vector Yes GraphGrep2.0 LNE LNE List of corresponding nodes C++ Berkeley DB Compacted Hashed Paths Vector No GraphGrep1.0GraphGrep1.1 LNE LNE,Glide List of corresponding paths C, KC Binary Files Extended Hashed paths Yes

Home

In order to build a very fast system we are exploring different filtering algorithms, storage  and matching techniques. Please specify to us which version you would like to use.

To download GraphGrep, send email to shasha@cs.nyu.edu  and to giugno@dmi.unict.it.
If you care to describe your application, we'd be glad to hear about it. In any case, we will send you instructions for downloading the program.

Home

# Installation of GraphGrep2.0

GraphGrep2.0 is implemented in ANSI C++. GraphGrep2.0 interfaces BerkeleyDB as the embedded database system to handle the data. It has been ported on Unix, and  Windows  platforms.

Follow these instructions to install GraphGrep2.0.

1. Install BerkeleyDB

2. Unzip it to a folder
3. Switch to the inside folder "build_unix"
4. Type "../dist/configure --enable-cxx" and execute
5. Type "make"  and execute
6. Type "make install"  and execute
7. Make sure there is a directory "/usr/local/BerkeleyDB.4.1/"  in your system.
8. Make sure "libdb_cxx-4.1.a" " libdb_cxx.a" are in the "/usr/lib" folder. You need this library to build GraphGrep

If you have any trouble building BerkelyDB or to see detail information, please check "http://www.sleepycat.com/docs/ref/build_unix/intro.html"

2.  Build GraphGrep2.0

1. Install successfully  BerkeleyDB.
2. Unzip the graphgrep2.0 package
3. Type "cd graphgrep2.0"
4. Type "cd src; make; cd .. "
5. Check if there are "graphgrep" and "graphbuild" existing in the current folder.

For bugs and questions please contact the giugno@dmi.unict.it and shasha@cs.nyu.edu.

Home

# Usage of GraphGrep2.0

• dataset file
The file consists of a collection of graph specifications  (LNE=List of Nodes and Edges ids format).  The first line of each graph must begin with the character '#' and contains the label of the graph; it can be a number or a string. The next line contains the number of nodes in the graph. Subsequent  lines contain the nodes'  labels, one label per line. The first label is the label of node 0, the next one is the label of node 1, etc. Next is the number of edges  in the graph, followed by a list of edges consisting of node id pairs. Each line contains only one edge. GraphGrep2.0 assumes that all edges are symmetric and no double edges are given. An example of dataset file is given in here.|

• query file
It has the same format of the dataset file; only one graph per file is allowed (see an example here) .

• output file
The output is given in a file called "query_file_name.out". For example, if the query file name is query.dat the output is given in query.dat.out. The first rows of the output file contain the list of the node ids of the query.
Each following row of the output file is a subgraph matching the query. The rows contain an identification of the graph, the number of matches in each graph if not zero  and a list of data nodes matching the query nodes given in the first row (see example here ).

Run GraphGrep2.0

To run the examples go to the GraphGrep2.0 directory and do the following (If you want to run GraphGrep2.0 in a different directory: make sure that graphgrep and graphbuild are in your path):

• Database creation
In order to build a GraphGrep2.0 database, run "graphbuild -f dataset_file_name". This is a preprocessing step that creates an abstract representation of the data.

•   Parameters
graphbuild -f graphDBfile [ -lp lengthpath -fp length of FP ]
-lp lengthpath value, default is 4
-fp length of FP vector, default is 256

Query Execution
run "graphgrep -m query_file_name"

• Parameters
graphgrep -m queryfile [-o -v -noFL -n numGraphs -lp lengthpath -fp length of FP]
-"o output count"     -----it outputs  the number of total matches
-"v verbose mode"   -----it outputs the steps of opfile
-"noFL disable filtering"
-"n numGraphs"       -----input number of graphs in DB(only with option -noFL)
-"lp lengthpath"        ------input different LP, default is 4
-"fp length of FP"     -------input different length of FP, default is 256

Home

References

R. Giugno, D. Shasha, GraphGrep: A Fast and Universal Method for Querying Graphs.
Proceeding of the International Conference in Pattern recognition (ICPR), Quebec, Canada, August 2002. pdf

D. Shasha, J.T-L Wang, R. Giugno, Algorithmics and Applications of Tree and Graph Searching
Proceeding of the ACM Symposium on Principles of Database Systems (PODS), Madison, Wisconsin, June 2002. pdf

Collaborators

• Chih-Yi, HSU, master in computer science at New York University 2002. Now he is working at AT&T lab, New Jersey.

• This project is supported by New York University, USA  and University of Catania, Italy.
Visit our CT-NYU project page for a further useful package of algorithms with applications in Bioinformatics and Mobile Stations.

# Grant Support

This work has been partly supported by the U.S. National Science Foundation under grants IIS-0414763, DBI-0445666, N2010 IOB-0519985, N2010 DBI-0519984, DBI-0421604, and MCB-0209754. This support is greatly appreciated.