Syllabus for Web Search Engines
(Note: Almost all the papers assigned for reading are on the web,
and the online version of this syllabus is an HTML document with links
to the pages. The bibliographic references given here are therefore
less complete than would usually be necessary.)
Lecture 1: Overview and Crawlers
Architecture of a general-purpose search engine, such as Google or AltaVista.
Issues in the implementation of an industrial-strength web crawler.
Reading
- Soumen Chakrabarti, Mining the Web: Discovering Knowledge
from Hypertext Data, Morgan Kaufmann, 2002. Chaps. 1, 2.
-
The Anatomy of a Large-Scale Hypertextual Web Search Engine
Sergey Brin and Lawrence Page, Seventh International World Wide Web
Conference, 1998.
-
Searching the Web, Arvind Arasu et al., ACM Transactions on
Internet Technology, 2001.
-
Mercator: A Scalable, Extensible Web Crawler by Allan Heydon and Marc
Najork, World Wide Web 2:4 pp. 219-229, 1999.
Lecture 2: Indexing and Retrieval
Indexing and retrieving text files by words. Database structure.
Measuring the relevance of a text to a query.
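As a rough illustration of the topics above, here is a minimal sketch of an inverted index with a simple TF-IDF relevance score. The function names (`build_index`, `score`), the whitespace tokenizer, and the particular weighting (raw term count times log inverse document frequency) are assumptions chosen for brevity, not the method of any of the assigned readings.

```python
import math
from collections import Counter, defaultdict

def build_index(docs):
    """Build an inverted index: word -> {doc_id: term count}.

    `docs` maps a document id to its text. Tokenization is naive
    lowercase whitespace splitting (an assumption for this sketch).
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for word, count in Counter(text.lower().split()).items():
            index[word][doc_id] = count
    return index

def score(query, index, num_docs):
    """Rank documents by the sum of tf * idf over the query words."""
    scores = defaultdict(float)
    for word in query.lower().split():
        postings = index.get(word, {})
        if not postings:
            continue  # word occurs in no document
        idf = math.log(num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest-scoring documents first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

docs = {
    1: "web crawler design",
    2: "web search ranking",
    3: "cooking recipes",
}
index = build_index(docs)
ranking = score("web crawler", index, len(docs))
```

Here document 1 ranks first because it matches both query words, and the rarer word "crawler" contributes more than the more common word "web".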
Reading
- Chakrabarti, chap. 3.
- Ricardo Baeza-Yates and Berthier Ribeiro-Neto,
Modern Information Retrieval, Addison Wesley, 1999. Chap. 2.
-
Database Techniques for the World Wide Web
Daniela Florescu, Alon Levy, and Alberto Mendelzon.
SIGMOD Record 27:2 pp. 59-74, 1998.
Lecture 3: Link Analysis for Ranking
Using link structure to evaluate the importance of a Web page.
PageRank. Hubs and Authorities.
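For orientation, the PageRank idea can be sketched as a power iteration over the link graph. This is a minimal sketch under stated assumptions: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the uniform treatment of pages without out-links are illustrative choices, not details taken from this syllabus.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Returns a dict page -> rank; the ranks sum to 1.
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # Split this page's rank evenly among its out-links.
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # Dangling page (assumption): spread its rank uniformly.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```

In this toy graph page "c" is linked from both "a" and "b", so it ends up with a higher rank than "b", which is linked only from "a".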
Reading
- Chakrabarti, sections 7.1-7.5.
-
The PageRank Citation Ranking: Bringing Order to the Web
by Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd.
Stanford Digital Library Technology Project, 1998.
-
Authoritative Sources in a Hyperlinked Environment
Jon M. Kleinberg
Journal of the ACM 46:5 pp. 604-632, 1999.
-
Finding Authorities and Hubs from Link Structures on the World Wide
Web
by Allan Borodin, Gareth Roberts, Jeffrey Rosenthal, and Panayiotis Tsaparas.
Tenth International World Wide Web Conference, 2001.
Lecture 4: Clustering
Organizing text documents into groups.
Reading
Lecture 5.A: Evaluation
Measuring the quality of a search engine. Precision/recall and other
measures.
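As a small worked example of the two basic measures: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. The helper name `precision_recall` and the document ids below are hypothetical, and returning 0.0 for an empty set is a convention assumed for this sketch.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    `retrieved` is the collection of documents the engine returned;
    `relevant` is the collection of documents judged relevant.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Convention assumed here: an empty set yields 0.0 rather than an error.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Four documents returned, two of them among the three relevant ones.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
```

Here two of the four returned documents are relevant (precision 0.5), and two of the three relevant documents were found (recall 2/3).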
Reading
- Baeza-Yates and Ribeiro-Neto, secs. 3.1-3.2.
Lecture 5.B: Query Languages
Standard and extended query languages.
Readings
Lecture 6: Question Answering
Reading
Lecture 7: The Invisible Web and Specialized Search Engines
Reading
Lecture 8: Multimedia Retrieval: Images
Readings
-
ImageRover: A Content-Based Image Browser for the World-Wide Web
Stan Sclaroff, Leonid Taycher, and Marco La Cascia.
IEEE Workshop on Content-based Access of Image and
Video Libraries, 1997.
-
WebSeer: An Image Search Engine for the World Wide Web
Michael J. Swain, Charles Frankel, Vassilis Athitsos. IEEE
Computer Vision and Pattern Recognition Conference, 1997.
-
Shape Matching: Similarity Measures and Algorithms
Remco C. Veltkamp, 2001
-
Webcrawling using Sketches, Michael S. Lew, Kim Lempinen, Nies Huijsmans,
1997.
-
Content-Based Image Retrieval Systems: A Survey
Remco C. Veltkamp, Mirela Tanase
-
Searching for Images and Videos on the World-Wide Web
John R. Smith, Shih-Fu Chang, 1996
Lecture 9: Multimedia Retrieval: 3D Models
Readings
Lecture 10: The Semantic Web
Characterizing web services, and retrieving them in terms of their
functionality.
Readings
Lecture 11: Web Mining
Automated classification of papers. Information extraction from
usage logs and its applications.
Reading
- Chakrabarti, chap. 5.
-
Web Mining Research: A Survey, Raymond Kosala, Hendrik Blockeel.
SIGKDD volume 2, 2000.
-
Web Mining: Information and Pattern Discovery on the World Wide Web
R. Cooley, B. Mobasher, and J. Srivastava. Ninth IEEE
Conf. on Tools with Artificial Intelligence, 1997.
-
Data mining for hypertext: A tutorial survey, Soumen Chakrabarti.
SIGKDD volume 2, 2000.
-
Research Issues in Web Data Mining, Sanjay Madria et al.
In Data Warehousing and Knowledge Discovery: First
International Conference pp. 303-312, 1999.
-
Web Usage Mining: Discovery and Applications of Usage Patterns from
Web Data, J. Srivastava et al.
SIGKDD volume 2, 2000.
-
Data Preparation for Mining World Wide Web Browsing Patterns
Cooley, Mobasher, and Srivastava. Knowledge and Information
Systems, 1:1 5-32, 1999.
-
WebWatcher: A Tour Guide for the World Wide Web
T. Joachims, D. Freitag, and T. Mitchell (1997)
IJCAI-97, pp. 770-777.
Lecture 12: Focussed Crawling
Reading
Lecture 13: Structure of the Web
Overall structural and evolutionary characteristics of the Web. Techniques
for compressing web data.
Reading