Syllabus for Web Search Engines

(Note: Almost all the papers assigned for reading are on the web, and the online version of this syllabus is an HTML document with links to the pages. The bibliographic references given here are therefore less complete than would usually be necessary.)

Lecture 1: Overview and Crawlers

Architecture of a general-purpose search engine, such as Google or AltaVista. Issues in the implementation of an industrial-strength web crawler.

Reading

Lecture 2: Indexing and Retrieval

Indexing and retrieving text files by words. Database structure. Measuring the relevance of a text to a query.
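As a rough illustration of these topics, the following minimal sketch builds an inverted index over a few toy documents and ranks them against a query by TF-IDF weight. The function names, the sample documents, and the choice of TF-IDF as the relevance measure are assumptions made for this example; they are not taken from the assigned readings.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build an inverted index: word -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for word, count in Counter(text.lower().split()).items():
            index[word][doc_id] = count
    return index


def score(query, index, num_docs):
    """Rank documents by the summed TF-IDF weight of the query words."""
    scores = defaultdict(float)
    for word in query.lower().split():
        postings = index.get(word, {})
        if not postings:
            continue  # word appears in no document
        # Rare words get a higher inverse document frequency.
        idf = math.log(num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy collection, for illustration only.
docs = {
    "d1": "web crawler fetches web pages",
    "d2": "index maps words to pages",
    "d3": "ranking orders pages by relevance",
}
index = build_index(docs)
ranked = score("web crawler", index, len(docs))
```

Note that a word occurring in every document (here "pages") gets an IDF of zero under this weighting, so it contributes nothing to the ranking.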

Reading

Lecture 3: Link Analysis for Ranking

Using link structure to evaluate the importance of a Web page. PageRank. Hubs and Authorities.
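To make the idea of link-based importance concrete, here is a small sketch of PageRank computed by power iteration on a toy link graph. The damping factor of 0.85, the fixed iteration count, the handling of pages without outlinks, and the example graph are all assumptions chosen for illustration; the readings should be consulted for the actual formulation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Every link target must also appear as a key of `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a baseline share from the random jump.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its rank evenly among its outlinks.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outlinks spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked to by every other page.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
best = max(ranks, key=ranks.get)
```

In this toy graph the page with the most incoming links ends up with the highest score, and the scores sum to 1, so they can be read as a probability distribution over pages.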

Reading

Lecture 4: Clustering

Organizing text documents into groups.

Reading

Lecture 5.A: Evaluation

Measuring the quality of a search engine. Precision/recall and other measures.
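As a brief illustration, precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. The sketch below computes both for one query, along with the F1 score as one example of a combined measure. The function name and the sample document identifiers are hypothetical.

```python
def precision_recall_f1(retrieved, relevant):
    """Return (precision, recall, F1) for one query's result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical query: the engine returns 4 documents, 3 are truly relevant.
p, r, f1 = precision_recall_f1(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
```

Here two of the four returned documents are relevant (precision 1/2), and two of the three relevant documents were found (recall 2/3).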

Reading

Lecture 5.B: Query Languages

Standard and extended query languages.

Readings

Lecture 6: Question Answering

Reading

Lecture 7: The Invisible Web and Specialized Search Engines

Reading

Lecture 8: Multimedia Retrieval: Images

Readings

Lecture 9: Multimedia Retrieval: 3D Models

Readings

Lecture 10: The Semantic Web

Characterizing web services and retrieving them in terms of their functionality.

Readings

Lecture 11: Web Mining

Automated classification of papers. Information extraction from usage logs and its applications.

Reading

Lecture 12: Focused Crawling

Reading

Lecture 13: Structure of the Web

Overall structural and evolutionary characteristics of the Web. Techniques for compressing web data.

Reading