G22.2591 - Advanced Topics in Natural Language Processing

Prof. Grishman

Spring 2009

Thursday 5-7 PM

Schedule of Lectures


Understanding natural language requires a lot of prior knowledge -- knowledge of language, and knowledge of the world as reflected in the language.  The challenge for developers of natural language processing systems is how to capture this knowledge.

Over the past 15 years, the paradigm first shifted from systems based on hand-coded rules to systems trained on text corpora, in most cases corpora that have been hand-annotated with specific linguistic information.  In many cases, the resulting systems significantly outperform the earlier systems built on hand-coded rules.

More recently, with the availability of almost unlimited text on the Web, the focus has shifted again from supervised methods (which require annotated corpora) to semi-supervised and unsupervised methods (which operate on 'raw' text).  In effect, we learn about text from text.  We will consider how to create systems that can operate on Web-scale corpora and offer the potential of more powerful Web search -- the ability to search for facts rather than keywords.

In many cases, relatively simple models and learning methods will do quite well.  To push performance further, however, it is necessary to understand the limitations of these models and to identify the linguistic features that can overcome them.  This course will look at several natural language processing tasks from this point of view, examining the linguistic characteristics that support the creation of effective models, and the learning methods required to train these models.  Among the tasks which may be considered are:
The classes will be a mix of lectures, discussion, and student presentations.  In addition to preparing two or three presentations for the course (covering specific articles and the student's project), students will be expected to run a number of smaller experiments, and one larger experiment as a term project.

Students should have
For further information, contact the instructor at grishman@cs.nyu.edu