Computer Science Colloquium

Document Mining using Things and Strings

Lyle Ungar

Friday, February 2, 2007 11:30 A.M.
Room 1302 Warren Weaver Hall
251 Mercer Street
New York, NY 10012-1185

I. Dan Melamed, melamed at cs dot nyu dot edu, (212) 998-3003


The next generation of search engines needs to be aware of things (e.g., entities in databases) as well as strings (terms in text). Building scalable entity detection and resolution methods for combining databases and document corpora raises a host of interesting machine learning questions. This talk will describe a few of these questions and some initial attempts to answer them using feature generation and selection methods.

Joint work with Dean Foster, Jing Zhou, and Sasha Popescul.

