Computer Science Colloquium
Big Data: Statistical and Computational Challenges and Opportunities
Nando de Freitas, University of British Columbia
March 27, 2013
Warren Weaver Hall, 1302
251 Mercer Street
New York, NY 10012
I will review several successful consumer products and argue that huge datasets played a central role in their construction. Among these products, I will describe Zite, a personalized magazine built by my team and acquired by CNN, which uses massive classifiers to label nearly a billion documents and personalize content for millions of users. I will then discuss two machine learning ideas at the core of many of these products: ensemble methods and Bayesian optimization. I will briefly present new theoretical results on online random forests, a very popular ensemble method, and point out theoretical problems and practical challenges in scaling this technique further. I will conclude with an overview of Bayesian optimization, present our theoretical advances in this area, and share our experiences in applying this technique to automatic algorithm configuration, information extraction, massive online analytics, and intelligent user interfaces.
Nando de Freitas is a full professor in machine learning in the Department of Computer Science at UBC. He is also an associate member of statistics and cognitive systems. He received his PhD from Cambridge University early in 2000 and was a postdoctoral fellow at UC Berkeley for two years before joining UBC. He is a fellow of the Canadian Institute for Advanced Research (CIFAR) in the successful Neural Computation and Adaptive Perception program. Among his recent awards are the 2012 Charles A. McDowell Award for Excellence in Research and the 2010 Mathematics of Information Technology and Complex Systems (MITACS) Young Researcher Award.