Advanced Machine Learning: Schedule


This page contains the schedule, slides from the lectures, lecture notes, reading lists, assignments, and web links.

I urge you to download the DjVu viewer and view the DjVu versions of the documents below. They display faster, are of higher quality, and are generally smaller than the PS and PDF versions.

01/18: Introduction

01/26: Multi-Layer Learning

Papers

Marc'Aurelio Ranzato: Symmetric Product of Experts.

02/15: Unsupervised Learning

Papers

Boltzmann Machines (1983-1990) (Yann).
Hinton and McClelland: Learning Representations by Recirculation (NIPS 1987) (Matt, Ayse).

02/22: Hinton Day

Talks

A short review of statistical physics concepts: energy, entropy, free energy, Gibbs distribution (Yann).
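As a rough illustration of how these four quantities relate, here is a minimal sketch for a system with a finite number of states. The energy values, the temperature, and all function names are hypothetical choices made for this example, not material from the lecture. It computes the Gibbs distribution P(i) = exp(-E_i / T) / Z and the free energy F = -T log Z, then checks the identity F = <E> - T S numerically.

```python
import math

def gibbs_distribution(energies, temperature=1.0):
    """Return P(i) = exp(-E_i / T) / Z for a finite set of states."""
    beta = 1.0 / temperature
    # Shift by the minimum energy for numerical stability; the shift
    # cancels between numerator and denominator.
    e_min = min(energies)
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def free_energy(energies, temperature=1.0):
    """Return F = -T log Z, where Z = sum_i exp(-E_i / T)."""
    beta = 1.0 / temperature
    e_min = min(energies)
    shifted = sum(math.exp(-beta * (e - e_min)) for e in energies)
    log_z = -beta * e_min + math.log(shifted)
    return -temperature * log_z

def average_energy(energies, probs):
    """Return <E> = sum_i P(i) E_i."""
    return sum(p * e for p, e in zip(probs, energies))

def entropy(probs):
    """Return S = -sum_i P(i) log P(i), in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical three-state system; the numbers are made up.
energies = [0.0, 1.0, 2.0]
T = 1.0

p = gibbs_distribution(energies, T)
F = free_energy(energies, T)
U = average_energy(energies, p)
S = entropy(p)

# Under the Gibbs distribution, free energy = average energy - T * entropy.
assert abs(F - (U - T * S)) < 1e-9
```

Lower-energy states receive higher probability, and raising T flattens the distribution toward uniform, which increases the entropy term.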

Helmholtz Machines: this page. Either (Hinton and Zemel, NIPS 1994), (Zemel and Hinton, Neural Computation 1995), (Hinton, Dayan, Frey, and Neal, Science 1995), or (Dayan, Hinton, Neal, and Zemel, Neural Computation 1995), or some combination thereof (Alyssa, Piotr, Marina).
Hinton: Training Products of Experts by Minimizing Contrastive Divergence. Neural Computation, 2002 (Philip, Marco, Marc'Aurelio).

03/01: Graphical Models

Different types of graphical models (Yann)

Bayesian belief nets
directed graphical models
exact inference in graphical models with loops is generally intractable
conditional probability tables are invertible with Bayes' rule: the direction of the arrows doesn't matter in principle (the arrows express dependency, not causality).
undirected graphical models: the likelihood is a product of potential functions
Markov random fields: graphical models with local interactions
undirected graphical models with potential functions must be normalized explicitly: the partition function problem.
factor graphs: each potential function is explicitly represented (a slightly more general representation of graphical models)
logarithmic representation: the factors are additive energy functions. The likelihood is proportional to exp(-energy).
energy-based models: factor graphs without normalization (no partition function). Can be used when no explicit probabilities are required: only the relative values of the energies matter.
representing common models as factor graphs; for example, an HMM is a "comb".
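The last few points in the list above (additive energies, likelihood proportional to exp(-energy), and the partition function) can be sketched on a toy factor graph. Everything in the following example is an assumption made for illustration: the three binary variables, the chain structure, the factor energies, and the names `factors`, `energy`, and `prob` do not come from the lecture. The partition function is computed by brute-force enumeration, which is only feasible because the model is tiny.

```python
import itertools
import math

# Hypothetical factor graph over three binary variables x0, x1, x2 (a chain).
# Each factor is (indices of the variables it touches, energy function).
# All numbers are made up for illustration.
factors = [
    ((0,), lambda a: 0.5 * a),                       # bias on x0
    ((0, 1), lambda a, b: 0.0 if a == b else 1.0),   # x0 and x1 prefer to agree
    ((1, 2), lambda a, b: 0.0 if a == b else 2.0),   # x1 and x2 prefer to agree
]

def energy(x):
    """Total energy of a configuration: the sum of the factor energies."""
    return sum(f(*(x[i] for i in scope)) for scope, f in factors)

# Enumerate all 2^3 configurations.
states = list(itertools.product([0, 1], repeat=3))

# Partition function: the explicit normalization a probabilistic model needs.
Z = sum(math.exp(-energy(x)) for x in states)

# Normalized likelihood, proportional to exp(-energy).
prob = {x: math.exp(-energy(x)) / Z for x in states}

# Energy-based view: the best configuration is found from energies alone,
# without ever computing Z.
best_by_energy = min(states, key=energy)
best_by_prob = max(states, key=prob.get)
assert best_by_energy == best_by_prob

# Probability ratios depend only on energy differences, so Z cancels.
a, b = (0, 0, 0), (1, 1, 1)
ratio = prob[a] / prob[b]
assert abs(ratio - math.exp(-(energy(a) - energy(b)))) < 1e-12
```

The enumeration over `states` grows exponentially with the number of variables, which is the partition function problem in miniature; the energy-based comparison in the last two steps avoids it because only relative energies are used.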

03/08: Independent Component Analysis, Source Separation

Papers

Bell AJ, Sejnowski TJ (1995) "An information-maximization approach to blind separation and blind deconvolution," Neural Computation, 7: 1129-1159.
a shorter/earlier version of Bell and Sejnowski (Crispy, Jie, George).
Zibulevsky & Pearlmutter: Blind Source Separation by Sparse Decomposition in a Signal Dictionary. Neural Computation, 13(4):863-882. 2001. [DjVu] [PDF] (Jeremy, Sumit, Koray).
Hinton, G. E., Welling, M., Teh, Y. W., and Osindero, S. A New View of ICA. Proceedings of ICA-2001, San Diego (Raia, Yury, Jihun).