
Talks by Members of CBLL


All the talks and posters on this page are provided in DjVu and PDF.
Some are also provided in ODP (OpenOffice's Open Document Format) and PPT (MS PowerPoint).
Caution: a few PDF files do not display correctly in ghostview.

We very strongly encourage interested readers to use the DjVu versions: they display instantly, load much faster, and have no compatibility problems.

  • djview4: Free/Open Source DjVu viewer for Windows, Mac OS-X, and Linux.
  • DjVu plug-in, Windows: "Official" DjVu plug-in for Windows (free).
  • DjVu plug-in, Mac: "Official" DjVu plug-in for Mac (free).
  • DjVu.org: links to other DjVu viewers for Windows, Java, Sharp Zaurus, PalmOS, Symbian, Pocket PC, Be OS....
  • On Linux, DjVu is supported by Evince (standard Gnome document viewer), and Okular (standard KDE document viewer).
  • On Windows, DjVu is supported by ACDSee and IrfanView.

2011-05-05: LIP6 ML Seminar, Universite P. et M. Curie

SLIDES (PDF)

A similar seminar was given at the Xerox Research Center Europe in Grenoble the following day.

Title: Learning Feature Hierarchies for Vision

Abstract: Intelligent perceptual tasks such as vision and audition require the construction of good internal representations. Theoretical and empirical evidence suggest that the perceptual world is best represented by a multi-stage hierarchy in which features in successive stages are increasingly global, invariant, and abstract. An important challenge for Machine Learning is to devise "deep learning" methods for multi-stage architectures that can automatically learn good feature hierarchies from labeled and unlabeled data.

A class of such methods that combines unsupervised sparse coding and supervised refinement will be described. We demonstrate the use of these deep learning methods to train convolutional networks (ConvNets). ConvNets are biologically-inspired architectures consisting of multiple stages of filter banks, interspersed with non-linear operations and spatial pooling operations, analogous to the simple cells and complex cells in the mammalian visual cortex.
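The filter-bank / non-linearity / pooling pipeline of a single ConvNet stage can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions (random filters, tanh non-linearity, 2x2 max pooling), not CBLL's actual implementation:

```python
import numpy as np

def convnet_stage(image, filters, pool=2):
    """One ConvNet stage: filter bank -> tanh non-linearity -> max pooling."""
    kh, kw = filters.shape[1], filters.shape[2]
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    maps = np.empty((len(filters), oh, ow))
    for k, f in enumerate(filters):             # filter bank ("valid" correlation)
        for i in range(oh):
            for j in range(ow):
                maps[k, i, j] = np.sum(image[i:i+kh, j:j+kw] * f)
    maps = np.tanh(maps)                        # point-wise non-linearity
    ph, pw = oh // pool, ow // pool             # spatial max pooling
    pooled = maps[:, :ph*pool, :pw*pool].reshape(len(filters), ph, pool, pw, pool)
    return pooled.max(axis=(2, 4))

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
filters = rng.standard_normal((4, 3, 3))        # 4 random 3x3 filters (illustrative)
out = convnet_stage(image, filters)
print(out.shape)  # (4, 3, 3): 4 feature maps, spatially subsampled
```

Stacking several such stages, each operating on the feature maps of the previous one, yields the multi-stage hierarchy described above.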

A number of applications will be shown through videos and live demos, including a category-level object recognition system that can be trained on the fly, a pedestrian detector, a system that recognizes human activities in videos, and a trainable vision system for off-road mobile robot navigation.

A new kind of "dataflow" computer architecture, dubbed NeuFlow, was designed to run these algorithms (and other vision and recognition algorithms) in real time on small, embeddable platforms. An FPGA implementation of NeuFlow running various vision applications will be shown. An ASIC is being designed in collaboration with the e-lab at Yale, which will be capable of 700 giga-operations per second for less than 3 watts.

2011-03-02: Rutgers/Yahoo ML Seminar

SLIDES (PDF)

VIDEO

Title: Learning Feature Hierarchies for Vision

2011-02-23: Princeton Plasma Physics Lab

Title: Building Artificial Vision Systems with Machine Learning.

This talk is for a general technical audience.

SLIDES (PDF)

VIDEO

Direct link to .mov video file

Abstract: Animals and humans autonomously learn to perceive and navigate the world. What "learning algorithm" does the cortex use to organize itself? Could computers and robots learn to perceive the way animals do, by just observing the world and moving around it? This constitutes a major challenge for machine learning and computer vision.

The visual cortex uses a multi-stage hierarchy of representations, from pixels to edges, motifs, parts, objects, and scenes. A new branch of machine learning research, known as "deep learning", is producing new algorithms that can learn such multi-stage hierarchies of representations from raw inputs. I will describe a biologically-inspired, trainable vision architecture called a convolutional network. It consists of multiple stages of filter banks, non-linear operations, and spatial pooling operations, analogous to the simple cells and complex cells in the mammalian visual cortex. Convolutional nets are first trained with unlabeled samples using a learning method based on sparse coding, and subsequently fine-tuned on labeled samples with a gradient-based supervised learning algorithm.
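The supervised fine-tuning step mentioned above can be illustrated in miniature: gradient descent on a classifier sitting on top of (already learned) feature vectors. This is a hedged sketch with a plain logistic classifier and synthetic "features" standing in for ConvNet outputs, not the actual training code:

```python
import numpy as np

def finetune_logistic(feats, labels, epochs=200, lr=0.5):
    """Gradient-based supervised refinement of a linear classifier on
    pre-learned feature vectors (stand-in for the fine-tuning step)."""
    w = np.zeros(feats.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))    # sigmoid predictions
        grad_w = feats.T @ (p - labels) / len(labels) # cross-entropy gradient
        grad_b = np.mean(p - labels)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy "features": two separable clusters standing in for ConvNet outputs.
rng = np.random.default_rng(1)
feats = np.vstack([rng.normal(-1, 0.3, (20, 2)), rng.normal(1, 0.3, (20, 2))])
labels = np.array([0] * 20 + [1] * 20)
w, b = finetune_logistic(feats, labels)
preds = (feats @ w + b > 0).astype(int)
print((preds == labels).mean())  # high accuracy on this separable toy set
```

In the full system the gradient is back-propagated through all stages, so the pre-learned filters themselves are refined, not just the top classifier.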

A number of applications will be shown through videos and live demos, including a category-level object recognition system that can be trained on the fly, a pedestrian detector, a system that recognizes human activities in videos, and a trainable vision system for off-road mobile robot navigation. A very fast implementation of these systems on specialized hardware will be shown. It is based on a new programmable and reconfigurable "dataflow" architecture dubbed NeuFlow.

2010-07-12: Five Lectures at the PCMI Summer School

Five lectures on machine learning and object recognition at the Park City Mathematics Institute Graduate Summer School, organized by the Institute for Advanced Study.

Slides:

2010-06-30: Keynote talk at ICISP, Trois Rivieres, Quebec

Learning Hierarchies of Visual Features

Keynote talk given at the International Conference on Image and Signal Processing in Trois Rivieres, Quebec.

Slides:

2010-05-05: IS&T Colloquium at NASA Goddard Space Flight Center

Can Robots Learn to See?

NASA IS&T Colloquium delivered at the NASA Goddard Space Flight Center in Maryland (with video).

NASA has a link to the presentation and a video of the talk.

Slides:

Direct link to the video webcast.

2010-04-19: NSF Workshop on Hybrid Neuro-Computer Vision Systems at Columbia University

Learning Deep Hierarchies of Visual Features

Talk given at Columbia University for the NSF Workshop on Hybrid Neuro-Computer Vision Systems. The audience was a mixture of neuroscientists, computer vision researchers, and hardware experts.

Slides:

Video from Columbia University site

2010-01-14: Lectures at the CIfAR/Microsoft Winter School, Bangalore, India

Series of lectures at the Microsoft/CIfAR Winter School on Machine Learning and Computer Vision.

Slides:

2009-07-20: Tutorial at the Vision-Learning Pattern Recognition Summer School, Beijing

Two lectures given at the 2009 Sino-USA Vision-Learning Pattern Recognition Summer School that took place at Peking University, Beijing.

  • [PDF (21.5MB)] [DjVu (7.6MB)] [ODP (15.2MB)] Deep Learning
  • [PDF (8.5MB)] [DjVu (4.2MB)] [ODP (15.0MB)] Other Methods and Applications of Deep Learning
  • [PDF (16.5MB)] [DjVu (8.5MB)] [ODP (12.0MB)] Learning Invariant Feature Hierarchies
  • [PDF (3.2MB)] [DjVu (0.9MB)] [ODP (41KB)] Future Challenges

2009-04-10: Interview at NYAS

Podcast of an interview with Yann LeCun about research at CBLL in vision, learning, robotics, and neuroscience. This is an 18-minute interview conducted by the New York Academy of Sciences: [MP3 from NYAS]

2009-03-02: Distinguished Lecture at NEC Labs, Princeton, NJ

Learning Hierarchies of Invariant Visual Features

(a slightly updated version of the UIUC talk).

Slides:

2009-03-02: Distinguished Lecture at University of Illinois, Urbana Champaign

Learning Hierarchies of Invariant Visual Features

Slides:

Video

Intelligent tasks, such as visual perception, auditory perception, and language understanding, require the construction of good internal representations of the world. Internal representations (or "features") must be invariant (or robust) to irrelevant variations of the input, but must preserve the information relevant to the task. An important goal of our research is to devise methods that can automatically learn good internal representations from labeled and unlabeled data. Results from theoretical analysis and experimental evidence from visual neuroscience suggest that the visual world is best represented by a multi-stage hierarchy, in which features in successive stages are increasingly global, invariant, and abstract. The main question is how one can train such deep architectures from unlabeled data and limited amounts of labeled data.

Several methods have recently been proposed to train deep architectures in an unsupervised fashion. Each layer of the deep architecture is composed of a feed-forward encoder which computes a feature vector from the input, and a feed-back decoder which reconstructs the input from the features. The training shapes an energy landscape with low valleys around the training samples and high plateaus everywhere else. A number of such layers can be stacked and trained sequentially. A particular class of methods for deep energy-based unsupervised learning will be described that imposes sparsity constraints on the features. When applied to natural image patches, the method produces hierarchies of filters similar to those found in the mammalian visual cortex. A simple modification of the sparsity criterion produces locally-invariant features with characteristics similar to those of hand-designed features such as SIFT.
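The sparsity-constrained energy described above can be made concrete with a tiny example: for a fixed decoder (dictionary) D, the sparse code z minimizes a reconstruction energy plus an L1 sparsity penalty. The sketch below uses ISTA, a standard iterative shrinkage-thresholding solver; the dictionary and signal are random stand-ins, not learned filters:

```python
import numpy as np

def ista_sparse_code(x, D, lam=0.1, steps=100):
    """Find a sparse code z minimizing the energy
       E(z) = 0.5 * ||x - D z||^2 + lam * ||z||_1
    by iterative shrinkage-thresholding (ISTA)."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(steps):
        grad = D.T @ (D @ z - x)           # gradient of the reconstruction term
        z = z - grad / L                   # gradient step
        z = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return z

rng = np.random.default_rng(0)
D = rng.standard_normal((16, 32))
D /= np.linalg.norm(D, axis=0)             # unit-norm dictionary atoms
z_true = np.zeros(32)
z_true[[3, 17]] = [1.5, -2.0]              # a 2-sparse ground-truth code
x = D @ z_true
z = ista_sparse_code(x, D)
print(np.count_nonzero(z))                 # only a few features are active
```

In the full method the dictionary (decoder) is itself learned, and a feed-forward encoder is trained to predict the sparse code directly, avoiding the iterative minimization at run time.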

An application to category-level object recognition with invariance to pose and illumination will be described. By stacking multiple stages of sparse features, and refining the whole system with supervised training, state-of-the-art accuracy can be achieved on standard datasets with very few labeled samples. Another application to vision-based navigation for off-road mobile robots will be shown. After a phase of off-line unsupervised learning, the system autonomously learns to discriminate obstacles from traversable areas at long range using labels produced with stereo vision for nearby areas.

This is joint work with Y-Lan Boureau, Karol Gregor, Raia Hadsell, Koray Kavakcuoglu, and Marc'Aurelio Ranzato.

2008-09-05: Machine Learning Summer School

Four lectures at the 2008 Machine Learning Summer School, Ile de Ré, France, September 5, 2008.

  • [PDF (16.2MB)] [DjVu (7.3MB)] [ODP (2.8MB)] Energy-Based Models
  • [PDF (14.9MB)] [DjVu (5.9MB)] [ODP (4.4MB)] Supervised Learning
  • [PDF (14.2MB)] [DjVu (6.2MB)] [ODP (4.6MB)] Manifold Learning
  • [PDF (20.9MB)] [DjVu (9.1MB)] [ODP (19.7MB)]: Deep Learning

2008-04-09: Talk at Google and Stanford on Deep Learning

Talk on Visual Perception with Deep Learning given by Yann at Google on April 9, 2008. A similar talk was given at the Stanford AI Lab on April 10.

Slides:

67 minute Video on YouTube

Link to the YouTube page for this video

Abstract: A long-term goal of Machine Learning research is to solve highly complex "intelligent" tasks, such as visual perception, auditory perception, and language understanding. To reach that goal, the ML community must solve two problems: the Deep Learning Problem, and the Partition Function Problem.

There is considerable theoretical and empirical evidence that complex tasks, such as invariant object recognition in vision, require "deep" architectures, composed of multiple layers of trainable non-linear modules.  The Deep Learning Problem is related to the difficulty of training such deep architectures.

Several methods have recently been proposed to train (or pre-train) deep architectures in an unsupervised fashion. Each layer of the deep architecture is composed of an encoder which computes a feature vector from the input, and a decoder which reconstructs the input from the features.  A large number of such layers can be stacked and trained sequentially, thereby learning a deep hierarchy of features with increasing levels of abstraction.  The training of each layer can be seen as shaping an energy landscape with low valleys around the training samples and high plateaus everywhere else. Forming these high plateaus constitutes the so-called Partition Function problem.
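The stack-and-train-sequentially recipe can be sketched with toy encoder/decoder layers: each layer is an autoencoder trained to reconstruct its input, and once trained, its codes become the input of the next layer. This is a minimal illustration with tied-weight linear-tanh layers and made-up sizes, not the actual pre-training procedure:

```python
import numpy as np

def train_autoencoder_layer(X, n_hidden, epochs=200, lr=0.01, seed=0):
    """Train one encoder/decoder layer to reconstruct its input (tied weights)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden)) * 0.1
    for _ in range(epochs):
        H = np.tanh(X @ W)                     # encoder: input -> features
        R = H @ W.T                            # decoder: features -> reconstruction
        err = R - X                            # reconstruction error
        dH = (err @ W) * (1 - H ** 2)          # backprop through tanh
        W -= lr * (X.T @ dH + err.T @ H) / len(X)
    return W

# Greedy stacking: each layer is trained on the codes of the layer below.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 20))             # 100 toy samples, 20 dimensions
codes, weights = X, []
for n_hidden in (16, 8):
    W = train_autoencoder_layer(codes, n_hidden)
    weights.append(W)
    codes = np.tanh(codes @ W)                 # feed codes to the next layer
print(codes.shape)  # (100, 8): top-level feature vectors
```

The sparse-coding methods described below replace the plain reconstruction objective with one that also keeps the energy high away from the data.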

A particular class of methods for deep energy-based unsupervised learning will be described that solves the Partition Function problem by imposing sparsity constraints on the features. The method can learn multiple levels of sparse and overcomplete representations of data. When applied to natural image patches, the method produces hierarchies of filters similar to those found in the mammalian visual cortex.

An application to category-level object recognition with invariance to pose and illumination will be described (with a live demo).  Another application to vision-based navigation for off-road mobile robots will be described (with videos). The system autonomously learns to discriminate obstacles from traversable areas at long range.

This is joint work with Y-Lan Boureau, Sumit Chopra, Raia Hadsell, Fu-Jie Huang, Koray Kavakcuoglu, and Marc'Aurelio Ranzato.

2007-12-08: Interview with Yann LeCun at Video Lectures


A 15-minute interview with Yann LeCun on machine learning research, lecturing styles, where NIPS is going, the philosophy of science, and various other topics.

2007-12-07: Who is Afraid of Convex Optimization?

Video and slides of a talk given at the 2007 NIPS workshop on Efficient Learning, in Vancouver, Canada, December 7, 2007.

I'll probably make a few friends with that one.

Click on the image at right to view the video of the talk (with lots of questions from the audience).


Who is Afraid of Non-Convex Loss Functions?

2007-12-06: Learning a Deep Hierarchy of Sparse Invariant Features

Slides and video of a talk given at the NIPS satellite session on deep learning in Vancouver, Canada, December 6, 2007. Video of the talk: Part 1 (85.0MB), Part 2 (84.3MB)

(Part 2 also contains Martin Szummer's talk)

Other talks at that satellite session are available at the meeting's main web site.

2007-09-24: Energy-Based Models in Document Recognition and Computer Vision

Slides of a keynote talk given at the 2007 International Conference on Document Analysis and Recognition (ICDAR), in Curitiba, Brazil, September 24, 2007.

PAPER:
Yann LeCun, Sumit Chopra, Marc'Aurelio Ranzato and Fu-Jie Huang: Energy-Based Models in Document Recognition and Computer Vision, Proc. International Conference on Document Analysis and Recognition (ICDAR), 2007, [key=lecun-icdar-keynote-07]. [DjVu (110KB)] [PDF (355KB)] [PS.GZ (551KB)]

Abstract: Over the last few years, the Machine Learning and Natural Language Processing communities have devoted a considerable amount of work to learning models whose outputs are "structured", such as sequences of characters and words in a human language. The methods of choice include Conditional Random Fields, Hidden Markov SVMs, and Maximum Margin Markov Networks. These models can be seen as un-normalized versions of discriminative Hidden Markov Models. It may come as a surprise to the ICDAR community that this class of models was originally developed in the handwriting recognition community in the mid-90's to discriminatively train handwriting recognition systems at the word level. The various approaches can be described in a unified manner through the concept of "Energy-Based Model" (EBM). EBMs capture dependencies between variables by associating a scalar energy to each configuration of the variables. Given a set of observed variables (e.g. an image), EBM inference consists in finding configurations of the unobserved variables (e.g. a recognized word or sentence) that minimize the energy. Training an EBM consists in designing a loss function whose minimization will shape the energy surface so that correct variable configurations have lower energies than incorrect ones. The main advantage of the EBM approach is that it circumvents one of the main difficulties associated with training probabilistic models: keeping them properly normalized, a potentially intractable problem with complex models. Energy-based learning has been applied with considerable success to such problems as handwriting recognition, natural language processing, biological sequence analysis, computer vision (object detection and recognition), image segmentation, image restoration, unsupervised feature learning, and dimensionality reduction.
Several specific applications will be described (and, for some, illustrated with real-time demonstrations), including: a check reading system; a real-time system for simultaneously detecting human faces in images and estimating their pose; an unsupervised method for learning invariant feature hierarchies; and a real-time system for detecting and recognizing generic object categories in images, such as airplanes, cars, animals, and people.
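The two EBM operations in the abstract above (inference as energy minimization, and training as energy shaping) can be sketched on a toy multiclass problem. All names and data here are illustrative: the energy is a simple linear function of the input, and the loss is a perceptron-style energy loss:

```python
import numpy as np

def energy(W, x):
    """E(x, y) for each label y: low energy means a good match (linear model)."""
    return -(W @ x)

def infer(W, x):
    """EBM inference: pick the label configuration with minimum energy."""
    return int(np.argmin(energy(W, x)))

def train_step(W, x, y, lr=0.1):
    """Shape the energy surface: push down E(x, y) for the correct label,
    pull up E(x, y*) for the lowest-energy incorrect label (perceptron loss)."""
    E = energy(W, x)
    wrong = np.argmin(np.where(np.arange(len(E)) == y, np.inf, E))
    if E[y] >= E[wrong]:          # correct answer does not have lowest energy
        W[y] += lr * x            # lower the energy of the correct label
        W[wrong] -= lr * x        # raise the energy of the offending label
    return W

rng = np.random.default_rng(0)
W = np.zeros((3, 4))
# Toy data: class k clusters around the k-th basis vector.
X = np.vstack([np.eye(4)[k % 3] + 0.1 * rng.standard_normal(4) for k in range(60)])
Y = [k % 3 for k in range(60)]
for _ in range(5):
    for x, y in zip(X, Y):
        W = train_step(W, x, y)
print(np.mean([infer(W, x) == y for x, y in zip(X, Y)]))  # training accuracy
```

No normalization constant is ever computed: the loss only compares the energies of competing configurations, which is exactly the advantage over probabilistic training noted above.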

2007-09-14: Tutorial at IPAM on Trainable Metrics

Slides and audio podcast of a tutorial given at the 2007 IPAM Workshop on the Mathematics of Knowledge and Search Engines, September 14, 2007.

  • 1. Learning Similarity Metrics [ DjVu(6.1MB)] [PDF (14.2MB)]
  • 2. Supervised and Unsupervised Methods for Learning Invariant Feature Hierarchies [DjVu(13.1MB)] [PDF (29.4MB)]
  • Audio Podcast: [MP3 from NYU (58MB)][MP3 from IPAM (58MB)]

2007-07-13: Supervised and Unsupervised Methods for Learning Invariant Feature Hierarchies

Slides of a tutorial given at the 2007 International Computer Vision Summer School, July 13, 2007.

2006-12-08: Learning Similarity Metrics with Invariance

Slides and video of an invited talk given at the 2006 NIPS workshop "Learning to Compare Samples", December 8, 2006.

2006-12-04: NIPS Tutorial: Energy-Based Models, Structured Learning Beyond Likelihoods

Slides and Video of a 2-hour Tutorial at the NIPS conference in Vancouver, Dec 04, 2006.

The video of the tutorial is available from the NIPS website in .mov (QuickTime) format at several resolutions:

2006-09-07: Keynote talk at the IEEE Machine Learning for Signal Processing Workshop, Maynooth, Ireland

Slides of a keynote speech at the IEEE Machine Learning for Signal Processing Workshop, Maynooth, Ireland.

2006-08-16: Tutorial on Energy-Based Models, CIAR Summer School, Toronto

Slides of a 3-hour tutorial given by Yann LeCun at the 2006 CIAR Summer School: Neural Computation & Adaptive Perception, at the University of Toronto. This is a new version of the tutorial based on the paper A Tutorial on Energy-Based Learning, considerably reworked from the 2005 IPAM version.

2006-06-09: Keynote Speech at the Canadian Robotics and Vision Conference, Quebec

Supervised and Unsupervised Learning with Energy-Based Models: [Slides in DjVu (6.9MB)] [Slides in PDF (20.6MB)]

This is a good overview of research at CBLL: energy-based learning, object recognition, face detection, unsupervised feature learning, and robot vision and navigation.

2005-07-19: Tutorial on Energy-Based Models, Invariant Recognition, Trainable Metrics, and Graph Transformer Networks, IPAM Summer School, UCLA

Slides and Videos of a 4-hour tutorial given by Yann LeCun at the 2005 IPAM Graduate Summer School: Intelligent Extraction of Information from Graphs and High Dimensional Data, at IPAM/UCLA. Streaming videos of all the talks are available from the IPAM web site in RealVideo format.

The tutorial includes 4 talks:

2005-05-23: Tutorial on Energy-Based Models, Invariant Recognition, and Graph Transformer Networks, TTI Summer School, Chicago

Slides and videos of a 4-hour tutorial given by Yann LeCun at the Learning Theory Summer School organized by the Toyota Technological Institute in Chicago. The tutorial includes 3 talks. The 3 videos are available from VideoLectures.net.
  • Energy-Based Models: [DjVu (5.5MB)] [PDF (8.2MB)]
  • Invariant Recognition, Convolutional Networks: [DjVu (5.2MB)] [PDF (13.2MB)]
  • Graph Transformer Networks: [DjVu (1.1MB)] [PDF (2.1MB)]

VideoLectures.net also has videos of two "lunch-time debates" (or panel discussions) in which Yann was a participant, together with Rob Schapire, David McAllester, Yasemin Altun, Mikhail Belkin, Yoram Singer, and John Langford.

2005-03-23: Invariant Recognition of Generic Object Categories with Energy-Based Models, MSRI Workshop on Object Recognition

A 1-hour talk on invariant recognition of generic object categories and on energy-based models, given at the Workshop on Object Recognition at the Mathematical Sciences Research Institute in Berkeley.

2001-10-22: DjVu a compression technique and software platform for distributing scanned documents, digital documents, and high-resolution images on the Web, UIUC Distinguished Lecture

A 1-hour talk (with a video) of a Distinguished Lecture given on October 22, 2001 at the University of Illinois, Urbana-Champaign.
