Research Projects, Past and Present

Object Recognition and Detection with Deep Learning

Sparse Coding for Feature Learning

  • Time Period: 2008 - present.
  • Participants: Karol Gregor, Koray Kavukcuoglu, Arthur Szlam, Rob Fergus, Yann LeCun (Courant Institute/CBLL).
  • Sponsors: ONR, NSF.
  • Description: We are developing unsupervised (and semi-supervised) learning algorithms to learn visual features. The methods are based on the idea that a good feature representation should be high-dimensional (so as to facilitate the separability of the categories), should contain enough information to reconstruct the input near regions of high data density, and should not contain enough information to reconstruct inputs in regions of low data density. This led us to use sparsity criteria on the feature vectors. Many of the sparse coding methods we have developed include a feed-forward predictor (a so-called encoder) that can quickly produce an approximation of the optimal sparse representation of the input. This allows us to use the learned feature extractor in real-time object recognition systems. We have proposed several variants of sparse coding, including one that uses group sparsity to produce locally invariant features, two methods that separate the "what" from the "where" using temporal constancy criteria, and two methods for convolutional sparse coding, where the dictionary elements are convolution kernels applied to images.
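
As a rough illustration of the sparse coding objective described above, the sketch below computes a sparse code for a tiny input with iterative shrinkage-thresholding (ISTA). The choice of ISTA, the toy dictionary, the penalty weight `lam`, the step size, and all function names are assumptions made for this example; they are not taken from the project's own code. The feed-forward encoder mentioned above would be trained to approximate the output of a slow iterative procedure of this kind.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of the L1 penalty)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista_sparse_code(x, dictionary, lam=0.1, step=0.4, n_iter=500):
    """Approximately minimize 0.5*||x - D z||^2 + lam*||z||_1 over z.

    `dictionary` is a list of atoms (the columns of D), each as long as x.
    Assumption: `step` does not exceed 1 / (largest eigenvalue of D^T D);
    0.4 is safe for the toy dictionary below, whose largest eigenvalue is 2.
    """
    n_atoms = len(dictionary)
    z = [0.0] * n_atoms
    for _ in range(n_iter):
        # Current reconstruction D z and residual x - D z.
        recon = [sum(z[k] * dictionary[k][i] for k in range(n_atoms))
                 for i in range(len(x))]
        resid = [xi - ri for xi, ri in zip(x, recon)]
        # Gradient step on the reconstruction term, then shrinkage.
        z = [soft_threshold(
                 z[k] + step * sum(a * r for a, r in zip(dictionary[k], resid)),
                 step * lam)
             for k in range(n_atoms)]
    return z


# Toy overcomplete dictionary: two axis-aligned atoms plus one diagonal atom.
atoms = [[1.0, 0.0], [0.0, 1.0], [2 ** -0.5, 2 ** -0.5]]
code = ista_sparse_code([1.0, 1.0], atoms)
print(code)  # the diagonal atom should carry essentially all of the weight
```

With an overcomplete dictionary, the L1 penalty favors explaining the input with the single diagonal atom rather than with the two axis-aligned atoms, which is the kind of sparse, high-dimensional code the description refers to.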

Action Recognition in Video with Convolutional Networks

  • Time Period: 2009 - present.
  • Participants: Graham Taylor, Chris Bregler, Rob Fergus, Yann LeCun (Courant Institute/CBLL).
  • Sponsors: DARPA, ONR.
  • Description: A trainable system was built to recognize actions in videos. The first layer is a Convolutional Gated Restricted Boltzmann Machine (CGRBM), which is trained in an unsupervised manner. It automatically learns features that primarily encode motion. The second layer uses sparse coding to learn mid-level features in an unsupervised manner. The feature vectors thereby obtained are pooled over time, using a max-pooling operation, and fed to a Support Vector Machine. Excellent performance was obtained on the Hollywood-2 dataset. A similar system was built to recognize actions on the KTH dataset. It also uses a CGRBM at the first layer, but uses a 3D (spatio-temporal) convolutional network architecture for the following layers.
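
The temporal pooling step in the pipeline above can be sketched as follows. This is a minimal illustration only: the function name and the toy feature values are hypothetical, and the actual system pools learned mid-level features rather than hand-written numbers.

```python
def max_pool_over_time(frame_features):
    """Collapse a sequence of per-frame feature vectors into one vector.

    For each feature dimension, keep the maximum value seen over all frames,
    so the result has a fixed length regardless of the clip duration.
    """
    if not frame_features:
        raise ValueError("need at least one frame")
    return [max(column) for column in zip(*frame_features)]


# Three frames, each described by a two-dimensional feature vector.
clip = [[0.1, 0.9], [0.5, 0.2], [0.3, 0.4]]
pooled = max_pool_over_time(clip)
print(pooled)
```

A fixed-length vector of this kind is what a classifier such as a Support Vector Machine would receive, whatever the number of frames in the clip.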

Embedded Hardware for Real-Time Vision

  • Time Period: 2008 - present.
  • Participants: Clement Farabet, Yann LeCun (Courant Institute/CBLL), Eugenio Culurciello (Yale EE).
  • Sponsors: ONR, US Army/STTR, NYU/Poly.
  • Description: Our aim is to implement convolutional networks and other vision algorithms on small, low-power specialized hardware. As of September 2010, we have an operational implementation of convolutional networks running on a development board built around a Xilinx Virtex-6 FPGA. We also have an implementation running on a custom 8x7cm board built around a Xilinx Virtex-4 FPGA. The system uses a new architectural concept for high-throughput processing of streams of data, borrowing ideas from "dataflow" architecture concepts. The design is called "NeuFlow".

Unsupervised Deep Learning

  • Time Period: September 2004 - present.
  • Participants: Marc'Aurelio Ranzato, Koray Kavukcuoglu, Y-Lan Boureau, Yann LeCun (Courant Institute/CBLL).
  • Sponsors: DARPA, ONR, NSF.
  • Description: Animals and humans can learn to see, perceive, act, and communicate with an efficiency that no Machine Learning method can approach. The brains of humans and animals are "deep", in the sense that each action is the result of a long chain of synaptic communications (many layers of processing). We are researching efficient learning algorithms for such "deep architectures", concentrating for now on unsupervised learning algorithms that can be used to produce deep hierarchies of features for visual recognition. We surmise that understanding deep learning will not only enable us to build more intelligent machines, but will also help us understand human intelligence and the mechanisms of human learning.

EBLearn: A C++ library for deep/energy-based learning and convolutional nets

  • Time Period: 2008 - present.
  • Participants: Pierre Sermanet, Koray Kavukcuoglu, Yann LeCun (Courant Institute/CBLL).
  • Sponsors: Willow Garage, ONR.
  • Description: EBLearn is a C++ library that implements a number of classes and functions for machine learning and computer vision. EBLearn makes it easy to construct complex learning machines by assembling functional blocks. A semi-automatic differentiation mechanism enables easy gradient-based learning. The library implements such models as convolutional networks, and makes it possible to easily train systems that can detect objects in images (e.g. faces, pedestrians). The library is platform-independent, and can be compiled on Linux, Windows, Android, and other platforms. It contains a class hierarchy and several functions for manipulating multi-dimensional tensors. The API is inspired by the machine learning library in the Lush language.

Dynamic Factor Graphs for Sequential Data Analysis and Prediction

  • Time Period: September 2006 - December 2010.
  • Participants: Piotr Mirowski, Yann LeCun (Courant Institute/CBLL).
  • Sponsors: ONR.
  • Description: A Dynamic Factor Graph models sequential data such as financial time-series, biological (DNA/protein) sequences, text, and other modalities. A DFG includes factors modeling joint probabilities between hidden and observed variables, and factors modeling dynamical constraints on hidden variables. The DFG assigns a scalar energy to each configuration of hidden and observed variables. A gradient-based inference procedure finds the minimum-energy state sequence for a given observation sequence. Using smoothing regularizers, DFGs have been shown to reconstruct chaotic attractors and to separate a mixture of independent oscillatory sources perfectly. Our DFGs outperformed the best known algorithm on the CATS competition benchmark for time series prediction. DFGs also successfully reconstruct missing motion capture data.
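
To make the energy-minimization view above concrete, here is a minimal sketch of gradient-based inference in a toy factor graph. Everything specific in it is an assumption for illustration: scalar hidden states, a quadratic observation factor, a linear dynamical factor with coefficient `a`, the weight `alpha`, and the function names. The DFGs described above use learned factors and are not limited to this form.

```python
def dfg_energy(z, y, a=0.9, alpha=1.0):
    """Scalar energy of hidden states z given observations y.

    Observation factors penalize (y_t - z_t)^2; dynamical factors penalize
    (z_t - a * z_{t-1})^2, weighted by alpha.
    """
    obs = sum((yt - zt) ** 2 for yt, zt in zip(y, z))
    dyn = sum((z[t] - a * z[t - 1]) ** 2 for t in range(1, len(z)))
    return obs + alpha * dyn


def infer_states(y, a=0.9, alpha=1.0, lr=0.05, n_steps=500):
    """Find a low-energy hidden sequence by plain gradient descent on z."""
    z = list(y)  # start from the observations themselves
    n = len(z)
    for _ in range(n_steps):
        # Gradient of the observation factors.
        grad = [2.0 * (z[t] - y[t]) for t in range(n)]
        # Each dynamical factor contributes to two neighboring states.
        for t in range(1, n):
            d = 2.0 * alpha * (z[t] - a * z[t - 1])
            grad[t] += d
            grad[t - 1] -= a * d
        z = [zt - lr * g for zt, g in zip(z, grad)]
    return z


# A noisy, oscillating observation sequence (made-up values).
observations = [1.0, -1.0, 1.0, -1.0, 1.0]
states = infer_states(observations)
print(dfg_energy(observations, observations), dfg_energy(states, observations))
```

The inferred sequence trades fidelity to the observations against agreement with the dynamics, so its energy is lower than that of the raw observations.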

Relational Regression

  • Time Period: September 2006 - present.
  • Participants: Sumit Chopra, Yann LeCun (Courant Institute/CBLL), Trivikraman Thampy, John Leahy, Andrew Caplin (Economics Dept, NYU).
  • Sponsors: NYU.
  • Description: We are developing a new type of relational graphical model that can be applied to "structured regression problems". A prime example of a structured regression problem is the prediction of house prices. The price of a house depends not only on the characteristics of the house, but also on the prices of similar houses in the neighborhood, or perhaps on hidden features of the neighborhood that influence them. Our relational regression model infers a hidden "desirability surface" from which house prices are predicted.

Energy-Based Models

  • Time Period: September 2003 - present.
  • Participants: Yann LeCun, Sumit Chopra, Marc'Aurelio Ranzato, Y-Lan Boureau, Fu-Jie Huang (Courant Institute/CBLL).
  • Sponsors: NSF.
  • Description: Probabilistic graphical models associate a probability to each configuration of the relevant variables. Energy-based models (EBMs) associate an energy to those configurations, eliminating the need for proper normalization of probability distributions. Making a decision (an inference) with an EBM consists in comparing the energies associated with various configurations of the variable to be predicted, and choosing the one with the smallest energy. Such systems must be trained discriminatively to associate low energies to the desired configurations and higher energies to undesired configurations. A wide variety of loss functions can be used for this purpose. We give sufficient conditions that a loss function should satisfy so that its minimization will cause the system to approach the desired behavior.
  • Latest Publication:
    • [LeCun et al 2006]. A Tutorial on Energy-Based Learning, in Bakir et al. (eds) "Predicting Structured Outputs", MIT Press 2006.
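
The inference rule and the role of the loss described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the toy quadratic energy, the candidate set, the margin value, and the function names are all hypothetical, and the margin (hinge) loss shown is just one member of the family of losses the description refers to.

```python
def infer(energy, x, candidates):
    """Return the candidate answer y with the lowest energy E(x, y)."""
    return min(candidates, key=lambda y: energy(x, y))


def hinge_loss(energy, x, y_true, candidates, margin=1.0):
    """Margin loss on energies.

    Zero when the lowest-energy wrong answer sits at least `margin` above
    the energy of the correct answer, positive otherwise.
    """
    wrong = [y for y in candidates if y != y_true]
    lowest_wrong = min(energy(x, y) for y in wrong)
    return max(0.0, margin + energy(x, y_true) - lowest_wrong)


def toy_energy(x, y):
    """Hypothetical energy: low when the answer y is close to the input x."""
    return (x - y) ** 2


answers = [0, 1, 2, 3]
print(infer(toy_energy, 2.2, answers))
print(hinge_loss(toy_energy, 2.2, 2, answers))
```

No normalization over the candidates is needed at any point: inference only compares energies, and the loss only pushes the energy of the correct answer below that of its closest competitor.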

LAGR: Learning Applied to Ground Robots

  • Team Leaders: Yann LeCun (Courant Institute/CBLL), Urs Muller (Net-Scale Technologies).
  • Team Members:
  • Former Team Members:
  • Sponsor: DARPA
  • Description: Our team is one of 8 participants in the LAGR project funded by the US Government. Each LAGR team receives an identical copy of the LAGR robot, built by the National Robotics Engineering Consortium at Carnegie Mellon University.

    The purpose of the project is to design vision and learning algorithms that allow the robot to navigate in complex outdoor environments.

    Epilepsy Seizure Prediction with Convolutional Networks

    • Time Period: 2006 - 2009.
    • Participants: Piotr Mirowski, Yann LeCun (Courant Institute/CBLL).
    • Sponsors: NYU Medical Center
    • Description: A collaboration between CBLL, Dr. Deepak Madhavan (University of Nebraska Medical Center), and Dr. Ruben (NYU Comprehensive Epilepsy Center). In 2009, our methodology achieved the best known result on the 21-patient EEG public dataset from the University of Freiburg. Our best algorithm succeeded in predicting all test seizures with no false positives on 15 patients out of 21. The method is built around a temporal convolutional network applied to wavelet-based phase-locking synchrony features.

    DAVE: Learning Vision-Based Obstacle Avoidance for Mobile Robots

    • Time Period: September 2003 - June 2004.
    • Participants: Yann LeCun (Courant Institute/CBLL), Eric Cosatto, Jan Ben, Urs Muller, Beat Flepp (Net-Scale Technologies).
    • Sponsor: DARPA.
    • Description: We built a small off-road robot that uses an end-to-end learning system to avoid obstacles solely from visual input. The robot transmits analog video from two video cameras to a remote computer, which drives the robot through remote control. The driving software is built around a convolutional network which is trained to produce the steering angle of a human driver from raw video frames.

      This project served as a proof-of-concept for the LAGR project.


    NORB: Generic Object Recognition in Images

    • Time Period: September 2003 - 2009.
    • Participants: Fu Jie Huang, Yann LeCun (Courant Institute/CBLL), Leon Bottou (NEC Labs).
    • Sponsors: NSF.
    • Description: The recognition of generic object categories with invariance to pose, lighting, diverse backgrounds, and the presence of clutter is one of the major challenges of Computer Vision.

      We are developing learning systems that can recognize generic objects purely from their shape, independently of pose, illumination, and surrounding clutter.

    • Publications:
      • [LeCun, Huang, Bottou, 2004]. Learning Methods for Generic Object Recognition with Invariance to Pose and Lighting, Proceedings of CVPR 2004.
    • Dataset: Download the NORB dataset.

    Automatic Phenotyping

    We are using convolutional networks and conditional random fields to automatically segment movies of developing C. elegans into regions: cell nucleus, nucleus membrane, cytoplasm, cell membrane.

    Simultaneous Face Detection and Pose Estimation