Below you'll find the details of my past and present projects.

Factor Graphs for Relational Regression

Mentors: Prof. Yann LeCun, Prof. Andrew Caplin, and Prof. John Leahy
Collaborator: Trivikraman Thampy

Factor Graphs for Relational Regression

Technical Report: TR2007-906, January 2007

Traditionally, most supervised learning algorithms assume that the data is independently and identically sampled from a fixed underlying distribution. These methods process each sample independently of others while making a decision. However there are a number of real world problems in which the data possess an inherent relational structure. The value of the unknown variables (those that are to be predicted) associated with a sample not only depends on its individual set of features, but it also depends on the features and the values of unknown variables associated with other samples that are related to it. Furthermore, these dependencies need not be direct. The samples could influence each other via a set of hidden (latent) variables.


Thus while making a prediction on a particular sample, one can gain significant performance improvement if these interaction between samples are exploited. Moreover, in many situations the latent variables associated with the problem have some underlying structure associated with them. This structure can also be used while handling the latent variables during the process of inference and training.

We are working towards designing a framework for doing regression in such relational settings, where the variables (including the latent variables) take on continuous values. In particular we propose a relational factor graph architecture that allows for efficient inference and training in relational regression problems. In addition to being capable of handling continuous variables, the proposed framework also allows for the use of log-likelihood functions that are non-linear in the parameters (that is, the non-exponential family of functions), and also eliminates the intractable partition function problem through the use of clever design of the factors.

Relational Factor Graphs for the Problem of House Price Prediction

Mentors: Prof. Yann LeCun, Prof. Andrew Caplin, and Prof. John Leahy
Collaborator: Trivikraman Thampy

Discovering the Hidden Structure of House Prices with a Non-Parametric Latent Manifold Model

13th International Conference on Knowledge Discovery and Data Mining (KDD), San Jose CA, August 2007

The problem of real estate price prediction is one such problem that falls in the domain of relational regression. The price of a house obviously depends on its individual characteristics: a 3 bedroom house will generally be more expensive than a 1 bedroom house. However, in addition to this dependence, the price also depends on the "desirability" of the neighborhood in which it lies: given two houses of similar characteristics, the one that is located in an up-market locality will be more expensive than the one which is located in a poor locality. Some of the neighborhood features that contribute to its "desirability" are measurable, such as the quality of the local schools, median house hold income etc. However a number of features that make one particular neighborhood highly "desirable" than others, cannot be measured directly and are merely reflected in the prices of other houses in the vicinity.

We apply the above relational factor graph framework to this problem. The idea is to use a single factor graph for the entire collection of samples (houses). Each sample is associated with two factors. The first factor is parametric in nature that captures the dependencies between the individual sample characteristics and its price. The second factor is non-parametric and captures the dependencies between the samples via latent variables. These latent variables can be interpreted as the "desirability" of the location of sample (a house) which needs to be learnt. We propose an efficient inference and training algorithm to train this huge factor graph.


The plot shows the learnt "desirability" map by our algorithm for the city of Los Angeles. Each colored point correspond to a test sample (a house), and is colored according to the value of the "desirability" of its location. Red color means high desirability and blue color means low. It is interesting to see that the algorithm is doing something very meaningful here. Areas like Pasadena, along the coast line (in particular around Santa Monica), parts of Hollywood hills among others are labeled as desirable. Whereas, areas in the vicinity of the downtown, in particular the eastern and south eastern region are classified as less desirable.

Conditions on Loss Functions for Training Energy-Based Models

Mentors: Prof. Yann LeCun
Collaborator: Raia Hadsell, Marc'Aurelio Ranzato, Jie Huangfu

A Tutorial on Energy-Based Learning

Predicting Structured Outputs, Bakir et al. (eds), MIT Press 2006.

Probabilistic models associate a probability to each configuration of relevant variables. In contrast, Energy Based Models (EBMs) associate an energy to each configuration of the relevant variables, without the need for proper normalization of these energies. Inference in an EBM consists of choosing the configuration with the lowest energy. Training of such a system must be done in a way so that it assigns low energies to configurations of variables consistent with the training samples, and high energies to others. This task can be accomplished by minimizing a suitably designed loss functional, which is a function of energy functions of all the training samples.
A wide variety of loss functionals can be used for this purpose. However not all the loss functionals train all the architectures successfully. One important question is, what properties should a loss function possess so that its minimization will lead the system to achieve the
behavior mentioned above? And how to design it?


We are working on studying the properties of various loss functionals used in energy based models and give sufficient conditions that any loss functional should satisfy, such that its minimization will successfully train the machine, in supervised learning framework. In addition we are working in the direction of designing an alternate EM type (deterministic EM) learning algorithm, for energy based systems involving latent variables that will work for general forms of loss functions.

Learning a Similarity Measure Discriminatively Using a Siamese Network

Mentor: Prof. Yann LeCun
Collaborator: Raia Hadsell

Learning a Similarity Measure Discriminatively with Applications to Face Verification

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Diego CA, June 2005.

For the problem of classification, traditional discriminative approaches, such as neural networks or svm's, require that all categories be known in advance. In addition these approaches also require that the training samples be available for all categories and they only work for small number of classes. A common approach to such kind of problems are distance based methods. This involves computing a similarity metric between the pattern to be classified/verified an a library of stored prototypes.


We designed a new scheme for discriminatively learning a similarity metric from data. The idea is to learn a function that maps input patterns to a target space such that the L1 norm in the target space approximates the semantic distance in the input space. The method was realized using a network architecture called a Siamese Network. This method can be used in recognition/verification tasks where the number of categories is large and not known in advance and where the number of training samples per category is small. The method was applied to the problem of face verification using a number of standard datasets, which included the AT&T dataset, FERET dataset and AR-Purdue dataset. Other than this we also applied this scheme for invariant object recognition and tested it on general categories (using Caltech 101 dataset), handwritten digits (using MNIST), and cluttered 3d objects (NORB).


A Siamese Network is composed of a number of components. This includes two identical copies of a parameterized function sharing the same parameters and a cost module. A loss function is plugged on top of it. The whole architecture takes a pair of objects (images for example) as input, which is passed to the two copies of the function. The cost module on top generates an energy between the outputs of the two functions. The loss function then combines the energy with the desired output label (which is also supplied to the architecture during training) to produce a scalar loss. The parameters of the functions can be updated using stochastic gradient.

Dimenionality Reduction by Learning an Invariant Mapping

Mentor: Prof. Yann LeCun
Collaborator: Raia Hadsell

Dimensionality Reduction by Learning an Invariant Mapping

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New York City, NY, June 2006

Massive high dimensional datasets generated by a number of disciplines demand new techniques for its analysis, feature extraction, dimensionality reduction. Dimensionality reduction involves mapping a set of high dimensional input points onto a low dimensional manifold so that "similar" points in the input space are mapped to nearby points in the output manifold. There exists a number of standard techniques for solving this problem, like Principal Component Analysis (PCA), Multi-Dimensional Scaling (MDS), Local Linear Embedding (LLE), ISOMAP, Laplacian Eigen Maps, Hessian LLE and others. However most of these techniques suffer from two major drawbacks. One, they depend on a meaningful and computable distance measure in the input space. Two, none of the above techniques compute a "function" that could faithfully map a new input sample whose relationship with the training data is not known.


We proposed a new scheme for dimensionality reduction - called Dimensionality Reduction by Learning an Invariant Mapping (DRLIM) - that addresses the above two issues. It learns a globally coherent non-linear function that maps the data evenly on the output manifold. The learning relies solely on the neighbourhood relationships and does not require any computable distance measure in the input space. These neighbourhood relationships can come either as prior knowledge or manual labeling. The method can learn mappings that are invariant to a number of transformations of the input.


The top picture shows the output when DRLIM is applied to two digits from the MNIST dataset that were artificially distorted to have geometric distortions (in particular translations). Note how a pair of digits which are identical in shape but differ only in translation lie close to each other. The bottom picture shows the output when DRLIM is applied to images of a single plane from the NORB dataset. Neighbourhoods were defined independent of the lighting conditions. As expected the planes are neatly arranged according to the neighbourhood relationships parameter (which were azimuth and elevation angles), and the arrangement is independent of lighting.

Learning Applied to Ground Robots

Mentor: Prof. Yann LeCun
Collaborator: Raia Hadsell, Matthew Koichi Grimes, Ayse Naz, Marc'Aurelio Ranzato

On-line Learning for Offroad Robots: Using Spatial Label Propagation to Learn Long-Range Traversability

R. Hadsell, P. Sermanet, J. Ben, A. Erkan, J. Han, M. Grimes, S. Chopra, Y. Sulsky, B. Flepp, U. Muller, and Y. LeCun
Technical Report, CBLL-TR-2007-01-01, January 2007

NYU team - headed by Prof. Yann LeCun in collaboration with Net-Scale Technologies - was selected for the LAGR (Learning Applied to Ground Robots) projects funded by DARPA. The goal is to design a robust autonomous navigational system given fixed hardware. The robot is provided by DARPA, and the challenge lies in designing the navigational system complying with the limited capabilities of the hardware.


I was partly involved in designing the long range obstacle detection system. However due to political and competitive reasons I cannot disclose further details of the system, until it is actually tested and deployed on the field!!! Other than that, I also designed a short range obstacle detection system using Homography transformation.

Past Projects

In order to browse through my part research projects, on which I worked on during my under-graduation, click here.