Factor Graphs for Relational Regression

Candidate: Sumit Chopra

Advisor: Yann LeCun

Inherent in many interesting regression problems is a
rich underlying inter-sample "Relational Structure".
In these problems, the samples may be related to each
other in ways such that the unknown variables associated
with any sample depend not only on its individual
attributes but also on the variables associated
with related samples. One such problem, whose importance
is further emphasized by the present economic crisis, is
understanding real estate prices. The price of a house
clearly depends on its individual attributes, such as
the number of bedrooms. However, the price also depends on
the neighborhood in which the house lies and on the time
period in which it was sold. This effect of neighborhood
and time on the price is not directly measurable. It is
merely reflected in the prices of other houses in the
vicinity that were sold around the same time period.
Uncovering these spatio-temporal dependencies can help
explain house prices while also improving prediction
accuracy.

Problems of this nature fall in the domain of "Statistical
Relational Learning". However, most models proposed so far
cater only to classification problems. To address this gap,
we propose "relational factor graph" models for regression
on relational data. A single
factor graph captures both the dependencies among the
individual variables of a sample and the dependencies among
variables associated with multiple samples. The proposed
models capture hidden inter-sample dependencies via latent
variables and also permit log-likelihood functions that are
non-linear in parameter space, thereby allowing
considerably more complex architectures. Efficient inference
and learning algorithms for relational factor graphs are proposed.
The models are applied to predicting the prices of real estate
properties and to constructing house price indices. The
relational aspect of the model accounts for the hidden
spatio-temporal influences on the price of every house.
Experiments show that one can achieve considerably superior
performance by identifying and using the underlying
spatio-temporal structure associated with the problem. To the
best of our knowledge, this is the first work in the direction
of relational regression and also the first to construct
house price indices that simultaneously account for
spatio-temporal effects on house prices using a large-scale,
industry-standard data set.
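
The intuition behind relational regression can be sketched on
synthetic data. The code below is only an illustration under
stated assumptions, not the relational factor graph model of
the thesis: it assumes a one-dimensional location, a single
attribute (number of bedrooms), a hidden "desirability" that
varies smoothly with location, and a simple alternating fit in
which a linear model explains the attributes and a Gaussian
kernel average of neighbors' residuals stands in for the latent
variable. All names (kernel_smooth, fit_relational, the
bandwidth value) are hypothetical.

```python
import numpy as np

def kernel_smooth(query_loc, ref_loc, values, bandwidth=0.05, exclude_self=False):
    """Gaussian-kernel weighted average of `values` at each query location."""
    dist = np.abs(query_loc[:, None] - ref_loc[None, :])
    w = np.exp(-0.5 * (dist / bandwidth) ** 2)
    if exclude_self:
        # A house should not vote for its own latent value.
        np.fill_diagonal(w, 0.0)
    return (w @ values) / (w.sum(axis=1) + 1e-12)

def design(bedrooms):
    """Design matrix: one attribute column plus a bias column."""
    return np.column_stack([bedrooms, np.ones(len(bedrooms))])

def fit_relational(bedrooms, loc, log_price, n_iters=10):
    """Alternate between fitting attribute weights and the latent term."""
    X = design(bedrooms)
    latent = np.zeros(len(loc))
    for _ in range(n_iters):
        # Step 1: explain what the latent term does not, using attributes.
        weights, *_ = np.linalg.lstsq(X, log_price - latent, rcond=None)
        # Step 2: re-estimate the latent term from neighbors' residuals.
        residual = log_price - X @ weights
        latent = kernel_smooth(loc, loc, residual, exclude_self=True)
    return weights, residual

# Synthetic data (assumed, for illustration only).
rng = np.random.default_rng(0)
n = 400
loc = rng.uniform(0.0, 1.0, n)
bedrooms = rng.integers(1, 6, n).astype(float)
desirability = 0.5 * np.sin(2 * np.pi * loc)  # hidden, never observed
log_price = 11.0 + 0.2 * bedrooms + desirability + rng.normal(0.0, 0.05, n)
train, test = np.arange(300), np.arange(300, n)

# Baseline: attributes only, every house treated independently.
w_base, *_ = np.linalg.lstsq(design(bedrooms[train]), log_price[train], rcond=None)
baseline_mse = np.mean((design(bedrooms[test]) @ w_base - log_price[test]) ** 2)

# Relational: attributes plus a latent term inferred from nearby sales.
weights, residual = fit_relational(bedrooms[train], loc[train], log_price[train])
latent_test = kernel_smooth(loc[test], loc[train], residual)
relational_pred = design(bedrooms[test]) @ weights + latent_test
relational_mse = np.mean((relational_pred - log_price[test]) ** 2)

print("baseline MSE:", baseline_mse)
print("relational MSE:", relational_mse)
```

In this toy setting the neighborhood effect is never given to
either model; the second model recovers it only through the
prices of nearby houses, which is the kind of dependency the
abstract describes.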