Time Series Modeling with Hidden Variables and Gradient-Based Algorithms

Candidate: Piotr Mirowski

Advisor: Yann LeCun

We collect time series from real-world phenomena, such as gene interactions in biology or word frequencies in consecutive news articles.
However, these data present us with an incomplete picture, as they result
from complex dynamical processes involving unobserved state variables.
Research on state-space models is motivated by the dual goals of
inferring hidden state variables from observations and learning the
associated dynamical and generative models.

I have developed a tractable, gradient-based method for training Dynamic
Factor Graphs (DFG) with continuous latent variables. A DFG consists of
(potentially nonlinear) factors modeling joint probabilities between
hidden and observed variables. The DFG assigns a scalar energy to each
configuration of variables, and a gradient-based inference procedure
finds the minimum-energy state sequence for a given observation sequence. We
approximate maximum likelihood learning by minimizing the expected energy
over training sequences with respect to the factors' parameters. These
alternating inference and parameter updates constitute a deterministic
EM-like procedure.
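The alternating procedure above can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes linear dynamic and observation factors (matrices `A` and `C`) with quadratic energies in place of the nonlinear factors described later, and hand-derived gradients in place of a general-purpose back-propagation framework. The function name `train_dfg` and all hyperparameters are illustrative choices.

```python
import numpy as np

def train_dfg(x, dim_z=2, n_epochs=200, lr_z=0.02, lr_w=0.002, seed=0):
    """Toy DFG training loop with linear factors.

    Total energy of a configuration (z, A, C):
        E = sum_t ||x_t - C z_t||^2        (observation factor)
          + sum_t ||z_{t+1} - A z_t||^2    (dynamic factor)
    Each epoch alternates a gradient step on the latent states z
    (inference, E-like step) with a gradient step on the factor
    parameters A and C (learning, M-like step).
    """
    rng = np.random.default_rng(seed)
    T, dim_x = x.shape
    A = rng.normal(scale=0.1, size=(dim_z, dim_z))  # dynamic factor
    C = rng.normal(scale=0.1, size=(dim_x, dim_z))  # observation factor
    z = rng.normal(scale=0.1, size=(T, dim_z))      # latent state sequence

    def energy():
        e_obs = np.sum((x - z @ C.T) ** 2)
        e_dyn = np.sum((z[1:] - z[:-1] @ A.T) ** 2)
        return e_obs + e_dyn

    energies = [energy()]
    for _ in range(n_epochs):
        # Inference: gradient descent on z at fixed parameters.
        r_obs = z @ C.T - x            # observation residuals, (T, dim_x)
        r_dyn = z[1:] - z[:-1] @ A.T   # dynamic residuals, (T-1, dim_z)
        g = 2 * r_obs @ C              # dE/dz from the observation factor
        g[1:] += 2 * r_dyn             # dE/dz_{t+1} from the dynamic factor
        g[:-1] -= 2 * r_dyn @ A        # dE/dz_t from the dynamic factor
        z -= lr_z * g
        # Learning: gradient descent on A and C at fixed z.
        r_obs = z @ C.T - x
        r_dyn = z[1:] - z[:-1] @ A.T
        C -= lr_w * 2 * r_obs.T @ z
        A += lr_w * 2 * r_dyn.T @ z[:-1]
        energies.append(energy())
    return z, A, C, energies
```

In this quadratic special case each sub-step is a convex problem, so the alternation decreases the total energy much like coordinate descent; the thesis generalizes the same loop to nonlinear factors trained by back-propagation.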

Using nonlinear factors such as deep, convolutional networks, DFGs were
shown to reconstruct chaotic attractors, to outperform a time series
prediction benchmark, and to successfully impute motion capture data
where a large number of markers were missing. In joint work with the NYU
Plant Systems Biology Lab, DFGs were subsequently applied to the discovery
of gene regulation networks by learning the dynamics of mRNA expression
levels.

DFGs have also been extended into a deep auto-encoder architecture and
applied to time-stamped text documents, with word frequencies as inputs. We
focused on collections of documents that exhibit a structure over time.
Working as dynamic topic models, DFGs could extract a latent trajectory
from consecutive political speeches; applied to news articles, they
achieved state-of-the-art text categorization and retrieval performance.

Finally, I used an instantiation of DFGs to evaluate the likelihood of
discrete sequences of words in text corpora, relying on dynamics over word
embeddings. Collaborating with AT&T Labs Research on a speech recognition
project, we improved on existing continuous statistical language models by
enriching them with word features and long-range topic dependencies.