
Modeling Human Motion Using Binary Latent Variables

Graham W. Taylor, Geoffrey E. Hinton and Sam Roweis

We propose a non-linear generative model for human motion data that uses an undirected model with binary latent variables and real-valued "visible" variables that represent joint angles. The latent and visible variables at each time step receive directed connections from the visible variables at the last few time steps. Such an architecture makes on-line inference efficient and allows us to use a simple approximate learning procedure. After training, the model finds a single set of parameters that simultaneously capture several different kinds of motion. We demonstrate the power of our approach by synthesizing various motion sequences and by performing on-line filling in of data lost during motion capture.
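To make the architecture concrete, here is a minimal sketch of the on-line inference step; the function and parameter names and shapes are assumptions for illustration, not the released code. Because the latent variables are conditionally independent given the current frame and the recent past, inference reduces to a single matrix-vector pass:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def infer_hidden(v_t, v_past, W, A, b):
        """On-line inference: probabilities of the binary latent variables
        given the current frame and a short window of past frames.

        Assumed shapes: v_t is the current frame of joint angles (D,),
        v_past is the last few frames concatenated (D * order,),
        W is the undirected visible-hidden weight matrix (D, H),
        A holds the directed past-to-hidden connections (H, D * order),
        b is the static hidden bias (H,).
        """
        # The directed connections from the past act as a dynamic bias,
        # so the latent units can be updated independently and exactly.
        return sigmoid(W.T @ v_t + A @ v_past + b)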

Download the paper in PDF format (343 KB).
Sample source code available here.

Videos

These videos were generated using the framework described in the NIPS paper.

I recommend viewing the mp4 files, which have been encoded using H.264. They are smaller and tend to play at the proper frame rate. If your browser does not support them automatically, you can try the multi-platform VLC player. They should also play in recent versions of QuickTime and mplayer.

A The motion was generated by a latent variable model with 200 binary stochastic units. The model was trained on 3825 frames of motion capture data consisting of walking sequences. For generation, the model was initialized with a few frames of walking (the sampling procedure is sketched after this list). [gif] [mp4]
B The motion was generated by a latent variable model with 200 binary stochastic units. The model was trained on 2515 frames of motion capture data consisting of walking and running sequences. For generation, the model was initialized with a few frames of running. [gif] [mp4]
C The motion was generated by the same model as video B, but initialized with a few frames of walking. The sequence features a transition from walking to running. [gif] [mp4]
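For readers curious how these sequences are synthesized, the basic recipe is to seed the model with a few real frames and then, for each new frame, run a brief alternating Gibbs sampler conditioned on the sliding window of the past. Below is a sketch in the same assumed notation as above; names and the Gibbs step count are illustrative, not the released code:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def generate(seed_frames, W, A, B, b, c, n_frames, n_gibbs=30):
        """Generate motion by alternating Gibbs sampling, one frame at a time.

        Additional assumed parameters: B holds the directed past-to-visible
        (autoregressive) connections (D, D * order), c is the static
        visible bias (D,).
        """
        frames = list(seed_frames)                  # seed with real motion
        order = len(seed_frames)
        for _ in range(n_frames):
            v_past = np.concatenate(frames[-order:])
            v = frames[-1].copy()                   # initialize the new frame
            for _ in range(n_gibbs):
                p_h = sigmoid(W.T @ v + A @ v_past + b)
                h = (np.random.rand(*p_h.shape) < p_h).astype(float)
                # Visible units are real-valued; update them to their
                # conditional mean under the Gaussian visible model.
                v = W @ h + B @ v_past + c
            frames.append(v)
        return np.stack(frames[order:])

Because each step is conditioned only on the most recent frames, generation runs on-line, and a single trained model can be steered between gaits by the choice of seed frames, as in videos A to C.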

Older videos (using an earlier version of the model)

The videos below were generated by an earlier, simpler version of our model, which clearly produces inferior results. We have left them here for interest's sake, but they are not indicative of the model discussed in the NIPS paper.

1 The motion was generated by a latent variable model with 400 binary stochastic units. The model was trained on 2813 frames of motion capture data consisting of walking and running sequences. For generation, the model was initialized with a few frames of walking. [gif] [mp4]
2 The same model as above, but initialized with a few frames of running. [gif] [mp4]
3 The same model as above. The training data contains no transitions between walking and running, but by injecting a small amount of Gaussian noise during sampling, the model will occasionally transition between styles. [gif] [mp4]
4 Here the model was trained on walking and running sequences that do contain transitions between the two modes. Occasionally transitions happen "naturally" during generated sequences. [gif] [mp4]
5 A long walking sequence generated by a model trained on a different dataset (walking sequences only), again with 400 binary stochastic units. During generation, we use the real-valued past "probabilities" of the hidden units as input to the directed connections instead of the stochastically chosen past activations. [gif] [mp4]
6 The same model as above, but using binary past states instead of real values. The walking is less smooth, but the behaviour is more "stochastic": the figure will occasionally pause, turn around, etc. (there are such pauses in the training set). [gif] [mp4]
7 The model can also fill in missing data on-line. Here, we have deleted data (left leg, upper body) halfway through the sequence and the model fills in the missing joint angles (a sketch of this clamped sampling procedure follows the list). [gif] [mp4]
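The filling-in of video 7 fits the same sketch: at each Gibbs step the observed joint angles are clamped to their measured values while the missing dimensions are re-estimated. Again, the names and shapes are assumptions for illustration, not the released code:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def fill_in_frame(v_obs, mask, v_past, W, A, B, b, c, n_gibbs=30):
        """Fill in missing joint angles for one frame.

        mask is a boolean vector (D,), True where the angle was observed;
        the other parameters follow the assumed shapes used above.
        """
        # Initialize the missing dimensions from the autoregressive prediction.
        v = np.where(mask, v_obs, B @ v_past + c)
        for _ in range(n_gibbs):
            p_h = sigmoid(W.T @ v + A @ v_past + b)
            h = (np.random.rand(*p_h.shape) < p_h).astype(float)
            v_mean = W @ h + B @ v_past + c
            v = np.where(mask, v_obs, v_mean)       # clamp observed dims
        return v

Run frame by frame, this keeps the filling-in on-line: each completed frame joins the history window used to condition the next one.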