Title: Hierarchical Convolutional Deep Learning in Computer Vision

Candidate: Matthew Zeiler
Advisor: Rob Fergus

Abstract: It has long been the goal in computer vision to learn a hierarchy of features useful for object recognition. Spanning the two traditional paradigms of machine learning, unsupervised and supervised learning, we investigate the application of deep learning methods to tackle this challenging task and to learn robust representations of images.

We begin our investigation with the introduction of a novel unsupervised learning technique called deconvolutional networks. Based on convolutional sparse coding, we show this model learns interesting decompositions of images into parts without object label information. This method, which easily scales to large images, becomes increasingly invariant by learning multiple layers of feature extraction coupled with pooling layers. We introduce a novel pooling method called Gaussian pooling to enable these layers to store continuous location information while being differentiable, creating a unified objective function to optimize.

In the supervised learning domain, a well-established model for object recognition is the convolutional network. We introduce a new regularization method for convolutional networks called stochastic pooling, which relies on sampling noise to prevent these powerful models from overfitting. Additionally, we show novel visualizations of these complex models to better understand what they learn and to provide insight into how to develop state-of-the-art architectures for large-scale classification of 1,000 different object categories.
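
As a rough illustration of the sampling idea behind stochastic pooling, the sketch below pools a single region by drawing one activation with probability proportional to its value, and uses the probability-weighted average as a deterministic counterpart for evaluation. It assumes non-negative activations (for example, after a rectified linear unit); the function names `stochastic_pool` and `expected_pool` are hypothetical, and this is not the implementation used in the thesis.

```python
import random


def stochastic_pool(region, rng=random):
    """Sample one activation from a pooling region.

    Assumes every value in `region` is non-negative. Each activation is
    chosen with probability proportional to its value, so larger responses
    are picked more often but smaller ones still get through sometimes.
    """
    total = sum(region)
    if total <= 0:
        # An all-zero region has no valid distribution; pool to zero.
        return 0.0
    return rng.choices(region, weights=region, k=1)[0]


def expected_pool(region):
    """Probability-weighted average of the region (no sampling)."""
    total = sum(region)
    if total <= 0:
        return 0.0
    # p_i = a_i / total, so sum(p_i * a_i) = sum(a_i ** 2) / total.
    return sum(a * a for a in region) / total
```

The sampling step is what injects noise during training; the weighted average gives a noise-free output that is consistent with the sampling distribution.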

We also investigate several related problems in deep learning. First, we introduce a model for the task of mapping one high-dimensional time series onto another. Second, we address the choice of nonlinearity in neural networks, showing evidence that rectified linear units outperform other types in automatic speech recognition. Finally, we introduce a novel optimization method called ADADELTA, which shows promising convergence speeds in practice while being robust to hyper-parameter selection.
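
To make the ADADELTA idea concrete, the following is a minimal scalar sketch of the update rule as it is commonly stated: the step is the gradient scaled by the ratio of a running RMS of past updates to a running RMS of past gradients, so no learning rate has to be chosen. The function name `adadelta_minimize`, the defaults `rho=0.95` and `eps=1e-6`, and the quadratic test objective are assumptions made for illustration, not details taken from the thesis.

```python
import math


def adadelta_minimize(grad, x0, rho=0.95, eps=1e-6, steps=1000):
    """Minimize a one-dimensional function given its gradient `grad`.

    Keeps two decaying averages: one of squared gradients and one of
    squared updates. Their RMS ratio sets the per-step scale.
    """
    x = x0
    avg_sq_grad = 0.0  # running average of g^2
    avg_sq_step = 0.0  # running average of (delta x)^2
    for _ in range(steps):
        g = grad(x)
        avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * g * g
        # Step opposes the gradient; eps keeps both square roots positive.
        step = -math.sqrt(avg_sq_step + eps) / math.sqrt(avg_sq_grad + eps) * g
        avg_sq_step = rho * avg_sq_step + (1.0 - rho) * step * step
        x += step
    return x


# Example: f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
x_final = adadelta_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

Because the numerator and denominator carry the same units as the parameter and the gradient respectively, rescaling the objective leaves the step size largely unchanged, which is one way to read the robustness to hyper-parameter selection mentioned above.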