Matrix Approximation for Large-scale Learning

Modern learning problems in computer vision, natural language processing, computational biology, and other areas often involve data sets of thousands to millions of training instances. However, several standard learning algorithms, in particular kernel-based algorithms such as Support Vector Machines, Kernel Ridge Regression, and Kernel PCA, do not easily scale to such orders of magnitude. This thesis focuses on sampling-based matrix approximation techniques that help scale kernel-based algorithms to large-scale datasets. We address several fundamental theoretical and empirical questions.

What approximation should be used? We discuss two common sampling-based methods, providing novel theoretical insights regarding their suitability for various applications, along with experimental results motivated by this theory. Our results show that one of these methods, the Nyström method, is superior in the context of large-scale learning.

Do these approximations work in practice? We demonstrate the effectiveness of approximation techniques on a variety of problems. In the largest study to date for manifold learning, we use the Nyström method to extract low-dimensional structure from high-dimensional data to effectively cluster face images. We also report good empirical results for kernel ridge regression and kernel logistic regression.

How should we sample columns? A key aspect of sampling-based algorithms is the distribution according to which columns are sampled. We study both fixed and adaptive sampling schemes, as well as a promising ensemble technique that can be easily parallelized and generates superior approximations, both in theory and in practice.

How well do these approximations work in theory? We provide theoretical analyses of the Nyström method to understand when this technique should be used. We present guarantees on approximation accuracy based on various matrix properties and analyze the effect of matrix approximation on actual kernel-based algorithms.

This work has important consequences for the machine learning community since it extends the benefits of kernel-based algorithms to large-scale applications. The crucial aspect of this research, low-rank matrix approximation, is also of independent interest within the field of numerical linear algebra.
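To make the column-sampling idea concrete, the following is a minimal sketch of the standard Nyström approximation under uniform column sampling; the function name, the synthetic data, and the RBF kernel bandwidth are illustrative assumptions, not the specific variants analyzed in this thesis.

```python
import numpy as np

def nystrom_approximation(K, num_cols, rng=None):
    """Low-rank approximation of an n x n PSD kernel matrix K.

    Samples num_cols columns uniformly at random, forms the n x l
    block C and the l x l intersection block W, and returns
    K_approx = C W^+ C^T.
    """
    rng = np.random.default_rng(rng)
    n = K.shape[0]
    idx = rng.choice(n, size=num_cols, replace=False)
    C = K[:, idx]                # sampled columns (n x l)
    W = C[idx, :]                # intersection block (l x l)
    W_pinv = np.linalg.pinv(W)   # pseudo-inverse of the small block
    return C @ W_pinv @ C.T

# Illustrative usage: approximate an RBF kernel matrix on random data.
X = np.random.default_rng(0).normal(size=(500, 10))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq_dists / (2 * X.shape[1]))
K_approx = nystrom_approximation(K, num_cols=50, rng=0)
print(np.linalg.norm(K - K_approx, "fro") / np.linalg.norm(K, "fro"))
```

Only the l sampled columns and the l x l pseudo-inverse are needed, so the full n x n kernel matrix never has to be factored directly, which is what makes the approach attractive at large scale.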