CSC2515, Fall 2003: Some project ideas
--------------------------------------

Optimization:

-- Implement the multiplicative updates described in

      F. Sha, L. K. Saul, and D. D. Lee (2002). "Multiplicative updates
      for nonnegative quadratic programming in support vector machines."
      Technical Report MS-CIS-02-19, University of Pennsylvania.

   and compare them to at least one other fast way of solving the QP
   problem for support vector machines (e.g. SMO, or "Simple SVM").
   If you want to do more theoretical work, you could also derive the
   extension of the updates to the case where the data is not linearly
   separable in the feature space. (Sketched below.)

-- Investigate the "split-and-merge" heuristic for improving the local
   minimum found by K-means. (Sketched below.)

-- Implement a "limited memory BFGS" optimizer, and compare it to
   gradient descent with momentum (or better yet to "delta-bar-delta")
   on two fronts:
   (a) the speed at which it minimizes the training error. It should
       crush gradient descent.
   (b) the generalization quality of the local minima that it finds.
       Here you may discover that it overfits wildly and loses horribly
       to plain gradient descent.
   (Sketched below.)

Regression & Classification Algorithms:

-- Compare the performance of the lasso and ridge regression (with the
   ridge penalty chosen using cross-validation) on several regression
   benchmarks. (Sketched below.)

-- Compare the performance of leaps-and-bounds for variable subset
   selection to ridge regression (with the ridge penalty chosen using
   CV) on several regression benchmarks.

-- Come up with a way to do variable selection when there are too many
   dimensions to use leaps-and-bounds efficiently. Try out your new
   method.

-- Investigate a "kernel lasso".

-- Apply the lasso trick to classification in some way.

-- Compare a Gaussian prior on the weights with adding noise to the
   training data. For example, compare using a quadratic ridge penalty
   on the weights in logistic regression to another way of regularizing
   it, namely adding an "epsilon ghost" of each training case: a new
   case with a very small weight, the same inputs, and the OPPOSITE
   output. (Sketched below.)

Unsupervised Learning:

-- Implement mixtures of fully observed trees and compare them to a
   single fully observed tree on modeling some data.

-- Investigate a mixture-of-factor-analyzers type model in which there
   are a total of M factors, each one a vector of length equal to the
   dimension of the observations. However, for each training case add a
   "lasso-type" regularization penalty on the coefficients used to
   model that training case. This will force the model to set many
   latent variables to zero when explaining any given data case. Derive
   the inference rule for this model, and possibly (if you are keen)
   the EM learning algorithm. Apply it to at least one simple example
   (e.g. handwritten digits). This is an interesting example of
   sparsifying factor analysis, which I don't think anyone has looked
   at before.

Applications:

-- Train a logistic regression network to distinguish spam from
   non-spam email. There is some example spam data on the web in
   various places, including the web site for the "Elements of
   Statistical Learning" book.

-- Do vector quantization on a few adjacent columns of spectrograms
   made from clean speech of various speakers. Train one VQ per person,
   and use these to do speaker identification by seeing which VQ best
   reconstructs a test spectrogram. See me if you want some
   multi-person speech data. (Sketched below.)

-- Train a simple HMM on character sequences from text. Train one HMM
   per language and use them to do language identification. (Sketched
   below.)
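
Code sketches:

Rough, illustrative starting points in Python (numpy / scipy /
scikit-learn) for a few of the ideas above. Each sketch makes
assumptions that are noted in its lead-in and comments; none of them is
a complete project.

-- Multiplicative updates for nonnegative quadratic programming. A
   minimal sketch of the update rule from the Sha, Saul & Lee report,
   for the generic problem: minimize 0.5*v'Av + b'v subject to v >= 0.
   The mapping to a no-bias SVM dual (A_ij = y_i*y_j*K(x_i,x_j), b = -1)
   is an assumption here; check the report for the exact setup.

    import numpy as np

    def nqp_multiplicative(A, b, n_iter=500, eps=1e-12):
        """Approximately minimize 0.5*v'Av + b'v over v >= 0."""
        Aplus = np.maximum(A, 0.0)       # positive part of A
        Aminus = np.maximum(-A, 0.0)     # magnitude of the negative part
        v = np.ones(len(b))              # any strictly positive start works
        for _ in range(n_iter):
            a = Aplus @ v
            c = Aminus @ v
            # element-wise multiplicative update; v stays nonnegative
            v = v * (-b + np.sqrt(b * b + 4.0 * a * c)) / (2.0 * a + eps)
        return v

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        M = rng.standard_normal((20, 5))
        A = M @ M.T + 1e-3 * np.eye(20)  # positive semi-definite test matrix
        b = -np.ones(20)
        v = nqp_multiplicative(A, b)
        print("objective:", 0.5 * v @ A @ v + b @ v, " min(v):", v.min())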
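
-- Split-and-merge for K-means. One simple way to instantiate the
   heuristic: merge the two closest centroids, re-split the cluster
   with the largest within-cluster distortion, rerun K-means, and keep
   the move only if the total distortion drops. These particular
   split/merge rules are assumptions; exploring better ones is the
   point of the project.

    import numpy as np

    def kmeans(X, mu, n_iter=50):
        """Plain K-means from the given initial centroids."""
        for _ in range(n_iter):
            z = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1).argmin(1)
            for k in range(len(mu)):
                if np.any(z == k):          # leave empty clusters alone
                    mu[k] = X[z == k].mean(0)
        d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        z = d.argmin(1)
        return mu, z, d.min(1).sum()        # centroids, assignments, distortion

    def split_and_merge(X, K, n_rounds=10, seed=0):
        rng = np.random.default_rng(seed)
        mu, z, best = kmeans(X, X[rng.choice(len(X), K, replace=False)])
        for _ in range(n_rounds):
            per_point = ((X - mu[z]) ** 2).sum(1)
            worst = np.bincount(z, weights=per_point, minlength=K).argmax()
            pd = ((mu[:, None, :] - mu[None, :, :]) ** 2).sum(-1) + 1e12 * np.eye(K)
            i, j = np.unravel_index(pd.argmin(), pd.shape)
            cand = mu.copy()
            cand[i] = 0.5 * (mu[i] + mu[j])    # merge the two closest centroids
            cand[j] = mu[worst] + 0.01 * rng.standard_normal(X.shape[1])  # re-split worst
            cand, cz, dist = kmeans(X, cand)
            if dist < best:                    # accept the move only if it helps
                mu, z, best = cand, cz, dist
        return mu, best

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        X = np.vstack([rng.standard_normal((100, 2)) + c
                       for c in ([0, 0], [5, 5], [0, 5])])
        mu, distortion = split_and_merge(X, K=3)
        print("final distortion:", distortion)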
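
-- L-BFGS versus gradient descent with momentum. This sketch uses
   scipy's L-BFGS-B routine rather than a from-scratch limited-memory
   BFGS (writing your own is part of the project), and a small
   synthetic logistic regression problem, just to show how the
   training-loss comparison might be set up.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 50))
    w_true = rng.standard_normal(50)
    y = (X @ w_true + 0.5 * rng.standard_normal(200) > 0).astype(float)

    def loss_and_grad(w):
        """Logistic regression negative log-likelihood and its gradient."""
        p = 1.0 / (1.0 + np.exp(-X @ w))
        eps = 1e-12
        loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
        grad = X.T @ (p - y) / len(y)
        return loss, grad

    # limited-memory BFGS via scipy
    res = minimize(loss_and_grad, np.zeros(50), jac=True, method="L-BFGS-B",
                   options={"maxiter": 100})
    print("L-BFGS   loss after 100 iterations:", res.fun)

    # gradient descent with momentum, same budget of iterations
    w, v = np.zeros(50), np.zeros(50)
    lr, mom = 0.1, 0.9
    for t in range(100):
        _, g = loss_and_grad(w)
        v = mom * v - lr * g
        w = w + v
    print("momentum loss after 100 iterations:", loss_and_grad(w)[0])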
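
-- The lasso versus ridge regression with cross-validated penalties.
   scikit-learn is used purely for brevity (any lasso/ridge code will
   do), and its diabetes data set stands in for "several regression
   benchmarks".

    import numpy as np
    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import LassoCV, RidgeCV
    from sklearn.model_selection import cross_val_score

    X, y = load_diabetes(return_X_y=True)
    lasso = LassoCV(cv=5)                                  # penalty chosen by CV
    ridge = RidgeCV(alphas=np.logspace(-3, 3, 25), cv=5)   # penalty chosen by CV

    for name, model in [("lasso", lasso), ("ridge", ridge)]:
        scores = cross_val_score(model, X, y, cv=10,
                                 scoring="neg_mean_squared_error")
        print(f"{name}: mean CV MSE = {-scores.mean():.1f}")

    # the lasso also does variable selection: count the zeroed coefficients
    lasso.fit(X, y)
    print("lasso zeroed", int(np.sum(lasso.coef_ == 0)),
          "of", X.shape[1], "coefficients")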
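
-- The "epsilon ghost" regularizer versus a quadratic ridge penalty in
   logistic regression. Each ghost is a copy of a training case with
   the same inputs, the OPPOSITE label, and a tiny weight; the explicit
   penalty is switched (almost) off for the ghost model. The data set
   and the value of epsilon below are placeholders.

    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
    scaler = StandardScaler().fit(Xtr)
    Xtr, Xte = scaler.transform(Xtr), scaler.transform(Xte)

    # ordinary ridge-penalized logistic regression (C = inverse penalty strength)
    ridge = LogisticRegression(C=1.0, max_iter=2000).fit(Xtr, ytr)

    # "epsilon ghost": append flipped-label copies with a tiny weight,
    # and turn the explicit quadratic penalty (almost) off
    eps = 0.01                       # placeholder value; tune it
    Xg = np.vstack([Xtr, Xtr])
    yg = np.concatenate([ytr, 1 - ytr])
    w = np.concatenate([np.ones(len(ytr)), eps * np.ones(len(ytr))])
    ghost = LogisticRegression(C=1e6, max_iter=2000).fit(Xg, yg, sample_weight=w)

    print("ridge test accuracy:", ridge.score(Xte, yte))
    print("ghost test accuracy:", ghost.score(Xte, yte))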
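
-- VQ-based speaker identification. Real spectrogram frames (a few
   adjacent columns stacked into one vector) would replace the
   synthetic per-speaker Gaussian blobs used here; the codebook size
   and number of speakers are arbitrary.

    import numpy as np
    from scipy.cluster.vq import kmeans, vq

    D, K = 30, 16                     # frame dimension, codebook size
    rng = np.random.default_rng(0)
    centres = {s: 3.0 * rng.standard_normal(D) for s in range(4)}

    def frames(speaker, n, seed):
        """Stand-in for spectrogram frames from one speaker."""
        r = np.random.default_rng(seed)
        return centres[speaker] + r.standard_normal((n, D))

    # one VQ codebook per speaker, trained on that speaker's frames
    train = {s: frames(s, 500, seed=s) for s in range(4)}
    codebooks = {s: kmeans(f, K)[0] for s, f in train.items()}

    def identify(test_frames):
        # pick the speaker whose codebook reconstructs the frames best
        errors = {s: vq(test_frames, cb)[1].sum() for s, cb in codebooks.items()}
        return min(errors, key=errors.get)

    test = frames(2, 100, seed=99)    # new frames from speaker 2
    print("identified speaker:", identify(test))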
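
-- HMM language identification on character sequences. This sketch
   assumes the hmmlearn package and its CategoricalHMM class (older
   releases expose the same discrete-emission model under the name
   MultinomialHMM); you could equally write your own Baum-Welch. The
   file names are placeholders for per-language text corpora.

    import numpy as np
    from hmmlearn.hmm import CategoricalHMM

    ALPHABET = "abcdefghijklmnopqrstuvwxyz "   # crude character set

    def encode(text):
        """Map text to a column of integer symbols, dropping other characters."""
        sym = [ALPHABET.index(c) for c in text.lower() if c in ALPHABET]
        return np.array(sym).reshape(-1, 1)

    def train_hmm(text, n_states=8, seed=0):
        # assumes the training text contains every symbol in ALPHABET
        model = CategoricalHMM(n_components=n_states, n_iter=50, random_state=seed)
        model.fit(encode(text))                 # Baum-Welch training
        return model

    # placeholder corpora, one plain-text file per language
    corpora = {"english": open("english.txt").read(),
               "french": open("french.txt").read()}
    models = {lang: train_hmm(text) for lang, text in corpora.items()}

    def identify(snippet):
        # pick the language whose HMM gives the snippet the highest log-likelihood
        scores = {lang: m.score(encode(snippet)) for lang, m in models.items()}
        return max(scores, key=scores.get)

    print(identify("the quick brown fox jumps over the lazy dog"))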