CSC2515: Some project ideas
---------------------------

Optimization
------------

-- Compare the efficiency of two or more optimization methods for training
   the same model, e.g. gradient descent vs. conjugate gradient vs.
   stochastic gradient descent.

-- Implement the multiplicative updates described in

      F. Sha, L. K. Saul, and D. D. Lee (2002). "Multiplicative updates for
      nonnegative quadratic programming in support vector machines."
      Technical Report MS-CIS-02-19, University of Pennsylvania.

   and compare them to at least one other way of solving the QP problem for
   support vector machines. If you want to do more theoretical work, you
   could derive the extension of the updates to the case where the data are
   not linearly separable in the feature space. (A rough sketch of the basic
   update appears in the appendix at the end of this handout.)

-- Investigate the "split-and-merge" heuristic for improving the local
   minimum found by K-means. (One version of the heuristic is sketched in
   the appendix.)

Regression
----------

-- Implement the lasso. Compare it to ridge regression with the ridge
   penalty chosen by cross-validation, on one or more problems. (The
   appendix sketches the kind of comparison intended, using off-the-shelf
   implementations as stand-ins.)

-- Another option is to extend the lasso in some way, e.g. derive a
   "kernel lasso", or apply the lasso trick to classification.

-- Implement the leaps-and-bounds method for variable subset selection.
   Compare it to ridge regression with the ridge penalty chosen by
   cross-validation, on one or more problems. Suggest a way to do variable
   selection when there are too many dimensions to use leaps and bounds
   efficiently, and try out your new method.

Classification
--------------

-- Train a logistic regression classifier to distinguish spam from non-spam
   email. There is example spam data on the web in various places, including
   the web site for the "Elements of Statistical Learning" book.

-- Do vector quantization on a few adjacent columns of spectrograms made
   from clean speech of various speakers. Train one VQ per speaker, and use
   these to do speaker identification by seeing which VQ best reconstructs a
   test spectrogram. See me if you want some multi-person speech data. (The
   appendix sketches this pipeline.)

-- Train a simple HMM on character sequences from text. Train one HMM per
   language and use them to do language identification.

Unsupervised Learning
---------------------

-- Implement mixtures of fully observed trees and compare them to a single
   fully observed tree on modeling some data.

-- Investigate a mixture of factor analyzers model in which there are M
   factors in total, each one a vector of length equal to the dimension of
   the observations. Each mixture component selects exactly K of these M
   factors and uses those K to form its factor loading matrix, Lambda.
   Derive EM learning for this model, and apply it to one simple example
   (e.g. handwritten digits). In the E-step, each mixture component must
   infer the best K factors for it to use, as well as the values of its
   hidden variable z given those factors. In the M-step, all M factors are
   updated. This is an interesting example of parameter sharing, which I
   don't think anyone has looked at before. For now, consider M and K small
   enough that you can search through all (M choose K) factor combinations
   in the E-step.
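
Appendix: illustrative code sketches (Python)
---------------------------------------------

The fragments below are rough sketches of a few of the ideas above, not
reference implementations. Data sets, helper names, and library choices
(NumPy, scikit-learn) are illustrative assumptions, not part of the project
descriptions.

Multiplicative updates for nonnegative QP. The sketch below implements the
basic update from Sha, Saul & Lee (2002), as I recall it, for minimizing
0.5 v'Av + b'v subject to v >= 0; check the exact form against the tech
report. For a separable SVM with no bias term, the dual fits this template
with A_ij = y_i y_j K(x_i, x_j) and b_i = -1.

  import numpy as np

  def nnqp_multiplicative(A, b, n_iter=1000, eps=1e-12):
      # Minimize 0.5 * v'Av + b'v subject to v >= 0 via multiplicative updates.
      # A is assumed symmetric; the iterates stay nonnegative by construction.
      Apos = np.maximum(A, 0.0)       # A+ : positive entries of A
      Aneg = np.maximum(-A, 0.0)      # A- : magnitudes of the negative entries
      v = np.ones(len(b))             # any strictly positive starting point
      for _ in range(n_iter):
          p = Apos @ v + eps          # (A+ v)_i, kept away from zero
          m = Aneg @ v                # (A- v)_i
          v = v * (-b + np.sqrt(b * b + 4.0 * p * m)) / (2.0 * p)
      return v

  # Hypothetical use for the bias-free SVM dual, given a kernel matrix K and
  # labels y in {-1, +1}:
  #   A = np.outer(y, y) * K
  #   alpha = nnqp_multiplicative(A, -np.ones(len(y)))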
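
Split-and-merge for K-means. The heuristic comes in several variants; the
one sketched here merges the two closest centres, splits the cluster with
the largest within-cluster SSE, re-runs K-means from the proposed centres,
and is meant to be accepted only if the objective (inertia) improves. It
assumes scikit-learn is available and K >= 3.

  import numpy as np
  from sklearn.cluster import KMeans

  def split_and_merge_step(X, centers, rng, noise_scale=0.01):
      # Propose new centers by merging the two closest centers and splitting
      # the cluster with the largest within-cluster SSE, then polish with K-means.
      K = len(centers)
      labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
      # the two closest centers are the merge candidates
      d = ((centers[:, None, :] - centers[None]) ** 2).sum(-1)
      np.fill_diagonal(d, np.inf)
      i, j = np.unravel_index(np.argmin(d), d.shape)
      # the cluster with the largest within-cluster SSE (excluding the merge
      # pair) is the split candidate
      sse = [((X[labels == k] - centers[k]) ** 2).sum() if k not in (i, j)
             else -np.inf for k in range(K)]
      worst = int(np.argmax(sse))
      # proposed centers: the untouched ones, one merged center, two split halves
      keep = [centers[k] for k in range(K) if k not in (i, j, worst)]
      noise = noise_scale * X.std(axis=0) * rng.standard_normal(X.shape[1])
      proposal = np.array(keep + [0.5 * (centers[i] + centers[j]),
                                  centers[worst] + noise,
                                  centers[worst] - noise])
      km = KMeans(n_clusters=K, init=proposal, n_init=1).fit(X)
      return km.cluster_centers_, km.inertia_

  # Typical use (X hypothetical): fit KMeans once, then repeatedly call
  # split_and_merge_step(X, centers, np.random.default_rng(0)) and keep the
  # proposal whenever the returned inertia is lower than the current one.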
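
Lasso vs. ridge with a CV-chosen penalty. For the project you would
implement the lasso yourself (e.g. by coordinate descent); the fragment
below only illustrates the comparison being asked for, using scikit-learn's
LassoCV and RidgeCV as stand-ins and the small diabetes data set as a
placeholder problem.

  import numpy as np
  from sklearn.datasets import load_diabetes
  from sklearn.linear_model import LassoCV, RidgeCV
  from sklearn.model_selection import cross_val_score

  X, y = load_diabetes(return_X_y=True)          # placeholder regression data

  models = {
      "lasso": LassoCV(cv=5),                                  # L1 penalty by CV
      "ridge": RidgeCV(alphas=np.logspace(-3, 3, 25), cv=5),   # L2 penalty by CV
  }
  for name, model in models.items():
      # outer 10-fold CV estimates generalization error for each model
      mse = -cross_val_score(model, X, y, cv=10,
                             scoring="neg_mean_squared_error").mean()
      n_nonzero = int(np.sum(np.abs(model.fit(X, y).coef_) > 1e-8))
      print(f"{name}: 10-fold MSE = {mse:.1f}, nonzero coefficients = {n_nonzero}")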
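
VQ-based speaker identification. A sketch of the pipeline, assuming each
spectrogram is an array of shape (frequency bins, time frames), that a
K-means codebook is an acceptable VQ, and with helper names of my own
choosing.

  import numpy as np
  from sklearn.cluster import KMeans

  def make_blocks(spectrogram, width=3):
      # Stack `width` adjacent spectrogram columns into single vectors.
      frames = spectrogram.T                     # one row per time frame
      n = len(frames) - width + 1
      return np.array([frames[t:t + width].ravel() for t in range(n)])

  def train_codebooks(train_spectrograms, n_codes=64, width=3):
      # Fit one VQ codebook (a set of K-means centroids) per speaker.
      # `train_spectrograms` maps speaker name -> list of spectrogram arrays.
      codebooks = {}
      for speaker, specs in train_spectrograms.items():
          blocks = np.vstack([make_blocks(s, width) for s in specs])
          codebooks[speaker] = KMeans(n_clusters=n_codes, n_init=5).fit(blocks)
      return codebooks

  def identify(spectrogram, codebooks, width=3):
      # Return the speaker whose codebook reconstructs the test spectrogram
      # with the smallest average quantization error.
      blocks = make_blocks(spectrogram, width)
      def error(km):
          d = ((blocks[:, None, :] - km.cluster_centers_[None]) ** 2).sum(-1)
          return d.min(axis=1).mean()
      return min(codebooks, key=lambda s: error(codebooks[s]))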