CSC2515, Fall 2004: Some project ideas
--------------------------------------

Regression & Classification Algorithms

-- Compare a Gaussian prior on the weights with adding noise to the training data. For example, compare using a quadratic ridge penalty on the weights in logistic regression to another way of regularizing it, namely adding an "epsilon ghost" of each training case: a new case with a very small weight, the same inputs, and the OPPOSITE output. (A rough sketch of this comparison appears at the end of this page.)

-- Compare the performance of the lasso, leaps-and-bounds variable subset selection (Furnival-Wilson), and ordinary ridge regression (with the ridge penalty chosen by cross-validation) on several regression benchmarks.

-- Come up with a way to do variable selection when there are too many dimensions to use leaps-and-bounds efficiently. Try out your new method and compare it to the lasso.

-- Investigate a "kernel lasso".

-- Apply the lasso trick (an absolute-value penalty on the weights) to classification in some new way.

Unsupervised Learning

-- Implement mixtures of fully observed trees and compare them to a single fully observed tree on modeling some data.

-- Investigate a mixture-of-factor-analyzers-type model in which there are a total of M factors, each one a vector of length equal to the dimension of the observations. However, for each training case add a "lasso-type" regularization penalty on the coefficients used to model that training case. This will force the model to set many latent variables to zero when explaining any given data case. Derive the inference rule for this model, and possibly (if you are keen) the EM learning algorithm. Apply it to at least one simple example (e.g. handwritten digits). This is an interesting example of sparsifying factor analysis, which I don't think anyone has looked at before.

-- Implement "unobserved naive Bayes", which assumes that, conditioned on a discrete, unobserved cluster variable, all the observed variables are conditionally independent. Apply it to some clustering problems. (A small EM sketch for binary data appears at the end of this page.)

Optimization

-- Quadratic programming: Implement either the multiplicative updates described in F. Sha, L. K. Saul, and D. D. Lee (2002), "Multiplicative updates for nonnegative quadratic programming in support vector machines," Technical Report MS-CIS-02-19, University of Pennsylvania, or the LARS algorithm described in Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2003), "Least angle regression," Annals of Statistics, and compare them to at least one other fast way of solving the QP problem for support vector machines and the lasso (e.g. SMO, or "Simple SVM"). If you want to do more theoretical work, you could also derive the extension of the updates to the case where the data is not linearly separable in the feature space. (A toy sketch of the multiplicative update appears at the end of this page.)

-- Implement a "limited-memory BFGS" optimizer, and compare it to gradient descent with momentum and to conjugate gradients on two fronts: (a) the speed at which it minimizes the training error (it should crush gradient descent on speed, and might be a bit faster than CG); (b) the generalization quality of the local minima that it finds (here you may discover that it overfits wildly and loses badly to gradient descent, and perhaps mildly to CG). (A small comparison script appears at the end of this page.)

-- Investigate the "split-and-merge" heuristic for improving the local minimum found by K-means, and compare it to the furthest-first heuristic and to random initialization. (A furthest-first initialization sketch appears at the end of this page.)

Applications

-- Train a logistic regression network to distinguish spam from non-spam email. There is some example spam data on the web in various places, including the web site for the "Elements of Statistical Learning" book.
-- Do vector quantization on a few adjacent columns of spectrograms made from clean speech of various speakers. Train one VQ per person, and use these to do speaker identification by seeing which VQ best reconstructs a test spectrogram. See me if you want some multi-person speech data. (A toy sketch appears at the end of this page.)

-- Train a simple HMM on character sequences from text. Train one HMM per language and use them to do language identification.

-- Use a "profile HMM" to perform multiple alignment on some discrete sequences.
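
Code sketches

The sketches below are rough, illustrative starting points only, not reference implementations. They are written in Python with NumPy/SciPy, and every data set, constant and parameter setting in them is a made-up placeholder.

For the first regression idea (quadratic ridge penalty versus "epsilon ghost" cases in logistic regression), here is a minimal sketch on synthetic data; the same fitting code could be pointed at the spam data mentioned under Applications. The learning rate, penalty strength and ghost weight are arbitrary choices, not recommendations.

    # Sketch: logistic regression with (a) a quadratic ridge penalty on the
    # weights versus (b) "epsilon ghost" regularization, where every training
    # case gets a low-weight copy with the same inputs and the OPPOSITE label.
    # Synthetic data stands in for real features (e.g. spam word counts).
    import numpy as np

    rng = np.random.RandomState(0)

    def make_fake_data(n=400, d=20):
        w_true = rng.randn(d)
        X = rng.randn(n, d)
        y = (X @ w_true + 0.5 * rng.randn(n) > 0).astype(float)
        return X, y

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

    def fit_logreg(X, y, case_weights=None, ridge=0.0, lr=0.1, iters=2000):
        """Weighted maximum likelihood with an optional ridge penalty,
        fit by plain batch gradient descent."""
        n, d = X.shape
        if case_weights is None:
            case_weights = np.ones(n)
        w = np.zeros(d)
        for _ in range(iters):
            p = sigmoid(X @ w)
            grad = X.T @ (case_weights * (p - y)) / case_weights.sum() + ridge * w
            w -= lr * grad
        return w

    X, y = make_fake_data()

    # (a) ridge-regularized logistic regression
    w_ridge = fit_logreg(X, y, ridge=0.1)

    # (b) "epsilon ghost" regularization: append each case again with its
    #     label flipped and a small case weight epsilon.
    eps = 0.05
    X_aug = np.vstack([X, X])
    y_aug = np.concatenate([y, 1.0 - y])
    case_w = np.concatenate([np.ones(len(y)), eps * np.ones(len(y))])
    w_ghost = fit_logreg(X_aug, y_aug, case_weights=case_w)

    for name, w in [("ridge", w_ridge), ("epsilon ghost", w_ghost)]:
        acc = np.mean((sigmoid(X @ w) > 0.5) == y)
        print(name, "norm =", round(float(np.linalg.norm(w)), 3),
              "train acc =", round(float(acc), 3))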
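
For the "unobserved naive Bayes" clustering idea, a minimal EM sketch for binary data, treating the model as a mixture of products of Bernoullis: the cluster variable is hidden, and given it the observed bits are independent. The number of clusters, smoothing constant and toy data are arbitrary.

    # Sketch: EM for "unobserved naive Bayes" on binary data, i.e. a mixture
    # model with a hidden cluster variable where, given the cluster, each
    # observed bit is an independent Bernoulli.  Toy data only.
    import numpy as np

    rng = np.random.RandomState(1)

    def em_unobserved_naive_bayes(X, K=3, iters=50, smooth=1e-3):
        n, d = X.shape
        pi = np.full(K, 1.0 / K)                   # mixing proportions
        theta = rng.uniform(0.25, 0.75, (K, d))    # P(x_j = 1 | cluster k)
        for _ in range(iters):
            # E-step: responsibilities r[i, k] = P(cluster k | x_i)
            log_p = (X @ np.log(theta).T
                     + (1 - X) @ np.log(1 - theta).T
                     + np.log(pi))
            log_p -= log_p.max(axis=1, keepdims=True)
            r = np.exp(log_p)
            r /= r.sum(axis=1, keepdims=True)
            # M-step: re-estimate mixing proportions and Bernoulli parameters
            Nk = r.sum(axis=0)
            pi = Nk / n
            theta = (r.T @ X + smooth) / (Nk[:, None] + 2 * smooth)
        return pi, theta, r

    # Toy binary data with two obvious clusters.
    X = np.vstack([rng.rand(100, 10) < 0.8,
                   rng.rand(100, 10) < 0.2]).astype(float)
    pi, theta, r = em_unobserved_naive_bayes(X, K=2)
    print("mixing proportions:", np.round(pi, 2))
    print("cluster of first and last case:", r[0].argmax(), r[-1].argmax())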
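
For the quadratic-programming idea, a toy sketch of the multiplicative update of Sha, Saul and Lee for nonnegative quadratic programs of the form minimize (1/2) v'Av + b'v subject to v >= 0; the SVM dual without a bias term can be written this way. This is only my reading of the update rule, so check it against the paper before relying on it, and note that the equality constraint that comes with a bias term is not handled here.

    # Sketch of a multiplicative update for nonnegative quadratic programming
    # (after Sha, Saul & Lee, 2002):
    #     minimize  0.5 * v' A v + b' v   subject to  v >= 0.
    # My reading of the update; verify the exact form against the paper.
    import numpy as np

    def nqp_multiplicative(A, b, iters=500, eps=1e-12):
        Ap = np.clip(A, 0, None)       # positive elements of A
        Am = np.clip(-A, 0, None)      # magnitudes of the negative elements
        v = np.ones(len(b))            # any strictly positive starting point
        for _ in range(iters):
            a = Ap @ v + eps           # eps guards against division by zero
            c = Am @ v
            # Each v_i is rescaled by a factor that equals 1 exactly when the
            # gradient (A v + b)_i is zero, and shrinks or grows it otherwise.
            v = v * (-b + np.sqrt(b * b + 4.0 * a * c)) / (2.0 * a)
        return v

    # Tiny check with a known answer: A = I, b = (-1, 2)  =>  v* = (1, 0)
    A = np.eye(2)
    b = np.array([-1.0, 2.0])
    print(nqp_multiplicative(A, b))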
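
For the limited-memory BFGS idea, a small comparison script that uses SciPy's built-in L-BFGS and conjugate-gradient routines as stand-ins for your own implementations, against hand-rolled gradient descent with momentum, on a toy objective. The objective here is convex, so this only exercises the speed comparison; for the generalization half of the project you would swap in a non-convex model such as a small neural network. The learning rate and momentum constant are arbitrary.

    # Sketch: comparing L-BFGS, nonlinear conjugate gradients, and gradient
    # descent with momentum on the same training objective (a ridge-penalized
    # logistic regression loss on toy data).  SciPy's built-in optimizers
    # stand in for a hand-written limited-memory BFGS.
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.RandomState(0)
    X = rng.randn(500, 50)
    y = (X @ rng.randn(50) > 0).astype(float)
    lam = 1e-2

    def loss_and_grad(w):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30, 30)))
        loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
        loss += 0.5 * lam * w @ w
        grad = X.T @ (p - y) / len(y) + lam * w
        return loss, grad

    w0 = np.zeros(X.shape[1])

    # Built-in L-BFGS and CG; res.nfev counts function/gradient evaluations.
    for method in ["L-BFGS-B", "CG"]:
        res = minimize(loss_and_grad, w0, jac=True, method=method,
                       options={"maxiter": 200})
        print(method, "loss =", round(float(res.fun), 4), "evals =", res.nfev)

    # Plain gradient descent with momentum, for a comparable budget.
    w, velocity = w0.copy(), np.zeros_like(w0)
    lr, mom = 0.5, 0.9
    for _ in range(200):
        _, grad = loss_and_grad(w)
        velocity = mom * velocity - lr * grad
        w += velocity
    print("momentum loss =", round(float(loss_and_grad(w)[0]), 4), "evals = 200")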
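
For the K-means idea, a sketch of the furthest-first baseline only: the first centre is a random data point, and each subsequent centre is the data point furthest from the centres chosen so far. The split-and-merge heuristic itself is left to you, and the toy clusters here are made up.

    # Sketch: "furthest-first" initialization for K-means.  The first centre
    # is a random data point; each subsequent centre is the data point
    # furthest (in squared distance) from all centres chosen so far.
    import numpy as np

    def furthest_first_init(X, k, rng):
        centres = [X[rng.randint(len(X))]]
        for _ in range(k - 1):
            # distance of every point to its nearest already-chosen centre
            d = np.min([np.sum((X - c) ** 2, axis=1) for c in centres], axis=0)
            centres.append(X[np.argmax(d)])
        return np.array(centres)

    rng = np.random.RandomState(0)
    # Three well-separated toy clusters; the chosen centres should land in
    # three different clusters.
    X = np.vstack([rng.randn(100, 2) + [5 * i, 0] for i in range(3)])
    print(furthest_first_init(X, 3, rng))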
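
For the vector-quantization speaker-identification idea, a toy sketch that uses SciPy's k-means as the codebook trainer. Random vectors stand in for stacked spectrogram columns, the speaker names are invented, and the codebook size is arbitrary.

    # Sketch: vector-quantization speaker identification.  One small codebook
    # (k-means centroids) is trained per speaker on spectrogram slices; a test
    # utterance is assigned to the speaker whose codebook reconstructs it with
    # the lowest error.  Random vectors stand in for real spectrogram columns.
    import numpy as np
    from scipy.cluster.vq import kmeans, vq

    rng = np.random.RandomState(0)

    def train_codebook(frames, k=16):
        """frames: (num_frames, dim) array of stacked spectrogram columns."""
        codebook, _ = kmeans(frames, k)
        return codebook

    def reconstruction_error(frames, codebook):
        _, dists = vq(frames, codebook)   # distance to the nearest codeword
        return np.mean(dists ** 2)

    # Toy "speakers", each with a different mean spectral shape.
    means = {name: rng.randn(64) for name in ["speaker_a", "speaker_b"]}
    train = {n: m + 0.3 * rng.randn(500, 64) for n, m in means.items()}
    test_frames = means["speaker_b"] + 0.3 * rng.randn(200, 64)

    codebooks = {n: train_codebook(f) for n, f in train.items()}
    errors = {n: reconstruction_error(test_frames, cb)
              for n, cb in codebooks.items()}
    print(errors)
    print("identified speaker:", min(errors, key=errors.get))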