CSC2515: Some project ideas
---------------------------

Optimization
------------

-- Compare the efficiency of two or more optimization methods for training
   the same model, e.g. gradient descent vs. conjugate gradient vs.
   stochastic gradient descent.

-- Implement the multiplicative updates described in

      F. Sha, L. K. Saul, and D. D. Lee (2002). "Multiplicative updates for
      nonnegative quadratic programming in support vector machines."
      Technical Report MS-CIS-02-19, University of Pennsylvania.

   and compare them to at least one other way of solving the QP problem for
   support vector machines. If you want to do more theoretical work, you
   could derive the extension of the updates to the case where the data are
   not linearly separable in the feature space. (A rough sketch of the basic
   update appears in the appendix at the end of this handout.)

-- Investigate the "split-and-merge" heuristic for improving the local
   minimum found by K-means. (One version of the heuristic is sketched in
   the appendix.)

Regression
----------

-- Implement the lasso. Compare it to ridge regression with the ridge
   penalty chosen by cross-validation, on one or more problems. (The
   appendix sketches the kind of comparison intended, using off-the-shelf
   implementations as stand-ins.)

-- Another option is to extend the lasso in some way, e.g. derive a
   "kernel lasso", or apply the lasso trick to classification.

-- Implement the leaps-and-bounds method for variable subset selection.
   Compare it to ridge regression with the ridge penalty chosen by
   cross-validation, on one or more problems. Suggest a way to do variable
   selection when there are too many dimensions to use leaps and bounds
   efficiently, and try out your new method.

Classification
--------------

-- Train a logistic regression classifier to distinguish spam from non-spam
   email. There is example spam data on the web in various places, including
   the web site for the "Elements of Statistical Learning" book.

-- Do vector quantization on a few adjacent columns of spectrograms made
   from clean speech of various speakers. Train one VQ per speaker, and use
   these to do speaker identification by seeing which VQ best reconstructs a
   test spectrogram. See me if you want some multi-person speech data. (The
   appendix sketches this pipeline.)

-- Train a simple HMM on character sequences from text. Train one HMM per
   language and use them to do language identification.

Unsupervised Learning
---------------------

-- Implement mixtures of fully observed trees and compare them to a single
   fully observed tree on modeling some data.

-- Investigate a mixture of factor analyzers model in which there are M
   factors in total, each one a vector of length equal to the dimension of
   the observations. Each mixture component selects exactly K of these M
   factors and uses those K to form its factor loading matrix, Lambda.
   Derive EM learning for this model, and apply it to one simple example
   (e.g. handwritten digits). In the E-step, each mixture component must
   infer the best K factors for it to use, as well as the values of its
   hidden variable z given those factors. In the M-step, all M factors are
   updated. This is an interesting example of parameter sharing, which I
   don't think anyone has looked at before. For now, consider M and K small
   enough that you can search through all (M choose K) factor combinations
   in the E-step.
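
Appendix: illustrative code sketches (Python)
---------------------------------------------

The fragments below are rough sketches of a few of the ideas above, not
reference implementations. Data sets, helper names, and library choices
(NumPy, scikit-learn) are illustrative assumptions, not part of the project
descriptions.

Multiplicative updates for nonnegative QP. The sketch below implements the
basic update from Sha, Saul & Lee (2002), as I recall it, for minimizing
0.5 v'Av + b'v subject to v >= 0; check the exact form against the tech
report. For a separable SVM with no bias term, the dual fits this template
with A_ij = y_i y_j K(x_i, x_j) and b_i = -1.

  import numpy as np

  def nnqp_multiplicative(A, b, n_iter=1000, eps=1e-12):
      # Minimize 0.5 * v'Av + b'v subject to v >= 0 via multiplicative updates.
      # A is assumed symmetric; the iterates stay nonnegative by construction.
      Apos = np.maximum(A, 0.0)       # A+ : positive entries of A
      Aneg = np.maximum(-A, 0.0)      # A- : magnitudes of the negative entries
      v = np.ones(len(b))             # any strictly positive starting point
      for _ in range(n_iter):
          p = Apos @ v + eps          # (A+ v)_i, kept away from zero
          m = Aneg @ v                # (A- v)_i
          v = v * (-b + np.sqrt(b * b + 4.0 * p * m)) / (2.0 * p)
      return v

  # Hypothetical use for the bias-free SVM dual, given a kernel matrix K and
  # labels y in {-1, +1}:
  #   A = np.outer(y, y) * K
  #   alpha = nnqp_multiplicative(A, -np.ones(len(y)))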
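
Split-and-merge for K-means. The heuristic comes in several variants; the
one sketched here merges the two closest centres, splits the cluster with
the largest within-cluster SSE, re-runs K-means from the proposed centres,
and is meant to be accepted only if the objective (inertia) improves. It
assumes scikit-learn is available and K >= 3.

  import numpy as np
  from sklearn.cluster import KMeans

  def split_and_merge_step(X, centers, rng, noise_scale=0.01):
      # Propose new centers by merging the two closest centers and splitting
      # the cluster with the largest within-cluster SSE, then polish with K-means.
      K = len(centers)
      labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
      # the two closest centers are the merge candidates
      d = ((centers[:, None, :] - centers[None]) ** 2).sum(-1)
      np.fill_diagonal(d, np.inf)
      i, j = np.unravel_index(np.argmin(d), d.shape)
      # the cluster with the largest within-cluster SSE (excluding the merge
      # pair) is the split candidate
      sse = [((X[labels == k] - centers[k]) ** 2).sum() if k not in (i, j)
             else -np.inf for k in range(K)]
      worst = int(np.argmax(sse))
      # proposed centers: the untouched ones, one merged center, two split halves
      keep = [centers[k] for k in range(K) if k not in (i, j, worst)]
      noise = noise_scale * X.std(axis=0) * rng.standard_normal(X.shape[1])
      proposal = np.array(keep + [0.5 * (centers[i] + centers[j]),
                                  centers[worst] + noise,
                                  centers[worst] - noise])
      km = KMeans(n_clusters=K, init=proposal, n_init=1).fit(X)
      return km.cluster_centers_, km.inertia_

  # Typical use (X hypothetical): fit KMeans once, then repeatedly call
  # split_and_merge_step(X, centers, np.random.default_rng(0)) and keep the
  # proposal whenever the returned inertia is lower than the current one.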
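
Lasso vs. ridge with a CV-chosen penalty. For the project you would
implement the lasso yourself (e.g. by coordinate descent); the fragment
below only illustrates the comparison being asked for, using scikit-learn's
LassoCV and RidgeCV as stand-ins and the small diabetes data set as a
placeholder problem.

  import numpy as np
  from sklearn.datasets import load_diabetes
  from sklearn.linear_model import LassoCV, RidgeCV
  from sklearn.model_selection import cross_val_score

  X, y = load_diabetes(return_X_y=True)          # placeholder regression data

  models = {
      "lasso": LassoCV(cv=5),                                  # L1 penalty by CV
      "ridge": RidgeCV(alphas=np.logspace(-3, 3, 25), cv=5),   # L2 penalty by CV
  }
  for name, model in models.items():
      # outer 10-fold CV estimates generalization error for each model
      mse = -cross_val_score(model, X, y, cv=10,
                             scoring="neg_mean_squared_error").mean()
      n_nonzero = int(np.sum(np.abs(model.fit(X, y).coef_) > 1e-8))
      print(f"{name}: 10-fold MSE = {mse:.1f}, nonzero coefficients = {n_nonzero}")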
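
VQ-based speaker identification. A sketch of the pipeline, assuming each
spectrogram is an array of shape (frequency bins, time frames), that a
K-means codebook is an acceptable VQ, and with helper names of my own
choosing.

  import numpy as np
  from sklearn.cluster import KMeans

  def make_blocks(spectrogram, width=3):
      # Stack `width` adjacent spectrogram columns into single vectors.
      frames = spectrogram.T                     # one row per time frame
      n = len(frames) - width + 1
      return np.array([frames[t:t + width].ravel() for t in range(n)])

  def train_codebooks(train_spectrograms, n_codes=64, width=3):
      # Fit one VQ codebook (a set of K-means centroids) per speaker.
      # `train_spectrograms` maps speaker name -> list of spectrogram arrays.
      codebooks = {}
      for speaker, specs in train_spectrograms.items():
          blocks = np.vstack([make_blocks(s, width) for s in specs])
          codebooks[speaker] = KMeans(n_clusters=n_codes, n_init=5).fit(blocks)
      return codebooks

  def identify(spectrogram, codebooks, width=3):
      # Return the speaker whose codebook reconstructs the test spectrogram
      # with the smallest average quantization error.
      blocks = make_blocks(spectrogram, width)
      def error(km):
          d = ((blocks[:, None, :] - km.cluster_centers_[None]) ** 2).sum(-1)
          return d.min(axis=1).mean()
      return min(codebooks, key=lambda s: error(codebooks[s]))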