CSC2515, Fall 2003: Some project ideas
--------------------------------------

Optimization:

-- Implement the multiplicative updates described in

      F. Sha, L. K. Saul, and D. D. Lee (2002). "Multiplicative updates
      for nonnegative quadratic programming in support vector machines."
      Technical Report MS-CIS-02-19, University of Pennsylvania.

   and compare them to at least one other fast way of solving the QP
   problem for support vector machines (e.g. SMO, or "Simple SVM").
   If you want to do more theoretical work, you could also derive the
   extension of the updates to the case where the data is not linearly
   separable in the feature space. (Sketched below.)

-- Investigate the "split-and-merge" heuristic for improving the local
   minimum found by K-means. (Sketched below.)

-- Implement a "limited memory BFGS" optimizer, and compare it to
   gradient descent with momentum (or better yet to "delta-bar-delta")
   on two fronts:
   (a) the speed at which it minimizes the training error. It should
       crush gradient descent.
   (b) the generalization quality of the local minima that it finds.
       Here you may discover that it overfits wildly and loses horribly
       to plain gradient descent.
   (Sketched below.)

Regression & Classification Algorithms:

-- Compare the performance of the lasso and ridge regression (with the
   ridge penalty chosen using cross-validation) on several regression
   benchmarks. (Sketched below.)

-- Compare the performance of leaps-and-bounds for variable subset
   selection to ridge regression (with the ridge penalty chosen using
   CV) on several regression benchmarks.

-- Come up with a way to do variable selection when there are too many
   dimensions to use leaps-and-bounds efficiently. Try out your new
   method.

-- Investigate a "kernel lasso".

-- Apply the lasso trick to classification in some way.

-- Compare a Gaussian prior on the weights with adding noise to the
   training data. For example, compare using a quadratic ridge penalty
   on the weights in logistic regression to another way of regularizing
   it, namely adding an "epsilon ghost" of each training case: a new
   case with a very small weight, the same inputs, and the OPPOSITE
   output. (Sketched below.)

Unsupervised Learning:

-- Implement mixtures of fully observed trees and compare them to a
   single fully observed tree on modeling some data.

-- Investigate a mixture-of-factor-analyzers type model in which there
   are a total of M factors, each one a vector of length equal to the
   dimension of the observations. However, for each training case add a
   "lasso-type" regularization penalty on the coefficients used to
   model that training case. This will force the model to set many
   latent variables to zero when explaining any given data case. Derive
   the inference rule for this model, and possibly (if you are keen)
   the EM learning algorithm. Apply it to at least one simple example
   (e.g. handwritten digits). This is an interesting example of
   sparsifying factor analysis, which I don't think anyone has looked
   at before.

Applications:

-- Train a logistic regression network to distinguish spam from
   non-spam email. There is some example spam data on the web in
   various places, including the web site for the "Elements of
   Statistical Learning" book.

-- Do vector quantization on a few adjacent columns of spectrograms
   made from clean speech of various speakers. Train one VQ per person,
   and use these to do speaker identification by seeing which VQ best
   reconstructs a test spectrogram. See me if you want some
   multi-person speech data. (Sketched below.)

-- Train a simple HMM on character sequences from text. Train one HMM
   per language and use them to do language identification. (Sketched
   below.)
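
Code sketches:

Rough, illustrative starting points in Python (numpy / scipy /
scikit-learn) for a few of the ideas above. Each sketch makes
assumptions that are noted in its lead-in and comments; none of them is
a complete project.

-- Multiplicative updates for nonnegative quadratic programming. A
   minimal sketch of the update rule from the Sha, Saul & Lee report,
   for the generic problem: minimize 0.5*v'Av + b'v subject to v >= 0.
   The mapping to a no-bias SVM dual (A_ij = y_i*y_j*K(x_i,x_j), b = -1)
   is an assumption here; check the report for the exact setup.

    import numpy as np

    def nqp_multiplicative(A, b, n_iter=500, eps=1e-12):
        """Approximately minimize 0.5*v'Av + b'v over v >= 0."""
        Aplus = np.maximum(A, 0.0)       # positive part of A
        Aminus = np.maximum(-A, 0.0)     # magnitude of the negative part
        v = np.ones(len(b))              # any strictly positive start works
        for _ in range(n_iter):
            a = Aplus @ v
            c = Aminus @ v
            # element-wise multiplicative update; v stays nonnegative
            v = v * (-b + np.sqrt(b * b + 4.0 * a * c)) / (2.0 * a + eps)
        return v

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        M = rng.standard_normal((20, 5))
        A = M @ M.T + 1e-3 * np.eye(20)  # positive semi-definite test matrix
        b = -np.ones(20)
        v = nqp_multiplicative(A, b)
        print("objective:", 0.5 * v @ A @ v + b @ v, " min(v):", v.min())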
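
-- Split-and-merge for K-means. One simple way to instantiate the
   heuristic: merge the two closest centroids, re-split the cluster
   with the largest within-cluster distortion, rerun K-means, and keep
   the move only if the total distortion drops. These particular
   split/merge rules are assumptions; exploring better ones is the
   point of the project.

    import numpy as np

    def kmeans(X, mu, n_iter=50):
        """Plain K-means from the given initial centroids."""
        for _ in range(n_iter):
            z = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1).argmin(1)
            for k in range(len(mu)):
                if np.any(z == k):          # leave empty clusters alone
                    mu[k] = X[z == k].mean(0)
        d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        z = d.argmin(1)
        return mu, z, d.min(1).sum()        # centroids, assignments, distortion

    def split_and_merge(X, K, n_rounds=10, seed=0):
        rng = np.random.default_rng(seed)
        mu, z, best = kmeans(X, X[rng.choice(len(X), K, replace=False)])
        for _ in range(n_rounds):
            per_point = ((X - mu[z]) ** 2).sum(1)
            worst = np.bincount(z, weights=per_point, minlength=K).argmax()
            pd = ((mu[:, None, :] - mu[None, :, :]) ** 2).sum(-1) + 1e12 * np.eye(K)
            i, j = np.unravel_index(pd.argmin(), pd.shape)
            cand = mu.copy()
            cand[i] = 0.5 * (mu[i] + mu[j])    # merge the two closest centroids
            cand[j] = mu[worst] + 0.01 * rng.standard_normal(X.shape[1])  # re-split worst
            cand, cz, dist = kmeans(X, cand)
            if dist < best:                    # accept the move only if it helps
                mu, z, best = cand, cz, dist
        return mu, best

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        X = np.vstack([rng.standard_normal((100, 2)) + c
                       for c in ([0, 0], [5, 5], [0, 5])])
        mu, distortion = split_and_merge(X, K=3)
        print("final distortion:", distortion)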
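
-- L-BFGS versus gradient descent with momentum. This sketch uses
   scipy's L-BFGS-B routine rather than a from-scratch limited-memory
   BFGS (writing your own is part of the project), and a small
   synthetic logistic regression problem, just to show how the
   training-loss comparison might be set up.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 50))
    w_true = rng.standard_normal(50)
    y = (X @ w_true + 0.5 * rng.standard_normal(200) > 0).astype(float)

    def loss_and_grad(w):
        """Logistic regression negative log-likelihood and its gradient."""
        p = 1.0 / (1.0 + np.exp(-X @ w))
        eps = 1e-12
        loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
        grad = X.T @ (p - y) / len(y)
        return loss, grad

    # limited-memory BFGS via scipy
    res = minimize(loss_and_grad, np.zeros(50), jac=True, method="L-BFGS-B",
                   options={"maxiter": 100})
    print("L-BFGS   loss after 100 iterations:", res.fun)

    # gradient descent with momentum, same budget of iterations
    w, v = np.zeros(50), np.zeros(50)
    lr, mom = 0.1, 0.9
    for t in range(100):
        _, g = loss_and_grad(w)
        v = mom * v - lr * g
        w = w + v
    print("momentum loss after 100 iterations:", loss_and_grad(w)[0])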
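
-- The lasso versus ridge regression with cross-validated penalties.
   scikit-learn is used purely for brevity (any lasso/ridge code will
   do), and its diabetes data set stands in for "several regression
   benchmarks".

    import numpy as np
    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import LassoCV, RidgeCV
    from sklearn.model_selection import cross_val_score

    X, y = load_diabetes(return_X_y=True)
    lasso = LassoCV(cv=5)                                  # penalty chosen by CV
    ridge = RidgeCV(alphas=np.logspace(-3, 3, 25), cv=5)   # penalty chosen by CV

    for name, model in [("lasso", lasso), ("ridge", ridge)]:
        scores = cross_val_score(model, X, y, cv=10,
                                 scoring="neg_mean_squared_error")
        print(f"{name}: mean CV MSE = {-scores.mean():.1f}")

    # the lasso also does variable selection: count the zeroed coefficients
    lasso.fit(X, y)
    print("lasso zeroed", int(np.sum(lasso.coef_ == 0)),
          "of", X.shape[1], "coefficients")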
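
-- The "epsilon ghost" regularizer versus a quadratic ridge penalty in
   logistic regression. Each ghost is a copy of a training case with
   the same inputs, the OPPOSITE label, and a tiny weight; the explicit
   penalty is switched (almost) off for the ghost model. The data set
   and the value of epsilon below are placeholders.

    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
    scaler = StandardScaler().fit(Xtr)
    Xtr, Xte = scaler.transform(Xtr), scaler.transform(Xte)

    # ordinary ridge-penalized logistic regression (C = inverse penalty strength)
    ridge = LogisticRegression(C=1.0, max_iter=2000).fit(Xtr, ytr)

    # "epsilon ghost": append flipped-label copies with a tiny weight,
    # and turn the explicit quadratic penalty (almost) off
    eps = 0.01                       # placeholder value; tune it
    Xg = np.vstack([Xtr, Xtr])
    yg = np.concatenate([ytr, 1 - ytr])
    w = np.concatenate([np.ones(len(ytr)), eps * np.ones(len(ytr))])
    ghost = LogisticRegression(C=1e6, max_iter=2000).fit(Xg, yg, sample_weight=w)

    print("ridge test accuracy:", ridge.score(Xte, yte))
    print("ghost test accuracy:", ghost.score(Xte, yte))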
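
-- VQ-based speaker identification. Real spectrogram frames (a few
   adjacent columns stacked into one vector) would replace the
   synthetic per-speaker Gaussian blobs used here; the codebook size
   and number of speakers are arbitrary.

    import numpy as np
    from scipy.cluster.vq import kmeans, vq

    D, K = 30, 16                     # frame dimension, codebook size
    rng = np.random.default_rng(0)
    centres = {s: 3.0 * rng.standard_normal(D) for s in range(4)}

    def frames(speaker, n, seed):
        """Stand-in for spectrogram frames from one speaker."""
        r = np.random.default_rng(seed)
        return centres[speaker] + r.standard_normal((n, D))

    # one VQ codebook per speaker, trained on that speaker's frames
    train = {s: frames(s, 500, seed=s) for s in range(4)}
    codebooks = {s: kmeans(f, K)[0] for s, f in train.items()}

    def identify(test_frames):
        # pick the speaker whose codebook reconstructs the frames best
        errors = {s: vq(test_frames, cb)[1].sum() for s, cb in codebooks.items()}
        return min(errors, key=errors.get)

    test = frames(2, 100, seed=99)    # new frames from speaker 2
    print("identified speaker:", identify(test))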
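
-- HMM language identification on character sequences. This sketch
   assumes the hmmlearn package and its CategoricalHMM class (older
   releases expose the same discrete-emission model under the name
   MultinomialHMM); you could equally write your own Baum-Welch. The
   file names are placeholders for per-language text corpora.

    import numpy as np
    from hmmlearn.hmm import CategoricalHMM

    ALPHABET = "abcdefghijklmnopqrstuvwxyz "   # crude character set

    def encode(text):
        """Map text to a column of integer symbols, dropping other characters."""
        sym = [ALPHABET.index(c) for c in text.lower() if c in ALPHABET]
        return np.array(sym).reshape(-1, 1)

    def train_hmm(text, n_states=8, seed=0):
        # assumes the training text contains every symbol in ALPHABET
        model = CategoricalHMM(n_components=n_states, n_iter=50, random_state=seed)
        model.fit(encode(text))                 # Baum-Welch training
        return model

    # placeholder corpora, one plain-text file per language
    corpora = {"english": open("english.txt").read(),
               "french": open("french.txt").read()}
    models = {lang: train_hmm(text) for lang, text in corpora.items()}

    def identify(snippet):
        # pick the language whose HMM gives the snippet the highest log-likelihood
        scores = {lang: m.score(encode(snippet)) for lang, m in models.items()}
        return max(scores, key=scores.get)

    print(identify("the quick brown fox jumps over the lazy dog"))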