Backpropagation and multi-module gradient-based learning.
Yann LeCun & Raia Hadsell, 2005

The purpose of this assignment is to implement and experiment
with multi-module gradient-based learning and backpropagation.

Your assignment should be sent by email to the TA
- Send your email in plain text (no msword, no html, no postscript, no pdf).
- You must implement the code yourself, but you are 
  encouraged to discuss your results with other students.
- Include your source code, zipped or tar'd, as an attachment.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
You must implement various modules and learning machine
architectures and train them on the dataset provided
in this directory.

Much of the code is implemented and provided in Lush.
You merely have to copy or rename "modules-template.lsh" 
to "modules.lsh" and fill in the blanks. You must also
modify "main.lsh" to run the experiments and get the results.
The places where you must insert your code are
indicated by the following line in modules-template.lsh:
  (error "you must implement this")
All of the material has been discussed in class, and
much of the necessary information is available in the 
slides on the class website.


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

PART 1:  Implement the necessary modules.
         (modify modules.lsh)

1.1: Implement the euclidean module. 
  The Euclidean module has two vector inputs and one scalar output
  computed as follows:   output =  1/2 || input1 - input2 ||^2
  You must implement the fprop and bprop methods
  of the module.
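In plain Python (an illustration of the math only; your actual solution goes in modules.lsh), the fprop and bprop of the Euclidean module amount to:

```python
def euclidean_fprop(x1, x2):
    # output = 1/2 * || x1 - x2 ||^2
    return 0.5 * sum((a - b) ** 2 for a, b in zip(x1, x2))

def euclidean_bprop(x1, x2):
    # dE/dx1 = x1 - x2,  dE/dx2 = -(x1 - x2)
    dx1 = [a - b for a, b in zip(x1, x2)]
    dx2 = [-g for g in dx1]
    return dx1, dx2
```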

1.2: Implement the linear module.
  The linear module has one input vector and one output vector.
  The input-output function is:
    output = w * input
  where w is a trainable weight matrix.
  You must implement the fprop and bprop methods.
  The bprop method must propagate gradient back to the
  input and to the weights.
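The two backward gradients follow from the chain rule: dx = w^T * dout (gradient to the input) and dw[i][j] = dout[i] * x[j] (gradient to the weights). A minimal Python sketch, for illustration only:

```python
def linear_fprop(w, x):
    # output_i = sum_j w[i][j] * x[j]
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def linear_bprop(w, x, dout):
    # gradient w.r.t. the input: dx = w^T * dout
    dx = [sum(w[i][j] * dout[i] for i in range(len(w)))
          for j in range(len(x))]
    # gradient w.r.t. the weights: dw[i][j] = dout[i] * x[j]
    dw = [[di * xj for xj in x] for di in dout]
    return dx, dw
```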

1.3: Implement the tanh module.
   The tanh module has a vector input and a vector output of the
   same size. The input-output function for each component is:
     output_i = tanh( input_i + bias_i ).
   bias is a vector of (trainable) internal parameters.
   You must implement the fprop and bprop methods.
   The bprop method must compute gradients with respect to
   the input and to the bias vector.
   Use the functions idx-tanh and idx-dtanh defined in sigmoid.lsh
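Since d tanh(u)/du = 1 - tanh(u)^2, and the bias enters through the same sum as the input, the gradients to the input and to the bias are identical component-wise. A Python sketch of the math (your Lush code should use idx-tanh and idx-dtanh instead):

```python
import math

def tanh_fprop(x, bias):
    # output_i = tanh(x_i + bias_i)
    return [math.tanh(xi + bi) for xi, bi in zip(x, bias)]

def tanh_bprop(x, bias, dout):
    # d tanh(u)/du = 1 - tanh(u)^2; the same factor scales the
    # gradient w.r.t. the input and w.r.t. the bias
    d = [(1 - math.tanh(xi + bi) ** 2) * gi
         for xi, bi, gi in zip(x, bias, dout)]
    return d, list(d)  # (dx, dbias)
```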

1.4: Implement nn-layer (one layer of a neural net)
   The nn-layer is a cascade of two modules: a linear
   module followed by a tanh module.
   You must implement the fprop and bprop methods and the 
   constructor.

1.5: Implement nn-2layer (a 2-layer neural net with one hidden layer)
   The nn-2layer is a cascade of two nn-layer modules.
   You must implement the fprop and bprop methods and the constructor.
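In a cascade, fprop runs the modules in order and bprop runs them in reverse, each module passing its input gradient to the one before it. A self-contained Python sketch of the forward pass (illustrative only; layer sizes and weights here are made up):

```python
import math

def matvec(w, x):
    # w * x, with w stored as a list of rows
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def layer_fprop(w, b, x):
    # one nn-layer: a linear module cascaded with a tanh module
    return [math.tanh(s + bi) for s, bi in zip(matvec(w, x), b)]

def net_fprop(params, x):
    # nn-2layer: two nn-layers in cascade; bprop would visit the
    # same two layers in reverse order
    (w1, b1), (w2, b2) = params
    return layer_fprop(w2, b2, layer_fprop(w1, b1, x))
```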

EXTRA CREDIT:
1.6: Devise an automatic scheme for testing the correctness
   of the bprop method of any module. Explain how it would work.

1.7: Implement the above scheme.
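One standard approach (sketched here in Python as a hint, not the required solution) is to perturb each input component and compare the analytic gradient returned by bprop against a centered finite difference of fprop:

```python
def check_grad(fprop, bprop, x, eps=1e-5, tol=1e-6):
    # fprop: vector -> scalar; bprop: vector -> analytic gradient.
    # Returns True when every component of the analytic gradient
    # matches the centered finite difference (f(x+eps) - f(x-eps)) / 2eps.
    analytic = bprop(x)
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        numeric = (fprop(xp) - fprop(xm)) / (2 * eps)
        if abs(numeric - analytic[i]) > tol:
            return False
    return True
```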

________________________________________________________________
PART 2: Train a 2-layer neural net on the isolet dataset.

  To complete this part you must make slight modifications
  to the functions example-isolet-linear and example-isolet-neuralnet
  in main.lsh.

  The dataset is a spoken letter recognition dataset.
  This is a relatively large set with over 600 input features
  per sample, 26 categories, and over 6000 training samples. 
  The dataset contains speech data from 150 subjects who spoke 
  the name of each letter of the alphabet twice. Hence, there are
  52 training examples from each speaker. The features include 
  spectral coefficients (resulting from Fourier transform) as well
  as other features (contour features, sonorant features, 
  pre-sonorant features, and post-sonorant features).
  The file isolet.readme provides additional information 
  about the original data.

  The data is provided in four files in Lush matrix format:
  isolet-train.mat: training set input vectors 
  isolet-train-labels.mat: training set labels (0..25)
  isolet-test.mat: test set input vectors
  isolet-test-labels.mat: test set labels (0..25)

2.1: Train a linear module with a euclidean cost on this
  set using 4000 training samples and 1000 test samples. 
  Report the average loss and the error rate on the 
  training set and the test set. Experiment to find a 
  good learning rate and decay rate.
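The stochastic-gradient loop with a decaying learning rate can be sketched as follows (Python, toy data, and the common schedule eta / (1 + decay * t) are all assumptions here, not the assignment's exact Lush code):

```python
def sgd_linear(data, n_in, n_out, eta=0.1, decay=0.01, epochs=100):
    # data: list of (input vector, target vector) pairs
    w = [[0.0] * n_in for _ in range(n_out)]
    t = 0
    for _ in range(epochs):
        for x, target in data:
            lr = eta / (1 + decay * t)        # decaying learning rate
            out = [sum(wij * xj for wij, xj in zip(row, x)) for row in w]
            dout = [o - y for o, y in zip(out, target)]  # euclidean cost grad
            for i in range(n_out):
                for j in range(n_in):
                    w[i][j] -= lr * dout[i] * x[j]
            t += 1
    return w
```

Too large an eta makes the loss diverge; too small a one (or too fast a decay) makes training crawl, so sweep both and watch the training loss.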

2.2: Train 2-layer networks (with one hidden layer) 
  containing 10, 20, 40, and 80 hidden units using 4000 
  training samples and 1000 test samples. Find a good 
  learning rate and decay rate for each experiment.
  For each network size, report the loss and the error 
  rate on the training set and the test set.
  You should get less than 5% error on the test set
  with an appropriately-sized network (which is better
  than the results reported in the original paper
  that used this data).
  WARNING: a training run will take several minutes.

2.3: Calculate the number of parameters (weights and biases) 
  for each of the 5 machines you experimented with. Plot
  the performance (test set error) of the machines with 
  respect to the number of parameters. You may use any
  plotting tool you like; Lush has a Plotter object
  that can be used.
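For counting, each nn-layer contributes a weight matrix plus a bias vector, while the plain linear module of part 2.1 has weights only. A sketch, assuming a hypothetical input dimension of 600 (the text only says "over 600"; substitute the true dimension of the .mat files):

```python
def linear_params(n_in, n_out):
    # the linear module of 1.2 has a weight matrix and no bias
    return n_in * n_out

def two_layer_params(n_in, n_hidden, n_out):
    # each nn-layer = linear weights plus tanh biases
    return (n_in * n_hidden + n_hidden) + (n_hidden * n_out + n_out)

for h in (10, 20, 40, 80):
    print(h, "hidden units:", two_layer_params(600, h, 26), "parameters")
```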
