Update 5/18/06

Fixed a problem with both dft_statstream.k and grid.k (used by sketch_statstream) where the correlated output pairs reported the stream indices and not the streamIDs.  This caused the outputs to be off by one. 
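
A minimal sketch of the index-versus-ID mix-up, written in Python rather than K.  It assumes that streamIDs are 1-based labels stored in a list and that the correlated pairs are found as 0-based positions; the names stream_ids, pairs_by_id and correlated are hypothetical and do not come from the K files.

```python
# Hypothetical data: streamIDs are assumed to be 1-based labels,
# while positions in the list are 0-based indices.
stream_ids = [1, 2, 3, 4]


def pairs_by_id(pairs, ids):
    """Map each pair of stream positions to the corresponding streamIDs."""
    return [(ids[i], ids[j]) for i, j in pairs]


# Positions of streams found to be correlated (illustrative values only).
correlated = [(0, 2), (1, 3)]

# Reporting `correlated` directly would give indices that are off by one
# relative to the IDs; mapping through stream_ids gives the IDs themselves.
reported = pairs_by_id(correlated, stream_ids)
print(reported)
```

Under these assumptions every reported value differs from its index by exactly one, which matches the symptom described above.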



Update 5/17/06

Comments:

  Some of the changes I have made (and not made):
  
  I corrected spelling errors and documentation errors as I went along.  I have left most of the original author(s)' comments except where I felt they were misleading or where a slight wording change would better communicate what was being coded.  I tried to "pretty-up" the code as I went through it, for example by lining up the indentations and ending brackets.  Since I edited these in Windows, I can only hope the code will still look good on Unix.
  
  There were a number of coding errors that I have corrected as I have worked with these files. Most of these I have communicated in earlier emails.  Some corrections I have made most recently are relatively minor in that they don't impact the key functionality of the program.  For example, in sketch_statstream the comments at the top of the file were from a different file and the example calling parameters were incomplete and wrong.  Also the processargs[] routine was incomplete.  These have been fixed.
  
  I have added numerous comments to document what the functions are doing.  I'm sure some of my comments will seem superfluous to an expert K programmer, oh well :)
  
  I have tried to place all file-level variables at the top of each file with comments.  Some of these variables are hard-wired and are commented as such.  
  
  With many of the routines I have annotated "where used" and "assumed variables".
  
  I have added comments that document the dimensions or expected sizes of input and output parameters of routines.
  
  I have removed a lot of superfluous and dead code.  sketch.k is much smaller than it was.  It has only two routines in it and the whole file could be incorporated into sketch_statstream easily, but I have NOT done that (I was tempted though!)
  
  In grid.k there were a number of duplicate statements and statements that were harmless but made no sense.  For example:
  
    R::2
    R=2;factor::2*c
    R=2;grid:(P;P)#-1
  
  In computeneighbor[] there is still a loop that computes a one-dimensional case of the grid but is never used.  Since loops are generally very expensive in a scripting language, it should be removed, but I left it in as I'm not sure what the author had in mind.  Its cost is incurred only during initialization of the grid.
  
  I decided not to change the asymmetry in distributeQuery[] that I reported earlier.  I'll let the authors do that if they wish.  I made my concerns known.
  
  base.k has many functions in it that are not used and/or are duplicated in other files.  I have not deleted any functions from base.k as it appears to be a "library" of sorts.
  
Lee.

Appendix:  Functional changes to the code:

  
  The list is fortunately short:

1. dft_statstream.k, Report[] function:
  In the 2nd line of the function you access the 3 coordinates and negate them:

     g: -data[index;`dftnorm;!R]
     x: Hash[g]
 
  For the positive correlated pairs you access g again without 
  negation and then negate it again:

     g: data[index;`dftnorm;!R]
     g:-g
     x: Hash[g]
 
  The 2nd lookup of g is superfluous, and the 2nd negation of g creates 
  the same g as used for the negative correlated pairs.  I have 
  confirmed this with printouts, so it is a bug.
  Fixed.
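
  The effect can be sketched in Python (not K).  The list coords is a hypothetical stand-in for data[index;`dftnorm;!R], and the "fixed" branch below is only one plausible way to obtain the un-negated coordinates; it is not claimed to be the exact change made in Report[].

```python
def negate(v):
    """Element-wise negation of a list of coordinates."""
    return [-x for x in v]


# Hypothetical stand-in for data[index;`dftnorm;!R].
coords = [0.25, -0.5, 0.75]

# Negative correlated pairs: hash the negated coordinates.
g_neg = negate(coords)

# Buggy positive branch: look the coordinates up again, then negate again.
# The result is identical to g_neg, so the same grid cell gets hashed twice.
g_pos_buggy = negate(list(coords))

# Assumed fix: use the coordinates without negation for the positive pairs.
g_pos_fixed = list(coords)

print(g_pos_buggy == g_neg)   # the bug: both branches hash the same vector
print(g_pos_fixed == g_neg)   # after the assumed fix they differ
```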

2. dft_statstream.k, Report[] function:
Timestamp ranges are incorrectly reported.

Using the default parameters bw=10, sw=10, n_points = 160, n_stream = 50.
This produces nb=5, and h will range from 0 to 15 as there are 16 basic windows (each of size bw) in the entire stream.

My output.csv shows correlated streams with timepoints that go as high as 170.  That can't be right, since we only generate 160 timepoints.

The problem is the initInd and the endInd in Report[].

I added an offset at the top of Report[], which only needs to be calculated once:
    off: (h-(nb-1))*bw

Then in the inner loop:
    initInd: 1 + off
    endInd:  sw + off

This seems to work and is not so dependent on "magic" numbers.
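
The arithmetic can be checked with a small Python sketch (not K).  It assumes that Report[] is only reached once h >= nb-1 and that the sliding window covers nb basic windows, so sw = nb*bw = 50 here; both are assumptions of this sketch, and report_range is a hypothetical helper name.

```python
def report_range(h, nb, bw, sw):
    """Return (initInd, endInd) for basic-window counter h, using the offset formula."""
    off = (h - (nb - 1)) * bw   # computed once per call to Report[]
    init_ind = 1 + off
    end_ind = sw + off
    return init_ind, end_ind


bw = 10          # basic window size
nb = 5           # basic windows per sliding window
sw = nb * bw     # ASSUMPTION: sliding window spans nb basic windows
n_points = 160   # timepoints generated per stream

# h runs over the basic windows; the first nb-1 are assumed to be skipped.
ranges = [report_range(h, nb, bw, sw) for h in range(nb - 1, n_points // bw)]

print(ranges[0], ranges[-1])
```

Under these assumptions no reported range extends past the 160 generated timepoints.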

3.  I changed the constants used for Pi and sqrt(2)/2 to full-precision values.  In dft_statstream and base.k they are defined as TWOPI and SQRT2O2.
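
As an illustration in Python (not K), the two constants can be derived from the math library so that they carry full double precision.  TWOPI_SHORT is a hypothetical truncated value used only for comparison; the values originally hard-coded in the K files are not reproduced here.

```python
import math

# Full double-precision values, derived rather than typed in by hand.
TWOPI = 2.0 * math.pi
SQRT2O2 = math.sqrt(2.0) / 2.0

# Hypothetical truncated constant, for comparison only.
TWOPI_SHORT = 6.2832

# Absolute error introduced by the truncated constant.
err = abs(TWOPI_SHORT - TWOPI)

print(repr(TWOPI), repr(SQRT2O2), err)
```

An error of this size is multiplied by the frequency index and the time index inside a DFT phase term, so it can grow over a long stream.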

4.  In grid.k I changed the Hash[] function so that it does not rely on "Comparison Tolerance", i.e.,

  Hash: {[x]: PP+ _ _floor x*unit_r}
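
  Why this matters can be sketched in Python (not K).  The function tolerant_floor is only an illustrative model of a floor that snaps to a nearby integer; its tolerance value and rule are assumptions, not K's actual definition.  The values given to PP and UNIT_R are also hypothetical.

```python
import math

PP = 4         # hypothetical value for the grid offset PP
UNIT_R = 2.0   # hypothetical value for unit_r


def tolerant_floor(x, tol=1e-13):
    """Illustrative floor that snaps to the nearest integer when x is within tol of it."""
    nearest = round(x)
    if abs(x - nearest) <= tol * max(1.0, abs(x)):
        return int(nearest)
    return math.floor(x)


def hash_exact(x):
    """Cell index per coordinate using a strict floor (no tolerance)."""
    return [PP + math.floor(v * UNIT_R) for v in x]


def hash_tolerant(x):
    """Cell index per coordinate using the tolerant floor model."""
    return [PP + tolerant_floor(v * UNIT_R) for v in x]


# First coordinate scales to the largest double below 3.0.
point = [1.5 - 2.0 ** -52, -0.25]

print(hash_exact(point), hash_tolerant(point))
```

  With a tolerant floor, a coordinate that lies just below a cell boundary lands in the next cell, so two nearly identical points can be assigned differently depending on rounding noise.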

5. In sketch_statstream the processargs[] routine was incomplete.  It has been completed.