Programming Assignment #4

Computer Science 102 - Fall 2007

Introduction

In this assignment we will study an efficient scheme for for compressing a text file. It is called Huffman coding. A simple method to encode text as a string of 0s and 1s is to represent each character as a unique string of 8 binary digits (bits). A text message is thus translated into a string of bits. By exploiting the fact that not all characters appear with the same frequency in the text, we can encode rarely used characters with long codes and frequently used ones with short codes.

Given a set of characters and their corresponding frequencies, Huffman coding will produce an optimal coding scheme. It uses a binary tree for encoding.

Implementation

We will use a binary heap implementation to generate Huffman codings from an input file.

a) Set up a project/package with the following classes:

public class HuffmanTree
{
   HuffmanNode root;
 
   // - add a constructor to init the Tree from a HuffmanNode
   // - the main method will go here, as well as code to take
   //   a command-line argument for the input file name
}
 
public class HuffmanNode implements Comparable
{
   public String letter; 
   public Double frequency;
   public HuffmanNode left, right;
 
   // - add constructors to initialize letter and frequency, etc.
   // - add int compareTo(Object o) method so we can compare 
   //   two HuffmanNodes based on their frequency attributes. 
   // - we're going to build a .heap. of these HuffmanNodes...
}
 
public class BinaryHeap // the heap class as posted
{
  // - use code provided in class except:
  // - add an int getSize() method that returns 
  //   the # of elements in the heap...  
}
 
You will also need the Overflow class in your project/package

b) Add the following methods:

  1. public HuffmanNode(String letter, Double frequency). This constructor creates a new HuffmanNode where letter is set to l, frequency is set to f, and left and right are set to null.
  2. public HuffmanNode(HuffmanNode left, HuffmanNode right). This constructor creates a new HuffmanNode from its two children (i.e. the two nodes passed as parameters should become children of the new node), setting the letter variable to the concatenation of left.letter & right.letter, and the frequency variable to the sum of left.frequency & right.frequency.
  3. public String toString(). Place this method in the HuffmanNode class. It returns a string of form "<"+letter+", "+frequency+">". There's no need to recursively iterate left/right pointers in this method. It may help you to better debug your assignment if you also add toString() methods in the BinaryHeap and HuffmanTree classes. I'll let you figure out on your own how these methods should behave...
  4. public int compareTo(Object o). Place this method in the HuffmanNode class. This method casts Object o into a HuffmanNode object called huff. We then return this.frequency.compareTo(huff.frequency). This allows us to make a heap of HuffmanNodes where the frequency determines which node is larger than which... (This is the primary reason we make frequency of type Double rather than double...)
  5. public HuffmanTree(HuffmanNode huff)
  6. public void printLegend()is located in the HuffmanTree class, and it calls printLegend(root, ""), which calls private void printLegend(HuffmanNode t, String s) which is a recursive method that works as follows:
  7. public static BinaryHeap fileToHeap(String filename) is located in the HuffmanTree class and takes a String for the file name containing our input (letter & frequency data). The letters and frequencies are all in the first line of the file, with spaces as separators. (You may assume each separator is a single space).
  8. We create a single HuffmanNode for each letter and its frequency, and insert each of these into a new BinaryHeap. (Note: We don't start the Huffman algorithm just yet. We merely call the Binary Heap's constructor that takes an array of Comparables. Don't forget that BinaryHeaps ignore the zeroth entry!!!) .
  9. public static HuffmanTree createFromHeap(BinaryHeap b) is located in the HuffmanTree class. We run the Huffman algorithm here. When we have only one element left in the heap, we remove it, and create a new HuffmanTree object with root set to our removed object...
  10. public static void main(String args[]) calls fileToHeap() on args[0], (which means we specify the name of the file to read at the command-line) and returns a BinaryHeap (bheap, for example). We then call bheap.printHeap() on the heap. Next, we call createFromHeap(bheap) on the heap to run our Huffman algorithm which returns a HuffmanTree, called, here, htree. Finally, we call htree.printLegend() on this HuffmanTree object to print the binary encodings for each of the letters in our input file...

C. The data for this program is posted on the web at huff.txt; or if for some reason you cannot download the data from the web, you can make your own file. (The more files you can test on, the better!) The legend for huff.txt is:

A 20 E 24 G 3 H 4 I 17 L 6 N 5 O 10 S 8 V 1 W 2

Be sure to end all contents of the files with a newline!!!

The HuffmanTree class should be runnable from the command-line with huff.txt as input using something like:

java <package>.HuffmanTree huff.txt // pkg is optional

If you are using JCreator or NetBeans, be sure to select the option to execute with command-line arguments, and specify the name of the input file that way. For Eclipse, add the file-name to the run configuration in the Run/Run. menu in the .Program arguments. field.