Lab 5 notes

Slides from Lab 5 can be found here: lab5_slides.pdf

Code demonstration

Demonstration of learning decision trees with scikit-learn and displaying them as PDFs: tree_display.py
(also includes an example of imputing missing values with the mean of the population).
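
For readers without tree_display.py to hand, the following is a minimal sketch of the same workflow, not the script itself. It assumes synthetic data from make_classification as a stand-in for the real dataset, scikit-learn's SimpleImputer for the mean imputation, and a hypothetical output file name example_tree.dot.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Synthetic stand-in data (an assumption; this is not the lab's dataset).
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=4, random_state=0
)

# Blank out roughly 5% of the entries to simulate missing values.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

# Replace each missing entry with the mean of its column.
imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)

# Learn a small tree on the imputed data.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_filled, y)

# export_graphviz returns the .dot source as a string when out_file is None.
dot_source = export_graphviz(clf, out_file=None, filled=True)

# Hypothetical file name, chosen only for this sketch.
with open("example_tree.dot", "w") as f:
    f.write(dot_source)
```

The resulting .dot file is plain text and can be inspected in any editor before rendering.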

Outputs are written as .dot files, which can be converted to PDF with the following command (requires Graphviz):
dot -Tpdf input.dot > output.pdf
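
When many .dot files need converting, the same step can be driven from Python. This is a rough sketch that assumes the Graphviz dot executable is on the PATH; the helper names dot_to_pdf_command and convert are made up for illustration. It uses dot's -o option to name the output file instead of shell redirection.

```python
import shutil
import subprocess


def dot_to_pdf_command(dot_path, pdf_path):
    """Build the argument list for: dot -Tpdf dot_path -o pdf_path"""
    return ["dot", "-Tpdf", dot_path, "-o", pdf_path]


def convert(dot_path, pdf_path):
    """Render a .dot file to PDF, failing clearly if Graphviz is missing."""
    if shutil.which("dot") is None:
        raise RuntimeError("Graphviz 'dot' executable not found on PATH")
    # check=True raises CalledProcessError if dot exits with an error.
    subprocess.run(dot_to_pdf_command(dot_path, pdf_path), check=True)
```

For example, convert("input.dot", "output.pdf") produces the same file as the command above.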

Example trees learned on the Pima Indians dataset
The data is available in the UCI repository. (For background, see the Wikipedia page about the Pima people.)

Controlling the max depth:
    max_depth1.dot.pdf
    max_depth5.dot.pdf
    max_depth10.dot.pdf
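
The effect of the depth limit can also be checked numerically. The sketch below assumes synthetic data from make_classification, not the dataset used for the PDFs above, and reports the depth and leaf count actually reached for each max_depth setting.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; not the lab's dataset).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=0
)

# Maps each max_depth setting to (actual depth, number of leaves).
depth_results = {}
for max_depth in (1, 5, 10):
    clf = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    clf.fit(X, y)
    depth_results[max_depth] = (clf.get_depth(), clf.get_n_leaves())
    print(max_depth, depth_results[max_depth])
```

A tree may stop short of the limit if every leaf becomes pure first, so max_depth is an upper bound and not a target.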

Controlling the min samples per leaf:
    min_samples_1.dot.pdf
    min_samples_10.dot.pdf
    min_samples_20.dot.pdf
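
The leaf-size limit can be inspected in a similar way. This sketch assumes the file names above correspond to scikit-learn's min_samples_leaf parameter and again uses synthetic data as a stand-in. It reads the fitted tree's node arrays to find the smallest leaf.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; not the lab's dataset).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=0
)

# Maps each min_samples_leaf setting to (smallest leaf size, number of leaves).
leaf_results = {}
for min_samples_leaf in (1, 10, 20):
    clf = DecisionTreeClassifier(
        min_samples_leaf=min_samples_leaf, random_state=0
    )
    clf.fit(X, y)
    tree = clf.tree_
    # Leaves are the nodes with no left child (marked by -1).
    is_leaf = tree.children_left == -1
    smallest = int(tree.n_node_samples[is_leaf].min())
    leaf_results[min_samples_leaf] = (smallest, int(is_leaf.sum()))
    print(min_samples_leaf, leaf_results[min_samples_leaf])
```

Larger values forbid splits that would create small leaves, which tends to give smaller trees.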