1. Download KNIME to complete this assignment.
  2. Before you start working on this assignment, study the KNIME Quickstart Guide to get familiar with the tool.
  3. Download the file adult.csv available in the data folder on the KNIME Hub. The data are provided by the UCI Machine Learning Repository.

Part 1:

  1. Create a new workflow and name it your_firstname_lastname_decision_tree.
  2. In your workflow, train a Decision Tree to predict whether or not a person earns more than $50K per year:
    • Partition the dataset into a training set (75%) and a test set (25%). In your Partitioning node, apply the stratified sampling option on the income column.
    • Train a Decision Tree model on the training set, and apply the model to the test set.
  3. Use the Scorer node to evaluate the accuracy of the model.
  4. Change the label of each node from the default (such as “Node 1”) to a brief description (e.g., “Read data from adult.csv”).
  5. Try out other parameter settings to get a higher accuracy. For example, change the quality measure, pruning method, or minimum number of records.
  6. When you are satisfied, export your workflow project and change the file name to include your full name, e.g., “potter_henry_decision_tree.knwf”.
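
For intuition about what the stratified 75/25 partitioning and the accuracy score in the steps above compute, here is a minimal Python sketch. It is not how KNIME implements the Partitioning or Scorer nodes; the function names, the toy rows, and the label values are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key, train_fraction=0.75, seed=42):
    """Split rows so each label keeps roughly the same share in both sets."""
    rng = random.Random(seed)

    # Group the rows by their label value (the "strata").
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    # Shuffle each group separately and cut it at the same fraction.
    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical toy data: 300 rows labeled "<=50K" and 100 rows labeled ">50K".
rows = [{"id": i, "income": ">50K" if i % 4 == 0 else "<=50K"} for i in range(400)]
train, test = stratified_split(rows, "income")
```

With this toy data, both the training and the test set keep the original 3:1 ratio between the two income labels, which is what the stratified sampling option is meant to guarantee.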

Part 2:

  1. Research and prepare answers to the following questions in a Word or PDF file:
    • What’s the purpose of applying the stratified sampling option on the income column?
    • What’s the purpose of pruning and of the minimum number of records?
    • How did you change the parameter settings to get a higher accuracy?