# dianecai/decision-trees

No description, website, or topics provided.
Switch branches/tags
Nothing to show
Latest commit 06edf49 Nov 8, 2018
Type Name Latest commit message Commit time
Failed to load latest commit information.
.ipynb_checkpoints Nov 8, 2018
assets Nov 8, 2018
data Nov 8, 2018
practice Nov 8, 2018
slides Nov 8, 2018
CHANGELOG.md Nov 8, 2018
README.md Nov 8, 2018
decision_trees.ipynb Nov 8, 2018
ensembles_random_forests.ipynb Nov 8, 2018

# Decision Trees, Random Forests

Unit 4: Required

## Materials We Provide

Lesson Part 1. Decision Trees Here
Part 2. Ensembles & Random Forests Here
Datasets MLB player data hitters.csv
Titanic survivors titanic.csv
Vehicles (test) vehicles_test.csv
Vehicles (train) vehicles_train.csv
Practice Random Forest from Scratch Here
Source Materials Original files used to create this lesson Here

The MLB player data and Titanic datasets were chosen for this lesson because they are small enough to make very interpretable decision trees without much tuning. The vehicle data is very small, chosen as an easy example students could do by hand. It also provides a pre-split test set.

## Learning Objectives

This lesson has two parts. Decision trees are introduced in part one, while random forests and other ensemble methods are elaborated on in part two. If you have limited class time, focus specifically on part one.

### Part One: Decision Trees

After this lesson, students will be able to:

• Explain how a decision tree is created.
• Build a decision tree model in scikit-learn.
• Tune a decision tree model and explain how tuning impacts the model.
• Interpret a tree diagram.
• Describe the key differences between regression and classification trees.
• Decide whether or not a decision tree is an appropriate model for a given problem.

### Part Two: Ensembles & Random Forests

After this lesson, students will be able to:

• Define how and why decision trees can be improved using bagging and random forests.
• Build random forest models for classification and regression.
• Explain how to extract the most important predictors in a random forest model.

## Student Requirements

Before this lesson, students should already be able to:

• Build and evaluate classification models in sklearn
• Explain how and when to apply different resampling methods
• Define the concepts of cross-validation and overfitting

## Lesson Outline

### Lesson Outline (Part One: Decision Trees)

TOTAL (170 min)

• Introduction
• A: Regression Trees (115 min)
• Group Exercise (20 min)
• Building a Regression Tree by Hand (30 min)
• How Does a Computer build a Regression Tree? (15 min)
• Demo: Choosing the Ideal Cutpoint for a Given Feature (15 min)
• Building a Regression Tree in scikit-learn (10 min)
• What Happens When We Grow a Tree Too Deep? (5 min)
• Tuning a Regression Tree (15 min)
• Making Predictions for the Testing Data (5 min)
• B: Classification Trees (45 min)
• Comparing Regression Trees and Classification Trees (10 min)
• Splitting Criteria for Classification Trees (20 min)
• Building a Classification Tree in scikit-learn (15 min)
• Summary: Comparing Decision Trees With Other Models (10 min)

### Lesson Outline (Part Two: Ensembles & Decision Trees)

TOTAL (100 min - to leave room for projects if needed)

• Introduction
• A: Manual Ensembling (15 min)
• B: Bagging (30 min)
• Manually Implementing Bagged Decision Trees (15 min)
• Bagged Decision Trees in scikit-learn (5 min)
• Estimating Out-of-Sample Error (10 min)
• C: Random Forests (10 min)
• D: Building and Tuning Decision Trees and Random Forests (30 min)
• Optional: Predicting Salary With a Decision Tree (15 min)
• Predicting Salary With a Random Forest (10 min)
• Comparing Random Forests With Decision Trees (5 min)
• E (Optional): Tuning Individual Parameters (15 min)
• Summary