# dat14syd/5-experiments-hypothesis-tests

# Experiments and Hypothesis Testing

Unit 2, Required

## Materials We Provide

| Topic | Description | Link |
| --- | --- | --- |
| Lesson | Experiments and Hypothesis Testing Lesson | Here |
| Solutions | Sample Solutions for Lesson Sections | Here |
| Practice | Individual Practice Activity (includes data and sample solutions) | Here |
| Extra Materials | French Fry Study | Here |

In this lesson, we'll use an online CSV file of advertising data taken from the book "An Introduction to Statistical Learning". The dataset is easy to understand and lets students compare sales across three advertising media.
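As a rough sketch of the kind of comparison this dataset supports, the snippet below computes a variance-covariance matrix and a correlation matrix with Pandas. The column names (`TV`, `radio`, `newspaper`, `sales`) are an assumption about the CSV's layout. The rows are synthetic random numbers standing in for the real file, so the printed values say nothing about the actual data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the advertising CSV: random numbers, NOT values from the book.
# The column names are an assumption about the file's layout.
rng = np.random.default_rng(0)
n = 200
ads = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "radio": rng.uniform(0, 50, n),
    "newspaper": rng.uniform(0, 100, n),
})
# Made-up relationship, only so that there is something to measure.
ads["sales"] = 3 + 0.05 * ads["TV"] + 0.2 * ads["radio"] + rng.normal(0, 1.5, n)

# Variance-covariance matrix: variances on the diagonal, covariances elsewhere.
cov_matrix = ads.cov()

# Pearson correlation matrix: covariances rescaled to lie between -1 and 1.
corr_matrix = ads.corr()

# Correlation is the covariance divided by the product of the standard deviations.
manual_corr = cov_matrix.loc["TV", "sales"] / (ads["TV"].std() * ads["sales"].std())

# Rank the three media by their correlation with sales.
print(corr_matrix["sales"].drop("sales").sort_values(ascending=False))
```

With the real file, the `ads` frame would come from `pd.read_csv` on the lesson's CSV instead of the random generator; the rest of the sketch would stay the same.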

## Learning Objectives

By the end of this lesson, students will be able to:

• Explain the difference between causation and correlation
• Determine causality and sampling bias using Directed Acyclic Graphs
• Identify what missing data is and how to handle it
• Test a hypothesis using a sample case study

## Student Requirements

Before this lesson, students should already be able to:

• Perform basic data analysis in Pandas
• Explain bias, variance, and correlation at a basic level
• Create basic visualizations in Seaborn
• Recognize the major considerations in experimental design

## Lesson Outline

TOTAL (170 min)

• Data Source (10 min)
    • What are the features/covariates/predictors?
    • What is the outcome/response?
    • What do you think each row in the dataset represents?
• Math review (40 min)
    • Covariance (15 min)
    • Correlation (10 min)
    • The variance-covariance matrix (15 min)
• Causation and Correlation (10 min)
    • Structure of causal claims
    • Why do we care?
    • How do we determine if something is causal?
• Pearlean Causal DAG model (15 min)
    • What is a DAG?
    • It's possible that X causes Y.
    • Y causes X.
    • The correlation between X and Y is not statistically significant.
    • X or Y may cause one or the other indirectly through another variable.
    • There is a third common factor that causes both X and Y.
    • Both X and Y cause a third variable and the dataset does not represent that third variable evenly.
    • Controlled Experiments
    • When is it OK to rely on association?
    • How does association relate to causation?
• Sampling bias (15 min)
    • Forms of sampling bias
    • Problems from sampling bias
    • Recovering from sampling bias
    • Stratified random sampling
• Missing data (20 min)
    • Types of missing data
    • De minimis
    • Class imbalance
    • Relation to machine learning
• Introduction to Hypothesis Testing (20 min)
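
One of the causal structures listed in the outline, where both X and Y cause a third variable and the dataset does not represent that third variable evenly, can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not material from the lesson notebook. The variable names, the noise scale, and the selection threshold `z > 1.0` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# X and Y are generated independently, so neither causes the other.
x = rng.normal(size=n)
y = rng.normal(size=n)

# Z is caused by both X and Y (a "collider" in DAG terms).
z = x + y + rng.normal(scale=0.5, size=n)

# A dataset that only keeps rows with large Z represents Z unevenly.
selected = z > 1.0

# Correlation of X and Y in the full population versus the selected rows.
corr_all = np.corrcoef(x, y)[0, 1]
corr_selected = np.corrcoef(x[selected], y[selected])[0, 1]

print(f"all rows:      {corr_all:+.3f}")
print(f"selected rows: {corr_selected:+.3f}")
```

In the full sample the correlation between X and Y should be close to zero, while the selected rows should show a clearly negative correlation even though X and Y are independent by construction. This is the sense in which sampling on a common effect can produce an association with no causal link behind it.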