
Experiments and Hypothesis Testing

Unit 2, Required

Materials We Provide

Topic | Description | Link
--- | --- | ---
Lesson | Experiments and Hypothesis Testing Lesson | Here
Solutions | Sample Solutions for Lesson Sections | Here
Practice | Individual Practice Activity (includes data and sample solutions) | Here
Extra Materials | French Fry Study | Here

In this lesson, we'll use an online CSV file of advertising data taken from the book "An Introduction to Statistical Learning". The dataset is easy to understand and lets students compare sales data across three advertising media.
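
As a rough sketch of what a first look at such a file might involve, the snippet below parses a tiny inline stand-in with Python's standard `csv` module and separates predictors from the outcome. The column names (`TV`, `radio`, `newspaper`, `sales`) and all three rows are assumptions invented for illustration, not values from the real file. In the lesson itself, the data would more likely be loaded with pandas.

```python
import csv
import io

# Hypothetical stand-in for the online CSV; column names and values are
# invented for illustration and are NOT taken from the real dataset.
SAMPLE_CSV = """TV,radio,newspaper,sales
100.0,20.0,30.0,12.0
50.0,40.0,10.0,9.5
200.0,5.0,60.0,18.0
"""

def load_rows(text):
    """Parse CSV text into a list of dicts whose values are floats."""
    reader = csv.DictReader(io.StringIO(text))
    return [{name: float(value) for name, value in row.items()} for row in reader]

rows = load_rows(SAMPLE_CSV)
outcome = "sales"                                           # assumed response
features = [name for name in rows[0] if name != outcome]   # assumed predictors

print("features:", features)
print("outcome:", outcome)
print("number of rows:", len(rows))
```

Each parsed row is one observation, which is a convenient starting point for the "what does each row represent?" discussion in the outline below.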

Learning Objectives

By the end of this lesson, students will be able to:

  • Explain the difference between causation and correlation
  • Determine causality and sampling bias using Directed Acyclic Graphs
  • Identify what missing data is and how to handle it
  • Test a hypothesis using a sample case study

Student Requirements

Before this lesson, students should already:

  • Be able to perform basic data analysis in Pandas
  • Have a basic understanding of bias, variance, and correlation
  • Be able to create basic visualizations in Seaborn
  • Have some exposure to major considerations within experimental design

Lesson Outline

TOTAL (170 min)

  • Data Source (10 min)
    • What are the features/covariates/predictors?
    • What is the outcome/response?
    • What do you think each row in the dataset represents?
  • Math review (40 min)
    • Covariance (15 min)
    • Correlation (10 min)
    • The variance-covariance matrix (15 min)
  • Causation and Correlation (10 min)
    • Structure of causal claims
    • Why do we care?
    • How do we determine if something is causal?
  • Pearlian Causal DAG model (15 min)
    • What is a DAG?
    • It's possible that X causes Y.
    • Y causes X.
    • The correlation between X and Y is not statistically significant.
    • X or Y may cause one or the other indirectly through another variable.
    • There is a third common factor that causes both X and Y.
    • Both X and Y cause a third variable and the dataset does not represent that third variable evenly.
    • Controlled Experiments
    • When is it OK to rely on association?
    • How does association relate to causation?
  • Sampling bias (15 min)
    • Forms of sampling bias
    • Problems from sampling bias
    • Recovering from sampling bias
      • Stratified random sampling
  • Missing data (20 min)
    • Types of missing data
    • De minimis
    • Class imbalance
      • Relation to machine learning
  • Introduction to Hypothesis Testing (20 min)
    • Validate your findings
    • Confidence intervals
    • Error types
  • Scenario (40 min)
    • Exercises
    • Statistical Tests
    • Interpret your results
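
The Math review items in the outline (covariance, correlation, and the variance-covariance matrix) can be sketched in plain Python as follows. This is a minimal illustration, not the lesson's own code: the function names are hypothetical, the toy numbers are invented, and the sample (n - 1) denominator is an assumption about which convention the lesson uses.

```python
import math

def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)

def covariance(xs, ys):
    """Sample covariance, using the n - 1 denominator (an assumed convention)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance rescaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

def covariance_matrix(columns):
    """Variance-covariance matrix: variances on the diagonal, covariances elsewhere."""
    return [[covariance(a, b) for b in columns] for a in columns]

# Invented toy values: sales is exactly twice spend, so the two are
# perfectly linearly related.
spend = [1.0, 2.0, 3.0, 4.0]
sales = [2.0, 4.0, 6.0, 8.0]

cov = covariance(spend, sales)
corr = correlation(spend, sales)
matrix = covariance_matrix([spend, sales])

print("covariance:", cov)
print("correlation:", corr)
print("variance-covariance matrix:", matrix)
```

Because covariance depends on the units of both variables, it is hard to compare across pairs of variables; dividing by the standard deviations gives a correlation that always falls between -1 and 1. A high correlation by itself says nothing about which of the causal structures listed in the outline produced it.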

Additional Resources

For more information on this topic, check out the following resources: