
Experiments and Hypothesis Testing

Unit 2, Required

Materials We Provide

| Topic | Description | Link |
| --- | --- | --- |
| Lesson | Experiments and Hypothesis Testing Lesson | Here |
| Solutions | Sample Solutions for Lesson Sections | Here |
| Practice | Individual Practice Activity (includes data and sample solutions) | Here |
| Extra Materials | French Fry Study | Here |

In this lesson, we'll use an online CSV file of advertising data taken from the book "An Introduction to Statistical Learning". The dataset is easy to understand and lets students compare sales across three advertising media (TV, radio, and newspaper).
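As a rough sketch of how the data might be loaded and split into features and outcome with Pandas: the column names (`TV`, `radio`, `newspaper`, `sales`) are an assumption about the CSV's header, the `load_advertising` helper is hypothetical, and the three rows below are made-up placeholder values, not real records from the dataset. In the lesson itself, you would pass the CSV's URL or path instead of the in-memory sample.

```python
import io

import pandas as pd

# Placeholder rows standing in for the real CSV. The values are illustrative
# only, and the column names are an assumption about the file's header.
SAMPLE_CSV = """TV,radio,newspaper,sales
100.0,20.0,30.0,12.0
200.0,10.0,40.0,15.0
50.0,30.0,10.0,8.0
"""


def load_advertising(source):
    """Read advertising data from a path, URL, or file-like object."""
    return pd.read_csv(source)


ads = load_advertising(io.StringIO(SAMPLE_CSV))

# Features (covariates/predictors) are the advertising budgets;
# the outcome (response) is sales.
features = ads.drop(columns="sales")
outcome = ads["sales"]

print(ads.shape)
print(list(features.columns))
```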

Learning Objectives

By the end of this lesson, students will be able to:

  • Explain the difference between causation and correlation
  • Reason about causality and sampling bias using directed acyclic graphs (DAGs)
  • Identify what missing data is and how to handle it
  • Test a hypothesis using a sample case study

Student Requirements

Before this lesson, students should already:

  • Be able to perform basic data analysis in Pandas
  • Have a basic understanding of bias, variance, and correlation
  • Be able to create basic visualizations in Seaborn
  • Have some exposure to major considerations within experimental design

Lesson Outline

TOTAL (170 min)

  • Data Source (10 min)
    • What are the features/covariates/predictors?
    • What is the outcome/response?
    • What do you think each row in the dataset represents?
  • Math review (40 min)
    • Covariance (15 min)
    • Correlation (10 min)
    • The variance-covariance matrix (15 min)
  • Causation and Correlation (10 min)
    • Structure of causal claims
    • Why do we care?
    • How do we determine if something is causal?
  • Pearlian Causal DAG model (15 min)
    • What is a DAG?
    • X may cause Y.
    • Y may cause X.
    • The correlation between X and Y may not be statistically significant.
    • X or Y may cause the other indirectly through another variable.
    • A third common factor may cause both X and Y.
    • Both X and Y may cause a third variable that the dataset does not represent evenly.
    • Controlled Experiments
    • When is it OK to rely on association?
    • How does association relate to causation?
  • Sampling bias (15 min)
    • Forms of sampling bias
    • Problems from sampling bias
    • Recovering from sampling bias
      • Stratified random sampling
  • Missing data (20 min)
    • Types of missing data
    • De minimis
    • Class imbalance
      • Relation to machine learning
  • Introduction to Hypothesis Testing (20 min)
    • Validate your findings
    • Confidence intervals
    • Error types
  • Scenario (40 min)
    • Exercises
    • Statistical Tests
    • Interpret your results
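
For the math review portion of the outline, here is a minimal sketch of sample covariance, Pearson correlation, and the variance-covariance matrix using NumPy. The arrays `x` and `y` are toy values chosen for illustration, and the helper names `covariance` and `correlation` are hypothetical; the lesson notebook may present these differently.

```python
import numpy as np

# Toy data for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 9.0])


def covariance(a, b):
    """Sample covariance, using the n - 1 denominator."""
    return ((a - a.mean()) * (b - b.mean())).sum() / (len(a) - 1)


def correlation(a, b):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(a, b) / np.sqrt(covariance(a, a) * covariance(b, b))


cov_xy = covariance(x, y)
corr_xy = correlation(x, y)

# np.cov returns the 2x2 variance-covariance matrix: variances on the
# diagonal, the covariance of x and y off the diagonal.
cov_matrix = np.cov(x, y)

print(cov_xy, corr_xy)
print(cov_matrix)
```

The hand-written helpers are there to show the definitions; in practice `np.cov` and `np.corrcoef` (or `DataFrame.cov` and `DataFrame.corr` in Pandas) compute the same quantities.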

Additional Resources

For more information on this topic, check out the following resources: