b0a3fc6 Dec 2, 2019
bpoulin-CUNY first push of files

Exploratory Data Analysis in Pandas

Unit 2: Required


Materials We Provide

Topic    | Description                                          | Link
---------|------------------------------------------------------|-----
Lesson   | Pandas for Exploratory Data Analysis (ipynb slides)  | Here
Solution | Completed template from lesson                       | Here
Practice | Prompts to practice EDA in Pandas                    | Here
         | Data for EDA Practice                                | Here
         | Sample Solutions for EDA Practice                    | Here
Datasets | Country/continent/servings of alcohol                | Here
         | UFO sighting records                                 | Here
         | Movie & Title Info from IMDB                         | Here
         | User Info from IMDB                                  | Here
         | Movie & Title Info from IMDB                         | Here

This lesson purposefully uses a large number of datasets so that students can practice opening different types of data files. It is therefore worth emphasizing that students should look at each file by hand to identify its separator and whether it has a header row before loading it. Having many datasets also lets the lesson explore a variety of themes that no single dataset would cover.
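Looking at a file by hand can be as simple as printing its first few raw lines before handing it to Pandas. The helper below is a hypothetical sketch (the name `peek` is not part of the lesson materials) that uses only the standard library. Printing each line with `repr()` makes a tab visible as `\t`, which is otherwise easy to mistake for spaces.

```python
from itertools import islice


def peek(path, n=3):
    """Return the first n raw lines of a text file without parsing them."""
    with open(path, encoding="utf-8") as f:
        # islice stops after n lines, so large files are not read in full.
        return [line.rstrip("\n") for line in islice(f, n)]
```

For example, `for line in peek("path/to/dataset.tsv"): print(repr(line))`, where the path is a placeholder for one of the dataset files. The output shows both the separator and whether the first line looks like a header.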

Note: The datasets come in three formats: ".csv" files are comma-separated, ".tsv" files are tab-separated, and ".tbl" files are separated by the "|" (pipe) character.
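Once the separator is known, all three formats can be loaded with `pd.read_csv` by passing `sep`. The sketch below chooses the separator from the file extension according to the note above. The function name `read_dataset` is hypothetical. Because the header still has to be checked by hand, extra keyword arguments such as `header=None` and `names=[...]` are passed straight through to `pd.read_csv`.

```python
from pathlib import Path

import pandas as pd

# Separator for each dataset extension, as described in the note above.
SEPARATORS = {".csv": ",", ".tsv": "\t", ".tbl": "|"}


def read_dataset(path, **kwargs):
    """Read a .csv, .tsv, or .tbl file into a DataFrame, choosing sep by extension.

    Extra keyword arguments (e.g. header=None, names=[...]) are forwarded to pd.read_csv.
    """
    suffix = Path(path).suffix.lower()
    if suffix not in SEPARATORS:
        raise ValueError(f"unexpected extension {suffix!r}; expected one of {sorted(SEPARATORS)}")
    return pd.read_csv(path, sep=SEPARATORS[suffix], **kwargs)
```

For example, `read_dataset("path/to/file.tbl", header=None)` loads a pipe-separated file that has no header row, and adding `names=[...]` supplies the column names.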


Learning Objectives

  • Explain the definition and purpose of Pandas in a data science context
  • Manipulate Pandas DataFrames and Series
  • Filter and sort Pandas data
  • Manipulate DataFrame columns
  • Explain how to handle null and missing values
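The objectives above map onto a small set of core DataFrame operations. The sketch below shows one or two idiomatic calls for each on a tiny DataFrame. The column names are loosely modeled on the country/continent/servings-of-alcohol dataset but are assumptions, and the rows and values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Invented toy data (not the lesson's dataset); one value is missing on purpose.
drinks = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "continent": ["EU", "AS", "EU", "AF"],
    "beer_servings": [245, 0, 361, np.nan],
})

# A single column is a Series; a list of columns keeps a DataFrame.
beer = drinks["beer_servings"]
subset = drinks[["country", "beer_servings"]]

# Filter with a boolean mask; combine conditions with & and wrap each in parentheses.
europe_heavy = drinks[(drinks["continent"] == "EU") & (drinks["beer_servings"] > 300)]

# Sort rows by a column; missing values are placed last by default.
by_beer = drinks.sort_values("beer_servings", ascending=False)

# Rename, add, and drop columns; each call returns a new DataFrame.
renamed = drinks.rename(columns={"beer_servings": "beer"})
with_flag = renamed.assign(drinks_beer=renamed["beer"] > 0)
trimmed = with_flag.drop(columns=["continent"])

# Count missing values per column, then either drop or fill them.
missing_per_column = drinks.isna().sum()
dropped = drinks.dropna()
filled = drinks.fillna({"beer_servings": 0})
```

Because none of these calls modify `drinks` in place, students can compare each result against the original DataFrame.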

Student Requirements

Before this lesson, students should already be able to:

  • Recall and define basic syntax for Python code

Lesson Outline

Instructor Note: Start with the lesson's Jupyter slide deck, then walk the students through the lab. Stop periodically to let students try the challenges; each is typically 1-3 lines of code closely modeled on what was just discussed.

TOTAL: 170 mins

  • What is Pandas (20 mins)
  • Reading Files, Selecting Columns, and Summarizing (15 mins)
    • EXERCISE ONE (15 mins)
  • Filtering and Sorting (15 mins)
    • EXERCISE TWO (15 mins)
  • Renaming, Adding, and Removing Columns (15 mins)
  • Handling Missing Values (15 mins)
    • EXERCISE THREE (15 mins)
  • Split-Apply-Combine (15 mins)
    • EXERCISE FOUR (15 mins)
  • Selecting Multiple Columns and Filtering Rows (10 mins)
  • Joining (Merging) DataFrames (5 mins)
  • OPTIONAL: Other Commonly Used Features
  • OPTIONAL: Other Less Used Features of Pandas
  • Summary
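Split-apply-combine and joining are often the least familiar steps in the outline. The sketch below shows both on two invented tables. The names `users` and `ratings` and their columns are hypothetical, loosely echoing the IMDB user and movie files, and are not the lesson's actual data.

```python
import pandas as pd

# Invented toy tables; table and column names are hypothetical.
users = pd.DataFrame({
    "user_id": [1, 2, 3],
    "occupation": ["student", "writer", "student"],
    "age": [20, 45, 24],
})
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 4],
    "rating": [4, 5, 3, 2],
})

# Split rows by occupation, apply mean() to age, combine the results into a Series.
mean_age = users.groupby("occupation")["age"].mean()

# Several aggregations at once produce a DataFrame with one column per function.
age_stats = users.groupby("occupation")["age"].agg(["count", "mean", "max"])

# Join on the shared key. how="inner" keeps only user_ids present in both tables;
# how="left" keeps every rating row and fills unmatched user columns with NaN.
inner = ratings.merge(users, on="user_id", how="inner")
left = ratings.merge(users, on="user_id", how="left")
```

Comparing `inner` and `left` makes the effect of `how` concrete: the rating from `user_id` 4 disappears from the inner join but survives in the left join with a missing occupation.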

Additional Resources

For more information on this topic, check out the following resources: