Exploratory Data Analysis in Pandas
Unit 2: Required
Materials We Provide
Topic | Description | Link |
---|---|---|
Lesson | Pandas for Exploratory Data Analysis (ipynb slides) | Here |
Solution | Completed template from lesson | Here |
Practice | Prompts to practice EDA in Pandas | Here |
Practice | Data for EDA Practice | Here |
Practice | Sample Solutions for EDA Practice | Here |
Datasets | Country/continent/servings of alcohol | Here |
Datasets | UFO sighting records | Here |
Datasets | Movie & Title Info from IMDB | Here |
Datasets | User Info from IMDB | Here |
Datasets | Movie & Title Info from IMDB | Here |
This lesson purposefully uses a large number of datasets so that students can practice opening different types of data files. Emphasize opening each file manually to identify its separator and whether it has a header row. Having many datasets also lets us explore a variety of themes throughout the lesson that no single dataset would cover.
Note: The datasets come in three file formats. ".csv" files are separated by commas, ".tsv" files by tabs, and ".tbl" files by the "|" character.
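As a minimal sketch of what the note above means in practice, the example below reads one sample of each format with `pd.read_csv`. The in-memory strings, column names, and the `peek` helper are hypothetical stand-ins for illustration, not the lesson's actual files.

```python
import io

import pandas as pd

# Hypothetical in-memory samples standing in for the lesson's data files.
csv_text = "country,continent,beer_servings\nAlbania,EU,89\nAlgeria,AF,25\n"
tsv_text = "order_id\titem_name\n1\tChips\n2\tIzze\n"
tbl_text = "1|24|M|technician\n2|53|F|other\n"  # this sample has no header row


def peek(text, n=2):
    """Return the first n lines so the separator and header can be checked by eye."""
    return text.splitlines()[:n]


# Comma is the default separator, so no `sep` argument is needed.
drinks = pd.read_csv(io.StringIO(csv_text))

# Tab-separated: pass sep="\t".
orders = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# Pipe-separated with no header: pass sep="|", header=None, and supply names.
users = pd.read_csv(
    io.StringIO(tbl_text),
    sep="|",
    header=None,
    names=["user_id", "age", "gender", "occupation"],
)
```

With real files, replace each `io.StringIO(...)` with the file path. Looking at the first few raw lines before loading (as `peek` does here) is usually enough to choose `sep` and `header`.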
Learning Objectives
- Explain the definition and purpose of Pandas in a data science context
- Manipulate Pandas DataFrames and Series
- Filter and sort Pandas data
- Manipulate DataFrame columns
- Define how to handle null and missing values
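The objectives above might look like the following in code. This is an illustrative sketch on a small made-up DataFrame (the values and the `high_beer` column are assumptions, not lesson data), showing one common way to approach each task rather than the lesson's exact solutions.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
drinks = pd.DataFrame({
    "country": ["Albania", "Algeria", "Andorra", "Angola"],
    "continent": ["EU", "AF", "EU", np.nan],
    "beer_servings": [89, 25, 245, 217],
})

# Filter rows with a boolean mask, then sort by a column.
europe = drinks[drinks["continent"] == "EU"].sort_values(
    "beer_servings", ascending=False
)

# Rename, add, and remove columns.
renamed = drinks.rename(columns={"beer_servings": "beer"})
renamed["high_beer"] = renamed["beer"] > 100
trimmed = renamed.drop(columns=["high_beer"])

# Count missing values, then either drop the rows or fill the gaps.
n_missing = drinks["continent"].isnull().sum()
dropped = drinks.dropna(subset=["continent"])
filled = drinks.fillna({"continent": "UNKNOWN"})
```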
Student Requirements
Before this lesson, students should already be able to:
- Recall and define basic syntax for Python code
Lesson Outline
Instructor Note: Start with the lesson Jupyter slide deck. Next, walk the students through the lab. Periodically stop and let the students try the challenges. The challenges are typically just 1-3 lines of code that are very similar to what was just discussed.
TOTAL: 170 mins
- What is Pandas (20 mins)
- Reading Files, Selecting Columns, and Summarizing (15 mins)
- EXERCISE ONE (15 mins)
- Filtering and Sorting (15 mins)
- EXERCISE TWO (15 mins)
- Renaming, Adding, and Removing Columns (15 mins)
- Handling Missing Values (15 mins)
- EXERCISE THREE (15 mins)
- Split-Apply-Combine (15 mins)
- EXERCISE FOUR (15 mins)
- Selecting Multiple Columns and Filtering Rows (10 mins)
- Joining (Merging) DataFrames (5 mins)
- OPTIONAL: Other Commonly Used Features
- OPTIONAL: Other Less Used Features of Pandas
- Summary
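For instructors who want a quick preview of the two later topics in the outline, split-apply-combine and joining, here is a hedged sketch. Both DataFrames and the `continent_name` lookup column are hypothetical examples, not the lesson's datasets.

```python
import pandas as pd

# Hypothetical data for illustration only.
drinks = pd.DataFrame({
    "country": ["Albania", "Andorra", "Algeria", "Angola"],
    "continent": ["EU", "EU", "AF", "AF"],
    "beer_servings": [89, 245, 25, 217],
})
continents = pd.DataFrame({
    "continent": ["EU", "AF"],
    "continent_name": ["Europe", "Africa"],
})

# Split-apply-combine: split rows by continent, apply mean(), combine into a Series.
mean_beer = drinks.groupby("continent")["beer_servings"].mean()

# Joining: attach the lookup column to every row, keeping all rows from `drinks`.
merged = pd.merge(drinks, continents, on="continent", how="left")
```

Here `how="left"` keeps every row of `drinks` even if a continent code had no match in the lookup table; other join types drop or add rows differently.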
Additional Resources
For more information on this topic, check out the following resources: