title: Data Cleaning Lab
type: lab
duration: 1:5
creator:
  name: David Yerrington
  city: SF

Data Cleaning Lab

By this point, you should be comfortable working with Python and beginning to familiarize yourself with Pandas. For this exercise, you will build your data analysis pipeline from the ground up. In your own local git repo, start by creating a new notebook using the "dsi" kernel.

Requirements: Rock Songs

Create a new notebook

  • Load the dataset from the datasets folder labeled datasets/rock_songs
  • Rename the columns to something more descriptive
  • Print the summary stats: mean, median, mode, std, variance, range
    • Release Year
    • First
    • Year
    • Play Count
    • F*G
  • Clean / convert / normalize any problematic variables that can't be summarized.
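The requirements above can be sketched roughly as follows. This is a minimal illustration, not the lab solution: the toy DataFrame stands in for the real file (its exact path and format are not given here), and the new column names `release_year`, `play_count`, and `f_times_g` as well as the helper `summarize` are assumptions made up for the example.

```python
import pandas as pd

# Toy stand-in for the rock songs data; the real columns and values may differ.
raw = pd.DataFrame({
    "Release Year": ["1975", "1981", "SONGFACTS.COM", None],
    "PlayCount": [10, 25, 3, 8],
    "F*G": [10, 25, 0, 8],
})

# Rename to descriptive, code-friendly names (hypothetical choices).
songs = raw.rename(columns={
    "Release Year": "release_year",
    "PlayCount": "play_count",
    "F*G": "f_times_g",
})

# Entries that are not numbers become NaN so the column can be summarized.
songs["release_year"] = pd.to_numeric(songs["release_year"], errors="coerce")


def summarize(series):
    """Return the statistics the lab asks for; NaN values are skipped."""
    return pd.Series({
        "mean": series.mean(),
        "median": series.median(),
        "mode": series.mode().iloc[0],  # first mode when several values tie
        "std": series.std(),
        "variance": series.var(),
        "range": series.max() - series.min(),
    })


# One column of statistics per variable.
summary = songs[["release_year", "play_count", "f_times_g"]].apply(summarize)
print(summary)
```

Coercing with `errors="coerce"` keeps the bad rows in the frame as NaN, which makes it easy to count and inspect them later instead of silently dropping them.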
Questions

1. What are the top 20 most popular songs by plays?

2. Which years have the most plays?

3. Which records don't have a "Play Count" that matches the corresponding "F*G"?

**Bonus: Which artists have the most missing values across the variables?**
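One possible way to approach these questions with Pandas is sketched below. It assumes the data has already been cleaned and uses a small made-up frame with hypothetical column names (`song`, `artist`, `release_year`, `play_count`, `f_times_g`). It also assumes that "years" in question 2 means release year, which the lab does not state explicitly.

```python
import pandas as pd

# Made-up, already-cleaned sample; real data will be much larger.
songs = pd.DataFrame({
    "song": ["A", "B", "C", "D", "E"],
    "artist": ["X", "X", "Y", "Z", "Z"],
    "release_year": [1975.0, 1975.0, 1981.0, None, 1981.0],
    "play_count": [10, 25, 3, 8, 40],
    "f_times_g": [10, 25, 0, 8, 40],
})

# 1. Most-played songs (use 20 instead of 3 on the real data).
top_songs = songs.nlargest(3, "play_count")

# 2. Total plays per release year, highest first (rows without a year are skipped).
plays_by_year = (
    songs.groupby("release_year")["play_count"].sum().sort_values(ascending=False)
)

# 3. Rows where the two counts disagree.
mismatched = songs[songs["play_count"] != songs["f_times_g"]]

# Bonus: number of missing values per artist, summed over all other columns.
missing_by_artist = (
    songs.drop(columns="artist")
    .isna()
    .groupby(songs["artist"])
    .sum()
    .sum(axis=1)
    .sort_values(ascending=False)
)

print(top_songs, plays_by_year, mismatched, missing_by_artist, sep="\n\n")
```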

Bonus Requirements: Highest Paid Athletes

In your existing notebook

  • Load the dataset from the datasets folder for highest_paid_athletes
  • Rename any columns
  • Print the summary stats: mean, median, mode, std, variance, range
    • Total Pay
    • Salary/Winnings
    • Endorsements
    • Birthdates
    • Height
  • Clean / convert / normalize any problematic variables that can't be summarized.
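Pay and date columns often arrive as text and need converting before they can be summarized. The sketch below assumes pay strings shaped like `"$105.0 M"` and ISO-formatted birthdates. Both formats, along with the column names, are guesses for illustration and should be checked against the actual file.

```python
import pandas as pd

# Hypothetical sample; the real value formats may differ.
athletes = pd.DataFrame({
    "name": ["P1", "P2", "P3"],
    "total_pay": ["$105.0 M", "$80.0 M", "$61.2 M"],
    "birthdate": ["1977-02-24", "1985-02-05", "not available"],
})

# Keep only digits and the decimal point, then convert to float (millions of dollars).
athletes["total_pay_musd"] = (
    athletes["total_pay"].str.replace(r"[^0-9.]", "", regex=True).astype(float)
)

# Values that do not match the date format become NaT instead of raising an error.
athletes["birthdate"] = pd.to_datetime(
    athletes["birthdate"], format="%Y-%m-%d", errors="coerce"
)

# A derived numeric column such as birth year is easier to summarize than raw dates.
athletes["birth_year"] = athletes["birthdate"].dt.year

print(athletes)
```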
Bonus Questions

1. Who's the shortest, highest paid athlete?

**2. This is America! Can you please convert the heights to floats in "feet" units?**

3. Who's making more on endorsements than salary?

4. For #3, is this more common in some sports than in others?

**5. Which country makes the most and least in endorsements?**

**6. Which are the top sports by country/nation?**
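A few of the bonus questions can be sketched as below. The example assumes heights are stored in centimeters and that salary and endorsements are already numeric. These assumptions, the sample values, and the column names are all hypothetical. The only fixed fact used is that one foot equals 30.48 cm.

```python
import pandas as pd

# Hypothetical cleaned sample; pay values are in millions.
athletes = pd.DataFrame({
    "name": ["P1", "P2", "P3", "P4"],
    "sport": ["Golf", "Golf", "Soccer", "Soccer"],
    "salary": [5.0, 10.0, 50.0, 40.0],
    "endorsements": [40.0, 30.0, 25.0, 45.0],
    "height_cm": [185.0, 175.0, 170.0, 187.0],
})

# Question 2: convert heights to feet as floats.
CM_PER_FOOT = 30.48
athletes["height_ft"] = athletes["height_cm"] / CM_PER_FOOT

# Question 3: athletes earning more from endorsements than from salary.
athletes["more_endorsements"] = athletes["endorsements"] > athletes["salary"]
endorsement_heavy = athletes.loc[athletes["more_endorsements"], "name"]

# Question 4: share of such athletes within each sport.
share_by_sport = athletes.groupby("sport")["more_endorsements"].mean()

# Question 1 (partly): the shortest athlete; ties would need a pay-based tiebreak.
shortest = athletes.loc[athletes["height_cm"].idxmin(), "name"]

print(endorsement_heavy.tolist(), share_by_sport, shortest, sep="\n")
```

Taking the mean of a boolean column gives the fraction of `True` values, which is a compact way to compare sports of different sizes.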