
Exploratory Data Analysis in Python

Unit 2 Project - Due by COB on the 22nd of December (Option 3 can receive an extension of up to 2 extra weeks)

Materials We Provide

| Item | Description | Link |
| --- | --- | --- |
| Option 1: IMDB Starter Code | Project Prompts and Description | Here |
| Option 1: IMDB Dataset | IMDB Dataset | Here |
| Option 2: Chipotle Starter Code | Project Prompts and Description | Here |
| Option 2: Chipotle Dataset | Dataset File | Here |
| Option 3: Analyzing Road Safety | | Here |
| Option 3: UK Road Safety Data 2016 | Dataset File | Here |

Project Objectives

For this project, you will be conducting basic exploratory data analysis, practicing your data analysis skills while becoming comfortable with Python (Pandas not required).

For this project, we have provided three options. Students should choose one of the following options, then complete all of the required sections for the option they've chosen:

Option 1: Best for New Programmers

Using your new Python skills, complete a series of guided prompts exploring the top-rated movies on IMDB. IMDB stands for "the Internet Movie Database," an online collection of film information and reviews.

In these exercises, students will be looking to answer such questions as:

  • What is the average rating per genre?
  • How many different actors are in a movie?

The IMDB dataset provided was created from data scraped from the Internet Movie Database website. The dataset describes top-ranking movies, including: title, data, duration, content rating, headlining actors, and ranking.
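As a rough illustration of the first question above, here is a minimal sketch in plain Python (no Pandas) that averages ratings per genre. The sample rows and the column names `genre` and `star_rating` are assumptions made for this example; check the actual dataset for its real column names before adapting it.

```python
from collections import defaultdict

# Hypothetical sample rows; the real dataset's columns and values may differ.
movies = [
    {"title": "Movie A", "genre": "Drama", "star_rating": 9.0},
    {"title": "Movie B", "genre": "Drama", "star_rating": 8.0},
    {"title": "Movie C", "genre": "Comedy", "star_rating": 7.5},
]


def average_rating_per_genre(rows):
    """Return a dict mapping each genre to the mean rating of its movies."""
    ratings = defaultdict(list)
    for row in rows:
        ratings[row["genre"]].append(row["star_rating"])
    return {genre: sum(vals) / len(vals) for genre, vals in ratings.items()}


averages = average_rating_per_genre(movies)
print(averages)
```

With real data, the list of dictionaries could come from `csv.DictReader`, in which case ratings arrive as strings and need converting with `float()` first.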

Option 2: Best for Intermediate Programmers

Using Python, conduct some exploratory data analysis on Chipotle's order data. You will be looking to answer such questions as:

  • How many orders are being made?
  • What is the average price per order?
  • How many different ingredients are there?

The Chipotle dataset is taken from "The Upshot" column in The New York Times. It was chosen because the data is from a familiar source representing real-world consumer transaction data - plus their guacamole is delicious.

This dataset was analyzed in-depth by data scientists from the New York Times. We have modified our questions based on their analysis, but we encourage students not to review their analysis until after they have made their own attempt.
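To make the order-level questions concrete, the sketch below counts orders and computes the average price per order in plain Python. It assumes one row per line item, with several rows sharing an order ID and prices stored as text with a dollar sign; the sample rows and the names `order_id` and `item_price` are hypothetical, so verify them against the dataset file.

```python
from collections import defaultdict

# Hypothetical line items: one row per item; rows can share an order_id.
line_items = [
    {"order_id": 1, "item_name": "Chicken Bowl", "item_price": "$8.50"},
    {"order_id": 1, "item_name": "Chips", "item_price": "$2.50"},
    {"order_id": 2, "item_name": "Steak Burrito", "item_price": "$9.00"},
]


def parse_price(text):
    """Convert a price string such as '$8.50' to a float."""
    return float(text.replace("$", "").strip())


# Sum the item prices within each order.
order_totals = defaultdict(float)
for item in line_items:
    order_totals[item["order_id"]] += parse_price(item["item_price"])

order_count = len(order_totals)
average_order_price = sum(order_totals.values()) / order_count

print(order_count, average_order_price)
```

Grouping by order before averaging matters here: averaging the item prices directly would give the average price per item, not per order.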

Option 3: Best for Skilled Programmers

Using Python, conduct some exploratory data analysis on UK Road Safety Data. You will be looking to answer such questions as:

  • What makes accidents more likely and why?
  • Who are safer drivers: women or men?
  • When are severe accidents more likely to occur?

This option encourages you to analyze relationships among many variables and lets you practice advanced Pandas functionality, such as joining tables.
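For readers new to joins, the sketch below shows what joining two tables on a shared key means, written in plain Python so it runs without extra libraries. The two tables, their rows, and the column names `accident_index`, `severity`, and `vehicle_type` are assumptions for illustration only. In Pandas, the equivalent operation is `DataFrame.merge` with the `on` parameter.

```python
# Hypothetical tables; real files and column names may differ.
accidents = [
    {"accident_index": "A1", "severity": "Slight"},
    {"accident_index": "A2", "severity": "Serious"},
]
vehicles = [
    {"accident_index": "A1", "vehicle_type": "Car"},
    {"accident_index": "A1", "vehicle_type": "Motorcycle"},
    {"accident_index": "A2", "vehicle_type": "Car"},
]


def inner_join(left, right, key):
    """Combine rows from both tables that share the same value for key."""
    # Index the left table by key so each right row can find its matches.
    index = {}
    for row in left:
        index.setdefault(row[key], []).append(row)
    joined = []
    for right_row in right:
        for left_row in index.get(right_row[key], []):
            joined.append({**left_row, **right_row})
    return joined


joined = inner_join(accidents, vehicles, "accident_index")
for row in joined:
    print(row)
```

Each vehicle row picks up the severity of its accident, so an accident involving two vehicles appears twice in the result. Keep that duplication in mind when counting accidents after a join.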

Project Requirements

In a Jupyter Notebook, create working solutions for all of the required questions for the Option you've chosen. Your notebook should include:

  1. Text for each question, copied and pasted from the starter code provided.

  2. A working solution to each problem.

    • Do not include test, practice, or broken code (unless you were unable to create a working solution).
  3. Comments for all of your code.

    • In your comments, describe any assumptions you made in order to solve these problems.
  4. Optional: After completing the required portions, try your hand at one of the other options or complete the bonus sections for an additional challenge!


For all projects, requirements will be evaluated on a simple point scale from 0 to 4. Additionally, instructors will provide you with feedback on required portions of your project.

| Score | Expectations |
| --- | --- |
| 0 | Incomplete. |
| 1 | Does not meet expectations. |
| 2 | Meets expectations, good job! |
| 3 | Surpasses our wildest expectations! |
| 4 | Completed Option 3! |

Note: Scores of 2 mean that a requirement has been completely fulfilled, while 3 is typically reserved for bonus objectives.


  • Submit your project by forking the project repository, adding your solutions, and sending your instructor a link to your repository.