Unit 2 Project - Due by COB 22nd of December (Option 3 may receive an extension of up to 2 weeks)
Materials We Provide
| Material | Description | Link |
|---|---|---|
| Option 1: IMDB Starter Code | Project Prompts and Description | Here |
| Option 1: IMDB Dataset | IMDB Dataset | Here |
| Option 2: Chipotle Starter Code | Project Prompts and Description | Here |
| Option 2: Chipotle Dataset | Dataset File | Here |
| Option 3: Analyzing Road Safety | | Here |
| Option 3: UK Road Safety Data 2016 | Dataset File | Here |
For this project, you will be conducting basic exploratory data analysis, practicing your data analysis skills while becoming comfortable with Python (Pandas not required).
We have provided three options. Choose one of the options below, then complete all of the required sections for that option:
Option 1: Best for New Programmers
Using your new Python skills, complete a series of guided prompts exploring the top-rated movies on IMDB. IMDB stands for "the Internet Movie Database," an online collection of film information and reviews.
In these exercises, you will answer questions such as:
- What is the average rating per genre?
- How many different actors are in a movie?
The IMDB dataset provided was created from data scraped from the [Internet Movie Database website](https://www.imdb.com). The dataset describes top-ranking movies, including their title, genre, duration, content rating, headlining actors, and ranking.
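As a rough illustration of the Option 1 questions, here is a plain-Python sketch (no Pandas) that computes the average rating per genre and counts the distinct actors in each movie. It runs on a few made-up rows; the field names `title`, `genre`, `star_rating`, and `actors` are assumptions, so check them against the actual IMDB file before reusing the approach.

```python
from collections import defaultdict

# Made-up rows standing in for the IMDB file. The field names
# (title, genre, star_rating, actors) are assumptions, not the confirmed schema.
movies = [
    {"title": "Movie A", "genre": "Drama", "star_rating": 9.0, "actors": ["Ann", "Bob", "Cy"]},
    {"title": "Movie B", "genre": "Drama", "star_rating": 8.0, "actors": ["Bob", "Dee"]},
    {"title": "Movie C", "genre": "Comedy", "star_rating": 7.5, "actors": ["Eve"]},
]


def average_rating_per_genre(rows):
    """Return a dict mapping each genre to the mean star_rating of its movies."""
    totals = defaultdict(float)  # running sum of ratings per genre
    counts = defaultdict(int)    # number of movies seen per genre
    for row in rows:
        totals[row["genre"]] += row["star_rating"]
        counts[row["genre"]] += 1
    return {genre: totals[genre] / counts[genre] for genre in totals}


def actor_count(row):
    """Return the number of distinct actors listed for one movie."""
    return len(set(row["actors"]))


print(average_rating_per_genre(movies))  # {'Drama': 8.5, 'Comedy': 7.5}
print({row["title"]: actor_count(row) for row in movies})
```

Accumulating a sum and a count per genre in one pass avoids building a separate list of ratings for every genre; with Pandas, a `groupby` on the genre column would do the same job.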
Option 2: Best for Intermediate Programmers
Using Python, conduct some exploratory data analysis on Chipotle's order data. You will answer questions such as:
- How many orders are being made?
- What is the average price per order?
- How many different ingredients are used?
The Chipotle dataset is taken from "The Upshot" column in The New York Times. It was chosen because the data comes from a familiar source and represents real-world consumer transaction data - plus their guacamole is delicious.
This dataset was analyzed in depth by data scientists from The New York Times. We have modified our questions based on their analysis, but we encourage students not to review their analysis until after they have made their own attempt.
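For Option 2, one possible approach to the first two questions is to group line items by order and total their prices. The sketch below uses only the standard library and a made-up three-line sample; the column names (`order_id`, `quantity`, `item_name`, `item_price`), the tab separator, the `$`-prefixed prices, and treating `item_price` as the total for its line are all assumptions to verify against the real file.

```python
import csv
import io
from collections import defaultdict

# Made-up tab-separated sample standing in for the Chipotle file. The column
# names and the "$" price format are assumptions for illustration.
SAMPLE = """order_id\tquantity\titem_name\titem_price
1\t1\tChips and Fresh Tomato Salsa\t$2.39
1\t1\tIzze\t$3.39
2\t2\tChicken Bowl\t$16.98
"""


def load_rows(text):
    """Parse tab-separated text into a list of dicts, one per line item."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def order_totals(rows):
    """Sum item_price per order_id (assumes each price is already a line total)."""
    totals = defaultdict(float)
    for row in rows:
        price = float(row["item_price"].strip().lstrip("$"))  # "$2.39" -> 2.39
        totals[row["order_id"]] += price
    return totals


rows = load_rows(SAMPLE)
totals = order_totals(rows)
num_orders = len(totals)                           # number of distinct order IDs
avg_per_order = sum(totals.values()) / num_orders  # mean total per order
print(num_orders, round(avg_per_order, 2))
```

Counting distinct order IDs rather than rows matters here, because a single order can span several line items.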
Option 3: Best for Skilled Programmers
Using Python, conduct some exploratory data analysis on UK Road Safety Data. You will answer questions such as:
- What makes accidents more likely and why?
- Who are safer drivers: women or men?
- When are severe accidents more likely to occur?
This task encourages you to analyze relationships between many variables and lets you practice advanced Pandas functionality, such as joining tables.
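Joining tables is the main new Pandas skill in Option 3. The sketch below is a minimal example, assuming the data is split into an accidents table and a vehicles table that share an `Accident_Index` key; the column names, the driver labels, and the severity codes are assumptions for illustration, so confirm them against the dataset's documentation.

```python
import pandas as pd

# Made-up stand-ins for two road safety tables. The table split and the column
# names (Accident_Index, Accident_Severity, Sex_of_Driver) are assumptions.
accidents = pd.DataFrame({
    "Accident_Index": ["A1", "A2", "A3"],
    "Accident_Severity": [1, 3, 2],  # assumed coding: 1 = fatal, 2 = serious, 3 = slight
})
vehicles = pd.DataFrame({
    "Accident_Index": ["A1", "A1", "A2", "A3"],
    "Sex_of_Driver": ["Male", "Female", "Female", "Male"],  # assumed text labels
})

# Inner join: one output row per vehicle, carrying its accident's severity.
merged = pd.merge(vehicles, accidents, on="Accident_Index", how="inner")

# Count vehicles at each severity level, split by driver sex.
counts = pd.crosstab(merged["Sex_of_Driver"], merged["Accident_Severity"])
print(counts)
```

An inner join keeps only vehicles whose accident appears in both tables. Raw counts like these do not account for how many drivers of each sex are on the road, so they are a starting point for the "women or men" question rather than an answer.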
In a Jupyter Notebook, create working solutions for all of the required questions for the Option you've chosen. Your notebook should include:
- Text for each question, copied and pasted from the provided starter code.
- A working solution to each problem.
  - Do not include test, practice, or broken code (unless you were unable to create a working solution).
- Comments for all of your code.
  - In your comments, describe any assumptions you made in order to solve these problems.
Optional: After completing the required portions, try your hand at another option or complete the bonus sections for an additional challenge!
For all projects, requirements will be evaluated on a simple point scale from 1 to 4. Additionally, instructors will provide you with feedback on the required portions of your project.
| Score | Meaning |
|---|---|
| 1 | Does not meet expectations. |
| 2 | Meets expectations, good job! |
| 3 | Surpasses our wildest expectations! |
| 4 | Completed Option 3! |
Note: Scores of 2 mean that a requirement has been completely fulfilled, while 3 is typically reserved for bonus objectives.
To submit your project, fork the project repository, add your solutions, and send your instructor a link to your repository.