
Project 1: SAT Scores + Summary Statistics


This week in class we went over some basic statistics, learned some Python programming concepts, and learned how to navigate files, packages, and libraries using the command line. Great start! At this point you should be chomping at the bit to do some data science. If so, good, because it's time for Project 1!

For our first project, we're going to take a look at SAT scores around the United States. We'll be exploring this data to see what we can learn using the descriptive statistics skills covered this week. Your client, the College Board, is expecting some pretty graphs to add to their presentations this year, so don't let them down!

Goal: A Jupyter notebook that describes your data with visualizations & statistical analysis.


Your work must:

  • Describe your data

  • Perform exploratory data analysis, including:

    • Use Matplotlib & Tableau to create visualizations
    • Use NumPy to apply basic summary statistics: mean, median, mode
  • Determine whether the dataset appears to follow a normal distribution

  • Bonus:

  • Recreate all of your Matplotlib graphs in Seaborn!

  • Create a blog post of at least 500 words (and 1-2 graphics!) describing your data, analysis, and approach. Link to it in your Jupyter notebook.
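
As a starting point for the summary statistics and normality requirements above, here is a minimal sketch. The `scores` array holds placeholder values, not numbers from the SAT dataset. NumPy does not ship a mode function, so this sketch counts values with `np.unique`; `scipy.stats.mode` is an alternative if SciPy is available. Comparing the mean with the median and computing skewness is only a rough symmetry check, not a formal normality test.

```python
import numpy as np

# Placeholder scores for illustration only; not values from the SAT dataset.
scores = np.array([500, 510, 510, 520, 530, 540, 560])

mean = scores.mean()
median = np.median(scores)

# NumPy has no built-in mode, so count how often each distinct value occurs.
values, counts = np.unique(scores, return_counts=True)
mode = values[counts.argmax()]  # on ties, the smallest value is returned

# Sample skewness: near 0 for symmetric data, positive for a long right tail.
std = scores.std()
skewness = np.mean(((scores - mean) / std) ** 3)

print(f"mean={mean:.2f} median={median:.2f} mode={mode} skewness={skewness:.2f}")
```

If the mean and median are far apart, or the skewness is well away from zero, the data probably do not follow a normal distribution; a histogram is a good way to confirm the impression.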

Necessary Deliverables / Submission

  • Materials must be submitted in a clearly labeled Jupyter notebook.
  • Materials must be submitted via a GitHub PR to the instructor's repo.
  • Materials must be submitted by the end of Week 1.

Starter code

For this project we will be using a Jupyter notebook. The notebook uses Matplotlib to plot and visualize our data. Plotting directly in the notebook is handy for prototyping and quick data analysis. We will also discuss more advanced data visualizations for disseminating your work.
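
As an illustration of the kind of quick plot meant here, the sketch below draws a histogram with the mean and median marked. The `math_scores` values, axis labels, and bin count are assumptions chosen for the example, not part of the starter code.

```python
import matplotlib.pyplot as plt
import numpy as np

# Placeholder values for illustration only; not taken from the SAT dataset.
math_scores = np.array([480, 495, 500, 510, 515, 525, 540, 560, 575, 600])

fig, ax = plt.subplots(figsize=(6, 4))

# Histogram of the scores; ax.hist returns the bin counts and bin edges.
counts, bin_edges, _ = ax.hist(math_scores, bins=5, edgecolor="black")

# Mark the mean and median to make any skew easy to see.
ax.axvline(math_scores.mean(), color="red", linestyle="--", label="mean")
ax.axvline(np.median(math_scores), color="green", linestyle=":", label="median")

ax.set_xlabel("Mean SAT math score")
ax.set_ylabel("Number of states")
ax.set_title("Distribution of SAT math scores")
ax.legend()
```

In a Jupyter notebook the figure typically renders inline below the cell; in a plain script you would call `plt.show()` or `fig.savefig(...)` instead.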

Open the starter code instructions in a Jupyter notebook.

Instructor Note: The solution code is linked here


This data, taken from the College Board, gives the mean SAT math and verbal scores, and the participation rate for each state and the District of Columbia for the year 2001.
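
One possible way to load data shaped like this into NumPy arrays is sketched below. The column names (`State`, `Rate`, `Verbal`, `Math`) and every row are placeholders made up for the example; check the header of the actual file before relying on them.

```python
import csv
import io

import numpy as np

# Placeholder rows; column names and values are assumptions, not the real file.
sample_csv = io.StringIO(
    "State,Rate,Verbal,Math\n"
    "AA,80,510,505\n"
    "BB,12,570,565\n"
    "CC,45,530,540\n"
)

# DictReader uses the header row as keys, so columns are accessed by name.
rows = list(csv.DictReader(sample_csv))

states = [row["State"] for row in rows]
rate = np.array([float(row["Rate"]) for row in rows])
verbal = np.array([float(row["Verbal"]) for row in rows])
math_scores = np.array([float(row["Math"]) for row in rows])

print(states, rate, verbal, math_scores)
```

With the real file, replace `sample_csv` with a file handle opened via `open(path, newline="")`, where `path` points to wherever you saved the dataset.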

Suggested Ways to Get Started

  • Read in your dataset
  • Try out a few NumPy commands to describe your data
  • Write pseudocode before you write actual code. Thinking through the logic of something helps.
  • Read the docs for whatever technologies you use. Most of the time, there is a tutorial that you can follow, but not always, and learning to read documentation is crucial to your success!
  • Document everything.
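
For the "try out a few NumPy commands" step, a sketch like the following can serve as a first pass at describing a single column. The `rate` values are placeholders, not real participation rates.

```python
import numpy as np

# Placeholder values for illustration only; not taken from the SAT dataset.
rate = np.array([4, 9, 18, 33, 51, 65, 71, 82])

summary = {
    "count": rate.size,
    "min": rate.min(),
    "max": rate.max(),
    "range": np.ptp(rate),           # max minus min
    "std": rate.std(ddof=1),         # sample standard deviation
    "q1": np.percentile(rate, 25),   # first quartile
    "q3": np.percentile(rate, 75),   # third quartile
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Collecting the results in a dictionary keeps the output labeled, which also helps with the "document everything" advice.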

Useful Resources

Project Feedback + Evaluation

Attached here is a complete rubric for this project.

Your instructors will score each of your technical requirements using the scale below:

Score | Expectations
----- | ------------
**0** | _Incomplete._
**1** | _Does not meet expectations._
**2** | _Meets expectations, good job!_
**3** | _Exceeds expectations, you wonderful creature, you!_

This will serve as a helpful overall gauge of whether you met the project goals, but the more important scores are the individual ones above, which can help you identify where to focus your efforts for the next project!