This week in class we went over some basic statistics, learned some Python programming concepts, and also learned how to navigate files, packages, and libraries using the command line. Great start! At this point you should be chomping at the bit to do some Data Science. If so, good - because it's time for Project 1!
For our first project, we're going to take a look at SAT scores around the United States. We'll be exploring this data to see what we can learn using the descriptive statistics skills covered this week. Your client, the College Board, is expecting some pretty graphs to add to their presentations this year, so don't let them down!
Goal: A Jupyter notebook that describes your data with visualizations & statistical analysis.
Your work must:
- Describe your data
- Perform methods of exploratory data analysis, including:
  - Use Matplotlib & Tableau to create visualizations
  - Use NumPy to apply basic summary statistics: mean, median, mode
- Determine whether the dataset appears to follow a normal distribution
- Recreate all of your Matplotlib graphs in Seaborn!
- Create a blog post of at least 500 words (and 1-2 graphics!) describing your data, analysis, and approach. Link to it in your Jupyter notebook.
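To make the summary-statistics and normality requirements concrete, here is a minimal sketch. The `scores` values are made up for illustration and are not taken from the SAT dataset. NumPy has no built-in mode function, so the sketch counts distinct values with `np.unique`; `scipy.stats.mode` is another option. The skewness calculation is only a rough first check on normality, not a formal test.

```python
import numpy as np

# Illustrative values only; these are NOT taken from the SAT dataset.
scores = np.array([510, 520, 520, 530, 540, 540, 540, 560, 580])

mean = scores.mean()
median = np.median(scores)

# NumPy has no mode function, so count how often each distinct value occurs
# and pick the most frequent one (ties resolve to the smallest value).
values, counts = np.unique(scores, return_counts=True)
mode = values[np.argmax(counts)]

# Rough normality check: for a roughly normal sample, the mean and median
# are close and the sample skewness is near 0.
std = scores.std()  # population standard deviation (ddof=0)
skewness = np.mean(((scores - mean) / std) ** 3)

print(f"mean={mean:.1f}, median={median:.1f}, mode={mode}, skewness={skewness:.2f}")
```

A histogram of the same column is a useful companion to these numbers, since a single skewness value can hide features such as two separate peaks.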
Necessary Deliverables / Submission
- Materials must be submitted in a clearly labeled Jupyter notebook.
- Materials must be submitted via a GitHub PR to the instructor's repo.
- Materials must be submitted by the end of Week 1.
For this project we will be using a Jupyter notebook. The notebook uses Matplotlib to plot and visualize our data, which is handy for prototyping and quick data analysis. We will discuss more advanced data visualizations, better suited to disseminating your work, later.
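As a starting point for that kind of quick plot, here is a small histogram sketch. The `verbal` values, the bin count, and the axis labels are all assumptions made for illustration; swap in the real column from the dataset. The `Agg` backend line is only there so the sketch also runs outside a notebook and can be dropped inside Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Illustrative values only; replace with the real column from the dataset.
verbal = np.array([500, 510, 520, 530, 530, 540, 550, 560, 570, 580])

fig, ax = plt.subplots(figsize=(6, 4))

# hist returns the per-bin counts and the bin edges along with the drawn bars.
counts, bin_edges, _ = ax.hist(verbal, bins=4, edgecolor="black")

ax.set_xlabel("Mean verbal score")
ax.set_ylabel("Number of states")
ax.set_title("Distribution of mean SAT verbal scores (illustrative data)")
fig.tight_layout()
```

Trying a few different `bins` values is worthwhile, because the apparent shape of a distribution can change quite a bit with the bin width.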
Open the starter code instructions in a Jupyter notebook.
Instructor Note: The solution code is linked here
This data, taken from the College Board, gives the mean SAT math and verbal scores, and the participation rate for each state and the District of Columbia for the year 2001.
Suggested Ways to Get Started
- Read in your dataset
- Try out a few NumPy commands to describe your data
- Write pseudocode before you write actual code. Thinking through the logic of something helps.
- Read the docs for whatever technologies you use. Most of the time, there is a tutorial that you can follow, but not always, and learning to read documentation is crucial to your success!
- Document everything.
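The first two suggestions above might look something like the sketch below. The column names (`State`, `Rate`, `Verbal`, `Math`) and the three rows are placeholders, not the real data, and the in-memory string stands in for the project file so the example runs on its own. Check the actual header row of your CSV before copying any of this.

```python
import csv
import io

import numpy as np

# Hypothetical stand-in for the project CSV: column names and rows are placeholders.
sample_csv = """State,Rate,Verbal,Math
AA,80,500,505
BB,10,570,575
CC,45,530,525
"""

# In the notebook, open the real file instead of this in-memory string.
rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Pull each numeric column into its own NumPy array.
rate = np.array([float(r["Rate"]) for r in rows])
verbal = np.array([float(r["Verbal"]) for r in rows])
math_scores = np.array([float(r["Math"]) for r in rows])

# Describe each column with a few basic NumPy calls.
for name, column in [("Rate", rate), ("Verbal", verbal), ("Math", math_scores)]:
    print(f"{name}: min={column.min():.0f}, max={column.max():.0f}, mean={column.mean():.1f}")
```

Reading with `csv.DictReader` keeps each value tied to its column name, which makes it easier to spot a mislabeled or missing column early.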
Project Feedback + Evaluation
Your instructors will score each of your technical requirements using the scale below:
Score | Expectations
----- | ------------
**0** | _Incomplete._
**1** | _Does not meet expectations._
**2** | _Meets expectations, good job!_
**3** | _Exceeds expectations, you wonderful creature, you!_
This will serve as a helpful overall gauge of whether you met the project goals, but the more important scores are the individual ones above, which can help you identify where to focus your efforts for the next project!