Update materials for Intuit

danielwilhelm committed Sep 26, 2018
1 parent e2628c9 commit 4fa793bc1a49b431c7c5341c781b517a8e195b4b
Showing with 216 additions and 163 deletions.
  1. +133 −163 README.md
  2. BIN ds-installation-guide.pdf
  3. +83 −0 git-instructions.md
296 README.md
@@ -1,120 +1,121 @@
# ![](https://ga-dash.s3.amazonaws.com/production/assets/logo-9f88ae6c9c3871690e33280fcf557f33.png) Welcome to Data Science
# ![](https://ga-dash.s3.amazonaws.com/production/assets/logo-9f88ae6c9c3871690e33280fcf557f33.png) Intuit + GA: Data Science Course

> Welcome to Data Science!
1. [Master Schedule](#master)
1. [Welcome](#welcome)
1. [Your Team](#team)
1. [Course Overview](#course)
1. [Course Schedule](#schedule)
1. [Projects](#projects)
1. [Tech Requirements](#tech)
1. [Classroom Tools](#slack)
1. [Student Expectations](#expectations)
1. [Office Hours](#hours)
1. [Student Feedback](#feedback)
1. [Python Practice Resources](#practice)

<a id='master'></a>
## Master Schedule

Fill this out after each class: [Post-class exit ticket!](https://docs.google.com/forms/d/e/1FAIpQLSdR1m4bKgu0spfAax__O2WfL3Jftgf0By2NyhBolF6D08EpHw/viewform?entry.670756229=INT004-SD-DS-1)

Office Hours:
+ Fridays 12-1pm (in classroom)
+ Wednesdays 7-8pm (in Slack)


Week | Friday Date | Class Topic
--- | --- | ---
- | - | **Unit 1: Fundamentals**
1 | Sept. 28 | [Welcome to Data Science][1-W1]<br>**ISL**: Ch. 2.1
2 | Oct. 5 | [Your Development Environment][1-W2]
3 | Oct. 12 | [Python Foundations][1-W3]
4 | Oct. 19 | FLEX - Recommendation Systems?<br>**Milestone:** [Unit 1 Project DUE][P1]
- | - | **Unit 2: Working with Data**
5 | Oct. 26 | [Exploratory Data Analysis in Pandas][2-W1]
6 | Nov. 2 | [Experiments & Hypothesis Testing][2-W2]
7 | Nov. 9 | [Data Visualization in Python][2-W3]
8 | Nov. 16 | [Statistics in Python][2-W4]
- | Nov. 23 | No Class - Thanksgiving
9 | Nov. 30 | FLEX - Feature Engineering?<br>**Milestone:** [Final Project: Project Proposal DUE][P4] <br>**Milestone:** [Unit 2 Project DUE][P2]
- | - | **Unit 3: Data Science Modeling**
10 | Dec. 7 | [Linear Regression][3-W1]<br>**ISL**: Ch. 3.1-3, 3.5, 6.1, 6.2
11 | Dec. 14 | [Train-Test Split & Bias-Variance][3-W2]<br>**ISL**: Ch. 2.2
12 | Dec. 21 | [KNN / Classification][3-W3]
- | Dec. 28 | No Class - Holiday Break
14 | Jan. 4 | [Logistic Regression][3-W4]<br>**ISL**: Ch. 4.1-3
15 | Jan. 11 | FLEX - Random Forests, Gradient Boosting?<br>**Milestone:** [Final Project: Initial EDA DUE][2-1C]<br>**Milestone:** [Unit 3 Project DUE][P3]
- | - | **Unit 4: Data Science Applications**
16 | Jan. 18 | [Unsupervised Learning (K-Means, Hierarchical)][4-W1]
17 | Jan. 25 | [PCA & Anomaly Detection][4-W2]
18 | Feb. 1 | [Intro to Natural Language Processing][4-W3]
19 | Feb. 8 | [Intro to Time Series][4-W4]
20 | Feb. 15 | [Intro to Neural Networks][4-W5]<br>**Milestone:** [Final Project: Notebook Progress DUE][2-1C]
- | - | **Capstone Project**
21 | Feb. 22 | Capstone Preparation (Office Hours only)
22 | Mar. 1 | Capstone Preparation (Office Hours only)
23 | Mar. 8 | Capstone Preparation (Office Hours only)
24 | Mar. 15 | Capstone Preparation (Office Hours only)
25 | Mar. 22 | [Final Project Presentations][P5]<br>**Milestone:** [Final Project Presentation DUE][P5]


**ISL**: James, Gareth et al. "An Introduction to Statistical Learning." [[PDF](http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Seventh%20Printing.pdf)]

[1-W1]: https://git.generalassemb.ly/danwilhelm/welcome-to-data-science
[1-W2]: https://git.generalassemb.ly/danwilhelm/your-development-environment
[1-W3]: https://git.generalassemb.ly/danwilhelm/python-foundations
[1-W4]: https://git.generalassemb.ly/danwilhelm/flex1

[2-W1]: https://git.generalassemb.ly/danwilhelm/exploratory-data-analysis
[2-W2]: https://git.generalassemb.ly/danwilhelm/experiments-hypothesis-tests
[2-W3]: https://git.generalassemb.ly/danwilhelm/visualizations
[2-W4]: https://git.generalassemb.ly/danwilhelm/statistics-in-python
[2-W5]: https://git.generalassemb.ly/danwilhelm/flex2

[3-W1]: https://git.generalassemb.ly/danwilhelm/linear-regression
[3-W2]: https://git.generalassemb.ly/danwilhelm/train-test-split-and-bias-variance
[3-W3]: https://git.generalassemb.ly/danwilhelm/knn-classification
[3-W4]: https://git.generalassemb.ly/danwilhelm/logistic-regression
[3-W5]: https://git.generalassemb.ly/danwilhelm/flex3

[4-W1]: https://git.generalassemb.ly/danwilhelm/flex_unsupervised
[4-W2]: https://git.generalassemb.ly/danwilhelm/flex_pca
[4-W3]: https://git.generalassemb.ly/danwilhelm/natural-language-processing
[4-W4]: https://git.generalassemb.ly/danwilhelm/flex_time-series
[4-W5]: https://git.generalassemb.ly/danwilhelm/flex_neural-networks

---

<a id='welcome'></a>
## Course Overview
Welcome to the part-time Data Science course at General Assembly! We are building a global community of lifelong learners who are excited about using data to solve real-world problems.
Welcome to our Data Science Fundamentals course! We're building a global community of lifelong learners who are excited about using data to solve real business problems.

In this program, we will use Python to explore datasets, build predictive models, and communicate data-driven insights. Specifically, you will learn how to:
In this program, we will learn to use Python programming to explore datasets, build regression models, and communicate data-driven insights. Specifically, you will learn how to:

- Define many of the approaches and considerations that data scientists use to solve real-world problems.
- Define common approaches and considerations that data scientists use to solve real-world problems.
- Perform exploratory data analysis with powerful programmatic tools in Python.
- Build and refine basic machine learning models to predict patterns from data sets.
- Build and refine basic regression and time series models to predict patterns from data sets.
- Communicate data-driven insights to peers and stakeholders in order to inform business decisions.


### What You Will Learn

- **Statistical Analysis with Python**:
**Statistical Analysis with Python**
- Perform visual and statistical analysis on data using Python and its associated libraries and tools.
- **Data-Driven Decision-Making**:
**Data-Driven Decision-Making**
- Define and determine the trade-offs involving feature selection, model accuracy, and data quality.
- **Machine Learning & Modeling Techniques**:
- Explore supervised learning techniques, including classification, regression, and decision trees.
- **Visualizations & Presentations**:
- Create visualizations and interactive notebooks to present to industry stakeholders.
**Data Science Modeling Techniques**
- Explore supervised learning techniques, focusing primarily on linear and logistic regression.
**Visualizations & Presentations**
- Create visualizations and interactive notebooks to present to business stakeholders.


### Python Version
The curriculum materials for this course are written in Python 3.6.

---
<a id='team'></a>
## Your Instructional Team

**Instructor**: [X](X)

**Assistant**: [X](X)

---

<a id='course'></a>
## Curriculum Structure

General Assembly's Data Science part time materials are organized into **four** units.
**Instructor**:
+ Dan Wilhelm - dan.wilhelm@generalassemb.ly

| Unit | Title | Topics Covered | Length |
| --- | --- | --- | --- |
| Unit 1 | Data Foundations | Python Syntax, Development Environment | Lessons 1-4 |
| Unit 2 | Working with Data | Stats Review, Visualization, & EDA | Lessons 5-9 |
| Unit 3 | Data Science Modeling | Regression, Classification, & KNN | Lessons 10-14 |
| Unit 4 | Data Science Applications | Decision Trees, NLP, & Flex Topics | Lessons 15-19 |

---


<a id='schedule'></a>
## Lesson Schedule

Here is the schedule we will be following for our part time data science course:

Lesson | Unit Number | Session Number |
--- | --- | --- |
[Welcome to Data Science][1-1A] | Unit 1 | Session 1 |
[Your Development Environment][1-1B] | Unit 1 | Session 2 |
[Python Foundations][1-1C] | Unit 1 | Session 3 |
FLEX: Project Workshop + Presentations | Unit 1 | Session 4 |
--- | --- | --- |
[Exploratory Data Analysis in Pandas][1-1E] | Unit 2 | Session 5 |
[Experiments & Hypothesis Testing][1-1F] | Unit 2 | Session 6 |
[Data Visualization in Python][1-1G] | Unit 2 | Session 7 |
[Statistics in Python][1-1H] | Unit 2 | Session 8 |
FLEX: Project Workshop + Presentations | Unit 2 | Session 9 |
--- | --- | --- |
[Linear Regression][1-1J] | Unit 3 | Session 10 |
[Train-Test Split & Bias-Variance][1-1K] | Unit 3 | Session 11 |
[KNN / Classification][1-1L] | Unit 3 | Session 12 |
[Logistic Regression][1-1M] | Unit 3 | Session 13 |
FLEX: Project Workshop + Presentations | Unit 3 | Session 14 |
--- | --- | --- |
[Working With Data: APIs][1-1O] | Unit 4 | Session 15 |
[Intro to Natural Language Processing][1-1P] | Unit 4 | Session 16 |
[Intro to Time Series][1-1Q] | Unit 4 | Session 17 |
FLEX: Instructor Choice | Unit 4 | Session 18 |
FLEX: Review + Project Workshop | Unit 4 | Session 19 |
[Final Project Presentations][1-1T] | Unit 4 | Session 20 |

[1-1A]: https://git.generalassemb.ly/data-part-time/welcome-to-data-science
[1-1B]: https://git.generalassemb.ly/data-part-time/your-development-environment
[1-1C]: https://git.generalassemb.ly/data-part-time/python-foundations

[1-1E]: https://git.generalassemb.ly/data-part-time/exploratory-data-analysis
[1-1F]: https://git.generalassemb.ly/data-part-time/experiments-hypothesis-tests
[1-1G]: https://git.generalassemb.ly/data-part-time/visualizations
[1-1H]: https://git.generalassemb.ly/data-part-time/statistics-in-python

[1-1J]: https://git.generalassemb.ly/data-part-time/linear-regression
[1-1K]: https://git.generalassemb.ly/data-part-time/train-test-split-and-bias-variance
[1-1L]: https://git.generalassemb.ly/data-part-time/knn-classification
[1-1M]: https://git.generalassemb.ly/data-part-time/logistic-regression

[1-1O]: https://git.generalassemb.ly/data-part-time/getting-data-APIs
[1-1P]: https://git.generalassemb.ly/data-part-time/natural-language-processing
[1-1Q]: https://git.generalassemb.ly/data-part-time/flex_time-series

[1-1T]: https://git.generalassemb.ly/data-part-time/unit-4_project
**Assistants**:
+ Dave Doerner - davedoerner@gmail.com
+ Gino DeFalco - ginodefalco@gmail.com

---

@@ -124,117 +125,86 @@ FLEX: Review + Project Workshop | Unit 4 | Session 19 |
This course will ask you to complete a series of projects in order to practice and apply the skills covered in-class.

### Unit Projects
At the end of each Unit, you'll work on short structured projects. These activities will test your understanding of that unit’s most important concepts with in-class practice and instructor support.
At the end of each unit, you'll work on short structured projects. These activities will test your understanding of each unit’s most important concepts with in-class practice and instructor support.

For those of you who want to go above and beyond, we’ve also included stretch options, bonus activities, and other opportunities for further reading and practice.

1. [Project 1](https://git.generalassemb.ly/danwilhelm/unit-1_project): Use Python to perform exploratory data analysis on sample movie data in order to answer a series of guided questions.
* **Bonus**: Try some optional programming challenges modeled after common technical interview questions.
2. [Project 2](https://git.generalassemb.ly/danwilhelm/unit-2_project): Apply linear regression in Python to sample housing data in order to answer a series of guided questions.
* **Bonus**: Practice using K-Folds and alternate libraries to extend your learning and compare different modeling approaches.
2. [Project 3](https://git.generalassemb.ly/danwilhelm/unit-3_project): Apply linear regression in Python to sample housing data in order to answer a series of guided questions.
* **Bonus**: Practice using K-Folds and alternate libraries to extend your learning and compare different modeling approaches.
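
As a rough illustration of the K-Folds idea mentioned in the bonus above, the sketch below splits row indices into k folds using plain Python. The function name `k_fold_indices` is hypothetical and does not come from the project materials; in practice a library implementation (for example scikit-learn's `KFold`, an assumption here since the bonus does not name a library) would be the more likely choice.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds.

    Hypothetical helper for illustration only; it does not shuffle the data.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")

    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]

    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        # Every index outside the current test fold is used for training.
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


# Example: 10 rows split into 3 folds.
for fold_number, (train, test) in enumerate(k_fold_indices(10, 3), start=1):
    print(f"fold {fold_number}: {len(train)} train rows, {len(test)} test rows")
```

Each row lands in exactly one test fold, so averaging a model's score over the folds uses every row for evaluation once.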

### Final Project

You'll also complete a [final project](https://git.generalassemb.ly/data-part-time/unit-4_project), asking you to apply your skills to a real-world or business problem of your choice.
You'll also complete a [final project](https://git.generalassemb.ly/danwilhelm/unit-4_final-project), asking you to apply your skills to a business problem of your choice.

The capstone is an opportunity for you to demonstrate your new skills and tackle a pressing issue relevant to your life, industry, or organization. You’ll create a hypothesis, analyze internal data, and generate a working model, prototype, solution, or recommendation.
The capstone is an opportunity for you to demonstrate your new skills and tackle a pressing issue relevant to your team, division, or organization. You’ll create a hypothesis, analyze internal data, and generate a working model, prototype, solution, or recommendation.

You will get structured guidance and designated time to work throughout the course. Final project deliverables include:

- **Proposal**: Describe your chosen problem and identify relevant data sets (confirming access, as needed).
- **Brief**: Share a summary of your initial analysis and your next steps with your instructional team.
- **Proposal**: Describe your chosen problem and identify relevant data (while confirming you have access).
- **Brief**: Share a summary of your initial analysis and next steps in order to get assistance from your instructional team.
- **Report**: Submit a cleanly formatted Jupyter notebook (or other files) documenting your code and process for technical/peer stakeholders.
- **Presentation**: Present a summary of your business problem, approach, and recommendation to an audience of non-technical executive stakeholders.

---

### Project Breakdown

1. [Project 1: Python Technical Code Challenges][2-1A]
2. [Project 2: Exploratory Data Analysis][2-1B]
3. [Project 3: Modeling Practice][2-1C]
4. [Project 4: Final Project][2-1D]
- Part 1: Proposal + Dataset
- Part 2: Initial EDA Brief
- Part 3: Technical Report
1. [Unit Project 1: Python Coding][P1]
2. [Unit Project 2: Exploratory Data Analysis][P2]
3. [Unit Project 3: Modeling][P3]
4. [Final Project: Solve an Intuit Business Problem][P4]
- Part 1: Proposal & Dataset
- Part 2: Initial EDA
- Part 3: Solution Prototype
- Part 4: Presentation

[2-1A]: https://git.generalassemb.ly/data-part-time/unit-1_project
[2-1B]: https://git.generalassemb.ly/data-part-time/unit-2_project
[2-1C]: https://git.generalassemb.ly/data-part-time/unit-3_project
[2-1D]: https://git.generalassemb.ly/data-part-time/unit-4_project

---

### Project Schedule

- Project 1: Due @ End of Unit 1
- Project 2: Due @ End of Unit 2
- Project 3: Due @ End of Unit 3
- Project 4 (Final):
- Proposal + Dataset: Due @ End of Unit 2
- Initial EDA Brief: Due @ End of Unit 3
- Technical Report: Due @ End of Unit 4
- Presentation: Due @ End of Unit 4
[P1]: https://git.generalassemb.ly/danwilhelm/unit-1_project
[P2]: https://git.generalassemb.ly/danwilhelm/unit-2_project
[P3]: https://git.generalassemb.ly/danwilhelm/unit-3_project
[P4]: https://git.generalassemb.ly/danwilhelm/unit-4_final-project

---

<a id='tech'></a>
## Technology Requirements

### Hardware
See the [data science installation guide](./ds-installation-guide.pdf).

1. 8GB Ram (at least)
2. 10GB Free Hard Drive Space (after installing Anaconda)

### Software

1. Download and Install [Anaconda with Python 3.6](https://www.continuum.io/downloads).

> Note: Anaconda provides support for two different versions of Python. Make sure to install the "Python 3.6" version.

**PC only**
- Install [Git Bash](https://git-for-windows.github.io/)

### Browser
- Google Chrome

### Miscellaneous
- Text editor (we recommend [Atom](https://atom.io))
---

<a id='slack'></a>
## <img src="https://lh3.googleusercontent.com/CzlsZP3xUHeX3HAGdZ2rL9mK6_C-6T1-YWeBeM8nB3ilmfPSBHCFx4-UbQr8MnQms3d9=w300" width="25px"> Slack

We'll use Slack for our class communications platform. Slack is a messaging platform where you can chat with your peers and instructors. We will use Slack to share information about the course, discuss lessons, and submit projects. Our Slack homepage is [X](x).

**Pro Tip**: If you've never used Slack before, check out these resources:
- [Intro to Slack](https://www.youtube.com/watch?v=9RJZMSsH7-g)
- [Slack Basics and Shortcuts](https://get.slack.help/hc/en-us/articles/217626358-Cheat-sheet-for-basics-and-shortcuts)
- [The Ultimate Slack Cheatsheet](https://chartmogul.attach.io/EyoxcOGL)
Please note: the curriculum materials for this course are written in Python 3.6.
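
One way to confirm that your interpreter matches the course materials is a quick version check. This is only a sketch: the helper name `meets_minimum` is hypothetical, and treating any release at or above 3.6 as acceptable is an assumption, since the course only states that the materials are written in Python 3.6.

```python
import sys


def meets_minimum(version=sys.version_info, minimum=(3, 6)):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    if meets_minimum():
        print(f"Python {major}.{minor} detected: OK for the course materials.")
    else:
        print(f"Python {major}.{minor} detected: please install Python 3.6.")
```

Run it from the same environment you use for Jupyter, since Anaconda can host several Python versions side by side.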

---

<a id='expectations'></a>
## Expectations

[Add specific local market attendance, student policy, and parking expectations here]
1. Be on time
2. Be willing to ask questions
3. Be willing to ask stupid questions
4. This is not a competition; we are each climbing our own mountain with the shared goal of learning.
5. Grow first by asking yourself, "Have I done all I can to answer this question? Am I truly stuck?" before asking others. When you do ask others, have them shepherd you to your own conclusion rather than just give you the answer. This helps both you and the student doing the teaching in the long run.
6. You get out what you put in. No one else can do it for you; you will learn as much as you are willing to.
7. Be humble - no one here is an expert at everything

---

<a id='hours'></a>
## Office Hours
Every week, your instructional team will hold office hours where you can get in touch to ask questions about anything relating to the course. This is a *great opportunity* to follow up on questions or ask for more details about any topics covered so far.

* Instructor's Office Hours - Day, Time (or by Appointment)
* Assistant's Office Hours - Day, Time (or by Appointment)

Slack us or post in our #officehours channel to reserve a time-slot!
## Road to Success
- **The emotional cycle of change**: This course is fast and covers a lot of material. There will be times when you may feel discouraged or overwhelmed, but don't give up - this is natural (and part of the design). By the end of the course, you'll feel more confident in your ability to define problems, analyze data, and prototype solutions.
- **Student learning responsibility**: Our lessons cover topic foundations, but there is always more to learn! You are responsible for your learning experience - but don't get overwhelmed! Instead, just make sure you follow along, practice as much as possible, and ask questions.
- **GA requirements**: Show up. Be on time. Participate. Submit your projects. Allow yourself to struggle. Read the docs. Have fun!
- **Q/A**

---

<a id='feedback'></a>
## Student Feedback

Throughout the course, you'll be asked to provide feedback about your experience. This feedback is extremely important, as it helps us provide you with a better learning experience.
<a id='practice'></a>
## Python Practice Resources

[Insert specific VTS/Exit Ticket details here]
If you have been enjoying the practice problems at the beginning of some classes, these sites offer many more:

---
- https://www.hackerrank.com/
- https://www.codewars.com/
- https://www.coderbyte.com/
- https://www.codefights.com/