Main repo for the DAT24 part time data science course
Latest commit d46774b Aug 22, 2018


Welcome to Data Science

  1. Welcome
  2. Your Team
  3. Course Overview
  4. Course Schedule
  5. Projects
  6. Tech Requirements
  7. Classroom Tools
  8. Student Expectations
  9. Office Hours

Course Overview

Welcome to the part-time Data Science course at General Assembly! We are building a global community of lifelong learners who are excited about using data to solve real-world problems.

In this program, we will use Python to explore datasets, build predictive models, and communicate data-driven insights. Specifically, you will learn how to:

  • Define many of the approaches and considerations that data scientists use to solve real-world problems.
  • Perform exploratory data analysis with powerful programmatic tools in Python.
  • Build and refine basic machine learning models to predict patterns from datasets.
  • Communicate data-driven insights to peers and stakeholders in order to inform business decisions.

What You Will Learn

  • Statistical Analysis with Python: Perform visual and statistical analysis on data using Python and its associated libraries and tools.
  • Data-Driven Decision-Making: Define and determine the trade-offs involving feature selection, model accuracy, and data quality.
  • Machine Learning & Modeling Techniques: Explore supervised learning techniques, including classification, regression, and decision trees.
  • Visualizations & Presentations: Create visualizations and interactive notebooks to present to industry stakeholders.

Python Version

The curriculum materials for this course are written in Python 3.6.
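
One quick way to confirm that your interpreter matches the course materials is to inspect `sys.version_info`. The sketch below is illustrative only: the helper name `meets_course_version` is hypothetical, and accepting any interpreter at or above 3.6 is an assumption rather than an official course rule.

```python
import sys

# Minimum version assumed from the course materials (Python 3.6).
REQUIRED = (3, 6)


def meets_course_version(version_info=sys.version_info, required=REQUIRED):
    """Return True if the (major, minor) version is at least `required`."""
    return tuple(version_info[:2]) >= tuple(required)


if __name__ == "__main__":
    current = "{}.{}".format(*sys.version_info[:2])
    if meets_course_version():
        print("Python", current, "should work with the course materials.")
    else:
        print("Python", current, "is older than the required 3.6.")
```

Running this inside the environment you plan to use for class (for example, from a Jupyter notebook cell) shows which interpreter that environment actually picks up.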

Your Instructional Team

Instructor: Brian O'Halloran

Assistants: Anna Jones & Jasmine Pengelly

Curriculum Structure

The course is organised into four units.

Unit | Title | Topics Covered | Length
--- | --- | --- | ---
1 | Data Foundations | Python Syntax, Development Environment | Sessions 1-4
2 | Working with Data | Stats Review, Visualisation, & EDA | Sessions 5-9
3 | Data Science Modeling | Regression, Classification, Model Evaluation | Sessions 10-14
4 | Data Science Applications | Time Series, NLP, Clustering, Wrap-up | Sessions 15-20

Lesson Schedule

Here is the schedule we will be following for our part-time Data Science course (dates are in DD/MM/YYYY format):

Session | Title | Unit | Date
--- | --- | --- | ---
01 | Welcome to Data Science | Unit 1 | 18/06/2018
02 | Python Foundations | Unit 1 | 20/06/2018
03 | Managing Data | Unit 1 | 25/06/2018
04 | Project Workshop: Unit Project 1 | Unit 1 | 27/06/2018
05 | Exploratory Data Analysis in Pandas | Unit 2 | 02/07/2018
06 | Data Visualisation in Python | Unit 2 | 04/07/2018
07 | Statistics in Python | Unit 2 | 09/07/2018
08 | Experiments & Hypothesis Testing | Unit 2 | 11/07/2018
09 | Project Lightning Talks, Project Workshop, Unit Project 2 | Unit 2 | 16/07/2018
10 | Introduction to Regression | Unit 3 | 18/07/2018
11 | Evaluating Machine Learning Models | Unit 3 | 23/07/2018
12 | Introduction to Classification | Unit 3 | 25/07/2018
13 | Logistic Regression | Unit 3 | 30/07/2018
14 | Project Workshop, Unit Project 3 (EDA brief due) | Unit 3 | 01/08/2018
15 | Clustering | Unit 4 | 06/08/2018
16 | Decision Trees & Random Forests | Unit 4 | 08/08/2018
17 | Intro to Time Series | Unit 4 | 13/08/2018
18 | Intro to Natural Language Processing | Unit 4 | 15/08/2018
19 | Wrap up & Project Workshop | Unit 4 | 20/08/2018
20 | Final Project Presentations & Tech Report Due | Unit 4 | 22/08/2018

Project Structure

This course will ask you to complete a series of projects in order to practice and apply the skills covered in class.

Unit Projects

At the end of each unit, you'll work on a short, structured project. These activities will test your understanding of that unit’s most important concepts with in-class practice and instructor support.

For those of you who want to go above and beyond, we’ve also included stretch options, bonus activities, and other opportunities for further reading and practice.

Final Project

You'll also complete a final project that asks you to apply your skills to a real-world or business problem of your choice.

The capstone is an opportunity for you to demonstrate your new skills and tackle a pressing issue relevant to your life, industry, or organization. You’ll create a hypothesis, analyze internal data, and generate a working model, prototype, solution, or recommendation.

You will get structured guidance and designated time to work throughout the course. Final project deliverables include:

  • Proposal: Describe your chosen problem and identify relevant data sets (confirming access, as needed).
  • Brief: Share a summary of your initial analysis and your next steps with your instructional team.
  • Report: Submit a cleanly formatted Jupyter notebook (or other files) documenting your code and process for technical/peer stakeholders.
  • Presentation: Present a summary of your business problem, approach, and recommendation to an audience of non-technical executive stakeholders.

Project Breakdown

  1. Project 1: Python Technical Code Challenges
  2. Project 2: Exploratory Data Analysis
  3. Project 3: Modeling Practice
  4. Project 4: Final Project
    • Part 1: Proposal + Dataset
    • Part 2: Initial EDA Brief
    • Part 3: Technical Report
    • Part 4: Presentation

Project Schedule

  • Project 1: Due @ End of Unit 1
  • Project 2: Due @ End of Unit 2
  • Project 3: Due @ End of Unit 3
  • Project 4 (Final):
    • Proposal + Dataset: Due @ End of Unit 2
    • Initial EDA Brief: Due @ End of Unit 3
    • Technical Report: Due @ End of Unit 4
    • Presentation: Due @ End of Unit 4

Technology Requirements


  1. At least 8GB of RAM
  2. 10GB Free Hard Drive Space (after installing Anaconda)
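
If you are unsure whether your machine meets the disk space requirement above, a minimal sketch using only the Python standard library can report free space. The helper names `free_space_gb` and `has_enough_space` are hypothetical, and the check assumes that the drive holding the current folder is the one you will use for course materials. It does not check RAM.

```python
import shutil

GB = 1024 ** 3  # bytes per gigabyte (binary convention)


def free_space_gb(path="."):
    """Return the free space, in GB, on the drive that contains `path`."""
    return shutil.disk_usage(path).free / GB


def has_enough_space(free_gb, required_gb=10):
    """Return True if `free_gb` meets the assumed 10GB course requirement."""
    return free_gb >= required_gb


if __name__ == "__main__":
    free = free_space_gb()
    print("Free space: {:.1f} GB".format(free))
    if has_enough_space(free):
        print("This drive meets the 10GB requirement.")
    else:
        print("This drive has less than the required 10GB free.")
```

Run it after installing Anaconda, since the requirement refers to the space left over once Anaconda is in place.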


  1. Download and Install Anaconda with Python 3.6.

Note: Anaconda provides support for two different versions of Python. Make sure to install the "Python 3.6" version.

PC only


  • Google Chrome, Firefox, etc.



We'll use Slack as our class communications platform. Slack is a messaging platform where you can chat with your peers and instructors. We will use it to share information about the course, discuss lessons, and submit projects. Our Slack homepage is GA-LDN-DATASCIENCE.

Pro Tip: If you've never used Slack before, check out these resources:


We'll set these together!

Office Hours

Every week, your instructional team will hold office hours where you can get in touch to ask questions about anything relating to the course. This is a great opportunity to follow up on questions or ask for more details about any topics covered so far.

More info in week 1!