Learn about baseline DSI materials and sequence
1.0.0-sample-orientation-presentation
1.1.1-command-line
1.1.2-intro-to-git
1.1.3-types-lists-dictionaries
1.1.4-iteration-control-flows-functions
1.2.1-python-control-flow-lab
1.2.2-lab-python-functions
1.2.3-python-iteration
1.2.4-python-movies-lab
1.3.1-list-comprehensions
1.3.2-list-comprehensions
1.3.3-distributions-numpy
1.3.4-lab-distributions-numpy
1.4.1-intro-to-python-visualization/slides
1.4.4-python-movies-lab
1.5.1-week1-review
1.5.2-group-python-activity
10.1.1-github_for_teams
10.1.2-github_for_teams-lab
10.1.3-intro_to_time_series_data
10.1.3-kaggle_competition_1
10.2.3-timeseries-autocorrelation
10.2.4-kaggle-comp-pt2
10.3.1-ARIMA
10.4.1-timeseries-LSTM
10.5.1-nn-groups
11.1.1-intro-to-R
11.2.1-recommendations-engine
2.1.0-combinations-permutations
2.1.1-experimental-design/code
2.1.2-intro-frequentist-stats
2.1.3-intro-to-pandas
2.2.1-pandas-masking-indexing-viz
2.2.2-pandas-eda-lab
2.2.3-intro-to-data-cleaning
2.2.4-lab-cleaning-lab
2.3.1-long-wide-pivot-melt
2.3.2-pivot-melt-lab
2.3.3-grouping-with-pandas-plus-lab
2.4.1-merging-grouping
2.4.2-merge-split-apply-combine
2.4.3-pandas-joins
2.4.4-lab-pandas-joins
2.5.1-weekly-review
2.5.2-EDA-practice
3.1.1-coding-linear-regression-plus-lab
3.1.3-linear-regression-with-statsmodels-and-scikit-learn
3.1.4-linear-regression-lab
3.2.1-regression-evaluation-loss-functions
3.2.2-loss-functions-regression-lab
3.2.3-cross-validation-and-train-test-splits
3.3.1-patsy-and-feature-scaling/code/starter-code
3.3.2-regularization
3.3.3-regularization-lab
3.4.1-bias-variance-tradeoff
3.4.2-bias-variance-lab
3.4.3-gridsearch/code
3.5.1-week3-review
3.5.2-group-eda-2
4.1.1-classification-knn
4.1.2-knn-classification-imputation-lab
4.1.3-confusion-mat-eva-class
4.1.4-classification-evaluation-lab
4.2.1-logistic-regression
4.2.2-logistic-regression-lab
4.2.3-ROC-curves
4.2.4-roc-curve-lab
4.3.1-classification_visualization_with_tableau
4.3.2-web-scraping-1
4.4.1-precision-recall
4.4.2-precision-recall-lab
4.5.1-weekly-review
4.5.2-capstone_introduction
5.1.1-database_fundamentals
5.1.2-intro_to_sql
5.1.3-database_lab_1
5.1.4-database_lab_2
5.2.1-gradient-descent
5.2.2-sgd-sklearn-lab
5.2.3-journal-club-logistic-knn
5.2.4-object-oriented-programming
5.3.1-sql_together
5.3.2-pgsql-lab
5.4.1-support-vector-machines
5.4.2-svm-lab
5.5.1-intro-bootstrapping
5.5.2-group-project
6.1.1-CARTs
6.1.2-CART-lab
6.1.3-web_servers_and_apis
6.1.4-api_lab
6.2.1-sql_joins
6.2.2-join_and_api_lab
6.2.3-ensembles-bagging
6.2.4-rand-forest-feat-import/code
6.3.1-pipelines-sklearn
6.3.2-boosting
6.3.3-ensembles-lab
6.4.1-intro_to_nlp
6.4.2-nlp_lab
6.4.3-topic-modeling/slides
6.5.1-review
6.5.2-group-nlp-indeed-scraping
7.1.1-intro-to-clustering-kmeans
7.1.3-tuning_kmeans_demo_and_lab
7.2.1-intro-to-PCA
7.2.2-PCA-speed-dating
7.2.3-PCA-lab
7.3.1-hierarchical_clustering
7.3.2-sentiment-analysis
7.3.3-sentiment-analysis-lab
7.4.1-DBSCAN-by-hand
7.5.2-group_eda
8.1.1-intro-to-bayes
8.1.2-priors-posteriors-lab
8.1.3-battle-of-clusterers
8.2.1-intro-to-big-data
8.2.2-hadoop-lab
8.2.3-intro-to-pymc3
8.3.1-spark_overview
8.3.2-spark-lab-1
8.3.3-optional-mrjob-lab
8.3.4-optional-hive-lab
8.4.1-MCMC
8.4.3-intro-to-aws
9.1.1-bayesian-regression
9.1.2-api-case-study-nlp-twitter
9.2.1-capstone-and-aws-review-qa
9.2.3-bayes-split-testing
9.3.1-multi-arm-bandit
9.4.1-machine-learning-with-spark
9.5.1-naive_bayes_insult_lab
project-01
project-02
project-03
project-04
project-05
project-06
project-07
project-capstone
resource-datasets
resource-install
resource-instructor-files
resource-utils
student-onboarding-materials
table-of-contents
LICENSE.txt
README.md
contributing.md
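
The listing above is in plain string order, so the folders numbered 10 and 11 appear before those numbered 2. As a minimal sketch, assuming the leading dotted number on each folder encodes its place in the teaching sequence (an interpretation of the names, not something the repository states), the folders could be put into numeric order as follows. The helper name `lesson_key` is hypothetical.

```python
import re

def lesson_key(name: str):
    """Sort key: numbered lesson folders first, ordered by their numeric
    prefix; every other name afterwards, in alphabetical order."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-", name)
    if match:
        # Compare the three prefix numbers as integers, not as text.
        return (0, tuple(int(part) for part in match.groups()), name)
    return (1, (), name)

# A few folder names copied from the listing above.
folders = [
    "1.1.1-command-line",
    "10.1.1-github_for_teams",
    "2.1.0-combinations-permutations",
    "9.5.1-naive_bayes_insult_lab",
    "project-01",
]

ordered = sorted(folders, key=lesson_key)
print(ordered)
```

Folders that share a numeric prefix, such as the two `10.1.3-…` entries, fall back to alphabetical order by full name.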

README.md

Data Science Immersive

Welcome to Data Science! We are building a global community of lifelong learners who are excited about using data to solve real-world problems.

In this program, you’ll take on real-world problems by analyzing data sets for insights and presenting findings using statistics, programming, data modeling, and business knowledge.

Course Value Proposition

This course is designed to give you a deep dive into the world of Data Science, focusing on the ability to analyze and convey data-driven facts in order to predict what happens next using modeling and pattern recognition. Our course prepares students to take full-time roles as Data Analysts, Data Scientists, Business Intelligence Analysts, and other roles that require advanced fluency with data. Our projects immerse students in formal data-driven scenarios to help them create a polished portfolio of work showcasing their ability to create and communicate machine learning insights.

What Our Students Learn

  • Data Analysis & Python: Perform visual and statistical analysis on data using Python and its associated libraries and tools.
  • Machine Learning & Modeling Techniques: Explore the differences between supervised and unsupervised learning through the application of various modeling techniques such as classification, regression, and clustering.
  • Git, SQL, & Relational Databases: Gather, store, and organize your data using the data science toolkit: SQL, Git, and UNIX.
  • Critical Thinking & Synthesis: Apply your analysis and modeling skills to real-world data problems in fields like finance, marketing, and public policy.
  • Visualization, Presentation, & Reporting: Learn to create reproducible presentations and reports and use data visualization tools to present your findings to key stakeholders.

By the End of This Course, Students Will Be Able To:

  • Collect, extract, query, clean, and aggregate data for analysis
  • Perform visual and statistical analysis on data using Python and its associated libraries and tools
  • Build, implement, and evaluate solutions to data science problems using appropriate machine learning models and algorithms
  • Use appropriate data visualization tools to communicate findings
  • Present clear and reproducible reports to stakeholders
  • Identify big data problems and understand how distributed systems and parallel computing technologies are solving these challenges
  • Apply question, modeling, and validation problem-solving processes to datasets from various industries to gain insight into real-world problems and solutions

To Get Started

Please take at least one hour to read through the following onboarding documents, in the order provided, to get a better understanding of your responsibilities as an instructor, student responsibilities, and the scope, sequence, and value proposition of this course. Each document links to the next at the bottom of the file!

| Document | Description |
| --- | --- |
| Students | Student personas and course demographics |
| Materials | What we provide and what you should build |
| Format | Course syllabus and schedule |
| Projects & Assessments | Course projects and grading expectations |
| Expectations | Planning and communication responsibilities |
| Technology | Tools used in this course |
| Supplemental Resources | Common course issues and suggestions |

After reading these docs, we welcome you to jump into the #dsi-instructors channel on Slack and join the conversation!


Forking and Collaborating

The structure of this repository provides a way for us to organize our information and resources.

We encourage the teaching team for each cohort to fork this repository directly and use the fork to create resources for their own instance. Please make sure to submit new materials back to the master repository so we can share them with students and instructors worldwide!

If you have any questions about the organization of resources, or about the scope of our curriculum, feel free to open an issue.

Please check out our contributing guidelines for more details.

Licensing

  1. All content is licensed under a CC-BY-NC-SA 4.0 license.
  2. All software code is licensed under GNU GPLv3. For commercial use or alternative licensing, please contact legal@ga.co.
