Add files via upload

jfkoehler committed Dec 2, 2019
1 parent c703fc4 commit 7981e838fc3c33b90060e3395ef2780eaa8f9ef0
Showing with 18,460 additions and 0 deletions.
  1. +1,282 −0 clustering/01_intro-to-kmeans.ipynb
  2. +1,949 −0 clustering/02_clustering_adv.ipynb
  3. +44 −0 clustering/CHANGELOG.md
  4. +85 −0 clustering/README.md
  5. BIN clustering/assets/clustering-centroids.png
  6. BIN clustering/assets/dbscan.png
  7. BIN clustering/assets/density-clusters.png
  8. BIN clustering/assets/hierarchical-clustering.png
  9. BIN clustering/assets/kmeans1.png
  10. BIN clustering/assets/kmeans10.png
  11. BIN clustering/assets/kmeans2.png
  12. BIN clustering/assets/kmeans3.png
  13. BIN clustering/assets/kmeans4.png
  14. BIN clustering/assets/kmeans5.png
  15. BIN clustering/assets/kmeans6.png
  16. BIN clustering/assets/kmeans7.png
  17. BIN clustering/assets/kmeans8.png
  18. BIN clustering/assets/kmeans9.png
  19. BIN clustering/assets/supervised-vs-unsupervised.jpg
  20. BIN clustering/assets/unsupervised-coin.png
  21. BIN clustering/assets/voronoi.png
  22. +789 −0 clustering/data/aggregation.csv
  23. +21 −0 clustering/data/beer.txt
  24. +400 −0 clustering/data/compound.csv
  25. +3,101 −0 clustering/data/d31.csv
  26. +241 −0 clustering/data/flame.csv
  27. +151 −0 clustering/data/iris.csv
  28. +374 −0 clustering/data/jain.csv
  29. +33 −0 clustering/data/mtcars.csv
  30. +91 −0 clustering/data/nhl.csv
  31. +28 −0 clustering/data/nutrients.txt
  32. +301 −0 clustering/data/pathbased.csv
  33. +601 −0 clustering/data/r15.csv
  34. +1 −0 clustering/data/seeds.csv
  35. +313 −0 clustering/data/spiral.csv
  36. +364 −0 clustering/practice/DBSCAN_kmeans_and_hierarchical-lab.ipynb
  37. +31 −0 clustering/practice/README.md
  38. +275 −0 clustering/practice/cluster_evaluation-lab.ipynb
  39. +298 −0 clustering/practice/kmeans_clustering-lab.ipynb
  40. +320 −0 clustering/practice/practice_dbscan-lab.ipynb
  41. +671 −0 clustering/practice/practice_solution-code/DBSCAN_kmeans_and_hierarchical-lab-solutions.ipynb
  42. +814 −0 clustering/practice/practice_solution-code/cluster_evaluation-lab-solutions.ipynb
  43. +935 −0 clustering/practice/practice_solution-code/kmeans_clustering-lab-solutions.ipynb
  44. +1,341 −0 clustering/practice/practice_solution-code/practice_dbscan-lab-solutions.ipynb
  45. +3,606 −0 clustering/solution-code/02_clustering_adv-solution.ipynb

Large diffs are not rendered by default.

@@ -0,0 +1,44 @@
### v0.1 | 01.16.18

_Editor: Jeff Boykin_

- Reorganizing assets
- Adding updates from DS-NYC


### v0.1 | 09.05.17

_Editor: Sam Stack_

- Uniform names in practice folder

- Links to notebooks in practice folder

### v0.1 | 09.05.17

_Editor: Sam Stack_

- Broke down individual practice files into a simpler structure.

- Edited file path links in notebooks

- Removed `.txt` versions of files as they were not being used.

### v0.1 | 09.04.17

_Editor: Sam Stack_

- Added CHANGELOG.md

- Changed `datasets` to `data` in clustering-eval_metrics-lab and updated paths in starter and solution.

- Changed `datasets` to `data` in clustering-dbscan-lab and updated paths in starter and solution.

- Changed `datasets` to `data` in clustering-intro_to_clustering_kmeans and updated paths in starter and solution.



### v0.0

_Author: Sinan Uozdemir (Lesson), Kiefer Katovich (Lab: Battle of Clusters),
Joseph Nelson (Labs: Cluster Evaluation, DBSCAN, K-Means Clustering), Haley Boyan & Sam Stack (Lab: K-Means Clustering)_
@@ -0,0 +1,85 @@
# ![](https://ga-dash.s3.amazonaws.com/production/assets/logo-9f88ae6c9c3871690e33280fcf557f33.png) Clustering

> Unit 4: Flex
---

## Materials We Provide

| Topic | Description | Link |
| --- | --- | --- |
| Lesson | Part 1: K-Means | [Here](./01_intro-to-kmeans.ipynb) |
| Lesson | Part 2: DBSCAN (*Optional*) | [Here](./02_clustering_adv.ipynb) |
| Solution | Part 2: Solution code for questions and exercises | [Here](./solution-code/02_clustering_adv-solution.ipynb) |
| Datasets | Beer nutrition and cost | [Here](./data/beer.txt) |
| Extra Practice | Four additional labs for practice | [Here](./practice/) |

> This lesson uses a small beer dataset describing beer name, calories, sodium content, alcohol percentage, and cost. The dataset is ideal because it is small enough to read in full and it clusters into identifiable categories.
---

## Learning Objectives

After this lesson, students will be able to:

### Part One: K-Means
- Determine the difference between supervised and unsupervised learning.
- Demonstrate how to apply k-means clustering.
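
As a rough illustration of what applying k-means involves, the sketch below implements the two alternating steps (assign each point to its nearest centroid, then move each centroid to the mean of its points) in plain Python. The toy points and the hand-picked starting centroids are assumptions made for this example and are not taken from the lesson's beer dataset; in the lesson itself, scikit-learn's `sklearn.cluster.KMeans` would normally do this work.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Minimal k-means (Lloyd's algorithm) on tuples of floats."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(old)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy data with two visually obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

Because k-means relies on Euclidean distance, features measured on larger scales dominate the assignment step, which is why the lesson repeats the demo with scaled data.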

### Part Two: DBSCAN
- Demonstrate how to apply density-based clustering (DBSCAN).
- Define the Silhouette Coefficient and explain how it relates to clustering.
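
For reference, the Silhouette Coefficient of a single point is `s = (b - a) / max(a, b)`, where `a` is the mean distance to the other points in its own cluster and `b` is the mean distance to the points of the nearest other cluster; the score for a clustering is the mean of `s` over all points. The following is a from-scratch sketch under those definitions, using made-up toy data; scikit-learn offers `sklearn.metrics.silhouette_score` for real use.

```python
import math


def silhouette_coefficient(points, labels):
    """Mean of s = (b - a) / max(a, b) over all points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of p's own cluster
        # (p's zero distance to itself adds nothing to the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: two well-separated groups.
toy_points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
good = silhouette_coefficient(toy_points, [0, 0, 0, 1, 1, 1])
bad = silhouette_coefficient(toy_points, [0, 1, 0, 1, 0, 1])
print(good, bad)
```

Values near 1 indicate tight, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest points assigned to the wrong cluster, so the labeling that matches the two groups should score higher than the mixed one.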

---

## Student Requirements

Before this lesson, students should already be able to:
- Define basic principles of supervised learning.
- Intuit relevant information from k-NN and Voronoi diagrams.
- Prepare features and create models using scikit-learn.
- Graph data using Matplotlib.

---

## Lesson Outline
> TOTAL (170 min)
> **Rapid Schedule:** For a half-lesson, consider covering only Part One (k-means). If additional time is needed, the k-means metric explanation could be skipped.

### OUTLINE: PART ONE (K-MEANS)
> Total: 80 min
- Unsupervised Learning (15 min)
  - Unsupervised Learning Example: Coin Clustering
  - Common Types of Unsupervised Learning
  - Using Multiple Types of Learning Together
- Clustering (15 min)
- K-Means: Centroid Clustering (30 min)
  - Visual Demo
  - K-Means Assumptions
- K-Means Demo (20 min)
  - K-Means Clustering
  - Repeat With Scaled Data

### OUTLINE: PART TWO (DBSCAN)
> Total: 90 min
- DBSCAN: Density-Based Clustering (25 min)
  - Visual Demo
- DBSCAN Clustering Demo (10 min)
- Hierarchical Clustering (20 min)
- Clustering Metrics (15 min)
- Clustering, Classification, and Regression (15 min)
- Comparing Clustering Algorithms (5 min)
- Lesson Summary

---


## Additional Resources
- [Scikit-learn Clustering Methods](http://scikit-learn.org/stable/modules/clustering.html)
- [K-Means Clustering (video)](https://www.youtube.com/watch?v=0MQEt10e4NM)
- [Clustering Overview](http://www.holehouse.org/mlclass/13_Clustering.html)
- [Cluster Analysis and K-Means (PDF)](http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf)
- [K-Means Wikipedia Article](http://en.wikipedia.org/wiki/K-means_clustering)
Binary file not shown.