

Unit 4: Flex

Materials We Provide

Topic | Description | Link
Lesson | Part 1: K-Means | Here
Lesson | Part 2: DBSCAN (Optional) | Here
Solution | Part 2: Solution code for questions and exercises | Here
Datasets | Beer nutrition and cost | Here
Extra Practice | Four additional labs for practice | Here

This lesson uses a small beer dataset describing each beer's name, calories, sodium content, alcohol percentage, and cost. The dataset is ideal because it is small enough to read in full and its beers cluster into identifiable categories.

Learning Objectives

After this lesson, students will be able to:

Part One: K-Means

  • Explain the difference between supervised and unsupervised learning.
  • Demonstrate how to apply k-means clustering.
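To make the k-means objective concrete, here is a minimal pure-Python sketch of Lloyd's algorithm, the standard k-means iteration. It alternates an assignment step (each point joins its nearest centroid) and an update step (each centroid moves to the mean of its points) until the centroids stop moving. It assumes Euclidean distance and a simple random initialization. The function names `kmeans` and `nearest_centroid` and the sample points are hypothetical, chosen for illustration. In practice, scikit-learn's `sklearn.cluster.KMeans` does this work and uses k-means++ initialization by default.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest_centroid(point: Sequence[float], centroids: List[Point]) -> int:
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points: List[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[Point]]:
    """Lloyd's algorithm: alternate assignment and update steps until centroids stop moving."""
    rng = random.Random(seed)
    # Simple random initialization: k distinct input points (not k-means++).
    centroids: List[Point] = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        # Assignment step: every point joins its nearest centroid.
        labels = [nearest_centroid(p, centroids) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster: leave its centroid in place
        if new_centroids == centroids:  # converged: assignments can no longer change
            break
        centroids = new_centroids
    # Final assignment against the centroids being returned.
    labels = [nearest_centroid(p, centroids) for p in points]
    return labels, centroids


# Two well-separated, made-up groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

The cluster numbers themselves are arbitrary. Only which points share a label is meaningful, and a different seed can swap the two labels.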

Part Two: DBSCAN

  • Demonstrate how to apply density-based clustering (DBSCAN).
  • Define the Silhouette Coefficient and explain how it relates to clustering.
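The Silhouette Coefficient can be sketched directly from its definition. For each point, let a be the mean distance to the other points in its own cluster (cohesion) and b the smallest mean distance to the points of any other cluster (separation). The point's score is s = (b - a) / max(a, b), and the overall coefficient is the mean of s over all points. Values near 1 indicate tight, well-separated clusters, and negative values suggest points sit closer to another cluster than to their own. The sketch below assumes Euclidean distance and assigns 0 to points in single-member clusters. The function name and the sample points are hypothetical; scikit-learn exposes this metric as `sklearn.metrics.silhouette_score`.

```python
import math
from typing import Dict, List, Sequence


def silhouette_score(points: List[Sequence[float]], labels: List[int]) -> float:
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    if len(set(labels)) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Group the distances from point i to every other point by that point's cluster.
        by_cluster: Dict[int, List[float]] = {}
        for j, (q, lab) in enumerate(zip(points, labels)):
            if j != i:
                by_cluster.setdefault(lab, []).append(math.dist(p, q))
        if own not in by_cluster:  # point i is alone in its cluster: score it 0
            scores.append(0.0)
            continue
        a = sum(by_cluster[own]) / len(by_cluster[own])  # cohesion
        b = min(sum(d) / len(d) for lab, d in by_cluster.items() if lab != own)  # separation
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # compact, separated clusters: about 0.90
bad = silhouette_score(points, [0, 1, 0, 1])   # each cluster spans both groups: about -0.45
print(good, bad)
```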

Student Requirements

Before these lessons, students should already be able to:

  • Define basic principles of supervised learning.
  • Interpret k-NN and Voronoi diagrams.
  • Prepare features and create models using scikit-learn.
  • Graph data using Matplotlib.

Lesson Outline

TOTAL (170 min)

Rapid Schedule: For a half-lesson, consider covering only Part One (k-means). If time is still short, the k-means metric explanation can be skipped.


Part One: K-Means

Total: 80 min

  • Unsupervised Learning (15 min)
    • Unsupervised Learning Example: Coin Clustering
    • Common Types of Unsupervised Learning
    • Using Multiple Types of Learning Together
  • Clustering (15 min)
  • K-Means: Centroid Clustering (30 min)
    • Visual Demo
    • K-Means Assumptions
  • K-Means Demo (20 min)
    • K-Means Clustering
    • Repeat With Scaled Data
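Repeating the demo with scaled data matters because k-means relies on Euclidean distance. A feature measured in large units, such as calories, can dominate a feature measured in small units, such as alcohol percentage. The sketch below z-scores each column by subtracting the column mean and dividing by the population standard deviation, which is the same convention as scikit-learn's `sklearn.preprocessing.StandardScaler`. The `standardize` function and the three rows are hypothetical, made up for illustration, and are not taken from the lesson's dataset.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def standardize(rows: List[Sequence[float]]) -> List[Tuple[float, ...]]:
    """Z-score each column: subtract its mean, divide by its population standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [
        tuple((x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stdevs))
        for row in rows
    ]


# Hypothetical (calories, alcohol %) rows, made up for illustration.
rows = [(150.0, 4.5), (152.0, 7.0), (100.0, 4.6)]
scaled = standardize(rows)

# Raw units: distance is driven almost entirely by calories,
# so row 0 looks far closer to row 1 than to row 2.
print(math.dist(rows[0], rows[1]), math.dist(rows[0], rows[2]))
# Standardized: both features carry comparable weight and the two distances come out similar.
print(math.dist(scaled[0], scaled[1]), math.dist(scaled[0], scaled[2]))
```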


Part Two: DBSCAN

Total: 90 min

  • DBSCAN: Density-Based Clustering (25 min)
    • Visual Demo
  • DBSCAN Clustering Demo (10 min)
  • Hierarchical Clustering (20 min)
  • Clustering Metrics (15 min)
  • Clustering, Classification, and Regression (15 min)
  • Comparing Clustering Algorithms (5 min)
  • Lesson Summary
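For the DBSCAN portion, a minimal pure-Python sketch can show the core idea. A point is a core point if at least `min_samples` points, counting itself, lie within distance `eps`. Clusters grow outward from core points. Non-core points reached during that growth become border points, and everything else is noise, labeled -1. The parameter names mirror scikit-learn's `sklearn.cluster.DBSCAN`, which also marks noise as -1 in `labels_`. The helper `region_query` is hypothetical, and the brute-force neighbor search is O(n²), which is fine for illustration but not for large data.

```python
import math
from collections import deque
from typing import List, Sequence

NOISE = -1
UNVISITED = -2


def region_query(points: List[Sequence[float]], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: List[Sequence[float]], eps: float, min_samples: int) -> List[int]:
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may be relabeled later if it turns out to be a border point
            continue
        cluster += 1  # i is a core point: start a new cluster from it
        labels[i] = cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
                continue
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:  # j is also a core point: keep growing
                queue.extend(j_neighbors)
    return labels


# Two dense, made-up groups plus one isolated point.
points = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.0),
          (10.0, 10.0), (10.0, 10.5), (10.5, 10.0), (5.0, 5.0)]
print(dbscan(points, eps=1.0, min_samples=3))  # → [0, 0, 0, 1, 1, 1, -1]
```

Unlike k-means, the number of clusters is not an input. It falls out of `eps` and `min_samples`, and the isolated point is reported as noise instead of being forced into a cluster.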

Additional Resources