# DAT-MS/cluster-ms930

Latest commit 7981e83 Dec 2, 2019

# Clustering

Unit 4: Flex

## Materials We Provide

| Material | Description | Link |
| --- | --- | --- |
| Lesson | Part 1: K-Means | Here |
| Lesson | Part 2: DBSCAN (Optional) | Here |
| Solution | Part 2: Solution code for questions and exercises | Here |
| Datasets | Beer nutrition and cost | Here |
| Extra Practice | Four additional labs for practice | Here |

This lesson uses a small beer dataset that records each beer's name, calories, sodium content, alcohol percentage, and cost. The dataset works well for teaching because it is small enough to read in full and it clusters into identifiable categories.

## Learning Objectives

After this lesson, students will be able to:

### Part One: K-Means

• Explain the difference between supervised and unsupervised learning.
• Demonstrate how to apply k-means clustering.
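As a rough illustration of the k-means objective above, here is a minimal sketch using scikit-learn. The six rows are made-up values in the shape of the beer data (calories, sodium, alcohol percentage, cost); they are not taken from the lesson's dataset, and the choice of two clusters is an assumption for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up rows for illustration only (NOT the lesson's dataset).
# Columns: calories, sodium, alcohol percentage, cost.
X = np.array([
    [150.0, 18.0, 4.7, 0.50],
    [155.0, 17.0, 4.9, 0.52],
    [148.0, 19.0, 4.6, 0.51],
    [100.0,  8.0, 4.1, 0.40],
    [ 98.0,  7.0, 4.0, 0.42],
    [102.0,  9.0, 4.2, 0.41],
])

# Standardize each column so that calories, which have the largest
# raw range, do not dominate the Euclidean distances.
X_scaled = StandardScaler().fit_transform(X)

# n_clusters=2 is an assumption for this toy data; random_state makes
# the run repeatable.
km = KMeans(n_clusters=2, n_init=10, random_state=1)
labels = km.fit_predict(X_scaled)

print(labels)               # one cluster label per row
print(km.cluster_centers_)  # centroids, in scaled units
```

The numeric label values are arbitrary: k-means only guarantees which rows are grouped together, not which group is called 0 or 1.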

### Part Two: DBSCAN

• Demonstrate how to apply density-based clustering (DBSCAN).
• Define the Silhouette Coefficient and explain how it relates to clustering.
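For reference, the Silhouette Coefficient of a single point is usually defined as s = (b - a) / max(a, b), where a is the mean distance to the other points in its own cluster and b is the mean distance to the points in the nearest other cluster. The overall coefficient is the mean of s over all points. The sketch below computes it from that definition in plain Python. The function names and the toy points are hypothetical, and scoring singleton clusters as 0 is an assumption of this sketch. In practice, scikit-learn's `silhouette_score` is the usual choice.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def silhouette_samples(points, labels):
    """Return s = (b - a) / max(a, b) for every point."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # Collect distances from p to every other point, grouped by cluster.
        by_cluster = {}
        for j, (q, other) in enumerate(zip(points, labels)):
            if i != j:
                by_cluster.setdefault(other, []).append(dist(p, q))
        own = by_cluster.pop(lab, [])
        if not own or not by_cluster:
            # Assumption: singleton clusters (or a single cluster) score 0.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within p's own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return scores


def silhouette_coefficient(points, labels):
    """Mean silhouette over all points; values lie in [-1, 1]."""
    s = silhouette_samples(points, labels)
    return sum(s) / len(s)


# Toy points: two tight, well-separated pairs.
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
good = silhouette_coefficient(points, [0, 0, 1, 1])  # matches the pairs
bad = silhouette_coefficient(points, [0, 1, 0, 1])   # splits each pair
print(good, bad)
```

A value near 1 means points sit much closer to their own cluster than to any other. Values near 0 or below suggest overlapping or misassigned clusters, which is why the labeling that splits each pair scores lower here.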

## Student Requirements

Before this lesson, students should already be able to:

• Define basic principles of supervised learning.
• Interpret k-NN and Voronoi diagrams.
• Prepare features and create models using scikit-learn.
• Graph data using Matplotlib.

## Lesson Outline

TOTAL (170 min)

Rapid Schedule: For a half-lesson, consider covering only Part One (k-means). If more time is needed, the explanation of k-means metrics can be skipped.

### OUTLINE: PART ONE (K-MEANS)

Total: 80 min

• Unsupervised Learning (15 min)
  • Unsupervised Learning Example: Coin Clustering
  • Common Types of Unsupervised Learning
  • Using Multiple Types of Learning Together
• Clustering (15 min)
• K-Means: Centroid Clustering (30 min)
  • Visual Demo
  • K-Means Assumptions
• K-Means Demo (20 min)
  • K-Means Clustering
  • Repeat With Scaled Data

### OUTLINE: PART TWO (DBSCAN)

Total: 90 min

• DBSCAN: Density-Based Clustering (25 min)
  • Visual Demo
• DBSCAN Clustering Demo (10 min)
• Hierarchical Clustering (20 min)
• Clustering Metrics (15 min)
• Clustering, Classification, and Regression (15 min)
• Comparing Clustering Algorithms (5 min)
• Lesson Summary