# Clustering

Unit 4: Flex

## Materials We Provide

Topic | Description | Link |
---|---|---|
Lesson | Part 1: K-Means | Here |
Lesson | Part 2: DBSCAN (Optional) | Here |
Solution | Part 2: Solution code for questions and exercises | Here |
Datasets | Beer nutrition and cost | Here |
Extra Practice | Four additional labs for practice | Here |

This lesson uses a small beer dataset that records each beer's name, calories, sodium content, alcohol percentage, and cost. The dataset works well for teaching because it is small enough to read in full and it separates into identifiable clusters.

## Learning Objectives

After this lesson, students will be able to:

### Part One: K-Means

- Explain the difference between supervised and unsupervised learning.
- Demonstrate how to apply k-means clustering.

### Part Two: DBSCAN

- Demonstrate how to apply density-based clustering (DBSCAN).
- Define the Silhouette Coefficient and explain how it relates to clustering.
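For reference, the silhouette coefficient is usually defined per point $i$. Here $a(i)$ is the mean distance from $i$ to the other points in its own cluster, and $b(i)$ is the lowest mean distance from $i$ to the points of any other single cluster:

$$
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1
$$

Values near 1 mean the point sits well inside its cluster, values near 0 mean it lies near a boundary, and negative values suggest it may belong to a different cluster. The coefficient for a whole clustering is the mean of $s(i)$ over all points, which is what scikit-learn's `silhouette_score` reports.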

## Student Requirements

Before this lesson, students should already be able to:

- Define basic principles of supervised learning.
- Extract relevant information from k-NN models and Voronoi diagrams.
- Prepare features and create models using scikit-learn.
- Graph data using Matplotlib.

## Lesson Outline

Total: 170 min

Rapid Schedule: For a half-lesson, consider covering only Part One (k-means). If more time needs to be saved, the k-means metric explanation can be skipped.

### OUTLINE: PART ONE (K-MEANS)

Total: 80 min

- Unsupervised Learning (15 min)
  - Unsupervised Learning Example: Coin Clustering
  - Common Types of Unsupervised Learning
  - Using Multiple Types of Learning Together
- Clustering (15 min)
- K-Means: Centroid Clustering (30 min)
  - Visual Demo
  - K-Means Assumptions
- K-Means Demo (20 min)
  - K-Means Clustering
  - Repeat With Scaled Data
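The K-Means demo and its scaled repeat could be sketched roughly as follows with scikit-learn. This is a minimal sketch under stated assumptions, not the lesson's notebook: the beer dataset is not reproduced here, so synthetic blobs from `make_blobs` stand in for it, and the 100x inflation of one column is a hypothetical unit mismatch chosen to show why scaling matters.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the beer features (NOT the lesson's dataset):
# 150 points drawn around 3 centers in two dimensions.
X, y_true = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=0)

# Hypothetical unit mismatch: inflate the second feature by 100x so that it
# dominates Euclidean distances, as a column measured in large units might.
X_raw = X.copy()
X_raw[:, 1] *= 100


def fit_kmeans(features, k=3):
    """Fit k-means with several random restarts and return one label per row."""
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    return model.fit_predict(features)


# Step 1: cluster the unscaled features.
labels_raw = fit_kmeans(X_raw)

# Step 2: repeat after standardizing each column to zero mean, unit variance.
X_scaled = StandardScaler().fit_transform(X_raw)
labels_scaled = fit_kmeans(X_scaled)

# Compare each labeling with the generating blobs (1.0 = perfect agreement).
ari_raw = adjusted_rand_score(y_true, labels_raw)
ari_scaled = adjusted_rand_score(y_true, labels_scaled)
print(f"ARI raw: {ari_raw:.3f}  ARI scaled: {ari_scaled:.3f}")
```

With real, unlabeled data there is no ground truth to score against; the more useful classroom question is whether the cluster assignments change after scaling. The adjusted Rand index is only available here because the data are synthetic.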

### OUTLINE: PART TWO (DBSCAN)

Total: 90 min

- DBSCAN: Density-Based Clustering (25 min)
  - Visual Demo
- DBSCAN Clustering Demo (10 min)
- Hierarchical Clustering (20 min)
- Clustering Metrics (15 min)
- Clustering, Classification, and Regression (15 min)
- Comparing Clustering Algorithms (5 min)
- Lesson Summary
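To compare the Part Two algorithms side by side, a minimal scikit-learn sketch might look like this. It assumes synthetic two-moons data rather than the beer dataset, and the `eps`, `min_samples`, and single-linkage choices are illustrative assumptions, not tuned values. DBSCAN's noise points are dropped before computing the silhouette, which is one common convention rather than the only one.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic, non-convex data (two interleaved crescents); an illustrative assumption.
X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)
X = StandardScaler().fit_transform(X)

# eps and min_samples are illustrative values, not tuned recommendations.
models = {
    "k-means": KMeans(n_clusters=2, n_init=10, random_state=0),
    "dbscan": DBSCAN(eps=0.3, min_samples=5),
    "hierarchical": AgglomerativeClustering(n_clusters=2, linkage="single"),
}

results = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    # DBSCAN labels noise points -1; exclude them before counting and scoring.
    core = labels != -1
    n_clusters = len(set(labels[core].tolist()))
    # The silhouette coefficient is only defined for 2 or more clusters.
    if n_clusters >= 2:
        score = silhouette_score(X[core], labels[core])
    else:
        score = float("nan")
    results[name] = (n_clusters, score)
    print(f"{name}: {n_clusters} clusters, mean silhouette = {score:.3f}")
```

Because the silhouette coefficient rewards compact, well-separated clusters, it can rank k-means above a density-based result even when DBSCAN follows the crescent shapes more faithfully, which makes a useful prompt for the Comparing Clustering Algorithms segment.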