![](https://eevibes.com/wp-content/uploads/2025/02/export.png)
Table of Contents
Introduction to Hierarchical Clustering
Hierarchical clustering is a method of cluster analysis that builds a hierarchy of clusters. There are two main types of hierarchical clustering: Agglomerative (bottom-up) and Divisive (top-down). In this example, we’ll focus on Agglomerative Hierarchical Clustering, which starts by treating each data point as a single cluster and then iteratively merges the closest pairs of clusters until all data points are in a single cluster.
Example Problem:
Let’s consider the following dataset of 5 points in 2D space:
We will use Euclidean distance as the distance metric and single linkage (minimum distance between clusters) as the linkage criterion.
Step 1: Compute the Distance Matrix
First, compute the pairwise Euclidean distances between all points. The Euclidean distance between two points
and
is:
The initial distance matrix is:
Step 2: Merge the Closest Clusters
The smallest distance in the matrix is 1 (between A and B). Merge A and B into a new cluster AB.
Update the distance matrix using single linkage (minimum distance between clusters):
- Distance between AB and C:
- Distance between AB and D:
- Distance between AB and E:
The updated distance matrix is:
Step 3: Merge the Next Closest Clusters
The smallest distance in the updated matrix is 1.41 (between C and D). Merge C and D into a new cluster CD.
Update the distance matrix using single linkage:
- Distance between AB and CD:
- Distance between CD and E:
The updated distance matrix is:
Step 4: Merge the Next Closest Clusters
The smallest distance in the updated matrix is 1.41 (between CD and E). Merge CD and E into a new cluster CDE.
Update the distance matrix using single linkage:
- Distance between AB and CDE:
The updated distance matrix is:
Step 5: Merge the Final Clusters
The smallest distance in the updated matrix is 2.83 (between AB and CDE). Merge AB and CDE into a single cluster ABCDE.
Dendrogram
The hierarchical clustering process can be visualized using a dendrogram:
Final Clusters:
- At a distance threshold of 1, the clusters are: {A, B}, {C}, {D}, {E}
- At a distance threshold of 1.41, the clusters are: {A, B}, {C, D}, {E}
- At a distance threshold of 2.83, the clusters are: {A, B}, {C, D, E}
- At a distance threshold of 5.66, all points are in one cluster: {A, B, C, D, E}