12.7 - Pseudo Code

1. Begin with n clusters, each containing one object, and number the clusters 1 through n.
2. Compute the between-cluster distance D(r, s) as the between-object distance of the two objects in clusters r and s, for r, s = 1, 2, ..., n. Let D = (D(r, s)) denote the resulting square matrix of distances. If the objects are represented by quantitative vectors, we can use Euclidean distance.
3. Find the most similar pair of clusters r and s, i.e., the pair whose distance D(r, s) is minimum among all pairwise distances.
4. Merge r and s into a new cluster t and compute the between-cluster distance D(t, k) for every existing cluster k ≠ r, s. Once these distances are obtained, delete the rows and columns corresponding to the old clusters r and s in the matrix D, because r and s no longer exist. Then add a new row and column to D for cluster t.
5. Repeat Steps 3 and 4 a total of n − 1 times, until only one cluster is left. (A runnable sketch of these steps follows below.)
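Translated into code, the steps above might look roughly like the following. This is a minimal sketch, not the course's own implementation: it assumes quantitative vectors with Euclidean distance, stores the pairwise distances in a dictionary rather than a full matrix, and updates D(t, k) with the single-linkage rule (complete linkage is the one-line change noted in the comments).

```python
import numpy as np

def agglomerative(points, linkage="single"):
    """Naive agglomerative clustering following the numbered steps above.

    points : (n, d) array of quantitative vectors.
    Returns one (cluster_r, cluster_s, distance) tuple per merge.
    """
    n = len(points)
    # Steps 1-2: one cluster per object, plus all pairwise Euclidean distances.
    clusters = {i: [i] for i in range(n)}
    D = {(r, s): float(np.linalg.norm(points[r] - points[s]))
         for r in range(n) for s in range(r + 1, n)}

    merges = []
    next_id = n
    # Step 5: repeat the find-and-merge steps n - 1 times.
    for _ in range(n - 1):
        # Step 3: the most similar pair is the one with minimum distance.
        (r, s), d_min = min(D.items(), key=lambda kv: kv[1])
        # Step 4: merge r and s into a new cluster t ...
        t = next_id
        next_id += 1
        clusters[t] = clusters.pop(r) + clusters.pop(s)
        merges.append((r, s, d_min))
        # ... delete the rows/columns of r and s ...
        new_D = {pair: d for pair, d in D.items()
                 if r not in pair and s not in pair}
        # ... and add a row/column for t, here with the single-linkage update.
        for k in clusters:
            if k == t:
                continue
            d_rk = D[(min(r, k), max(r, k))]
            d_sk = D[(min(s, k), max(s, k))]
            combine = min if linkage == "single" else max  # max gives complete linkage
            new_D[(k, t)] = combine(d_rk, d_sk)
        D = new_D
    return merges


if __name__ == "__main__":
    pts = np.array([[1.0], [2.0], [9.0], [10.0], [25.0]])
    for step, (r, s, d) in enumerate(agglomerative(pts), start=1):
        print(f"step {step}: merge clusters {r} and {s} at distance {d:.2f}")
```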
Example
Here is an example to show how this algorithm works. In this case we have 11 points, each a single number, and every point is assigned a label. Visually, this is what the different linkage approaches might look like…
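As an illustration of this kind of experiment (the 11 values and labels below are hypothetical stand-ins, not the data from the figure), SciPy's hierarchical-clustering routines can run the algorithm and draw the dendrogram for any of the linkage methods discussed:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# Hypothetical stand-ins for the 11 labelled single-number points.
values = np.array([2, 3, 5, 9, 10, 11, 16, 20, 21, 30, 32], dtype=float)
labels = list("ABCDEFGHIJK")

# SciPy expects an (n, d) observation matrix; here each observation is 1-D.
Z = linkage(values.reshape(-1, 1), method="single")  # also try "complete", "average", "ward"

dendrogram(Z, labels=labels)
plt.title("Single-linkage dendrogram of the 11 points")
plt.show()
```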
Another Example
Agglomerative clustering of a data set containing 100 points into 9 clusters.
With single linkage, below, the result is often not appealing. For instance, the purple square in the lower-left area is a cluster by itself, and there are other clusters consisting of single points, while all of the red circle points are grouped into one big cluster. Why does this happen?
Single linkage is also sometimes called the 'friends of friends' linkage method. The single linkage method always picks the minimum distance when it updates the between-cluster distances. Therefore, two points may be considered close under single linkage if they can be connected by a chain of points with small distances between consecutive points along the chain, even if the two end points are far apart. As far as the algorithm is concerned, the distance between the two end points of the chain is only as big as the longest link along the chain; the number of links along the chain does not matter. (The sketch below demonstrates this with cophenetic distances.)
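A small sketch (my own illustration, not from the lesson) makes the chaining effect concrete: for ten points laid out in a line with unit gaps, the two end points are 9 apart in Euclidean distance, yet single linkage joins them at height 1, the length of the longest link.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import squareform

# A chain 0, 1, 2, ..., 9: every consecutive link has length 1.
X = np.arange(10, dtype=float).reshape(-1, 1)

Z = linkage(X, method="single")
coph = squareform(cophenet(Z))  # cophenetic (merge-height) distances between points

print("Euclidean distance between the end points:", float(X[-1, 0] - X[0, 0]))  # 9.0
print("Single-linkage merge height joining them :", coph[0, -1])                # 1.0
```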
Complete linkage, below, gives a very different result. It tends to generate clusters that are not very well separated.
Average linkage results:
Ward's clustering, below, tends to generate results similar to k-means. It can be viewed as a greedy, bottom-up counterpart of k-means, because the criterion Ward's method uses to pick which pair of clusters to merge, the increase in the within-cluster sum of squares, is the same objective that k-means minimizes. K-means operates in a top-down fashion, whereas Ward's clustering operates bottom-up. (A quick comparison on synthetic data is sketched below.)
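As a rough check of this similarity (a sketch using scikit-learn on synthetic blobs, not the 100-point data set shown above), Ward's method and k-means typically produce very similar partitions:

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in for the 100-point, 9-cluster example above.
X, _ = make_blobs(n_samples=100, centers=9, cluster_std=1.0, random_state=0)

ward_labels = AgglomerativeClustering(n_clusters=9, linkage="ward").fit_predict(X)
kmeans_labels = KMeans(n_clusters=9, n_init=10, random_state=0).fit_predict(X)

# Both methods reduce within-cluster sums of squares, so the partitions
# usually agree closely (adjusted Rand index near 1).
print("Adjusted Rand index (Ward vs k-means):",
      adjusted_rand_score(ward_labels, kmeans_labels))
```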