14.4  Agglomerative Hierarchical Clustering
14.4  Agglomerative Hierarchical ClusteringCombining Clusters in the Agglomerative Approach
In the agglomerative hierarchical approach, we define each data point as a cluster and combine existing clusters at each step. Here are four different methods for this approach:

Single Linkage: In single linkage, we define the distance between two clusters as the minimum distance between any single data point in the first cluster and any single data point in the second cluster. On the basis of this definition of distance between clusters, at each stage of the process we combine the two clusters with the smallest single linkage distance.

Complete Linkage: In complete linkage, we define the distance between two clusters to be the maximum distance between any single data point in the first cluster and any single data point in the second cluster. On the basis of this definition of distance between clusters, at each stage of the process we combine the two clusters that have the smallestcomplete linkage distance.

Average Linkage: In average linkage, we define the distance between two clusters to be the average distance between data points in the first cluster and data points in the second cluster. On the basis of this definition of distance between clusters, at each stage of the process we combine the two clusters that have the smallest average linkage distance.

Centroid Method: In centroid method, the distance between two clusters is the distance between the two mean vectors of the clusters. At each stage of the process we combine the two clusters that have the smallest centroid distance.

Ward’s Method: This method does not directly define a measure of distance between two points or clusters. It is an ANOVA based approach. Oneway univariate ANOVAs are done for each variable with groups defined by the clusters at that stage of the process. At each stage, two clusters merge that provide the smallest increase in the combined error sum of squares.
In the following table the mathematical form of the distances are provided. The graph gives a geometric interpretation.
Notationally, define
 \(\boldsymbol{X _ { 1 , } X _ { 2 , } , \dots , X _ { k }}\) = Observations from cluster 1
 \(\boldsymbol{Y _ { 1 , } Y _ { 2 , } , \dots , Y _ { l }}\) = Observations from cluster 2
 d(x,y) = Distance between a subject with observation vector x and a subject with observation vector y
Linkage Methods or Measuring Association d_{12} Between Clusters 1 and 2
Single Linkage  \(d_{12} = \displaystyle \min_{i,j}\text{ } d(\mathbf{X}_i, \mathbf{Y}_j)\)  This is the distance between the closest members of the two clusters. 

Complete Linkage  \(d_{12} = \displaystyle \max_{i,j}\text{ } d(\mathbf{X}_i, \mathbf{Y}_j)\)  This is the distance between the members that are farthest apart (most dissimilar) 
Average Linkage  \(d_{12} = \frac{1}{kl}\sum\limits_{i=1}^{k}\sum\limits_{j=1}^{l}d(\mathbf{X}_i, \mathbf{Y}_j)\)  This method involves looking at the distances between all pairs and averages all of these distances. This is also called UPGMA  Unweighted Pair Group Mean Averaging. 
Centroid Method 
\( d_{12} = d(\bar{\mathbf{x}},\bar{\mathbf{y}})\) 
This involves finding the mean vector location for each of the clusters and taking the distance between the two centroids. 
The following video shows the the linkage method types listed on the right for a visual representation of how the distances are determined for each method.