Ward's method is an alternative approach to cluster analysis. Rather than using distance metrics or measures of association, it treats cluster analysis as an analysis of variance problem.

This method involves an agglomerative clustering algorithm. It starts out at the leaves and works its way to the trunk, so to speak. It looks for groups of leaves that form branches, branches that form limbs, and eventually the trunk. Ward's method starts out with *n* clusters of size 1 and continues until all the observations are combined into a single cluster.

This method is most appropriate for quantitative variables, rather than binary variables.

Based on the notion that clusters of multivariate observations should be approximately elliptical in shape, we assume that the data from each of the clusters have been realized from a multivariate normal distribution. Therefore, it follows that they would fall into an elliptical shape when plotted in a *p*-dimensional scatter plot.

Let \(X _ { i j k }\) denote the value for variable *k* in observation *j* belonging to cluster *i*.

Furthermore, we define:

**Error Sum of Squares**: \(ESS = \sum_{i}\sum_{j}\sum_{k}|X_{ijk} - \bar{x}_{i\cdot k}|^2\)

Here we sum over all variables and all of the units within each cluster, comparing individual observations for each variable against the cluster mean for that variable.

**Note!** When the Error Sum of Squares is small, it suggests that our data are close to their cluster means, implying that we have a cluster of like units.

**Total Sum of Squares**: \(TSS = \sum_{i}\sum_{j}\sum_{k}|X_{ijk} - \bar{x}_{\cdot \cdot k}|^2\)

The total sum of squares is defined as usual. Here we compare the individual observations for each variable against the grand mean for that variable.

**R-Square**: \(r^2 = \frac{TSS - ESS}{TSS}\)

This \(r^{2}\) value is interpreted as the proportion of variation explained by a particular clustering of the observations.
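To make these definitions concrete, here is a minimal Python sketch (using numpy) that computes ESS, TSS, and \(r^2\) for a given partition of the data. The function name and the toy data in the usage example are our own, not part of the lesson:

```python
import numpy as np

def cluster_r_square(X, labels):
    """Compute ESS, TSS, and r^2 for a given clustering.

    X      : (n, p) array of observations (rows) on p variables.
    labels : length-n array of cluster assignments.
    """
    # TSS: squared deviations from the grand mean of each variable.
    grand_means = X.mean(axis=0)
    tss = ((X - grand_means) ** 2).sum()

    # ESS: squared deviations from each cluster's own variable means.
    ess = 0.0
    for c in np.unique(labels):
        members = X[labels == c]
        ess += ((members - members.mean(axis=0)) ** 2).sum()

    return ess, tss, (tss - ess) / tss

# Two tight, well-separated pairs of points: ESS is tiny relative
# to TSS, so r^2 is close to 1.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels = np.array([0, 0, 1, 1])
ess, tss, r2 = cluster_r_square(X, labels)
```

A clustering that matches the true grouping drives ESS toward 0 and \(r^2\) toward 1, which is exactly the criterion Ward's method optimizes at each merge.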

Using Ward's Method, we start out with all sample units in *n* clusters of size 1 each. In the first step of the algorithm, *n* - 1 clusters are formed, one of size two and the remaining of size 1. The error sum of squares and \(r^{2}\) values are then computed. The pair of sample units that yields the smallest error sum of squares, or equivalently, the largest \(r^{2}\) value, forms the first cluster. Then, in the second step of the algorithm, *n* - 2 clusters are formed from the *n* - 1 clusters defined in the first step. These may include two clusters of size 2, or a single cluster of size 3 that includes the two items clustered in step 1. Again, the merge chosen is the one that maximizes the value of \(r^{2}\). Thus, at each step of the algorithm, clusters or observations are combined in such a way as to minimize the error sum of squares, or equivalently, maximize the \(r^{2}\) value. The algorithm stops when all sample units are combined into a single large cluster of size *n*.
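The lesson itself runs Ward's method through SAS (below), but the same agglomerative procedure is available in SciPy, which can serve as a quick sketch of the algorithm just described. The two-group synthetic data here are illustrative only:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Two well-separated synthetic groups of 10 bivariate points each.
X = np.vstack([rng.normal(0.0, 0.5, size=(10, 2)),
               rng.normal(5.0, 0.5, size=(10, 2))])

# Each row of Z records one merge of the agglomeration: the two
# clusters joined and the height at which they join (for Ward
# linkage, heights reflect the growth in within-cluster sum of
# squares). With n = 20 observations there are n - 1 = 19 merges.
Z = linkage(X, method="ward")

# Cut the tree to recover two clusters: one label per observation.
labels = fcluster(Z, t=2, criterion="maxclust")
```

Because the groups are well separated, cutting the tree at two clusters recovers the original grouping exactly.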

## Example 14-3: Woodyard Hammock Data (Ward's Method)

We will take a look at the implementation of Ward's Method using the SAS program below. The Minitab implementation is similar and is not shown separately.

Download the SAS Program here: wood5.sas

As you can see, this program is very similar to the previous program, wood1.sas, that was discussed earlier in this lesson. The only difference is that we specify **method=ward** in the cluster procedure. The tree procedure is used to draw the tree diagram shown below, as well as to assign cluster identifications. Here we will look at four clusters.

The break in the plot shows four highlighted clusters. There appear to be two very well-defined clusters, as there is a large break between the first and second branches of the tree. Partitioning into four clusters yields clusters of sizes 31, 24, 9, and 8.

Referring back to the SAS output, the results of the ANOVAs are copied here for discussion.

**Results of ANOVAs**

| Code | Species | F | p-value |
|--------|--------------------|--------|----------|
| **carcar** | **Ironwood** | **67.42** | **< 0.0001** |
| corflo | Dogwood | 2.31 | 0.0837 |
| **faggra** | **Beech** | **7.13** | **0.0003** |
| **ileopa** | **Holly** | **5.38** | **0.0022** |
| liqsty | Sweetgum | 0.76 | 0.5188 |
| maggra | Magnolia | 2.75 | 0.0494 |
| nyssyl | Blackgum | 1.36 | 0.2627 |
| **ostvir** | **Blue Beech** | **32.91** | **< 0.0001** |
| oxyarb | Sourwood | 3.15 | 0.0304 |
| pingla | Spruce Pine | 1.03 | 0.3839 |
| quenig | Water Oak | 2.39 | 0.0759 |
| quemic | Swamp Chestnut Oak | 3.44 | 0.0216 |
| **symtin** | **Horse Sugar** | **120.95** | **< 0.0001** |

*d.f.* = 3, 68

We boldfaced the species whose F-values, using a Bonferroni correction, show significance. These include Ironwood, Beech, Holly, Blue Beech, and Horse Sugar.
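The Bonferroni screen applied above can be sketched in a few lines of Python. The p-values are taken from the table ("< 0.0001" entries are represented by the upper bound 0.0001), and the 0.05 family-wise error level is our assumption:

```python
# p-values from the ANOVA table above; "< 0.0001" entries are
# represented by the upper bound 0.0001.
p_values = {
    "carcar": 0.0001, "corflo": 0.0837, "faggra": 0.0003,
    "ileopa": 0.0022, "liqsty": 0.5188, "maggra": 0.0494,
    "nyssyl": 0.2627, "ostvir": 0.0001, "oxyarb": 0.0304,
    "pingla": 0.3839, "quenig": 0.0759, "quemic": 0.0216,
    "symtin": 0.0001,
}

alpha = 0.05
# Bonferroni: divide the family-wise level by the number of tests.
threshold = alpha / len(p_values)
significant = sorted(k for k, p in p_values.items() if p <= threshold)
```

With 13 tests the per-test cutoff is 0.05/13 ≈ 0.0038, which admits exactly the five boldfaced species and excludes borderline unadjusted results such as Magnolia (0.0494) and Swamp Chestnut Oak (0.0216).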

Next, we look at the cluster means for these significant species:

**Cluster Means**

| Code | 1 | 2 | 3 | 4 |
|--------|------|------|------|------|
| carcar | 2.8 | **18.5** | 1.0 | 7.4 |
| faggra | **10.6** | 6.0 | 5.9 | 6.4 |
| ileopa | 7.5 | 4.3 | **12.3** | 7.9 |
| ostvir | 5.4 | 3.1 | **18.3** | 7.5 |
| symtin | 1.3 | 0.7 | 1.4 | **18.8** |

Again, we boldfaced the values that show an abundance of that species within the different clusters.
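A cluster-means table like the one above is just the per-cluster average of each species' abundance. A minimal numpy sketch (with made-up numbers, not the Woodyard Hammock data):

```python
import numpy as np

# Hypothetical abundance matrix: rows are sites, columns are two
# species (illustrative values only).
X = np.array([[12.0, 2.0],
              [9.0, 3.0],
              [2.0, 18.0],
              [1.0, 19.0]])
labels = np.array([1, 1, 2, 2])  # cluster assignment per site

# One row of variable means per cluster, mirroring the table above.
means = np.vstack([X[labels == c].mean(axis=0)
                   for c in np.unique(labels)])
```

Large entries in a cluster's row of means flag the species that characterize that cluster, which is how the interpretations below are read off.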

- Cluster 1: Beech (faggra): Canopy species typical of old-growth forests.
- Cluster 2: Ironwood (carcar): Understory species that favors wet habitats.
- Cluster 3: Holly (ileopa) and Blue Beech (ostvir): Understory species that favor dry habitats.
- Cluster 4: Horse Sugar (symtin): Understory species typically found in disturbed habitats.

**Note!** This interpretation is cleaner than the interpretation obtained earlier from the complete linkage method. This suggests that Ward's method may be preferred for the current data.

The results are summarized in the following dendrogram:

In summary, this method is performed in essentially the same manner as the previous method; the only difference is that the cluster analysis is based on analysis of variance rather than on distances.