# 14.6 - Cluster Description


The next step of the cluster analysis is to describe the identified clusters.

#### Using SAS

The SAS program shows how this is implemented.

Notice that in the cluster procedure we created a new SAS dataset called clust1. This contains the information required by the tree procedure to draw the tree diagram.

In the tree procedure, we chose to investigate 6 clusters with ncluster=6. A new SAS dataset called clust2 is output, containing the id number of each site and the cluster to which that site belongs, stored in a new variable called cluster. We need to merge this back with the original data to describe the characteristics of each of the 6 clusters.

Now an Analysis of Variance for each species is carried out with a class statement for the grouping variable, cluster. We also include the means statement to get the cluster means.
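The merge-then-ANOVA workflow described above can be sketched outside SAS. The following Python is only an illustration under stated assumptions: the site counts and cluster labels are made-up toy values (not the Woodyard Hammock data), and the helper names `merge_clusters` and `one_way_anova` are hypothetical, not part of the SAS program.

```python
from collections import defaultdict

# Hypothetical toy data: site id -> species counts (NOT the Woodyard Hammock data).
sites = {
    1: {"carcar": 20.0, "faggra": 5.0},
    2: {"carcar": 24.0, "faggra": 7.0},
    3: {"carcar": 2.0, "faggra": 12.0},
    4: {"carcar": 4.0, "faggra": 10.0},
    5: {"carcar": 9.0, "faggra": 3.0},
    6: {"carcar": 7.0, "faggra": 1.0},
}
# Hypothetical cluster assignments, playing the role of clust2 (id -> cluster).
assignments = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3}


def merge_clusters(sites, assignments):
    """Attach the cluster label to each site's record, matching on id."""
    return [
        {"id": sid, "cluster": assignments[sid], **counts}
        for sid, counts in sorted(sites.items())
        if sid in assignments
    ]


def one_way_anova(rows, response, group="cluster"):
    """Return (group means, F statistic, model d.f., error d.f.) for one response."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group]].append(row[response])
    n = sum(len(v) for v in groups.values())
    grand_mean = sum(sum(v) for v in groups.values()) / n
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    # Between-cluster and within-cluster sums of squares.
    ss_model = sum(len(v) * (means[g] - grand_mean) ** 2 for g, v in groups.items())
    ss_error = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    df_model, df_error = len(groups) - 1, n - len(groups)
    f_stat = (ss_model / df_model) / (ss_error / df_error)
    return means, f_stat, df_model, df_error


merged = merge_clusters(sites, assignments)
for species in ("carcar", "faggra"):
    means, f_stat, df1, df2 = one_way_anova(merged, species)
    print(species, means, f"F({df1}, {df2}) = {f_stat:.2f}")
```

Looping over the species mirrors running one ANOVA per species with cluster as the grouping variable; the returned means correspond to what the means statement reports.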

#### Using Minitab

View the video below to get a walkthrough of how to perform a cluster analysis using the Minitab statistical software application.

#### Analysis

We perform an analysis of variance for each of the tree species, comparing the means of the species across clusters. The Bonferroni method is applied to control the experiment-wise error rate. This means that we will reject the null hypothesis of equal means among clusters at level $\alpha$ if the p-value is less than $\alpha / p$, where $p$ is the number of species tested. Here, $p = 13$, so for an $\alpha = 0.05$ level test, we reject the null hypothesis of equality of cluster means if the p-value is less than $0.05/13 \approx 0.003846$.

Here is the output for the species carcar.

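Cluster Analysis - Woodyard Hammock - Complete Linkage

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 5 | 4340.834339 | 868.166868 | 62.94 | < 0.0001 |
| Error | 66 | 910.443439 | 13.794598 | | |
| Corrected Total | 71 | 5251.277778 | | | |

| R-Square | Coeff Var | Root MSE | carcar Mean |
|---|---|---|---|
| 0.826624 | 44.71836 | 3.714108 | 8.305556 |

| Source | DF | Type I SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| CLUSTER | 5 | 4340.834339 | 868.166868 | 62.94 | < 0.0001 |

| Source | DF | Type III SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| CLUSTER | 5 | 4340.834339 | 868.166868 | 62.94 | < 0.0001 |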
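As a quick check on how the entries in the carcar output relate to one another, the short Python sketch below recomputes the derived quantities from the reported sums of squares and degrees of freedom. The variable names are our own; only the numeric inputs come from the output.

```python
# Values copied from the carcar ANOVA output.
ss_model, df_model = 4340.834339, 5
ss_error, df_error = 910.443439, 66
carcar_mean = 8.305556

ms_model = ss_model / df_model            # mean square for the model
ms_error = ss_error / df_error            # mean square error
f_value = ms_model / ms_error             # F statistic on (5, 66) d.f.
ss_total = ss_model + ss_error            # corrected total sum of squares
r_square = ss_model / ss_total            # share of variation explained by cluster
root_mse = ms_error ** 0.5                # residual standard deviation
coeff_var = 100 * root_mse / carcar_mean  # coefficient of variation, in percent

print(f"MS model = {ms_model:.6f}, MS error = {ms_error:.6f}")
print(f"F = {f_value:.2f}, R-square = {r_square:.6f}")
print(f"Root MSE = {root_mse:.6f}, Coeff Var = {coeff_var:.5f}")
```

Up to rounding, these reproduce the reported mean squares, F value, R-Square, Root MSE, and Coeff Var, which confirms that the F value is simply the ratio of the two mean squares.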

We collected the results of the individual species ANOVAs in the table below. Species names in boldface indicate significant results, meaning that there was significant variation among the clusters for that particular species.

Note! The d.f. are presented beneath the table.

| Code | Species | F | p-value |
|---|---|---|---|
| carcar | **Ironwood** | 62.94 | < 0.0001 |
| corflo | Dogwood | 1.55 | 0.1870 |
| faggra | **Beech** | 7.11 | < 0.0001 |
| ileopa | Holly | 3.42 | 0.0082 |
| liqsty | **Sweetgum** | 5.87 | 0.0002 |
| maggra | **Magnolia** | 3.97 | 0.0033 |
| nyssyl | Blackgum | 1.66 | 0.1567 |
| ostvir | **Blue Beech** | 17.70 | < 0.0001 |
| oxyarb | Sourwood | 1.42 | 0.2294 |
| pingla | Spruce Pine | 0.43 | 0.8244 |
| quenig | Water Oak | 2.23 | 0.0612 |
| quemic | **Swamp Chestnut Oak** | 4.12 | 0.0026 |
| symtin | **Horse Sugar** | 75.57 | < 0.0001 |

d.f. = 5, 66

The results indicate that there are significant differences among clusters for ironwood, beech, sweetgum, magnolia, blue beech, swamp chestnut oak, and horse sugar.
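The Bonferroni screening that produces this list can be sketched in a few lines of Python. The p-values are those reported for the 13 species; storing "< 0.0001" as the upper bound 0.0001 is an assumption made only so that the comparison can be carried out.

```python
# F-test p-values for the 13 species; "< 0.0001" is stored as the upper
# bound 0.0001, an assumption made only so the comparison works.
p_values = {
    "carcar": 0.0001, "corflo": 0.1870, "faggra": 0.0001, "ileopa": 0.0082,
    "liqsty": 0.0002, "maggra": 0.0033, "nyssyl": 0.1567, "ostvir": 0.0001,
    "oxyarb": 0.2294, "pingla": 0.8244, "quenig": 0.0612, "quemic": 0.0026,
    "symtin": 0.0001,
}

alpha = 0.05
threshold = alpha / len(p_values)  # Bonferroni: 0.05 / 13

# Keep only the species whose p-value falls below the adjusted threshold.
significant = [code for code, p in p_values.items() if p < threshold]

print(f"threshold = {threshold:.6f}")
print(significant)
```

Holly (ileopa, p = 0.0082) shows why the adjustment matters: it would be declared significant at the unadjusted 0.05 level but does not pass the Bonferroni threshold.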

Next, SAS computed the cluster means for each of the species. Here is a sample of the output with a couple of significant species highlighted.

We collected the cluster means for each of the significant species indicated above and placed the values in the table below:

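| Code | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6 |
|---|---|---|---|---|---|---|
| carcar | 3.8 | **24.4** | **18.5** | 1.2 | 8.2 | 6.0 |
| faggra | **11.4** | 6.4 | 5.9 | 5.9 | **8.6** | 2.7 |
| liqsty | 7.2 | **17.4** | 6.4 | 6.8 | 6.6 | **18.0** |
| maggra | 5.3 | 3.8 | 2.8 | 3.2 | 4.6 | 0.7 |
| ostvir | 4.3 | 2.8 | 2.9 | **13.8** | 3.6 | **14.0** |
| quemic | 5.3 | 5.2 | **9.4** | 4.1 | **7.0** | 2.3 |
| symtin | 0.9 | 0.0 | 0.7 | 2.0 | **18.0** | **20.0** |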

The boldface values highlight the clusters where each species is abundant. For example, carcar (ironwood) is abundant in clusters 2 and 3. This operation is carried out across the rows of the table.

Each cluster is then characterized by the species that are highlighted in its column. For example, cluster 1 is characterized by a high abundance of faggra, or beech trees. This operation is carried out across the columns of the table.

In summary, we find:

• Cluster 1: primarily Beech (faggra)
• Cluster 2: Ironwood (carcar) and Sweetgum (liqsty)
• Cluster 3: Ironwood (carcar) and Swamp Chestnut Oak (quemic)
• Cluster 4: primarily Blue Beech (ostvir)
• Cluster 5: Beech (faggra), Swamp Chestnut Oak (quemic) and Horse Sugar (symtin)
• Cluster 6: Sweetgum (liqsty), Blue Beech (ostvir) and Horse Sugar (symtin)
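The two-pass reading described above (first across rows to find where each species is abundant, then down columns to characterize each cluster) amounts to inverting a species-to-clusters mapping. Here is a minimal Python sketch, assuming the row-wise highlights implied by the summary list; the function name `species_by_cluster` is hypothetical.

```python
from collections import defaultdict

# Clusters in which each significant species is abundant, as implied by the
# summary list (the row-wise view of the cluster-means table).
abundant_in = {
    "carcar": [2, 3],
    "faggra": [1, 5],
    "liqsty": [2, 6],
    "maggra": [],  # no cluster is listed for magnolia in the summary
    "ostvir": [4, 6],
    "quemic": [3, 5],
    "symtin": [5, 6],
}


def species_by_cluster(abundant_in):
    """Invert species -> clusters into cluster -> species (the column-wise view)."""
    by_cluster = defaultdict(list)
    for species, clusters in abundant_in.items():
        for cluster in clusters:
            by_cluster[cluster].append(species)
    return dict(sorted(by_cluster.items()))


for cluster, species in species_by_cluster(abundant_in).items():
    print(f"Cluster {cluster}: {', '.join(species)}")
```

Deciding which cells count as "abundant" remains a judgment call made by the analyst; the sketch only automates the regrouping step.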

It is also useful to summarize the results in the cluster diagram:

We can see that the two ironwood clusters (2 and 3) are joined. Ironwood is an understory species that tends to be found in wet regions that may be frequently flooded. Cluster 2 also contains sweetgum, an overstory species found in disturbed habitats, while cluster 3 contains swamp chestnut oak, an overstory species characteristic of undisturbed habitats.

Clusters 5 and 6 both contain horse sugar, an understory species characteristic of light gaps in the forest. Cluster 5 also contains beech and swamp chestnut oak, two overstory species characteristic of undisturbed habitats. These are likely to be saplings of the two species growing in the horse sugar light gaps. Cluster 6 also contains blue beech, an understory species similar to ironwood, but characteristic of uplands.

Cluster 4 is dominated by blue beech, an understory species characteristic of uplands.

Cluster 1 is dominated by beech, an overstory species most abundant in undisturbed habitats.

From the above description, you can see that a meaningful interpretation of the results of a cluster analysis is best obtained using subject-matter knowledge.
