# 12.8 - R Scripts (Agglomerative Clustering)

## R

### 1. Acquire Data

**Diabetes data**

The diabetes data set is the Pima Indians Diabetes Database, taken from the UCI machine learning database on Kaggle. It contains:

- 768 samples in the dataset
- 8 quantitative variables
- 2 classes: with or without signs of diabetes

Load data into R as follows:

```
# set the working directory
setwd("C:/STAT 897D data mining")
# the file is comma-delimited and has no header row
RawData = read.table("diabetes.data", sep = ",", header = FALSE)
```

In `RawData`, the last column is the response variable and the remaining columns are the predictor variables.

```
# last column: class label (response)
responseY = RawData[, dim(RawData)[2]]
# all other columns: predictors
predictorX = RawData[, 1:(dim(RawData)[2] - 1)]
```
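The split above can be tried without the data file. The following is a minimal sketch that assumes a tiny comma-delimited table with the same layout (predictors first, class label last); the names `toyText`, `toyData`, `toyX`, and `toyY` and all the values are made up for illustration and are not part of the diabetes data.

```r
# Toy stand-in for a comma-delimited file with no header row.
# All values are invented for illustration only.
toyText = "1,10,100,0
2,20,200,1
3,30,300,0
4,40,400,1"

toyData = read.table(text = toyText, sep = ",", header = FALSE)

nCol = ncol(toyData)              # same value as dim(toyData)[2]
toyY = toyData[, nCol]            # last column: response
toyX = toyData[, 1:(nCol - 1)]    # remaining columns: predictors

dim(toyX)      # number of rows and number of predictor columns
table(toyY)    # class counts
```

Checking `dim()` and `table()` right after the split is a quick way to confirm that the response column was not accidentally left among the predictors.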

### 2. Agglomerative Clustering

In R, the `cluster` package implements hierarchical clustering using the agglomerative nesting algorithm (`agnes`). The first argument of `agnes`, `x`, is either the input data matrix or a dissimilarity matrix, depending on the value of the `diss` argument. If `diss=TRUE`, `x` is assumed to be a dissimilarity matrix. If `diss=FALSE`, `x` is treated as a matrix of observations. The argument `stand = TRUE` indicates that the data matrix is standardized before the dissimilarities are calculated.
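To illustrate the two input modes, here is a minimal sketch on simulated data (the matrix `toyX` is hypothetical, not the diabetes data). It assumes the default Euclidean metric, so passing the observations with `diss = FALSE` and passing `dist(toyX)` with `diss = TRUE` should produce the same merge heights.

```r
library(cluster)

set.seed(1)
# Hypothetical toy data: 10 observations on 3 variables
toyX = matrix(rnorm(30), nrow = 10, ncol = 3)

# (a) x is a matrix of observations; agnes computes the dissimilarities
agnObs = agnes(x = toyX, diss = FALSE, stand = FALSE, method = "average")

# (b) x is a precomputed Euclidean dissimilarity object
agnDiss = agnes(x = dist(toyX), diss = TRUE, method = "average")

# Compare the merge heights of the two fits
all.equal(sort(agnObs$height), sort(agnDiss$height))
```

Option (b) is the one to use when the dissimilarities come from somewhere other than Euclidean or Manhattan distance on the raw variables.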

Each variable (a column in the data matrix) is standardized by first subtracting the mean value of the variable and then dividing the result by the mean absolute deviation of the variable. If `x` is already a dissimilarity matrix, the `stand` argument is ignored.
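The standardization can be reproduced by hand. The sketch below uses simulated data (`toyX` is hypothetical) and compares `stand = TRUE` with a manual version that centers each column by its mean and divides by its mean absolute deviation. Note that base R's `mad()` computes the median absolute deviation, which is a different quantity, so the mean absolute deviation is computed explicitly here.

```r
library(cluster)

set.seed(2)
# Hypothetical toy data: two variables on very different scales
toyX = cbind(a = rnorm(12, mean = 50, sd = 10),
             b = rnorm(12, mean = 0,  sd = 0.1))

# Mean absolute deviation of each column
meanAbsDev = apply(toyX, 2, function(v) mean(abs(v - mean(v))))

# Subtract the column mean, then divide by the mean absolute deviation
toyStd = scale(toyX, center = colMeans(toyX), scale = meanAbsDev)

agnAuto   = agnes(toyX,   diss = FALSE, stand = TRUE)   # agnes standardizes
agnManual = agnes(toyStd, diss = FALSE, stand = FALSE)  # already standardized

# Compare the merge heights of the two fits
all.equal(sort(agnAuto$height), sort(agnManual$height))
```

Without standardization, variable `a` would dominate the Euclidean distances simply because its values are spread over a much wider range than those of `b`.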

The `method` argument specifies how the distance between two clusters is measured when deciding which clusters to merge into a new cluster. `method="single"` is for single linkage clustering, `method="complete"` for complete linkage clustering, and `method="average"` for average linkage clustering. The default is `method="average"`.
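As a small worked example of the three between-cluster distances, the sketch below uses two hypothetical clusters of two points each on a line and computes the distances directly in base R: single linkage takes the smallest pairwise distance between the clusters, complete linkage the largest, and average linkage the mean of all pairwise distances.

```r
# Two hypothetical clusters, each with two points in the plane
A = matrix(c(0, 0,
             1, 0), ncol = 2, byrow = TRUE)
B = matrix(c(4, 0,
             6, 0), ncol = 2, byrow = TRUE)

# Pairwise Euclidean distances between points of A (rows) and B (columns)
D = as.matrix(dist(rbind(A, B)))[1:2, 3:4]

singleDist   = min(D)    # closest pair:  (1,0) to (4,0)
completeDist = max(D)    # farthest pair: (0,0) to (6,0)
averageDist  = mean(D)   # mean of the four pairwise distances 4, 6, 3, 5

c(single = singleDist, complete = completeDist, average = averageDist)
# single = 3, complete = 6, average = 4.5
```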

For clarity of illustration, we use only the first 25 observations to run the agglomerative nesting algorithm (`agnes`). The function `as.dendrogram` generates a dendrogram using as input the agglomerative clustering result obtained by `agnes`.

```
library(cluster)
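# average linkage on the first 25 observations, standardizing each variable
agn = agnes(x = predictorX[1:25, ], diss = FALSE, stand = TRUE,
            method = "average")
# convert the agnes result to a dendrogram and plot it
DendAgn = as.dendrogram(agn)
plot(DendAgn)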
```

Figure 1 shows the clustering result by average linkage clustering.

Figure 2 shows the clustering result by single linkage, produced by the code below.

```
agn = agnes(x = predictorX[1:25, ], diss = FALSE, stand = TRUE,
            method = "single")
DendAgn = as.dendrogram(agn)
plot(DendAgn)
```

Figure 3 shows the clustering result by complete linkage, produced by the code below.

```
agn = agnes(x = predictorX[1:25, ], diss = FALSE, stand = TRUE,
            method = "complete")
DendAgn = as.dendrogram(agn)
plot(DendAgn)
```
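Since the three runs differ only in the `method` argument, they can also be written as a loop. The following is a sketch on simulated data (`toyX` is hypothetical, standing in for `predictorX[1:25,]`); it draws the three dendrograms side by side and, as an optional extra not discussed above, collects the agglomerative coefficient that `agnes` stores in the `ac` component of its result.

```r
library(cluster)

set.seed(3)
# Hypothetical toy data: 25 observations on 2 variables
toyX = matrix(rnorm(50), nrow = 25, ncol = 2)

methods = c("average", "single", "complete")

# Fit agnes once per linkage method and keep the agglomerative coefficient
acByMethod = sapply(methods, function(m) {
  agnes(x = toyX, diss = FALSE, stand = TRUE, method = m)$ac
})
acByMethod

# Draw the three dendrograms side by side for visual comparison
op = par(mfrow = c(1, 3))
for (m in methods) {
  fit = agnes(x = toyX, diss = FALSE, stand = TRUE, method = m)
  plot(as.dendrogram(fit), main = m)
}
par(op)   # restore the previous plotting layout
```

Plotting the three trees in one window makes the typical differences easier to see, such as the chaining behaviour of single linkage compared with the more compact clusters of complete linkage.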