# 8.4 - Example: Pottery Data - Checking Model Assumptions


## Example 8-1 Pottery Data (MANOVA)

Before carrying out a MANOVA, first check the model assumptions:

1. The data from group $i$ have a common mean vector $\boldsymbol{\mu}_{i}$.
2. The data from all groups have common variance-covariance matrix $\Sigma$.
3. Independence: The subjects are independently sampled.
4. Normality: The data are multivariate normally distributed.

#### Assumptions

Assumption 1: The data from group $i$ have a common mean vector $\boldsymbol{\mu}_{i}$.

This assumption says that there are no subpopulations with different mean vectors. Here, this assumption might be violated if pottery collected from the same site had inconsistencies.

Assumption 3: Independence: The subjects are independently sampled.

This assumption is satisfied if the assayed pottery samples are obtained by randomly sampling the pottery collected from each site. It would be violated if, for example, pottery samples were collected in clusters. In other applications, this assumption may be violated if the data were collected over time or space.

Assumption 4: Normality: The data are multivariate normally distributed.

Note!
• For large samples, the Central Limit Theorem says that the sample mean vectors are approximately multivariate normally distributed, even if the individual observations are not.
• For the pottery data, however, we have a total of only N = 26 observations, including only two samples from Caldicot. With small N, we cannot rely on the Central Limit Theorem.

Diagnostic procedures are based on the residuals, computed by taking the differences between the individual observations and the group means for each variable:

$\hat{\epsilon}_{ijk} = Y_{ijk}-\bar{Y}_{i.k}$

Thus, for each subject (or pottery sample in this case), residuals are defined for each of the p variables. Then, to assess normality, we apply the following graphical procedures:

• Plot the histograms of the residuals for each variable. Look for a symmetric distribution.
• Plot a matrix of scatter plots. Look for elliptical distributions and outliers.
• Plot three-dimensional scatter plots. Look for elliptical distributions and outliers.

If the histograms are not symmetric or the scatter plots are not elliptical, this would be evidence that the data are not sampled from a multivariate normal distribution in violation of Assumption 4. In this case, a normalizing transformation should be considered.
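The residual calculation above can be sketched in a few lines of Python. This is a minimal illustration on made-up numbers, not the pottery data: the group names, the values, and the helper functions `group_residuals` and `skewness` are all assumptions introduced here. The skewness summary is a rough numeric complement to the histograms, not a procedure taken from the lesson.

```python
from statistics import mean

# Hypothetical toy data: two groups, two variables (NOT the pottery data).
data = {
    "site_A": [[14.4, 7.0], [13.8, 7.1], [14.6, 7.1]],
    "site_B": [[11.8, 5.4], [11.6, 5.4]],
}

def group_residuals(groups):
    """Subtract each group's variable means from its own observations."""
    residuals = {}
    for name, rows in groups.items():
        means = [mean(col) for col in zip(*rows)]  # one mean per variable
        residuals[name] = [[y - m for y, m in zip(row, means)] for row in rows]
    return residuals

def skewness(values):
    """Moment-based sample skewness; values near 0 suggest symmetry."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0

res = group_residuals(data)

# Combine residuals from all groups, one list per variable.
# These lists are what would be passed to a histogram routine.
pooled = [list(col) for col in zip(*(r for rows in res.values() for r in rows))]
skews = [skewness(col) for col in pooled]
print(skews)
```

Because each group's mean is subtracted from its own observations, the residuals within a group sum to zero for every variable.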

#### Using SAS

The SAS program below will help us check this assumption.


#### Using Minitab

Minitab procedures are not shown separately.

These can be handled using procedures already known.

• Histograms suggest that, except for sodium, the distributions are relatively symmetric. However, the histogram for sodium suggests that there are two outliers in the data. Both of these outliers are from Llanedyrn.
• Two outliers can also be identified from the matrix of scatter plots.
• Removal of the two outliers results in a more symmetric distribution for sodium.

The results of MANOVA can be sensitive to the presence of outliers. One approach to assessing this would be to analyze the data twice, once with the outliers and once without them. The results may then be compared for consistency. The following analyses use all of the data, including the two outliers.

Assumption 2: The data from all groups have common variance-covariance matrix $\Sigma$.

This assumption can be checked using Bartlett's test for homogeneity of variance-covariance matrices. To obtain Bartlett's test, let $\Sigma_{i}$ denote the population variance-covariance matrix for group $i$. Consider testing:

$H_0\colon \Sigma_1 = \Sigma_2 = \dots = \Sigma_g$

against

$H_a\colon \Sigma_i \ne \Sigma_j$ for at least one $i \ne j$

Under the alternative hypothesis, at least two of the variance-covariance matrices differ on at least one of their elements. Let:

$\mathbf{S}_i = \dfrac{1}{n_i-1}\sum\limits_{j=1}^{n_i}(\mathbf{Y}_{ij}-\bar{\mathbf{y}}_{i.})(\mathbf{Y}_{ij}-\bar{\mathbf{y}}_{i.})'$

denote the sample variance-covariance matrix for group $i$. Compute the pooled variance-covariance matrix:

$\mathbf{S}_p = \dfrac{\sum_{i=1}^{g}(n_i-1)\mathbf{S}_i}{\sum_{i=1}^{g}(n_i-1)}= \dfrac{\mathbf{E}}{N-g}$
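As a rough illustration of the two formulas above, the following sketch computes each group's sample variance-covariance matrix and the pooled matrix in plain Python. The data are hypothetical (two groups, two variables), and the function names `sscp`, `sample_cov`, and `pooled_cov` are assumptions made for this example.

```python
def sscp(rows):
    """Sums of squares and cross products about the group mean vector."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    p = len(means)
    out = [[0.0] * p for _ in range(p)]
    for row in rows:
        dev = [y - m for y, m in zip(row, means)]
        for a in range(p):
            for b in range(p):
                out[a][b] += dev[a] * dev[b]
    return out

def sample_cov(rows):
    """S_i: the group's SSCP matrix divided by n_i - 1."""
    n = len(rows)
    return [[v / (n - 1) for v in r] for r in sscp(rows)]

def pooled_cov(groups):
    """S_p = E / (N - g), where E sums the group SSCP matrices."""
    p = len(groups[0][0])
    N = sum(len(rows) for rows in groups)
    g = len(groups)
    E = [[0.0] * p for _ in range(p)]
    for rows in groups:
        S = sscp(rows)
        for a in range(p):
            for b in range(p):
                E[a][b] += S[a][b]
    return [[v / (N - g) for v in r] for r in E]

# Hypothetical toy data: g = 2 groups, p = 2 variables (NOT the pottery data).
groups = [
    [[1.0, 2.0], [2.0, 4.0], [3.0, 3.0]],
    [[5.0, 1.0], [6.0, 3.0], [7.0, 2.0], [8.0, 6.0]],
]
S_1 = sample_cov(groups[0])
S_2 = sample_cov(groups[1])
S_p = pooled_cov(groups)
print(S_p)
```

The pooled matrix is a weighted average of the group matrices, with weights $n_i - 1$, so larger groups contribute more to $\mathbf{S}_p$.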

Bartlett's test is based on the following test statistic:

$L' = c\left\{(N-g)\log |\mathbf{S}_p| - \sum_{i=1}^{g}(n_i-1)\log|\mathbf{S}_i|\right\}$

where the correction factor is

$c = 1-\dfrac{2p^2+3p-1}{6(p+1)(g-1)}\left\{\sum\limits_{i=1}^{g}\dfrac{1}{n_i-1}-\dfrac{1}{N-g}\right\}$

The version of Bartlett's test considered in the lesson on the two-sample Hotelling's T-square is the special case where $g = 2$. Under the null hypothesis of homogeneous variance-covariance matrices, $L'$ is approximately chi-square distributed with

$\dfrac{1}{2}p(p+1)(g-1)$

degrees of freedom. Reject $H_0$ at level $\alpha$ if

$L' > \chi^2_{\frac{1}{2}p(p+1)(g-1),\alpha}$
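The test statistic, correction factor, and degrees of freedom defined above might be computed along the following lines. This is a sketch using only the Python standard library, run on hypothetical toy data rather than the pottery data. The function names (`cov_matrix`, `det`, `bartlett_test`) are assumptions for this example. The sketch also assumes every group has more than $p$ observations, since otherwise $\mathbf{S}_i$ is singular and its log determinant is undefined.

```python
import math

def cov_matrix(rows):
    """Sample variance-covariance matrix S_i for one group."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    p = len(means)
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(p)] for a in range(p)]

def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    d = 1.0
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(a[r][k]))
        if a[pivot][k] == 0.0:
            return 0.0
        if pivot != k:
            a[k], a[pivot] = a[pivot], a[k]
            d = -d  # a row swap flips the sign of the determinant
        d *= a[k][k]
        for r in range(k + 1, n):
            f = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= f * a[k][c]
    return d

def bartlett_test(groups):
    """Return (L', degrees of freedom) for the homogeneity test."""
    g = len(groups)
    p = len(groups[0][0])
    sizes = [len(rows) for rows in groups]
    N = sum(sizes)
    covs = [cov_matrix(rows) for rows in groups]
    # Pooled matrix: weighted average of the S_i with weights n_i - 1.
    pooled = [[sum((n - 1) * S[a][b] for n, S in zip(sizes, covs)) / (N - g)
               for b in range(p)] for a in range(p)]
    dets = [det(S) for S in covs]
    if min(dets) <= 0.0:
        raise ValueError("a group covariance matrix is singular (need n_i > p)")
    log_term = (N - g) * math.log(det(pooled)) - sum(
        (n - 1) * math.log(d) for n, d in zip(sizes, dets))
    c = 1 - (2 * p * p + 3 * p - 1) / (6 * (p + 1) * (g - 1)) * (
        sum(1 / (n - 1) for n in sizes) - 1 / (N - g))
    df = p * (p + 1) * (g - 1) // 2
    return c * log_term, df

# Hypothetical toy data: g = 2 groups, p = 2 variables (NOT the pottery data).
groups = [
    [[1.0, 2.0], [2.0, 4.0], [3.0, 3.0]],
    [[5.0, 1.0], [6.0, 3.0], [7.0, 2.0], [8.0, 6.0]],
]
statistic, df = bartlett_test(groups)
print(statistic, df)
```

To complete the test, compare `statistic` with the upper-$\alpha$ critical value of the chi-square distribution with `df` degrees of freedom, taken from a table or a statistics library.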

## Example 8-2: Pottery Data

#### Using SAS

Here we will use the Pottery SAS program.


#### Using Minitab

Minitab procedures are not shown separately.

These can be handled using procedures already known.

#### Analysis

We find no statistically significant evidence against the null hypothesis that the variance-covariance matrices are homogeneous (L' = 27.58; d.f. = 45; p = 0.98).
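As a check on the reported degrees of freedom, assume the pottery data set has $p = 5$ chemical variables and $g = 4$ sites. These values come from the description of the data elsewhere in the lesson and are not stated in this section. Then:

$\dfrac{1}{2}p(p+1)(g-1) = \dfrac{1}{2}(5)(6)(3) = 45$

At $\alpha = 0.05$, the chi-square critical value with 45 degrees of freedom is approximately 61.66. Since $L' = 27.58 < 61.66$, we fail to reject $H_0$, in agreement with the large p-value.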

#### Notes

• If we were to reject the null hypothesis of homogeneity of variance-covariance matrices, then we would conclude that assumption 2 is violated.
• MANOVA is not robust to violations of the assumption of homogeneous variance-covariance matrices.
• If the variance-covariance matrices are determined to be unequal, then the solution is to find a variance-stabilizing transformation.
• Note that the assumptions of homogeneous variance-covariance matrices and multivariate normality are often violated together.
• Therefore, a normalizing transformation may also be a variance-stabilizing transformation.
