If we look at the ANOVA mixed model in general terms, we have:
response = fixed effects + random effects + errors
In the case of repeated measures taken at \(p\) time points, the covariance structure of the errors for a subject can be expressed as a \(p \times p\) matrix. The diagonal elements of this matrix are the error variances at each time point. The off-diagonal elements are the covariances between pairs of time points (not only successive ones), so \(\sigma_{jk}\) is the covariance between the errors at times \(j\) and \(k\). In general, the variance-covariance matrix can be expressed as follows:
\(\Sigma_i=\begin{bmatrix}
\sigma_{1}^2 & \sigma_{12} & \ldots & \sigma_{1p}\\
\sigma_{21} & \sigma^2_{2} & &\sigma_{2p}\\
\vdots & & \ddots & \vdots\\
\sigma_{p1} & \sigma_{p2} & \ldots & \sigma^2_{p}
\end{bmatrix}\)
The structure shown above does not assume any specific properties of the variances and covariances and is called an unstructured covariance structure. Because the matrix is symmetric (\(\sigma_{jk}=\sigma_{kj}\)), there are \(p\) variances and \(p(p-1)/2\) distinct covariances, which add up to \(p(p+1)/2\) unknown quantities needed to define this matrix. So, even for a small number of time points, a substantial number of parameters have to be estimated. Therefore, in practice, specific structures are imposed to reduce the number of distinct parameters that need to be estimated; these structures are discussed in Section 11.3.
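As a quick check of this count, consider the \(p=3\) time points used in the example that follows, and, purely for comparison, a hypothetical study with \(p=10\) time points:

\(\begin{aligned}
p=3:&\quad 3 + \frac{3(2)}{2} = 3 + 3 = 6 = \frac{3(4)}{2}\\
p=10:&\quad 10 + \frac{10(9)}{2} = 10 + 45 = 55 = \frac{10(11)}{2}
\end{aligned}\)

The number of parameters grows roughly with the square of \(p\), which is why more restrictive structures become attractive as the number of time points increases.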
Example
To understand the correlation structure of the errors, let us use SAS to estimate the correlations among the errors of a repeated measures model, using hypothetical data stored in Repeated Measures Example Data. The data consist of a single treatment with 3 levels. Nine subjects are assigned to treatment levels at random (CRD), three per level, and each subject is then measured at \(p=3\) time points. The SAS code given below fits a two-factor factorial model, saves the residuals, and computes the correlations among the residuals at the three time points.
data rmanova;
input trt $ time subject resp;
datalines;
A 1 1 10
A 1 2 12
A 1 3 13
A 2 1 16
A 2 2 19
A 2 3 20
A 3 1 25
A 3 2 27
A 3 3 28
B 1 4 12
B 1 5 11
B 1 6 10
B 2 4 18
B 2 5 20
B 2 6 22
B 3 4 25
B 3 5 26
B 3 6 27
C 1 7 10
C 1 8 12
C 1 9 13
C 2 7 22
C 2 8 23
C 2 9 22
C 3 7 31
C 3 8 34
C 3 9 33
;
We can run a simple model and obtain the residuals.
/* 2-factor factorial for trt and time - saving residuals */
proc mixed data=rmanova method=type3;
class trt time subject;
model resp=trt time trt*time / ddfm=kr outpm=outmixed;
title 'Two_factor_factorial';
run; title;
Type 3 Tests of Fixed Effects | ||||
---|---|---|---|---|
Effect | Num DF | Den DF | F Value | Pr > F |
trt | 2 | 18 | 14.52 | 0.0002 |
time | 2 | 18 | 292.72 | <.0001 |
trt*time | 4 | 18 | 4.67 | 0.0092 |
/* Reorganize the residuals into unstacked (wide) form for the correlation analysis */
data one; set outmixed; where time=1; time1=resid; keep time1; run;
data two; set outmixed; where time=2; time2=resid; keep time2; run;
data three; set outmixed; where time=3; time3=resid; keep time3; run;
/* One-to-one merge: rows line up because each data set lists subjects 1-9 in the same order */
data corrcheck; merge one two three; run;
proc print data=corrcheck;
run;
/* Pearson correlations among the residuals at the three time points */
proc corr data=corrcheck nosimple; var time1 time2 time3; run;
The residuals then are:
The Print Procedure
Obs | time1 | time2 | time3 |
---|---|---|---|
1 | -1.66667 | -2.33333 | -1.66667 |
2 | 0.33333 | 0.66667 | 0.33333 |
3 | 1.33333 | 1.66667 | 1.33333 |
4 | 1.00000 | -2.00000 | -1.00000 |
5 | 0.00000 | 0.00000 | 0.00000 |
6 | -1.00000 | 2.00000 | 1.00000 |
7 | -1.66667 | -0.33333 | -1.66667 |
8 | 0.33333 | 0.66667 | 1.33333 |
9 | 1.33333 | -0.33333 | 0.33333 |
The correlations of the residuals between time points are:
The CORR Procedure
3 Variables: time1 time2 time3
Pearson Correlation Coefficients, N = 9 (p-value for H0: Rho=0 in parentheses) | |||
---|---|---|---|
Variable | time1 | time2 | time3 |
time1 | 1.00000 | 0.19026 (0.6239) | 0.55882 (0.1178) |
time2 | 0.19026 (0.6239) | 1.00000 | 0.83239 (0.0054) |
time3 | 0.55882 (0.1178) | 0.83239 (0.0054) | 1.00000 |
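As a check on the first off-diagonal entry, the correlation between time1 and time2 can be reproduced by hand from the printed residuals. The notation \(e_{i1}\) and \(e_{i2}\) for the residuals of subject \(i\) at times 1 and 2 is introduced here only for illustration. Each column of residuals has mean zero, so only the sums of squares and cross-products are needed:

\(\begin{aligned}
\sum_i e_{i1}e_{i2} &\approx 2.6667, \qquad \sum_i e_{i1}^2 \approx 11.3333, \qquad \sum_i e_{i2}^2 \approx 17.3333\\
r_{12} &= \frac{\sum_i e_{i1}e_{i2}}{\sqrt{\sum_i e_{i1}^2 \, \sum_i e_{i2}^2}} \approx \frac{2.6667}{\sqrt{11.3333 \times 17.3333}} \approx 0.1903
\end{aligned}\)

This agrees with the value 0.19026 reported by proc corr.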
Notice that the code above does not use the repeated-measures nature of the data. To incorporate it into the model, the repeated statement in proc mixed can be used. In the code given below, the subject= option of the repeated statement specifies the experimental (or observational) units on which the repeated measures are made, and the type= option specifies one of many available structures for the covariances among those measures. The rcorr option prints the estimated correlation matrix. Here we specify the unstructured covariance structure and obtain the same correlations that were computed earlier from the residuals.
proc mixed data=rmanova ;
class trt time subject;
model resp=trt time trt*time / ddfm=kr solution ;
repeated /subject=subject(trt) type=UN rcorr;
title 'Repeated Measures';
run; title;
Estimated R Correlation Matrix for subject 1 | |||
---|---|---|---|
Row | Col1 | Col2 | Col3 |
1 | 1.0000 | 0.1903 | 0.5588 |
2 | 0.1903 | 1.0000 | 0.8324 |
3 | 0.5588 | 0.8324 | 1.0000 |
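The rcorr output above is on the correlation scale. As a minimal sketch (not part of the original analysis), adding the r option to the same repeated statement should also print the estimated covariance matrix, that is, the estimate of \(\Sigma_i\) with the variances on the diagonal. Naming time as the repeated effect is an optional addition made here; it tells proc mixed explicitly which variable orders the measurements within a subject.

```sas
/* Sketch: same unstructured model, also requesting the covariance matrix */
proc mixed data=rmanova;
  class trt time subject;
  model resp = trt time trt*time / ddfm=kr;
  /* r     -> estimated covariance (R) matrix for the first subject */
  /* rcorr -> the corresponding correlation matrix                  */
  repeated time / subject=subject(trt) type=UN r rcorr;
  title 'Repeated Measures: covariance and correlation matrices';
run; title;
```

Dividing each covariance in the R matrix by the product of the two corresponding standard deviations gives the entries of the R correlation matrix.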
Finding the best covariance structure is much of the work when it comes to modeling repeated measures and is usually done by considering a subset of candidate structures. These include UN (Unstructured), CS (Compound Symmetry), and AR(1) (Autoregressive lag 1) if time intervals are evenly spaced, or SP(POW) (Spatial Power) if time intervals are unequally spaced.
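One way to organize such a comparison is sketched below, under the assumption that the three time points are equally spaced so that AR(1) is a sensible candidate. Only the type= option changes between fits. The ods output statements save each Fit Statistics table so the results can be viewed side by side; the data set names fit_un, fit_cs, fit_ar1, and fit_all are illustrative and not part of the original example.

```sas
/* Sketch: fit three candidate covariance structures to the same mean model */
/* and collect their fit statistics. Data set names are illustrative.      */

ods output FitStatistics=fit_un;
proc mixed data=rmanova;
  class trt time subject;
  model resp = trt time trt*time / ddfm=kr;
  repeated time / subject=subject(trt) type=UN;    /* unstructured */
run;

ods output FitStatistics=fit_cs;
proc mixed data=rmanova;
  class trt time subject;
  model resp = trt time trt*time / ddfm=kr;
  repeated time / subject=subject(trt) type=CS;    /* compound symmetry */
run;

ods output FitStatistics=fit_ar1;
proc mixed data=rmanova;
  class trt time subject;
  model resp = trt time trt*time / ddfm=kr;
  repeated time / subject=subject(trt) type=AR(1); /* autoregressive, lag 1 */
run;

/* Stack the three tables and label each row with its structure */
data fit_all;
  length structure $ 5;
  set fit_un(in=in_un) fit_cs(in=in_cs) fit_ar1(in=in_ar1);
  if in_un then structure = 'UN';
  else if in_cs then structure = 'CS';
  else structure = 'AR(1)';
run;

proc print data=fit_all noobs;
run;
```

Keeping the model statement identical across the three fits matters, because the fit statistics are comparable only when the candidate models share the same fixed effects.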
The choice among candidate covariance structures is based on fit statistics (also known as information criteria). PROC MIXED in SAS automatically reports four such fit statistics when the `type=` option is used in the repeated statement. For this example, they are:
Fit Statistics | |
---|---|
-2 Res Log Likelihood | 63.0 |
AIC (Smaller is Better) | 75.0 |
AICC (Smaller is Better) | 82.6 |
BIC (Smaller is Better) | 76.2 |
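The four values in this table are linked by simple penalties added to -2 Res Log Likelihood, written \(-2\ell_R\) here. The arithmetic below is a sketch that assumes the REML definitions given in the PROC MIXED documentation, with \(d=6\) covariance parameters for the unstructured matrix, \(s=9\) subjects for BIC, and \(n^*=27-9=18\) (observations minus the rank of the fixed-effects design) for AICC. These inputs are inferred from the example rather than printed in the output above.

\(\begin{aligned}
\text{AIC} &= -2\ell_R + 2d = 63.0 + 12 = 75.0\\
\text{AICC} &= -2\ell_R + \frac{2d\,n^*}{n^*-d-1} = 63.0 + \frac{216}{11} \approx 82.6\\
\text{BIC} &= -2\ell_R + d\ln s = 63.0 + 6\ln 9 \approx 76.2
\end{aligned}\)

Under these assumptions the computed values match the table. A structure with fewer parameters pays a smaller penalty, so it can be preferred even when its likelihood is somewhat worse.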
The process amounts to trying various candidate structures and then selecting the covariance structure producing the smallest or most negative values, as smaller values indicate a better fit to the data. The information criteria listed above are usually similar in value, but for small sample sizes, the AICC criterion is recommended. The topic of covariance structures for a general setting is discussed in the next section.