11: Introduction to Repeated Measures

Overview

Studies can often be expanded by introducing time as a potential covariate. In the greenhouse example, the growth of plants can be measured weekly over a period of time, allowing time to also be included as a predictor in the statistical model. Another example is to compare the effect of two anti-cancer drugs on disease status at different intervals of time. In both these examples, the response has to be measured multiple times from the same experimental unit, hence the term ‘repeated measures’. Repeated measurements made on the same experimental unit cannot be assumed independent, so the model errors may be correlated and the statistical model should be modified accordingly.

There are two common, fundamental types of repeated measures: repeated measures in time and crossover repeated measures. In repeated measures in time designs, experimental units receive treatment and are followed with repeated measures on the response variable at several time points. In contrast, in a crossover design (studied in our next lesson), experiments involve administering all treatment levels in a sequence to each experimental unit.

Repeated measures are frequently encountered in clinical trials, longitudinal studies, growth models, and situations in which experimental units are difficult to acquire.

Objectives

Upon completion of this lesson, you should be able to:

Recognize repeated measures designs in time.
Understand the different covariance structures that can be imposed on model error.
Use software such as SAS, Minitab, and R for fitting repeated measures ANOVA.

11.1 - Historical Methods

Repeated measures in time have historically been handled as either a multivariate analysis or as a univariate split-plot in time. The focus in this course is limited to only the latter.

A split-plot in time approach looks at each subject (experimental unit) as a main plot which is then split into sub-plots (time periods). Historically, the default assumption in split-plot in time data analyses has been that the correlation between responses is the same for every pair of time points (compound symmetry). However, depending on the study and nature of data, other correlation structures can be more appropriate (e.g. autoregressive lag 1).

Most current software supports a range of correlation structures, which allows a repeated measures analysis to accommodate different patterns of correlation in the residuals.

11.2 - Correlated Residuals

Note! The first part of the section uses a hypothetical data set to illustrate the origin of the covariance structure by capturing the residuals for each time point and looking at the simple correlations for pairs of time points. The software code used for this purpose is NOT what we would ordinarily use in conducting a repeated measures analysis, because the software automatically generates the residuals of a fitted model along with their variances and covariances. In SAS or R, the variances and covariances of the residuals are reported as the diagonal and off-diagonal elements of the variance-covariance matrix. Minitab currently does not accommodate various covariance structures, opting instead to treat repeated measures as 'split-plot in time' (which assumes compound symmetry).

If we look at the ANOVA mixed model in general terms, we have:

response = fixed effects + random effects + errors

In the case of repeated measures with measures taken at \(p\) time points, the covariance structure of the errors can be expressed as a matrix. The diagonal elements of this matrix are the error variances at each time point. The off-diagonal elements are the covariances between pairs of time points. In general, the variance-covariance matrix can be expressed as follows:

\(\Sigma_i=\begin{bmatrix}
\sigma_{1}^2 & \sigma_{12} & \ldots & \sigma_{1p}\\
\sigma_{21} & \sigma^2_{2} & &\sigma_{2p}\\
\vdots & & \ddots & \vdots\\
\sigma_{p1} & \sigma_{p2} & \ldots & \sigma^2_{p}
\end{bmatrix}\)

The structure shown above does not assume any specific properties of the variances and covariances and is called an unstructured covariance structure. Note that there are \(p\) variances and \(p(p-1)/2\) covariances, which adds to \(p(p+1)/2\) unknown quantities to define this matrix. So, even for a small number of time points, a substantial number of parameters will have to be estimated. Therefore, in practice, specific structures are imposed to reduce the number of distinct parameters that need to be estimated, which will be discussed in Section 11.3.
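The growth in the number of parameters can be checked with a few lines of Python. This is only an illustrative sketch; the function name `un_parameter_count` is made up for this example and is not part of any statistical package.

```python
def un_parameter_count(p: int) -> int:
    """Distinct parameters in an unstructured p x p covariance matrix."""
    if p < 1:
        raise ValueError("p must be a positive integer")
    variances = p                    # diagonal elements
    covariances = p * (p - 1) // 2   # elements above the diagonal
    return variances + covariances   # equals p(p+1)/2


# Parameter counts for a few numbers of time points.
for p in (3, 4, 6, 10):
    print(f"p = {p:2d}: {un_parameter_count(p)} covariance parameters")
```

With the \(p=3\) time points used in the example that follows, the unstructured matrix already requires 6 parameters, and the count rises to 55 for 10 time points.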

SAS® Example

To understand the correlation structure of errors, let us use SAS to generate the variance-covariance matrix of the errors for a repeated measures model using hypothetical data stored in Repeated Measures Example Data. The data consists of a single treatment with 3 levels. Subjects are assigned a treatment level at random (CRD) and then are measured at \(p=3\) time points. The SAS code given below fits a factorial model and generates the residuals along with their correlations across the three time points.

data rmanova;
input trt $ time subject resp;
datalines;
A    1    1   10
A    1    2   12
A    1    3   13
A    2    1   16
A    2    2   19
A    2    3   20
A    3    1   25
A    3    2   27
A    3    3   28
B    1    4   12
B    1    5   11
B    1    6   10
B    2    4   18
B    2    5   20
B    2    6   22
B    3    4   25
B    3    5   26
B    3    6   27
C    1    7   10
C    1    8   12
C    1    9   13
C    2    7   22
C    2    8   23
C    2    9   22
C    3    7   31
C    3    8   34
C    3    9   33
;

We can run a simple model and obtain the residuals.

/* 2-factor factorial for trt and time - saving residuals */
proc mixed data=rmanova method=type3;
class trt time subject;
model resp=trt time trt*time / ddfm=kr outpm=outmixed;
title 'Two_factor_factorial';
run; title;


Type 3 Tests of Fixed Effects
Effect	Num DF	Den DF	F Value	Pr > F
trt	2	18	14.52	0.0002
time	2	18	292.72	<.0001
trt*time	4	18	4.67	0.0092

/* Reorganize the residuals into unstacked (wide) form, one column per time point */
data one; set outmixed; where time=1; time1=resid; keep time1; run;
data two; set outmixed; where time=2; time2=resid; keep time2; run;
data three; set outmixed; where time=3; time3=resid; keep time3; run;
/* One-to-one merge: rows line up because each subset keeps the same subject order */
data corrcheck; merge one two three; run;
proc print data=corrcheck; run;
/* Pairwise correlations of the residuals across time points */
proc corr data=corrcheck nosimple; var time1 time2 time3; run;

The residuals then are:

The Print Procedure


Obs	time1	time2	time3
1	-1.66667	-2.33333	-1.66667
2	0.33333	0.66667	0.33333
3	1.33333	1.66667	1.33333
4	1.00000	-2.00000	-1.00000
5	0.00000	0.00000	0.00000
6	-1.00000	2.00000	1.00000
7	-1.66667	-0.33333	-1.66667
8	0.33333	0.66667	1.33333
9	1.33333	-0.33333	0.33333

The correlations of the residuals between time points are:

The CORR Procedure


3 Variables:	time1 time2 time3


Pearson Correlation Coefficients, N = 9
Prob > |r| under H0: Rho=0 (shown as p in each cell)
	time1	time2	time3
time1	1.00000	0.19026 (p = 0.6239)	0.55882 (p = 0.1178)
time2	0.19026 (p = 0.6239)	1.00000	0.83239 (p = 0.0054)
time3	0.55882 (p = 0.1178)	0.83239 (p = 0.0054)	1.00000
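The residual correlations reported above can be reproduced outside SAS as a cross-check. The Python sketch below assumes that the residuals of the two-factor factorial model are simply the deviations of each response from its treatment-by-time cell mean, which holds here because the model includes trt, time, and their interaction. The variable and function names are illustrative only.

```python
from math import sqrt
from statistics import mean

# (trt, time, subject, resp) rows copied from the rmanova data step.
rows = [
    ("A", 1, 1, 10), ("A", 1, 2, 12), ("A", 1, 3, 13),
    ("A", 2, 1, 16), ("A", 2, 2, 19), ("A", 2, 3, 20),
    ("A", 3, 1, 25), ("A", 3, 2, 27), ("A", 3, 3, 28),
    ("B", 1, 4, 12), ("B", 1, 5, 11), ("B", 1, 6, 10),
    ("B", 2, 4, 18), ("B", 2, 5, 20), ("B", 2, 6, 22),
    ("B", 3, 4, 25), ("B", 3, 5, 26), ("B", 3, 6, 27),
    ("C", 1, 7, 10), ("C", 1, 8, 12), ("C", 1, 9, 13),
    ("C", 2, 7, 22), ("C", 2, 8, 23), ("C", 2, 9, 22),
    ("C", 3, 7, 31), ("C", 3, 8, 34), ("C", 3, 9, 33),
]

# Cell means for each trt-by-time combination (the fitted values).
cells = {}
for trt, time, _, resp in rows:
    cells.setdefault((trt, time), []).append(resp)
cell_mean = {key: mean(values) for key, values in cells.items()}

# Residuals in wide form: residuals[time][subject].
residuals = {1: {}, 2: {}, 3: {}}
for trt, time, subject, resp in rows:
    residuals[time][subject] = resp - cell_mean[(trt, time)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Align the three residual columns by subject before correlating.
subjects = sorted(residuals[1])
wide = {t: [residuals[t][s] for s in subjects] for t in (1, 2, 3)}
r12 = pearson(wide[1], wide[2])
r13 = pearson(wide[1], wide[3])
r23 = pearson(wide[2], wide[3])
print(f"r12 = {r12:.5f}, r13 = {r13:.5f}, r23 = {r23:.5f}")
```

The three coefficients agree with the PROC CORR values (0.19026, 0.55882, and 0.83239).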

Notice that in the above code, the repeated nature of the data is not being utilized. To incorporate this into the model, the repeated statement in proc mixed can be used. As in the code given below, in the repeated statement, the option of subject= specifies what experimental (or observational) units the repeated measures are made on. The type= can be used to specify one of many types of structures for these correlations. Here we specified the unstructured covariance structure and obtained the same correlations that were generated earlier with simple statistics.

proc mixed data=rmanova ;
class trt time subject;
model resp=trt time trt*time / ddfm=kr solution ;
repeated time / subject=subject(trt) type=UN rcorr; /* naming time as the repeated effect keeps the ordering explicit */
title 'Repeated Measures';
run; title;


Estimated R Correlation Matrix for subject 1
Row	Col1	Col2	Col3
1	1.0000	0.1903	0.5588
2	0.1903	1.0000	0.8324
3	0.5588	0.8324	1.0000

Finding the best covariance structure is much of the work when it comes to modeling repeated measures and is usually done by considering a subset of candidate structures. These include UN (Unstructured), CS (Compound Symmetry), and AR(1) (Autoregressive lag 1) if time intervals are evenly spaced, or SP(POW) (Spatial Power) if time intervals are unequally spaced.

Choosing the best covariance structure is based on Fit Statistics (also known as information criteria). PROC MIXED in SAS automatically reports four such fit statistics for the covariance structure requested with the `type = ` option in the Repeated statement. For this example, they are:


Fit Statistics
-2 Res Log Likelihood	63.0
AIC (Smaller is Better)	75.0
AICC (Smaller is Better)	82.6
BIC (Smaller is Better)	76.2

The process amounts to trying various candidate structures and then selecting the covariance structure producing the smallest or most negative values, as smaller values indicate a better fit to the data. The information criteria listed above are usually similar in value, but for small sample sizes, the AICC criterion is recommended. The topic of covariance structures for a general setting is discussed in the next section.

11.3 - More on Covariance Structures

Variance Components (VC)

\(\begin{bmatrix} \sigma_1^2 & 0 & 0 & 0 \\ 0 & \sigma_1^2 & 0 & 0 \\ 0 & 0 &\sigma_1^2 & 0 \\ 0 & 0 & 0 & \sigma_1^2 \end{bmatrix}\)

The variance component structure (VC) is the simplest, where the correlations of errors within a subject are presumed to be 0. This structure is the default setting in proc mixed, but is not a reasonable choice for most repeated measures designs. It is included in the exploration process to get a sense of the effect of fitting other structures.

Compound Symmetry

\(\sigma^2 \begin{bmatrix} 1.0 & \rho & \rho & \rho \\ \rho & 1.0 & \rho & \rho \\ \rho & \rho & 1.0 & \rho \\ \rho & \rho & \rho & 1.0 \end{bmatrix} = \begin{bmatrix} \sigma_b^2+\sigma_e^2 & \sigma_b^2 & \sigma_b^2 & \sigma_b^2 \\ \sigma_b^2& \sigma_b^2+\sigma_e^2 & \sigma_b^2 & \sigma_b^2 \\ \sigma_b^2 & \sigma_b^2 & \sigma_b^2+\sigma_e^2 & \sigma_b^2 \\ \sigma_b^2 & \sigma_b^2 & \sigma_b^2 & \sigma_b^2+\sigma_e^2 \end{bmatrix}\)

The simplest covariance structure that includes within-subject correlated errors is compound symmetry (CS). Here we see correlated errors between time points within subjects, and note that these correlations are presumed to be the same for every pair of time points, regardless of how distant in time the repeated measures are made.
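The two forms of the compound symmetry matrix shown above are the same matrix written in different parameters, with \(\sigma^2=\sigma_b^2+\sigma_e^2\) and \(\rho=\sigma_b^2/(\sigma_b^2+\sigma_e^2)\). The following Python sketch checks this numerically; the variance values are hypothetical and chosen only for illustration, and the function names are not from any package.

```python
def cs_from_components(p, var_between, var_error):
    """CS covariance matrix written with variance components."""
    return [[var_between + (var_error if i == j else 0.0) for j in range(p)]
            for i in range(p)]


def cs_from_correlation(p, total_var, rho):
    """CS covariance matrix written as total variance times a correlation matrix."""
    return [[total_var * (1.0 if i == j else rho) for j in range(p)]
            for i in range(p)]


# Hypothetical variance components (assumed values, not estimates from the example data).
var_between, var_error = 1.5, 0.5
total_var = var_between + var_error   # sigma^2
rho = var_between / total_var         # common within-subject correlation

left = cs_from_correlation(4, total_var, rho)
right = cs_from_components(4, var_between, var_error)
same = all(abs(left[i][j] - right[i][j]) < 1e-12
           for i in range(4) for j in range(4))
print(f"rho = {rho}, matrices agree: {same}")
```

Whatever the number of time points, only two parameters are needed, which is what makes CS so much cheaper to estimate than an unstructured matrix.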

First Order Autoregressive AR(1)

\(\sigma^2 \begin{bmatrix} 1.0 & \rho & \rho^2 & \rho^3 \\ \rho & 1.0 & \rho & \rho^2 \\ \rho^2 & \rho & 1.0 & \rho \\ \rho^3 & \rho^2 & \rho & 1.0 \end{bmatrix}\)

The autoregressive (lag 1) structure considers correlations to be highest between adjacent times, with the correlation decreasing systematically as the distance between time points increases. For one subject, the error correlation between time 1 and time 2 would be \(\rho^{t_2-t_1}\). Between time 1 and time 3 the correlation would be smaller, equal to \(\rho^{t_3-t_1}\). Between time 1 and time 4 it is smaller still, \(\rho^{t_4-t_1}\), and so on. Note that this structure is only applicable when the repeated measures are evenly spaced in time, so that the correlations at lags 1, 2, 3, etc. are \(\rho\) raised to the powers 1, 2, 3, etc.
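As a small illustration of how quickly the AR(1) correlations decay, the sketch below builds the correlation matrix for four equally spaced time points. The value \(\rho=0.5\) is an arbitrary choice for the example, and `ar1_correlation` is a made-up helper name.

```python
def ar1_correlation(p, rho):
    """AR(1) correlation matrix: entry (i, j) is rho raised to the lag |i - j|."""
    return [[rho ** abs(i - j) for j in range(p)] for i in range(p)]


# Arbitrary illustrative correlation between adjacent time points.
matrix = ar1_correlation(4, 0.5)
for row in matrix:
    print("  ".join(f"{value:.3f}" for value in row))
```

With \(\rho=0.5\), the correlation falls from 0.5 at lag 1 to 0.25 at lag 2 and 0.125 at lag 3, while the structure still uses only two parameters (\(\sigma^2\) and \(\rho\)).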

Spatial Power

\(\sigma^2 \begin{bmatrix} 1.0 & \rho^{\frac{|t_1-t_2|}{|t_1-t_2|}} & \rho^{\frac{|t_1-t_3|}{|t_1-t_2|}} & \rho^{\frac{|t_1-t_4|}{|t_1-t_2|}} \\ \rho^{\frac{|t_1-t_2|}{|t_1-t_2|}} & 1.0 & \rho^{\frac{|t_2-t_3|}{|t_1-t_2|}} & \rho^{\frac{|t_2-t_4|}{|t_1-t_2|}} \\ \rho^{\frac{|t_1-t_3|}{|t_1-t_2|}} & \rho^{\frac{|t_2-t_3|}{|t_1-t_2|}} & 1.0 & \rho^{\frac{|t_3-t_4|}{|t_1-t_2|}} \\ \rho^{\frac{|t_1-t_4|}{|t_1-t_2|}} & \rho^{\frac{|t_2-t_4|}{|t_1-t_2|}} & \rho^{\frac{|t_3-t_4|}{|t_1-t_2|}} & 1.0 \end{bmatrix}\)

When time intervals are not evenly spaced, a covariance structure analogous to the AR(1) is the spatial power (SP(POW)). The concept is the same as the AR(1), but instead of raising the correlation to powers of 1, 2, 3, etc., the correlation coefficient is raised to a power determined by the actual difference in times (e.g. \(|t_2-t_1|\) for the correlation between time 1 and time 2; the matrix above expresses each difference in units of \(|t_1-t_2|\)). This method requires quantitative values for the time variable in the data so that the exponents in the SP(POW) structure can be calculated. If an analysis is run wherein the repeated measures are equally spaced in time, the AR(1) and SP(POW) structures yield identical results.
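The sketch below shows how the spatial power exponents depend on the actual measurement times. The times (weeks 0, 1, 3, and 6) and \(\rho=0.5\) are hypothetical, and the `unit` argument is an assumption introduced here to mirror the scaling by \(|t_1-t_2|\) in the matrix above; it is not a SAS option.

```python
def sp_pow_correlation(times, rho, unit=1.0):
    """Spatial power correlation: rho raised to |t_i - t_j| / unit."""
    return [[rho ** (abs(ti - tj) / unit) for tj in times] for ti in times]


# Hypothetical, unequally spaced measurement times (in weeks).
unequal = sp_pow_correlation([0, 1, 3, 6], 0.5)
for row in unequal:
    print("  ".join(f"{value:.4f}" for value in row))

# With equally spaced times the matrix reduces to the AR(1) pattern rho^|i - j|.
equal = sp_pow_correlation([1, 2, 3, 4], 0.5)
ar1 = [[0.5 ** abs(i - j) for j in range(4)] for i in range(4)]
matches_ar1 = all(abs(equal[i][j] - ar1[i][j]) < 1e-12
                  for i in range(4) for j in range(4))
print("equally spaced times reproduce AR(1):", matches_ar1)
```

In the unequally spaced case, the correlation between the first two measurements (one week apart) is 0.5, while measurements three weeks apart have correlation 0.125, so the gap between visits, not the visit number, drives the decay.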

Unstructured Covariance

\( \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} & \sigma_{14} \\ \sigma_{12} & \sigma_2^2 & \sigma_{23} & \sigma_{24} \\ \sigma_{13} & \sigma_{23} & \sigma_3^2 & \sigma_{34}\\ \sigma_{14} & \sigma_{24} & \sigma_{34} & \sigma_4^2 \end{bmatrix}\)

The unstructured covariance structure (UN) is the most complex because it estimates a separate variance for each time point and a separate covariance for each pair of time points. Because there are so many parameters, the estimates often cannot be computed when the data set is small relative to the number of time points. SAS, for instance, may return an error message indicating that there are too many parameters to estimate with the data.

Choosing the Best Covariance Structure

The fit statistics used for model selection can also be used to choose the best covariance structure. The criteria most commonly supported by software are -2 Res Log Likelihood, the corrected Akaike information criterion (AICC), and the Bayesian information criterion (BIC). These statistics are functions of the log likelihood and can be compared across models with different covariance structures, provided the effects are the same in each model. The smaller the value of the criterion, the better the model, and if the criterion values are close, the simpler model is preferred.

BIC tends to choose simpler models compared to AICC. Choosing a model that is too simple however inflates the Type I error rate. Therefore, if controlling Type I error is of importance, AICC may be the better criterion. On the other hand, if loss of power is of more concern, BIC might be preferable (Guerin and Stroup, 2000).

In addition to using the above fit statistics, graphical approaches are also available. See Graphical Approach for more details. In some situations, combining information from both approaches to make the final choice may also prove to be beneficial.

11.4 - Repeated Measures: Example

SAS® Example

For the example dataset Repeated Measures Example Data we introduced in the ‘Correlated Residuals’ section, we can plot the data as follows.

The values of the response are plotted at each of the three time points for each of the 9 subjects.

We can obtain the results for the split-plot in time approach using the following:

/*Split-Plot in Time */
proc mixed data=rmanova method=type3;
class trt time subject;
model resp=trt time trt*time / ddfm=kr;
random subject(trt); title 'Split-Plot in Time';
run;

Type 3 Analysis of Variance
Source	DF	Sum of Squares	Mean Square	Error Term	F Value	Pr > F
trt	2	64.518519	32.259259	MS(subject(trt))	7.14	0.0259
time	2	1300.962963	650.481481	MS(Residual)	605.62	< .0001
trt*time	4	41.481481	10.370370	MS(Residual)	9.66	0.0010
subject(trt)	6	27.111111	4.518519	MS(Residual)	4.21	0.0165
Residual	12	12.888889	1.074074
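The F values in the Type 3 Analysis of Variance table above follow directly from the mean squares and the listed error terms, which can be verified with the Python sketch below. It also adds a method-of-moments estimate of the variance components, assuming the usual balanced-data expected mean square \(E[MS(subject(trt))]=\sigma_e^2+3\sigma_b^2\) for 3 time points per subject; that step is an illustrative assumption and is not part of the SAS output.

```python
# Mean squares copied from the Type 3 Analysis of Variance table.
ms = {
    "trt": 32.259259,
    "time": 650.481481,
    "trt*time": 10.370370,
    "subject(trt)": 4.518519,
    "Residual": 1.074074,
}

# Error term used for each source, as listed in the table.
error_term = {
    "trt": "subject(trt)",   # whole-plot factor is tested against subjects
    "time": "Residual",
    "trt*time": "Residual",
    "subject(trt)": "Residual",
}

f_values = {source: ms[source] / ms[err] for source, err in error_term.items()}
for source, f in f_values.items():
    print(f"{source:13s} F = {f:.2f}")

# Method-of-moments variance components (assumes a balanced design).
n_times = 3
var_error = ms["Residual"]
var_subject = (ms["subject(trt)"] - ms["Residual"]) / n_times
rho = var_subject / (var_subject + var_error)
print(f"sigma_b^2 = {var_subject:.4f}, sigma_e^2 = {var_error:.4f}, rho = {rho:.4f}")
```

The key point is that trt is tested against the between-subject mean square, while time and trt*time are tested against the within-subject residual. The implied common correlation of roughly 0.52 is the quantity that the compound symmetry structure represents.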

Next, we run the analysis as a repeated-measures ANOVA, which allows us to evaluate which covariance structure fits best.

/*Repeated Measures Approach*/
/*Fitting Covariance structures: */
/*Note: the code beginning with "ods output ..." for each
run of the Mixed procedure generates an output that
is tabulated at the end to enable comparison of
the candidate covariance structures*/
proc mixed data=rmanova;
class trt time subject;
model resp=trt time trt*time / ddfm=kr;
repeated time/subject=subject(trt) type=cs rcorr;
ods output FitStatistics=FitCS (rename=(value=CS))
FitStatistics=FitCSp;
title 'Compound Symmetry'; run;
title ' '; run;
proc mixed data=rmanova;
class trt time subject;
model resp=trt time trt*time / ddfm=kr;
repeated time/subject=subject(trt) type=ar(1) rcorr;
ods output FitStatistics=FitAR1 (rename=(value=AR1))
FitStatistics=FitAR1p;
title 'Autoregressive Lag 1'; run;
title ' '; run;
proc mixed data=rmanova;
class trt time subject;
model resp=trt time trt*time / ddfm=kr;
repeated time/subject=subject(trt) type=un rcorr;
ods output FitStatistics=FitUN (rename=(value=UN))
FitStatistics=FitUNp;
title 'Unstructured'; run;
title ' '; run;
data fits;
merge FitCS FitAR1 FitUN;
by descr;
run;
ods listing; proc print data=fits; run;

We get the following Summary Table:

Obs	Descr	CS	AR1	UN
1	-2 Res Log Likelihood	70.9	71.9	63.0
2	AIC (smaller is better)	74.9	75.9	75.0
3	AICC (smaller is better)	75.7	76.7	82.6
4	BIC (smaller is better)	75.3	76.3	76.2
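The AIC, AICC, and BIC rows of the summary table can be recovered from the -2 Res Log Likelihood row. The Python sketch below assumes the REML conventions that PROC MIXED is documented to use: \(d\) is the number of covariance parameters, AICC uses \(n^*=n-p\) (27 observations minus the 9 fixed-effect cell means), and BIC uses the number of subjects (9). Treat these conventions as assumptions to confirm against the SAS documentation; the names in the code are illustrative.

```python
from math import log

# -2 Res Log Likelihood values copied from the summary table.
neg2_res_loglik = {"CS": 70.9, "AR1": 71.9, "UN": 63.0}
# Covariance parameters: CS and AR(1) have 2; UN has p(p+1)/2 = 6 for p = 3.
n_cov_params = {"CS": 2, "AR1": 2, "UN": 6}

n_obs = 27       # 9 subjects x 3 time points
rank_x = 9       # 3 treatments x 3 times (assumed rank of the fixed-effects design)
n_subjects = 9


def fit_statistics(neg2ll, d, n_star, n_subj):
    """Return (AIC, AICC, BIC) under the assumed REML conventions."""
    aic = neg2ll + 2 * d
    aicc = neg2ll + 2 * d * n_star / (n_star - d - 1)
    bic = neg2ll + d * log(n_subj)
    return aic, aicc, bic


table = {
    name: fit_statistics(neg2_res_loglik[name], n_cov_params[name],
                         n_obs - rank_x, n_subjects)
    for name in neg2_res_loglik
}
for name, (aic, aicc, bic) in table.items():
    print(f"{name:4s} AIC = {aic:.1f}  AICC = {aicc:.1f}  BIC = {bic:.1f}")

# Structure with the smallest AICC.
best = min(table, key=lambda name: table[name][1])
print("smallest AICC:", best)
```

The calculation makes the trade-off visible: UN has the smallest -2 Res Log Likelihood, but its six parameters carry a heavy AICC penalty with so few observations.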

Using the AICC as our criterion, we would choose the compound symmetry (CS) covariance structure.

The output from this would be:

Type 3 Tests of Fixed Effects
Effect	Num DF	Den DF	F Value	Pr > F
trt	2	6	7.14	0.0259
time	2	12	605.62	< .0001
trt*time	4	12	9.66	0.0010

Note! The p-values obtained are identical to the split-plot in time approach for this case because a CS covariance structure was used.

11.5 - Lesson 11 Summary

This lesson introduced us to the topic of repeated measures designs. The focus was on repeated measures in time, where each experimental unit is assigned to exactly one treatment level and the response is observed over several time periods. The responses from the same experimental unit observed over time can be correlated, so the model assumption of independent observations is no longer valid. Therefore, an appropriate covariance structure should be imposed to account for the correlated nature of the response, and the best structure is chosen based on fit statistics.

Other scenarios can result in repeated measures, not necessarily over time. In a repeated measure design, the important feature is that multiple measurements are being made on the same experimental unit. Another case of this is the cross-over design wherein the treatments themselves are switched on the same experimental unit during the course of the experiment. This will be the topic of the next lesson.
