Lesson 14: Two-Factor Analysis of Variance
Overview
In the previous lesson, we learned how to conduct an analysis of variance in an attempt to learn whether a (that's one!) factor played a role in the observed responses. For example, we investigated whether the learning method (the factor) influenced a student's exam score (the response). We also investigated whether tire brand (the factor) influenced a car's stopping distance (the response).
What happens if we're interested not in whether one factor is associated with the observed responses, but in whether two or three or more factors are? For example, we might be interested in learning whether smoking history (one factor) and type of stress test (a second factor) are associated with the time until maximum oxygen uptake (the response). That's the kind of data that we'll learn to analyze in this lesson. Specifically, we'll learn how to conduct a two-factor analysis of variance, so that we can test whether either of the two factors, or their interaction, is associated with some continuous response.
In reality, this online lesson contains only an example of a two-factor analysis of variance. For the theoretical development, you are asked to refer to the textbook chapter on Two-Factor Analysis of Variance. Pedagogically, the material lends itself well to practicing how to learn a new statistical method solely from the formal presentation in a statistics textbook.
14.1  An Example
Example 14-1
A physiologist was interested in learning whether smoking history and different types of stress tests influence the timing of a subject's maximum oxygen uptake, as measured in minutes. The researcher classified a subject's smoking history as either heavy smoking, moderate smoking, or nonsmoking. He was interested in seeing the effects of three different types of stress tests — a test performed on a bicycle, a test on a treadmill, and a test on steps. The physiologist recruited 9 nonsmokers, 9 moderate smokers, and 9 heavy smokers to participate in his experiment, for a total of n = 27 subjects. He then randomly assigned each of his recruited subjects to undergo one of the three types of stress tests. Here is his resulting data:
Time until maximum oxygen uptake (minutes), by smoking history (rows) and type of stress test (columns):
Smoking History  Bicycle (1)       Treadmill (2)     Step Test (3)
Nonsmoker (1)    12.8, 13.5, 11.2  16.2, 18.1, 17.8  22.6, 19.3, 18.9
Moderate (2)     10.9, 11.1, 9.8   15.5, 13.8, 16.2  20.1, 21.0, 15.9
Heavy (3)        8.7, 9.2, 7.5     14.7, 13.2, 8.1   16.2, 16.1, 17.8
Is there sufficient evidence at the \(\alpha = 0.05\) level to conclude that smoking history has an effect on the time to maximum oxygen uptake? Is there sufficient evidence at the \(\alpha = 0.05\) level to conclude that the type of stress test has an effect on the time to maximum oxygen uptake? And, is there evidence of an interaction between smoking history and the type of stress test?
Answer
Let's start by stating our analysis of variance model, as well as any assumptions that we'll make. Let \(X_{ijk}\) denote the time, in minutes, until maximum oxygen uptake for smoking history \(i = 1, 2, 3\), type of test \(j = 1, 2, 3\), and replicate \(k = 1, 2, 3\). So, for example, \(X_{111} = 12.8 , X_{112} = 13.5\), and so on. Let's assume the \(X_{ijk}\) are mutually independent normal random variables with common variance \(\sigma^2\) and mean:
\(\mu_{ij}=\mu+\alpha_i+\beta_j+\gamma_{ij}\)
subject to the following constraints:
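\(\sum\limits_{i=1}^a \alpha_i=0\), \(\sum\limits_{j=1}^b \beta_j=0\), \(\sum\limits_{i=1}^a \gamma_{ij}=0\) for each \(j\), and \(\sum\limits_{j=1}^b \gamma_{ij}=0\) for each \(i\), where \(a = 3\) is the number of smoking history levels and \(b = 3\) is the number of stress test types.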
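To make the model concrete, here is a minimal Python sketch that estimates \(\mu\), \(\alpha_i\), \(\beta_j\), and \(\gamma_{ij}\) from the data table above. It uses the usual estimates for a balanced design (grand mean, row and column mean deviations, and the leftover cell deviations), then checks that they satisfy the sum-to-zero constraints. The variable names (`data`, `alpha_hat`, and so on) are made up for this illustration and are not part of the lesson or of Minitab.

```python
# Times (minutes) until maximum oxygen uptake, copied from the data table.
# Outer index i: smoking history (0 = nonsmoker, 1 = moderate, 2 = heavy).
# Inner index j: stress test (0 = bicycle, 1 = treadmill, 2 = step test).
data = [
    [[12.8, 13.5, 11.2], [16.2, 18.1, 17.8], [22.6, 19.3, 18.9]],
    [[10.9, 11.1, 9.8], [15.5, 13.8, 16.2], [20.1, 21.0, 15.9]],
    [[8.7, 9.2, 7.5], [14.7, 13.2, 8.1], [16.2, 16.1, 17.8]],
]


def mean(values):
    """Arithmetic mean of any iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


a, b = len(data), len(data[0])  # number of levels of each factor

grand_mean = mean(x for row in data for cell in row for x in cell)
row_mean = [mean(x for cell in data[i] for x in cell) for i in range(a)]
col_mean = [mean(x for i in range(a) for x in data[i][j]) for j in range(b)]
cell_mean = [[mean(data[i][j]) for j in range(b)] for i in range(a)]

# Estimates of the model parameters for a balanced design.
mu_hat = grand_mean
alpha_hat = [row_mean[i] - grand_mean for i in range(a)]
beta_hat = [col_mean[j] - grand_mean for j in range(b)]
gamma_hat = [
    [cell_mean[i][j] - row_mean[i] - col_mean[j] + grand_mean for j in range(b)]
    for i in range(a)
]

print("mu_hat    =", round(mu_hat, 3))
print("alpha_hat =", [round(v, 3) for v in alpha_hat])
print("beta_hat  =", [round(v, 3) for v in beta_hat])
for i in range(a):
    print("gamma_hat row", i + 1, "=", [round(v, 3) for v in gamma_hat[i]])

# The estimates obey the same constraints as the parameters (up to rounding).
print("sum of alpha_hat is zero:", abs(sum(alpha_hat)) < 1e-9)
print("sum of beta_hat is zero: ", abs(sum(beta_hat)) < 1e-9)
```

The interaction estimates are what is left of each cell mean after the grand mean and the two main effects have been removed, which is why each row and each column of `gamma_hat` sums to zero.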
In that case, testing whether or not there is an interaction between smoking history and the type of stress test involves testing the null hypothesis:
\(H_0:\gamma_{ij}=0 \quad \text{for } i=1,2,3 \text{ and } j=1,2,3\)
against all of the possible alternatives. We'll definitely want to engage Minitab in conducting the necessary analysis of variance! To do so, we first enter the data into a Minitab worksheet in an unstacked manner. We then do the following:

Under the Stat menu, we select ANOVA, and then Balanced ANOVA... (our data are "balanced" because every cell contains the same number of measurements, 3).

In the pop-up window that appears, we specify the Response and the Model.
You might want to take particular note of the way we specify the interaction between smoking status and the type of test in Minitab, namely, as Smoker*Test.

We select OK, and the resulting output appears in the Session Window.
Here's what the output looks like. The row pertaining to the interaction term is the one labeled Smoker*Test:
Factor  Type   Levels  Values
Smoker  fixed  3       1, 2, 3
Test    fixed  3       1, 2, 3

Source       DF  SS       MS       F      P
Smoker       2   84.899   42.449   12.90  0.000
Test         2   298.072  149.036  45.28  0.000
Smoker*Test  4   2.815    0.704    0.21   0.927
Error        18  59.247   3.291
Total        26  445.032

S = 1.81424   R-Sq = 86.69%   R-Sq(adj) = 80.77%
As you can see, the P-value, 0.927, is very large. We do not reject the null hypothesis that the interaction terms are all zero. That is, there is insufficient evidence at the 0.05 level to conclude that there is an interaction between smoking history and the type of stress test.
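For readers who want to see where the numbers in the analysis of variance table come from, the following Python sketch recomputes the DF, SS, MS, and F columns directly from the data using the standard sums of squares for a balanced two-factor design. It is an illustration under that assumption, not a description of Minitab's internals, and the variable names are hypothetical. It stops short of the P-values, which require the F distribution (for example, via a statistics library).

```python
import math

# Times (minutes), indexed as data[smoking history][stress test][replicate].
data = [
    [[12.8, 13.5, 11.2], [16.2, 18.1, 17.8], [22.6, 19.3, 18.9]],
    [[10.9, 11.1, 9.8], [15.5, 13.8, 16.2], [20.1, 21.0, 15.9]],
    [[8.7, 9.2, 7.5], [14.7, 13.2, 8.1], [16.2, 16.1, 17.8]],
]
a, b, n = len(data), len(data[0]), len(data[0][0])


def mean(values):
    """Arithmetic mean of any iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


grand = mean(x for row in data for cell in row for x in cell)
row_mean = [mean(x for cell in data[i] for x in cell) for i in range(a)]
col_mean = [mean(x for i in range(a) for x in data[i][j]) for j in range(b)]
cell_mean = [[mean(data[i][j]) for j in range(b)] for i in range(a)]

# Sums of squares for a balanced two-factor design with interaction.
ss_smoker = b * n * sum((m - grand) ** 2 for m in row_mean)
ss_test = a * n * sum((m - grand) ** 2 for m in col_mean)
ss_inter = n * sum(
    (cell_mean[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
    for i in range(a)
    for j in range(b)
)
ss_error = sum(
    (x - cell_mean[i][j]) ** 2
    for i in range(a)
    for j in range(b)
    for x in data[i][j]
)
ss_total = sum((x - grand) ** 2 for row in data for cell in row for x in cell)

# Degrees of freedom and the error mean square.
df_smoker, df_test = a - 1, b - 1
df_inter, df_error = (a - 1) * (b - 1), a * b * (n - 1)
ms_error = ss_error / df_error

print(f"{'Source':<12}{'DF':>4}{'SS':>10}{'MS':>10}{'F':>8}")
for name, df, ss in [
    ("Smoker", df_smoker, ss_smoker),
    ("Test", df_test, ss_test),
    ("Smoker*Test", df_inter, ss_inter),
]:
    ms = ss / df
    print(f"{name:<12}{df:>4}{ss:>10.3f}{ms:>10.3f}{ms / ms_error:>8.2f}")
print(f"{'Error':<12}{df_error:>4}{ss_error:>10.3f}{ms_error:>10.3f}")
print(f"{'Total':<12}{a * b * n - 1:>4}{ss_total:>10.3f}")

# S is the square root of the error mean square; R-Sq is 1 - SSE/SSTotal.
s = math.sqrt(ms_error)
r_sq = 1 - ss_error / ss_total
print(f"S = {s:.5f}   R-Sq = {100 * r_sq:.2f}%")
```

Each F statistic is the mean square for that source divided by the error mean square, so all three tests share the same denominator.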
Now, testing whether or not smoking history has an effect on the timing of maximum oxygen uptake involves testing the null hypothesis:
\(H_0:\alpha_1=\alpha_2=\alpha_3=0\)
against all of the possible alternatives. The relevant row of the Minitab output shown above is the one labeled Smoker, with F = 12.90 and P = 0.000.
As you can see, the P-value is very small (< 0.001). We reject the null hypothesis that the smoking history parameters are all zero. That is, there is sufficient evidence at the 0.05 level to conclude that smoking history has an effect on the timing of maximum oxygen uptake.
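As a quick check on the arithmetic behind that conclusion, the F statistic in the Smoker row is the smoking history mean square divided by the error mean square, using the values reported in the Minitab output:
\(F=\dfrac{MS(\text{Smoker})}{MS(\text{Error})}=\dfrac{42.449}{3.291}\approx 12.90\)
Under the null hypothesis, this statistic is compared to an F distribution with 2 numerator and 18 denominator degrees of freedom, the DF values shown for the Smoker and Error rows.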
Now, testing whether or not the type of stress test has an effect on the timing of maximum oxygen uptake involves testing the null hypothesis:
\(H_0:\beta_1=\beta_2=\beta_3=0\)
against all of the possible alternatives. The relevant row of the Minitab output shown above is the one labeled Test, with F = 45.28 and P = 0.000.
As you can see, again, the P-value is very small (< 0.001). We reject the null hypothesis that the stress test parameters are all zero. That is, there is sufficient evidence at the 0.05 level to conclude that the type of stress test has an effect on the timing of maximum oxygen uptake.
In summary, based on these data, the physiologist can conclude that there appears to be an effect due to smoking history and the type of stress test, but that the data do not suggest that the two factors interact in any way.
Note!
We were able to include an interaction term in our model in the previous example because we had multiple observations (three, to be exact) in each of the cells. If there were only one observation in each cell, we could not include an interaction term in our model, because no degrees of freedom would be left over for estimating the error variance.
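The degrees-of-freedom bookkeeping behind this note can be sketched in a few lines of Python. The function name `two_factor_df` is hypothetical. It tabulates the degrees of freedom for a balanced design with \(a\) and \(b\) factor levels and \(n\) observations per cell, assuming the interaction term is in the model.

```python
def two_factor_df(a, b, n):
    """Degrees of freedom for a balanced two-factor model with interaction.

    a, b: number of levels of the two factors; n: observations per cell.
    """
    return {
        "A": a - 1,
        "B": b - 1,
        "A*B": (a - 1) * (b - 1),
        "Error": a * b * (n - 1),
        "Total": a * b * n - 1,
    }


# The smoking example: 3 x 3 cells with 3 replicates each.
with_replicates = two_factor_df(3, 3, 3)
print(with_replicates)

# Same layout with a single observation per cell.
single_observation = two_factor_df(3, 3, 1)
print(single_observation)
```

With three replicates the Error row has 18 degrees of freedom, matching the Minitab output. With one observation per cell the Error row has 0 degrees of freedom, so the error mean square, and therefore every F statistic, cannot be computed while the interaction term is in the model.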