9.6 - Interactions Between Quantitative Predictors
Introduction
We introduced interactions in the previous lesson, where we created interaction terms between indicator variables and quantitative predictors to allow for different "slopes" for levels of a categorical predictor. We can also create interaction terms between quantitative predictors, which allow the relationship between the response and one predictor to vary with the values of another predictor. Interestingly, this provides another way to introduce curvature into a multiple linear regression model. For example, consider a model with two quantitative predictors, which we can visualize in a three-dimensional scatterplot with the response values placed vertically as usual and the predictors placed along the two horizontal axes. A multiple linear regression model with just these two predictors results in a fitted regression plane that looks like a flat piece of paper. If, however, we include an interaction between the predictors in our model, then the fitted regression surface is no longer flat: it looks like a rectangular piece of paper that has been warped (with edges that slope at different angles), as illustrated below for the equation \(y = 3x_1 + 5x_2 + 4x_1 x_2\).
Note that for fixed \(x_1\), \(y\) is a function of \(x_2\) only, and the slopes in the cross-section vary: for example, when \(x_1 = -4\), the slope is \(5 + 4(-4) = -11\), but when \(x_1 = 4\), the slope is \(5 + 4(4) = 21\). Similarly, for fixed \(x_2\), \(y\) is a function of \(x_1\) only, and the slopes in the cross-section vary: for example, when \(x_2 = -4\), the slope is \(3 + 4(-4) = -13\), but when \(x_2 = 4\), the slope is \(3 + 4(4) = 19\).
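The cross-section slopes of this example surface can be recomputed with a few lines of Python. This is only a minimal sketch; the helper names `surface`, `slope_in_x2`, and `slope_in_x1` are introduced here for illustration and are not part of the lesson.

```python
# Illustrative check of the cross-section slopes of y = 3*x1 + 5*x2 + 4*x1*x2.
# The function names are hypothetical helpers introduced for this sketch.

def surface(x1, x2):
    """Value of the example surface at (x1, x2)."""
    return 3 * x1 + 5 * x2 + 4 * x1 * x2

def slope_in_x2(x1):
    """Slope of y against x2 when x1 is held fixed: 5 + 4*x1."""
    return 5 + 4 * x1

def slope_in_x1(x2):
    """Slope of y against x1 when x2 is held fixed: 3 + 4*x2."""
    return 3 + 4 * x2

for fixed in (-4, 4):
    print(f"x1 = {fixed:+d}: slope in x2 = {slope_in_x2(fixed)}")
    print(f"x2 = {fixed:+d}: slope in x1 = {slope_in_x1(fixed)}")
```

Because each slope depends on the value of the other predictor, no single number describes "the effect" of \(x_1\) or \(x_2\) once the interaction term is in the model.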
Typically, regression models that include interactions between quantitative predictors adhere to the hierarchy principle, which says that if your model includes an interaction term, \(X_1X_2\), and \(X_1X_2\) is shown to be a statistically significant predictor of Y, then your model should also include the "main effects," \(X_1\) and \(X_2\), whether or not the coefficients for these main effects are significant. Depending on the subject area, there may be circumstances where the main effect could be excluded, but this tends to be the exception.
We can use interaction terms in any multiple linear regression model. Here we consider an example with two quantitative predictors and one indicator variable for a categorical predictor. In Lesson 5 we looked at some data resulting from a study in which the researchers (Colby et al., 1987) wanted to determine if nestling bank swallows alter the way they breathe to survive the poor air quality conditions of their underground burrows. In reality, the researchers studied not only the breathing behavior of nestling bank swallows, but that of adult bank swallows as well.
To refresh your memory, the researchers conducted the following randomized experiment on 120 nestling bank swallows. In an underground burrow, they varied the percentage of oxygen at four different levels (13%, 15%, 17%, and 19%) and the percentage of carbon dioxide at five different levels (0%, 3%, 4.5%, 6%, and 9%). Under each of the resulting 5×4 = 20 experimental conditions, the researchers observed the total volume of air breathed per minute for each of 6 nestling bank swallows. They replicated the same randomized experiment on 120 adult bank swallows. In this way, they obtained the following data (Swallows data) on n = 240 swallows:
- Response (\(y\)): percentage increase in "minute ventilation" (Vent), i.e., the total volume of air breathed per minute
- Potential predictor (\(x_{1}\)): percentage of oxygen (O2) in the air the swallows breathe
- Potential predictor (\(x_{2}\)): percentage of carbon dioxide (CO2) in the air the swallows breathe
- Potential qualitative predictor (\(x_{3}\)): (Type) 1 if the bird is an adult, 0 if the bird is a nestling
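The layout of the experiment can be sketched by enumerating the factor levels. The following Python snippet is illustrative only: the variable names are assumptions made for this sketch, and it builds the predictor settings, not the observed Vent values.

```python
from itertools import product

# Factor levels as described in the study design above.
oxygen_levels = [13, 15, 17, 19]            # percent O2
carbon_dioxide_levels = [0, 3, 4.5, 6, 9]   # percent CO2
birds_per_condition = 6
bird_types = {"nestling": 0, "adult": 1}    # coding of the Type indicator

# Every combination of an O2 level with a CO2 level.
conditions = list(product(oxygen_levels, carbon_dioxide_levels))

# One row per bird: (O2, CO2, Type). The response is observed, not generated here.
design = [
    (o2, co2, type_code)
    for type_code in bird_types.values()
    for (o2, co2) in conditions
    for _ in range(birds_per_condition)
]

print(len(conditions))  # 20 experimental conditions
print(len(design))      # 240 swallows in total
```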
Here's a plot of the resulting data for the adult swallows:
and a plot of the resulting data for the nestling bank swallows:
As mentioned previously, the "best fitting" function through each of the above plots will be some sort of surface like a sheet of paper. Here is one possible estimate of the surface for the nestlings:
What we don't know is if the best fitting function — that is, the sheet of paper — through the data will be curved or not. Including interaction terms in the regression model allows the function to have some curvature while leaving interaction terms out of the regression model forces the function to be flat.
Let's consider the research question "is there any evidence that the adults differ from the nestlings in terms of their minute ventilation as a function of oxygen and carbon dioxide?"
We could start by formulating the following multiple regression model with two quantitative predictors and one qualitative predictor:
\(y_i=(\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\beta_3x_{i3})+ \epsilon_i\)
where:
- \(y_{i}\) is the percentage increase in minute ventilation for swallow i
- \(x_{i1}\) is the percentage of oxygen for swallow i
- \(x_{i2}\) is the percentage of carbon dioxide for swallow i
- \(x_{i3}\) is the type of bird (0, if nestling and 1, if adult) for swallow i
and the independent error terms \(\epsilon_{i}\) follow a normal distribution with mean 0 and equal variance \(\sigma^{2}\).
We now know, however, that there is a risk in omitting an important interaction term. Therefore, let's instead formulate the following multiple regression model with three interaction terms:
\(y_i=(\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\beta_3x_{i3}+\beta_{12}x_{i1}x_{i2}+\beta_{13}x_{i1}x_{i3}+\beta_{23}x_{i2}x_{i3})+ \epsilon_i\)
where:
- \(y_{i}\) is the percentage increase in minute ventilation for swallow i
- \(x_{i1}\) is the percentage of oxygen for swallow i
- \(x_{i2}\) is the percentage of carbon dioxide for swallow i
- \(x_{i3}\) is the type of bird (0, if nestling and 1, if adult) for swallow i
- \(x_{i1} x_{i2}, x_{i1} x_{i3}, \text{ and } x_{i2} x_{i3} \) are interaction terms
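Each interaction term is simply the product of two predictor columns. As a minimal sketch, the hypothetical helper `design_row` below builds the seven predictor values for one swallow in the order used by the model above; the example settings are chosen from the design levels and are not observed data rows.

```python
def design_row(o2, co2, bird_type):
    """Return the predictor values for one swallow in the order
    [1, x1, x2, x3, x1*x2, x1*x3, x2*x3]; the leading 1 is the intercept."""
    return [1, o2, co2, bird_type, o2 * co2, o2 * bird_type, co2 * bird_type]

# Hypothetical settings: 15% O2 and 3% CO2 for each type of bird.
print(design_row(15, 3, 0))  # nestling: both interactions with Type are 0
print(design_row(15, 3, 1))  # adult: interactions with Type equal O2 and CO2
```

For a nestling, the two products involving the Type indicator are always zero, which is why those coefficients drop out of the nestling response function.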
By setting the predictor \(x_{3}\) to equal 0 and 1 and doing a little bit of algebra we see that our formulated model yields two response functions — one for each type of bird:
Type of bird | Formulated regression function |
---|---|
If a nestling, then \(x_{i3} = 0 \) and ... | \(\mu_Y=\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\beta_{12}x_{i1}x_{i2}\) |
If an adult, then \(x_{i3} = 1 \) and ... | \(\mu_Y=(\beta_0+\beta_3)+(\beta_1+\beta_{13})x_{i1}+(\beta_2+\beta_{23})x_{i2}+\beta_{12}x_{i1}x_{i2}\) |
The \(\beta_{12} x_{i1} x_{i2}\) interaction term appearing in both functions gives the two functions the same curvature. The additional \(\beta_{3}\) parameter in the regression function for the adults allows the adult intercept to differ from the nestling intercept by \(\beta_{3}\) units. The additional \(\beta_{13}\) parameter appearing before the \(x_{i1}\) predictor in the regression function for the adults allows the adult slope in the \(x_{i1}\) direction to differ from the nestling slope by \(\beta_{13}\) units. And the additional \(\beta_{23}\) parameter appearing before the \(x_{i2}\) predictor in the regression function for the adults allows the adult slope in the \(x_{i2}\) direction to differ from the nestling slope by \(\beta_{23}\) units.
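Subtracting the nestling function from the adult function in the table makes the role of each extra parameter explicit. This is a short worked step using only the parameters already defined; the subscripts "adult" and "nestling" on \(\mu_Y\) are notation introduced here for convenience:

\(\mu_{Y,\text{adult}}-\mu_{Y,\text{nestling}}=\beta_3+\beta_{13}x_{i1}+\beta_{23}x_{i2}\)

The common term \(\beta_{12}x_{i1}x_{i2}\) cancels. If \(\beta_{13}=\beta_{23}=0\), the difference reduces to the constant \(\beta_3\), so the two surfaces differ only by a vertical shift; if in addition \(\beta_3=0\), the two surfaces coincide.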
To add interaction terms to a model in Minitab click the "Model" tab in the Regression Dialog, then select the predictor terms you wish to create interaction terms for, then click "Add" for "Interaction through order 2." You should see the appropriate interaction terms added to the list of "Terms in the model." If we do this in Minitab to estimate the regression function above, we obtain:
Analysis of Variance
Source | DF | Seq SS | Seq MS | F-Value | P-Value |
---|---|---|---|---|---|
Regression | 6 | 2387540 | 397923 | 14.51 | 0.000 |
O2 | 1 | 93651 | 93651 | 3.42 | 0.066 |
CO2 | 1 | 2247696 | 2247696 | 81.98 | 0.000 |
Type | 1 | 5910 | 5910 | 0.22 | 0.643 |
TypeO2 | 1 | 14735 | 14735 | 0.54 | 0.464 |
TypeCO2 | 1 | 2884 | 2884 | 0.11 | 0.746 |
CO2O2 | 1 | 22664 | 22664 | 0.83 | 0.364 |
Error | 233 | 6388603 | 27419 | ||
Lack-of-Fit | 33 | 485700 | 14718 | 0.50 | 0.990 |
Pure Error | 200 | 5902903 | 29515 | ||
Total | 239 | 8776143 |
Model Summary
S | R-sq | R-sq(adj) | R-sq(pred) |
---|---|---|---|
165.587 | 27.20% | 25.33% | 22.44% |
Regression Equation
\(\widehat{Vent} = -18 + 1.19 O2 + 54.3 CO2 +112 Type - 7.01 TypeO2 + 2.31 TypeCO2 - 1.45 CO2O2\)
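Setting Type to 0 or 1 in the fitted equation gives one estimated function per type of bird. The sketch below does that bookkeeping in Python; the dictionary `coef` and the helper `fitted_function` are names assumed for this illustration, and the coefficients are the rounded values reported above.

```python
# Coefficients copied from the fitted equation above (rounded as reported).
coef = {
    "const": -18.0, "O2": 1.19, "CO2": 54.3, "Type": 112.0,
    "TypeO2": -7.01, "TypeCO2": 2.31, "CO2O2": -1.45,
}

def fitted_function(bird_type):
    """Collapse the fitted equation for Type = 0 (nestling) or 1 (adult).
    Returns (intercept, O2 coefficient, CO2 coefficient, O2*CO2 coefficient)."""
    return (
        coef["const"] + coef["Type"] * bird_type,
        coef["O2"] + coef["TypeO2"] * bird_type,
        coef["CO2"] + coef["TypeCO2"] * bird_type,
        coef["CO2O2"],
    )

for name, code in (("nestling", 0), ("adult", 1)):
    b0, b_o2, b_co2, b_int = fitted_function(code)
    print(f"{name}: Vent-hat = {b0:.2f} {b_o2:+.2f} O2 {b_co2:+.2f} CO2 {b_int:+.2f} O2*CO2")
```

The O2*CO2 coefficient is the same for both types because the model contains no three-way interaction.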
Again, however, we should minimize the number of hypothesis tests we perform on our data, and thereby reduce the chance of committing a Type I error, by instead conducting a partial F-test of \(H_0 \colon \beta_{12} = \beta_{13} = \beta_{23} = 0\) simultaneously:
Analysis of Variance
Source | DF | SS | MS | F | P |
---|---|---|---|---|---|
Regression | 6 | 2387540 | 397923 | 14.51 | 0.000 |
Residual Error | 233 | 6388603 | 27419 | ||
Total | 239 | 8776143 |
Source | DF | Seq SS |
---|---|---|
O2 | 1 | 93651 |
CO2 | 1 | 2247696 |
Type | 1 | 5910 |
TypeO2 | 1 | 14735 |
TypeCO2 | 1 | 2884 |
CO2O2 | 1 | 22664 |
The Minitab output allows us to determine that the partial F-statistic is:
\(F^*=\dfrac{(14735+2884+22664)/3}{27419}=0.49\)
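The same arithmetic can be reproduced in a few lines of Python. This is a sketch that uses only the sequential sums of squares and the mean squared error reported in the output; the variable names are illustrative.

```python
# Sequential sums of squares for the three interaction terms,
# which were entered last in the model, as reported in the output above.
seq_ss_interactions = {"TypeO2": 14735, "TypeCO2": 2884, "CO2O2": 22664}
mse_full = 27419      # mean squared error of the full model
df_error_full = 233   # error degrees of freedom of the full model

numerator_df = len(seq_ss_interactions)
partial_f = (sum(seq_ss_interactions.values()) / numerator_df) / mse_full

print(f"F* = {partial_f:.2f} on {numerator_df} and {df_error_full} df")
```

This shortcut works only because the interaction terms were the last ones added, so their sequential sums of squares sum to the reduction in error sum of squares from adding all three at once.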
The following Minitab output gives the corresponding cumulative probability:
F distribution with 3 DF in numerator and 233 DF in denominator
x | P(X ≤ x) |
---|---|
0.49 | 0.310445 |
This tells us that the probability of observing an F-statistic less than 0.49, with 3 numerator and 233 denominator degrees of freedom, is 0.31. Therefore, the probability of observing an F-statistic greater than 0.49, with 3 numerator and 233 denominator degrees of freedom, is 1-0.31 or 0.69. That is, the P-value is 0.69. There is insufficient evidence at the 0.05 level to conclude that at least one of the interaction parameters is not 0.
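For readers without Minitab, the tail probability can be approximated with the Python standard library alone. The function `f_cdf` below is a hypothetical helper written for this sketch, not a library routine: it integrates the beta density with Simpson's rule and assumes at least 2 numerator degrees of freedom. In practice a statistics package would normally be used instead.

```python
import math

def f_cdf(x, d1, d2, steps=20000):
    """Approximate P(F <= x) for an F(d1, d2) variable.

    Uses the identity P(F <= x) = I_z(d1/2, d2/2) with z = d1*x / (d1*x + d2),
    where I_z is the regularized incomplete beta function, evaluated here by
    composite Simpson's rule. Assumes d1 >= 2 and an even number of steps.
    """
    a, b = d1 / 2.0, d2 / 2.0
    z = d1 * x / (d1 * x + d2)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def integrand(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    h = z / steps
    total = integrand(0.0) + integrand(z)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return (h / 3.0) * total / math.exp(log_beta)

cumulative = f_cdf(0.49, 3, 233)
p_value = 1 - cumulative
print(f"P(F <= 0.49) = {cumulative:.3f}, P-value = {p_value:.2f}")
```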
The residuals versus fits plot:
also suggests that there is something not quite right about the fit of the model containing interaction terms.
Incidentally, if we go back and re-examine the two scatter plots of the data — one for the adults:
and one for the nestlings:
we see that it is believable that there are no interaction terms. If you tried to "draw" the "best fitting" function through each scatter plot, the two functions would probably look like two parallel planes.
So, let's go back to formulating the model with no interaction terms:
\(y_i=(\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\beta_3x_{i3})+\epsilon_i\)
where:
- \(y_{i}\) is the percentage increase in minute ventilation for swallow i
- \(x_{i1}\) is the percentage of oxygen for swallow i
- \(x_{i2}\) is the percentage of carbon dioxide for swallow i
- \(x_{i3}\) is the type of bird (0, if nestling and 1, if adult) for swallow i
and the independent error terms \(\epsilon_{i}\) follow a normal distribution with mean 0 and equal variance \(\sigma^{2}\).
Using Minitab to estimate the regression function, we obtain:
Analysis of Variance
Source | DF | Seq SS | Seq MS | F-Value | P-Value |
---|---|---|---|---|---|
Regression | 3 | 2347257 | 782419 | 28.72 | 0.000 |
O2 | 1 | 93651 | 93651 | 3.44 | 0.065 |
CO2 | 1 | 2247696 | 2247696 | 82.51 | 0.000 |
Type | 1 | 5910 | 5910 | 0.22 | 0.642 |
Error | 236 | 6428886 | 27241 | ||
Lack-of-Fit | 36 | 525983 | 14611 | 0.50 | 0.993 |
Pure Error | 200 | 5902903 | 29515 | ||
Total | 239 | 8776143 |
\(\widehat{Vent} = 136.8 - 8.83 O2 + 32.26 CO2 + 9.9 Type\)
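To see what this fitted equation implies, the sketch below evaluates it for a nestling and an adult at the same air composition. The function name `predicted_vent` and the chosen setting (15% O2, 3% CO2) are assumptions made for illustration; the coefficients are the rounded values reported above.

```python
def predicted_vent(o2, co2, bird_type):
    """Fitted percentage increase in minute ventilation from the
    no-interaction equation above (coefficients as reported)."""
    return 136.8 - 8.83 * o2 + 32.26 * co2 + 9.9 * bird_type

# Hypothetical setting within the design levels: 15% O2 and 3% CO2.
nestling = predicted_vent(15, 3, 0)
adult = predicted_vent(15, 3, 1)
print(f"nestling: {nestling:.2f}, adult: {adult:.2f}, difference: {adult - nestling:.2f}")
```

Because the model has no interaction terms, the estimated adult-minus-nestling difference equals the Type coefficient at every combination of O2 and CO2.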
Let's finally answer our primary research question: "is there any evidence that the adult swallows differ from the nestling swallows in terms of their minute ventilation as a function of oxygen and carbon dioxide?" To answer the question, we need only test the null hypothesis \(H_0 \colon \beta_3 = 0\). And, Minitab reports that the P-value is 0.642. We fail to reject the null hypothesis at any reasonable significance level. There is insufficient evidence to conclude that adult swallows differ from nestling swallows concerning their minute ventilation.
Incidentally, we should have evaluated the model before using it to answer the research question. All is fine, however. The residuals versus fits plot for the model with no interaction terms:
shows a marked improvement over the residuals versus fits plot for the model with the interaction terms. Perhaps there is a little bit of fanning? A little bit, but perhaps not enough to worry about.
And, the normal probability plot:
suggests there is no reason to worry about non-normal error terms.