8.6 - Interaction Effects
Now that we've clarified what additive effects are, let's take a look at an example where including "interaction terms" is appropriate.
Example
Some researchers (Daniel, 1999) were interested in comparing the effectiveness of three treatments for severe depression. For the sake of simplicity, we denote the three treatments A, B, and C. The researchers collected the following data (depression.txt) on a random sample of n = 36 severely depressed individuals:
- yi = measure of the effectiveness of the treatment for individual i
- xi1 = age (in years) of individual i
- xi2 = 1 if individual i received treatment A and 0 if not
- xi3 = 1 if individual i received treatment B and 0 if not
A scatter plot of the data displays treatment effectiveness on the y-axis and age on the x-axis.
The blue circles represent the data for individuals receiving treatment A, the red squares represent the data for individuals receiving treatment B, and the green diamonds represent the data for individuals receiving treatment C.
In the previous example, the two estimated regression functions had the same slope, that is, the lines were parallel. If you tried to draw three best fitting lines through the data of this example, do you think the slopes of your lines would be the same? Probably not! In this case, we need to include what are called "interaction terms" in our formulated regression model.
A (second-order) multiple regression model with interaction terms is:
\[y_i=\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\beta_3x_{i3}+\beta_{12}x_{i1}x_{i2}+\beta_{13}x_{i1}x_{i3}+\epsilon_i\]
where:
- yi = measure of the effectiveness of the treatment for individual i
- xi1 = age (in years) of individual i
- xi2 = 1 if individual i received treatment A and 0 if not
- xi3 = 1 if individual i received treatment B and 0 if not
and the independent error terms εi follow a normal distribution with mean 0 and equal variance σ2. Perhaps not surprisingly, the terms xi1xi2 and xi1xi3 are the interaction terms in the model.
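To make the formulation concrete, here is a minimal sketch of fitting this model in Python with statsmodels. The column names y, age, and TRT for depression.txt are assumptions for illustration; adjust them to match the actual file:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Assumed layout of depression.txt: whitespace-separated columns
# y (treatment effectiveness), age, and TRT (label A, B, or C).
df = pd.read_csv("depression.txt", sep=r"\s+")

# Setting treatment C as the reference level reproduces the indicator
# coding above (x2 = 1 for A, x3 = 1 for B).  The formula age * C(...)
# expands to the main effects plus the age-by-treatment interactions.
fit = smf.ols("y ~ age * C(TRT, Treatment('C'))", data=df).fit()
print(fit.summary())
```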
Let's investigate our formulated model to discover in what way the predictors have an "interaction effect" on the response. We start by determining the formulated regression function for each of the three treatments. In short, after a little bit of algebra (see below), we learn that the model defines three different regression functions, one for each of the three treatments:
| Treatment | Formulated regression function |
| --- | --- |
| If patient receives A (xi2 = 1, xi3 = 0) | \(\mu_Y=(\beta_0+\beta_2)+(\beta_1+\beta_{12})x_{i1}\) |
| If patient receives B (xi2 = 0, xi3 = 1) | \(\mu_Y=(\beta_0+\beta_3)+(\beta_1+\beta_{13})x_{i1}\) |
| If patient receives C (xi2 = 0, xi3 = 0) | \(\mu_Y=\beta_0+\beta_1x_{i1}\) |
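The algebra is just substitution. For patients receiving treatment A, for instance, setting xi2 = 1 and xi3 = 0 in the formulated model gives:

\[\mu_Y=\beta_0+\beta_1x_{i1}+\beta_2(1)+\beta_3(0)+\beta_{12}x_{i1}(1)+\beta_{13}x_{i1}(0)=(\beta_0+\beta_2)+(\beta_1+\beta_{12})x_{i1}\]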
So, in what way does including the interaction terms, xi1xi2 and xi1xi3, in the model imply that the predictors have an "interaction effect" on the mean response? Note that the slopes of the three regression functions differ: the slope of the first line is β1 + β12, the slope of the second line is β1 + β13, and the slope of the third line is β1. What does this mean in a practical sense? It means that...
- the effect of the individual's age (x1) on the treatment's mean effectiveness (μY) depends on the treatment (x2 and x3), and ...
- the effect of treatment (x2 and x3) on the treatment's mean effectiveness (μY) depends on the individual's age (x1).
In general, then, what does it mean for two predictors "to interact"?
- Two predictors interact if the effect on the response variable of one predictor depends on the value of the other.
- A slope parameter can no longer be interpreted as the change in the mean response for each unit increase in the predictor, while the other predictors are held constant.
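To see this formally, differentiate the mean response of our example with respect to x1; the "slope" attached to age is no longer a single constant:

\[\frac{\partial \mu_Y}{\partial x_1}=\beta_1+\beta_{12}x_2+\beta_{13}x_3\]

so the change in μY per unit increase in age depends on the values of the treatment indicators x2 and x3.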
And, what are "interaction effects"?
A regression model contains interaction effects if the response function is not additive and cannot be written as a sum of functions of the predictor variables. That is, a regression model contains interaction effects if:
\[\mu_Y \ne f_1(x_1)+f_2(x_2)+ \cdots +f_{p-1}(x_{p-1})\]
For our example concerning treatment for depression, the mean response:
\[\mu_Y=\beta_0+\beta_1x_{1}+\beta_2x_{2}+\beta_3x_{3}+\beta_{12}x_{1}x_{2}+\beta_{13}x_{1}x_{3}\]
cannot be separated into distinct functions of each of the individual predictors. That is, there is no way of "breaking apart" β12x1x2 and β13x1x3 into distinct pieces. Therefore, we say that x1 and x2 interact, and x1 and x3 interact.
In returning to our example, let's recall that the appropriate steps in any regression analysis are:
- Model building
  - Model formulation
  - Model estimation
  - Model evaluation
- Model use
So far, within the model building step, all we've done is formulate the regression model as:
\[y_i=\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\beta_3x_{i3}+\beta_{12}x_{i1}x_{i2}+\beta_{13}x_{i1}x_{i3}+\epsilon_i\]
After estimating the model, the regression equation is:

\[\hat{y}=6.21+1.03x_{1}+41.3x_{2}+22.7x_{3}-0.70x_{1}x_{2}-0.51x_{1}x_{3}\]
Now, if we plug the possible values for x2 and x3 into the estimated regression function, we obtain the three "best fitting" lines through the data, one for each treatment (A, B, and C). Here's the algebra for determining the estimated regression function for patients receiving treatment A, that is, with x2 = 1 and x3 = 0:

\[\hat{y}=6.21+1.03x_{1}+41.3(1)+22.7(0)-0.70x_{1}(1)-0.51x_{1}(0)=47.5+0.33x_{1}\]
Doing similar algebra for patients receiving treatments B and C, we obtain:
| Treatment | Estimated regression function |
| --- | --- |
| If patient receives A (x2 = 1, x3 = 0) | \(\hat{y}=47.5+0.33x_1\) |
| If patient receives B (x2 = 0, x3 = 1) | \(\hat{y}=28.9+0.52x_1\) |
| If patient receives C (x2 = 0, x3 = 0) | \(\hat{y}=6.21+1.03x_1\) |
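Each of these lines can be recovered directly from the full model's coefficient estimates. A minimal sketch, using the rounded estimates above (small discrepancies with the table are due to rounding):

```python
# Rounded coefficient estimates from the fitted interaction model above.
b0, b1, b2, b3, b12, b13 = 6.21, 1.03, 41.3, 22.7, -0.70, -0.51

# The indicator settings (x2, x3) for each treatment determine the
# intercept b0 + b2*x2 + b3*x3 and the slope b1 + b12*x2 + b13*x3.
for name, (x2, x3) in {"A": (1, 0), "B": (0, 1), "C": (0, 0)}.items():
    intercept = b0 + b2 * x2 + b3 * x3
    slope = b1 + b12 * x2 + b13 * x3
    print(f"Treatment {name}: yhat = {intercept:.2f} + {slope:.2f} * age")
```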
And, plotting the three "best fitting" lines through the data reveals three distinctly nonparallel lines.
What do the estimated slopes tell us?
- For patients in this study receiving treatment A, the effectiveness of the treatment is predicted to increase 0.33 units for every additional year in age.
- For patients in this study receiving treatment B, the effectiveness of the treatment is predicted to increase 0.52 units for every additional year in age.
- For patients in this study receiving treatment C, the effectiveness of the treatment is predicted to increase 1.03 units for every additional year in age.
In short, the effect of age on the predicted treatment effectiveness depends on the treatment given. That is, age appears to interact with treatment in its impact on treatment effectiveness. The interaction is exhibited graphically by the "nonparallelness" (is that a word?) of the lines.
Of course, our primary goal is not to draw conclusions about this particular sample of depressed individuals, but rather about the entire population of depressed individuals. That is, we want to use our estimated model to draw conclusions about the larger population of depressed individuals. Before we do so, however, we first should evaluate the model.
The residuals versus fits plot exhibits all of the "good" behavior, suggesting that the model fits well, there are no obvious outliers, and the error variances are indeed constant. And the normal probability plot exhibits a linear trend and a large P-value, suggesting that the error terms are indeed normally distributed.
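These checks are straightforward to reproduce; a minimal sketch, continuing from the hypothetical statsmodels fit above:

```python
import matplotlib.pyplot as plt
from scipy import stats

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Residuals versus fitted values: look for random scatter about zero.
ax1.scatter(fit.fittedvalues, fit.resid)
ax1.axhline(0, linestyle="--")
ax1.set_xlabel("Fitted value")
ax1.set_ylabel("Residual")
ax1.set_title("Residuals versus fits")

# Normal probability plot of the residuals: look for a linear trend.
stats.probplot(fit.resid, dist="norm", plot=ax2)
ax2.set_title("Normal probability plot")

plt.tight_layout()
plt.show()
```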
Having successfully built (formulated, estimated, and evaluated) a model, we now can use the model to answer our research questions. Let's consider two different questions that we might want answered.
First research question. For every age, is there a difference in the mean effectiveness for the three treatments? As is usually the case, our formulated regression model helps determine how to answer the research question. Our formulated regression model suggests that answering the question involves testing whether the population regression functions are identical.
That is, we need to test the null hypothesis H0 : β2 = β3 = β12 = β13 = 0 against the alternative HA : at least one of these slope parameters is not 0.
We know how to do that! The relevant software output tells us that the appropriate partial F-statistic for testing the above hypothesis is:
\[F=\frac{(803.8+1.19+375+328.42)/4}{15.4}=24.49.\]
To find the P-value, we compare the test statistic to an F distribution with 4 numerator and 30 denominator degrees of freedom. The probability of observing an F-statistic less than our observed test statistic, 24.49, is > 0.999, so our P-value is < 0.001. We can reject our null hypothesis. There is sufficient evidence at the α = 0.05 level to conclude that there is a significant difference in the mean effectiveness for the three treatments.
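The arithmetic and the tail probability are easy to verify; a short check using the sequential sums of squares quoted above:

```python
from scipy import stats

# Sequential sums of squares for x2, x3, x1*x2, x1*x3 (with x1 entered
# first), pooled over their 4 degrees of freedom, divided by MSE = 15.4.
ssr = 803.8 + 1.19 + 375 + 328.42
F = (ssr / 4) / 15.4
p_value = stats.f.sf(F, 4, 30)  # P(F with 4, 30 df > observed F)
print(f"F = {F:.2f}, P-value = {p_value:.3g}")  # F = 24.49, P << 0.001
```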
Second research question. Does the effect of age on the treatment's effectiveness depend on treatment? Our formulated regression model suggests that answering the question involves testing whether the two interaction parameters β12 and β13 are significant. That is, we need to test the null hypothesis H0 : β12 = β13 = 0 against the alternative HA : at least one of the interaction parameters is not 0.
The relevant software output tells us that the appropriate partial F-statistic for testing the above hypothesis is:
\[F=\frac{(375+328.42)/2}{15.4}=22.84.\]
To find the P-value, we compare the test statistic to an F distribution with 2 numerator and 30 denominator degrees of freedom. The probability of observing an F-statistic less than our observed test statistic, 22.84, is > 0.999, so our P-value is < 0.001. We can reject our null hypothesis. There is sufficient evidence at the α = 0.05 level to conclude that the effect of age on the treatment's effectiveness depends on the treatment.
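Equivalently, each partial F-test can be run as a reduced-versus-full model comparison; a sketch with statsmodels, reusing the assumed column names from the earlier fit:

```python
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Full model with interactions versus the additive (reduced) model.
full = smf.ols("y ~ age * C(TRT, Treatment('C'))", data=df).fit()
reduced = smf.ols("y ~ age + C(TRT, Treatment('C'))", data=df).fit()

# anova_lm reports the partial F-statistic and P-value for dropping
# the two interaction terms; it should reproduce F = 22.84 here.
print(sm.stats.anova_lm(reduced, full))
```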