8.6 - Interaction Effects
Now that we've clarified what additive effects are, let's take a look at an example where including "interaction terms" is appropriate.
Example
Some researchers (Daniel, 1999) were interested in comparing the effectiveness of three treatments for severe depression. For the sake of simplicity, we denote the three treatments A, B, and C. The researchers collected the following data (depression.txt) on a random sample of n = 36 severely depressed individuals:
- yi = measure of the effectiveness of the treatment for individual i
- xi1 = age (in years) of individual i
- xi2 = 1 if individual i received treatment A and 0 if not
- xi3 = 1 if individual i received treatment B and 0 if not
A scatter plot of the data displays treatment effectiveness on the y-axis and age on the x-axis.
The blue circles represent the data for individuals receiving treatment A, the red squares represent the data for individuals receiving treatment B, and the green diamonds represent the data for individuals receiving treatment C.
In the previous example, the two estimated regression functions had the same slope, that is, the lines were parallel. If you tried to draw three best fitting lines through the data of this example, do you think the slopes of your lines would be the same? Probably not! In this case, we need to include what are called "interaction terms" in our formulated regression model.
A (second-order) multiple regression model with interaction terms is:
\[y_i=\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\beta_3x_{i3}+\beta_{12}x_{i1}x_{i2}+\beta_{13}x_{i1}x_{i3}+\epsilon_i\]
where:
- yi = measure of the effectiveness of the treatment for individual i
- xi1 = age (in years) of individual i
- xi2 = 1 if individual i received treatment A and 0 if not
- xi3 = 1 if individual i received treatment B and 0 if not
and the independent error terms εi follow a normal distribution with mean 0 and equal variance σ2. Perhaps not surprisingly, the terms xi1xi2 and xi1xi3 are the interaction terms in the model.
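To make the formulation concrete, here is a minimal sketch of fitting this model in Python with statsmodels. The column names y, age, and TRT for depression.txt are assumptions for illustration; adjust them to match the actual file:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Assumed layout of depression.txt: whitespace-separated columns
# y (treatment effectiveness), age, and TRT (label A, B, or C).
df = pd.read_csv("depression.txt", sep=r"\s+")

# Setting treatment C as the reference level reproduces the indicator
# coding above (x2 = 1 for A, x3 = 1 for B).  The formula age * C(...)
# expands to the main effects plus the age-by-treatment interactions.
fit = smf.ols("y ~ age * C(TRT, Treatment('C'))", data=df).fit()
print(fit.summary())
```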
Let's investigate our formulated model to discover in what way the predictors have an "interaction effect" on the response. We start by determining the formulated regression function for each of the three treatments. In short, after a little bit of algebra (see below), we learn that the model defines three different regression functions, one for each of the three treatments:
| Treatment | Formulated regression function |
| --- | --- |
| If patient receives A (xi2 = 1, xi3 = 0) | \(\mu_Y=(\beta_0+\beta_2)+(\beta_1+\beta_{12})x_{i1}\) |
| If patient receives B (xi2 = 0, xi3 = 1) | \(\mu_Y=(\beta_0+\beta_3)+(\beta_1+\beta_{13})x_{i1}\) |
| If patient receives C (xi2 = 0, xi3 = 0) | \(\mu_Y=\beta_0+\beta_1x_{i1}\) |
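The algebra is just substitution. For patients receiving treatment A, for instance, setting xi2 = 1 and xi3 = 0 in the formulated model gives:

\[\mu_Y=\beta_0+\beta_1x_{i1}+\beta_2(1)+\beta_3(0)+\beta_{12}x_{i1}(1)+\beta_{13}x_{i1}(0)=(\beta_0+\beta_2)+(\beta_1+\beta_{12})x_{i1}\]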
So, in what way does including the interaction terms, xi1xi2 and xi1xi3, in the model imply that the predictors have an "interaction effect" on the mean response? Note that the slopes of the three regression functions differ: the slope of the first line is β1 + β12, the slope of the second line is β1 + β13, and the slope of the third line is β1. What does this mean in a practical sense? It means that...
- the effect of the individual's age (x1) on the treatment's mean effectiveness (μY) depends on the treatment (x2 and x3), and ...
- the effect of treatment (x2 and x3) on the treatment's mean effectiveness (μY) depends on the individual's age (x1).
In general, then, what does it mean for two predictors "to interact"?
- Two predictors interact if the effect on the response variable of one predictor depends on the value of the other.
- A slope parameter can no longer be interpreted as the change in the mean response for each unit increase in the predictor, while the other predictors are held constant.
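To see this formally, differentiate the mean response of our example with respect to x1; the "slope" attached to age is no longer a single constant:

\[\frac{\partial \mu_Y}{\partial x_1}=\beta_1+\beta_{12}x_2+\beta_{13}x_3\]

so the change in μY per unit increase in age depends on the values of the treatment indicators x2 and x3.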
And, what are "interaction effects"?
A regression model contains interaction effects if the response function is not additive and cannot be written as a sum of functions of the predictor variables. That is, a regression model contains interaction effects if:
\[\mu_Y \ne f_1(x_1)+f_2(x_2)+ \cdots +f_{p-1}(x_{p-1})\]
For our example concerning treatment for depression, the mean response:
\[\mu_Y=\beta_0+\beta_1x_{1}+\beta_2x_{2}+\beta_3x_{3}+\beta_{12}x_{1}x_{2}+\beta_{13}x_{1}x_{3}\]
cannot be separated into distinct functions of each of the individual predictors. That is, there is no way of "breaking apart" β12x1x2 and β13x1x3 into distinct pieces. Therefore, we say that x1 and x2 interact, and x1 and x3 interact.
In returning to our example, let's recall that the appropriate steps in any regression analysis are:
- Model building
  - Model formulation
  - Model estimation
  - Model evaluation
- Model use
So far, within the model building step, all we've done is formulate the regression model as:
\[y_i=\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\beta_3x_{i3}+\beta_{12}x_{i1}x_{i2}+\beta_{13}x_{i1}x_{i3}+\epsilon_i\]
After estimating the model, the regression equation is:

\[\hat{y}=6.21+1.03x_{1}+41.3x_{2}+22.7x_{3}-0.70x_{1}x_{2}-0.51x_{1}x_{3}\]
Now, if we plug the possible values for x2 and x3 into the estimated regression function, we obtain the three "best fitting" lines through the data, one for each treatment (A, B, and C). Here's the algebra for determining the estimated regression function for patients receiving treatment A, that is, with x2 = 1 and x3 = 0:

\[\hat{y}=6.21+1.03x_{1}+41.3(1)+22.7(0)-0.70x_{1}(1)-0.51x_{1}(0)=47.5+0.33x_{1}\]
Doing similar algebra for patients receiving treatments B and C, we obtain:
| Treatment | Estimated regression function |
| --- | --- |
| If patient receives A (x2 = 1, x3 = 0) | \(\hat{y}=47.5+0.33x_1\) |
| If patient receives B (x2 = 0, x3 = 1) | \(\hat{y}=28.9+0.52x_1\) |
| If patient receives C (x2 = 0, x3 = 0) | \(\hat{y}=6.21+1.03x_1\) |
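Each of these lines can be recovered directly from the full model's coefficient estimates. A minimal sketch, using the rounded estimates above (small discrepancies with the table are due to rounding):

```python
# Rounded coefficient estimates from the fitted interaction model above.
b0, b1, b2, b3, b12, b13 = 6.21, 1.03, 41.3, 22.7, -0.70, -0.51

# The indicator settings (x2, x3) for each treatment determine the
# intercept b0 + b2*x2 + b3*x3 and the slope b1 + b12*x2 + b13*x3.
for name, (x2, x3) in {"A": (1, 0), "B": (0, 1), "C": (0, 0)}.items():
    intercept = b0 + b2 * x2 + b3 * x3
    slope = b1 + b12 * x2 + b13 * x3
    print(f"Treatment {name}: yhat = {intercept:.2f} + {slope:.2f} * age")
```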
And, plotting the three "best fitting" lines through the data reveals three distinctly nonparallel lines.
What do the estimated slopes tell us?
- For patients in this study receiving treatment A, the effectiveness of the treatment is predicted to increase 0.33 units for every additional year in age.
- For patients in this study receiving treatment B, the effectiveness of the treatment is predicted to increase 0.52 units for every additional year in age.
- For patients in this study receiving treatment C, the effectiveness of the treatment is predicted to increase 1.03 units for every additional year in age.
In short, the effect of age on the predicted treatment effectiveness depends on the treatment given. That is, age appears to interact with treatment in its impact on treatment effectiveness. The interaction is exhibited graphically by the "nonparallelness" (is that a word?) of the lines.
Of course, our primary goal is not to draw conclusions about this particular sample of depressed individuals, but rather about the entire population of depressed individuals. That is, we want to use our estimated model to draw conclusions about the larger population of depressed individuals. Before we do so, however, we first should evaluate the model.
The residuals versus fits plot exhibits all of the "good" behavior, suggesting that the model fits well, there are no obvious outliers, and the error variances are indeed constant. And the normal probability plot exhibits a linear trend and a large P-value, suggesting that the error terms are indeed normally distributed.
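These checks are straightforward to reproduce; a minimal sketch, continuing from the hypothetical statsmodels fit above:

```python
import matplotlib.pyplot as plt
from scipy import stats

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Residuals versus fitted values: look for random scatter about zero.
ax1.scatter(fit.fittedvalues, fit.resid)
ax1.axhline(0, linestyle="--")
ax1.set_xlabel("Fitted value")
ax1.set_ylabel("Residual")
ax1.set_title("Residuals versus fits")

# Normal probability plot of the residuals: look for a linear trend.
stats.probplot(fit.resid, dist="norm", plot=ax2)
ax2.set_title("Normal probability plot")

plt.tight_layout()
plt.show()
```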
Having successfully built (formulated, estimated, and evaluated) a model, we now can use the model to answer our research questions. Let's consider two different questions that we might want answered.
First research question. For every age, is there a difference in the mean effectiveness for the three treatments? As is usually the case, our formulated regression model helps determine how to answer the research question. Our formulated regression model suggests that answering the question involves testing whether the population regression functions are identical.
That is, we need to test the null hypothesis H0 : β2 = β3 = β12 = β13 = 0 against the alternative HA : at least one of these slope parameters is not 0.
We know how to do that! The relevant software output tells us that the appropriate partial F-statistic for testing the above hypothesis is:
\[F=\frac{(803.8+1.19+375+328.42)/4}{15.4}=24.49.\]
To find the P-value, we compare the test statistic to an F distribution with 4 numerator and 30 denominator degrees of freedom. The probability of observing an F-statistic less than our observed test statistic, 24.49, is > 0.999, so our P-value is < 0.001. We can reject our null hypothesis. There is sufficient evidence at the α = 0.05 level to conclude that there is a significant difference in the mean effectiveness for the three treatments.
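The arithmetic and the tail probability are easy to verify; a short check using the sequential sums of squares quoted above:

```python
from scipy import stats

# Sequential sums of squares for x2, x3, x1*x2, x1*x3 (with x1 entered
# first), pooled over their 4 degrees of freedom, divided by MSE = 15.4.
ssr = 803.8 + 1.19 + 375 + 328.42
F = (ssr / 4) / 15.4
p_value = stats.f.sf(F, 4, 30)  # P(F with 4, 30 df > observed F)
print(f"F = {F:.2f}, P-value = {p_value:.3g}")  # F = 24.49, P << 0.001
```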
Second research question. Does the effect of age on the treatment's effectiveness depend on treatment? Our formulated regression model suggests that answering the question involves testing whether the two interaction parameters β12 and β13 are significant. That is, we need to test the null hypothesis H0 : β12 = β13 = 0 against the alternative HA : at least one of the interaction parameters is not 0.
The relevant software output tells us that the appropriate partial F-statistic for testing the above hypothesis is:
\[F=\frac{(375+328.42)/2}{15.4}=22.84.\]
To find the P-value, we compare the test statistic to an F distribution with 2 numerator and 30 denominator degrees of freedom. The probability of observing an F-statistic less than our observed test statistic, 22.84, is > 0.999, so our P-value is < 0.001. We can reject our null hypothesis. There is sufficient evidence at the α = 0.05 level to conclude that the effect of age on the treatment's effectiveness depends on the treatment.
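Equivalently, each partial F-test can be run as a reduced-versus-full model comparison; a sketch with statsmodels, reusing the assumed column names from the earlier fit:

```python
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Full model with interactions versus the additive (reduced) model.
full = smf.ols("y ~ age * C(TRT, Treatment('C'))", data=df).fit()
reduced = smf.ols("y ~ age + C(TRT, Treatment('C'))", data=df).fit()

# anova_lm reports the partial F-statistic and P-value for dropping
# the two interaction terms; it should reproduce F = 22.84 here.
print(sm.stats.anova_lm(reduced, full))
```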