8.7 - Leaving an Important Interaction Out of a Model

Before we take a look at another example of a regression model containing interaction terms, let's take a little detour to explore the impact of leaving a necessary interaction term out of the model. To do so, let's consider some contrived data for which there is a response y and two predictors: one quantitative predictor x and one qualitative predictor that takes on the values 0 or 1. Looking at a plot of the data:

scatterplot

consider two questions:

  • Does the plot suggest x is related to y? Sure! For the 0 group, as x increases, y decreases, while for the 1 group, as x increases, y increases.
  • Does the plot suggest there is a treatment effect? Yes! If you look at any one particular x value, say 1 for example, the mean response y is about 8 for the 0 group and about 2 for the 1 group. In this sense, there is a treatment effect.

As we now know, the answer to the first question suggests that the effect of x on y depends on the group. That is, group and x appear to interact. Therefore, our regression model should contain an interaction term between the two predictors. But let's see what happens if we ignore our intuition and don't add the interaction term! That is, let's formulate our regression model as:

\[y_i=(\beta_0+\beta_1x_{i1}+\beta_2x_{i2})+\epsilon_i\]

where:

  • yi is the response
  • xi1 is the quantitative predictor you want to "adjust for"
  • xi2 is the qualitative group predictor, where 0 denotes the first group and 1 denotes the second group

and the independent error terms εi follow a normal distribution with mean 0 and equal variance σ².
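Since the contrived data themselves aren't shown here, we can illustrate the setup with simulated data that mimics the described pattern (the intercepts, slopes, and noise level below are invented for illustration). Fitting the no-interaction model by ordinary least squares shows how the two opposing within-group slopes wash out:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.tile(np.linspace(0, 3, 20), 2)   # quantitative predictor x_i1
g = np.repeat([0, 1], 20)               # qualitative group predictor x_i2
# Group 0: y falls with x (true slope -1); group 1: y rises with x (true slope +1)
y = np.where(g == 0, 9 - x, 1 + x) + rng.normal(0, 0.3, size=40)

# Design matrix for the no-interaction model: y = b0 + b1*x + b2*g + error
X = np.column_stack([np.ones_like(x), x, g])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
print(f"common slope b1 = {b1:.3f}")    # near 0: the opposing slopes cancel
```

Because the model forces one common slope on both groups, the estimate lands roughly halfway between the two true slopes, -1 and +1, i.e., near zero.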

Now, let's see what conclusions we draw when we fit the formulated model, with no interaction term, to our contrived data:

minitab output

Consider our two research questions:

  • Is x related to y? The P-value for testing H0: β1 = 0 is 0.831. There is insufficient evidence at the 0.05 level to conclude that x is related to y. What?! This conclusion contradicts what we'd expect from the plot.
  • Is there a treatment effect? The P-value for testing H0: β2 = 0 is 0.125. There is insufficient evidence at the 0.05 level to conclude that there is a treatment effect. Again, this conclusion contradicts what we'd expect from the plot.

A side note: by conducting the above two tests separately, we increase our chance of making at least one Type I error. Since we are interested in answering both research questions, we could reduce that chance by conducting a single partial F-test of H0: β1 = β2 = 0, that is, of whether both parameters are simultaneously zero.
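The mechanics of that partial F-test can be sketched on simulated data of the kind described (numbers invented for illustration): compare the error sum of squares of the intercept-only model with that of the two-predictor model.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.tile(np.linspace(0, 3, 20), 2)
g = np.repeat([0, 1], 20)
y = np.where(g == 0, 9 - x, 1 + x) + rng.normal(0, 0.3, size=40)

def sse(X, y):
    """Residual sum of squares for an OLS fit of y on X."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return np.sum((y - X @ b) ** 2)

n = len(y)
X_full = np.column_stack([np.ones(n), x, g])   # b0 + b1*x + b2*g
X_reduced = np.ones((n, 1))                    # intercept only, i.e., H0 true
sse_full, sse_red = sse(X_full, y), sse(X_reduced, y)

# Partial F: ((SSE_red - SSE_full)/q) / (SSE_full/(n - p)), with q = 2, p = 3
F = ((sse_red - sse_full) / 2) / (sse_full / (n - 3))
print(f"F = {F:.1f}")
```

Comparing F against an F-distribution with 2 and n - 3 degrees of freedom then gives a single P-value for both parameters at once.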

Now, let's try to understand why our conclusions don't agree with our intuition based on the plot. If we plug the values 0 and 1 into the group variable of the estimated regression equation, we obtain two parallel lines, one for each group.
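In symbols, substituting xi2 = 0 and then xi2 = 1 into the estimated no-interaction equation (with b0, b1, b2 denoting the estimates reported in the output) gives:

\[\hat{y}=b_0+b_1x_1 \quad \text{(group 0)}\]

\[\hat{y}=(b_0+b_2)+b_1x_1 \quad \text{(group 1)}\]

Both lines share the same slope b1 and differ only in their intercepts, which is exactly why they must be parallel.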

A plot of the resulting estimated regression functions:

scatterplot with regression lines

suggests that the lines don't fit the data very well. By leaving the interaction term out of the model, we have forced the "best fitting lines" to be parallel, when they clearly shouldn't be. The residuals versus fits plot:

residual plot

provides further evidence that our formulated model does not fit the data well. The cost of this misspecification is conclusions that simply don't make sense.

Let's analyze the data again, but this time with a more appropriately formulated model. Consider the regression model with the interaction term:

\[y_i=(\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\beta_{12}x_{i1}x_{i2})+\epsilon_i\]

where:

  • yi is the response
  • xi1 is the quantitative predictor you want to "adjust for"
  • xi2 is the qualitative group predictor, where 0 denotes the first group and 1 denotes the second group
  • xi1xi2 is the "missing" interaction term

and the independent error terms εi follow a normal distribution with mean 0 and equal variance σ².
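Again using simulated data that mimics the described pattern (details invented for illustration), fitting this model just means adding the product column x1·x2 to the design matrix, which lets each group have its own slope:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.tile(np.linspace(0, 3, 20), 2)
g = np.repeat([0, 1], 20)
y = np.where(g == 0, 9 - x, 1 + x) + rng.normal(0, 0.3, size=40)

# Design matrix with the interaction column x*g appended
X = np.column_stack([np.ones_like(x), x, g, x * g])
b0, b1, b2, b12 = np.linalg.lstsq(X, y, rcond=None)[0]

print(f"group 0 slope: {b1:.2f}")        # near the true -1
print(f"group 1 slope: {b1 + b12:.2f}")  # near the true +1
```

The coefficient b1 is the slope for group 0, and b1 + b12 is the slope for group 1, so b12 directly measures how much the slopes differ.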

Upon fitting the data to the model with an interaction term, the estimated regression equation is:

minitab output

If we now plug the values 0 and 1 into the group variable of the estimated regression equation, we obtain two intersecting lines, one for each group:

scatterplot with regression lines
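In symbols, the two estimated lines are now (with b0, b1, b2, b12 denoting the estimates reported in the output):

\[\hat{y}=b_0+b_1x_1 \quad \text{(group 0)}\]

\[\hat{y}=(b_0+b_2)+(b_1+b_{12})x_1 \quad \text{(group 1)}\]

The slopes b1 and b1 + b12 are free to differ, so the lines can cross.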

Wow, what a difference! Our formulated regression model now allows the slopes of the two lines to be different. As a result, the lines do a much better job of summarizing the trend in the data. The residuals versus fits plot is about as good as it gets!

residual plot

The plot provides further evidence that the model with the interaction term does a good job of describing the data.

Okay, so the model with the interaction term does a better job of describing the data than the model with no interaction term. Does it also provide answers to our research questions that make sense?

Let's first consider the question "does the effect of x on response y depend on the group?" That is, is there an interaction between x and group? The software output:

minitab output

tells us that the P-value for testing H0 : β12 = 0 is < 0.001. There is strong evidence at the 0.05 level to reject the null hypothesis and conclude that there is indeed an interaction between x and group. Aha, our formulated model and resulting analysis yielded a conclusion that makes sense!
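The arithmetic behind such a t-test can be sketched on simulated data of the kind described (numbers invented for illustration): divide the interaction estimate by its standard error, which comes from the usual OLS covariance matrix MSE·(XᵀX)⁻¹.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.tile(np.linspace(0, 3, 20), 2)
g = np.repeat([0, 1], 20)
y = np.where(g == 0, 9 - x, 1 + x) + rng.normal(0, 0.3, size=40)

n = len(y)
X = np.column_stack([np.ones(n), x, g, x * g])
b = np.linalg.lstsq(X, y, rcond=None)[0]

# Estimate sigma^2 from the residuals, then the standard error of b12
resid = y - X @ b
mse = resid @ resid / (n - 4)              # n - p degrees of freedom, p = 4
cov_b = mse * np.linalg.inv(X.T @ X)
t_12 = b[3] / np.sqrt(cov_b[3, 3])         # t-statistic for H0: beta_12 = 0
print(f"t = {t_12:.1f}")
```

A |t| this far out in a t-distribution with n - 4 degrees of freedom corresponds to a tiny P-value, matching the "< 0.001" conclusion above.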

Now, what about the research questions "is x related to y?" and "is there a treatment effect?" Because there is an interaction between x and group, it doesn't make sense to talk about the effect of x on y without taking group into account. Likewise, it doesn't make sense to talk about differences between the two groups without taking x into account. That is, neither of these two research questions makes sense in the presence of the interaction. This is why you'll often hear statisticians say "never interpret a main effect in the presence of an interaction."

In short, the moral of this little detour is that if we leave an important interaction term out of our model, our analysis can lead us to erroneous conclusions.