Lesson 12: Multicollinearity & Other Regression Pitfalls

Overview

So far, in our study of multiple regression models, we have ignored something that we probably shouldn't have: multicollinearity. We're going to correct our blissful ignorance in this lesson.

Multicollinearity exists when two or more of the predictors in a regression model are moderately or highly correlated. Unfortunately, when it exists, it can wreak havoc on our analysis and thereby limit the research conclusions we can draw. As we will soon learn, when multicollinearity exists, any of the following pitfalls can be exacerbated (the short simulation sketch after this list illustrates the first two):

  • the estimated regression coefficient of any one variable depends on which other predictors are included in the model
  • the precision of the estimated regression coefficients decreases as more predictors are added to the model
  • the marginal contribution of any one predictor variable in reducing the error sum of squares depends on which other predictors are already in the model
  • hypothesis tests for \(\beta_k = 0\) may yield different conclusions depending on which predictors are in the model

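To make the first two pitfalls concrete, here is a minimal simulation sketch in Python (using numpy and statsmodels; the simulated data and variable names are our own illustration, not one of the lesson's data sets). It generates two highly correlated predictors and compares the estimated coefficient of x1, and its standard error, with and without the second predictor in the model:

    # Simulation sketch: how a highly correlated second predictor changes
    # the estimate and standard error of the first predictor's coefficient.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(12)
    n = 100
    x1 = rng.normal(size=n)
    x2 = 0.95 * x1 + 0.05 * rng.normal(size=n)  # x2 nearly collinear with x1
    y = 3 + 2 * x1 + 1 * x2 + rng.normal(size=n)

    # Fit y on x1 alone, then on x1 and x2 together
    fit1 = sm.OLS(y, sm.add_constant(x1)).fit()
    fit12 = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

    print("x1 alone:   b1 = %.3f, se = %.3f" % (fit1.params[1], fit1.bse[1]))
    print("x1 with x2: b1 = %.3f, se = %.3f" % (fit12.params[1], fit12.bse[1]))

Because the correlation between x1 and x2 is close to 1, the standard error of b1 in the two-predictor fit is typically many times larger than in the one-predictor fit, and the estimate itself shifts noticeably between the two models.
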
In this lesson, we'll take a look at an example or two that illustrates each of the above outcomes. Then, we'll spend some time learning not only how to detect multicollinearity but also how to reduce it once we've found it.

We'll also consider other regression pitfalls, including extrapolation, nonconstant variance, autocorrelation, overfitting, excluding important predictor variables, missing data, power, and sample size.

Objectives

Upon completion of this lesson, you should be able to:

  • Distinguish between structural multicollinearity and data-based multicollinearity.
  • Know what multicollinearity means.
  • Understand the effects of multicollinearity on various aspects of regression analyses.
  • Understand the effects of uncorrelated predictors on various aspects of regression analyses.
  • Understand variance inflation factors, and how to use them to help detect multicollinearity (see the sketch following this list).
  • Know the two ways of reducing data-based multicollinearity.
  • Understand how centering the predictors in a polynomial regression model helps to reduce structural multicollinearity (also illustrated in the sketch following this list).
  • Know the main issues surrounding other regression pitfalls, including extrapolation, nonconstant variance, autocorrelation, overfitting, excluding important predictor variables, missing data, power, and sample size.

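As a preview of the variance inflation factor and centering objectives above, the sketch below (again Python with numpy and statsmodels; the simulated predictor is our own assumption, not one of the lesson's data sets) computes VIFs for x and x² in a polynomial model, before and after centering x:

    # VIF sketch: centering x reduces the structural multicollinearity
    # between x and x^2 in a polynomial regression design.
    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(0)
    x = rng.uniform(10, 20, size=50)  # strictly positive, so x and x^2 are nearly collinear

    X_raw = sm.add_constant(np.column_stack([x, x ** 2]))    # intercept, x, x^2
    xc = x - x.mean()
    X_ctr = sm.add_constant(np.column_stack([xc, xc ** 2]))  # intercept, (x - xbar), (x - xbar)^2

    for name, X in [("uncentered", X_raw), ("centered", X_ctr)]:
        vifs = [variance_inflation_factor(X, j) for j in range(1, X.shape[1])]
        print(name, "VIFs:", np.round(vifs, 2))

With the uncentered design the VIFs are very large; after centering they fall to near 1, which is the idea behind using centered predictors in polynomial regression.
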
Lesson 12 Code Files

Below is a zip file that contains all the data sets used in this lesson:

STAT501_Lesson12.zip

  • allentest.txt
  • allentestn23.txt
  • bloodpress.txt
  • cement.txt
  • exerimmun.txt
  • poverty.txt
  • uncorrelated.txt
  • uncorrpreds.txt
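
If you want to work with these files in software, the snippet below sketches one way to read them in Python with pandas (the whitespace delimiter and header row are assumptions about the file format; inspect a file in a text editor to confirm):

    # Load one of the lesson's data files; delimiter/header are assumptions.
    import pandas as pd

    bloodpress = pd.read_csv("bloodpress.txt", sep=r"\s+")
    print(bloodpress.head())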