Lesson 5: Multiple Linear Regression

Overview

In this lesson, we make our first (and last?!) major jump in the course. We move from the simple linear regression model with one predictor to the multiple linear regression model with two or more predictors. That is, we use the adjective "simple" to denote that our model has only one predictor, and we use the adjective "multiple" to indicate that our model has at least two predictors.

In the multiple regression setting, because of the potentially large number of predictors, it is more efficient to use matrices to define the regression model and the subsequent analyses. This lesson considers some of the more important multiple regression formulas in matrix form. If you're unsure about any of this, it may be a good time to take a look at this Matrix Algebra Review.
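To make the matrix formulation concrete, here is a minimal sketch of how the least-squares estimates can be obtained by solving the normal equations \((\mathbf{X}^{T}\mathbf{X})\,\mathbf{b} = \mathbf{X}^{T}\mathbf{y}\), where \(\mathbf{X}\) is the design matrix with a leading column of ones. The helper names (`transpose`, `matmul`, `solve`, `fit_ols`) and the small data set are assumptions made up for illustration; they do not come from the lesson's data files. The sketch uses only the Python standard library, whereas in practice a numerical library or statistical software would normally do this work.

```python
# Sketch: least-squares estimates from the normal equations (X'X) b = X'y.
# All names and data here are illustrative assumptions, not part of the lesson.

def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve(a, rhs):
    """Solve a x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_ols(predictors, y):
    """Return [b0, b1, ..., bk] for the rows of predictor values and responses y."""
    X = [[1.0] + list(row) for row in predictors]  # column of ones for the intercept
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(x * yi for x, yi in zip(col, y)) for col in Xt]
    return solve(XtX, Xty)

# Made-up data generated without error from y = 1 + 2*x1 + 3*x2.
predictors = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in predictors]

b = fit_ols(predictors, y)
print([round(v, 6) for v in b])
```

Because the illustrative responses contain no error term, the estimates should reproduce the generating coefficients up to floating-point rounding. With real data the same calculation gives the least-squares fit, and the residuals will not be zero.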

The good news!

The good news is that everything you learned about the simple linear regression model extends — with at most minor modifications — to the multiple linear regression model. Think about it — you don't have to forget all of that good stuff you learned! In particular:

  • The models have similar "LINE" assumptions. The only real difference is that whereas in simple linear regression we think of the distribution of errors at a fixed value of the single predictor, with multiple linear regression we have to think of the distribution of errors at a fixed set of values for all the predictors. All of the model-checking procedures we learned earlier are useful in the multiple linear regression framework, although the process becomes more involved since we now have multiple predictors. We'll explore this issue further in Lesson 7.
  • The use and interpretation of \(R^2\) in the context of multiple linear regression remains the same. However, with multiple linear regression, we can also make use of an "adjusted" \(R^2\) value, which is useful for model-building purposes. We'll explore this measure further in Lesson 10.
  • With a minor generalization of the degrees of freedom, we use t-tests and t-intervals for the regression slope coefficients to assess whether a predictor is significantly linearly related to the response, after controlling for the effects of all the other predictors in the model.
  • With a minor generalization of the degrees of freedom, we use prediction intervals for predicting an individual response and confidence intervals for estimating the mean response. We'll explore these further in Lesson 7.
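As a rough illustration of the points above, the following sketch computes \(R^2\), adjusted \(R^2\), and the error degrees of freedom from a set of observed and fitted values. It assumes the standard definitions \(R^2 = 1 - SSE/SSTO\) and adjusted \(R^2 = 1 - \frac{n-1}{n-p}(1 - R^2)\), where \(p\) counts all regression coefficients including the intercept, so the t-tests and t-intervals for the slopes use \(n - p\) degrees of freedom. The function names and the numbers are hypothetical, chosen only so the arithmetic is easy to follow.

```python
# Sketch: R-squared, adjusted R-squared, and error degrees of freedom.
# Function names and data are illustrative assumptions, not from the lesson.

def r_squared(y, fitted):
    """R^2 = 1 - SSE / SSTO."""
    mean_y = sum(y) / len(y)
    ssto = sum((yi - mean_y) ** 2 for yi in y)              # total sum of squares
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # error sum of squares
    return 1 - sse / ssto

def adjusted_r_squared(y, fitted, p):
    """Adjusted R^2, where p counts the coefficients including the intercept."""
    n = len(y)
    return 1 - (n - 1) / (n - p) * (1 - r_squared(y, fitted))

def error_df(n, p):
    """Degrees of freedom used by the t-tests and t-intervals for the slopes."""
    return n - p

# Made-up responses and fitted values from a model with two predictors (p = 3).
y = [1.0, 2.0, 3.0, 4.0, 5.0]
fitted = [1.2, 1.8, 3.1, 3.9, 5.0]

r2 = r_squared(y, fitted)
adj_r2 = adjusted_r_squared(y, fitted, p=3)
df = error_df(len(y), p=3)
print(round(r2, 4), round(adj_r2, 4), df)
```

With simple linear regression \(p = 2\), which gives the familiar \(n - 2\) degrees of freedom; adding predictors raises \(p\), and the adjusted \(R^2\) penalizes the fit accordingly.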

Objectives

Upon completion of this lesson, you should be able to:

  • Know how to calculate a confidence interval for a single slope parameter in the multiple regression setting.
  • Be able to interpret the coefficients of a multiple regression model.
  • Understand what the scope of the model is in the multiple regression model.
  • Understand the calculation and interpretation of \(R^2\) in a multiple regression setting.
  • Understand the calculation and use of adjusted \(R^2\) in a multiple regression setting.

Lesson 5 Code Files

Below is a zip file that contains all the data sets used in this lesson:

STAT501_Lesson05.zip

  • babybirds.txt
  • bodyfat.txt
  • hospital_infct.txt
  • fev_dat.txt
  • iqsize.txt
  • pastry.txt
  • soapsuds.txt
  • stat_females.txt