Lesson 5: Multiple Linear Regression (MLR) Model & Evaluation

Overview of this Lesson

In this lesson, we make our first (and last?!) major jump in the course. We move from the simple linear regression model with one predictor to the multiple linear regression model with two or more predictors. That is, we use the adjective "simple" to denote that our model has only one predictor, and we use the adjective "multiple" to indicate that our model has at least two predictors.

In the multiple regression setting, because of the potentially large number of predictors, it is more efficient to use matrices to define the regression model and the subsequent analyses. This lesson considers some of the more important multiple regression formulas in matrix form. If you're unsure about any of this, it may be a good time to take a look at this Matrix Algebra Review.
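
For reference, in matrix form the model with n observations and k predictors uses an n × 1 response vector Y, an n × (k+1) design matrix X (a column of ones followed by the predictor columns), a (k+1) × 1 coefficient vector β, and an n × 1 error vector ε. The model and its least squares estimate then take the standard forms:

\[
\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},
\qquad
\mathbf{b} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{Y}.
\]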

The good news is that everything you learned about the simple linear regression model extends — with at most minor modification — to the multiple linear regression model. Think about it — you don't have to forget all of that good stuff you learned! In particular:

  • The models have similar "LINE" assumptions. The only real difference is that whereas in simple linear regression we think of the distribution of errors at a fixed value of the single predictor, with multiple linear regression we have to think of the distribution of errors at a fixed set of values for all the predictors. All of the model checking procedures we learned earlier are useful in the multiple linear regression framework, although the process becomes more involved since we now have multiple predictors. We'll explore this issue further in Lesson 6.
  • The use and interpretation of r² (which we'll denote R² in the context of multiple linear regression) remains the same. However, with multiple linear regression we can also make use of an "adjusted" R² value, which is useful for model building purposes. We'll explore this measure further in Lesson 11. (Both measures appear in the software sketch after this list.)
  • With a minor generalization of the degrees of freedom, we use t-tests and t-intervals for the regression slope coefficients to assess whether a predictor is significantly linearly related to the response, after controlling for the effects of all the other predictors in the model.
  • With a minor generalization of the degrees of freedom, we use confidence intervals for estimating the mean response and prediction intervals for predicting an individual response. We'll explore these further in Lesson 6.
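
To make these extensions concrete, here is a minimal sketch in Python using statsmodels, fit to simulated data; the predictor names x1 and x2 and all numeric values are illustrative assumptions, not from this lesson:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    # Simulate a small data set with two predictors (hypothetical values)
    rng = np.random.default_rng(42)
    n = 50
    df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
    df["y"] = 1 + 2 * df["x1"] - 0.5 * df["x2"] + rng.normal(size=n)

    # Fit y on x1 and x2, including an intercept column
    X = sm.add_constant(df[["x1", "x2"]])
    fit = sm.OLS(df["y"], X).fit()

    print(fit.summary())    # slope t-tests, R-squared, adjusted R-squared
    print(fit.conf_int())   # confidence intervals for the coefficients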

For the simple linear regression model, there is only one slope parameter about which one can perform hypothesis tests. For the multiple linear regression model, there are three different hypothesis tests for slopes that one could conduct. They are:

  • a hypothesis test for testing that one slope parameter is 0
  • a hypothesis test for testing that all of the slope parameters are 0
  • a hypothesis test for testing that a subset — more than one, but not all — of the slope parameters are 0

In this lesson, we also learn how to perform each of the above three hypothesis tests.
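
As a preview, the sketch below shows one way each of the three tests might be carried out in Python with statsmodels; the three-predictor model and the simulated data are assumptions for illustration only:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    # Simulated data with three predictors; only x1 truly matters here
    rng = np.random.default_rng(1)
    n = 60
    df = pd.DataFrame(rng.normal(size=(n, 3)), columns=["x1", "x2", "x3"])
    df["y"] = 2 + 1.5 * df["x1"] + rng.normal(size=n)

    full = smf.ols("y ~ x1 + x2 + x3", data=df).fit()

    # (1) One slope is 0: the t-test for each coefficient in the summary table
    print(full.summary())

    # (2) All slopes are 0: the overall F-test reported with the fitted model
    print(full.fvalue, full.f_pvalue)

    # (3) A subset of slopes is 0: a general linear F-test comparing the
    #     reduced model (x2 and x3 dropped) against the full model
    reduced = smf.ols("y ~ x1", data=df).fit()
    print(anova_lm(reduced, full))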

Key Learning Goals for this Lesson:
  • Be able to interpret the coefficients of a multiple regression model.
  • Understand the scope of the model in the multiple regression setting.
  • Understand the calculation and interpretation of R² in a multiple regression setting.
  • Understand the calculation and use of adjusted R² in a multiple regression setting.
  • Translate research questions involving slope parameters into the appropriate hypotheses for testing.
  • Know how to calculate a confidence interval for a single slope parameter in the multiple regression setting.
  • Understand the general idea behind the general linear F-test (its test statistic is written out after this list).
  • Understand the decomposition of a regression sum of squares into a sum of sequential sums of squares.
  • Calculate a sequential sum of squares using either of the two definitions.
  • Know how to obtain a two (or more)-degree-of-freedom sequential sum of squares.
  • Perform a general hypothesis test using the general linear F-test and relevant statistical software output.
  • Know how to specify the null and alternative hypotheses and be able to draw a conclusion given appropriate software output for the overall F-test for H0: β1 = ... = βk = 0.
  • Know how to specify the null and alternative hypotheses and be able to draw a conclusion given appropriate software output for the general linear F-test for any subset of the slope parameters.
  • Know how to specify the null and alternative hypotheses and be able to draw a conclusion given appropriate software output for the t-test or general linear F-test for H0: βp = 0.
  • Understand that the t-test for a slope parameter tests the marginal significance of the predictor after adjusting for the other predictors in the model (as can be justified by the equivalence of the t-test and the corresponding general linear F-test for one slope).
  • Calculate and understand partial R² (also written out after this list).
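
For reference, the general linear F-statistic and partial R² mentioned in the goals above take the following standard forms, where SSE(R) and SSE(F) denote the error sums of squares of the reduced and full models, with df_R and df_F error degrees of freedom:

\[
F^* = \frac{\bigl(\mathrm{SSE}(R) - \mathrm{SSE}(F)\bigr)/(df_R - df_F)}{\mathrm{SSE}(F)/df_F},
\qquad
R^2_{\text{partial}} = \frac{\mathrm{SSE}(R) - \mathrm{SSE}(F)}{\mathrm{SSE}(R)}.
\]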