Lesson 11: Model Building
Overview of this Lesson
For all of the regression analyses that we have performed so far in this course, it has been obvious which of the major predictors we should include in our regression model. Unfortunately, this is typically not the case. More often than not, a researcher has a large set of candidate predictor variables from which to try to identify the most appropriate predictors to include in the regression model.
Of course, the larger the number of candidate predictor variables, the larger the number of possible regression models. For example, if a researcher has (only) 10 candidate predictor variables, there are $2^{10} = 1024$ possible regression models from which to choose. Clearly, some assistance would be needed in evaluating all of the possible regression models. That's where two variable selection methods, stepwise regression and best subsets regression, come in handy.
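The count of possible models follows from each candidate predictor being either in or out of the model, which gives two choices for each of the 10 predictors. The short Python sketch below enumerates the subsets to confirm the count. The predictor names x1 through x10 and the helper all_predictor_subsets are hypothetical, introduced only for illustration, and the count includes the intercept-only model with no predictors.

```python
from itertools import combinations


def all_predictor_subsets(predictors):
    """Yield every subset of the candidate predictors.

    The empty tuple is included; it corresponds to the
    intercept-only model with no predictors.
    """
    for size in range(len(predictors) + 1):
        for subset in combinations(predictors, size):
            yield subset


# Hypothetical predictor names, used only for illustration.
candidates = [f"x{i}" for i in range(1, 11)]

models = list(all_predictor_subsets(candidates))
print(len(models))  # 1024, which equals 2**10
```

Each added candidate predictor doubles the number of possible models, which is why evaluating every model by hand quickly becomes impractical.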
In this lesson, we'll learn about the above two variable selection methods. Our goal throughout will be to choose a small subset of predictors from the larger set of candidate predictors so that the resulting regression model is simple yet useful. That is, as always, our resulting regression model should:
- provide a good summary of the trend in the response,
- provide good predictions of the response, and
- provide good estimates of the slope coefficients.
Note. The data sets in this lesson are not especially large. For the sake of illustration, they have to be small so that the size of the data set does not obscure the pedagogical point being made.
Key Learning Goals for this Lesson: