Lesson 5: Regression Shrinkage Methods
Introduction
Key Learning Goals for this Lesson:
Textbook reading: Consult Course Schedule
Prediction:
- Linear regression: $E(Y \mid X) = X\beta$;
- Or for a more general regression function: $E(Y \mid X) = f(X)$;
- In a prediction context, there is less concern about the values of the individual components on the right-hand side; interest lies instead in their total contribution to the predicted value.
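A minimal R sketch of this point, using simulated data (the data, variable names, and coefficients are illustrative only, not from the lesson): for prediction, the quantity of interest is the fitted value $X\hat{\beta}$ as a whole rather than the individual coefficient estimates.

```r
## Simulated example: in prediction we care about the fitted value X %*% beta-hat
## as a whole, not about interpreting each coefficient separately.
set.seed(1)
n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 2 * x1 - x2 + rnorm(n)        # true model: E(Y | X) = X beta
dat <- data.frame(y, x1, x2)

fit  <- lm(y ~ x1 + x2, data = dat)     # estimate beta by least squares
newX <- data.frame(x1 = 0.5, x2 = -1)   # a new observation
predict(fit, newdata = newX)            # predicted E(Y | X = newX)
```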
Variable Selection:
- The driving force behind variable selection:
  - the desire for a parsimonious regression model (one that is simpler and easier to interpret);
  - the need for greater accuracy in prediction.
- The notion of what makes a variable "important" is still not well understood, but one interpretation (Breiman, 2001) is that a variable is important if dropping it seriously degrades prediction accuracy (illustrated in the sketch after this list).
- Selecting variables in regression models is a complicated problem, and there are many conflicting views on which selection criterion is best, e.g., the likelihood ratio test (LRT), F-test, AIC, or BIC.
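A hedged sketch of the Breiman-style notion of importance, on simulated data (all variable names, coefficients, and the train/test split are illustrative): refit the model with each predictor dropped and compare held-out prediction error; a large increase in error when a variable is removed marks it as important.

```r
## Drop-one importance via held-out prediction error (simulated data).
set.seed(2)
n  <- 200
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 1 + 2 * x1 + 0.1 * x3 + rnorm(n)   # x2 irrelevant, x3 only weakly relevant
dat   <- data.frame(y, x1, x2, x3)
train <- dat[1:150, ]
test  <- dat[151:200, ]

## Held-out mean squared error for a given model formula
mse <- function(form) {
  fit <- lm(form, data = train)
  mean((test$y - predict(fit, newdata = test))^2)
}

full <- mse(y ~ x1 + x2 + x3)
c(drop_x1 = mse(y ~ x2 + x3) - full,     # large increase: x1 is important
  drop_x2 = mse(y ~ x1 + x3) - full,     # near zero: x2 is not
  drop_x3 = mse(y ~ x1 + x2) - full)     # small increase: x3 is mildly useful
```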
There are two main types of stepwise procedures in regression:
- Backward elimination: eliminate the least important variable from those currently selected.
- Forward selection: add the most important variable from those remaining.
- A hybrid version that incorporates ideas from both main types: it alternates forward and backward steps, and stops when all variables have either been retained for inclusion or removed.
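The three flavors can all be run with the R function step(), which by default adds or drops terms to improve AIC. This sketch reuses the simulated data frame dat from the previous example and suppresses the step-by-step trace; it is illustrative rather than a prescribed analysis.

```r
full_fit <- lm(y ~ ., data = dat)   # all candidate predictors
null_fit <- lm(y ~ 1, data = dat)   # intercept-only model

## Backward elimination: start from the full model, drop the least useful term
backward <- step(full_fit, direction = "backward", trace = 0)

## Forward selection: start from the null model, add the most useful remaining term
forward  <- step(null_fit, direction = "forward",
                 scope = formula(full_fit), trace = 0)

## Hybrid: alternate adding and dropping until no move improves AIC
hybrid   <- step(null_fit, direction = "both",
                 scope = formula(full_fit), trace = 0)
```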
Criticisms of Stepwise Methods:
- There is no guarantee that the subsets obtained from different stepwise procedures will contain the same variables or even be the "best" subset.
- When there are more variables than observations (p > n), backward elimination is typically not a feasible procedure.
- The maximum or minimum of a set of correlated F statistics is not itself an F statistic, so the nominal significance levels used at each step are not valid.
- It produces a single answer (one very specific subset) to the variable selection problem, even though several different subsets may be equally good for regression purposes.
- The computation is easy using the R functions step() or regsubsets() (see the sketch after this list). However, to arrive at a practically good answer, you must know the practical context in which your inference will be used.
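As a companion to the last point, here is a hedged sketch of an exhaustive best-subset search with regsubsets() from the leaps package, again on the simulated data frame dat from the earlier sketches (assumes the leaps package is installed; with only three candidate predictors the search is trivial, but the same calls apply more generally).

```r
library(leaps)

## Best model of each size, searched exhaustively over the candidate predictors
subsets <- regsubsets(y ~ ., data = dat, nvmax = 3)
summ    <- summary(subsets)

summ$which                           # which variables enter the best model of each size
best_size <- which.min(summ$bic)     # model size chosen by BIC
coef(subsets, best_size)             # coefficients of the BIC-selected model
```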
Scott Zeger on 'how to pick the wrong model': Turn your scientific problem over to a computer that, knowing nothing about your science or your question, is very good at optimizing AIC, BIC, ...