# Lesson 5: Regression Shrinkage Methods

### Introduction

Key Learning Goals for this Lesson:

• Introducing biased regression methods to reduce variance
• Implementation of Ridge and Lasso regression

Textbook reading: Consult Course Schedule

Prediction:

• Linear regression: $E(Y \mid X) = X \beta$;
• Or, for a more general regression function: $E(Y \mid X) = f(X)$;
• In a prediction context, the individual components of the right-hand side (for example, the separate coefficients) matter less than their total contribution to the predicted value.

Variable Selection:

• The driving force behind variable selection:
  • The desire for a parsimonious regression model (one that is simpler and easier to interpret);
  • The need for greater accuracy in prediction.
• The notion of what makes a variable "important" is still not well understood, but one interpretation (Breiman, 2001) is that a variable is important if dropping it seriously affects prediction accuracy.
• Selecting variables in regression models is a complicated problem, and there are many conflicting views on which variable selection procedure is best and on which criterion it should use, e.g. the likelihood ratio test (LRT), the F-test, AIC, or BIC.
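
For reference, the two information criteria named above are usually defined as follows. Here $\hat{L}$ is the maximized likelihood of the fitted model, $k$ the number of estimated parameters, and $n$ the number of observations (these symbols are introduced here for illustration and are not part of the original notes):

$$\mathrm{AIC} = -2 \log \hat{L} + 2k, \qquad \mathrm{BIC} = -2 \log \hat{L} + k \log n .$$

Smaller values are preferred under both criteria. BIC penalizes each additional parameter more heavily than AIC whenever $\log n > 2$, that is, for $n \ge 8$, so it tends to select smaller models.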

There are two main types of stepwise procedures in regression:

• Backward elimination: start from the full model and, at each step, eliminate the least important of the variables still in the model.

• Forward selection: start from the empty model and, at each step, add the most important of the remaining variables.

• A hybrid version that incorporates ideas from both main types: it alternates backward and forward steps, and stops when all variables have either been retained for inclusion or removed.
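
The forward selection procedure above can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not the course's implementation (the lesson itself relies on R): "most important" is taken to mean "gives the largest drop in residual sum of squares (RSS)", the stopping rule is a hypothetical `max_vars` cap, and the data are simulated so that only `x0` and `x3` truly matter. The helper names `solve`, `rss`, and `forward_selection` are made up for this example.

```python
import random

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # pivot row
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def rss(X, y, cols):
    """RSS of the least squares fit using an intercept plus the columns in `cols`."""
    Z = [[1.0] + [row[j] for j in cols] for row in X]
    k = len(cols) + 1
    ZtZ = [[sum(z[a] * z[b] for z in Z) for b in range(k)] for a in range(k)]
    Zty = [sum(z[a] * yi for z, yi in zip(Z, y)) for a in range(k)]
    beta = solve(ZtZ, Zty)  # normal equations
    return sum((yi - sum(bj * zj for bj, zj in zip(beta, z))) ** 2
               for z, yi in zip(Z, y))

def forward_selection(X, y, max_vars):
    """Greedily add the variable that lowers RSS the most; return (index, RSS) per step."""
    selected, remaining, path = [], list(range(len(X[0]))), []
    while remaining and len(selected) < max_vars:
        best = min(remaining, key=lambda j: rss(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
        path.append((best, rss(X, y, selected)))
    return path

# Simulated data (an assumption for illustration): y depends only on x0 and x3.
random.seed(0)
n, p = 60, 5
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3 * row[0] - 2 * row[3] + random.gauss(0, 0.1) for row in X]

path = forward_selection(X, y, max_vars=3)
for j, r in path:
    print(f"added x{j}: RSS = {r:.2f}")
```

Because each step only compares models that differ by one variable, the search is cheap, but it is greedy: a variable added early is never reconsidered, which is one reason forward and backward procedures can end at different subsets.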

Criticisms of Stepwise Methods:

• There is no guarantee that the subsets obtained from different stepwise procedures (e.g. forward versus backward) will contain the same variables, or that any of them will be the "best" subset.

• When there are more variables than observations (p > n), backward elimination is typically not a feasible procedure.

• The maximum or minimum of a set of correlated F statistics is not itself an F statistic.

• A stepwise procedure produces a single answer (a very specific subset) to the variable selection problem, although several different subsets may be equally good for regression purposes.

• The computation is easy with the R functions step() or regsubsets() (the latter from the leaps package). However, to arrive at an answer that is good in practice, you must know the practical context in which your inference will be used.
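
The p > n criticism in the list above can be made precise with a short rank argument. This is a sketch assuming the usual least squares setup with an $n \times p$ design matrix $X$ and response vector $y$, notation that is introduced here rather than taken from the notes:

$$\operatorname{rank}(X^{T}X) = \operatorname{rank}(X) \le \min(n, p) = n < p .$$

The $p \times p$ matrix $X^{T}X$ is therefore singular, and the full-model least squares estimate $\hat{\beta} = (X^{T}X)^{-1}X^{T}y$ is not uniquely defined. Backward elimination needs that full-model fit as its starting point, whereas forward selection starts from the empty model and can still take its first steps.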

Scott Zeger on 'how to pick the wrong model': Turn your scientific problem over to a computer that, knowing nothing about your science or your question, is very good at optimizing AIC, BIC, ...