
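When fitting linear shrinkage/regularization models (ridge regression and the lasso), the predictors, $X$, should be standardized: for each predictor, subtract its mean and then divide by its standard deviation. This puts all predictors on the same scale, so the penalty treats their coefficients equally. For a new observation $X_{new}$, the prediction model is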

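$$\hat{y}_{new} = \hat{\beta}_0 + \sum_{j=1}^p \hat{\beta}_j \frac{X_{new,j}-\operatorname{mean}(X_{train},j)}{\operatorname{sd}(X_{train},j)}$$
where the $\hat{\beta}$'s are estimated from the training data, and $\operatorname{mean}(X_{train},j)$ and $\operatorname{sd}(X_{train},j)$ are the mean and standard deviation of the $j$-th predictor in the training data. The R function glmnet performs this standardization by default (standardize = TRUE) and returns the coefficients on the original scale of the predictors.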
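The prediction formula can be checked by hand with a minimal base R sketch. Everything in it is illustrative: the data are simulated, the names (X_train, train_mean, beta_hat, and so on) are made up for this example, and the ridge coefficients come from the textbook closed-form solution with an arbitrary lambda, which is not parameterized the same way as lambda in glmnet.

```r
# Simulated training data (hypothetical): 50 observations, 3 predictors
set.seed(1)
n <- 50
p <- 3
X_train <- matrix(rnorm(n * p, mean = 10, sd = 4), n, p)
y_train <- drop(2 + X_train %*% c(1, -0.5, 0) + rnorm(n))

# Means and standard deviations are computed from the training data only
train_mean <- colMeans(X_train)
train_sd   <- apply(X_train, 2, sd)

# Standardize the training predictors
Z_train <- scale(X_train, center = train_mean, scale = train_sd)

# Closed-form ridge estimate on the standardized predictors.
# lambda = 1 is an arbitrary illustrative value; the intercept is not penalized,
# so with centered predictors it equals the mean of the training response.
lambda   <- 1
beta0    <- mean(y_train)
beta_hat <- drop(solve(crossprod(Z_train) + lambda * diag(p),
                       crossprod(Z_train, y_train - beta0)))

# A new observation is standardized with the TRAINING mean and sd
X_new <- matrix(c(12, 8, 10), nrow = 1)
Z_new <- scale(X_new, center = train_mean, scale = train_sd)

# Prediction: beta0 + sum_j beta_j * (X_new_j - mean_j) / sd_j
y_hat_new <- drop(beta0 + Z_new %*% beta_hat)
y_hat_new
```

The key point is that X_new is never used to compute a mean or standard deviation; it is only transformed with the values stored from the training set.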

Useful R functions:

  • Perform ridge regression and the lasso: glmnet and cv.glmnet (glmnet).
  • (Alternative.) Fit a linear model by ridge regression: lm.ridge (MASS).
  • (Alternative.) Fit a linear model by lasso: lars and cv.lars (lars).
  • (Alternative.) Penalized regression (lasso and ridge) with cross-validation routines: (penalized).
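As a rough sketch of how the first pair of functions might be used, the following fits ridge and lasso models with cv.glmnet on simulated data. The data, the train/test split, and all variable names are assumptions made for illustration; this is not the textbook's Hitters example. It assumes the glmnet package is installed.

```r
library(glmnet)

# Simulated data (hypothetical): only the first 3 of 10 predictors matter
set.seed(1)
n <- 100
p <- 10
x <- matrix(rnorm(n * p), n, p)
y <- drop(x[, 1:3] %*% c(3, -2, 1) + rnorm(n))

# Simple train/test split
train <- 1:80
test  <- 81:100

# alpha = 0 gives ridge regression, alpha = 1 gives the lasso.
# standardize = TRUE is the default, so predictors are scaled internally.
cv_ridge <- cv.glmnet(x[train, ], y[train], alpha = 0)
cv_lasso <- cv.glmnet(x[train, ], y[train], alpha = 1)

# Predict on held-out data at the lambda that minimizes cross-validated error
pred_ridge <- predict(cv_ridge, newx = x[test, ], s = "lambda.min")
pred_lasso <- predict(cv_lasso, newx = x[test, ], s = "lambda.min")

# Test-set mean squared error for each method
mse_ridge <- mean((y[test] - pred_ridge)^2)
mse_lasso <- mean((y[test] - pred_lasso)^2)

# Lasso coefficients (original scale); some may be exactly zero
coef(cv_lasso, s = "lambda.min")
```

Because glmnet standardizes internally and reports coefficients on the original scale, predict can be called directly on unstandardized new data.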

The Hitters example from the textbook contains specific details on using glmnet. You'll need to understand this example in order to complete the project, which uses the diabetes data in the lars package.

After completing the reading for this lesson, please finish the Quiz and R Lab on ANGEL (check the course schedule for due dates).