Backward Elimination & Stepwise Selection Procedures
We begin with two subset selection procedures in SAS PROC LOGISTIC for choosing variables related to the response:
- Backward elimination
- Stepwise selection
Which Model Should I Fit?
Take a look at the SAS program water_level3.sas.
The program reads in the data, identifies the variables, and then calls PROC LOGISTIC, specifying a model in which Y (1 if the subject passed, 0 if the subject failed) is the response. Notice the keywords 'backward' and 'stepwise', highlighted in purple in the program, which specify the two subset selection procedures.
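For readers without the program at hand, the two calls might look roughly like the sketch below. This is not the contents of water_level3.sas: the dataset name `water` and the predictors `x1`-`x3` are placeholders, and only `y`, `sex`, `gravity` and `totphysics` are names taken from this page. It also assumes every predictor is numeric (a character variable would need a CLASS statement) and writes the 0.05 significance levels out explicitly.

```sas
/* Hypothetical sketch, not the course program.
   WATER and x1-x3 are placeholder names; y, sex, gravity and
   totphysics are the variables named in the text. */

/* Backward elimination: start with all predictors, drop one at a time */
proc logistic data=water descending;   /* DESCENDING models Pr(y = 1), i.e. passing */
   model y = sex gravity totphysics x1 x2 x3
         / selection=backward slstay=0.05;   /* a variable stays only if p < 0.05 */
run;

/* Stepwise selection: add one predictor at a time, re-checking earlier ones */
proc logistic data=water descending;
   model y = sex gravity totphysics x1 x2 x3
         / selection=stepwise slentry=0.05 slstay=0.05;
run;
```

The only difference between the two calls is the SELECTION= option, which is why the same candidate variables can end up in slightly different final models.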
Backward Elimination
In the output, the procedure begins by entering all of the variables and then removes them one at a time, re-fitting the model after each removal. At the end of the procedure, SAS reports a closing note along with the four variables that were removed from the model.
Directly after this, the procedure lists the variables that are retained in the model, whose p-values are all < 0.05, along with the coefficients that make up the fitted model.
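As a reminder of where such p-values come from, here is a sketch that assumes the removal decisions rest on the usual Wald chi-square test for a single coefficient. The symbols $\hat{\beta}_j$, $\widehat{\mathrm{SE}}$ and $W_j$ are our notation, not labels from the SAS output.

```latex
W_j = \left( \frac{\hat{\beta}_j}{\widehat{\mathrm{SE}}(\hat{\beta}_j)} \right)^{2}
\;\overset{\text{approx.}}{\sim}\; \chi^2_1
\quad \text{under } H_0 : \beta_j = 0,
\qquad
p_j = \Pr\left( \chi^2_1 \ge W_j \right)
```

Under this reading, each step drops the variable with the largest $p_j$ as long as that $p_j$ is at or above 0.05, and the procedure stops once every remaining $p_j$ is below 0.05.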
Stepwise Selection
This procedure takes the opposite approach: it begins with one variable and then adds further variables, one at a time, re-fitting the model each time. In SAS, a variable that has already entered can also be removed at a later step if it no longer meets the significance level for staying. At the end of the procedure SAS reports a closing note and displays a summary list of the variables that remain in the model.
Odds Ratio Estimates
If we look at the Odds Ratio Estimates for both procedures:
Backward Elimination
Stepwise Selection
The two procedures each selected 6 variables, with 5 in common; for the sixth, backward elimination chose ‘gravity’ while stepwise chose ‘totphysics’. The odds ratio and confidence interval estimates are quite close for the variables the two models share.
Furthermore, neither model includes the variable ‘sex’. We conclude that, after adjusting for these 6 independent variables, ‘sex’ does not significantly affect passing or failing.
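To connect the odds ratio estimates to the fitted coefficients, the relationship can be sketched as follows, assuming Wald-type 95% confidence limits. The notation is ours, and the numbers in the worked line are invented purely for illustration; they are not values from the water study output.

```latex
\widehat{\mathrm{OR}}_j = e^{\hat{\beta}_j},
\qquad
95\%\ \text{CI: } \left( e^{\hat{\beta}_j - 1.96\,\widehat{\mathrm{SE}}(\hat{\beta}_j)},\;
                        e^{\hat{\beta}_j + 1.96\,\widehat{\mathrm{SE}}(\hat{\beta}_j)} \right)

% Illustrative values only: \hat{\beta}_j = 0.5, \widehat{\mathrm{SE}} = 0.2
\widehat{\mathrm{OR}}_j = e^{0.5} \approx 1.65,
\qquad
\left( e^{0.108},\; e^{0.892} \right) \approx (1.11,\; 2.44)
```

An interval that excludes 1, as in this made-up case, corresponds to a coefficient that differs significantly from 0 at the 0.05 level.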
This handout covers this information as well: WaterStudyModelSelection.pdf