12.10 - Model Validation

Statisticians have recommended a number of approaches to evaluate a model. One approach involves partitioning the data set into an estimation data set and a validation data set (usually in a two-thirds versus one-third split).

The estimation data set is used to build the model and hence to estimate its parameters. The validation data set is then used to check the model: each validation patient’s observed regressors are inserted into the estimated model equation to predict that patient’s outcome.

If the predicted outcomes are close to the observed outcomes for the patients in the validation data set, as judged by a summary measure such as the mean squared prediction error, then the model is considered valid.
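The split-sample procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the model is taken to be an ordinary least-squares linear regression, the data are synthetic, the closeness of predicted and observed outcomes is summarized by the root mean squared prediction error, and the function name `holdout_validate` and its parameters are hypothetical.

```python
import numpy as np

def holdout_validate(X, y, estimation_fraction=2 / 3, seed=0):
    """Fit a linear model on the estimation set and score it on the validation set."""
    rng = np.random.default_rng(seed)
    n = len(y)
    order = rng.permutation(n)                  # random assignment of patients
    n_est = int(round(estimation_fraction * n))
    est, val = order[:n_est], order[n_est:]

    # Estimate the parameters (intercept plus slopes) from the estimation set only.
    A_est = np.column_stack([np.ones(len(est)), X[est]])
    beta, *_ = np.linalg.lstsq(A_est, y[est], rcond=None)

    # Insert each validation patient's regressors into the estimated equation.
    A_val = np.column_stack([np.ones(len(val)), X[val]])
    predicted = A_val @ beta

    # Summarize how close the predictions are to the observed outcomes.
    rmse = float(np.sqrt(np.mean((y[val] - predicted) ** 2)))
    return beta, rmse, len(est), len(val)

# Synthetic data (an assumption for illustration): 90 patients, 2 regressors.
rng = np.random.default_rng(1)
X = rng.normal(size=(90, 2))
y = 1.0 + X @ np.array([2.0, -0.5]) + rng.normal(scale=0.3, size=90)

beta, rmse, n_est, n_val = holdout_validate(X, y)
print(f"estimation n = {n_est}, validation n = {n_val}, validation RMSE = {rmse:.3f}")
```

How small the validation error must be for the model to count as valid depends on the scale of the outcome and on the clinical context; the code only computes the summary.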

Another approach, the “leave-one-out” method, starts by removing the first patient from the data set of \(n\) patients. The model equation is estimated from the remaining \(n - 1\) patients, the predicted outcome is calculated for the first patient, and that patient’s predicted and observed outcomes are compared.

This process is repeated for each of the \(n\) patients in turn, and the \(n\) comparisons are combined into an overall validation statistic.
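A minimal sketch of the leave-one-out loop follows. The text does not name the overall validation statistic, so this example assumes one common choice, the prediction sum of squares (PRESS), that is, the sum of squared differences between each patient's observed outcome and the outcome predicted from the other \(n - 1\) patients. The linear model, the synthetic data, and the function name `leave_one_out` are likewise assumptions made for illustration.

```python
import numpy as np

def leave_one_out(X, y):
    """Refit the model n times, each time predicting the one patient left out."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])        # design matrix with intercept
    loo_residuals = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                # drop patient i
        beta, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        loo_residuals[i] = y[i] - A[i] @ beta   # observed minus predicted
    press = float(np.sum(loo_residuals ** 2))   # assumed overall statistic (PRESS)
    return press, loo_residuals

# Synthetic data (an assumption for illustration): 40 patients, 2 regressors.
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 2))
y = 0.5 + X @ np.array([1.5, -1.0]) + rng.normal(scale=0.4, size=40)

press, loo_residuals = leave_one_out(X, y)
print(f"PRESS = {press:.3f}, root mean squared error = {np.sqrt(press / len(y)):.3f}")
```

For ordinary least squares, the same leave-one-out residuals can be obtained without refitting, as \(e_i / (1 - h_{ii})\), where \(e_i\) is the ordinary residual and \(h_{ii}\) is the \(i\)th diagonal element of the hat matrix. The explicit loop is shown here because it mirrors the description above and carries over to models that have no such shortcut.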

These validation procedures are well suited to nonrandomized studies. For randomized clinical trials, however, they should probably be applied only to secondary and exploratory statistical analyses.