6.6 - Lack of Fit Testing in the Multiple Regression Setting

Formal lack of fit testing can also be performed in the multiple regression setting; however, replicates become harder to obtain as more predictors are added to the model. The corresponding ANOVA table below is similar to the one introduced for the simple linear regression setting, except that we now have p regression parameters and c unique X vectors. Two observations count as replicates only if they have the same value for every predictor. For example, suppose we have 3 predictors in our model. The observations (40, 10, 12) and (40, 10, 7) are two distinct X vectors because they differ in the third predictor, whereas the observations (10, 5, 13) and (10, 5, 13) constitute a replicate.
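To make the replicate rule concrete, here is a minimal Python sketch that groups observations by their full predictor vector and counts c, the number of unique X vectors. The function name replicate_groups is hypothetical, and the fourth data point simply repeats the third so that the example contains one replicate.

```python
from collections import defaultdict

def replicate_groups(X):
    """Map each distinct predictor vector to the row indices where it occurs."""
    groups = defaultdict(list)
    for i, row in enumerate(X):
        groups[tuple(row)].append(i)  # rows match only if every predictor matches
    return dict(groups)

# The four predictor vectors from the example in the text.
X = [(40, 10, 12), (40, 10, 7), (10, 5, 13), (10, 5, 13)]

groups = replicate_groups(X)
c = len(groups)  # number of unique X vectors
replicated = {x: idx for x, idx in groups.items() if len(idx) > 1}

print("c =", c)
print("replicated X vectors:", replicated)
```

Only (10, 5, 13) appears more than once, so it is the only X vector that contributes to pure error here.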

Source	df	SS	MS	F
Regression	p - 1	SSR	MSR	MSR / MSE
Error	n - p	SSE	MSE
Lack of Fit	c - p	SSLOF	MSLOF	MSLOF / MSPE
Pure Error	n - c	SSPE	MSPE
Total	n - 1	SSTO
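
The entries of the table can be computed directly once the replicates are identified. The following is a minimal sketch in plain Python, not a definitive implementation: the data are made up for illustration, the helper names solve and lack_of_fit_table are hypothetical, and the model is assumed to include an intercept, so p equals the number of predictors plus one. SSE comes from the least squares fit, SSPE from the variation of the responses around their mean at each unique X vector, and SSLOF is their difference.

```python
from collections import defaultdict

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[pivot] = M[pivot], M[k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for j in range(k, n + 1):
                M[r][j] -= factor * M[k][j]
    x = [0.0] * n
    for k in reversed(range(n)):
        tail = sum(M[k][j] * x[j] for j in range(k + 1, n))
        x[k] = (M[k][n] - tail) / M[k][k]
    return x

def lack_of_fit_table(X, y):
    """Return the lack of fit quantities for a linear model with an intercept."""
    n = len(y)
    Z = [[1.0] + [float(v) for v in row] for row in X]  # design matrix
    p = len(Z[0])                                       # regression parameters

    # Least squares estimates from the normal equations (Z'Z) beta = Z'y.
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    Zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    beta = solve(ZtZ, Zty)
    fitted = [sum(b * zi for b, zi in zip(beta, z)) for z in Z]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

    # Pure error: variation of y around its mean within each unique X vector.
    groups = defaultdict(list)
    for row, yi in zip(X, y):
        groups[tuple(row)].append(yi)
    c = len(groups)
    if c <= p or n <= c:
        raise ValueError("need c > p and n > c for a lack of fit test")
    sspe = sum(sum((yi - sum(ys) / len(ys)) ** 2 for yi in ys)
               for ys in groups.values())

    sslof = sse - sspe
    mslof, mspe = sslof / (c - p), sspe / (n - c)
    return {"SSE": sse, "SSPE": sspe, "SSLOF": sslof,
            "df_lof": c - p, "df_pe": n - c,
            "MSLOF": mslof, "MSPE": mspe, "F": mslof / mspe}

# Made-up data: 2 predictors, 5 unique X vectors, each observed twice.
X = [(1, 1), (1, 1), (2, 1), (2, 1), (1, 2), (1, 2), (2, 2), (2, 2), (3, 3), (3, 3)]
y = [3.0, 3.4, 4.1, 4.5, 5.0, 5.2, 6.3, 6.1, 12.0, 12.4]

table = lack_of_fit_table(X, y)
for name, value in table.items():
    print(name, value)
```

The resulting F statistic would be compared with an F distribution on c - p and n - c degrees of freedom. If SciPy is available, scipy.stats.f.sf(F, c - p, n - c) gives the corresponding p-value.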

Formal lack of fit testing in multiple regression can be difficult because the data are often sparse, unless we are analyzing an experiment that was designed to include replicates. However, other methods can be employed for lack of fit testing when we do not have replicates. Such methods involve data subsetting. The basic approach is to establish grouping criteria by introducing indicator variables, which in turn create coded variables. By coding the variables, you can artificially create replicates and then proceed with lack of fit testing. Another data subsetting approach is to treat a central region of the data as a reduced data set and then compare the fit to this reduced data set with the full fit (i.e., the fit with all of the data), using the formulas for a lack of fit test. Be forewarned that these methods should only be used as exploratory methods and that their results depend heavily on which data subsetting method is used.
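
As one possible illustration of creating artificial replicates by coding, the sketch below bins each predictor into intervals of a chosen width and treats observations that fall in the same cell as near-replicates. This is an assumption about how the coding might be done, not a prescribed method: the function name near_replicate_groups, the bin widths, and the data are all hypothetical.

```python
from collections import defaultdict
import math

def near_replicate_groups(X, widths):
    """Group row indices by the bin (coded value) of every predictor."""
    cells = defaultdict(list)
    for i, row in enumerate(X):
        # Code each predictor by the index of the interval it falls in.
        key = tuple(math.floor(v / w) for v, w in zip(row, widths))
        cells[key].append(i)
    return dict(cells)

# Made-up data with no exact replicates among the 3 predictors.
X = [(40, 10, 12), (41, 10, 11), (10, 5, 13), (12, 6, 13), (25, 8, 2)]

cells = near_replicate_groups(X, widths=(5, 5, 5))
for key, idx in cells.items():
    print(key, idx)
```

Cells containing two or more observations play the role of replicates, and pure error could then be computed within each cell. A different choice of widths can change which observations are grouped together, which is why results from this kind of subsetting should be treated as exploratory.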