8.3 - Cautions with Linear Regression

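First, beware of extrapolation: applying a regression model to X-values outside the range of the sample X-values to predict values of the response variable Y. The fitted line only describes the data it was built from, so the relationship may not hold beyond that range or in a different setting. For example, Bob would not want to use the number of critical features to predict dollar amount using a regression model based on an urban area if Bob’s town is rural.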
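A simple guard against extrapolation is to compare a new X-value with the range of the sample X-values before predicting. The following is a minimal sketch; the function name `is_extrapolation` and the sample values are hypothetical and are not taken from Bob’s study.

```python
def is_extrapolation(x_new, sample_x):
    """Return True when x_new lies outside the observed range of sample_x."""
    return x_new < min(sample_x) or x_new > max(sample_x)


# Hypothetical predictor values used to fit a model (illustration only).
sample_x = [2, 4, 5, 7, 9]

inside = is_extrapolation(6, sample_x)    # 6 is within [2, 9]
outside = is_extrapolation(12, sample_x)  # 12 is beyond the largest sample X

print("x = 6 is extrapolation:", inside)
print("x = 12 is extrapolation:", outside)
```

A range check of this kind only covers the X-values themselves; it cannot tell whether the sample came from a comparable population, as in the urban versus rural example.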

Second, the absence of a linear relationship (i.e., a correlation of zero) does not imply that there is no relationship at all. A scatter plot will reveal whether some other kind of relationship may exist. The figure below gives an example where X and Y are related, but not linearly, i.e., the correlation is zero.
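To see numerically how a zero correlation can coexist with a strong relationship, consider a small made-up data set in which Y is exactly the square of X. The sketch below computes the Pearson correlation by hand; the helper `pearson_r` and the data are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up data: Y is completely determined by X, but the pattern is a parabola.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print("correlation:", r)
```

Because the X-values are symmetric about zero, the positive and negative cross-products cancel and the correlation is zero, even though knowing X determines Y exactly.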

Outliers and Influential Observations

Influential observations are points whose removal causes the regression equation to change considerably. Potentially influential observations are flagged by Minitab in the unusual observation list and denoted as X (an unusual X-value). Outliers are points that lie outside the overall pattern of the data. Potential outliers are flagged by Minitab in the unusual observation list and denoted as R (a large residual). The following is the Minitab output for the unusual observations within Bob’s study:

Fits and Diagnostics for Unusual Observations
Obs     Cost       Fit    Resid   Std Resid
  1   72.714    71.725    0.989        0.98      X
  2   78.825    75.829    2.996        2.93   R  X
  6   81.967    84.507   -2.540       -2.44   R
  7   83.490    85.640   -2.150       -2.06   R
 85  113.540   111.440    2.100        2.01   R

R Large Residual

X Unusual X

Some observations may be both outliers and influential, and these are flagged by both R and X (observation 2 in the output above). Such points merit particular attention because they are not well “fit” by the model and may be influencing conclusions or indicating that an alternative model is needed.
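The R and X flags can be reproduced in spirit with a short calculation for simple linear regression. The sketch below is not Minitab's implementation: it assumes the cutoffs |standardized residual| > 2 for R and leverage > 3p/n for X (with p = 2 coefficients), and it uses a made-up data set rather than Bob’s data. The function name `unusual_observations` is hypothetical.

```python
import math


def unusual_observations(xs, ys, resid_cutoff=2.0):
    """Fit y = b0 + b1*x by least squares and flag unusual points.

    Returns one (fit, residual, std_residual, leverage, flag) tuple per point.
    """
    n = len(xs)
    p = 2  # coefficients estimated: intercept and slope
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    fits = [b0 + b1 * x for x in xs]
    resids = [y - f for y, f in zip(ys, fits)]
    s = math.sqrt(sum(e ** 2 for e in resids) / (n - p))  # residual std. error

    leverage_cutoff = 3 * p / n  # assumed rule of thumb for an unusual X
    rows = []
    for x, f, e in zip(xs, fits, resids):
        h = 1 / n + (x - x_bar) ** 2 / sxx          # leverage of this point
        std_resid = e / (s * math.sqrt(1 - h))      # standardized residual
        flag = ""
        if abs(std_resid) > resid_cutoff:
            flag += "R"
        if h > leverage_cutoff:
            flag += "X"
        rows.append((f, e, std_resid, h, flag))
    return rows


# Made-up data: points on the line y = 2x + 1, except that observation 5 is
# shifted upward (large residual) and observation 10 sits far out in X.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
ys = [2 * x + 1 for x in xs]
ys[4] += 4

rows = unusual_observations(xs, ys)
for obs, row in enumerate(rows, start=1):
    fit, resid, std_resid, leverage, flag = row
    if flag:
        print(obs, round(fit, 3), round(resid, 3), round(std_resid, 2), flag)
```

With these data, observation 5 is flagged R because it sits well off the line, while observation 10 is flagged X because its X-value is far from the others, even though its residual is small.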

