5.5 - Two Warnings about Regression

  1.  First Warning: Avoid Extrapolation

    Do not use the regression equation to predict values of the response variable (y) for values of the explanatory variable (x) that fall outside the range of the original data. Remember that not all relationships are linear (most are not), so a scatterplot can only confirm a linear pattern within the range of the data at hand. The pattern may well change shape outside that range, so using the line to extrapolate is inappropriate. In Example 5.4, prediction is restricted to quiz scores between 56 and 94 points, as shown in Figure 5.8. In Example 5.6, the relationship between blood alcohol content and number of beers is linear within the range of the data. But clearly, the linear pattern cannot hold for, say, 60 beers (the line would predict that your blood is more than 100% alcohol at that point!).
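    One way to make this warning concrete is to build the range check into the prediction step itself. The sketch below is only an illustration: the function name `predict_within_range` is made up for this example, and the intercept and slope are hypothetical placeholders, not the fitted values from Example 5.4. Only the range of 56 to 94 points comes from that example.

```python
def predict_within_range(x, intercept, slope, x_min, x_max):
    """Return intercept + slope * x, but only for x inside [x_min, x_max]."""
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"x = {x} is outside the observed range [{x_min}, {x_max}]; "
            "refusing to extrapolate"
        )
    return intercept + slope * x


# Observed quiz-score range from Example 5.4.
QUIZ_MIN, QUIZ_MAX = 56, 94

# Hypothetical coefficients, for illustration only (not the fitted values).
A, B = 10.0, 0.8

# A quiz score of 80 lies inside the range, so a prediction is returned.
print(predict_within_range(80, A, B, QUIZ_MIN, QUIZ_MAX))

# A quiz score of 100 lies outside the range, so the function raises an error.
try:
    predict_within_range(100, A, B, QUIZ_MIN, QUIZ_MAX)
except ValueError as err:
    print(err)
```

    Raising an error, rather than silently returning a number, mirrors the point of the warning: the line itself will happily produce a value for any x, so the restriction has to come from the analyst.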

  2.  Second Warning: Logical Interpretation of the y-intercept in the context of a problem

    The y-intercept can be interpreted in context only when x = 0 falls within the range of the sample data. For example, the y-intercept for the regression equation in Example 5.6 is -0.0127, but clearly it is impossible for BAC to be negative. In fact, in the actual experiment, the police officer taking the BAC measurements with the breathalyzer tested all participants before the experiment started to be sure they registered a BAC of 0. As another example, suppose that data from a particular school district were used to determine a regression equation relating salary (in \$) to years of service (ranging from 0 to 25 years). The resulting regression equation is:

    \(Salary=\$ 29,000+\dfrac{\$ 1,500}{year} \times (Years\ of\ Service) \)

    Even if you had not been told that years of service (the x variable) = 0 was in the sample, you would expect some observations with years of service = 0, since starting salaries would be in the data set. Therefore, the y-intercept has a logical interpretation in this problem: \$29,000 is the predicted starting salary. However, many samples do not contain x = 0, and in those cases we cannot logically interpret the y-intercept.
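    The salary example can be checked with a few lines of Python. This is a minimal sketch using the equation and the 0 to 25 year range given above; the function names `predicted_salary` and `intercept_is_interpretable` are invented for illustration. The last line reuses the quiz-score range of 56 to 94 from Example 5.4 as a case where x = 0 is not in the data.

```python
INTERCEPT = 29_000   # dollars (the y-intercept)
SLOPE = 1_500        # dollars per year of service
YEARS_MIN, YEARS_MAX = 0, 25   # observed range of years of service


def predicted_salary(years):
    """Predicted salary in dollars from the regression equation."""
    return INTERCEPT + SLOPE * years


def intercept_is_interpretable(x_min, x_max):
    """The intercept has a contextual meaning only if x = 0 is in the observed range."""
    return x_min <= 0 <= x_max


print(predicted_salary(0))    # 29000, the predicted starting salary
print(predicted_salary(10))   # 44000
print(intercept_is_interpretable(YEARS_MIN, YEARS_MAX))  # True
print(intercept_is_interpretable(56, 94))                # False
```

    Evaluating the equation at x = 0 returns the intercept itself, which is why the intercept reads as a starting salary here and has no comparable reading when zero lies outside the data.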