In this lesson, we learned the distinction between outliers and high leverage data points, and how each can affect a regression analysis differently. Either kind of point may or may not be influential: a data point is influential if it unduly influences any part of a regression analysis, such as the predicted responses, the estimated slope coefficients, or the hypothesis test results. We learned how to detect outliers, high leverage data points, and influential data points using the following measures:
- studentized residuals (or internally studentized residuals) [which Minitab calls standardized residuals]
- (unstandardized) deleted residuals (or PRESS prediction errors)
- studentized deleted residuals (or externally studentized residuals) [which Minitab calls deleted residuals]
- difference in fits (DFFITS)
- Cook's distance
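As a quick refresher, the measures above can all be computed from a single least-squares fit using the leverages (the diagonal of the hat matrix). The following is a minimal sketch, not the output of any particular package: it assumes simple linear regression with an intercept, uses plain NumPy, and the function name `influence_measures` and the small data set are made up purely for illustration.

```python
import numpy as np

def influence_measures(x, y):
    """Case diagnostics for a simple linear regression of y on x (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])   # design matrix with intercept
    n, p = X.shape                              # p = number of regression parameters

    H = X @ np.linalg.inv(X.T @ X) @ X.T        # hat matrix
    h = np.diag(H)                              # leverages h_ii
    e = y - H @ y                               # ordinary residuals
    sse = e @ e
    mse = sse / (n - p)

    # Internally studentized residuals (Minitab: "standardized residuals")
    r = e / np.sqrt(mse * (1 - h))
    # Unstandardized deleted residuals (PRESS prediction errors)
    d = e / (1 - h)
    # Externally studentized residuals (studentized deleted residuals)
    t = e * np.sqrt((n - p - 1) / (sse * (1 - h) - e**2))
    # Difference in fits
    dffits = t * np.sqrt(h / (1 - h))
    # Cook's distance
    cooks = (e**2 / (p * mse)) * h / (1 - h) ** 2

    return {"leverage": h, "studentized": r, "deleted": d,
            "studentized_deleted": t, "dffits": dffits, "cooks": cooks}

# Made-up data: the last point has an extreme x value and falls off the trend.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 25.0]
m = influence_measures(x, y)

for name, values in m.items():
    print(f"{name:>20}: {np.round(values, 3)}")
```

Scanning the printed rows side by side shows which observations stand out on which measure. Each value would still need to be judged against the rules of thumb from the lesson rather than treated as an automatic verdict.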
We also learned a strategy for dealing with problematic data points once we've discovered them.