In this lesson, we learn how individual observations can influence a regression analysis in different ways. An observation whose response value is very different from the value predicted by the model is called an outlier. An observation with a particularly unusual combination of predictor values (e.g., one predictor has a value far from its values for all the other observations) is said to have high leverage. Outliers and high-leverage observations are therefore distinct, and each can affect a regression analysis differently. An observation can also be both an outlier and have high leverage, so it is important to know how to detect both kinds of data points. Once we've identified any outliers and/or high-leverage data points, we then need to determine whether they actually have an undue influence on the model. This lesson addresses these issues using the following measures:
- studentized residuals (or internally studentized residuals) [which Minitab calls standardized residuals]
- (unstandardized) deleted residuals (or PRESS prediction errors)
- studentized deleted residuals (or externally studentized residuals) [which Minitab calls deleted residuals]
- difference in fits (DFFITS)
- Cook's distance measure

Upon completion of this lesson, you should be able to:
- Understand the concept of an influential data point.
- Know how to detect outlying y values by way of studentized residuals or studentized deleted residuals.
- Understand leverage, and know how to detect outlying x values using leverages.
- Know how to detect potentially influential data points by way of DFFITS and Cook's distance measure.
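
To make the measures listed above concrete, here is a minimal sketch of how they could be computed directly from the hat matrix using NumPy. It is an illustration under stated assumptions, not the Minitab workflow this lesson refers to: the function name `influence_measures` and the small data set are hypothetical, and the design matrix is assumed to include an intercept column and to have full column rank.

```python
import numpy as np

def influence_measures(X, y):
    """Compute common influence diagnostics for a least squares fit.

    X is an (n, p) design matrix that already includes the intercept column;
    y is the (n,) response vector. (Hypothetical helper, for illustration.)
    """
    n, p = X.shape
    # Hat matrix H = X (X'X)^{-1} X'; its diagonal holds the leverages h_ii.
    H = X @ np.linalg.solve(X.T @ X, X.T)
    h = np.diag(H)
    e = y - H @ y                          # ordinary residuals
    mse = e @ e / (n - p)                  # mean squared error of the full fit
    # Internally studentized residuals (Minitab: "standardized residuals").
    r = e / np.sqrt(mse * (1 - h))
    # Unstandardized deleted residuals (PRESS prediction errors).
    d = e / (1 - h)
    # MSE of the fit with observation i left out, without refitting.
    mse_del = ((n - p) * mse - e**2 / (1 - h)) / (n - p - 1)
    # Externally studentized residuals (Minitab: "deleted residuals").
    t = e / np.sqrt(mse_del * (1 - h))
    dffits = t * np.sqrt(h / (1 - h))      # difference in fits
    cooks = r**2 * h / (p * (1 - h))       # Cook's distance
    return {"leverage": h, "studentized": r, "deleted": d,
            "studentized_deleted": t, "dffits": dffits, "cooks": cooks}

# Hypothetical data: the last point has an unusual x value AND a y value
# that falls well below the trend of the other nine points.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 20.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0, 25.0])
X = np.column_stack([np.ones_like(x), x])  # intercept column plus predictor

m = influence_measures(X, y)
for name, values in m.items():
    print(f"{name:>20}: {np.round(values, 3)}")
```

Comparing the printed rows side by side shows how an observation can stand out on one measure (for example, leverage) without necessarily standing out on another, which is the distinction this lesson develops.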
Lesson 11 Code Files
Below is a zip file that contains all the data sets used in this lesson: