Lesson 9: Influential Points

Overview of this Lesson

In this lesson, we learn how individual observations can influence a regression analysis in different ways. If an observation has a response value that is very different from the value predicted by the model, then that observation is called an outlier. On the other hand, if an observation has a particularly unusual combination of predictor values (e.g., one predictor has a very different value for that observation compared with all the other observations), then that observation is said to have high leverage. Outliers and high leverage observations are therefore distinct, and each can affect our regression analyses differently. It is also possible for an observation to be both an outlier and have high leverage. For these reasons, it is important to know how to detect both kinds of data points. Once we've identified any outliers and/or high leverage data points, we then need to determine whether or not they actually have an undue influence on our model. This lesson addresses all these issues using the following measures:

  • leverages
  • residuals
  • standardized residuals
  • deleted residuals (or PRESS prediction errors)
  • studentized residuals
  • difference in fits (DFFITS)
  • Cook's distances
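As a preview, the measures listed above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the lesson's own software output: the function name influence_measures, the dictionary keys, and the small data set are all hypothetical. It also assumes that "standardized residuals" means internally studentized residuals and that "studentized residuals" means deleted (externally studentized) residuals; other texts may use these names differently.

```python
import numpy as np

def influence_measures(x, y):
    """Sketch of common influence diagnostics for a least squares fit.

    Assumes a model with an intercept and a full-rank design matrix.
    All names here are illustrative, not taken from the lesson.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), x])  # design matrix with intercept
    n, p = X.shape                             # p counts the intercept too

    H = X @ np.linalg.pinv(X)                  # hat matrix X (X'X)^-1 X'
    h = np.diag(H)                             # leverages h_ii
    e = y - H @ y                              # ordinary residuals
    sse = e @ e
    mse = sse / (n - p)

    standardized = e / np.sqrt(mse * (1 - h))  # internally studentized
    deleted = e / (1 - h)                      # PRESS prediction errors
    # Deleted (externally studentized) residuals, computed without refitting
    studentized = e * np.sqrt((n - p - 1) / (sse * (1 - h) - e**2))
    dffits = studentized * np.sqrt(h / (1 - h))
    cooks = standardized**2 / p * h / (1 - h)

    return {
        "leverage": h,
        "residual": e,
        "standardized_residual": standardized,
        "deleted_residual": deleted,
        "studentized_residual": studentized,
        "dffits": dffits,
        "cooks_distance": cooks,
    }

# Made-up data: the last point has an extreme x value and falls off the
# trend of the other nine points.
x_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 20.0])
y_demo = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 25.0])

m = influence_measures(x_demo, y_demo)
for name, values in m.items():
    print(f"{name:>22}: {np.round(values, 3)}")
```

In this made-up example, the last observation has the largest leverage and the largest Cook's distance, which is the pattern one would expect for a point that is both extreme in x and off the trend of the remaining data.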

Key Learning Goals for this Lesson:
  • Understand the concept of an influential data point.
  • Know how to detect outlying y values by way of standardized residuals or studentized residuals.
  • Understand leverage, and know how to detect extreme x values using leverages.
  • Know how to detect potentially influential data points by way of DFFITS and Cook's distance.