Lesson 4: SLR Assumptions, Estimation & Prediction

Overview of this Lesson

A typical regression analysis involves the following steps:

  1. Model formulation
  2. Model estimation
  3. Model evaluation
  4. Model use

So far, we have learned how to formulate and estimate a simple linear regression model. We have also learned about some methods for evaluating the model. The first part of this lesson continues the topic of evaluating the model.

How do we evaluate a model? How do we know if the model we are using is good? One way to consider these questions is to assess whether the assumptions underlying the simple linear regression model seem reasonable when applied to the dataset in question. Since the assumptions relate to the (population) prediction errors, we do this through the study of the (sample) estimated errors, the residuals.
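As a rough illustration of what "the residuals" means here, the following minimal sketch fits a least squares line to a small made-up dataset and computes each residual as the observed response minus the fitted value. The data values and the helper name `fit_slr` are hypothetical and chosen only for illustration; they are not part of the lesson.

```python
def fit_slr(x, y):
    """Return the least squares intercept b0 and slope b1 (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_slr(x, y)

# Fitted values lie on the estimated line; residuals are observed minus fitted.
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

for xi, fi, ei in zip(x, fitted, residuals):
    print(f"x = {xi:.1f}  fitted = {fi:.3f}  residual = {ei:+.3f}")
```

Plotting `residuals` against `fitted` would give a residuals vs. fits plot. For a least squares fit with an intercept, the residuals always sum to zero, so it is the pattern of the residuals, not their average, that carries information about the assumptions.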

We focus in this lesson on graphical residual analysis. When we revisit this topic in the context of multiple linear regression in Lesson 6, we'll also study some statistical tests for assessing the assumptions. Throughout the rest of the course, but particularly in Lesson 7, we'll consider various remedies for when the linear regression model assumptions fail.

In the second part of this lesson, we focus our efforts on using the model to answer two specific research questions, namely:

  • What is the average response for a given value of the predictor x?
  • What is the value of the response likely to be for a given value of the predictor x?

In particular, we will learn how to calculate and interpret:

  • A confidence interval for estimating the mean response for a given value of the predictor x.
  • A prediction interval for predicting a new response for a given value of the predictor x.

Key Learning Goals for this Lesson:
  • Understand why we need to check the assumptions of our model.
  • Know the things that can go wrong with the linear regression model.
  • Know how we can detect various problems with the model using a residuals vs. fits plot.
  • Know how we can detect various problems with the model using a residuals vs. predictor plot.
  • Know how we can detect a certain kind of dependent error terms using a residuals vs. order plot.
  • Know how we can detect non-normal error terms using a normal probability plot.
  • Distinguish between estimating a mean response (confidence interval) and predicting a new observation (prediction interval).
  • Understand the various factors that affect the width of a confidence interval for a mean response.
  • Understand why a prediction interval for a new response is wider than the corresponding confidence interval for a mean response.
  • Know that the formula for a prediction interval depends strongly on the condition that the error terms are normally distributed, while the formula for the confidence interval is not so dependent on this condition for large samples.
  • Know the types of research questions that can be answered using the materials and methods of this lesson.
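
To make the distinction between the two intervals concrete, here is a minimal sketch using the standard simple linear regression formulas, in which the prediction interval's standard error carries an extra "1 +" term under the square root. The data are made up, the function name `slr_intervals` is hypothetical, and the t-multiplier 2.306 is an assumption taken from a t table (95% level, 8 degrees of freedom, matching n = 10 observations).

```python
import math


def slr_intervals(x, y, x_new, t_mult):
    """Return (fit, confidence interval, prediction interval) at x_new.

    t_mult is the t-multiplier for n - 2 degrees of freedom, supplied by
    the caller (for example, from a t table). Hypothetical helper.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    # Mean squared error: residual sum of squares divided by n - 2.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)

    fit = b0 + b1 * x_new
    leverage = 1 / n + (x_new - x_bar) ** 2 / sxx
    se_mean = math.sqrt(mse * leverage)        # for the mean response
    se_new = math.sqrt(mse * (1 + leverage))   # for a new response

    ci = (fit - t_mult * se_mean, fit + t_mult * se_mean)
    pi = (fit - t_mult * se_new, fit + t_mult * se_new)
    return fit, ci, pi


# Made-up data, for illustration only (n = 10, so n - 2 = 8).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.3, 4.1, 6.2, 7.9, 10.4, 11.8, 14.1, 16.3, 17.9, 20.2]
T_MULT = 2.306  # assumed 95% t-multiplier for 8 degrees of freedom

fit_mid, ci_mid, pi_mid = slr_intervals(x, y, 5.5, T_MULT)   # at the mean of x
fit_far, ci_far, pi_far = slr_intervals(x, y, 10.0, T_MULT)  # far from the mean

print("x = 5.5:  CI", ci_mid, " PI", pi_mid)
print("x = 10.0: CI", ci_far, " PI", pi_far)
```

In this sketch the prediction interval is always wider than the confidence interval at the same predictor value, and both intervals widen as the predictor value moves away from the sample mean of x.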