Example 4-1: A Good Residual Plot
Below is a plot of residuals versus fits after a straight-line model was used on data for y = handspan (cm) and x = height (inches), for n = 167 students (Hand and Height dataset).
Interpretation: This plot looks good in that the variance is roughly the same all the way across and there are no worrisome patterns. There seem to be no difficulties with the model or data.
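As a rough illustration of what goes into a residuals versus fits plot, the sketch below fits a least-squares line and returns the fitted values and residuals that would be plotted against each other. The Hand and Height data are not reproduced here, so the height and handspan pairs are hypothetical values invented for the example, and the helper names `fit_line` and `fits_and_residuals` are likewise made up.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def fits_and_residuals(x, y):
    """Fitted values and residuals (observed minus fitted)."""
    b0, b1 = fit_line(x, y)
    fits = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fits)]
    return fits, residuals

# Hypothetical (height in inches, handspan in cm) pairs, NOT the course dataset.
heights = [62, 64, 66, 68, 70, 72, 74]
handspans = [18.5, 19.0, 20.5, 20.0, 21.5, 22.0, 23.5]

fits, residuals = fits_and_residuals(heights, handspans)
for f, r in zip(fits, residuals):
    print(f"fit = {f:6.2f}   residual = {r:6.2f}")
```

Plotting `residuals` on the vertical axis against `fits` on the horizontal axis gives the kind of display described above. A good plot shows a band of roughly constant height centered on zero.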
Example 4-2: Residual Plot Resulting from Using the Wrong Model
Below is a plot of residuals versus fits after a straight-line model was used on data for y = concentration of a chemical solution and x = time after the solution was made (Solutions Concentration dataset).
Interpretation: This plot of residuals versus fits shows two difficulties. First, the pattern is curved, which indicates that the wrong type of equation was used. Second, the variance (vertical spread) increases as the fitted values (predicted values) increase.
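To see why a curved residual pattern points to the wrong type of equation, the sketch below fits a straight line to data that follow a curve exactly. The concentrations are hypothetical (a noise-free exponential decay invented for illustration, not the Solutions Concentration data), and `line_residuals` is a made-up helper name.

```python
import math
from statistics import mean

def line_residuals(x, y):
    """Residuals from the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Hypothetical data: concentration decaying exponentially with time, no noise.
times = list(range(9))
concentrations = [10 * math.exp(-0.4 * t) for t in times]

residuals = line_residuals(times, concentrations)

# One character per observation, in time order: "+" above the line, "-" below.
signs = "".join("+" if r > 0 else "-" for r in residuals)
print(signs)
```

The residuals are positive at both ends of the time range and negative in the middle. That systematic run of signs is what appears as a curve in a residuals versus fits plot, whereas an adequate model would give signs with no obvious ordering.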
Example 4-3: Indications that Assumption of Constant Variance is Not Valid
Below is a plot of residuals versus fits after a straight-line model was used on data for y = sale price of a home and x = square foot area of the home (Real estate dataset).
Interpretation: This plot of residuals versus fits shows that the residual variance (vertical spread) increases as the fitted values (predicted values of the sale price) increase. This violates the assumption of constant error variance.
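One informal numerical companion to this visual check is to compare the spread of the residuals for small fitted values with the spread for large fitted values. The sketch below does that on hypothetical data whose scatter is built to grow with x. The numbers are invented, the helper names are made up, and comparing two halves is only a rough heuristic assumed here, not a formal test.

```python
from statistics import mean, stdev

def fits_and_residuals(x, y):
    """Fitted values and residuals from the least-squares line."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fits = [b0 + b1 * xi for xi in x]
    return fits, [yi - fi for yi, fi in zip(y, fits)]

# Hypothetical data: the deviation from the line 2x alternates in sign
# and grows in proportion to x, so the scatter fans out.
x = list(range(1, 13))
y = [2 * xi + 0.1 * xi * (-1) ** xi for xi in x]

fits, residuals = fits_and_residuals(x, y)

# Order the residuals by fitted value and split them into two halves.
ordered = [r for _, r in sorted(zip(fits, residuals))]
half = len(ordered) // 2
spread_low = stdev(ordered[:half])    # residual spread at small fits
spread_high = stdev(ordered[half:])   # residual spread at large fits

print(f"spread at low fits  = {spread_low:.2f}")
print(f"spread at high fits = {spread_high:.2f}")
```

A clearly larger spread in the upper half corresponds to the widening vertical scatter described in the interpretation above.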
Example 4-4: Indications that Assumption of Normal Distribution for Errors is Valid
The graphs below are a histogram and a normal probability plot of the residuals after a straight-line model was fit to y = time to the next eruption and x = duration of the last eruption (Old Faithful dataset).
Interpretation: The histogram is roughly bell-shaped, which suggests that it is reasonable to assume that the errors have a normal distribution. The pattern of the normal probability plot is straight, so this plot also provides evidence that it is reasonable to assume that the errors have a normal distribution.
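For readers curious about how a normal probability plot is constructed, the sketch below pairs each sorted residual with a theoretical standard normal quantile and measures how straight the resulting points are with a correlation. The residuals are hypothetical, the helper names are made up, and the plotting positions (i - 0.375) / (n + 0.25) are one common convention assumed here. Statistical software may use a slightly different formula.

```python
from statistics import NormalDist, mean

def normal_plot_points(residuals):
    """Theoretical normal quantiles paired with the sorted residuals."""
    n = len(residuals)
    ordered = sorted(residuals)
    # Assumed plotting positions: (i - 0.375) / (n + 0.25) for i = 1..n.
    quantiles = [
        NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)
    ]
    return quantiles, ordered

def correlation(a, b):
    """Pearson correlation between two equal-length lists."""
    a_bar, b_bar = mean(a), mean(b)
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / (saa * sbb) ** 0.5

# Hypothetical residuals, invented for illustration.
residuals = [-2.1, -1.3, -0.8, -0.4, -0.1, 0.2, 0.5, 0.9, 1.4, 2.0]

quantiles, ordered = normal_plot_points(residuals)
r = correlation(quantiles, ordered)
print(f"correlation of plotted points = {r:.3f}")
```

Plotting `ordered` against `quantiles` gives the normal probability plot. A correlation close to 1 corresponds to the straight-line pattern described above, while heavy tails or skewness bend the ends of the plot and pull the correlation down.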
Example 4-5: Indications that Assumption of Normal Distribution for Errors is Not Valid
Below is a normal probability plot for the residuals from a straight-line regression with y = infection risk in a hospital and x = average length of stay in the hospital. The observational units are hospitals and the data are taken from regions 1 and 2 (Infection Risk dataset).
Interpretation: The plot deviates somewhat from the straight-line pattern, which indicates a distribution with heavier tails than a normal distribution.
Example 4-6: Stopping Distance Data
We investigate how transforming y can sometimes help us with nonconstant variance problems. We will look at the stopping distance data with y = stopping distance of a car and x = speed of the car when the brakes were applied (Car Stopping data). A graph of the data is given below.
Fitting a simple linear regression model to these data leads to problems with both curvature and nonconstant variance. One possible remedy is to transform y. With some trial and error, we find that there is an approximately linear relationship between \(\sqrt{y}\) and x with no suggestion of nonconstant variance.
The Minitab output below gives the regression equation for square root distance on speed along with predicted values and prediction intervals for speeds of 10, 20, 30, and 40 mph. The predictions are for the square root of the stopping distance.
Regression Equation
sqrtdist = 0.918 + 0.253 Speed
Speed | Fit | 95% PI |
---|---|---|
10 | 3.44 | 1.98, 4.90 |
20 | 5.97 | 4.52, 7.42 |
30 | 8.50 | 7.03, 9.97 |
40 | 11.03 | 9.53, 12.53 |
Then, the output below shows predicted values and prediction intervals when we square the results (i.e., transform back to the scale of the original data).
Speed | Fit | 95% PI |
---|---|---|
10 | 11.83 | 3.92, 24.01 |
20 | 35.64 | 20.43, 55.06 |
30 | 72.25 | 49.42, 99.40 |
40 | 121.66 | 90.82, 156.75 |
Notice that the predicted values roughly follow the average pattern in the scatterplot of speed and stopping distance above. Also notice that the prediction intervals for stopping distance become wider as speed increases. This reflects the nonconstant variance in the original data.
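The back-transformation can be checked with a few lines of arithmetic. The sketch below squares the fits and interval limits from the square-root scale and compares interval widths on the two scales. The values are copied from the tables above, with the 40 mph upper limit entered as 12.53, the value consistent (up to rounding) with the 156.75 reported on the original scale. The variable names are made up for the example.

```python
# (speed in mph, fit, lower limit, upper limit) on the square-root scale.
sqrt_scale = [
    (10, 3.44, 1.98, 4.90),
    (20, 5.97, 4.52, 7.42),
    (30, 8.50, 7.03, 9.97),
    (40, 11.03, 9.53, 12.53),  # upper limit as described in the lead-in
]

# Squaring each value returns to the original stopping-distance scale.
original_scale = [(s, f ** 2, lo ** 2, hi ** 2) for s, f, lo, hi in sqrt_scale]

# Interval widths on each scale.
widths_sqrt = [hi - lo for _, _, lo, hi in sqrt_scale]
widths_original = [hi - lo for _, _, lo, hi in original_scale]

for (s, f, lo, hi), w_s, w_o in zip(original_scale, widths_sqrt, widths_original):
    print(f"speed {s}: fit {f:7.2f}, PI ({lo:6.2f}, {hi:6.2f}), "
          f"width {w_o:5.2f} (sqrt-scale width {w_s:.2f})")
```

On the square-root scale the interval widths stay close to 3 at every speed, which is what constant variance looks like. After squaring, the widths grow steadily with speed, matching the widening scatter in the original data. Small differences from the published table come from squaring rounded values.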
We cover transformations like this in more detail in Lesson 9.