9.2.5 - Other Inferences and Considerations

Inferences about Mean Response for New Observation

Let’s go back to the height and weight example:

Example

If you are asked to estimate the weight of a STAT 500 student, what will you use as a point estimate? If I tell you that the height of the student is 70 inches, can you give a better estimate of the person's weight?

Now that we have our regression equation, we can use height to provide a better estimate of weight. We would report the mean response at the given height, i.e., 70 inches.

The mean response at a given X value is given by:

\(E(Y)=\beta_0+\beta_1X\)

This is an unknown but fixed value. The point estimate for mean response at \(X=x\) is given by \(\hat{\beta}_0+\hat{\beta}_1x\).

The example for finding this mean response for height and weight is shown later in the lesson.
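As a minimal sketch of the point estimate \(\hat{\beta}_0+\hat{\beta}_1x\), the Python below fits a least-squares line and evaluates it at 70 inches. The heights and weights are made up for illustration and are not the STAT 500 sample, and the helper names `fit_line` and `mean_response` are hypothetical.

```python
# Illustrative data only: NOT the course's height/weight sample.
heights = [62, 64, 66, 68, 70, 72, 74]          # inches
weights = [120, 130, 138, 150, 158, 170, 178]   # pounds


def fit_line(x, y):
    """Return the least-squares estimates (b0, b1) for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx              # estimated slope
    b0 = y_bar - b1 * x_bar     # estimated intercept
    return b0, b1


def mean_response(b0, b1, x_new):
    """Point estimate of the mean response at X = x_new."""
    return b0 + b1 * x_new


b0, b1 = fit_line(heights, weights)
estimate_at_70 = mean_response(b0, b1, 70)
print(f"Estimated mean weight at 70 inches: {estimate_at_70:.2f}")
```

The same number serves as the point estimate for a single new student's weight; only the interval around it differs.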

Inferences about Outcome for New Observation

The point estimate for a new outcome at \(X = x\) is the same as the point estimate for the mean response given above, \(\hat{\beta}_0+\hat{\beta}_1x\). The interval used to estimate the mean response is called a confidence interval. Minitab calculates this for us.

The interval used to estimate (or predict) an individual outcome is called a prediction interval. For a given x value, the prediction interval and the confidence interval have the same center, but the prediction interval is wider than the confidence interval. That makes sense, since it is harder to predict a value for a single subject (say, your weight based on your height) than it is to estimate the average for a group of subjects (say, the mean weight of people who are your height). Again, Minitab will calculate this interval as well.
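To see why the prediction interval is wider, it can help to compare the two standard errors directly. The sketch below uses the standard simple linear regression formulas, with made-up data (not the course sample) and hypothetical variable names. Both intervals multiply their standard error by the same \(t\) multiplier with \(n-2\) degrees of freedom, so comparing the standard errors is enough to compare the widths.

```python
import math

# Illustrative data only: NOT the course's height/weight sample.
x = [62, 64, 66, 68, 70, 72, 74]          # heights in inches
y = [120, 130, 138, 150, 158, 170, 178]   # weights in pounds

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Estimated standard deviation of the error term: sqrt(SSE / (n - 2)).
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

x0 = 70  # height at which we estimate or predict

# Standard error for the MEAN response at x0 (confidence interval).
se_mean = s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)

# Standard error for a SINGLE new outcome at x0 (prediction interval):
# the extra "1 +" accounts for the variability of one individual.
se_pred = s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)

print(f"SE for mean response:  {se_mean:.3f}")
print(f"SE for new observation: {se_pred:.3f}")
```

Because `se_pred` always includes the extra error variance \(s^2\), it exceeds `se_mean` at every value of `x0`.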

Cautions with Linear Regression

First, use extrapolation with caution. Extrapolation is applying a regression model to X-values outside the range of sample X-values to predict values of the response variable \(Y\). For example, you would not want to use your age (in months) to predict your weight using a regression model that used the age of infants (in months) to predict their weight.

Second, the fact that there is no linear relationship (i.e., the correlation is zero) does not imply that there is no relationship at all. A scatter plot will reveal whether other relationships may exist. The figure below gives an example where X and Y are related, but not linearly related, i.e., the correlation is zero.
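A small numeric check of this point, using made-up data: below, \(Y\) is completely determined by \(X\) through \(Y=X^2\), yet the sample correlation is zero because the X-values are symmetric about zero. The function name `correlation` is a hypothetical helper written out by hand.

```python
import math


def correlation(x, y):
    """Sample (Pearson) correlation coefficient between x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Illustrative data: a perfect quadratic relationship, symmetric about 0.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

r = correlation(x, y)
print(f"correlation = {r:.4f}")
```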

Outliers and Influential Observations

Influential observations are points whose removal causes the regression equation to change considerably. They are flagged by Minitab in the unusual observation list and denoted as X. Outliers are points that lie outside the overall pattern of the data. Potential outliers are flagged by Minitab in the unusual observation list and denoted as R.

The following is the Minitab output for the unusual observations within the height and weight example:

Fits and Diagnostics for Unusual Observations

Obs	weight	Fit	Residual	St Resid
24	200.00	139.74	60.26	3.23R

R Large Residual

Some observations may be both outliers and influential, and these are flagged with both R and X (R X). Such points merit particular attention. In our height and weight example, observation 24 is flagged R (a potential outlier) but not X, so it is not flagged as an influential point.
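The sketch below shows one way such flags could be produced by hand. It rests on assumptions that the lesson does not state: that R marks a standardized residual larger than 2 in absolute value, and that X marks a leverage value above \(3p/n\) (or 0.99, whichever is smaller), with \(p=2\) coefficients. The data are made up, with one response deliberately pushed far above the line.

```python
import math

# Illustrative data: points on the line y = 2x, with one response shifted up.
x = list(range(1, 11))
y = [2 * xi for xi in x]
y[4] += 20  # observation 5 now sits far above the overall pattern

n = len(x)
p = 2  # number of coefficients: intercept and slope

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - p))

# Leverage of each observation in simple linear regression.
leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

# Standardized residual: residual divided by its estimated standard deviation.
std_resid = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]

# ASSUMED cutoffs (not given in the lesson): |std resid| > 2 and
# leverage > min(3p/n, 0.99).
flags = []
for r, h in zip(std_resid, leverage):
    marks = []
    if abs(r) > 2:
        marks.append("R")
    if h > min(3 * p / n, 0.99):
        marks.append("X")
    flags.append(" ".join(marks))

for i, (r, flag) in enumerate(zip(std_resid, flags), start=1):
    if flag:
        print(f"Obs {i}: St Resid = {r:.2f} {flag}")
```

As in the height and weight example, the shifted observation is flagged R only: its X-value is near the middle of the data, so its leverage is low.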

Estimating the standard deviation of the error term

Our simple linear regression model is:

\(Y=\beta_0+\beta_1X+\epsilon\)

The errors for the \(n\) observations are denoted as \(\epsilon_i\), for \(i=1, \ldots, n\). One of our assumptions is that the errors have equal variance (or equal standard deviation). We can estimate the standard deviation of the errors from the residuals, \(\hat{\epsilon}_i=y_i-\hat{y}_i\). Minitab provides this estimate, denoted as \(S\), under the Model Summary. We can also calculate it by:

\(s=\sqrt{\text{MSE}}\)

Find the MSE in the ANOVA table, under the MS column and the Error row.
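For readers who want to check the arithmetic by hand, here is a minimal sketch with made-up data (not the course sample). It assumes the usual simple linear regression definition \(\text{MSE}=\text{SSE}/(n-2)\), which is why \(s\) comes out slightly larger than the ordinary sample standard deviation of the residuals, which divides by \(n-1\).

```python
import math
import statistics

# Illustrative data only: NOT the course's height/weight sample.
x = [62, 64, 66, 68, 70, 72, 74]          # heights in inches
y = [120, 130, 138, 150, 158, 170, 178]   # weights in pounds

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residuals: observed response minus fitted value.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

sse = sum(e ** 2 for e in residuals)   # Error row, SS column
mse = sse / (n - 2)                    # Error row, MS column
s = math.sqrt(mse)                     # estimate of the error standard deviation

# Ordinary sample standard deviation of the residuals (divides by n - 1).
sd_resid = statistics.stdev(residuals)

print(f"MSE = {mse:.3f}, s = {s:.3f}, stdev of residuals = {sd_resid:.3f}")
```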
