3.4 - Further Example

Example 3-1: Hospital Infection Data Section

empty hospital bed

The hospital infection risk dataset consists of a sample of n = 58 hospitals in the east and north-central U.S. (Hospital Infection Data Region 1 and 2 data). The response variable is y = infection risk (percent of patients who get an infection) and the predictor variable is x = average length of stay (in days). Minitab output for a simple linear regression model fit to these data follows:

Regression Analysis: InfctRsk versus Stay

Analysis of Variance

Source DF Adj SS Adj Ms F-Value P-Value
Regression 1 38.3059 38.3059 36.50 0.000
    Stay 1 38.3059 38.3059 36.50 0.000
Error 56 58.7763 1.0496    
    Lack-of-Fit 54 58.5513 1.0843 9.64 0.098
    Pure error 2 0.2250 0.1125    
Total 57 97.0822      

Model Summary

S R-Sq R-Sq (adj) R-Sq (pred)
1.02449 39.46% 38.38% 35.07%

Coefficients

Term Coef SE Coef T-Value P-Value VIF
Constant -1.160 0.956 -1.21 0.230  
Stay 0.5689 0.0942 6.04 0.000 1.00

Regression Equation

InfctRsk = -1.160 + 0.5689 Stay

Minitab output with information for x = 10.

Prediction for InfctRsk

Regression Equation

InfctRsk = -1.160 + 0.5689 Stay

Variable Setting
Stay 10
Fit SE Fit 95% CI 95% PI
4.52885 0.134602 (4.25921, 4.79849) (2.45891, 6.59878)

We can make the following observations:

  1. For the interval given under 95% CI, we say with 95% confidence we can estimate that in hospitals in which the average length of stay is 10 days, the mean infection risk is between 4.25921 and 4.79849.
  2. For the interval given under 95% PI, we say with 95% confidence that for any future hospital where the average length of stay is 10 days, the infection risk is between 2.45891 and 6.59878.
  3. The value under Fit is calculated as \(\hat{y} = −1.160 + 0.5689(10) = 4.529\).
  4. The value under SE Fit is the standard error of \(\hat{y}\) and it measures the accuracy of \(\hat{y}\) as an estimate of E(Y ).
  5. Since df = n − 2 = 58 − 2 = 56, the multiplier for 95% confidence is 2.00324. The 95% CI for E(Y) is calculated as \begin{align} &=4.52885 \pm (2.00324 × 0.134602)\\ &= 4.52885 \pm 0.26964\\ &= (4.259, 4.798)\end{align}
  6. Since S = \(\sqrt{MSE}\) = 1.02449, the 95% PI is calculated as \begin{align} &=4.52885 \pm (2.00324 × \sqrt{1.02449^2 + 0.134602^2})\\ &= 4.52885 \pm 2.0699 = (2.459, 6.599)\end{align}

The following figure provides plots showing the difference between the confidence intervals (CI) and prediction intervals (PI) we have been considering.

scatterplots

There are also some things to note:

  1. Notice that the limits for E(Y) are close to the line. The purpose of those limits is to estimate the "true" location of the line.
  2. Notice that the prediction limits (on the right) bracket most of the data. Those limits describe the location of individual y-values.
  3. Notice that the prediction intervals are wider than the confidence intervals. This is something that can be noted by the formulas.