# 3.4 - Further Example

3.4 - Further Example## Example 3-1: Hospital Infection Data

The hospital infection risk dataset consists of a sample of* n* = 58 hospitals in the east and north-central U.S. (Hospital Infection Data Region 1 and 2 data). The response variable is *y* = infection risk (percent of patients who get an infection) and the predictor variable is *x* = average length of stay (in days). Minitab output for a simple linear regression model fit to these data follows:

## Regression Analysis: InfctRsk versus Stay

### Analysis of Variance

Source | DF | Adj SS | Adj Ms | F-Value | P-Value |
---|---|---|---|---|---|

Regression | 1 | 38.3059 | 38.3059 | 36.50 | 0.000 |

Stay | 1 | 38.3059 | 38.3059 | 36.50 | 0.000 |

Error | 56 | 58.7763 | 1.0496 | ||

Lack-of-Fit | 54 | 58.5513 | 1.0843 | 9.64 | 0.098 |

Pure error | 2 | 0.2250 | 0.1125 | ||

Total | 57 | 97.0822 |

### Model Summary

S | R-Sq | R-Sq (adj) | R-Sq (pred) |
---|---|---|---|

1.02449 | 39.46% | 38.38% | 35.07% |

### Coefficients

Term | Coef | SE Coef | T-Value | P-Value | VIF |
---|---|---|---|---|---|

Constant |
-1.160 | 0.956 | -1.21 | 0.230 | |

Stay |
0.5689 | 0.0942 | 6.04 | 0.000 | 1.00 |

### Regression Equation

InfctRsk = -1.160 + 0.5689 Stay

Minitab output with information for *x* = 10.

## Prediction for InfctRsk

### Regression Equation

InfctRsk = -1.160 + 0.5689 Stay

Variable | Setting |
---|---|

Stay | 10 |

Fit | SE Fit | 95% CI | 95% PI |
---|---|---|---|

4.52885 | 0.134602 | (4.25921, 4.79849) | (2.45891, 6.59878) |

We can make the following observations:

- For the interval given under 95% CI, we say with 95% confidence we can estimate that in hospitals in which the average length of stay is 10 days, the mean infection risk is between 4.25921 and 4.79849.
- For the interval given under 95% PI, we say with 95% confidence that for any future hospital where the average length of stay is 10 days, the infection risk is between 2.45891 and 6.59878.
- The value under Fit is calculated as \(\hat{y} = −1.160 + 0.5689(10) = 4.529\).
- The value under SE Fit is the standard error of \(\hat{y}\) and it measures the accuracy of \(\hat{y}\) as an estimate of E(
*Y*). - Since
*df*=*n*− 2 = 58 − 2 = 56, the multiplier for 95% confidence is 2.00324. The 95% CI for E(Y) is calculated as \begin{align} &=4.52885 \pm (2.00324 × 0.134602)\\ &= 4.52885 \pm 0.26964\\ &= (4.259, 4.798)\end{align} - Since S = \(\sqrt{MSE}\) = 1.02449, the 95% PI is calculated as \begin{align} &=4.52885 \pm (2.00324 × \sqrt{1.02449^2 + 0.134602^2})\\ &= 4.52885 \pm 2.0699 = (2.459, 6.599)\end{align}

The following figure provides plots showing the difference between the confidence intervals (CI) and prediction intervals (PI) we have been considering.

There are also some things to note:

- Notice that the limits for E(
*Y*) are close to the line. The purpose of those limits is to estimate the "true" location of the line. - Notice that the prediction limits (on the right) bracket most of the data. Those limits describe the location of individual
*y*-values. - Notice that the prediction intervals are wider than the confidence intervals. This is something that can be noted by the formulas.