3.4 - Further Example

Example 3-1: Hospital Infection Data

The hospital infection risk dataset consists of a sample of n = 58 hospitals in the east and north-central U.S. (Hospital Infection Data Region 1 and 2 data). The response variable is y = infection risk (percent of patients who get an infection) and the predictor variable is x = average length of stay (in days). Minitab output for a simple linear regression model fit to these data follows:

Regression Analysis: InfctRsk versus Stay

Analysis of Variance

Source	DF	Adj SS	Adj MS	F-Value	P-Value
Regression	1	38.3059	38.3059	36.50	0.000
Stay	1	38.3059	38.3059	36.50	0.000
Error	56	58.7763	1.0496
Lack-of-Fit	54	58.5513	1.0843	9.64	0.098
Pure error	2	0.2250	0.1125
Total	57	97.0822

Model Summary

S	R-Sq	R-Sq (adj)	R-Sq (pred)
1.02449	39.46%	38.38%	35.07%

Coefficients

Term	Coef	SE Coef	T-Value	P-Value	VIF
Constant	-1.160	0.956	-1.21	0.230
Stay	0.5689	0.0942	6.04	0.000	1.00

Regression Equation

InfctRsk = -1.160 + 0.5689 Stay
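As a quick check on the output above, the summary quantities can be recomputed from the sums of squares and degrees of freedom. The following Python sketch is not part of the Minitab output. The variable names are illustrative assumptions, and small differences from the printed values reflect rounding in the table.

```python
import math

# Values copied from the Analysis of Variance table above.
ss_regression, df_regression = 38.3059, 1
ss_error, df_error = 58.7763, 56
ss_lack_of_fit, df_lack_of_fit = 58.5513, 54
ss_pure_error, df_pure_error = 0.2250, 2
ss_total = 97.0822

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error  # MSE

f_value = ms_regression / ms_error   # overall F-statistic
s = math.sqrt(ms_error)              # S = sqrt(MSE)
r_sq = ss_regression / ss_total      # R-Sq as a proportion

# Lack-of-fit F compares the lack-of-fit mean square to the pure-error mean square.
f_lack_of_fit = (ss_lack_of_fit / df_lack_of_fit) / (ss_pure_error / df_pure_error)

# Slope t-statistic from the Coefficients table: Coef / SE Coef.
t_slope = 0.5689 / 0.0942

print(f"F = {f_value:.2f}, S = {s:.5f}, R-Sq = {100 * r_sq:.2f}%")
print(f"Lack-of-fit F = {f_lack_of_fit:.2f}, T (Stay) = {t_slope:.2f}")
```

In simple linear regression the square of the slope's T-Value equals the overall F-Value, which is why both lines of the ANOVA table report the same F up to rounding.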

Minitab output with information for x = 10.

Prediction for InfctRsk

Regression Equation

InfctRsk = -1.160 + 0.5689 Stay

Variable	Setting
Stay	10

Fit	SE Fit	95% CI	95% PI
4.52885	0.134602	(4.25921, 4.79849)	(2.45891, 6.59878)

We can make the following observations:

For the interval given under 95% CI, we can say with 95% confidence that in hospitals where the average length of stay is 10 days, the mean infection risk is between 4.25921 and 4.79849 percent.
For the interval given under 95% PI, we can say with 95% confidence that for an individual future hospital where the average length of stay is 10 days, the infection risk will be between 2.45891 and 6.59878 percent.
The value under Fit is calculated as \(\hat{y} = -1.160 + 0.5689(10) = 4.529\).
The value under SE Fit is the standard error of \(\hat{y}\); it measures the precision of \(\hat{y}\) as an estimate of E(Y).
Since df = n − 2 = 58 − 2 = 56, the multiplier for 95% confidence is 2.00324. The 95% CI for E(Y) is calculated as \begin{align} &= 4.52885 \pm (2.00324 \times 0.134602)\\ &= 4.52885 \pm 0.26964\\ &= (4.259, 4.798)\end{align}
Since S = \(\sqrt{MSE}\) = 1.02449, the 95% PI is calculated as \begin{align} &= 4.52885 \pm (2.00324 \times \sqrt{1.02449^2 + 0.134602^2})\\ &= 4.52885 \pm 2.0699\\ &= (2.459, 6.599)\end{align}
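The two interval calculations above can be reproduced in a few lines. This is a minimal Python sketch rather than anything Minitab produces: the function name `intervals` and its parameter names are assumptions made for illustration, and the t multiplier is taken directly from the text instead of being computed.

```python
import math

def intervals(fit, se_fit, s, t_mult):
    """Return (confidence interval, prediction interval) at one x value.

    fit    : fitted value y-hat
    se_fit : standard error of the fit (SE Fit)
    s      : sqrt(MSE), reported as S in the Model Summary
    t_mult : t multiplier for the chosen confidence level and error df
    """
    ci_half = t_mult * se_fit
    # The PI adds MSE = s^2 under the square root to cover the
    # variation of an individual response around the mean.
    pi_half = t_mult * math.sqrt(s ** 2 + se_fit ** 2)
    conf_int = (fit - ci_half, fit + ci_half)
    pred_int = (fit - pi_half, fit + pi_half)
    return conf_int, pred_int

# Values from the Minitab output for Stay = 10; the multiplier 2.00324
# is the 97.5th percentile of the t distribution with 56 df, as given above.
conf_int, pred_int = intervals(fit=4.52885, se_fit=0.134602,
                               s=1.02449, t_mult=2.00324)

print("95% CI: ({:.3f}, {:.3f})".format(*conf_int))
print("95% PI: ({:.3f}, {:.3f})".format(*pred_int))
```

If SciPy is available, the multiplier could instead be obtained with `scipy.stats.t.ppf(0.975, 56)`.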

The following figure provides plots showing the difference between the confidence intervals (CI) and prediction intervals (PI) we have been considering.

There are also some things to note:

Notice that the limits for E(Y) are close to the line. The purpose of those limits is to estimate the "true" location of the line.
Notice that the prediction limits (on the right) bracket most of the data. Those limits describe the location of individual y-values.
Notice that the prediction intervals are wider than the confidence intervals. This also follows directly from the formulas.
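
To make the last point concrete for this example, compare the two half-widths at x = 10, writing t for the multiplier 2.00324 and using the values reported in the output: \begin{align} \text{CI half-width} &= t \times \text{SE Fit} = 2.00324 \times 0.134602 = 0.26964\\ \text{PI half-width} &= t \times \sqrt{MSE + (\text{SE Fit})^2} = 2.00324 \times \sqrt{1.02449^2 + 0.134602^2} = 2.0699\end{align} Because MSE is positive, \(\sqrt{MSE + (\text{SE Fit})^2}\) is always larger than SE Fit, so the prediction interval is wider than the confidence interval at every value of x.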
