5.2 - Comparison of Estimators

Compare the regression estimate to the estimate \(\bar{y}\)

To compare the regression estimate to the estimate \(\bar{y}\) (which does not use the auxiliary information x), we first need the estimated variance of \(\bar{y}\):

\(\hat{V}ar(\bar{y})=\dfrac{N-n}{N}\cdot \dfrac{s^2}{n}\)

For the y values, \(s^2 = (15.11)^2 = 228.31\).

Try it!

  1. What is the estimated variance \(\hat{V}ar(\bar{y})\)?

    \begin{align}
    \hat{V}ar(\bar{y}) &= \dfrac{486-10}{486 \times 10} \cdot 228.31 \\
    &= 22.36\\
    \end{align}

  2. Next, what is an approximate 95% CI for \(\mu\)?

    \(\bar{y} \pm t_{n-1}\sqrt{\hat{V}ar(\bar{y})}\)
    \begin{array}{lcl}
       & = & 76 \pm 2.262 \times \sqrt{22.36} \\
       & = & 76 \pm 10.70
    \end{array}
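The two calculations above can be checked with a short Python sketch. All inputs are the values quoted in the text; the variable names are illustrative, and the critical value 2.262 is the tabled t quantile for \(n-1=9\) degrees of freedom rather than something computed here.

```python
import math

# Values quoted in the text (variable names are illustrative).
N = 486          # population size
n = 10           # sample size
y_bar = 76.0     # sample mean of y
s = 15.11        # sample standard deviation of y
t_crit = 2.262   # tabled t quantile, 95% two-sided, n - 1 = 9 df

# Estimated variance of the sample mean with the finite population correction.
s2 = s ** 2
var_y_bar = (N - n) / N * s2 / n

# Half-width of the approximate 95% confidence interval.
half_width = t_crit * math.sqrt(var_y_bar)

print(f"s^2 = {s2:.2f}")
print(f"estimated Var(y_bar) = {var_y_bar:.2f}")
print(f"95% CI: {y_bar:.0f} +/- {half_width:.2f}")
```

The printed variance and half-width should agree with the hand calculation to two decimal places.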

Recall that the 95% confidence interval using the regression estimate is 80.63 \(\pm\) 6.28, a much shorter interval.

The regression estimate is therefore more precise than \(\bar{y}\).

There is one more estimator we can consider: the ratio estimator.

Compare \(\hat{\mu}_L\) to the ratio estimator \(\hat{\mu}_r\)

Minitab was used to find the mean and standard deviation of X and Y:

Variable   N   Mean   StDev  SE Mean
X         10  46.00   16.58     5.24
Y         10  76.00   15.11     4.78

The ratio estimate is inappropriate for this example. However, as a counterexample, we can compute the variance of the ratio estimate from the following Minitab printout and compare it to that of the regression estimate.

 X    Y    Y - rX
39   65     0.572
43   78     6.964
21   52    17.308
64   82   -23.728
57   92    -2.164
47   89    11.356
28   73    26.744
75   98   -25.900
34   56    -0.168
52   75   -10.904

The uncorrected sum of squares of Y - rX is 2550.03.
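The residual column and its sum of squares can be reproduced with a few lines of Python. This is a sketch under one assumption: the printed residuals appear to use the ratio rounded to three decimals, r = 1.652, so the code rounds r the same way. The variable names are illustrative.

```python
# Sample data from the printout above.
x = [39, 43, 21, 64, 57, 47, 28, 75, 34, 52]
y = [65, 78, 52, 82, 92, 89, 73, 98, 56, 75]

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

# Assumption: r is rounded to three decimals to match the printed column.
r = round(y_bar / x_bar, 3)

# Residuals about the ratio line y = r * x (a line through the origin).
residuals = [yi - r * xi for xi, yi in zip(x, y)]

# "Uncorrected" means the residuals are squared as-is, without
# subtracting their mean first.
ss_uncorrected = sum(e ** 2 for e in residuals)

for xi, yi, e in zip(x, y, residuals):
    print(f"{xi:3d} {yi:4d} {e:9.3f}")
print(f"sum of squares = {ss_uncorrected:.2f}")
```

Using the unrounded ratio 76/46 changes the residuals slightly in the later decimals, so small differences from the printout are expected in that case.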

Note!

For the Calculus Scores example, we should not use the ratio estimator \(\hat{\mu}_r\) because the p-value for the constant term in the regression is 0.001. This indicates that the intercept differs significantly from zero, so the fitted line does not pass through the origin and the ratio estimate is not appropriate. For the purposes of a counterexample, we work it out here anyway:

\(\hat{\mu}_r=r\mu_x=\dfrac{\bar{y}}{\bar{x}}\cdot \mu_x=\dfrac{76}{46}\cdot 52=85.91\)

Next, we need the variance of the ratio estimate, which requires the mean squared residual \(s^2_r\) about the ratio line. Dividing the sum of squares from the Minitab output by \(n-1\) gives:

\(s^2_r=\dfrac{1}{10-1} \sum\limits_{i=1}^{10} (y_i-rx_i)^2=283.33\) (this is huge!)

Now we can compute the variance:

Try it!

What is the variance of \(\hat{\mu}_r\)?

\begin{align}
\hat{V}ar(\hat{\mu}_r) &=\dfrac{N-n}{N}\cdot \dfrac{s^2_r}{n}\\
&= \dfrac{486-10}{486}\cdot \dfrac{283.33}{10}=27.75\\
\end{align}

Now we can compute a 95% confidence interval for \(\mu\).

Try it!

What is an approximate 95% confidence interval for \(\mu\) using the ratio estimate?

\(\hat{\mu}_r \pm t_{n-1}\sqrt{\hat{V}ar(\hat{\mu}_r)}\)
\begin{array}{lcl}
   & = & 85.91 \pm 2.262 \times \sqrt{27.75} \\
   & = & 85.91 \pm 11.92
\end{array}
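The ratio-estimator steps can be strung together in a short Python sketch. It takes the sum of squares 2550.03 from the Minitab printout as given rather than recomputing it, and uses the tabled t quantile 2.262 for 9 degrees of freedom; the variable names are illustrative.

```python
import math

# Values quoted in the text (variable names are illustrative).
N = 486              # population size
n = 10               # sample size
mu_x = 52            # known population mean of x
x_bar, y_bar = 46.0, 76.0
ss_resid = 2550.03   # uncorrected sum of squares of y - r*x (Minitab printout)
t_crit = 2.262       # tabled t quantile, 95% two-sided, n - 1 = 9 df

# Ratio estimate of the population mean of y.
r = y_bar / x_bar
mu_r = r * mu_x

# Mean squared residual about the ratio line, then the estimated variance
# of the ratio estimate with the finite population correction.
s2_r = ss_resid / (n - 1)
var_mu_r = (N - n) / N * s2_r / n

half_width = t_crit * math.sqrt(var_mu_r)

print(f"mu_r = {mu_r:.2f}")
print(f"s2_r = {s2_r:.2f}")
print(f"estimated Var(mu_r) = {var_mu_r:.2f}")
print(f"95% CI: {mu_r:.2f} +/- {half_width:.2f}")
```

Because the inputs are rounded values from the printout, the last printed digit of \(s^2_r\) may differ from the hand calculation by one unit.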

We can see that, when it is used in an inappropriate situation, the ratio estimate is even worse than \(\bar{y}\): its half-width of 11.92 exceeds the 10.70 obtained for \(\bar{y}\).

Both intervals are much wider than the one for the regression estimate, whose half-width is 6.28.
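To see all three estimators side by side, the half-widths can be recomputed from the raw data. The sketch below makes several assumptions: the regression half-width uses the usual regression-estimator variance, \(\frac{N-n}{N}\cdot\frac{MSE}{n}\) with MSE on \(n-2\) degrees of freedom, which this section quotes but does not restate; the t quantiles 2.262 (9 df) and 2.306 (8 df) are tabled values; and all names are illustrative. Results are computed from unrounded intermediate values, so they can differ from the hand calculations in the last decimal.

```python
import math

x = [39, 43, 21, 64, 57, 47, 28, 75, 34, 52]
y = [65, 78, 52, 82, 92, 89, 73, 98, 56, 75]
N = 486                      # population size
n = len(x)
fpc = (N - n) / N            # finite population correction
T_9, T_8 = 2.262, 2.306      # tabled t quantiles for 9 and 8 df

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# 1. Sample mean: variance based on s^2 of y.
s2_y = syy / (n - 1)
hw_mean = T_9 * math.sqrt(fpc * s2_y / n)

# 2. Ratio estimator: residuals about the line through the origin.
r = y_bar / x_bar
s2_r = sum((yi - r * xi) ** 2 for xi, yi in zip(x, y)) / (n - 1)
hw_ratio = T_9 * math.sqrt(fpc * s2_r / n)

# 3. Regression estimator (assumed formula): residuals about the
#    least-squares line, with n - 2 degrees of freedom.
b = sxy / sxx
mse = (syy - b * sxy) / (n - 2)
hw_reg = T_8 * math.sqrt(fpc * mse / n)

for name, hw in [("sample mean", hw_mean),
                 ("ratio", hw_ratio),
                 ("regression", hw_reg)]:
    print(f"{name:12s} half-width = {hw:.2f}")
```

The ordering of the three half-widths is the point of the comparison: the regression estimator gives the shortest interval and the misapplied ratio estimator the longest.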

The moral of the story is, "Use the right model!"