4.2 - Selecting Sample Size and Small Population Example for Ratio Estimate

Example 4-2: Estimating the average number of trees per acre


(See p. 196 of Scheaffer, Mendenhall, and Ott.)

Part 1: Estimate the average number of trees per acre

Suppose we want to estimate the average number of trees per acre on a 1000-acre plantation. The investigator samples 10 one-acre plots by simple random sampling and counts the number of trees (y) on each plot. She also has aerial photographs of the plantation from which she can estimate the number of trees (x) on each plot of the entire plantation. Hence, she knows \(\mu_x\) = 19.7, and since the two counts are approximately proportional through the origin, she uses a ratio estimate to estimate \(\mu_y\).

Plot   Actual number per acre, \(y_i\)   Aerial estimate, \(x_i\)   \(y_i - r x_i\)
1      25      23      0.5625
2      15      14      0.1250
3      22      20      0.7500
4      24      25     -2.5625
5      13      12      0.2500
6      18      18     -1.1250
7      35      30      3.1250
8      30      27      1.3125
9      10       8      1.5000
10     29      31     -3.9375
Mean   22.10   20.80

(Here \(r = \bar{y}/\bar{x} = 22.10/20.80 = 1.0625\).)

Here is a scatterplot of these data:

[Scatterplot: actual number of trees per acre (\(y\)) versus aerial estimate (\(x\))]

The Minitab output for regression is given below.

Coefficients

Predictor   Coef     SE Coef   T-Value   P-Value
Constant    1.239    2.007      0.62     0.554
X           1.0029   0.0909    11.03     0.000

Regression Equation

Y = 1.24 + 1.00 X

The scatterplot of the data shows a linear relationship between y and x. Moreover, the regression analysis does not reject the hypothesis that the intercept is zero (p-value = 0.554 > 0.05), so a line through the origin is plausible. Therefore, it may be appropriate to use the ratio estimate.
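As a quick check of the Minitab fit, the same least-squares regression can be reproduced with a short Python sketch. This is only an illustration, not part of the original example; numpy and scipy, and all variable names below, are my own choices.

```python
import numpy as np
from scipy import stats

# Aerial estimates (x) and actual counts (y) for the 10 sampled one-acre plots
x = np.array([23, 14, 20, 25, 12, 18, 30, 27, 8, 31], dtype=float)
y = np.array([25, 15, 22, 24, 13, 18, 35, 30, 10, 29], dtype=float)

n = len(x)
x_bar, y_bar = x.mean(), y.mean()

# Ordinary least-squares slope and intercept
Sxx = np.sum((x - x_bar) ** 2)
Sxy = np.sum((x - x_bar) * (y - y_bar))
slope = Sxy / Sxx
intercept = y_bar - slope * x_bar

# Residual standard error and standard error of the intercept
resid = y - (intercept + slope * x)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))
se_intercept = s * np.sqrt(1.0 / n + x_bar ** 2 / Sxx)

# Two-sided t-test of H0: intercept = 0
t_int = intercept / se_intercept
p_int = 2 * stats.t.sf(abs(t_int), df=n - 2)

print(f"fitted line: y = {intercept:.3f} + {slope:.3f} x")
print(f"intercept test: t = {t_int:.2f}, p = {p_int:.3f}")  # a large p is consistent with a line through the origin
```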


Part 2: Estimate the average number of trees per acre (Computation)

N = 1000 (plantation size)
n = 10 (taken by s.r.s.)
\(y_i\) = the actual count of trees in the 1-acre plots, i = 1, 2, ... , 10.
\(x_i\) = the aerial estimate for each plot
\(\bar{y}=22.10\)
\(\bar{x}=20.80\)
\(\mu_x\) is given to be 19.70

\(\hat{\mu}_r=\dfrac{\bar{y}}{\bar{x}} \cdot \mu_x=\dfrac{22.10}{20.80}\cdot 19.70=20.93\)

\(s^2_r=\dfrac{1}{10-1}\sum\limits_{i=1}^{10}\left(y_i-\dfrac{22.10}{20.80}x_i\right)^2=4.2\)

\(\hat{V}ar(\hat{\mu}_r)=\dfrac{N-n}{N}\cdot \dfrac{s^2_r}{n}=\dfrac{1000-10}{1000}\cdot \dfrac{4.2}{10}=0.4158\)

\(\hat{S}D(\hat{\mu}_r)=\sqrt{0.4158}=0.6448\)

Answer

The approximate 95% confidence interval for \(\mu_y\) is:

\begin{align}
\hat{\mu}_r \pm t_9 \cdot \hat{S}D(\hat{\mu}_r)&=20.93 \pm 2.262\cdot0.6448\\
&=20.93 \pm 1.46\\
\end{align}
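The computations in Parts 1 and 2 can also be reproduced directly. The sketch below assumes numpy and scipy and simply follows the formulas above; it is not taken from any particular textbook code.

```python
import numpy as np
from scipy import stats

N = 1000          # number of one-acre plots in the plantation
mu_x = 19.7       # known population mean of the aerial counts
x = np.array([23, 14, 20, 25, 12, 18, 30, 27, 8, 31], dtype=float)   # aerial estimates
y = np.array([25, 15, 22, 24, 13, 18, 35, 30, 10, 29], dtype=float)  # actual counts
n = len(y)

r = y.mean() / x.mean()                    # sample ratio, 22.10 / 20.80 = 1.0625
mu_r_hat = r * mu_x                        # ratio estimate of mu_y, about 20.93

s2_r = np.sum((y - r * x) ** 2) / (n - 1)  # about 4.23 (rounded to 4.2 in the text)
var_hat = (N - n) / N * s2_r / n           # estimated variance of mu_r_hat, about 0.42
sd_hat = np.sqrt(var_hat)                  # about 0.65

t9 = stats.t.ppf(0.975, df=n - 1)          # 2.262
lower, upper = mu_r_hat - t9 * sd_hat, mu_r_hat + t9 * sd_hat
print(f"mu_r_hat = {mu_r_hat:.2f}, approximate 95% CI = ({lower:.2f}, {upper:.2f})")
```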


Part 3: Sample size needed to estimate \(\mu_y\)

We want to find the sample size needed to estimate \(\mu_y\) to within a specified margin of error when the ratio estimator is used.

Derivation

Let d denote the margin of error of the 100(1 - \(\alpha\))% confidence interval for \(\mu_y\). Then we know that:

\(t\cdot\sqrt{\dfrac{N-n}{N}\cdot \dfrac{s^2_r}{n}}=d\)
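Squaring both sides and rearranging (the intermediate algebra, which the text skips):

\begin{align}
d^2 &= t^2\cdot\dfrac{N-n}{N}\cdot \dfrac{s^2_r}{n}\\
Nnd^2 &= t^2 s^2_r\,(N-n)\\
n\left(Nd^2+t^2 s^2_r\right) &= N t^2 s^2_r\\
\end{align}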

Solving for \(n\), the required sample size is:

\(n=\dfrac{N\cdot t^2 \cdot s^2_r}{t^2 \cdot s^2_r+Nd^2}\)

For the tree count with the aerial estimate example, if we want to estimate \(\mu_y\) to within 1 tree with 95% confidence, how many plots should we sample if the ratio estimate is to be used?

\(n=\dfrac{1000 \cdot 1.96^2 \cdot 4.2}{1.96^2 \cdot 4.2+1000 \cdot 1^2}=15.879\)

Round up to \(n = 16\) one-acre plots.

One can also use the refined, iterative method instead of approximating \(t\) by 1.96: replace 1.96 with the \(t\) quantile whose degrees of freedom come from the previous step's sample size, and repeat until \(n\) stops changing. Starting from \(n = 16\), use \(t_{15} = 2.131\):

\(n=\dfrac{1000 \cdot 2.131^2 \cdot 4.2}{2.131^2 \cdot 4.2+1000 }=18.72\), so \(n=19\).

Updating to \(t_{18} = 2.101\):

\(n=\dfrac{1000 \cdot 2.101^2 \cdot 4.2}{2.101^2 \cdot 4.2+1000 }=18.2\), so \(n=19\) again, and the iteration stops.

Answer:

n = 19.
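Both the normal-approximation answer and the refined, iterated answer can be checked with a short sketch. The function name and the stopping rule below are my own choices; only the sample-size formula itself comes from the text.

```python
import math
from scipy import stats

def ratio_sample_size(N, s2_r, d, t):
    """Required n for the ratio estimator: N t^2 s_r^2 / (t^2 s_r^2 + N d^2), rounded up."""
    return math.ceil(N * t**2 * s2_r / (t**2 * s2_r + N * d**2))

N, s2_r, d = 1000, 4.2, 1.0

# Normal approximation, t = 1.96
n = ratio_sample_size(N, s2_r, d, t=1.96)
print(n)                                   # 16

# Refined method: replace 1.96 by the t quantile with degrees of freedom from
# the previous step's n, and iterate until the answer stops changing
while True:
    t = stats.t.ppf(0.975, df=n - 1)
    n_new = ratio_sample_size(N, s2_r, d, t)
    if n_new == n:
        break
    n = n_new
print(n)                                   # 19
```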

When is using the ratio estimate advantageous?

For estimating \(\mu_y\), we want to compare \(\bar{y}\) with \(\hat{\mu}_r\) under simple random sampling. For large samples, \(\hat{\mu}_r\) is roughly unbiased, so we can compare the two estimators by their variances. Recall that \(\hat{V}ar(\bar{y})=\left(1-\dfrac{n}{N}\right)\dfrac{s^2_y}{n}\), where \(s^2_y\) is the sample variance of the \(y\)'s, and \(\hat{V}ar(\hat{\mu}_r)=\dfrac{N-n}{N}\cdot \dfrac{s^2_r}{n}\), where \(s^2_r=s^2_y+r^2s^2_x-2r\hat{\rho} s_x s_y\). The ratio of the estimated variances is then:

\(\dfrac{\hat{V}ar(\bar{y})}{\hat{V}ar(\hat{\mu}_r)}=\dfrac{s^2_y}{s^2_y+r^2s^2_x-2r\hat{\rho} s_x s_y}\)

where \(r=\dfrac{\bar{y}}{\bar{x}}\) and \(\hat{\rho}\) is the sample correlation between X and Y.

Since \(\hat{V}ar(\bar{y})>\hat{V}ar(\hat{\mu}_r)\) exactly when \(\hat{\rho}>\dfrac{1}{2}\cdot \dfrac{s_x/\bar{x}}{s_y/\bar{y}}\), it is advantageous to use \(\hat{\mu}_r\) whenever the sample correlation exceeds half the ratio of the coefficients of variation of \(x\) and \(y\).
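For the tree-count data of Example 4-2, this condition can be checked numerically. The sketch below (my own check, using numpy) computes the sample correlation, the threshold, and both estimated variances.

```python
import numpy as np

N = 1000
x = np.array([23, 14, 20, 25, 12, 18, 30, 27, 8, 31], dtype=float)   # aerial estimates
y = np.array([25, 15, 22, 24, 13, 18, 35, 30, 10, 29], dtype=float)  # actual counts
n = len(y)

r = y.mean() / x.mean()
s_x, s_y = x.std(ddof=1), y.std(ddof=1)
rho = np.corrcoef(x, y)[0, 1]

# Threshold: half the ratio of the coefficients of variation of x and y
threshold = 0.5 * (s_x / x.mean()) / (s_y / y.mean())

# Estimated variances of the two estimators of mu_y
var_ybar = (1 - n / N) * s_y**2 / n
s2_r = s_y**2 + r**2 * s_x**2 - 2 * r * rho * s_x * s_y
var_mur = (1 - n / N) * s2_r / n

print(f"rho = {rho:.3f}, threshold = {threshold:.3f}")       # rho is well above the threshold
print(f"Var(ybar) = {var_ybar:.3f}, Var(mu_r_hat) = {var_mur:.3f}")
```

For these data the correlation (about 0.97) is far above the threshold (about 0.51), so the ratio estimator has a much smaller estimated variance than \(\bar{y}\).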

Example 4-3: Illustrating bias with a small population

(See Section 7.2 of the textbook.)

This is an artificial small population example that we will use to demonstrate how to compute the bias and MSE of the ratio estimator.

Site \(i\)      1     2     3     4
Nets, \(x_i\)   4     5     8     5
Fish, \(y_i\)   200   300   500   400

\(\tau_x = 22\), \(\tau_y = 1400\)

All possible simple random samples (s.r.s.) of size n = 2, each with probability 1/6:

Samples \(\hat{\tau}_r=\dfrac{\bar{y}}{\bar{x}}\cdot \tau_x\)
(1, 2) \(\hat{\tau}_r=\dfrac{(200+300)/2}{(4+5)/2}\cdot 22=1222\)
(1, 3) \(\hat{\tau}_r=\dfrac{(200+500)/2}{(4+8)/2}\cdot 22=1283\)
(1, 4) 1467
(2, 3) 1354
(2, 4) 1540
(3, 4) 1523

\begin{align}
E(\hat{\tau}_r)&=\left(\dfrac{1}{6}\cdot 1222\right)+\left(\dfrac{1}{6}\cdot 1283\right)+\left(\dfrac{1}{6}\cdot 1467\right)+\left(\dfrac{1}{6}\cdot 1354\right)+\left(\dfrac{1}{6}\cdot 1540\right)+\left(\dfrac{1}{6}\cdot 1523\right)\\
&=1398.17 \neq \tau_y =1400\\
\end{align}

Thus, there is a very slight bias.

\begin{align}
MSE &=\sum\limits_{s} \left(\hat{\tau}_{r,s}-\tau_y\right)^2 \cdot P(s)\\
&= \left( (1222-1400)^2 \cdot \dfrac{1}{6}\right)+\left( (1283-1400)^2 \cdot \dfrac{1}{6}\right)+\left( (1467-1400)^2 \cdot \dfrac{1}{6}\right)\\
&+\left( (1354-1400)^2 \cdot \dfrac{1}{6}\right)+\left( (1540-1400)^2 \cdot \dfrac{1}{6}\right)+\left( (1523-1400)^2 \cdot \dfrac{1}{6}\right)\\
&= 14451.2\\
\end{align}

Because the estimator is slightly biased, the MSE is not equal to the variance; in fact MSE = Var + (bias)\(^2\).


On the other hand, suppose one uses \(\hat{\tau}=N \cdot \bar{y}\) instead:

Samples \(\hat{\tau}=N \cdot \bar{y}\)
(1, 2) 4 × (200 + 300) / 2 = 1000
(1, 3) 4 × (200 + 500) / 2 = 1400
(1, 4) 4 × (200 + 400) / 2 = 1200
(2, 3) 4 × (300 + 500) / 2 = 1600
(2, 4) 4 × (300 + 400) / 2 = 1400
(3, 4) 4 × (500 + 400) / 2 = 1800

\begin{align}
E(\hat{\tau})&=\left(\dfrac{1}{6}\cdot 1000\right)+\left(\dfrac{1}{6}\cdot 1400\right)+\left(\dfrac{1}{6}\cdot 1200\right)+\left(\dfrac{1}{6}\cdot 1600\right)+\left(\dfrac{1}{6}\cdot 1400\right)+\left(\dfrac{1}{6}\cdot 1800\right)\\
&=1400 = \tau_y, \text{ so } \hat{\tau} \text{ is unbiased.}\\
\end{align} 

\begin{align}
MSE &=\sum\limits_{s} \left(\hat{\tau}_{s}-\tau_y\right)^2 \cdot P(s)\\
&= \left( (1000-1400)^2 \cdot \dfrac{1}{6}\right)+\left( (1400-1400)^2 \cdot \dfrac{1}{6}\right)+\left( (1200-1400)^2 \cdot \dfrac{1}{6}\right)\\
&+\left( (1600-1400)^2 \cdot \dfrac{1}{6}\right)+\left( (1400-1400)^2 \cdot \dfrac{1}{6}\right)+\left( (1800-1400)^2 \cdot \dfrac{1}{6}\right)\\
&= 66667\\
\end{align}

This MSE (66,667) is much larger than that of \(\hat{\tau}_r\) (14,451.2): in this example the ratio estimator, despite its slight bias, is considerably more accurate than the unbiased estimator \(N\bar{y}\).
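The whole enumeration is easy to reproduce. The sketch below (plain Python with itertools; the structure is my own) lists every size-2 sample and computes the expectation, bias, and MSE of both estimators. Small differences from the values above arise because the text rounds each ratio estimate to a whole number before averaging.

```python
from itertools import combinations

# Artificial population: nets (x) and fish caught (y) at four sites
x = [4, 5, 8, 5]
y = [200, 300, 500, 400]
N, n = 4, 2
tau_x, tau_y = sum(x), sum(y)                 # 22 and 1400

samples = list(combinations(range(N), n))     # all 6 equally likely s.r.s. samples of size 2
p = 1 / len(samples)

ratio_ests, expansion_ests = [], []
for s in samples:
    x_bar = sum(x[i] for i in s) / n
    y_bar = sum(y[i] for i in s) / n
    ratio_ests.append(y_bar / x_bar * tau_x)  # tau_hat_r = (y_bar / x_bar) * tau_x
    expansion_ests.append(N * y_bar)          # tau_hat   = N * y_bar

for name, ests in [("ratio estimator", ratio_ests), ("N * y_bar", expansion_ests)]:
    mean = sum(ests) * p
    mse = sum((e - tau_y) ** 2 for e in ests) * p
    print(f"{name}: E = {mean:.1f}, bias = {mean - tau_y:.1f}, MSE = {mse:.0f}")
# ratio estimator: slightly biased, MSE around 14,400
# N * y_bar:       unbiased,        MSE around 66,667
```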