3.3  The Horvitz-Thompson Estimator

Horvitz and Thompson (1952) introduced an unbiased estimator of \(\tau\) that applies to any design, with or without replacement.

\(\pi_i\), i = 1, ... , N, are given positive numbers representing the probability that unit i is included in the sample under a given sampling scheme. The Horvitz-Thompson estimator is:
\(\hat{\tau}_\pi=\sum\limits_{i=1}^\nu \dfrac{y_i}{\pi_i}\)
where \(\nu\) is the number of distinct units in the sample. The Horvitz-Thompson estimator does not depend on the number of times a unit may be selected; each distinct unit of the sample is used only once.
Read Section 6.5 in the text, which reviews the proofs of how the following two formulas are derived.
Note that:
\(E(\hat{\tau}_\pi)=\tau\)
\(Var(\hat{\tau}_\pi)=\sum\limits_{i=1}^N \left( \dfrac{1-\pi_i}{\pi_i}\right)y^2_i + \sum\limits_{i=1}^N \sum\limits_{j\neq i} \left( \dfrac{\pi_{ij}-\pi_i \pi_j}{\pi_i \pi_j}\right) y_i y_j\)
where \(\pi_{ij}\) > 0 denotes the probability that both unit i and unit j are included.
The estimated variance of the Horvitz-Thompson estimator is given by:
\(\hat{V}ar(\hat{\tau}_\pi)=\sum\limits_{i=1}^\nu \left( \dfrac{1-\pi_i}{\pi^2_i} \right) y^2_i + \sum\limits_{i=1}^\nu \sum\limits_{j\neq i} \left( \dfrac{\pi_{ij}-\pi_i \pi_j}{\pi_i \pi_j}\right)\dfrac{1}{\pi_{ij}} y_i y_j\)
An approximate \((1-\alpha)100\%\) CI for \(\tau\) is:
\(\hat{\tau}_\pi \pm t_{\alpha/2} \sqrt{\hat{V}ar(\hat{\tau}_\pi)}\)
where t has \(\nu-1\) df.
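As a sketch, the estimator, its estimated variance, and the interval can be written as short functions (the function names here are mine, not from the text; the t critical value must be supplied, e.g. from a table with \(\nu-1\) df):

```python
import math

def ht_total(y, pi):
    # Horvitz-Thompson point estimate: sum of y_i / pi_i over distinct units
    return sum(yi / pii for yi, pii in zip(y, pi))

def ht_var_hat(y, pi, pi_joint):
    # Estimated variance; pi_joint[(i, j)] with i < j holds the joint
    # inclusion probability of units i and j
    nu = len(y)
    v = sum((1 - pi[i]) / pi[i] ** 2 * y[i] ** 2 for i in range(nu))
    for i in range(nu):
        for j in range(nu):
            if i != j:
                pij = pi_joint[(min(i, j), max(i, j))]
                v += (pij - pi[i] * pi[j]) / (pi[i] * pi[j]) * y[i] * y[j] / pij
    return v

def ht_ci(y, pi, pi_joint, t_crit):
    # Approximate (1 - alpha)100% CI with t on nu - 1 df
    tau_hat = ht_total(y, pi)
    half = t_crit * math.sqrt(ht_var_hat(y, pi, pi_joint))
    return tau_hat - half, tau_hat + half
```

Note that the double sum in the variance runs over ordered pairs \(i \neq j\), so each unordered pair contributes twice.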
Example 3-3: Estimating the Total Number of Palm Trees with the Horvitz-Thompson Estimator
Here we compute the Horvitz-Thompson estimate of the total number of palm trees. Since the sample in that example is drawn with replacement, the n draws are independent, and it is relatively easy to compute the \(\pi_i\)'s.
For sample with replacement, we will compute:
\begin{align}
\pi_i &= \text{the probability of inclusion of the ith unit}\\
&= 1-P(\text{ith unit is not included})\\
&= 1-(1-p_i)^n \\
\end{align}
Recall: units 1, 29, and 36 are selected.
Since \(p_1=0.01\), \(\pi_1=1-(1-0.01)^4=0.0394\), and
\(p_2=0.05\), \(\pi_2=1-(1-0.05)^4=0.1855\)
\(p_3=0.02\), \(\pi_3=1-(1-0.02)^4=0.0776\)
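These inclusion probabilities are easy to check numerically:

```python
n = 4                                   # number of independent draws
p = [0.01, 0.05, 0.02]                  # draw probabilities of the three distinct units
pi = [1 - (1 - pk) ** n for pk in p]    # pi_i = 1 - (1 - p_i)^n
```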
Therefore,
\begin{align}
\hat{\tau}_\pi &= \sum\limits_{i=1}^\nu \dfrac{y_i}{\pi_i}\\
&= \dfrac{14}{0.0394}+\dfrac{50}{0.1855}+\dfrac{25}{0.0776}\\
&= 947.037\\
\end{align}
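The sum above can be verified in one line (using the rounded \(\pi_i\)'s):

```python
pi = [0.0394, 0.1855, 0.0776]   # rounded inclusion probabilities from above
y = [14, 50, 25]                # palm tree counts for the sampled units

tau_hat = sum(yi / pii for yi, pii in zip(y, pi))   # about 947.04
```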
Next, we need to compute the estimated variance, \(\hat{V}ar(\hat{\tau}_\pi)\). For this, we need to compute the \(\pi_{ij}\)'s.
Since
\begin{align}
P(A\cap B) &= P(A)+P(B)-P(A\cup B)\\
&= P(A)+P(B)-[1-P(A^c \cap B^c)]\\
\end{align}
Then we get:
\(\pi_{ij}=\pi_i+\pi_j-[1-(1-p_i-p_j)^n]\)
This means that we have to run through each pair of units:
\(\pi_{12}=0.0394+0.1855-[1-(1-0.01-0.05)^4]=0.00565\)
\(\pi_{13}=0.0394+0.0776-[1-(1-0.01-0.02)^4]=0.00229\)
\(\pi_{23}=0.1855+0.0776-[1-(1-0.05-0.02)^4]=0.01115\)
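These joint probabilities can be verified the same way (using the rounded \(\pi_i\)'s, as the text does):

```python
n = 4
p = [0.01, 0.05, 0.02]            # draw probabilities
pi = [0.0394, 0.1855, 0.0776]     # rounded single inclusion probabilities

# pi_ij = pi_i + pi_j - [1 - (1 - p_i - p_j)^n], for each unordered pair
pij = {(i, j): pi[i] + pi[j] - (1 - (1 - p[i] - p[j]) ** n)
       for i in range(3) for j in range(i + 1, 3)}
```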
Plugging in the values, we obtain:
\(\hat{V}ar(\hat{\tau}_\pi)=92692.9\)
Thus, \(\hat{S}D(\hat{\tau}_\pi)=\sqrt{92692.9}=304.455\)
Here \(\nu\) = the number of distinct units = 3, so the df = \(\nu-1\) = 2.
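The whole calculation can be reproduced in a few lines. This is a sketch that uses the rounded probabilities from the text, which is why it matches the hand computation above:

```python
import math

n = 4                                  # draws, with replacement
p = [0.01, 0.05, 0.02]                 # draw probabilities of the distinct units
y = [14, 50, 25]                       # palm tree counts for those units

# Single inclusion probabilities, rounded as in the text
pi = [round(1 - (1 - pk) ** n, 4) for pk in p]

# Joint inclusion probabilities, rounded as in the text
pij = {(i, j): round(pi[i] + pi[j] - (1 - (1 - p[i] - p[j]) ** n), 5)
       for i in range(3) for j in range(i + 1, 3)}

# Point estimate: sum of y_i / pi_i
tau_hat = sum(yi / pii for yi, pii in zip(y, pi))          # about 947.04

# Estimated variance: the double sum runs over ordered pairs i != j
var_hat = sum((1 - pi[i]) / pi[i] ** 2 * y[i] ** 2 for i in range(3))
for i in range(3):
    for j in range(3):
        if i != j:
            pj = pij[(min(i, j), max(i, j))]
            var_hat += (pj - pi[i] * pi[j]) / (pi[i] * pi[j]) * y[i] * y[j] / pj

sd_hat = math.sqrt(var_hat)            # about 304.5
```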
Try it! Under simple random sampling, does the Horvitz-Thompson estimator reduce to a familiar estimator?
Yes. Under simple random sampling, the inclusion probability of the ith unit is:
\(\pi_i=n/N\)
Thus,
\begin{align}
\hat{\tau}_\pi &= \sum\limits_{i=1}^n \dfrac{y_i}{\pi_i}\\
&= \sum\limits_{i=1}^n \dfrac{y_i}{n} \cdot N\\
&= N \bar{y}\\
\end{align}
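A quick numerical check confirms the reduction to \(N\bar{y}\) (the data and population size here are hypothetical, chosen only for illustration):

```python
y = [3, 7, 5, 9]          # hypothetical SRS of n = 4 observations
n, N = len(y), 20         # hypothetical population size N = 20

pi = n / N                           # under SRS every unit has pi_i = n/N
ht = sum(yi / pi for yi in y)        # Horvitz-Thompson estimate of the total
expansion = N * sum(y) / n           # N * ybar, the expansion estimator
```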