3.3 - The Horvitz-Thompson Estimator

Horvitz and Thompson (1952) introduced an unbiased estimator of the population total \(\tau\) that applies to any probability sampling design, with or without replacement.

Horvitz-Thompson estimator

Let \(\pi_i\), \(i = 1, \ldots, N\), be given positive numbers, where \(\pi_i\) is the probability that unit \(i\) is included in the sample under a given sampling scheme. The Horvitz-Thompson estimator is:

\(\hat{\tau}_\pi=\sum\limits_{i=1}^\nu \dfrac{y_i}{\pi_i}\)

where \(\nu\) is the number of distinct units in the sample. The Horvitz-Thompson estimator does not depend on the number of times a unit is selected: each distinct unit in the sample is used only once.
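The definition above can be sketched in a few lines of Python. This is only an illustration: the function name `horvitz_thompson_total`, the unit labels, the responses, and the inclusion probabilities are all made up for the sketch and do not come from the text. It shows the one point that is easy to get wrong, namely that a unit drawn more than once still contributes a single term.

```python
def horvitz_thompson_total(y, pi):
    """Sum of y_i / pi_i over the distinct sampled units."""
    if len(y) != len(pi):
        raise ValueError("y and pi must have the same length")
    if any(p <= 0.0 or p > 1.0 for p in pi):
        raise ValueError("each inclusion probability must lie in (0, 1]")
    return sum(y_i / pi_i for y_i, pi_i in zip(y, pi))


# Hypothetical sample of four draws in which unit "b" was drawn twice.
draws = ["a", "b", "b", "c"]
y_by_unit = {"a": 10.0, "b": 20.0, "c": 30.0}   # made-up responses
pi_by_unit = {"a": 0.5, "b": 0.25, "c": 0.5}    # made-up inclusion probabilities

# Keep each distinct unit once, preserving first-seen order.
distinct_units = list(dict.fromkeys(draws))

estimate = horvitz_thompson_total(
    [y_by_unit[u] for u in distinct_units],
    [pi_by_unit[u] for u in distinct_units],
)
print(estimate)
```

Here \(\nu = 3\) even though four draws were made, so the sum has three terms.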

Read Section 6.5 in the text, which gives the proofs of the following two formulas.

Note that:

\(E(\hat{\tau}_\pi)=\tau\)

\(Var(\hat{\tau}_\pi)=\sum\limits_{i=1}^N \left( \dfrac{1-\pi_i}{\pi_i}\right)y^2_i + \sum\limits_{i=1}^N \sum\limits_{j\neq i} \left( \dfrac{\pi_{ij}-\pi_i \pi_j}{\pi_i \pi_j}\right) y_i y_j\)

where \(\pi_{ij}\) > 0 denotes the probability that both unit i and unit j are included.

The estimated variance of the Horvitz-Thompson estimator is given by:

\(\hat{V}ar(\hat{\tau}_\pi)=\sum\limits_{i=1}^\nu \left( \dfrac{1-\pi_i}{\pi^2_i} \right) y^2_i + \sum\limits_{i=1}^\nu \sum\limits_{j\neq i} \left( \dfrac{\pi_{ij}-\pi_i \pi_j}{\pi_i \pi_j}\right)\dfrac{1}{\pi_{ij}} y_i y_j\)

where, as before, \(\pi_{ij}\) > 0 denotes the probability that both unit i and unit j are included, and the sums now run over the \(\nu\) distinct units in the sample.
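The estimated variance translates directly into a double loop. The sketch below is an illustration under stated assumptions: the function name `ht_variance_estimate` and the layout of `pi_joint` as a square nested list (diagonal unused) are choices made here, not notation from the text. As a sanity check it uses a hypothetical simple random sample without replacement of n = 2 from N = 4, where \(\pi_i = n/N\) and \(\pi_{ij} = n(n-1)/[N(N-1)]\), and compares the result with the usual formula \(N^2(1-n/N)s^2/n\).

```python
def ht_variance_estimate(y, pi, pi_joint):
    """Estimated variance of the Horvitz-Thompson estimator.

    y        : responses of the nu distinct sampled units
    pi       : their inclusion probabilities
    pi_joint : nu x nu nested list; pi_joint[i][j] is the joint inclusion
               probability of units i and j (diagonal entries are ignored)
    """
    nu = len(y)
    single = sum((1.0 - pi[i]) / pi[i] ** 2 * y[i] ** 2 for i in range(nu))
    cross = 0.0
    for i in range(nu):
        for j in range(nu):
            if i != j:
                cross += ((pi_joint[i][j] - pi[i] * pi[j])
                          / (pi[i] * pi[j] * pi_joint[i][j])
                          * y[i] * y[j])
    return single + cross


# Hypothetical check: simple random sampling without replacement.
N, n = 4, 2
y = [3.0, 5.0]                                  # made-up responses
pi = [n / N] * n
pij = n * (n - 1) / (N * (N - 1))
pi_joint = [[None, pij], [pij, None]]

var_hat = ht_variance_estimate(y, pi, pi_joint)

# Usual SRS variance estimate of N * ybar, for comparison.
ybar = sum(y) / n
s2 = sum((v - ybar) ** 2 for v in y) / (n - 1)
srs_var = N ** 2 * (1 - n / N) * s2 / n
print(var_hat, srs_var)
```

The two numbers agree up to floating-point rounding in this small case, which is a useful check that the cross term has the right sign and divisor.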

An approximate \(100(1-\alpha)\%\) confidence interval for \(\tau\) is:

\(\hat{\tau}_\pi \pm t_{\alpha/2} \sqrt{\hat{V}ar(\hat{\tau}_\pi)}\)

where \(t_{\alpha/2}\) is the upper \(\alpha/2\) point of the t distribution with \(\nu\) - 1 degrees of freedom.

Example 3-3: Estimating the Total Number of Palm Trees with the Horvitz-Thompson Estimator

We now compute the Horvitz-Thompson estimate of the total number of palm trees. Because the sample in that example was drawn with replacement, the n draws are independent, so it is relatively easy to compute the \(\pi_{i}\)'s.

For sampling with replacement, where \(p_i\) is the probability of selecting unit i on any single draw, we compute:

\begin{align}
\pi_i &= \text{the probability of inclusion of the ith unit}\\
&= 1-P(\text{ith unit is not included})\\
&= 1-(1-p_i)^n \\
\end{align}

Recall that units 1, 29, and 36 were selected in \(n = 4\) draws. Label these three distinct units \(i = 1, 2, 3\).

Since \(p_1=0.01\), \(\pi_1=1-(1-0.01)^4=0.0394\). Similarly,

\(p_2=0.05\), \(\pi_2=1-(1-0.05)^4=0.1855\)

\(p_3=0.02\), \(\pi_3=1-(1-0.02)^4=0.0776\)
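The three inclusion probabilities can be reproduced with a short Python sketch. The helper name `inclusion_prob_with_replacement` is hypothetical; the draw probabilities and n = 4 are the ones used in the calculation above.

```python
def inclusion_prob_with_replacement(p, n):
    """P(unit appears at least once in n independent draws) = 1 - (1 - p)^n."""
    return 1.0 - (1.0 - p) ** n


n = 4                               # number of draws in the example
draw_probs = [0.01, 0.05, 0.02]     # p_1, p_2, p_3 from the example

pi = [inclusion_prob_with_replacement(p, n) for p in draw_probs]
for i, value in enumerate(pi, start=1):
    print(f"pi_{i} = {value:.4f}")
```

Rounded to four decimal places, the values match the hand calculation.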

Therefore,

\begin{align}
\hat{\tau}_\pi &= \sum\limits_{i=1}^\nu \dfrac{y_i}{\pi_i}\\
&= \dfrac{14}{0.0394}+\dfrac{50}{0.1855}+\dfrac{25}{0.0776}\\
&= 947.037\\
\end{align}

Next, we need to compute the estimated variance, \(\hat{V}ar(\hat{\tau}_\pi)\). For this, we need to compute \(\pi_{ij}\) .

Let A be the event that unit i is included and B the event that unit j is included. Since

\begin{align}
P(A\cap B) &= P(A)+P(B)-P(A\cup B)\\
&= P(A)+P(B)-[1-P(A^c \cap B^c)]\\
\end{align}

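Because the draws are independent, the probability that neither unit i nor unit j is selected in any of the n draws is \((1-p_i-p_j)^n\). Then we get: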

\(\pi_{ij}=\pi_i+\pi_j-[1-(1-p_i-p_j)^n]\)

We apply this formula to each pair of distinct units in the sample:

\(\pi_{12}=0.0394+0.1855-[1-(1-0.01-0.05)^4]=0.00565\)

\(\pi_{13}=0.0394+0.0776-[1-(1-0.01-0.02)^4]=0.00229\)

\(\pi_{23}=0.1855+0.0776-[1-(1-0.05-0.02)^4]=0.01115\)
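The pairwise calculation is easy to automate. In the sketch below the function name `joint_inclusion_prob` is hypothetical, and the code works at full floating-point precision rather than with the rounded \(\pi_i\) used by hand, so the third significant digit can differ slightly from the values above (for example, \(\pi_{13}\) comes out near 0.00233 rather than 0.00229).

```python
from itertools import combinations


def joint_inclusion_prob(p_i, p_j, n):
    """pi_ij = pi_i + pi_j - [1 - (1 - p_i - p_j)^n] for n independent draws."""
    pi_i = 1.0 - (1.0 - p_i) ** n
    pi_j = 1.0 - (1.0 - p_j) ** n
    either = 1.0 - (1.0 - p_i - p_j) ** n   # P(unit i or unit j is included)
    return pi_i + pi_j - either


n = 4
draw_probs = {1: 0.01, 2: 0.05, 3: 0.02}    # p_1, p_2, p_3 from the example

pi_joint = {
    (i, j): joint_inclusion_prob(draw_probs[i], draw_probs[j], n)
    for i, j in combinations(sorted(draw_probs), 2)
}
for (i, j), value in pi_joint.items():
    print(f"pi_{i}{j} = {value:.5f}")
```

Only the pairs with i < j are stored, since \(\pi_{ij} = \pi_{ji}\).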

Plugging these values into the estimated variance formula, we obtain:

\(\hat{V}ar(\hat{\tau}_\pi)=92692.9\)

Thus, \(\hat{S}D(\hat{\tau}_\pi)=\sqrt{92692.9}=304.455\)

Since the number of distinct units is \(\nu\) = 3, the t distribution used for the confidence interval has \(\nu\) - 1 = 2 degrees of freedom.
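The whole example can be checked end to end with the sketch below. It deliberately uses the rounded \(\pi_i\) and \(\pi_{ij}\) from the hand calculation so that the results line up with the figures quoted above; full-precision inputs would give slightly different numbers. The 95% confidence level and the tabulated critical value 4.303 for 2 degrees of freedom are assumptions made for this illustration, since the example does not fix a level.

```python
import math

# Values from the example, rounded as in the hand calculation.
y = [14.0, 50.0, 25.0]
pi = [0.0394, 0.1855, 0.0776]
pi_pairs = {(0, 1): 0.00565, (0, 2): 0.00229, (1, 2): 0.01115}


def joint(i, j):
    """Look up pi_ij regardless of the order of i and j."""
    return pi_pairs[(min(i, j), max(i, j))]


nu = len(y)

# Horvitz-Thompson estimate of the total.
tau_hat = sum(y[i] / pi[i] for i in range(nu))

# Estimated variance: single-unit terms plus cross terms over i != j.
single = sum((1.0 - pi[i]) / pi[i] ** 2 * y[i] ** 2 for i in range(nu))
cross = sum(
    (joint(i, j) - pi[i] * pi[j]) / (pi[i] * pi[j] * joint(i, j)) * y[i] * y[j]
    for i in range(nu)
    for j in range(nu)
    if i != j
)
var_hat = single + cross
sd_hat = math.sqrt(var_hat)

# Assumption: 95% level, so t_{0.025} with nu - 1 = 2 df (table value 4.303).
t_crit = 4.303
lower = tau_hat - t_crit * sd_hat
upper = tau_hat + t_crit * sd_hat

print(f"tau_hat = {tau_hat:.3f}")     # text reports 947.037
print(f"var_hat = {var_hat:.1f}")     # text reports 92692.9
print(f"sd_hat  = {sd_hat:.3f}")      # text reports 304.455
print(f"approx. 95% CI: ({lower:.1f}, {upper:.1f})")
```

With only 2 degrees of freedom the critical value is large, so the resulting interval is very wide relative to the estimate.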

Try it!

Is there some popular estimator that can be derived as a Horvitz-Thompson estimator?

Yes. Under simple random sampling without replacement, the inclusion probability of the ith unit is:

\(\pi_i=n/N\)

Thus, the Horvitz-Thompson estimator reduces to the familiar expansion estimator \(N\bar{y}\):

\begin{align}
\hat{\tau}_\pi &= \sum\limits_{i=1}^n \dfrac{y_i}{\pi_i}\\
&= \sum\limits_{i=1}^n \dfrac{y_i}{n} \cdot N\\
&= N \bar{y}\\
\end{align}
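A quick numerical check of this identity, as a sketch with made-up numbers (the population size N = 100 and the four sample values are hypothetical):

```python
# Hypothetical simple random sample without replacement.
N = 100
sample = [12.0, 7.0, 15.0, 10.0]
n = len(sample)

pi = n / N                                  # same inclusion probability for every unit
ht_estimate = sum(v / pi for v in sample)   # Horvitz-Thompson form
expansion_estimate = N * sum(sample) / n    # N times the sample mean

print(ht_estimate, expansion_estimate)
```

Both expressions give the same total up to floating-point rounding, because dividing each \(y_i\) by \(n/N\) is the same as multiplying the sample mean by N.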