3  Unequal Probability Sampling

Overview

This lesson starts with the rationale for using unequal probability sampling in section 3.1. We then discuss in section 3.2 the Hansen-Hurwitz estimator which may be used when the sampling is with replacement. In section 3.3, we introduce the Horvitz-Thompson estimator which can be used when the sampling is with or without replacement. In section 4, a small population example is used to illustrate some properties of these two estimators. Through this example, one can see that both estimators are unbiased.

Lesson 3: Ch. 6.1, 6.2, 6.4 of Sampling by Steven Thompson, 3rd Edition.

Objectives

Upon completion of this lesson, you should be able to:

  1. Identify the situations that unequal probability sampling is beneficial and explain why,
  2. Explain and perform unequal probability sampling,
  3. Compute the Hansen-Hurwitz estimator and its estimated variance,
  4. Compute the Horvitz-Thompson estimator and its estimated variance, and
  5. Demonstrate that Hansen-Hurwitz and Horvitz-Thompson these two estimators are unbiased through an artificial small population example.

3.1 Unequal Probability Sampling

In simple random sampling, the probability that each unit will be sampled is the same. Sometimes, estimates can be improved by varying the probabilities with which units are sampled.

For example, we want to estimate the number of job openings in a city by sampling firms in that city. Many of the firms in the city are small firms. If one uses s.r.s, the size of a firm is not taken into consideration and a typical sample will consist of mostly small firms. However, the number of job openings is heavily influenced by large firms.

Thus, we should be able to improve the estimate of the number of job openings by giving the large firms a greater chance to appear in the sample, for example, with probability proportional to size or proportional to some other relevant aspects.

Definition 3.1 (Selection Probabilities) On each draw, the probability that a given population unit will be selected is denoted as \(p_i, i = 1, 2, 3, \dots, N\).

Suppose that sampling is with replacement, the probability of selecting the ith unit in the population is \(p_i\).

If the selection probabilities are unequal, the sample mean is not unbiased for the population mean and the sample total is not unbiased for the population total.

Example: If larger firms are sampled with higher probability, the sample mean for job openings will be biased upward.

3.2 The Hansen-Hurwitz Estimator

Note! For this section, sampling is with replacement.

Try It!

Why do we use or talk about sampling with replacement?

When sampling with replacement, the variances tend to be larger. However, replacement formulas are simpler and easier to derive. When the sample size is small compared to \(N\), with and without replacement formulas are not too different. We often use the easier formula derived for the with replacement to approximate that for the without replacement formula.

Let \(p_i, i = 1, \dots, N\) denote the probability that a given population unit will be selected.

The Hansen-Hurwitz estimator is:

\[\hat{\tau}_p=\dfrac{1}{n} \sum\limits^n_{i=1} \dfrac {y_i}{p_i}\]

Since, \(E\left(\dfrac{y_i}{p_i}\right)=\tau\)

where \(\tau=\sum\limits^N_{i=1} y_i=\) the population total

thus, \(E(\hat{\tau}_p)=\tau\) and \(\hat{\tau}_p\) is an unbiased estimator for \(\tau\).

Since \(\operatorname{Var}\left(\dfrac{y_i}{p_i}\right)=\sum\limits^N_{i=1} p_i \left(\dfrac{y_i}{p_i}-\tau \right)^2\), \(\operatorname{Var}(\hat{\tau}_p)=\dfrac{1}{n}\sum\limits^N_{i=1} p_i \left(\dfrac{y_i}{p_i}-\tau \right)^2\)

An unbiased estimator for \(\operatorname{Var}(\hat{\tau}_p)\) is:

\[\hat{\operatorname{Var}}(\hat{\tau}_p)=\dfrac{1}{n}\cdot \dfrac{\sum\limits^n_{i=1} \left(\dfrac{y_i}{p_i}-\hat{\tau}_p \right)^2}{n-1}\]

and an approximate \((1 - \alpha)\) 100% confidence interval for \(\tau\) is:

\[\hat{\tau}_p \pm t \cdot \sqrt{\hat{\operatorname{Var}}(\hat{\tau}_p)}\]

For population mean \(\mu = \frac{\tau}{N}\) one uses:

\[\hat{\mu}_p=\dfrac{1}{N} \left(\dfrac{1}{n}\cdot \sum\limits^n_{i=1}\dfrac{y_i}{p_i}\right)=\dfrac{\hat{\tau}_p}{N}\]

\[E(\hat{\mu}_p)=\dfrac{\tau}{N}= \mu\]

\[\hat{\operatorname{Var}}(\hat{\mu}_p)=\dfrac{1}{N^2}\cdot \hat{\operatorname{Var}}(\hat{\tau}_p)\]

How do we perform unequal probability sampling according to given \(p_i\)?

Example 3.1 (Total Number of Computer Help Requests) Estimate the total number of computer help requests for last year in a large firm.

The director of a computer support department plans to sample 3 divisions of a large firm that has 10 divisions, with varying numbers of employees per division. Since the number of computer support requests within each division should be highly correlated with the number of employees in that division, the director decides to use unequal probability sampling with replacement with \(p_i\) proportional to the number of employees in that division.

Division Number of Employees \(p_i\) Assigned Numbers
1 1000 - -
2 650 - -
3 2100 - -
4 860 - -
5 2840 - -
6 1910 - -
7 390 - -
8 3200 - -
9 1500 - -
10 1200 - -
Total 15650 - -

Questions

  1. How do we practically implement unequal probability sampling according to the given \(p_i\)’s?

  2. With the divisions selected by probability proportional to size, how do we construct the Hansen-Hurwitz estimator for \(\tau\)?

Answer to A:

Division Number of Employees \(p_i\) Assigned Numbers
1 1000 1000/15650 1-1000
2 650 650/15650 1001-1650
3 2100 2100/15650 1651-3750
4 860 860/15650 3751-4610
5 2840 2840/15650 4611-7450
6 1910 1910/15650 7451-9360
7 390 390/15650 9361-9750
8 3200 3200/15650 9751-12950
9 1500 1500/15650 12951-14450
10 1200 1500/15650 14451-15650
Total 15650 1 -

Using Minitab

We can perform probability proportional to size by using Minitab to calculate this for us:

  1. Generate a column C1 that contains the value 1-15650

    Calc > Make patterned data > Simple set of numbers

    Simple set of numbers
  2. Sample 3 values with replacement from the column that contains 1-15650

    Calc >Sample from columns

    Sample from columns

    The values generated by Minitab are given below:

    Worksheet 1

    The values given by Minitab are 4728, 3590, and 15228. These numbers fall into division 5, division 3, and division 10, respectively.

So, we decide to sample division 5, division 3, and division 10. We check the record to find the number of requests for these divisions. The results are:

  • For division 5, \(y_1 = \text{the number requests} = 420\)
  • For division 3, \(y_2 = \text{the number of requests} = 1785\)
  • For division 10, \(y_3 = \text{the number of requests} = 2198\)

(For this random sample shown in the example, the division is distinct. For other random samples, it is possible that the same division may be selected more than once.)

The basic assumption is that number of requests is proportional to the size of the division.


Answer to B

Computing the Hansen-Hurwitz Estimator:

We will need to compute the Hansen-Hurwitz estimator for Example 3-1.

The Hansen-Hurwitz estimator for \(\tau\) is:

\[\begin{align} \hat{\tau}_p &= \dfrac{1}{3}\left(420 \cdot \dfrac{15650}{650}+1785 \cdot \dfrac{15650}{2840}+2198 \cdot \dfrac{15650}{3200}\right) \\ &=\dfrac{1}{3}(10112.31+9836.36+10749.59)\\ &=10232.75 \\ \end{align}\]

Each of the values, 10112.31, 9836.36, and 10749.59, look fairly stable so it looks like the variance will not be too large.

\[\begin{align} \hat{\operatorname{Var}}(\hat{\tau}_p) &=\dfrac{1}{3}\cdot \dfrac{\sum\limits^3_{i=1} \left(\dfrac{y_i}{p_i}-\hat{\tau}_p \right)^2}{3-1}\\ &=\dfrac{1}{3}\cdot \dfrac{1}{2}((10112.31-10232.75)^2+(9836.36-10232.75)^2+(10749.59-10232.75)^2)\\ &=73125.74\\ \end{align}\]

\[\hat{\text{SD}}(\hat{\tau}_p)=270.418\]

(See Example 1 on p. 68-69 in the text to see an example of when a unit is chosen more than once.)

We will see that in this example \(p_i\) are chosen proportional to the values of a known positive auxiliary variable such as size, \(p_i=\dfrac{x_i}{\sum x_i}\), the Hansen-Hurwitz estimator is also called PPS (probability proportional to size).

Now, we need to ask ourselves, when and why would we need to use an unequal probability sampling? Let’s think about the ‘when’ first.

When would we elect to use PPS? What about if we were sampling from Penn State departments? They are of very different sizes, some are very large and others are very small. Would we automatically choose to use PPS? The idea is that the thing that you are interested in has to be related to the size. If the thing that you are interested in is related to size, then you would want to use PPS. However, if what you are interested in has nothing to do with the size of the department, then there is no reason to use PPS.

Now, let us address the ‘why’. By definition,

\[\tau=\sum\limits_{i=1}^N y_i \text{ and } \operatorname{Var}(\hat{\tau}_p)=\dfrac{1}{n}\sum\limits^N_{i=1} p_i \left(\dfrac{y_i}{p_i}-\tau \right)^2\]

For the special and unrealistic case \(\frac{y_i}{p_i}=\) constant, the constant will be \(\tau\) and the \(\operatorname{Var}(\hat{\tau}_p)\) will be zero. Therefore, you want \(\frac{y_i}{p_i}\) to be close to a constant. However, in reality, prior to sampling, the \(y_i\) are unknown and we can not choose \(p_i\) proportional to \(y_i\). If we know \(y_i\) is approximately proportional to a known variable such as \(x_i\), then we can choose \(p_i\) proportional to \(x_i\). \(\hat{\tau}_p\) will have low variances.

Example 3.2 (Total Number of Palm Trees) We want to estimate the total number of palm trees on 100 islands in a tropical paradise. The area of each island is known and it is reasonable to think that the number of palm trees on each island is approximately proportional to the size of the island.

We know that the sizes of the island are given (e.g., the size of island 1 is 1 square mile, the size of island 29 is 5 square miles and the size of island 36 is 2 square miles). The total size of these 100 islands is 100 square miles. We find that \(p_i, \dots, p_N\) are:

Island Tree Probabilities p10.01p2p30.040.09p290.05p36p1000.010.310.360.400.420.9910.010.030.020.05..................

How can we sample 4 islands by probabilities \(p_1, \dots, p_{100}\)?

Answer

  1. Assign an interval width of \(p_i\) to ith unit
  2. Generate 4 random numbers from a uniform distribution on [0,1].
  3. Choose the units that correspond to the interval containing the random number.

In this example, we use Minitab > Calc > Random data > Uniform and get: 0.335257, 0.0065551, 0.401869, 0.318977

The units selected are the islands 29, 1, 36, and 29, (since 0.335257 falls between 0.31 and 0.36, 0.0065551 falls between 0 and 0.01, 0.401869 falls between 0.40 and 0.42, and 0.318977 falls between 0.31 and 0.36.) The measurements (\(y_i\)) are:

\(i\) size \(p_i\) \(y_i\)
1 1 0.01 14
29 5 0.05 50
29 5 0.05 50
36 2 0.02 25

Given these results we should now be able to estimate how many total palm trees are there on all of the islands put together:

\[\begin{align} \hat{\tau}_p &=\dfrac{1}{4}\left(\dfrac{14}{0.01}+\dfrac{50}{0.05}+\dfrac{50}{0.05}+\dfrac{25}{0.02}\right) \\ &=\dfrac{1}{4}(1400+1000+1000+1250)\\ &=1162.5 \\ \end{align}\]

\[\begin{align} \hat{\operatorname{Var}}(\hat{\tau}_p) &=\dfrac{1}{n(n-1)} \sum\limits^n_{i=1} \left(\dfrac{y_i}{p_i}-\hat{\tau}_p \right)^2\\ &=\dfrac{1}{4\cdot3}[ (1400-1162.5)^2+(1000-1162.5)^2+(1000-1162.5)^2+(1250-1162.5)^2]\\ &=9739.58\\ \end{align}\]

\[\hat{\text{SD}}(\hat{\tau}_p)=98.69\]

If we are interested in the mean number of trees per island in that population, then:

\[\hat{\mu}_p=\dfrac{\hat{\tau}_p}{N}=\dfrac{1162.5}{100}=11.625\]

\[\begin{align} \hat{\operatorname{Var}}(\hat{\mu}_p) &=\dfrac{1}{N^2} \cdot \hat{\operatorname{Var}}(\hat{\tau}_p)\\ &=\dfrac{1}{(100)^2}\cdot 9739.58\\ &=0.973958\\ \end{align}\]

\[\hat{\text{SD}}(\hat{\mu}_p)=0.987\]

3.3 The Horvitz-Thompson Estimator

Horvitz-Thompson (1952) introduced an unbiased estimator for \(\tau\) for any design, with or without replacement.

Definition 3.2 (Horvitz-Thompson Estimator) \(\pi_i, i = 1, \dots, N\) are given positive numbers that represent the probability that unit i is included in the sample under a given sampling scheme. The Horvitz-Thompson estimator is:

\[\hat{\tau}_\pi=\sum\limits_{i=1}^\nu \dfrac{y_i}{\pi_i}\]

Where \(\nu\) is the distinct number of units in the sample. The Horvitz-Thompson estimator does not depend on the number of times a unit may be selected. Each distinct unit of the sample is utilized only once.

Read section 6.5 in the text. The section reviews the proofs for how the following two formula are derived.

Note that:

\[E(\hat{\tau}_\pi)=\tau\]

\[\operatorname{Var}(\hat{\tau}_\pi)=\sum\limits_{i=1}^N \left( \dfrac{1-\pi_i}{\pi_i}\right)y^2_i + \sum\limits_{i=1}^N \sum\limits_{j\neq i} \left( \dfrac{\pi_{ij}-\pi_i \pi_j}{\pi_i \pi_j}\right) y_i y_j\]

where \(\pi_{ij} > 0\) denotes the probability that both unit \(i\) and unit \(j\) are included.

The estimated variance of the Horvitz-Thompson estimator is given by:

\[\hat{\operatorname{Var}}(\hat{\tau}_\pi)=\sum\limits_{i=1}^v \left( \dfrac{1-\pi_i}{\pi^2_i} \right) y^2_i + \sum\limits_{i=1}^v \sum\limits_{j\neq i} \left( \dfrac{\pi_{ij}-\pi_i \pi_j}{\pi_i \pi_j}\right)\dfrac{1}{\pi_{ij}} y_i y_j\]

Where \(\pi_{ij} > 0\) denotes the probability that both unit \(i\) and \(j\) are included.

An approximate \((1-\alpha)\) 100% CI for \(\tau\) is:

\[\hat{\tau}_\pi \pm t_{\alpha/2} \sqrt{\hat{\operatorname{Var}}(\hat{\tau}_\pi)}\]

where \(t\) has \(\nu - 1 df\)

Example 3.3 (Estimating the Total Number of Palm Trees with Horvitz-Thompson Estimator) The Horvitz-Thompson estimator of the total number of palm trees. Since, for that example the sample is with replacement, the \(n\) draws are independent. It is relatively easy to compute the \(\pi_{i}\)’s .

For a sample with replacement, we will compute:

\[\begin{align} \pi_i &= \text{the probability of inclusion of the ith unit}\\ &= 1-P(\text{ith unit is not included})\\ &= 1-(1-p_i)^n \\ \end{align}\]

Recall: Samples 1, 29 and 36 are selected.

Since \(p_1=0.01\), \(\pi_1=1-(1-0.01)^4=0.0394\), and

\(p_2=0.05\), \(\pi_2=1-(1-0.05)^4=0.1855\)

\(p_3=0.02\), \(\pi_3=1-(1-0.02)^4=0.0776\)

Therefore,

\[\begin{align} \hat{\tau}_\pi &= \sum\limits_{i=1}^\nu \dfrac{y_i}{\pi_i}\\ &= \dfrac{14}{0.0394}+\dfrac{50}{0.1855}+\dfrac{25}{0.0776}\\ &= 947.037\\ \end{align}\]

Next, we need to compute the estimated variance, \(\hat{\operatorname{Var}}(\hat{\tau}_\pi)\). For this, we need to compute \(\pi_{ij}\).

Since:

\[\begin{align} P(A\cap B) &= P(A)+P(B)-P(A\cup B)\\ &= P(A)+P(B)-[1-P(A^c \cap B^c)]\\ \end{align}\]

Then we get:

\[\pi_{ij}=\pi_i+\pi_j-[1-(1-p_i-p_j)^n]\]

This means that we have to run through each of the unit pairs such as:

\[\pi_{12}=0.0394+0.1855-[1-(1-0.01-0.05)^4]=0.00565\]

\[\pi_{13}=0.0394+0.0776-[1-(1-0.01-0.02)^4]=0.00229\]

\[\pi_{23}=0.1855+0.0776-[1-(1-0.05-0.02)^4]=0.01115\]

Plugging in the values, we obtain:

\[\hat{\operatorname{Var}}(\hat{\tau}_\pi)=92692.9\]

Thus,

\[\hat{\text{SD}}(\hat{\tau}_\pi)=\sqrt{92692.9}=304.455\]

where \(\nu\) = the # of distinct units, \(\nu = 3\), therefore the \(df = \nu - 1 = 2\)

Try It!

Is there some popular estimator that can be derived as a Horvitz-Thompson estimator?

Yes, under simple random sampling, the inclusion probability of the \(i^{th}\) unit is: $$\pi_i=n/N$$ Thus, $$\begin{align} \hat{\tau}_\pi &= \sum\limits_{i=1}^n \dfrac{y_i}{\pi_i}\\ &= \sum\limits_{i=1}^n \dfrac{y_i}{n} \cdot N\\ &= N \bar{y}\\ \end{align}$$

3.4 Small Population Example

Example 3.4 (Wheat Production) (Reference Section 6.4 of the text)

unit (Farm), i 1 2 3
Selection Prob, pi 0.3 0.2 0.5
Wheat produced 11 6 25

\(N = 3\) farms, \(n = 2\) sample with replacement.

\(s\) \(p(s)\) \(y_s\)
1, 1 0.3(0.3) = 0.09 (11, 11)
2, 2 0.2(0.2) = 0.04 (6, 6)
3, 3 0.5(0.5) = 0.25 (25, 25)
1, 2 0.3(0.2) = 0.06 (11, 6)
2, 1 0.2(0.3) = 0.06 (6, 11)
1, 3 0.3(0.5) = 0.15 (11, 25)
3, 1 0.5(0.3) = 0.15 (25, 11)
2, 3 0.2(0.5) = 0.10 (6, 25)
3, 2 0.5(0.2) = 0.10 (25, 6)

Question: Compute the Hansen-Hurwitz estimator.

Answer

When (1,1) is sampled, the Hansen-Hurwitz estimator is:

\[\hat{\tau}_p=\dfrac{1}{2}\left(\dfrac{y_1}{p_1}+\dfrac{y_1}{p_1}\right)=\dfrac{1}{2}\left(\dfrac{11}{0.3}+\dfrac{11}{0.3}\right)=36.67\]

When (1,2) is sampled, the Hansen-Hurwitz estimator is:

\[\hat{\tau}_p=\dfrac{1}{2}\left(\dfrac{y_1}{p_1}+\dfrac{y_2}{p_2}\right)=\dfrac{1}{2}\left(\dfrac{11}{0.3}+\dfrac{6}{0.2}\right)=33.33\]

Similarly, we can fill out the table and get the Hansen-Hurwitz estimators as shown below:

\(s\) \(p(s)\) \(y_s\) \(\hat{\tau}_p\)
1, 1 0.3(0.3) = 0.09 (11, 11) 36.67
2, 2 0.2(0.2) = 0.04 (6, 6) 30.00
3, 3 0.5(0.5) = 0.25 (25, 25) 50.00
1, 2 0.3(0.2) = 0.06 (11, 6) 33.33
2, 1 0.2(0.3) = 0.06 (6, 11) 33.33
1, 3 0.3(0.5) = 0.15 (11, 25) 43.33
3, 1 0.5(0.3) = 0.15 (25, 11) 43.33
2, 3 0.2(0.5) = 0.10 (6, 25) 40.00
3, 2 0.5(0.2) = 0.10 (25, 6) 40.00

Question: Compute the Horvitz-Thompson estimator.

Answer

\[\pi_1=0.09+0.06+0.06+0.15+0.15=0.51\]

\[\pi_2=0.04+0.06+0.06+0.10+0.10=0.36\]

\[\pi_3=0.25+0.15+0.15+0.10+0.10=0.75\]

When (1,1) is sampled, the Horvitz-Thompson estimator is:

\[\hat{\tau}_\pi=\left(\dfrac{11}{0.51}\right)=21.57\]

When (1,2) is sampled, the Horvitz-Thompson estimator is:

\[\hat{\tau}_\pi=\left(\dfrac{11}{0.51}+\dfrac{6}{0.36}\right)=38.24\]

Similarly, we can fill out the table and get the Horvitz-Thompson estimators as shown below:

\(s\) \(p(s)\) \(y_s\) \(\hat{\tau}_p\) \(\hat{\tau}_\pi\)
1, 1 0.3(0.3) = 0.09 (11, 11) 36.67 21.57
2, 2 0.2(0.2) = 0.04 (6, 6) 30.00 16.67
3, 3 0.5(0.5) = 0.25 (25, 25) 50.00 33.33
1, 2 0.3(0.2) = 0.06 (11, 6) 33.33 38.24
2, 1 0.2(0.3) = 0.06 (6, 11) 33.33 38.24
1, 3 0.3(0.5) = 0.15 (11, 25) 43.33 54.90
3, 1 0.5(0.3) = 0.15 (25, 11) 43.33 54.90
2, 3 0.2(0.5) = 0.10 (6, 25) 40.00 50.00
3, 2 0.5(0.2) = 0.10 (25, 6) 40.00 50.00
Mean - - 42 42
Variance - - 34.67 146.46

Mean of \(\hat{\tau}_p = E(\hat{\tau}_p)=\sum{p(s) \cdot \hat{\tau}_p (s)}\)

\[= 0.09\times36.67+0.04\times30.00+0.25\times50.00+0.06\times33.33+0.06\times33.33\]

\[+0.15\times43.33+0.15\times43.33+0.10\times40.00+0.10\times40.00\]

\[= 42\]

From the table above we can see that both \(\hat{\tau}_p\) and \(\hat{\tau}_\pi\) are unbiased. This example is a small population example to illustrate conceptually the properties of these estimators. We can compute the variance for \(\hat{\tau}_p\) and the variance for \(\hat{\tau}_\pi\) directly from the definition of variance.

Since \(\text{mean of }\hat{\tau}_p\) is 42 and \(E(g) = \sum p(s)*g(s)\), we can compute the variance of \(\hat{\tau}_p\) as:

\[\operatorname{Var}(\hat{\tau}_p) = E[\hat{\tau}_p-\text{mean of }\hat{\tau}_p]^2\]

\[= 0.09\times(36.67-42)^2+0.04\times(30.00-42)^2+0.25\times(50.00-42)^2+0.06\times(33.33-42)^2\]

\[+0.06\times(33.33-42)^2+0.15\times(43.33-42)^2+0.15\times(43.33-42)^2+0.10\times(40.00-42)^2\]

\[+0.10\times(40.00-42)^2\]

\[=34.67\]

Similarly, we can compute that the variance of \(\hat{\tau}_\pi\) is 146.46.

Now, how do we compute the MSE of \(\hat{\tau}_p\)?

By definition, \(\operatorname{MSE}(\hat{\tau}_p) = E(\hat{\tau}_p-\tau)^2\), in this case, since \(\hat{\tau}_p\) is unbiased and \(E(\hat{\tau}_p) =\tau\), \(\operatorname{MSE}(\hat{\tau}_p)\) is the same as \(\operatorname{Var}(\hat{\tau}_p)\).

Remark 1

The above demonstration is just a teaching tool. In reality, we will not know the population and will not come across small population problems like this other than in exams and assignments. What we know is:

Unit Selection probability
1 0.3
2 0.2
3 0.5

And, we draw a sample. If the sample we draw is (1,2) then \(\hat{\tau}_p = 33.33\) and \(\hat{\tau}_\pi=38.24\).

We will not be able to find the real population total nor the real variance of the estimator. However, we will be able to estimate them.

Remark 2

Now, should we use \(\hat{\tau}_p\) or should we use \(\hat{\tau}_\pi\)?

There are no clear answers. Both estimators ar acceptable when \(y_i\) and \(p_i\) are proportional.