Lesson 12: Capture-Recapture Sampling, Random Response Model
In Section 12.1, we introduce capture-recapture sampling and discuss its application to estimating population size. We then provide the formula for the variance of the estimate, followed by an example. This is direct capture-recapture sampling, in which both the initial capture size and the second capture size are predetermined.
In Section 12.2, we discuss inverse sampling for capture-recapture, where the initial capture size is predetermined but we sample until a fixed number of tagged items are recaptured. Here, the second capture size is random. An example is provided to compute the estimate as well as its estimated standard deviation.
In Section 12.3, the random response model is introduced to promote more truthful answers to sensitive questions. An example illustrates how to compute the estimate as well as its estimated standard deviation.
Lesson 12: Ch. 18.1 of Sampling by Steven Thompson, 3rd edition
Objectives
- use the capture-recapture sampling method to estimate population size,
- distinguish between the capture-recapture method and the inverse capture-recapture method,
- use the inverse capture-recapture sampling method to estimate population size, and
- use the random response model to deal with sensitive questions.
12.1 Capture-Recapture Sampling
One popular method for estimating the total number of individuals in a population is capture-recapture sampling. In capture-recapture sampling, an initial sample is obtained and marked. A second sample is then obtained independently, and we note how many of the individuals in that sample were marked.
Example 1: To estimate the abundance of an animal population such as the deer population in the state of Pennsylvania.
Example 2: To estimate the total number of homeless individuals in a given city.
Single Recapture
Notation:
- X: initial sample size, captured and marked
- y: second sample size, recaptured independently
- x: number of individuals in the second sample that are marked
- \(\tau\): total population size
Question: How do we estimate the total population size?
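Since the proportion of marked individuals in the second sample is likely to be about the same as the proportion of marked individuals in the whole population,
\(\dfrac{x}{y}\approx\dfrac{X}{\tau}\)
which gives the estimator:
\(\hat{\tau}=\dfrac{Xy}{x}\)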
An estimate of the variance of \(\hat{\tau}\) is:
\(\hat{V}ar(\hat{\tau})=\dfrac{Xy(X-x)(y-x)}{x^3}\)
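An approximate \(100(1-\alpha)\%\) confidence interval is:
\(\hat{\tau} \pm z_{\alpha/2}\sqrt{\hat{V}ar(\hat{\tau})}\)
where \(z_{\alpha/2}\) is the upper \(\alpha/2\) point of the standard normal distribution (1.96 for a 95% interval).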
When x = 0, \(\hat{\tau}\) is infinite. To avoid estimating \(\tau\) by infinity in that case, a modified estimator for \(\tau\) is:
\(\tilde{\tau}=\dfrac{(X+1)(y+1)}{x+1}-1\)
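The direct capture-recapture formulas above can be collected into a minimal Python sketch. The function names are our own hypothetical labels, not a library API; the estimator \(\hat{\tau}=Xy/x\) is commonly called the Lincoln-Petersen estimator and the modified estimator \(\tilde{\tau}\) is Chapman's estimator. The interval uses z = 1.96 (95%) by default, an assumption you can override.

```python
import math


def lincoln_petersen(X: int, y: int, x: int) -> float:
    """Direct capture-recapture estimate of population size: tau_hat = X * y / x."""
    if x == 0:
        raise ValueError("x = 0: no marked individuals recaptured; use chapman() instead")
    return X * y / x


def lp_variance(X: int, y: int, x: int) -> float:
    """Estimated variance of tau_hat: X * y * (X - x) * (y - x) / x**3."""
    return X * y * (X - x) * (y - x) / x**3


def lp_confidence_interval(X: int, y: int, x: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate normal interval tau_hat +/- z * sqrt(Var_hat); z = 1.96 gives about 95%."""
    tau_hat = lincoln_petersen(X, y, x)
    half_width = z * math.sqrt(lp_variance(X, y, x))
    return tau_hat - half_width, tau_hat + half_width


def chapman(X: int, y: int, x: int) -> float:
    """Modified estimator (X + 1)(y + 1)/(x + 1) - 1, which stays finite when x = 0."""
    return (X + 1) * (y + 1) / (x + 1) - 1


# Concert example: X = 500 T-shirts handed out, y = 200 sampled, x = 40 wearing one
X, y, x = 500, 200, 40
print(lincoln_petersen(X, y, x))  # 2500.0
print(lp_variance(X, y, x))       # 115000.0
print(lp_confidence_interval(X, y, x))
print(chapman(X, y, x))
```

Raising an error for x = 0 in `lincoln_petersen` is a design choice of this sketch; `chapman` is the fallback the text suggests for that case.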
Try it!

At a free concert on the Old Main lawn, we want to estimate the number of attendees. How would you conduct sampling for this purpose?
At the beginning of the concert, 500 Penn State T-shirts are randomly given out to attendees. A random sample of 200 attendees is then taken, and 40 of them have a Penn State T-shirt.

Using the values given in the answer above, estimate the total number of attendees at the concert.
\(\hat{\tau}=\dfrac{y}{x}\cdot X =\dfrac{200}{40}\cdot 500=2500\)
\(\hat{V}ar(\hat{\tau})=\dfrac{500\times 200(500-40)(200-40)}{40^3}=115000\)
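\(\hat{S}D(\hat{\tau})=\sqrt{115000}=339.12\)
A 95% confidence interval is:
2500 ± 1.96 × 339.12
2500 ± 664.67, i.e., approximately (1835.33, 3164.67)
Note: the second sample size y can be larger than the initial sample size X.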
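To see why \(Xy/x\) is a sensible estimate, the direct scheme can be simulated on a population of known size. This sketch assumes a hypothetical true population of tau = 2500 (the concert estimate, used only as a stand-in truth), marks X = 500, draws an independent simple random sample of y = 200 without replacement, and repeats; the function name, replicate count, and seed are arbitrary illustrative choices.

```python
import random
import statistics


def simulate_direct(tau: int, X: int, y: int, reps: int, seed: int = 1) -> list[float]:
    """Repeat direct capture-recapture on a known population; return the estimates.

    Individuals 0..X-1 are treated as the marked ones: because the second sample
    is a simple random sample drawn independently of the marking, which X
    individuals carry the mark does not affect the distribution of x.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        second = rng.sample(range(tau), y)   # second sample, without replacement
        x = sum(1 for i in second if i < X)  # marked individuals recaptured
        if x > 0:                            # skip x = 0 (essentially impossible here)
            estimates.append(X * y / x)
    return estimates


est = simulate_direct(tau=2500, X=500, y=200, reps=2000)
print(len(est), round(statistics.mean(est), 1), round(statistics.stdev(est), 1))
```

Because the estimator divides by the random count x, its average over replicates tends to sit a little above the true value, while the spread of the estimates should be comparable in size to the estimated SD of about 339 from the concert example.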
12.2 Inverse Sampling for Capture-Recapture
What we have covered so far is direct capture-recapture sampling, i.e., both the initial sample (capture) size and the second sample (recapture) size are predetermined. When the second sample size is not predetermined, we have:
Inverse Sampling for Capture-Recapture
Again, assume that an initial sample of X individuals is captured, tagged and released back into the population. Then, random sampling is conducted until x tagged individuals are recaptured. If y denotes the second sample size, then:
\(\hat{\tau}=\dfrac{y}{x}X\)
Note that for inverse sampling, x is fixed but y is random. The estimated variance of \(\hat{\tau}\) is:
\(\hat{V}ar(\hat{\tau})=\dfrac{X^2y(y-x)}{x^2(x+1)}\)
Example 12-1: Number of Eagles
We want to estimate the total number of eagles in a wildlife preserve. A random sample of 200 eagles is trapped, tagged, and then released. In the same month, a second sample is drawn until 35 tagged eagles are recaptured. The sample size needed to get 35 tagged eagles is 100. (In direct capture-recapture, by contrast, the second sample size of 100 would be fixed in advance and the 35 tagged eagles found in it would be the random count.)
Try it!
X = 200
x = 35, y = 100
\(\hat{\tau}=\dfrac{100}{35}\times 200=571.43\)
\(\hat{V}ar(\hat{\tau})=\dfrac{200^2\times 100(100-35)}{35^2(35+1)}=5895.69\)
\(\hat{S}D(\hat{\tau})=76.78\)
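The inverse-sampling estimate and its SD can be sketched in Python, together with a simulation of the stopping rule (sample until x tagged individuals appear). The function names and the population of 600 eagles used in the simulation are hypothetical, chosen only for illustration; the eagle numbers X = 200, x = 35, y = 100 come from the example.

```python
import math
import random


def inverse_estimate(X: int, x: int, y: int) -> tuple[float, float]:
    """Inverse capture-recapture: return (tau_hat, SD_hat).

    tau_hat = y * X / x and Var_hat = X**2 * y * (y - x) / (x**2 * (x + 1)),
    where x (tagged recaptures required) is fixed and y is random.
    """
    tau_hat = y * X / x
    var_hat = X**2 * y * (y - x) / (x**2 * (x + 1))
    return tau_hat, math.sqrt(var_hat)


def draw_until_tagged(tau: int, X: int, x: int, rng: random.Random) -> int:
    """Sample without replacement until x tagged individuals are seen; return y.

    Individuals 0..X-1 are treated as tagged (hypothetical labelling).
    """
    order = list(range(tau))
    rng.shuffle(order)  # a random ordering is equivalent to sampling without replacement
    tagged_seen = 0
    for y, individual in enumerate(order, start=1):
        if individual < X:
            tagged_seen += 1
            if tagged_seen == x:
                return y
    raise ValueError("x must be between 1 and the number of tagged individuals X")


# Eagle example: X = 200 tagged, stop at x = 35 tagged recaptures, which took y = 100 draws
tau_hat, sd_hat = inverse_estimate(X=200, x=35, y=100)
print(round(tau_hat, 2), round(sd_hat, 2))  # 571.43 76.78

# One simulated survey from a hypothetical population of 600 eagles
rng = random.Random(7)
y_sim = draw_until_tagged(tau=600, X=200, x=35, rng=rng)
print(y_sim, inverse_estimate(200, 35, y_sim))
```

Repeating `draw_until_tagged` shows y changing from survey to survey while x stays fixed at 35, which is what distinguishes inverse sampling from the direct design.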
12.3 Random Response Model
People may lie when answering sensitive questions such as "Have you used cocaine before?"
For these types of questions, it helps to use a question format that makes respondents comfortable and encourages truthful answers.
Horvitz (1967), building on an idea from Warner (1965), suggested using two questions, the sensitive question and an unrelated question, together with a randomization device that determines which question the respondent should answer.
Example
Q1: Have you used cocaine before?
Q2: Is the second hand of your watch between 0 and 30?
The respondent flips a coin to decide which question to answer, and the interviewer does not know the outcome of the coin flip.
The randomization device can be anything, but it must have:
- a known probability t that the person is asked the sensitive question (and hence probability 1 - t that the person is asked the other question), and
- a known probability that the person responds yes to the other question.
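A randomization device with these two properties can be simulated to show that only yes/no answers are recorded, yet the yes rate is tied to p through P(yes) = t p + (1 - t) theta. In this sketch, t, theta (the known yes-probability for the unrelated question), and true_p (a hypothetical population proportion that would be unknown in practice) are assumed values, and respondents are assumed to answer truthfully.

```python
import random


def simulate_responses(n: int, true_p: float, t: float, theta: float,
                       seed: int = 0) -> list[bool]:
    """Simulate n randomized-response interviews; return only the recorded answers.

    With probability t the device sends a respondent to the sensitive question
    (assumed answered truthfully: yes with probability true_p); otherwise to the
    unrelated question (yes with known probability theta). The returned list
    holds yes/no answers only, never which question was answered.
    """
    rng = random.Random(seed)
    answers = []
    for _ in range(n):
        if rng.random() < t:                       # device selects the sensitive question
            answers.append(rng.random() < true_p)
        else:                                      # device selects the unrelated question
            answers.append(rng.random() < theta)
    return answers


t, theta, true_p = 0.5, 0.5, 0.15                  # assumed values for illustration
answers = simulate_responses(n=10_000, true_p=true_p, t=t, theta=theta)
yes_rate = sum(answers) / len(answers)
print(yes_rate, t * true_p + (1 - t) * theta)      # observed vs. theoretical P(yes)
```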
Example 12-2: Tax Return Question
Q1: Have you ever falsified your tax return? Yes or no.
Q2: Flip open a book to a random page and answer: is the page number odd? Yes or no.
The interviewer merely records the answer and does not know whether the respondent is answering Q1 or Q2.
We conduct this survey on n subjects, and \(n_1\) denotes the number of respondents who answer yes. How can we estimate the population proportion p of people who have falsified their tax return?
Here t = 1/2, and the probability of answering yes to Q2 (an odd page number) is also 1/2. Conditioning on which question is answered, the probability of a yes response is:
\(P(\text{yes})=\dfrac{1}{2}\times p+\dfrac{1}{2}\times\dfrac{1}{2}=\dfrac{p}{2}+\dfrac{1}{4}\)
Setting the observed proportion of yes answers, \(n_1/n\), equal to P(yes) and solving for p:
\(\dfrac{n_1}{n}=\dfrac{\hat{p}}{2}+\dfrac{1}{4}\)
\(\hat{p}=2\left(\dfrac{n_1}{n}-\dfrac{1}{4}\right)\)
If the sample size is small compared to the population size, the finite population correction factor can be omitted, and the variance estimate is:
\(\hat{V}ar(\hat{p})=\dfrac{4}{n}\times \dfrac{n_1}{n}\times \left(1-\dfrac{n_1}{n}\right)\)
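The estimator and its variance can be written as a short Python function. As a stated generalization (not part of the example), it allows any known t and any known yes-probability theta for the unrelated question: from P(yes) = t p + (1 - t) theta we get \(\hat{p}=(n_1/n-(1-t)\theta)/t\) and, without the finite population correction, \(\hat{V}ar(\hat{p})=\dfrac{(n_1/n)(1-n_1/n)}{n t^2}\). With t = theta = 1/2 these reduce to the formulas above. The function name is our own.

```python
import math


def randomized_response(n: int, n1: int, t: float = 0.5,
                        theta: float = 0.5) -> tuple[float, float]:
    """Estimate p and its SD from n1 'yes' answers out of n.

    P(yes) = t*p + (1 - t)*theta, so p_hat = (n1/n - (1 - t)*theta) / t.
    Var_hat(p_hat) = (n1/n)(1 - n1/n) / (n * t**2), finite population
    correction omitted (sample small relative to the population).
    """
    yes_rate = n1 / n
    p_hat = (yes_rate - (1 - t) * theta) / t
    var_hat = yes_rate * (1 - yes_rate) / (n * t**2)
    return p_hat, math.sqrt(var_hat)


# Tax-return example: t = 1/2, P(odd page) = 1/2, n = 400, n1 = 128
p_hat, sd_hat = randomized_response(n=400, n1=128)
print(round(p_hat, 3), round(sd_hat, 3))  # 0.14 0.047
```

Note that p_hat can come out negative when the observed yes rate falls below (1 - t)theta; the formula itself does not rule this out.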
Try it!
n = 400
\(n_1=128\)
\(\hat{p}=2\left(\dfrac{128}{400}-\dfrac{1}{4}\right)=0.14\)
\(\hat{V}ar(\hat{p})=\dfrac{4}{400}\times \dfrac{128}{400}\times \left(1-\dfrac{128}{400}\right)=0.0022\)
\(\hat{S}D(\hat{p})=0.047\)