12.1 - Capture-Recapture Sampling

One of the popular methods to estimate the total number of individuals in a population is by capture-recapture sampling. In capture-recapture sampling, an initial sample is obtained and marked. A second sample is obtained independently and it is noted how many of the individuals in that sample were marked.

Example 1: To estimate the abundance of an animal population such as the deer population in the state of Pennsylvania.

Example 2: To estimate the total number of homeless individuals in a given city.

Single Recapture

Notation:

  • X - initial sample size captured and marked
  • y - size of the second sample, drawn independently of the first
  • x - number of individuals in the second sample that are marked
  • \(\tau\) - total population size

Question: How do we estimate the total population size?

The proportion of marked individuals in the second sample is likely to be about the same as the proportion of marked individuals in the whole population:

\(\dfrac{x}{y}\approx \dfrac{X}{\tau}\)

Solving for \(\tau\) gives the estimator:

\(\hat{\tau}=\dfrac{y}{x}\cdot X\)

An estimate of the variance of \(\hat{\tau}\) is:

\(\hat{V}ar(\hat{\tau})=\dfrac{Xy(X-x)(y-x)}{x^3}\)

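An approximate \(100(1-\alpha)\%\) confidence interval is:

\(\hat{\tau} \pm z_{\alpha/2}\sqrt{\hat{V}ar(\hat{\tau})}\)

where \(z_{\alpha/2}\) is the upper \(\alpha/2\) critical value of the standard normal distribution (for example, 1.96 for a 95% interval).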
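The estimator, its variance estimate, and the confidence interval above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `single_recapture_estimate`, its argument checks, and the sample values in the usage lines are assumptions made for this example, and the default z = 1.96 corresponds to an approximate 95% interval.

```python
import math


def single_recapture_estimate(X, y, x, z=1.96):
    """Return (tau_hat, var_hat, (lower, upper)) for single recapture sampling.

    X: initial sample size captured and marked
    y: size of the second sample
    x: number of marked individuals in the second sample
    z: normal critical value (1.96 gives an approximate 95% interval)
    """
    # Hypothetical input checks: the estimator divides by x, and x cannot
    # exceed either sample size.
    if x <= 0:
        raise ValueError("x must be positive; tau_hat is undefined when x = 0")
    if x > min(X, y):
        raise ValueError("x cannot exceed X or y")

    tau_hat = y / x * X                                # point estimate
    var_hat = X * y * (X - x) * (y - x) / x ** 3       # estimated variance
    half_width = z * math.sqrt(var_hat)                # margin of error
    return tau_hat, var_hat, (tau_hat - half_width, tau_hat + half_width)


# Illustrative values only (not taken from any real survey).
tau_hat, var_hat, ci = single_recapture_estimate(X=100, y=50, x=10)
print(f"estimate: {tau_hat:.1f}, variance: {var_hat:.1f}")
print(f"approximate 95% CI: ({ci[0]:.1f}, {ci[1]:.1f})")
```

Raising an error when x = 0 is one possible design choice; it makes the undefined case explicit rather than returning infinity.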

When x = 0, the estimator \(\hat{\tau}\) is undefined (it would estimate \(\tau\) by infinity). A modified estimator for \(\tau\) that stays finite in this case is:

\(\tilde{\tau}=\dfrac{(X+1)(y+1)}{x+1}-1\)
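As a quick check of how the modified estimator behaves, the following Python sketch evaluates it directly from the formula. The function name `modified_estimate` and the input values are assumptions chosen for illustration only.

```python
def modified_estimate(X, y, x):
    """Modified estimator of the population size; finite even when x = 0.

    X: initial sample size captured and marked
    y: size of the second sample
    x: number of marked individuals in the second sample
    """
    return (X + 1) * (y + 1) / (x + 1) - 1


# Illustrative values only: with no marked individuals recaptured (x = 0),
# the modified estimator still returns a finite number.
print(modified_estimate(X=100, y=50, x=0))
# With x > 0 it gives a value somewhat below (y / x) * X.
print(modified_estimate(X=100, y=50, x=10))
```

Because 1 is added to each count before dividing, the denominator is never zero, which is what removes the problem at x = 0.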

Try it!

  1. In a free concert given on the Old Main lawn, we want to estimate the number of attendees. How are you going to conduct sampling for this purpose?

    At the beginning of the concert, 500 Penn State t-shirts are randomly given out to attendees. Later, 200 attendees are randomly sampled, and 40 of them have a Penn State t-shirt. Here X = 500, y = 200, and x = 40.

  2. Using the values given in the answer to the question above, estimate the total number of attendees at the concert.

     

    \(\hat{\tau}=\dfrac{y}{x}\cdot X =\dfrac{200}{40}\cdot 500=2500\)

    \(\hat{V}ar(\hat{\tau})=\dfrac{500\times 200(500-40)(200-40)}{40^3}=115000\)

    \(\hat{S}D(\hat{\tau})=\sqrt{115000}\approx 339.12\)

    A 95% confidence interval is:

    2500 ± 1.96 × 339.12
    2500 ± 664.67

    that is, approximately 1835 to 3165 attendees.

    Note! y can be larger than X