11.2 - Interpenetrating Subsample

There are k interviewers, each with a different manner of interviewing, so they may obtain slightly different responses. To keep the notation simple, we assume that each interviewer conducts the same number of interviews, m. Let n denote the total sample size, so that \(n = km\). The sample is divided at random into k subsamples, and each interviewer is assigned one subsample of m subjects.

Objective: to use simple random sampling to estimate \(\mu\)

  • Interviewer 1: \(y_{11}, y_{12}, y_{13}, \ldots, y_{1m}\)
  • Interviewer 2: \(y_{21}, y_{22}, y_{23}, \ldots, y_{2m}\)
  • Interviewer 3: \(y_{31}, y_{32}, y_{33}, \ldots, y_{3m}\)
  • Interviewer k: \(y_{k1}, y_{k2}, y_{k3}, \ldots, y_{km}\)

The average for the ith interviewer is denoted as:

\(\bar{y}_i=\dfrac{1}{m}\sum\limits_{j=1}^m y_{ij}\)

The grand average is denoted as:

\(\bar{y}=\dfrac{1}{k}\sum\limits_{i=1}^k \bar{y}_i\)

The grand average \(\bar{y}\) is unbiased for μ and the estimated variance of \(\bar{y}\) is:

\(\hat{V}ar(\bar{y})=\dfrac{N-n}{N}\cdot \dfrac{s^2_k}{k}\)

\(\text{where } s^2_k=\dfrac{\sum\limits_{i=1}^k (\bar{y}_i-\bar{y})^2}{k-1}\)

The technique of interpenetrating subsamples gives an estimate of the variance of \(\bar{y}\) that accounts for interviewer biases. In practice, the estimated variance given by the above formula is usually larger than the variance estimate from the simple random sampling formula, which ignores differences between interviewers.
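The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original lesson: the function name `interpenetrating_estimate` and its parameters are hypothetical, the toy data are made up, and when no population size N is supplied the factor \((N-n)/N\) is assumed to be 1.

```python
from math import sqrt
from statistics import mean, variance


def interpenetrating_estimate(subsamples, N=None):
    """Return (grand mean, estimated variance, estimated SD) of the grand mean.

    subsamples: list of k equal-sized lists, one list per interviewer.
    N: population size; if None, (N - n) / N is taken as 1 (an assumption).
    """
    k = len(subsamples)
    n = sum(len(s) for s in subsamples)        # total sample size n = k * m
    sub_means = [mean(s) for s in subsamples]  # one average per interviewer
    y_bar = mean(sub_means)                    # grand average
    s2_k = variance(sub_means)                 # sample variance, divisor k - 1
    fpc = 1.0 if N is None else (N - n) / N    # finite population correction
    var_hat = fpc * s2_k / k
    return y_bar, var_hat, sqrt(var_hat)


# Made-up toy data: k = 3 interviewers, m = 3 subjects each.
toy = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(interpenetrating_estimate(toy))
print(interpenetrating_estimate(toy, N=90))
```

Only the k interviewer averages enter the variance estimate, so the spread between interviewers is what drives it.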

Example 11-3: Interpenetrating subsample

A researcher has 10 research assistants, each with their own equipment for measuring the time (in seconds) it takes people to respond to a command. A simple random sample of 80 people is taken. Since the researcher believes the assistants will produce slightly biased measurements, he decides to randomly divide the 80 people into 10 subsamples of 8 persons each. Each assistant is then assigned to one subsample. The measurements are given in the following table.

assistant time it takes to respond
1 52 73 62 75 71 68 55 65
2 62 65 73 67 78 71 67 59
3 43 54 52 48 56 51 62 57
4 73 64 63 59 71 78 67 76
5 88 76 69 83 85 66 74 73
6 55 71 63 75 68 72 69 60
7 72 65 77 69 74 82 73 67
8 55 43 58 62 42 61 53 61
9 62 52 59 63 69 72 64 58
10 77 65 79 69 72 68 71 67

Minitab output:

  mean
Subsample 1 65.125
Subsample 2 67.750
Subsample 3 52.875
Subsample 4 68.875
Subsample 5 76.750
Subsample 6 66.625
Subsample 7 72.375
Subsample 8 54.375
Subsample 9 62.375
Subsample 10 71.000

Try it!

Estimate the mean and the variance of the estimate.

We estimate the mean by:

\(\bar{y}=\dfrac{1}{10}(\bar{y}_1+\bar{y}_2+\ldots+\bar{y}_{10})=\dfrac{1}{10}(65.125+\ldots+71.000)=65.81\)

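Taking the finite population correction \((N-n)/N\) as approximately 1 (the population size N is not given), its variance is estimated to be:

\(\hat{V}ar(\bar{y})=\dfrac{\sum\limits_{i=1}^{10} (\bar{y}_i-65.81)^2}{(10-1)\times 10}=\dfrac{513.88}{90}=5.71\)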

\(\hat{S}D(\bar{y})=2.39\)
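As a quick check of this arithmetic, the following sketch recomputes the estimate from the ten subsample means in the Minitab output. The variable names are illustrative, and the finite population correction is assumed to be 1 because N is not given.

```python
from math import sqrt
from statistics import mean, variance

# Subsample means copied from the Minitab output above.
sub_means = [65.125, 67.750, 52.875, 68.875, 76.750,
             66.625, 72.375, 54.375, 62.375, 71.000]

k = len(sub_means)
y_bar = mean(sub_means)      # grand average of the subsample means
s2_k = variance(sub_means)   # sample variance of the means, divisor k - 1
var_hat = s2_k / k           # assumes (N - n) / N is approximately 1
sd_hat = sqrt(var_hat)

print(f"mean = {y_bar:.2f}, variance = {var_hat:.2f}, SD = {sd_hat:.2f}")
```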

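If one neglects the interviewer effect and treats the 80 measurements as a single simple random sample, then \(\hat{S}D(\bar{y})\approx 1\), less than half of the value 2.39 obtained above. It is therefore important to take the interviewer effect into account; otherwise, one underestimates the standard deviation of \(\bar{y}\).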
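The comparison can be reproduced from the raw measurements in the table. The sketch below is illustrative only: it uses the usual simple random sampling formula \(s^2/n\) for the naive estimate, takes the finite population correction as 1 (N is not given), and the variable names are not from the original lesson.

```python
from math import sqrt
from statistics import mean, variance

# Response times (seconds), one row per assistant, from the table above.
times = [
    [52, 73, 62, 75, 71, 68, 55, 65],
    [62, 65, 73, 67, 78, 71, 67, 59],
    [43, 54, 52, 48, 56, 51, 62, 57],
    [73, 64, 63, 59, 71, 78, 67, 76],
    [88, 76, 69, 83, 85, 66, 74, 73],
    [55, 71, 63, 75, 68, 72, 69, 60],
    [72, 65, 77, 69, 74, 82, 73, 67],
    [55, 43, 58, 62, 42, 61, 53, 61],
    [62, 52, 59, 63, 69, 72, 64, 58],
    [77, 65, 79, 69, 72, 68, 71, 67],
]

# Naive estimate: pool all 80 values and ignore who took the measurement.
all_obs = [y for row in times for y in row]
sd_srs = sqrt(variance(all_obs) / len(all_obs))

# Interpenetrating-subsample estimate: based on the 10 assistant means.
sub_means = [mean(row) for row in times]
sd_ips = sqrt(variance(sub_means) / len(sub_means))

print(f"SD ignoring assistants:      {sd_srs:.2f}")
print(f"SD accounting for assistants: {sd_ips:.2f}")
```

The pooled calculation gives a standard deviation close to 1, while the calculation based on assistant means gives about 2.39, which matches the conclusion above.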