8.1 - Systematic Sampling

Suppose you have a number of students lined up in a row:

1 2 3 4 5 6 7 8 9 10 11 12

Here we might sample every 4th element, that is, 1 in 4 elements from the population: (1, 5, 9), or (2, 6, 10), and so on. There are four primary units: (1, 5, 9), (2, 6, 10), (3, 7, 11), and (4, 8, 12).
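As a minimal sketch of the idea, the four primary units can be enumerated by taking each possible starting position and then every 4th element after it. The variable names here are illustrative only.

```python
# Population of 12 students, labelled 1..12, sampled at an interval of k = 4.
population = list(range(1, 13))
k = 4

# Each possible starting position defines one primary unit:
# the starting element plus every k-th element after it.
primary_units = [tuple(population[start::k]) for start in range(k)]

for unit in primary_units:
    print(unit)
```

Selecting one of these four tuples at random is the same as drawing a single 1-in-4 systematic sample.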

To sample systematically from a field, the following is one example:

Systematic Sampling
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16

There are four primary units: (1, 3, 9, 11), (2, 4, 10, 12), (5, 7, 13, 15), and (6, 8, 14, 16).
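One way to read the grouping above is that every 2nd row and every 2nd column is taken, starting from each of the four cells in the top-left 2 × 2 block. The sketch below assumes that reading; the names `field` and `step` are illustrative.

```python
# A 4 x 4 field with plots numbered 1..16 row by row.
rows, cols, step = 4, 4, 2
field = [[r * cols + c + 1 for c in range(cols)] for r in range(rows)]

# Each starting cell (r0, c0) in the top-left step x step block defines one
# primary unit: every step-th row and every step-th column from that cell.
primary_units = []
for r0 in range(step):
    for c0 in range(step):
        unit = tuple(
            field[r][c]
            for r in range(r0, rows, step)
            for c in range(c0, cols, step)
        )
        primary_units.append(unit)

for unit in primary_units:
    print(unit)
```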

How do we draw a 1 in k systematic sample?

Example: Suppose our population is 9,000 students and we want to sample 1,200 students. How do we sample these students systematically?

Since 9000/1200 = 7.5, we can perform a 1-in-7 systematic sample; that is, we sample every 7th student. Because 7 × 1200 = 8400, there are 9000 − 8400 = 600 positions to spare, so we can pick a starting point at random from 1 to 600 and then sample every 7th student from that point on until we have 1,200 students.
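The procedure can be sketched as follows, assuming the students are labelled 1 to 9,000. The helper `systematic_sample` and the seed are hypothetical choices made for illustration.

```python
import random

def systematic_sample(start, k, n):
    """Return the labels of n units: start, start + k, start + 2k, ..."""
    return [start + k * i for i in range(n)]

rng = random.Random(2024)      # fixed seed only so the sketch is repeatable
start = rng.randint(1, 600)    # random starting point from 1 to 600 inclusive
sample = systematic_sample(start, k=7, n=1200)

print(start, sample[:5], sample[-1])
```

With any start from 1 to 600, the last sampled label is at most 600 + 7 × 1199 = 8993, so the sample stays inside the population of 9,000.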

How do we estimate the variance of this single systematic sample?

We cannot use the formula:

\(s^2_u=\dfrac{1}{n-1}\sum\limits_{i=1}^n (y_i-\bar{y})^2\)

because n = 1: only one primary unit is selected, so the denominator n − 1 is zero.

If the population is randomly ordered, then there is no problem. We can estimate the variance \(\sigma^2\) by:

\(s^2=\dfrac{\sum\limits_{j=1}^{M_1}(y_{1j}-\bar{y}_1)^2}{M_1-1}\)

However, when the population is ordered, systematic sampling is usually better than simple random sampling and the above formula will overestimate the variance.

When the population is periodic, systematic sampling may be worse than simple random sampling, and the above formula will underestimate the variance. If the sampling interval k is chosen poorly relative to the period, the elements sampled may be too similar to each other.
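The periodic case can be illustrated with a small made-up population (not from the text) whose values repeat with period 4, sampled with the interval k = 4. Every systematic sample then contains identical values, so the within-sample formula gives zero even though the sample mean changes from one starting point to the next.

```python
from statistics import mean, pvariance, variance

# Hypothetical periodic population: the pattern 1, 2, 3, 4 repeated 25 times.
population = [1, 2, 3, 4] * 25
k = 4  # sampling interval equal to the period (a poor choice)

# All k possible 1-in-k systematic samples.
samples = [population[start::k] for start in range(k)]

# Within-sample variance s^2 for each possible sample.
naive_s2 = [variance(s) for s in samples]

# The sample mean for each possible sample, and how much it actually varies.
sample_means = [mean(s) for s in samples]
true_var_of_mean = pvariance(sample_means)

print(naive_s2)
print(sample_means, true_var_of_mean)
```

Here each within-sample variance is 0, while the variance of the sample mean over the four possible samples is 1.25.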

Repeated Systematic Sampling

Unless the population is randomly ordered, we cannot use the naive method to estimate the variance, so we need more than one primary unit. [See page 162 of the textbook for more advanced approaches.]

Example 8-1: Repeated systematic sampling of ferry cars

(see p. 247 of Scheaffer, Mendenhall and Ott)

A ferry that carries cars across a bay charges a fee per carload rather than per person. The ferry company wants to estimate the average number of people per car for August. The company knows from last year that 400 cars took the ferry, and they want to sample 80 cars. To facilitate the estimation of the variance of the systematic sample, the investigator chooses to use repeated systematic sampling with 10 samples of 8 cars each. Use the data given in the following table to estimate the average number of persons per car and also provide an estimate of the variance.

How do we obtain the random numbers for repeated systematic sampling?

We will select 10 repeated samples with 8 cars in each, so each sample is a 1-in-k systematic sample with k = 400/8 = 50. From the values 1 to 50, 10 numbers are selected without replacement, and each of those 10 numbers is the starting point of one 1-in-50 systematic sample.
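The selection just described can be sketched as below. The helper `systematic_unit` and the seed are hypothetical; a different seed gives different starting points.

```python
import random

def systematic_unit(start, k, m):
    """Return the m car numbers in the 1-in-k systematic sample from start."""
    return [start + k * j for j in range(m)]

total_cars, cars_per_sample, n_samples = 400, 8, 10
k = total_cars // cars_per_sample          # sampling interval: 400 / 8 = 50

rng = random.Random(1)                     # fixed seed only for repeatability
# Draw 10 distinct starting points from 1..50 (without replacement).
starts = sorted(rng.sample(range(1, k + 1), n_samples))

# One systematic sample of 8 cars per starting point.
repeated_samples = {s: systematic_unit(s, k, cars_per_sample) for s in starts}

for s, cars in repeated_samples.items():
    print(s, cars)
```

For example, a starting point of 2 gives cars 2, 52, 102, and so on up to 352.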

The 10 numbers sampled randomly without replacement from 1 to 50 are 2, 5, 7, 13, 26, 31, 35, 40, 45, and 46. In the following table, the car that will be sampled is listed with the number of people per car (the response) in parentheses.

Random start  2nd     3rd     4th     5th     6th     7th     8th     \(\bar{y}_i\) (mean)
2(3)          52(4)   102(5)  152(3)  202(6)  252(1)  302(4)  352(4)  3.75
5(5)          55(3)   105(4)  155(2)  205(4)  255(2)  305(3)  355(4)  3.38
7(2)          57(4)   107(6)  157(2)  207(3)  257(2)  307(1)  357(3)  2.88
13(6)         63(4)   113(6)  163(7)  213(2)  263(3)  313(2)  363(7)  4.62
26(4)         76(5)   126(7)  176(4)  226(2)  276(6)  326(2)  376(6)  4.50
31(7)         81(6)   131(4)  181(4)  231(3)  281(6)  331(7)  381(5)  5.25
35(3)         85(3)   135(2)  185(3)  235(6)  285(5)  335(6)  385(8)  4.50
40(2)         90(6)   140(2)  190(5)  240(5)  290(4)  340(4)  390(5)  4.12
45(2)         95(6)   145(3)  195(6)  245(4)  295(4)  345(5)  395(4)  4.25
46(6)         96(5)   146(4)  196(6)  246(3)  296(3)  346(5)  396(3)  4.38

Try it!

For the above "Passengers in a car" example, determine the following:
  • The total number of primary units N =?
  • The number of primary units sampled n =?
  • The number of secondary units in the ith primary unit \(M_i\) =?
  • The total number of secondary units in the population M =?

Answers:
  • The total number of primary units N = 50
  • The number of primary units sampled n = 10
  • The number of secondary units in the ith primary unit \(M_i\) = 8
  • The total number of secondary units in the population \(M=\sum\limits_{i=1}^{50}M_i=400\)

To estimate the population mean \(\mu = \tau/M\), we can use the following unbiased estimator:

\(\hat{\mu}=\dfrac{\hat{\tau}}{M}=\sum\limits_{i=1}^n \dfrac{\bar{y}_i}{n}=4.16\)

\(\text{where } \bar{y}_i=\dfrac{y_i}{M_i}=\dfrac{\sum\limits_{j=1}^{M_i} y_{ij}}{M_i} \text{ for }i=1,2,\ldots,n.\)

In this example, \(\overline{M}=M_1=M_2=\ldots=M_n\)
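As a check on the arithmetic, the estimate can be recomputed from the passenger counts in the ferry table. This is only a sketch; the dictionary `counts` (keyed by random starting point) is an illustrative way to hold the data.

```python
# Number of people per car for each systematic sample, keyed by starting point.
counts = {
    2:  [3, 4, 5, 3, 6, 1, 4, 4],
    5:  [5, 3, 4, 2, 4, 2, 3, 4],
    7:  [2, 4, 6, 2, 3, 2, 1, 3],
    13: [6, 4, 6, 7, 2, 3, 2, 7],
    26: [4, 5, 7, 4, 2, 6, 2, 6],
    31: [7, 6, 4, 4, 3, 6, 7, 5],
    35: [3, 3, 2, 3, 6, 5, 6, 8],
    40: [2, 6, 2, 5, 5, 4, 4, 5],
    45: [2, 6, 3, 6, 4, 4, 5, 4],
    46: [6, 5, 4, 6, 3, 3, 5, 3],
}

# Mean of each primary unit: y_bar_i = (sum of y_ij) / M_i.
unit_means = {start: sum(y) / len(y) for start, y in counts.items()}

# Because every M_i equals 8, mu_hat is the plain average of the unit means.
n = len(unit_means)
mu_hat = sum(unit_means.values()) / n

print(round(mu_hat, 2))
```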

Try it!

Compute the variance of the above estimator.

\begin{align}
\widehat{\text{Var}}(\hat{\mu}) &= \dfrac{M-n\cdot \overline{M}}{M}\cdot \dfrac{1}{n(n-1)} \cdot  \sum\limits_{i=1}^n (\bar{y}_i-\hat{\mu})^2\\
&= \dfrac{400-10 \cdot 8}{400} \cdot \dfrac{1}{10(9)}\cdot [(3.75-4.16)^2+\cdots+(4.38-4.16)^2]\\
&= 0.0365
\end{align}
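The variance calculation can be sketched in the same way, here starting from the primary unit means as rounded in the ferry table. Working from the unrounded means gives a slightly different fourth decimal place, so the result should be read as approximately the value above rather than an exact match.

```python
# Primary unit means as printed in the table (rounded to two decimals).
unit_means = [3.75, 3.38, 2.88, 4.62, 4.50, 5.25, 4.50, 4.12, 4.25, 4.38]

M = 400          # total number of secondary units (cars)
M_bar = 8        # secondary units per primary unit
n = len(unit_means)

mu_hat = sum(unit_means) / n

# Finite population correction (M - n * M_bar) / M.
fpc = (M - n * M_bar) / M

# Sum of squared deviations of the unit means from mu_hat.
ss = sum((y - mu_hat) ** 2 for y in unit_means)

var_hat = fpc * ss / (n * (n - 1))

print(round(var_hat, 4))
```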

Try it!

When we construct a confidence interval for μ, we use the t distribution. What are the degrees of freedom? {Hint: consider how many primary units you have and then compute the degrees of freedom.}
There are 10 primary units. Therefore, there are 10 − 1 = 9 degrees of freedom.
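As an illustration of how these degrees of freedom would be used, the sketch below forms a confidence interval from the estimates reported above. The 95% level is an assumption made here, not something stated in the example, and 2.262 is the tabulated two-sided 95% critical value of the t distribution with 9 degrees of freedom.

```python
import math

mu_hat = 4.16      # estimated mean number of people per car (from above)
var_hat = 0.0365   # estimated variance of mu_hat (from above)
n = 10             # number of primary units sampled

df = n - 1         # degrees of freedom
t_crit = 2.262     # assumed 95% level: t table value for 9 degrees of freedom

half_width = t_crit * math.sqrt(var_hat)
lower, upper = mu_hat - half_width, mu_hat + half_width

print(df, round(lower, 2), round(upper, 2))
```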
