# 8.1 - Systematic Sampling

Suppose you have a number of students lined up in a row:

1 2 3 4 5 6 7 8 9 10 11 12

Here we might sample every 4th element, that is, 1 in 4 elements from the population: for example, (1, 5, 9) or (2, 6, 10). There are four primary units: (1, 5, 9), (2, 6, 10), (3, 7, 11), (4, 8, 12).
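As a minimal sketch in Python, the primary units of a 1-in-*k* design can be listed by slicing the ordered population at each possible starting position. The helper name `primary_units` is hypothetical and used only for illustration.

```python
def primary_units(population, k):
    """Return the k possible 1-in-k systematic samples (primary units)."""
    # Each starting position 0..k-1 yields one primary unit.
    return [tuple(population[start::k]) for start in range(k)]

students = list(range(1, 13))       # students 1..12 in line order
units = primary_units(students, 4)  # 1-in-4 systematic sampling
for unit in units:
    print(unit)
```

Selecting one of these tuples at random is the same as choosing a random start among the first four students and then taking every 4th student.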

To sample systematically from a field, the following is one example:

|    |    |    |    |
|----|----|----|----|
| 1  | 2  | 3  | 4  |
| 5  | 6  | 7  | 8  |
| 9  | 10 | 11 | 12 |
| 13 | 14 | 15 | 16 |

Taking every second plot in both the row and the column direction gives four primary units: (1, 3, 9, 11), (2, 4, 10, 12), (5, 7, 13, 15), (6, 8, 14, 16).
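The same enumeration can be sketched for a grid. The code below assumes the plots are numbered row by row starting at 1, and that the design takes every second row and every second column; the function name `grid_primary_units` is hypothetical.

```python
def grid_primary_units(n_rows, n_cols, row_step, col_step):
    """List the primary units of a systematic design on a row-major grid."""
    units = []
    for r0 in range(row_step):          # starting row offset
        for c0 in range(col_step):      # starting column offset
            unit = tuple(
                r * n_cols + c + 1      # plot label, numbered row by row from 1
                for r in range(r0, n_rows, row_step)
                for c in range(c0, n_cols, col_step)
            )
            units.append(unit)
    return units

field_units = grid_primary_units(4, 4, 2, 2)
for unit in field_units:
    print(unit)
```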

How do we draw a 1 in *k* systematic sample?

**Example:** Suppose our population is 9,000 students and we want to sample 1,200 students. How do we sample these students systematically?

Since 9000/1200 = 7.5, we can take a 1-in-7 systematic sample, that is, sample every 7th student. We pick a starting point randomly from 1 to 600 and then sample every 7th student from there on until we have 1,200 students.
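One way to draw such a sample is sketched below. The function `systematic_sample` and its parameters are hypothetical names chosen for illustration, and the seed is arbitrary. Any start up to 9000 − 7 × 1199 = 607 keeps all 1,200 students inside the population; the usage line restricts the start to 1 to 600 as in the example.

```python
import random

def systematic_sample(N, n, k, max_start=None, rng=None):
    """Draw a 1-in-k systematic sample of n unit labels from 1..N."""
    rng = rng or random.Random()
    largest_valid = N - k * (n - 1)  # last start that keeps all n units <= N
    if largest_valid < 1:
        raise ValueError("n units spaced k apart do not fit in the population")
    if max_start is None:
        max_start = largest_valid
    if not 1 <= max_start <= largest_valid:
        raise ValueError("max_start must lie between 1 and the largest valid start")
    start = rng.randint(1, max_start)  # random starting point (inclusive bounds)
    return [start + k * i for i in range(n)]

sample = systematic_sample(N=9000, n=1200, k=7, max_start=600,
                           rng=random.Random(0))
print(sample[:3], len(sample))
```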

How do we estimate the variance of this single systematic sample?

We cannot use the formula:

\(s^2_u=\dfrac{1}{n-1}\sum\limits_{i=1}^n (y_i-\bar{y})^2\)

since *n* = 1: only one primary unit is selected.

If the population is randomly ordered, then there is no problem: the systematic sample can be treated like a simple random sample of secondary units, and we can estimate the variance \(\sigma^2\) by:

\(s^2=\dfrac{\sum\limits_{j=1}^{M_1}(y_{1j}-\bar{y}_1)^2}{M_1-1}\)

However, when the population is **ordered**, systematic sampling is usually **better** than simple random sampling, and the above formula will **overestimate** the variance.

When the population is **periodic**, systematic sampling may be **worse** than simple random sampling, and the above formula will **underestimate** the variance: if the sampling interval *k* is chosen poorly, the sampled elements may be too similar to each other.
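Both effects can be checked on small artificial populations. The sketch below is illustrative only: the two populations are made up, the function name `systematic_vs_naive` is hypothetical, and the naive estimate is assumed to be the usual simple random sampling formula \((1-n/N)\,s^2/n\) applied within each possible systematic sample.

```python
from statistics import mean, pvariance, variance

def systematic_vs_naive(population, k):
    """Compare the true variance of the systematic-sample mean with the
    average naive (simple-random-sampling) estimate over all k samples."""
    N = len(population)
    n = N // k                                       # size of each sample
    samples = [population[s::k] for s in range(k)]   # all k primary units
    true_var = pvariance([mean(s) for s in samples]) # each unit equally likely
    naive = mean([(1 - n / N) * variance(s) / n for s in samples])
    return true_var, naive

ordered = list(range(120))                # steadily increasing values
periodic = [i % 10 for i in range(120)]   # repeats with period 10

true_ord, naive_ord = systematic_vs_naive(ordered, 10)
true_per, naive_per = systematic_vs_naive(periodic, 10)
print(true_ord, naive_ord)   # naive estimate is far too large
print(true_per, naive_per)   # naive estimate is zero, far too small
```

In the periodic case the sampling interval equals the period, so every sample contains eight identical copies of one value and shows no within-sample variation at all.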

#### Repeated Systematic Sampling

Unless the population is randomly ordered, we cannot use the naive method to estimate the variance. (See page 162 of the textbook for more advanced approaches.) Instead, we need more than one primary unit.

## Example 8-1: Repeated systematic sampling of ferry cars

(See p. 247 of Scheaffer, Mendenhall and Ott.)

A ferry that carries cars across a bay charges a fee by the carload rather than by the person. The ferry company wants to estimate the average number of people per car for the month of August. The company knows from last year that 400 cars took the ferry, and it wants to sample 80 cars. To facilitate estimating the variance of the systematic sample, the investigator chooses repeated systematic sampling with 10 samples of 8 cars each. Use the data given in the following table to estimate the average number of persons per car and to provide an estimate of the variance.

How do we obtain the random numbers for the repeated systematic sampling?

We will select 10 repeated samples with 8 cars in each, so the sampling interval is 400/8 = 50 and each sample is a 1-in-50 systematic sample. From the values 1 to 50, 10 numbers are selected without replacement, and each of those 10 numbers is the starting point of one 1-in-50 systematic sample.

The 10 numbers sampled randomly without replacement from 1 to 50 are: 2, 5, 7, 13, 26, 31, 35, 40, 45, 46. In the following table, the car that will be sampled is listed with the number of people per car (the response) in parentheses.

| Random starting point | Second element | Third element | Fourth element | Fifth element | Sixth element | Seventh element | Eighth element | Mean \(\bar{y}_i\) |
|---|---|---|---|---|---|---|---|---|
| 2(3) | 52(4) | 102(5) | 152(3) | 202(6) | 252(1) | 302(4) | 352(4) | 3.75 |
| 5(5) | 55(3) | 105(4) | 155(2) | 205(4) | 255(2) | 305(3) | 355(4) | 3.38 |
| 7(2) | 57(4) | 107(6) | 157(2) | 207(3) | 257(2) | 307(1) | 357(3) | 2.88 |
| 13(6) | 63(4) | 113(6) | 163(7) | 213(2) | 263(3) | 313(2) | 363(7) | 4.62 |
| 26(4) | 76(5) | 126(7) | 176(4) | 226(2) | 276(6) | 326(2) | 376(6) | 4.50 |
| 31(7) | 81(6) | 131(4) | 181(4) | 231(3) | 281(6) | 331(7) | 381(5) | 5.25 |
| 35(3) | 85(3) | 135(2) | 185(3) | 235(6) | 285(5) | 335(6) | 385(8) | 4.50 |
| 40(2) | 90(6) | 140(2) | 190(5) | 240(5) | 290(4) | 340(4) | 390(5) | 4.12 |
| 45(2) | 95(6) | 145(3) | 195(6) | 245(4) | 295(4) | 345(5) | 395(4) | 4.25 |
| 46(6) | 96(5) | 146(4) | 196(6) | 246(3) | 296(3) | 346(5) | 396(3) | 4.38 |
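The selection procedure can be sketched as follows. The helper names `draw_starts` and `car_numbers` are hypothetical, and the seed is arbitrary. The first part rebuilds the car numbers from the ten starting points quoted in the example (each car number is a starting point plus a multiple of 50); the second part shows how a fresh set of starting points might be drawn.

```python
import random

def draw_starts(k, n_repeats, rng=None):
    """Draw n_repeats distinct starting points from 1..k without replacement."""
    rng = rng or random.Random()
    return sorted(rng.sample(range(1, k + 1), n_repeats))

def car_numbers(start, k, per_sample):
    """Car labels in one 1-in-k systematic sample beginning at `start`."""
    return [start + k * j for j in range(per_sample)]

k = 400 // 8                                     # sampling interval: 50
starts = [2, 5, 7, 13, 26, 31, 35, 40, 45, 46]   # starting points from the example
rows = [car_numbers(s, k, 8) for s in starts]
print(rows[0])                                   # cars in the first systematic sample

new_starts = draw_starts(k, 10, random.Random(1))  # a fresh, reproducible draw
print(new_starts)
```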

#### Try it!

- The total number of primary units *N* = ?
- The number of primary units sampled *n* = ?
- The number of secondary units in the *i*th primary unit \(M_i\) = ?
- The total number of secondary units in the population *M* = ?

Answers:

- The total number of primary units: *N* = 50
- The number of primary units sampled: *n* = 10
- The number of secondary units in the *i*th primary unit: \(M_i\) = 8
- The total number of secondary units in the population: \(M=\sum\limits_{i=1}^{50}M_i=400\)

To estimate the population mean \(\mu = \tau/M\), we can use the unbiased estimator:

\(\hat{\mu}=\dfrac{\hat{\tau}}{M}=\sum\limits_{i=1}^n \dfrac{\bar{y}_i}{n}=4.16\)

\(\text{where } \bar{y}_i=\dfrac{y_i}{M_i}=\dfrac{\sum\limits_{j=1}^{M_i} y_{ij}}{M_i} \text{ for } i=1,2,\ldots,n.\)

In this example, all primary units have the same size: \(\bar{M}=M_1=M_2=\cdots=M_n=8\).
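The estimate can be reproduced from the passenger counts in the table. This is a minimal sketch; the variable names are illustrative, and each inner list holds the counts (the values in parentheses) for one systematic sample.

```python
# Passengers per car for each of the 10 systematic samples (8 cars each).
counts = [
    [3, 4, 5, 3, 6, 1, 4, 4],
    [5, 3, 4, 2, 4, 2, 3, 4],
    [2, 4, 6, 2, 3, 2, 1, 3],
    [6, 4, 6, 7, 2, 3, 2, 7],
    [4, 5, 7, 4, 2, 6, 2, 6],
    [7, 6, 4, 4, 3, 6, 7, 5],
    [3, 3, 2, 3, 6, 5, 6, 8],
    [2, 6, 2, 5, 5, 4, 4, 5],
    [2, 6, 3, 6, 4, 4, 5, 4],
    [6, 5, 4, 6, 3, 3, 5, 3],
]

# Mean of each primary unit, then the average of those means.
sample_means = [sum(row) / len(row) for row in counts]
mu_hat = sum(sample_means) / len(sample_means)
print(round(mu_hat, 2))  # 4.16
```

Because every primary unit has the same size, averaging the ten sample means is the same as dividing the total of 333 passengers by the 80 sampled cars.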

#### Try it!

The estimated variance of \(\hat{\mu}\) is:

\begin{align}
\hat{V}ar(\hat{\mu}) &= \dfrac{M-n\cdot \bar{M}}{M}\cdot \dfrac{1}{n(n-1)} \cdot \sum\limits_{i=1}^n (\bar{y}_i-\hat{\mu})^2\\
&= \dfrac{400-10 \cdot 8}{400} \cdot \dfrac{1}{10(9)}\cdot [(3.75-4.16)^2+\cdots+(4.38-4.16)^2]\\
&= 0.0365
\end{align}
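A short sketch of this calculation is given below, using the sample means as rounded in the table. The function name `var_of_mean` is hypothetical. Carrying unrounded sample means through the same formula gives a slightly larger value, about 0.0367, so the last digit depends on rounding.

```python
def var_of_mean(sample_means, M, M_bar):
    """Estimated variance of the mean from repeated systematic samples
    of equal size M_bar drawn from a population of M secondary units."""
    n = len(sample_means)                       # primary units sampled
    mu_hat = sum(sample_means) / n
    ss = sum((y - mu_hat) ** 2 for y in sample_means)
    fpc = (M - n * M_bar) / M                   # finite population correction
    return fpc * ss / (n * (n - 1))

# Sample means as rounded in the table.
means = [3.75, 3.38, 2.88, 4.62, 4.50, 5.25, 4.50, 4.12, 4.25, 4.38]
v = var_of_mean(means, M=400, M_bar=8)
print(round(v, 4))  # 0.0365
```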

#### Try it!

If we construct a confidence interval for \(\mu\) using *t*, what are the degrees of freedom? (Hint: consider how many primary units you have, and then compute the degrees of freedom.)