Lesson 6: Stratified Sampling


In Section 6.1, we discuss when and why to use stratified sampling. Estimators of the population mean and total under stratified sampling are provided, followed by an example that computes these estimates and their estimated variances. Confidence intervals for these estimates are then discussed.

In Section 6.2, the optimal allocation of sample size under different conditions is given. In Section 6.3, we discuss post-stratification. It is important to note that the variance of estimates under post-stratification differs from that under stratification. We then use an example to illustrate that a stratified sample may not be better than a simple random sample if the variable one stratifies on is not related to the response. At the end of Section 6.3, we discuss stratified sampling for proportions.

  Lesson 6:  Ch. 11.1-11.6 of Sampling by Steven Thompson, 3rd edition


Upon completion of this lesson you should be able to:

  1. Identify the appropriate reasons and situations for using stratified sampling,
  2. Estimate mean and total when stratified sampling is used,
  3. Compute confidence intervals for the stratified mean and stratified total,
  4. Determine the optimal allocation of sample sizes,
  5. Compute estimates when post-stratification is used,
  6. Compute the variance for the estimates when post-stratification is used, and
  7. Estimate population proportions when stratified sampling is used.

6.1 - How to Use Stratified Sampling

In stratified sampling, the population is partitioned into non-overlapping groups, called strata, and a sample is selected by some design within each stratum.

For example, geographical regions can be stratified into similar regions by means of some known variables such as habitat type, elevation, or soil type. Another example might be to determine the proportions of defective products being assembled in a factory. In this case, sampling may be stratified by production lines, factories, etc.

Can you think of a couple of additional examples where stratified sampling would make sense? Look for opportunities when the measurements within the strata are more homogeneous.

The principal reasons for using stratified random sampling rather than simple random sampling include:

  1. Stratification may produce a smaller error of estimation than would be produced by a simple random sample of the same size. This result is particularly true if measurements within strata are very homogeneous.
  2. The cost per observation in the survey may be reduced by stratification of the population elements into convenient groupings.
  3. Estimates of population parameters may be desired for subgroups of the population. These subgroups should then be identified.

Example 6-1: Average Hours Watching TV Per Week

 Reference p.121 of Scheaffer, Mendenhall, and Ott

An advertising firm, interested in determining how much to emphasize television advertising in a certain county, decides to conduct a sample survey to estimate the average number of hours each week that households within that county watch television. The county has two towns, A and B, and a rural area C. Town A is built around a factory, and most households contain factory workers with school-aged children. Town B contains mainly retirees, and the residents of rural area C are mainly farmers.

There are 155 households in town A, 62 in town B and 93 in rural area C. The firm decides to select 20 households from Town A, 8 households from Town B, and 12 households from the rural area. The results are given in the following table:

Town A (\(N_1\) = 155)
35, 43, 36, 39, 28, 28, 29, 25, 38, 27, 26, 32, 29, 40, 35, 41, 37, 31, 45, 34

Town B (\(N_2\) = 62)
27, 15, 4, 41, 49, 25, 10, 30

Rural Area C (\(N_3\) = 93)
8, 14, 12, 15, 30, 32, 21, 20, 34, 7, 11, 24

Here is the Minitab output that describes the data from each stratum (N in the output denotes the number of observations):

Variable N Mean StDev SE Mean
Town A 20 33.90 5.95 1.33
Town B 8 25.12 15.25 5.39
Rural ar 12 19.00 9.36 2.70
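The summary statistics above can be recomputed from the raw data. The following is a minimal Python sketch, not the course's Minitab or R code; the names `tv_hours` and `describe` are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, stdev

# Sample data from Example 6-1 (hours of TV per week), keyed by stratum.
tv_hours = {
    "Town A": [35, 43, 36, 39, 28, 28, 29, 25, 38, 27,
               26, 32, 29, 40, 35, 41, 37, 31, 45, 34],
    "Town B": [27, 15, 4, 41, 49, 25, 10, 30],
    "Rural Area C": [8, 14, 12, 15, 30, 32, 21, 20, 34, 7, 11, 24],
}

def describe(values):
    """Return (n, mean, sample SD, SE of the mean) for one stratum."""
    n = len(values)
    sd = stdev(values)  # sample standard deviation (divides by n - 1)
    return n, mean(values), sd, sd / sqrt(n)

summaries = {name: describe(values) for name, values in tv_hours.items()}
for name, (n, m, sd, se) in summaries.items():
    print(f"{name:<13} {n:>3} {m:7.2f} {sd:7.2f} {se:6.2f}")
```

The printed columns follow the same order as the Minitab table: N, Mean, StDev, SE Mean.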

Usually, a sample is selected by some probability design from each of the L strata in the population, with selections in different strata independent of each other. The special case where from each stratum a simple random sample is drawn is called a stratified random sample.

Try it!

Does it make sense to use a stratified random sample for this problem? Why or why not?
Yes, for all three reasons listed above.


  • L = the number of strata
  • \(N_h\) = the number of units in stratum h
  • \(n_h\) = the number of units sampled from stratum h
  • N = the total number of units in the population, i.e., \(N_1 + N_2 + \dots + N_L\)

For our "Watching TV" example the following values are:

L = 3, \(N_1\) = 155, \(N_2\) = 62, \(N_3\) = 93, N = 155 + 62 + 93 = 310

Estimating the Population Total

\(\hat{\tau}_{st}=\sum\limits_{h=1}^L \hat{\tau}_h\)

The estimated total is the sum of the estimated stratum totals, where \(\hat{\tau}_h\) is an unbiased estimator for \(\tau_h\).

Since selections in different strata are independent, the variance is:

\(Var(\hat{\tau}_{st})=\sum\limits_{h=1}^L Var(\hat{\tau}_h)\), and

\(\hat{V}ar(\hat{\tau}_{st})=\sum\limits_{h=1}^L \hat{V}ar(\hat{\tau}_h)\)

The formula is computed differently according to the sampling scheme within each stratum. For stratified random sampling, i.e., when a simple random sample is taken within each stratum:

\(\hat{\tau}_h=N_h \bar{y}_h\)

\(\hat{V}ar(\hat{\tau}_{st})=\sum\limits_{h=1}^L N_h \cdot (N_h-n_h)\cdot \dfrac{s^2_h}{n_h}\)


This formula is easy to remember, and the corresponding estimates for the population mean follow directly by dividing the total by N.


For stratified random sampling:

\(\bar{y}_{st}=\dfrac{1}{N} \sum\limits_{h=1}^L N_h \bar{y}_h\)

\(\hat{V}ar(\bar{y}_{st})=\sum\limits_{h=1}^L \left(\dfrac{N_h}{N}\right)^2 \left(\dfrac{N_h-n_h}{N_h}\right) \dfrac{s^2_h}{n_h}\)

\(s_h\) is the sample standard deviation of stratum h, as given in the Minitab output.

Try it!

Consider the Average Hours Watching TV example. Estimate the overall mean and variance of the estimator of mean for this example. Also, estimate the total and the variance of the estimator of the total for this example.

\begin{align}
\bar{y}_{st} &=\dfrac{1}{N}(N_1\bar{y}_1+N_2\bar{y}_2+N_3\bar{y}_3)\\
&= \dfrac{1}{155+62+93} [(155 \times 33.9)+ (62 \times 25.12)+(93 \times 19.0)]\\
&= 27.7
\end{align}

\begin{align}
\hat{V}ar(\bar{y}_{st}) &=\sum\limits_{h=1}^3 \left(\dfrac{N_h}{N}\right)^2 \left(\dfrac{N_h-n_h}{N_h}\right) \dfrac{s^2_h}{n_h}\\
&=\dfrac{1}{(310)^2}\left[\left((155)^2\cdot \dfrac{(155-20)}{155}\cdot \dfrac{(5.95)^2}{20}\right)+\left((62)^2\cdot \dfrac{(62-8)}{62}\cdot \dfrac{(15.25)^2}{8}\right) \right.\\
&\left.+\left((93)^2\cdot \dfrac{(93-12)}{93}\cdot \dfrac{(9.36)^2}{12}\right)\right]\\
&= 1.97
\end{align}

For the total hours watching TV example:

\(\hat{\tau}_{st}=N\cdot \bar{y}_{st}=310 \times 27.7=8587\)

\begin{align}
\hat{V}ar(\hat{\tau}_{st})&= N^2 \hat{V}ar(\bar{y}_{st})\\
&= (310)^2 \times 1.97=189317
\end{align}
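The calculations for the TV example can be checked with a short script. This is a minimal Python sketch of the stratified random sampling formulas above; the function name `stratified_estimates` and the tuple layout are assumptions made for illustration.

```python
# Stratum summaries from the Minitab output: (N_h, n_h, ybar_h, s_h).
strata = [
    (155, 20, 33.90, 5.95),   # Town A
    (62, 8, 25.12, 15.25),    # Town B
    (93, 12, 19.00, 9.36),    # Rural Area C
]

def stratified_estimates(strata):
    """Return (mean, var_mean, total, var_total) under stratified random sampling."""
    N = sum(N_h for N_h, _, _, _ in strata)
    total = sum(N_h * ybar_h for N_h, _, ybar_h, _ in strata)
    var_total = sum(N_h * (N_h - n_h) * s_h**2 / n_h for N_h, n_h, _, s_h in strata)
    return total / N, var_total / N**2, total, var_total

mean_st, var_mean, total_st, var_total = stratified_estimates(strata)
print(f"mean  = {mean_st:.2f}, estimated variance = {var_mean:.2f}")
print(f"total = {total_st:.0f}, estimated variance = {var_total:.0f}")
```

Because the script carries full precision, its total and variance of the total differ slightly from the figures in the text, which are obtained by multiplying the rounded values 27.7 and 1.97 by 310 and 310².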

Confidence Intervals

When the stratum sample sizes are small, an approximate 100(1-\(\alpha\))% CI for \(\tau\) is based on the t distribution:

\(\hat{\tau}_{st} \pm t\sqrt{\hat{V}ar(\hat{\tau}_{st})}\)

However, when the stratum sample sizes are all at least 30, z can be used to approximate t.

What are the degrees of freedom for the t used in this confidence interval? Intuitively, we would want this to be \((n_1-1)+(n_2-1)+...+(n_L-1)\), and this is correct when the variances of all strata are the same. When this is not the case and we cannot pool the degrees of freedom, we need to use the Satterthwaite approximation for the degrees of freedom:

\(d=\left(\sum\limits_{h=1}^L a_h s^2_h\right)^2/\sum\limits_{h=1}^L \dfrac{(a_h s^2_h)^2}{(n_h-1)}\)

where, \(a_h=\dfrac{N_h(N_h-n_h)}{n_h}\)

In particular, when the \(N_h\) are all equal, the \(n_h\) are all equal, and the \(s^2_h\) are all equal, d.f. = n - L.

For the TV example:




\begin{align}
d&= \dfrac{(a_1s^2_1+a_2s^2_2+a_3s^2_3)^2}{\dfrac{(a_1s^2_1)^2}{n_1-1}+\dfrac{(a_2s^2_2)^2}{n_2-1}+\dfrac{(a_3s^2_3)^2}{n_3-1}}\\
&= \dfrac{(1046.25\cdot(5.95)^2+418.5\cdot(15.25)^2+627.75\cdot(9.36)^2)^2}{\dfrac{(1046.25\cdot(5.95)^2)^2}{20-1}+\dfrac{(418.5\cdot(15.25)^2)^2}{8-1}+\dfrac{(627.75\cdot(9.36)^2)^2}{12-1}}\\
&\approx 21.09
\end{align}

Try it!

Provide a 95% CI for \(\mu\) and also a 95% CI for \(\tau\).

We will use t with df = 21; hence a 95% CI for \(\mu\) is:

\begin{align}
\bar{y}_{st} \pm t\sqrt{\hat{V}ar(\bar{y}_{st})} &= 27.7 \pm 2.08 \times \sqrt{1.97} \\
&= 27.7 \pm 2.92
\end{align}

Similarly, a 95% CI for \(\tau\) is:

\begin{align}
\hat{\tau}_{st} \pm t\sqrt{\hat{V}ar(\hat{\tau}_{st})} &= 8587 \pm 2.08 \times \sqrt{189317} \\
&= 8587 \pm 905.0
\end{align}
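The degrees of freedom and the interval half-widths can be reproduced as follows. This is a minimal Python sketch under stated assumptions: the function name `satterthwaite_df` is illustrative, and the critical value 2.08 is copied from the text rather than computed from a t distribution.

```python
from math import sqrt

def satterthwaite_df(strata):
    """Approximate degrees of freedom; strata holds (N_h, n_h, s_h) tuples."""
    # Each term is a_h * s_h^2 with a_h = N_h (N_h - n_h) / n_h.
    terms = [N_h * (N_h - n_h) / n_h * s_h**2 for N_h, n_h, s_h in strata]
    numerator = sum(terms) ** 2
    denominator = sum(t**2 / (n_h - 1) for t, (_, n_h, _) in zip(terms, strata))
    return numerator / denominator

tv_strata = [(155, 20, 5.95), (62, 8, 15.25), (93, 12, 9.36)]
d = satterthwaite_df(tv_strata)

# Critical value for 21 d.f., taken from the text (assumption: not computed here).
t_crit = 2.08
mean_st, var_mean = 27.7, 1.97          # rounded values used in the text
total_st, var_total = 8587, 189317
half_width_mean = t_crit * sqrt(var_mean)
half_width_total = t_crit * sqrt(var_total)

print(f"d = {d:.2f}")
print(f"mean:  {mean_st} +/- {half_width_mean:.2f}")
print(f"total: {total_st} +/- {half_width_total:.1f}")

# Sanity check: with equal N_h, n_h and s_h the formula reduces to n - L.
equal = [(100, 10, 4.0)] * 3
assert abs(satterthwaite_df(equal) - (30 - 3)) < 1e-9
```

The final assertion illustrates the special case noted earlier: three hypothetical strata with identical sizes, sample sizes, and variances give d = 30 - 3 = 27.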

Using R

Here is the code for R for this example:

Datafile:  TVhour.txt
R code:  Chapter6_TVhour.R.txt

6.2 - The Stratification Principle

The Stratification Principle

If the only objective of stratification is to produce estimators with small variances, then we want to stratify so that the units within each stratum are as similar as possible. In a survey of a human population, stratification may be based on socioeconomic factors or geographic regions.

For example, to estimate the average starting income for recent Penn State graduates, it would make sense to stratify by department, since the starting incomes of graduates of the same department would be similar.

Allocation in Stratified Random Sampling

The question is, given a total sample size of n, how do we allocate these among L strata?

Try it!

If our objective is to use an allocation that gives us a specified amount of information at minimum cost, then the best allocation scheme is affected by what three factors?

The best allocation scheme is affected by the following three factors:

  1. the total number of elements in each stratum,
  2. the variability of the measurements within each stratum, and
  3. the cost associated with obtaining an observation from each stratum.

If we don't have all of this information but we know the number of units in each stratum, we can use a simple allocation: proportional allocation, which maintains the same sampling fraction throughout the population.

\(n_h=\dfrac{n\cdot N_h}{N}\)

This does not take into consideration the variability within each stratum and is generally not the optimal choice.
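As a quick illustration of proportional allocation, the following minimal Python sketch applies the formula to the stratum sizes of the TV example with n = 40. The function name `proportional_allocation` is an illustrative assumption.

```python
def proportional_allocation(n, stratum_sizes):
    """Return n_h = n * N_h / N for each stratum (not yet rounded)."""
    N = sum(stratum_sizes)
    return [n * N_h / N for N_h in stratum_sizes]

# Stratum sizes from the TV example; n = 40 is the total sample size used there.
allocation = proportional_allocation(40, [155, 62, 93])
print(allocation)
```

The result matches the sample sizes of 20, 8, and 12 households that the firm used in Example 6-1, so that sample was proportionally allocated.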

If the cost of sampling from each stratum is the same, then the optimal allocation (the allocation that gives the estimator the smallest variance) is:

\(n_h=\dfrac{n \cdot N_h \sigma_h}{\sum\limits_{k=1}^L N_k \sigma_k}\)

(See Section 11.8 of the text for the proof.)

However, suppose the cost of sampling differs from stratum to stratum and the total cost is:

\(c=c_0+\sum\limits_{h=1}^L c_h n_h\)

where \(c_0\) is the overhead cost and \(c_h\) is the cost per unit for stratum h. Then the optimal allocation is:

\(n_h=\dfrac{(c-c_0)N_h \sigma_h/\sqrt{c_h}}{\sum\limits_{k=1}^L N_k \sigma_k \sqrt{c_k}}\)


From this formula, we see that:

  1. the sample size is directly proportional to \(N_h\) and \(\sigma_h\), i.e., a larger sample size is allocated to larger and more variable strata.
  2. the sample size is inversely proportional to \(\sqrt{c_h}\), i.e., a smaller sample size is allocated to more expensive strata.

In order to use the optimal allocation, one must be able to estimate \(\sigma_h\).
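To see how unequal costs shift the allocation, here is a minimal Python sketch of the cost-based formula. The stratum sizes come from the TV example and the \(\sigma_h\) values are those assumed in the allocation exercise for that example; the per-unit costs, overhead, and budget are purely hypothetical, as is the function name `cost_optimal_allocation`.

```python
from math import sqrt

def cost_optimal_allocation(budget, overhead, strata):
    """strata holds (N_h, sigma_h, c_h); returns unrounded n_h for each stratum."""
    denom = sum(N_h * sigma_h * sqrt(c_h) for N_h, sigma_h, c_h in strata)
    return [(budget - overhead) * N_h * sigma_h / sqrt(c_h) / denom
            for N_h, sigma_h, c_h in strata]

# (N_h, sigma_h, c_h): the costs 9, 9 and 16 per unit are hypothetical.
strata = [(155, 5, 9.0), (62, 15, 9.0), (93, 10, 16.0)]
budget, overhead = 500.0, 100.0  # hypothetical total cost c and overhead c_0
n_h = cost_optimal_allocation(budget, overhead, strata)

# The allocation should spend exactly the budget: c_0 + sum of c_h * n_h = c.
spent = overhead + sum(n * c_h for n, (_, _, c_h) in zip(n_h, strata))
print([round(n, 2) for n in n_h], round(spent, 2))
```

In this made-up setting, Town B and the rural area have the same product \(N_h \sigma_h\), yet the rural area receives a smaller sample because each observation there is assumed to cost more.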

Let's take a look at this in the context of the TV Example...

Try it!

For the Average Hours Watching TV example, suppose that before the advertising firm conducts the survey, it has already estimated that \(\sigma_1=5, \sigma_2=15, \sigma_3=10\). If the cost of obtaining an observation is about the same for the three areas (e.g., by telephone interview), what is the optimal allocation if the firm wants to sample 40 households?

Optimal allocation:

\(n_h=\dfrac{n \cdot N_h \sigma_h}{\sum\limits_{k=1}^L N_k \sigma_k}\)

\(N_1=155, \sigma_1=5\)
\(N_2=62, \sigma_2=15\)
\(N_3=93, \sigma_3=10\)


\(n_1=\dfrac{40 \times 155 \times 5}{155 \times 5+62 \times 15+93 \times 10}=11.7647\)

\(n_2=\dfrac{40 \times 62 \times 15}{155 \times 5+62 \times 15+93 \times 10}=14.1176\)

\(n_3=\dfrac{40 \times 93 \times 10}{155 \times 5+62 \times 15+93 \times 10}=14.1176\)

Thus we will choose \(n_1=12, n_2=14\) and \(n_3=14\).

Remember, it is important that \(n_1+n_2+n_3=40\) in this case.
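The allocation above can be verified with a few lines of code. This is a minimal Python sketch of the equal-cost optimal allocation; the function name `neyman_allocation` is an assumption (the formula is commonly called Neyman allocation, a name the lesson itself does not use).

```python
def neyman_allocation(n, strata):
    """strata holds (N_h, sigma_h); returns unrounded n_h under equal costs."""
    denom = sum(N_h * sigma_h for N_h, sigma_h in strata)
    return [n * N_h * sigma_h / denom for N_h, sigma_h in strata]

# (N_h, sigma_h) for Town A, Town B and Rural Area C, as given in the exercise.
raw = neyman_allocation(40, [(155, 5), (62, 15), (93, 10)])
rounded = [round(x) for x in raw]
print([f"{x:.4f}" for x in raw], rounded, sum(rounded))
```

Simple rounding happens to preserve the total of 40 here. In general it may not, so the rounded sample sizes should always be checked against n and adjusted if necessary.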

6.3 - Poststratification and further topics on stratification

Sometimes, we would like to stratify on a key variable but cannot place the units into their correct strata until after the units are sampled. For instance, in a telephone interview, the respondents cannot be placed into a male or female stratum until after they are contacted.

Poststratification (stratification after the sample has been selected by simple random sampling) is often appropriate when a simple random sample does not represent the strata in proportion to their sizes in the population.

Here is an example. We want to estimate the average weight and take a simple random sample of 100 people. Here is what was obtained.

Male Female
\(n_1=20\) \(n_2=80\)
\(\bar{y}_1=180\) lbs. \(\bar{y}_2=120\) lbs.

\(\bar{y}\) = the overall sample mean = 132

This sample is clearly not balanced with respect to gender. The overall sample mean is likely an underestimate because males are underrepresented in the data. How can we account for this?

In the population \(\dfrac{N_1}{N}=0.5\) and \(\dfrac{N_2}{N}=0.5\).


\begin{align}
\bar{y}_{st} &= \dfrac{N_1}{N} \bar{y}_1+\dfrac{N_2}{N} \bar{y}_2\\
&= 0.5\times 180+0.5 \times 120\\
&= 150
\end{align}

The poststratification estimator \(\bar{y}_{st}\) will not have the same variance as the stratified sample mean, since the sample sizes \(n_h\) are random. The variance of the poststratified \(\bar{y}_{st}\) is approximately the sum of two terms: the variance of the stratified mean \(\bar{y}_{st}\) under proportional allocation (\(n_h=nN_h/N\)), and a term that reflects the increase one expects from post- rather than pre-stratification.

\(Var(\text{post-stratified }\bar{y}) \approx \dfrac{N-n}{nN}\sum\limits_{h=1}^L \left(\dfrac{N_h}{N}\right)\sigma^2_h + \dfrac{1}{n^2}\left(\dfrac{N-n}{N-1}\right)\sum\limits_{h=1}^L \dfrac{N-N_h}{N}\sigma^2_h\)

Example 6-2: Accounts Receivable

A firm knows that 40% of its accounts receivable are wholesale and 60% are retail. However, it is difficult to identify the type of an account without pulling its file and looking at it. An auditor randomly sampled 100 accounts without replacement. Here are the results of the sampling:

Wholesale Retail
\(n_1=70\) \(n_2=30\)
\(\bar{y}_1=520\) \(\bar{y}_2=280\)
\(s_1=210\) \(s_2=90\)

Try it!

Compute the post-stratified mean and the variance of the post-stratified mean.

\begin{align}
\bar{y}_{st} &= \dfrac{N_1}{N} \bar{y}_1+\dfrac{N_2}{N} \bar{y}_2\\
&= 0.4\times 520+0.6 \times 280\\
&= 376
\end{align}

Given that the firm has a very large number of accounts receivable, we can ignore the finite population correction factor.

\begin{align}
\hat{V}ar(\text{post-stratified }\bar{y}) & \approx \dfrac{1}{n}\left(\dfrac{N_1}{N}s^2_1+\dfrac{N_2}{N}s^2_2\right)+\dfrac{1}{n^2}\left[\left(1-\dfrac{N_1}{N}\right) s^2_1 + \left(1-\dfrac{N_2}{N}\right) s^2_2 \right]\\
&= \dfrac{1}{100}[0.4 \times (210)^2+ 0.6 \times (90)^2]+ \dfrac{1}{100^2}[0.6 \times (210)^2+ 0.4 \times (90)^2]\\
&= 225+2.97\\
&= 227.97
\end{align}
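The two pieces of the post-stratified variance can be computed separately to see how small the penalty for post-stratification is in this example. This is a minimal Python sketch that, like the calculation above, ignores the finite population correction; the function names are illustrative assumptions.

```python
def poststratified_mean(weights, means):
    """Weighted mean with known population shares W_h = N_h / N."""
    return sum(W * ybar for W, ybar in zip(weights, means))

def poststratified_variance(n, weights, sds):
    """Approximate variance terms, ignoring the finite population correction."""
    first = sum(W * s**2 for W, s in zip(weights, sds)) / n
    second = sum((1 - W) * s**2 for W, s in zip(weights, sds)) / n**2
    return first, second

weights = [0.4, 0.6]            # wholesale, retail shares from Example 6-2
ybar_post = poststratified_mean(weights, [520, 280])
first, second = poststratified_variance(100, weights, [210, 90])
print(round(ybar_post, 2), round(first, 2), round(second, 2), round(first + second, 2))
```

The first term is the proportional-allocation variance and the second is the extra amount attributable to the random stratum sample sizes.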

Note! Further Topic on Stratified Sampling

It is not true that stratified random sampling always produces an estimator with a smaller variance than that from simple random sampling.

Example 6-3: Student Weights

The principal of a prep school for boys wants to estimate the average weight of the 7th-grade boys in the school. There are 4 classes, with 24 students in class 1, 36 in class 2, 30 in class 3, and 30 in class 4.

For administrative ease, he decides to use stratified sampling with each class as a stratum. The principal has enough time and money to obtain data for 20 students, and because the cost of sampling is the same in each stratum, he decides to use proportional allocation, which gives \(n_1=4, n_2=6, n_3=5\) and \(n_4=5\). The data (in lbs.) is given in the following table:

Weight of the student (in lbs.)
Class 1 94, 90, 102, 110
Class 2 91, 99, 93, 105, 111, 101
Class 3 108, 96, 100, 93, 93
Class 4 92, 110, 94, 91, 113

Here is the Minitab output that describes the data from each stratum:

Variable N Mean StDev SE Mean
Class 1 4 99.00 8.87 4.43
Class 2 6 100.00 7.46 3.04
Class 3 5 98.00 6.28 2.81
Class 4 5 100.00 10.61 4.74
All 20 99.30 7.73 1.73

Try it!

Calculate the stratified estimator \(\bar{y}_{st}\) and the variance of \(\bar{y}_{st}\).

To estimate the average weight of the 7th-grade boys, using the Minitab output:

\(\bar{y}_{st}=\sum\limits_{h=1}^L \dfrac{N_h}{N}\bar{y}_h=99.3\)

\begin{align}
\hat{V}ar(\bar{y}_{st}) &= \dfrac{1}{N^2}\sum\limits_{h=1}^4 N^2_h \left(\dfrac{N_h-n_h}{N_h}\right)\dfrac{s^2_h}{n_h}\\
&= \dfrac{1}{120^2}\left[\left((24)^2\cdot \dfrac{5}{6} \cdot \dfrac{(8.87)^2}{4}\right)+\left((36)^2\cdot \dfrac{5}{6} \cdot \dfrac{(7.46)^2}{6}\right) \right.\\
&\left.+\left((30)^2\cdot \dfrac{5}{6} \cdot \dfrac{(6.28)^2}{5}\right)+\left((30)^2\cdot \dfrac{5}{6} \cdot \dfrac{(10.61)^2}{5}\right)\right]\\
&= 2.93
\end{align}

For a 95% CI, we need Satterthwaite's formula to get the degrees of freedom:

\(d=\dfrac{\left(\sum\limits_{h=1}^L a_h s^2_h \right)^2}{\sum\limits_{h=1}^L \dfrac{(a_h s^2_h)^2}{n_h-1}}\)



Plugging the values into the formula gives d = 13.7576.

To be conservative, round down and use d.f. = 13.

Then, an approximate 95% CI is:

\(99.3 \pm 2.160\sqrt{2.93}\)
\(=99.3 \pm 3.697\)

Looking back at the data, if we had used simple random sampling, would our CI have been tighter or looser?

Usually, stratified random sampling performs better, because we typically stratify when the units within each stratum are more homogeneous than the population as a whole.

Here, however, there is no reason to expect the classes to be more homogeneous in weight, and therefore no reason to expect stratified random sampling to do any better than simple random sampling.

Try it!

Find a 95% CI for the population mean based on the sample mean. Is it wider or narrower than the one based on the stratified estimate?

\begin{align}
\hat{V}ar(\bar{y})&= \left(\dfrac{N-n}{N}\right) \left(\dfrac{s^2}{n}\right)\\
&= \left(\dfrac{120-20}{120}\right) \left(\dfrac{(7.73)^2}{20}\right)\\
&= 2.49
\end{align}

Then, with df = 19, an approximate 95% CI is:

\(99.3 \pm 2.093\sqrt{2.49}\)
\(=99.3 \pm 3.30\)

Thus the margin of error is smaller and the confidence interval narrower.
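The comparison can be reproduced from the raw weights. This is a minimal Python sketch; the t critical values 2.160 and 2.093 are copied from the text rather than computed, and the variable names are illustrative assumptions.

```python
from math import sqrt
from statistics import stdev

# Weights (lbs.) from Example 6-3: class size N_h and the sampled values.
classes = {
    "Class 1": (24, [94, 90, 102, 110]),
    "Class 2": (36, [91, 99, 93, 105, 111, 101]),
    "Class 3": (30, [108, 96, 100, 93, 93]),
    "Class 4": (30, [92, 110, 94, 91, 113]),
}
N = sum(N_h for N_h, _ in classes.values())

# Stratified estimate of the variance of the mean.
var_st = sum(
    (N_h / N) ** 2 * (N_h - len(y)) / N_h * stdev(y) ** 2 / len(y)
    for N_h, y in classes.values()
)

# Hypothetical: treat the same 20 values as one simple random sample.
pooled = [v for _, y in classes.values() for v in y]
n = len(pooled)
var_srs = (N - n) / N * stdev(pooled) ** 2 / n

# t critical values for 13 and 19 d.f., taken from the text.
moe_st = 2.160 * sqrt(var_st)
moe_srs = 2.093 * sqrt(var_srs)
print(f"stratified: var = {var_st:.2f}, margin = {moe_st:.2f}")
print(f"SRS:        var = {var_srs:.2f}, margin = {moe_srs:.2f}")
```

Working from unrounded standard deviations, the margins may differ from the text in the last digit, but the ordering is the same: the hypothetical simple random sample gives the smaller margin of error.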

Since the data were collected by stratified sampling, treating them as a simple random sample, as above, is the wrong way to compute the variance for this problem. How the variance is computed depends on the method by which the sample was taken. We did the computation only to show that if, hypothetically, the same data had been collected by simple random sampling, the margin of error would be smaller.

Moral of this example:

Stratifying on class, which is not related to weight, does not result in smaller variances within the strata. On the other hand, if stratification had other purposes such as to estimate the parameters of each subgroup, it still makes sense to stratify, though the purpose is not to get estimates with smaller variance. For this particular example, the stratification to estimate the average weight for each class may be relevant.

Stratified sampling to estimate population proportion

\(\hat{p}_{st}=\dfrac{1}{N}\sum\limits_{h=1}^L N_h \hat{p}_h\)

\begin{align}
\hat{V}ar(\hat{p}_{st})&= \dfrac{1}{N^2}\sum\limits_{h=1}^L N^2_h \hat{V}ar(\hat{p}_h)\\
&= \dfrac{1}{N^2}\sum\limits_{h=1}^L N^2_h \left(\dfrac{N_h-n_h}{N_h}\right)\cdot \dfrac{\hat{p}_h(1-\hat{p}_h)}{n_h-1}
\end{align}

Example 6-4: TV Show Viewership

The advertising firm wants to estimate the proportion of households in the county that view the television show "American Idol".

\(N_1=155,N_2=62, N_3=93\). As before, we stratify by town and the sample results are:

Stratum Sample Size \(\hat{p}_h\)
Town A \(n_1=20\) 16/20 = 0.80
Town B \(n_2=8\) 2/8 = 0.25
Rural Area C \(n_3=12\) 6/12 = 0.50

We plug in the values and we can get the following:

Try it!

Compute the estimator for the population proportion.

\begin{align}
\hat{p}_{st}&=\dfrac{1}{N}\sum\limits_{h=1}^L N_h \hat{p}_h\\
&= \dfrac{155}{310}\cdot 0.8 +\dfrac{62}{310}\cdot 0.25+\dfrac{93}{310}\cdot 0.5\\
&= 0.6
\end{align}

The following displays the estimated variance for each stratum:

\begin{align}
\hat{V}ar(\hat{p}_1)&= \left(\dfrac{N_1-n_1}{N_1}\right)\cdot \dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1-1}\\
&= \left(\dfrac{155-20}{155}\right)\cdot \dfrac{0.8(0.2)}{19}\\
&= 0.007
\end{align}

\begin{align}
\hat{V}ar(\hat{p}_2)&= \left(\dfrac{N_2-n_2}{N_2}\right)\cdot \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2-1}\\
&= \left(\dfrac{62-8}{62}\right)\cdot \dfrac{0.25(0.75)}{7}\\
&= 0.023
\end{align}

\begin{align}
\hat{V}ar(\hat{p}_3)&= \left(\dfrac{N_3-n_3}{N_3}\right)\cdot \dfrac{\hat{p}_3(1-\hat{p}_3)}{n_3-1}\\
&= \left(\dfrac{93-12}{93}\right)\cdot \dfrac{0.5(0.5)}{11}\\
&= 0.02
\end{align}

Try it!

Compute the estimated variance of the stratified proportion.

\begin{align}
\hat{V}ar(\hat{p}_{st})&= \dfrac{1}{(310)^2}[(155)^2(0.007)+(62)^2(0.023)+(93)^2(0.02)]\\
&= 0.0045
\end{align}
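The stratified proportion and its estimated variance can be computed in one pass. This is a minimal Python sketch of the two formulas above; the function name `stratified_proportion` and the tuple layout are assumptions made for illustration.

```python
def stratified_proportion(strata):
    """strata holds (N_h, n_h, p_hat_h); returns (p_st, estimated variance)."""
    N = sum(N_h for N_h, _, _ in strata)
    p_st = sum(N_h * p for N_h, _, p in strata) / N
    var = sum(
        N_h**2 * (N_h - n_h) / N_h * p * (1 - p) / (n_h - 1)
        for N_h, n_h, p in strata
    ) / N**2
    return p_st, var

# (N_h, n_h, p_hat_h) for Town A, Town B and Rural Area C in Example 6-4.
p_st, var_p = stratified_proportion(
    [(155, 20, 16 / 20), (62, 8, 2 / 8), (93, 12, 6 / 12)]
)
print(f"p_st = {p_st:.2f}, estimated variance = {var_p:.4f}")
```

Unlike the hand calculation, the script does not round the stratum variances before combining them, so the two results agree only to the precision reported in the text.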
