Sometimes we would like to stratify on a key variable but cannot place the units into their correct strata until after they have been sampled. For instance, in a telephone survey a respondent cannot be placed into the male or female stratum until after that respondent has been contacted.
Poststratification (stratification after the sample has been selected by simple random sampling) is often appropriate when a simple random sample does not represent the strata in proportion to their sizes in the population.
Here is an example. We want to estimate the average weight in a population and take a simple random sample of 100 people. The results are:
\begin{tabular}{lcc}
 & Male & Female \\
\hline
Sample size & \(n_1=20\) & \(n_2=80\) \\
Sample mean & \(\bar{y}_1=180\) lbs. & \(\bar{y}_2=120\) lbs. \\
\end{tabular}
The overall sample mean is \(\bar{y}=(20\cdot 180+80\cdot 120)/100=132\) lbs.
The sample is clearly not balanced with respect to gender, and 132 lbs. is likely an underestimate because males are underrepresented in the data. How can we account for this?
In the population \(\dfrac{N_1}{N}=0.5\) and \(\dfrac{N_2}{N}=0.5\).
Thus,
\begin{align}
\bar{y}_{st} &= \dfrac{N_1}{N} \bar{y}_1+\dfrac{N_2}{N} \bar{y}_2\\
&= 0.5\cdot 180+0.5 \cdot 120=150
\end{align}
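The reweighting above can be sketched in a few lines of Python. This is only an illustration: the function names `sample_mean` and `poststratified_mean` are hypothetical helpers, not part of any library.

```python
# Poststratified mean: weight each stratum's sample mean by its known
# population share N_h / N instead of its (random) sample share n_h / n.

def sample_mean(stratum_sizes, stratum_means):
    """Overall sample mean, sum_h n_h * ybar_h / n."""
    n = sum(stratum_sizes)
    return sum(k * m for k, m in zip(stratum_sizes, stratum_means)) / n

def poststratified_mean(pop_shares, stratum_means):
    """Poststratified mean, sum_h (N_h / N) * ybar_h."""
    return sum(w * m for w, m in zip(pop_shares, stratum_means))

means = [180.0, 120.0]   # male, female sample means (lbs.)
sizes = [20, 80]         # n_1, n_2
shares = [0.5, 0.5]      # N_1/N, N_2/N

print(sample_mean(sizes, means))           # 132.0
print(poststratified_mean(shares, means))  # 150.0
```

The only difference between the two functions is the set of weights: the sample shares 0.2 and 0.8 versus the population shares 0.5 and 0.5.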
The poststratification estimator \(\bar{y}_{st}\) does not have the same variance as the ordinary stratified sample mean, because the stratum sample sizes \(n_h\) are random. The variance of the poststratified \(\bar{y}_{st}\) is approximately the sum of two terms: the variance of the stratified mean under proportional allocation (\(n_h=nN_h/N\)), and a term that reflects the increase one expects from poststratifying rather than prestratifying.
\(Var(\text{poststratified }\bar{y}) \approx \dfrac{N-n}{nN}\sum\limits_{h=1}^L \left(\dfrac{N_h}{N}\right)\sigma^2_h + \dfrac{1}{n^2}\left(\dfrac{N-n}{N-1}\right)\sum\limits_{h=1}^L \dfrac{N-N_h}{N}\sigma^2_h\)
Example 6-2: Accounts Receivable
A firm knows that 40% of its accounts receivable are wholesale and 60% are retail. However, it is difficult to tell which type an account is without pulling its file and looking at it. An auditor randomly sampled 100 accounts without replacement. Here are the results of the sample:
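\begin{tabular}{lcc}
 & Wholesale & Retail \\
\hline
Sample size & \(n_1=70\) & \(n_2=30\) \\
Sample mean & \(\bar{y}_1=520\) & \(\bar{y}_2=280\) \\
Sample standard deviation & \(s_1=210\) & \(s_2=90\) \\
\end{tabular}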
The poststratified estimate of the mean is:
\begin{align}
\bar{y}_{st} &= \dfrac{N_1}{N} \bar{y}_1+\dfrac{N_2}{N} \bar{y}_2\\
&= 0.4\times 520+0.6 \times 280\\
&= 376\\
\end{align}
Because the firm has a very large number of accounts receivable, we can ignore the finite population correction factor.
\begin{align}
\hat{V}ar(\text{poststratified }\bar{y}) & \approx \dfrac{1}{n}\left(\dfrac{N_1}{N}s^2_1+\dfrac{N_2}{N}s^2_2\right)+\dfrac{1}{n^2}\left[\left(1-\dfrac{N_1}{N}\right) s^2_1 + \left(1-\dfrac{N_2}{N}\right) s^2_2 \right]\\
&= \dfrac{1}{100}[0.4 \times (210)^2+ 0.6 \times (90)^2]+ \dfrac{1}{100^2}[0.6 \times (210)^2+ 0.4 \times (90)^2]\\
&= 225+2.97\\
&= 227.97\\
\end{align}
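The arithmetic for the accounts receivable example can be checked with a short Python sketch. It assumes, as the text does, that the finite population correction can be ignored; `poststratified_variance` is a hypothetical helper name chosen for illustration.

```python
def poststratified_variance(n, pop_shares, sample_vars):
    """Approximate variance of the poststratified mean, ignoring the fpc.

    Returns the two terms separately:
      first  = (1/n)   * sum_h (N_h/N)     * s_h^2   (proportional allocation)
      second = (1/n^2) * sum_h (1 - N_h/N) * s_h^2   (cost of poststratifying)
    """
    first = sum(w * s2 for w, s2 in zip(pop_shares, sample_vars)) / n
    second = sum((1 - w) * s2 for w, s2 in zip(pop_shares, sample_vars)) / n**2
    return first, second

shares = [0.4, 0.6]      # wholesale, retail population shares
means = [520.0, 280.0]   # stratum sample means
sds = [210.0, 90.0]      # stratum sample standard deviations
n = 100

estimate = sum(w * m for w, m in zip(shares, means))
first, second = poststratified_variance(n, shares, [s**2 for s in sds])

print(round(estimate, 2))        # 376.0
print(round(first, 2))           # 225.0
print(round(second, 2))          # 2.97
print(round(first + second, 2))  # 227.97
```

Returning the two terms separately makes it easy to see that the extra cost of poststratifying (2.97) is small relative to the main term (225) when \(n\) is large.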
Further Topic on Stratified Sampling
It is not true that stratified random sampling always produces an estimator with a smaller variance than that from simple random sampling.
Example 6-3: Student Weights
The principal of a prep school for boys wants to estimate the average weight of the 7th-grade boys in the school. There are 4 classes: 24 students in class 1, 36 in class 2, 30 in class 3, and 30 in class 4.
For administrative ease, he decides to use stratified sampling with each class as a stratum. The principal has enough time and money to obtain data for 20 students, and because the cost of sampling is the same in each stratum, he decides to use proportional allocation, which gives \(n_1=4, n_2=6, n_3=5\) and \(n_4=5\). The data (in lbs.) is given in the following table:
\begin{tabular}{ll}
Class & Weight of the student (in lbs.) \\
\hline
Class 1 & 94, 90, 102, 110 \\
Class 2 & 91, 99, 93, 105, 111, 101 \\
Class 3 & 108, 96, 100, 93, 93 \\
Class 4 & 92, 110, 94, 91, 113 \\
\end{tabular}
Here is the Minitab output that describes the data from each stratum (Minitab's N column is the number of sampled students, \(n_h\), not the stratum size \(N_h\)):
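\begin{tabular}{lrrrr}
Variable & N & Mean & StDev & SE Mean \\
\hline
Class 1 & 4 & 99.00 & 8.87 & 4.43 \\
Class 2 & 6 & 100.00 & 7.46 & 3.04 \\
Class 3 & 5 & 98.00 & 6.28 & 2.81 \\
Class 4 & 5 & 100.00 & 10.61 & 4.74 \\
All & 20 & 99.30 & 7.73 & 1.73 \\
\end{tabular}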
Using the Minitab output, the estimate of the average weight of the 7th-grade boys is:
\(\bar{y}_{st}=\sum\limits_{h=1}^L \dfrac{N_h}{N}\bar{y}_h=99.3\)
\begin{align}
\hat{V}ar(\bar{y}_{st}) &= \dfrac{1}{N^2}\sum\limits_{h=1}^4 N^2_h \left(\dfrac{N_h-n_h}{N_h}\right)\dfrac{s^2_h}{n_h}\\
&= \dfrac{1}{120^2}\left[\left((24)^2\cdot \dfrac{5}{6} \cdot \dfrac{(8.87)^2}{4}\right)+\left((36)^2\cdot \dfrac{5}{6} \cdot \dfrac{(7.46)^2}{6}\right) \right.\\
&\left.+\left((30)^2\cdot \dfrac{5}{6} \cdot \dfrac{(6.28)^2}{5}\right)+\left((30)^2\cdot \dfrac{5}{6} \cdot \dfrac{(10.61)^2}{5}\right)\right]\\
&= 2.93\\
\end{align}
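The same estimate and variance can be reproduced from the raw weights. The following Python sketch uses only the standard library; the dictionary layout and variable names are illustrative choices, not part of the original analysis.

```python
from statistics import mean, variance

# class -> (stratum size N_h, sampled weights in lbs.)
classes = {
    1: (24, [94, 90, 102, 110]),
    2: (36, [91, 99, 93, 105, 111, 101]),
    3: (30, [108, 96, 100, 93, 93]),
    4: (30, [92, 110, 94, 91, 113]),
}
N = sum(N_h for N_h, _ in classes.values())  # 120 students in total

# Stratified mean: sum_h (N_h / N) * ybar_h
ybar_st = sum(N_h * mean(y) for N_h, y in classes.values()) / N

# Estimated variance: (1/N^2) * sum_h N_h^2 * ((N_h - n_h)/N_h) * s_h^2 / n_h
var_st = sum(
    N_h**2 * ((N_h - len(y)) / N_h) * variance(y) / len(y)
    for N_h, y in classes.values()
) / N**2

print(round(ybar_st, 2))  # 99.3
print(round(var_st, 2))   # 2.93
```

`statistics.variance` uses the \(n_h-1\) divisor, so it matches the squared StDev column of the Minitab output.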
For a 95% CI, we use Satterthwaite's formula to approximate the degrees of freedom:
\(d=\dfrac{\left(\sum\limits_{h=1}^L a_h s^2_h \right)^2}{\sum\limits_{h=1}^L \dfrac{(a_h s^2_h)^2}{n_h-1}}\)
where
\(a_h=\dfrac{N_h(N_h-n_h)}{n_h}\)
Plugging the stratum values into the formula gives d = 13.7576.
To be conservative, we round down and use d.f. = 13.
Then, an approximate 95% CI is:
\(99.3 \pm 2.160\sqrt{2.93}\)
\(=99.3 \pm 3.697\)
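As a rough check of the degrees-of-freedom calculation, the Python sketch below evaluates Satterthwaite's formula with the rounded standard deviations from the Minitab output. The critical value 2.160 and the variance 2.93 are taken from the text rather than computed, and the variable names are illustrative.

```python
import math

N_h = [24, 36, 30, 30]            # stratum sizes
n_h = [4, 6, 5, 5]                # stratum sample sizes
s_h = [8.87, 7.46, 6.28, 10.61]   # StDev column of the Minitab output

# a_h = N_h (N_h - n_h) / n_h
a_h = [N * (N - n) / n for N, n in zip(N_h, n_h)]
terms = [a * s**2 for a, s in zip(a_h, s_h)]

# d = (sum a_h s_h^2)^2 / sum (a_h s_h^2)^2 / (n_h - 1)
d = sum(terms) ** 2 / sum(t**2 / (n - 1) for t, n in zip(terms, n_h))
df = math.floor(d)                # round down to be conservative

t_crit = 2.160                    # t quantile for 13 d.f., as used in the text
half_width = t_crit * math.sqrt(2.93)

print(round(d, 2), df)            # about 13.76, then 13
print(round(half_width, 3))       # 3.697
```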
Looking back at the data, if we had used simple random sampling, would our CI have been tighter or looser?
Stratified random sampling usually performs better overall, because strata are usually chosen so that the units within each stratum are more homogeneous than the population as a whole.
Here, however, there is no reason to expect the boys within a class to be more homogeneous in weight, and therefore no reason for this stratified random sample to do any better than a simple random sample.
Treating the 20 observations as if they were a simple random sample gives:
\begin{align}
\hat{V}ar(\bar{y})&= \left(\dfrac{N-n}{N}\right) \left(\dfrac{s^2}{n}\right)\\
&= \left(\dfrac{120-20}{120}\right) \left(\dfrac{(7.73)^2}{20}\right)\\
&= 2.49\\
\end{align}
Then, with d.f. = 19, an approximate 95% CI is:
\(99.3 \pm 2.093\sqrt{2.49}\)
\(=99.3 \pm 3.30\)
Thus the margin of error is smaller (3.30 versus 3.697) and the confidence interval is narrower than the one obtained from the stratified analysis.
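The hypothetical simple-random-sampling calculation can also be sketched in Python by pooling the 20 observations. The critical value 2.093 and the stratified half-width 3.697 are taken from the text; the variable names are illustrative.

```python
import math
from statistics import variance

# All 20 sampled weights (lbs.), pooled across the four classes
weights = [
    94, 90, 102, 110,
    91, 99, 93, 105, 111, 101,
    108, 96, 100, 93, 93,
    92, 110, 94, 91, 113,
]
N, n = 120, len(weights)

# SRS variance estimate: ((N - n) / N) * s^2 / n
var_srs = ((N - n) / N) * variance(weights) / n
half_width_srs = 2.093 * math.sqrt(var_srs)   # t quantile for 19 d.f.

half_width_strat = 3.697                      # from the stratified analysis

print(round(var_srs, 2))                      # 2.49
print(round(half_width_srs, 2))               # 3.3
print(half_width_srs < half_width_strat)      # True
```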
Because the data were actually collected by stratified sampling, treating them as a simple random sample is the wrong way to compute the variance for this problem: how the variance is computed depends on how the sample was taken. We did the computation only to show that if, hypothetically, the same data had been collected by simple random sampling, the margin of error would have been smaller.
Moral of this example:
Stratifying on class, which is not related to weight, does not result in smaller variances within the strata. On the other hand, if stratification had other purposes such as to estimate the parameters of each subgroup, it still makes sense to stratify, though the purpose is not to get estimates with smaller variance. For this particular example, the stratification to estimate the average weight for each class may be relevant.
Stratified sampling to estimate population proportion
\(\hat{p}_{st}=\dfrac{1}{N}\sum\limits_{h=1}^L N_h \hat{p}_h\)
\begin{align}
\hat{V}ar(\hat{p}_{st})&= \dfrac{1}{N^2}\sum\limits_{h=1}^L N^2_h \hat{V}ar(\hat{p}_h)\\
&= \dfrac{1}{N^2}\sum\limits_{h=1}^L N^2_h \left(\dfrac{N_h-n_h}{N_h}\right)\cdot \dfrac{\hat{p}_h(1-\hat{p}_h)}{n_h-1}\\
\end{align}
Example 6-4: TV Show Viewership
The advertising firm wants to estimate the proportion of households in the county that view the television show "American Idol".
The stratum sizes are \(N_1=155\), \(N_2=62\), and \(N_3=93\). As before, we stratify by town, and the sample results are:
\begin{tabular}{lll}
Stratum & Sample Size & \(\hat{p}_h\) \\
\hline
Town A & \(n_1=20\) & 16/20 = 0.80 \\
Town B & \(n_2=8\) & 2/8 = 0.25 \\
Rural Area C & \(n_3=12\) & 6/12 = 0.50 \\
\end{tabular}
Plugging in the values gives:
\begin{align}
\hat{p}_{st}&=\dfrac{1}{N}\sum\limits_{h=1}^L N_h \hat{p}_h\\
&= \dfrac{155}{310}\cdot 0.8 +\dfrac{62}{310}\cdot 0.25+\dfrac{93}{310}\cdot 0.5\\
&= 0.6\\
\end{align}
The estimated variance for each stratum is computed as follows:
\begin{align}
\hat{V}ar(\hat{p}_1)&= \left(\dfrac{N_1-n_1}{N_1}\right)\cdot \dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1-1}\\
&= \left(\dfrac{155-20}{155}\right)\cdot \dfrac{0.8(0.2)}{19}\\
&= 0.007\\
\end{align}
\begin{align}
\hat{V}ar(\hat{p}_2)&= \left(\dfrac{N_2-n_2}{N_2}\right)\cdot \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2-1}\\
&= \left(\dfrac{62-8}{62}\right)\cdot \dfrac{0.25(0.75)}{7}\\
&= 0.023\\
\end{align}
\begin{align}
\hat{V}ar(\hat{p}_3)&= \left(\dfrac{N_3-n_3}{N_3}\right)\cdot \dfrac{\hat{p}_3(1-\hat{p}_3)}{n_3-1}\\
&= \left(\dfrac{93-12}{93}\right)\cdot \dfrac{0.5(0.5)}{11}\\
&= 0.020\\
\end{align}
Combining the stratum variances gives the estimated variance of \(\hat{p}_{st}\):
\begin{align}
\hat{V}ar(\hat{p}_{st})&= \dfrac{1}{(310)^2}[(155)^2(0.007)+(62)^2(0.023)+(93)^2(0.020)]\\
&= 0.0045\\
\end{align}
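The whole proportion calculation can be sketched in Python as a check on the arithmetic. The tuple layout and the helper name `stratum_var` are illustrative assumptions, not part of the original example.

```python
# (stratum size N_h, sample size n_h, households in the sample that view the show)
strata = [
    (155, 20, 16),   # Town A
    (62, 8, 2),      # Town B
    (93, 12, 6),     # Rural Area C
]
N = sum(N_h for N_h, _, _ in strata)   # 310

def stratum_var(N_h, n_h, x):
    """Estimated variance of p_hat_h: ((N_h - n_h)/N_h) * p(1 - p)/(n_h - 1)."""
    p = x / n_h
    return ((N_h - n_h) / N_h) * p * (1 - p) / (n_h - 1)

# p_st = (1/N) * sum_h N_h * p_hat_h
p_st = sum(N_h * x / n_h for N_h, n_h, x in strata) / N

# Var(p_st) = (1/N^2) * sum_h N_h^2 * Var(p_hat_h)
var_st = sum(N_h**2 * stratum_var(N_h, n_h, x) for N_h, n_h, x in strata) / N**2

print(round(p_st, 4))                             # 0.6
print([round(stratum_var(*s), 4) for s in strata])  # [0.0073, 0.0233, 0.0198]
print(round(var_st, 4))                           # 0.0045
```

Working from the counts rather than the rounded stratum variances avoids accumulating rounding error, although here both routes give 0.0045 to four decimal places.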