4.2.2 - Sampling Distribution of the Sample Proportion

The sampling distribution of the sample proportion is approximately normal when the following two conditions are met.

Over the years, the threshold used in these conditions has changed. The examples in the remaining lessons use the first set of conditions, with a threshold of 5; however, you may come across other books or software that use 10 or 15 for this value.

Book (Minitab)

\(np \geq 5\)
\(n(1-p) \geq 5\)

1990-2000s

\(np \geq 10\)
\(n(1-p) \geq 10\)

Current

\(np \geq 15 \)
\(n(1-p) \geq 15 \)
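
As a rough illustration, the three thresholds can be checked with a small helper. The function name `meets_condition` and the values n = 50 and p = 0.2 are hypothetical, chosen only to show a case where the answer depends on which threshold is used.

```python
def meets_condition(n, p, threshold=5):
    """Return True if both np and n(1-p) are at least `threshold`."""
    return n * p >= threshold and n * (1 - p) >= threshold


# Hypothetical example: n = 50 and p = 0.2 give np = 10 and n(1-p) = 40.
n, p = 50, 0.2

# Check the same sample against each of the three thresholds.
results = {t: meets_condition(n, p, t) for t in (5, 10, 15)}
print(results)
```

With these values the conditions hold at thresholds 5 and 10 but not at 15, because np = 10 falls below 15.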

Sampling Distribution of the Sample Proportion

If both conditions in any one of the sets listed above are satisfied, the sampling distribution of the sample proportion is:

approximately normal
with mean, \(\mu=p\)
standard deviation [standard error], \(\sigma=\sqrt{\dfrac{p(1-p)}{n}}\)
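
One way to see these properties is a small simulation, sketched here with Python's standard library only. The population proportion p = 0.3, sample size n = 100, number of repetitions, and seed are all assumed values for illustration, and the function names are hypothetical.

```python
import math
import random
import statistics


def theoretical_mean_se(n, p):
    """Mean and standard error of the sample proportion from the formulas above."""
    return p, math.sqrt(p * (1 - p) / n)


def simulate_sample_proportions(n, p, reps=5000, seed=1):
    """Draw `reps` samples of size n and return the sample proportion of each."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


# Assumed values: np = 30 and n(1-p) = 70, so all three condition sets hold.
n, p = 100, 0.3
mu, se = theoretical_mean_se(n, p)
p_hats = simulate_sample_proportions(n, p)

print("theoretical mean and SE:", mu, se)
print("simulated mean and SD:  ", statistics.mean(p_hats), statistics.stdev(p_hats))
```

The simulated mean and standard deviation of the sample proportions should land close to p and \(\sqrt{p(1-p)/n}\), though the exact figures depend on the seed and the number of repetitions.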

Why is this important? This is similar to the notes in the section on the CLT. If the sampling distribution of \(\hat{p}\) is approximately normal, we can convert a sample proportion to a z-score using the following formula:

\(z=\dfrac{\hat{p}-p}{\sqrt{\dfrac{p(1-p)}{n}}}\)

We can apply this theory to find probabilities involving sample proportions.
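
For instance, the z-score formula above can be turned into a probability with the standard normal distribution. The sketch below uses assumed values (p = 0.5, n = 100, and an observed sample proportion of 0.55) and a hypothetical helper named `z_score`; it relies on `statistics.NormalDist` from the Python standard library.

```python
import math
from statistics import NormalDist


def z_score(p_hat, p, n):
    """Convert a sample proportion to a z-score using the standard error of p-hat."""
    return (p_hat - p) / math.sqrt(p * (1 - p) / n)


# Assumed values: np = n(1-p) = 50, so the normal approximation is reasonable.
p, n, p_hat = 0.5, 100, 0.55

z = z_score(p_hat, p, n)  # standard error is 0.05, so z is about 1

# Probability that a sample proportion is greater than 0.55.
prob_above = 1 - NormalDist().cdf(z)
print(round(z, 4), round(prob_above, 4))
```

Here the probability of observing a sample proportion above 0.55 is roughly 0.16, the area to the right of z = 1 under the standard normal curve.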

Now we have a basic understanding of the relationship between samples and populations. Ellie can use the properties of the sampling distribution to relate the mean from her sample of runners to the distribution of sample means from all possible samples of runners, but this does not directly answer her question about the average number of miles all runners run. To do this, she needs a related technique called a confidence interval. Calculating a confidence interval will allow Ellie to estimate an interval that is likely to contain the true average number of miles run per week, based on her sample information. Let’s take a closer look at confidence intervals.