4.2 - Sampling Distribution of the Sample Proportion

Before we begin, let’s make sure we review the terms and notation associated with proportions:

  • \(p\) is the population proportion. It is a fixed value.
  • \(n\) is the size of the random sample.
  • \(\hat{p}\) is the sample proportion. It varies based on the sample.

The following example will illustrate how to find the sampling distribution for an example where the population is small.

Sample Proportions with a Small Population: Favorite Color

In a particular family, there are five children: Alex (A), Betina (B), Carly (C), Debbie (D), and Edward (E). The table below shows each child’s name and favorite color.
 

Name         Color
Alex (A)     Green
Betina (B)   Blue
Carly (C)    Yellow
Debbie (D)   Purple
Edward (E)   Blue

We are interested in the proportion of children in the family who prefer the color blue, and from the table, we can see that \(p = .40\) of the children prefer blue.

Similar to the pumpkin example earlier in the lesson, suppose we did not know the proportion of children whose favorite color is blue. We'll use resampling methods to estimate it. Let’s take repeated samples of size \(n=2\), without replacement. Listed below are all the possible samples of size \(n=2\), the proportion of children in each sample who prefer blue (the sample proportion \(\hat{p}\), labeled P(Blue) in the table), and the probability of drawing each sample.

Sample   P(Blue)   Probability
AB       1/2       1/10
AC       0         1/10
AD       0         1/10
AE       1/2       1/10
BC       1/2       1/10
BD       1/2       1/10
BE       1         1/10
CD       0         1/10
CE       1/2       1/10
DE       1/2       1/10

The probability mass function (PMF) is:

P(Blue)       0      1/2    1
Probability   3/10   6/10   1/10
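The enumeration above can be reproduced with a short Python sketch. It assumes every sample of size \(n\) is equally likely, as in the table; the function name `sampling_distribution` and the dictionary `favorite_color` are illustrative names, not part of the lesson. Probabilities are kept as exact fractions, so 6/10 is reported in lowest terms.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Favorite colors from the table in the example.
favorite_color = {"A": "Green", "B": "Blue", "C": "Yellow", "D": "Purple", "E": "Blue"}

def sampling_distribution(n, color="Blue"):
    """Exact PMF of the sample proportion for samples of size n, without replacement."""
    samples = list(combinations(favorite_color, n))
    # Count how many samples produce each possible sample proportion.
    counts = Counter(
        Fraction(sum(favorite_color[child] == color for child in sample), n)
        for sample in samples
    )
    # Each sample is equally likely, so probability = count / number of samples.
    return {p_hat: Fraction(count, len(samples)) for p_hat, count in sorted(counts.items())}

pmf_n2 = sampling_distribution(2)
for p_hat, prob in pmf_n2.items():
    print(f"p-hat = {p_hat}: probability = {prob}")
```

Calling `sampling_distribution(4)` gives the corresponding PMF for samples of size 4.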

The graph of the PMF:

[Figure: Sampling Distribution of P(Blue). Bar graph with three bars: height 0.3 at 0, height 0.6 at 0.5, and height 0.1 at 1.]

The true proportion is \(p=P(Blue)=\frac{2}{5}\). When the sample size is \(n=2\), the PMF shows that it is not possible to get a sample proportion equal to the true proportion.

Although not presented in detail here, we could find the sampling distribution for a larger sample size, say \(n=4\). The PMF for \(n=4\) is:

P(Blue)       1/4    1/2
Probability   2/5    3/5

As with the sampling distribution of the sample mean, the sampling distribution of the sample proportion will have sampling error. It is also the case that the larger the sample size, the smaller the spread of the distribution.
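To make the claim about spread concrete, here is a minimal sketch that computes the mean and standard deviation of \(\hat{p}\) exactly for the five-child family at \(n=2\) and \(n=4\). The 0/1 coding of the children and the function name `mean_and_sd_of_p_hat` are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt

# 1 = prefers blue, 0 = does not (children A, B, C, D, E from the example).
population = [0, 1, 0, 0, 1]

def mean_and_sd_of_p_hat(n):
    """Mean and standard deviation of p-hat over all equally likely samples of size n."""
    p_hats = [sum(sample) / n for sample in combinations(population, n)]
    mean = sum(p_hats) / len(p_hats)
    variance = sum((x - mean) ** 2 for x in p_hats) / len(p_hats)
    return mean, sqrt(variance)

for n in (2, 4):
    mean, sd = mean_and_sd_of_p_hat(n)
    print(f"n = {n}: mean = {mean:.3f}, sd = {sd:.3f}")
```

In both cases the mean equals the true proportion 0.4, while the standard deviation drops from 0.30 at \(n=2\) to about 0.12 at \(n=4\).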

Example 4-3 Resampling with StatKey

Using StatKey, we resample 1,000 times from populations that have probabilities of success 0.1, 0.9, and 0.5, respectively, with a sample size of \(n=25\). The video shows the resulting distributions.
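For readers without access to StatKey, a rough stand-in for this resampling can be sketched in Python. This is not how StatKey works internally; it simply draws 1,000 samples of size 25 for each probability of success. The function name `simulate_sample_proportions` and the fixed seed are assumptions chosen for reproducibility.

```python
import random
from statistics import mean, stdev

def simulate_sample_proportions(p, n=25, reps=1000, seed=0):
    """Draw `reps` samples of size n with success probability p; return the sample proportions."""
    rng = random.Random(seed)
    # Each trial is a success when a uniform(0, 1) draw falls below p.
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

for p in (0.1, 0.9, 0.5):
    p_hats = simulate_sample_proportions(p)
    print(f"p = {p}: mean of p-hat = {mean(p_hats):.3f}, sd of p-hat = {stdev(p_hats):.3f}")
```

Plotting a histogram of each `p_hats` list would show the shapes: skewed when \(p\) is near 0 or 1, and roughly symmetric when \(p=0.5\).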


4.2.1 - Normal Approximation to the Binomial

For the sampling distribution of the sample mean, we learned how to apply the Central Limit Theorem when the underlying distribution is not normal. In this section, we will present how we can apply the Central Limit Theorem to find the sampling distribution of the sample proportion. Let’s start by defining a Bernoulli random variable, \(Y\).

Bernoulli Random Variable \(\boldsymbol{Y}\)

For an experiment that results in a success or a failure, let the random variable \(Y\) equal 1 if there is a success and 0 if there is a failure. Therefore,

\(Y=\begin{cases} 1 & \text{success}\\ 0 & \text{failure}\end{cases}\)

and let \(p\) be the probability of a success.

The Bernoulli random variable is a special case of the Binomial random variable, where the number of trials is equal to one.

Suppose we have, say, \(n\) independent trials of this same experiment. Then we would have \(n\) values of \(Y\), namely \(Y_1, Y_2, \ldots, Y_n\).

If we define \(X\) to be the sum of those values, we get...

\(X=\sum_{i=1}^n Y_i\)

\(X\) is then a Binomial random variable with parameters \(n\) and \(p\).
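As a quick numerical check of this statement, the sketch below builds the distribution of \(Y_1 + \cdots + Y_n\) by adding one Bernoulli trial at a time and compares it with the closed-form Binomial PMF. The values \(n=10\) and \(p=0.3\) and the function names are illustrative choices, not from the lesson.

```python
from math import comb, isclose

def pmf_of_bernoulli_sum(n, p):
    """PMF of Y_1 + ... + Y_n, built by adding one Bernoulli(p) trial at a time."""
    pmf = [1.0]  # with zero trials, the sum is 0 with probability 1
    for _ in range(n):
        new_pmf = [0.0] * (len(pmf) + 1)
        for total, prob in enumerate(pmf):
            new_pmf[total] += prob * (1 - p)  # next trial is a failure (Y = 0)
            new_pmf[total + 1] += prob * p    # next trial is a success (Y = 1)
        pmf = new_pmf
    return pmf

def binomial_pmf(n, p):
    """Binomial(n, p) PMF from the closed-form formula."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

n, p = 10, 0.3  # illustrative values
by_summing = pmf_of_bernoulli_sum(n, p)
by_formula = binomial_pmf(n, p)
print(all(isclose(a, b) for a, b in zip(by_summing, by_formula)))
```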

You are probably wondering what this has to do with the sampling distribution of the sample proportion. Well, suppose we have a random sample of size \(n\) from a population and are interested in a particular “success”. Let the probability of success be \(p\). We can label the successes as 1 and the failures as 0. The sample proportion, \(\hat{p}\) would be the sum of all the successes divided by the number in our sample. Therefore,

\(\hat{p}=\dfrac{\sum_{i=1}^n Y_i}{n}=\dfrac{X}{n}\)

In other words, \(\hat{p}\) could be thought of as a mean! If this is the case, we can apply the Central Limit Theorem for large samples!

Therefore, for large samples, the shape of the sampling distribution for \(\hat{p}\) will be approximately normal. What about the mean and the standard deviation?

Mean and Standard Deviation [Standard Error] of \(\hat{p}\)

Given \(X\) is binomial:

  • The mean of \(\hat{p}\)

    The mean of \(\hat{p}\) would just be \(p\) since the mean of \(X\) is \(\mu=np\) and \(\hat{p}=\dfrac{X}{n}\).

  • The standard deviation [standard error] of \(\hat{p}\)

    The standard error of \(\hat{p}\) is \(\sqrt{\dfrac{p(1-p)}{n}}\) since the standard deviation of \(X\) is \(\sqrt{np(1-p)}\).
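Written out step by step, both results follow from the rules for rescaling a random variable by the constant \(1/n\). Here \(E(\cdot)\) and \(SD(\cdot)\) denote the mean and standard deviation; this notation is an assumption, since the lesson does not use it explicitly.

\begin{align} E(\hat{p}) &= E\left(\dfrac{X}{n}\right)=\dfrac{1}{n}E(X)=\dfrac{np}{n}=p\\ SD(\hat{p}) &= SD\left(\dfrac{X}{n}\right)=\dfrac{1}{n}SD(X)=\dfrac{\sqrt{np(1-p)}}{n}=\sqrt{\dfrac{p(1-p)}{n}}\end{align}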


4.2.2 - Sampling Distribution of the Sample Proportion

The distribution of the sample proportion approximates a normal distribution under the following 2 conditions.

Over the years, the threshold used in these conditions has changed. The examples in the remaining lessons use the first set of conditions, with a threshold of 5; however, you may come across other books or software that use 10 or 15.


Book (Minitab)
  1. \(np \geq 5\)
  2. \(n(1-p) \geq 5\)
1990-2000s
  1. \(np \geq 10\)
  2. \(n(1-p) \geq 10\)
Current
  1. \(np \geq 15\)
  2. \(n(1-p) \geq 15\)

Sampling Distribution of the Sample Proportion

If the chosen set of two conditions listed above is satisfied, the sampling distribution of the sample proportion is...

  • approximately normal
  • with mean, \(\mu=p\)
  • standard deviation [standard error], \(\sigma=\sqrt{\dfrac{p(1-p)}{n}}\)
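The condition check can be expressed as a small helper. This is a sketch only; the function name `normal_approximation_ok` and the example values \(n=30\), \(p=0.2\) are hypothetical, and the default threshold of 5 follows the convention used in these lessons.

```python
def normal_approximation_ok(n, p, threshold=5):
    """Return True when both np and n(1 - p) are at least `threshold`."""
    return n * p >= threshold and n * (1 - p) >= threshold

# Hypothetical example: n = 30 and p = 0.2 give np = 6 and n(1 - p) = 24.
for threshold in (5, 10, 15):
    print(threshold, normal_approximation_ok(30, 0.2, threshold))
```

The same sample can pass under one threshold and fail under a stricter one, which is why it matters to state which set of conditions is being used.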

If the sampling distribution of \(\hat{p}\) is approximately normal, we can convert a sample proportion to a z-score using the following formula:

\(z=\dfrac{\hat{p}-p}{\sqrt{\dfrac{p(1-p)}{n}}}\)

We can apply this theory to find probabilities involving sample proportions.

Example 4-4: iPhone Users

Suppose it is known that 43% of Americans own an iPhone. If a random sample of 50 Americans were surveyed, what is the probability that the proportion of the sample who owned an iPhone is between 45% and 50%?

Answer

For this problem, we know \(p=0.43\) and \(n=50\). First, we should check our conditions for the sampling distribution of the sample proportion.

\(np=50(0.43)=21.5\) and \(n(1-p)=50(1-0.43)=28.5\)  - both are greater than 5.

Since the conditions are satisfied, \(\hat{p}\) will have a sampling distribution that is approximately normal with mean \(\mu=0.43\) and standard deviation [standard error] \(\sqrt{\dfrac{0.43(1-0.43)}{50}}\approx 0.07\).

\begin{align} P(0.45<\hat{p}<0.5) &=P\left(\frac{0.45-0.43}{0.07}< \frac{\hat{p}-p}{\sqrt{\frac{p(1-p)}{n}}}<\frac{0.5-0.43}{0.07}\right)\\ &\approx P\left(0.286<Z<1\right)\\ &=P(Z<1)-P(Z<0.286)\\ &=0.8413-0.6126\\ &=0.2287\end{align}

Therefore, if the true proportion of Americans who own an iPhone is 43%, then there would be a 22.87% chance that we would see a sample proportion between 45% and 50% when the sample size is 50.
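The same calculation can be checked in Python using `statistics.NormalDist` from the standard library. This is a sketch under the normal approximation only; because it does not round the standard error or the z-scores, the result may differ from the hand calculation in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.43, 50  # values from Example 4-4
standard_error = sqrt(p * (1 - p) / n)

# Sampling distribution of p-hat under the normal approximation.
p_hat_dist = NormalDist(mu=p, sigma=standard_error)
probability = p_hat_dist.cdf(0.50) - p_hat_dist.cdf(0.45)

print(f"standard error = {standard_error:.4f}")
print(f"P(0.45 < p-hat < 0.50) = {probability:.4f}")
```

The probability comes out to about 0.229, in line with the value found from the table.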

Try it!

If a random sample of 75 Americans was surveyed, what is the probability that more than 50% of the sample would own an iPhone?

First, check the conditions: \(np=75(0.43)=32.25\) and \(n(1-p)=75(1-0.43)=42.75\) are both greater than 5.

The sampling distribution of the sample proportion is approximately normal with mean \(\mu=0.43\) and standard deviation [standard error] \(\sqrt{\dfrac{p(1-p)}{n}}=\sqrt{\dfrac{0.43(1-0.43)}{75}}\approx 0.05717\).

\begin{align}P\left(\hat{p}>0.5\right) &=P\left(\frac{\hat{p}-p}{\sqrt{\frac{p(1-p)}{n}}}>\frac{0.5-0.43}{\sqrt{\frac{0.43(1-0.43)}{75}}}\right)\\ &\approx P\left(Z>1.22\right)\\&=1-P(Z<1.22)\\&=1-0.8888\\&=0.1112 \end{align}

Therefore, there is an 11.1% chance of getting a sample proportion greater than 50% in a sample of size 75.
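One way to see how well the normal approximation performs in this exercise is to compare it with the exact binomial probability. The sketch below assumes the 75 responses behave as independent trials with \(p=0.43\), so that the number of iPhone owners \(X\) is binomial; a sample proportion above 0.5 then means \(X \geq 38\).

```python
from math import comb, sqrt
from statistics import NormalDist

p, n = 0.43, 75  # values from the exercise above

# Normal approximation: P(p-hat > 0.5).
standard_error = sqrt(p * (1 - p) / n)
approx = 1 - NormalDist(mu=p, sigma=standard_error).cdf(0.5)

# Exact binomial: p-hat > 0.5 means X >= 38 successes out of 75.
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(38, n + 1))

print(f"normal approximation = {approx:.4f}")
print(f"exact binomial       = {exact:.4f}")
```

When the conditions on \(np\) and \(n(1-p)\) are comfortably met, as they are here, the two values should be close.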
