4.2.1 - Normal Approximation to the Binomial

For the sampling distribution of the sample mean, we learned how to apply the Central Limit Theorem when the underlying distribution is not normal. In this section, we show how to apply the Central Limit Theorem to find the sampling distribution of the sample proportion. Remember when we introduced quantitative and categorical data? Here we are working with a special type of categorical variable called a Bernoulli random variable, $Y$.

A side note for those who are curious: a Bernoulli random variable is a very simple kind of variable. It has only two possible values, 0 and 1, and describes a single trial. A binomial random variable, by contrast, counts the successes across repeated independent trials. We will not focus too much on these differences in this course, but if you are curious, this is useful information to have!

Bernoulli Random Variable $\boldsymbol{Y}$

For an experiment that results in either a success or a failure, let the random variable $Y$ equal 1 if there is a success and 0 if there is a failure. That is,

$Y=\begin{cases} 1 & \text{if success}\\ 0 & \text{if failure}\end{cases}$

and let $p$ be the probability of a success, so that $P(Y=1)=p$ and $P(Y=0)=1-p$.
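To make the definition concrete, here is a minimal Python sketch that draws values of $Y$ using the standard-library `random` module. The helper name `bernoulli` and the value $p = 0.3$ are illustrative assumptions, not part of the course material. Because $Y$ is 1 exactly when a success occurs, the average of many draws should settle near $p$.

```python
import random

def bernoulli(p: float, rng: random.Random) -> int:
    """Hypothetical helper: return 1 (success) with probability p, else 0 (failure)."""
    # rng.random() is uniform on [0, 1), so it falls below p with probability p
    return 1 if rng.random() < p else 0

p = 0.3                   # assumed success probability, for illustration only
rng = random.Random(42)   # fixed seed so the run is reproducible
draws = [bernoulli(p, rng) for _ in range(100_000)]

print(draws[:10])                # a few individual values of Y: each is 0 or 1
print(sum(draws) / len(draws))   # long-run average of Y, close to p
```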

The Bernoulli random variable is a special case of the Binomial random variable, where the number of trials is equal to one.

Suppose we have $n$ independent trials of this same experiment. Then we would have $n$ values of $Y$, namely $Y_1, Y_2, \ldots, Y_n$.

If we define $X$ to be the sum of those values, we get...

$X=\sum_{i=1}^n Y_i$

$X$ is then a Binomial random variable with parameters $n$ and $p$.
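One way to check this claim is by simulation. The sketch below, under the assumed values $n = 10$ and $p = 0.3$, repeatedly adds up $n$ independent Bernoulli draws and compares how often each total $X = k$ occurs with the binomial probability $\binom{n}{k}p^k(1-p)^{n-k}$. The function names are hypothetical helpers written for this illustration.

```python
import math
import random
from collections import Counter

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Hypothetical helper: P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def sum_of_bernoullis(n: int, p: float, rng: random.Random) -> int:
    """Hypothetical helper: X = Y_1 + ... + Y_n, each Y_i an independent Bernoulli(p) draw."""
    return sum(1 if rng.random() < p else 0 for _ in range(n))

n, p, reps = 10, 0.3, 100_000   # assumed values, for illustration only
rng = random.Random(1)
counts = Counter(sum_of_bernoullis(n, p, rng) for _ in range(reps))

# Simulated relative frequency of each total next to the binomial probability
for k in range(n + 1):
    simulated = counts[k] / reps
    print(f"k={k:2d}  simulated={simulated:.4f}  binomial={binomial_pmf(k, n, p):.4f}")
```

The two columns should agree closely, with small differences due to simulation noise.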

You are probably wondering what this has to do with the sampling distribution of the sample proportion. Suppose we have a random sample of size $n$ from a population and are interested in a particular “success”, with probability of success $p$. We can label each success as 1 and each failure as 0. The sample proportion, $\hat{p}$, is then the number of successes (the sum of the 1s) divided by the sample size. Therefore,

$\hat{p}=\dfrac{\sum_{i=1}^n Y_i}{n}=\dfrac{X}{n}$

In other words, $\hat{p}$ is simply the sample mean of the $Y_i$ values! Because it is a mean, we can apply the Central Limit Theorem for large samples.
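For instance, if a small sample of 10 responses is coded as 1 for success and 0 for failure, the sample proportion $X/n$ and the ordinary mean of the 0/1 values come out identical. The data below are made up purely for illustration.

```python
from statistics import mean

# Hypothetical sample of n = 10 observations: 1 = success, 0 = failure
y = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]

x = sum(y)          # X = number of successes
n = len(y)          # sample size
p_hat = x / n       # sample proportion X / n

print(x, n, p_hat)  # 4 10 0.4
print(mean(y))      # 0.4, the same value computed as an ordinary mean
```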

Therefore, for large samples, the shape of the sampling distribution for $\hat{p}$ will be approximately normal. What about the mean and the standard deviation?
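To see the approximately normal shape, one can simulate many samples of size $n$, compute $\hat{p}$ for each, and tabulate the results. The sketch below assumes $n = 100$ and $p = 0.3$ (illustrative values) and prints a rough text histogram; a roughly symmetric, bell-shaped pile centered near $p$ is what the Central Limit Theorem predicts. The helper name `count_successes` is hypothetical.

```python
import random
from collections import Counter

def count_successes(n: int, p: float, rng: random.Random) -> int:
    """Hypothetical helper: X = number of successes in n independent Bernoulli(p) trials."""
    return sum(1 if rng.random() < p else 0 for _ in range(n))

n, p, reps = 100, 0.3, 20_000   # assumed values, for illustration only
rng = random.Random(7)
xs = [count_successes(n, p, rng) for _ in range(reps)]
p_hats = [x / n for x in xs]    # one sample proportion per simulated sample

# Tally by X (exact integers) to avoid floating-point binning issues,
# then label each row with the corresponding p-hat = X / n.
tally = Counter(xs)
for x in range(min(xs), max(xs) + 1):
    print(f"p_hat={x / n:.2f} {'#' * (tally[x] // 50)}")
```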

Mean and Standard Deviation [Standard Error] of the Sample Proportion, $\hat{p}$

Given that $X$ is binomial with parameters $n$ and $p$:

The mean of $\hat{p}$
- The mean of $\hat{p}$ is $p$. Since the mean of $X$ is $\mu=np$ and $\hat{p}=\dfrac{X}{n}$, the mean of $\hat{p}$ is $\dfrac{np}{n}=p$.
The standard deviation [standard error] of $\hat{p}$
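- The standard error of $\hat{p}$ is $\sqrt{\dfrac{p(1-p)}{n}}$. Since the standard deviation of $X$ is $\sqrt{np(1-p)}$ and dividing $X$ by $n$ divides its standard deviation by $n$, the standard error of $\hat{p}$ is $\dfrac{\sqrt{np(1-p)}}{n}=\sqrt{\dfrac{p(1-p)}{n}}$.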
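As a worked example with assumed values $n = 100$ and $p = 0.3$: the mean of $\hat{p}$ is $0.3$ and its standard error is $\sqrt{0.3 \times 0.7 / 100} = \sqrt{0.0021} \approx 0.0458$. The Python sketch below computes these from the formulas, checks them against simulated sample proportions, and then uses the normal approximation (via the standard library's `statistics.NormalDist`) to approximate $P(\hat{p} \le 0.25)$, comparing it with the exact binomial probability $P(X \le 25)$. The parameter values and the cutoff $0.25$ are illustrative choices.

```python
import math
import random
from statistics import NormalDist, mean, pstdev

n, p = 100, 0.3                          # assumed values, for illustration only

# Formulas from the text
mean_p_hat = p                           # E(p-hat) = E(X) / n = np / n = p
se_p_hat = math.sqrt(p * (1 - p) / n)    # SE(p-hat) = sqrt(np(1-p)) / n

# Check the formulas against 20,000 simulated sample proportions
rng = random.Random(2024)
p_hats = [sum(rng.random() < p for _ in range(n)) / n for _ in range(20_000)]
print(f"formula:   mean={mean_p_hat:.4f}  SE={se_p_hat:.4f}")
print(f"simulated: mean={mean(p_hats):.4f}  SD={pstdev(p_hats):.4f}")

# Normal approximation to P(p-hat <= 0.25) versus the exact binomial P(X <= 25)
approx = NormalDist(mu=mean_p_hat, sigma=se_p_hat).cdf(0.25)
exact = sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(26))
print(f"normal approx={approx:.4f}  exact binomial={exact:.4f}")
```

The two probabilities agree only approximately; the normal approximation improves as $n$ grows.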