What happens when the sample comes from a population that is not normally distributed? This is where the Central Limit Theorem (CLT) comes in.
Central Limit Theorem
For a large sample size (we will explain this later), \(\bar{x}\) is approximately normally distributed, regardless of the distribution of the population one samples from. If the population has mean \(\mu\) and standard deviation \(\sigma\), then the distribution of \(\bar{x}\) has mean \(\mu\) and standard deviation \(\dfrac{\sigma}{\sqrt{n}}\).
We should stop here to break down what this theorem is saying because the Central Limit Theorem is very powerful!
The Central Limit Theorem applies to the sample mean from a population with any shape of distribution, provided the population has a finite mean and standard deviation. The population could be left-skewed or right-skewed. As long as the sample size is large, the distribution of the sample means will be approximately Normal.
For the purposes of this course, a sample size of \(n>30\) is considered a large sample.
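To see the theorem at work, here is a minimal simulation sketch. It assumes a right-skewed Exponential population with \(\mu = 1\) and \(\sigma = 1\); that population, the sample size of 40, the 5000 repetitions, and the helper name `simulate_sample_means` are illustrative choices, not part of the course material.

```python
import math
import random
import statistics

def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` samples of size `n` from an Exponential(1) population
    (right-skewed, mean 1, SD 1) and return the list of sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]

MU = 1.0      # population mean of Exponential(1)
SIGMA = 1.0   # population standard deviation of Exponential(1)
n = 40        # "large" by the n > 30 guideline

means = simulate_sample_means(n=n, reps=5000)

# The mean of the sample means should be close to mu, and their
# standard deviation should be close to sigma / sqrt(n).
print("mean of sample means:", round(statistics.fmean(means), 3))
print("SD of sample means:  ", round(statistics.stdev(means), 3))
print("sigma / sqrt(n):     ", round(SIGMA / math.sqrt(n), 3))
```

A histogram of `means` should look roughly bell-shaped even though the population itself is strongly skewed.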
Many people just learning statistics have a "so what?" reaction to the CLT. Why is this important, and why should I care? Recall that when we introduced z-scores, we did so with the caveat that the distribution was normal. We took observed data that were normally distributed and converted them to z-scores, creating a standard normal distribution. We then used this distribution to find percentiles (and in future units we will use it to find probabilities).
The CLT allows us to treat the sampling distribution of the sample mean as approximately normal as long as the sample size is greater than 30 observations, even when the population itself is not normal. With this, we can apply most of our inferential statistics without having to compensate for non-normal distributions. This will take on greater relevance as we move through the course.
Sampling Distribution of the Sample Mean
With the Central Limit Theorem, we can finally define the sampling distribution of the sample mean.
Sampling Distribution of the Sample Mean
The sampling distribution of the sample mean will have:
- the same mean as the population mean, \(\mu\)
- a standard deviation (called the standard error) of \(\dfrac{\sigma}{\sqrt{n}}\)
It will be Normal (or approximately Normal) if either of these conditions is satisfied:
- The population distribution is Normal
- The sample size is large (greater than 30).
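As a worked illustration of these properties, the sketch below computes a standard error and then uses the approximately Normal sampling distribution to find a probability for \(\bar{x}\). The population values (\(\mu = 50\), \(\sigma = 12\)), the sample size \(n = 36\), and the cutoff of 53 are hypothetical numbers chosen only for this example.

```python
import math
from statistics import NormalDist

mu = 50.0      # hypothetical population mean
sigma = 12.0   # hypothetical population standard deviation
n = 36         # sample size; greater than 30, so the CLT guideline applies

# Standard error of the sample mean: sigma / sqrt(n)
standard_error = sigma / math.sqrt(n)

# Sampling distribution of the sample mean: approximately Normal(mu, SE)
sampling_dist = NormalDist(mu=mu, sigma=standard_error)

# Probability that the sample mean exceeds 53, i.e. z = (53 - 50) / 2 = 1.5
p_above_53 = 1 - sampling_dist.cdf(53)

print(standard_error)        # 2.0
print(round(p_above_53, 4))  # 0.0668
```

Note that the z-score here uses the standard error, not the population standard deviation, because the quantity being standardized is a sample mean rather than a single observation.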