8: The Diversity of Samples

Lesson Overview

The Law of Large Numbers (section 7.2) tells us that when a chance process is repeated over and over independently, its results are unpredictable in the short run but follow a regular, predictable pattern in the long run. Averages and proportions from a random sample tend to home in on their corresponding population values as the sample size grows. More trials bring more stability; fewer trials bring more volatility.

The result is not surprising. If we think about the Expected Value of a measurement as the long-run average value, then it makes sense that we should get closer to that long-run average as we get closer to the idealized "long run".
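The Law of Large Numbers can be seen in a short simulation. The sketch below is illustrative only: it assumes a fair coin (success probability 0.5, a value chosen here rather than taken from this lesson) and a fixed random seed, and the helper `running_proportions` is a hypothetical name introduced for the example. It tracks the proportion of heads after each flip; the early values wander, while the later ones settle near 0.5.

```python
import random

def running_proportions(p, n_trials, seed=0):
    """Simulate n_trials independent yes/no outcomes with success probability p
    and return the running proportion of successes after each trial."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    successes = 0
    proportions = []
    for i in range(1, n_trials + 1):
        if rng.random() < p:
            successes += 1
        proportions.append(successes / i)
    return proportions

props = running_proportions(p=0.5, n_trials=100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {n:>6} trials: proportion = {props[n - 1]:.4f}")
```

Rerunning with a different `seed` changes the short-run values noticeably, but the long-run proportion stays close to 0.5.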

Example 8.1

According to the most recent census, 38.4% of the adults over 25 years old in Centre County, Pennsylvania, have a Bachelor's degree. A survey of Centre County residents is about to be taken to examine their opinions about the "student loan crisis" and the importance of a college education. The researcher plans to take a random sample in the hope that it will be representative of the Centre County population. For example, she hopes the proportion of people over 25 years old in her sample with a Bachelor's degree will come close to the known population proportion of 0.384. Since the survey method is unbiased, the sample proportion is expected to come out around the population value, give or take a random error.

sample proportion = population proportion + random error
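The equation above can be mimicked by drawing samples from a made-up population. This is a minimal sketch under stated assumptions: the population of 100,000 adults with exactly 38.4% degree holders is hypothetical, and `survey` is an illustrative helper, not part of the researcher's actual design. Each call draws a simple random sample of 100 people and reports the sample proportion together with its random error.

```python
import random

# Hypothetical population: 100,000 adults over 25, exactly 38,400 (38.4%) with a degree.
# 1 = has a Bachelor's degree, 0 = does not.
POP_SIZE = 100_000
N_DEGREE = 38_400
population = [1] * N_DEGREE + [0] * (POP_SIZE - N_DEGREE)
population_proportion = N_DEGREE / POP_SIZE  # 0.384

def survey(n, rng):
    """Draw a simple random sample of size n (without replacement) and return
    (sample proportion, random error = sample proportion - population proportion)."""
    sample = rng.sample(population, n)
    sample_proportion = sum(sample) / n
    random_error = sample_proportion - population_proportion
    return sample_proportion, random_error

rng = random.Random(1)  # seeded for reproducibility
for _ in range(5):
    p_hat, err = survey(100, rng)
    print(f"sample proportion = {p_hat:.2f}  random error = {err:+.3f}")
```

The random errors come out positive on some draws and negative on others, which is what "give or take a random error" describes.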

This lesson studies the random error, which tells us how far the sample proportion falls from the population proportion. A simple random sample makes all possible samples of size n equally likely.
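To see what "all possible samples of size n equally likely" means, one can list every sample from a population small enough to enumerate. The six-person population below (two degree holders) and the sample size of 3 are hypothetical choices made only for illustration. Each of the 20 possible samples is counted once, so tallying them gives the exact probability of each sample proportion.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Hypothetical toy population of six people: 1 = has a degree, 0 = does not.
population = [1, 1, 0, 0, 0, 0]
n = 3  # sample size

# Every possible sample of size n, identified by which people (indices) are chosen.
samples = list(combinations(range(len(population)), n))

# Tally the sample proportion produced by each equally likely sample.
counts = Counter(Fraction(sum(population[i] for i in s), n) for s in samples)

for prop in sorted(counts):
    print(f"proportion {prop}: probability {counts[prop]}/{len(samples)}")

# Averaging over all equally likely samples recovers the population proportion.
mean_prop = sum(prop * c for prop, c in counts.items()) / len(samples)
print("average sample proportion:", mean_prop)
print("population proportion:   ", Fraction(sum(population), len(population)))
```

Here the proportions 0, 1/3, and 2/3 occur in 4, 12, and 4 of the 20 samples, and their average equals the population proportion of 1/3, one way to see that the random error is centered on zero.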

  • n = 1: If n = 1 in Example 8.1, we would just pick one person randomly from the population and the possible proportions we might get are just 0 (if the person we pick doesn't have a Bachelor's degree) or 1 (if the person we pick has a degree). A histogram of the proportions we might get from all possible samples of size n = 1 would look like Figure 8.1A.
  • n = 10: Similarly, if we use a sample of size 10, then the possible sample proportions we might get are 0 (if none of the ten have a degree), 0.1 (if one of the ten has a degree), 0.2 (if two of the ten have degrees), and so on up to 1 (if all ten have degrees). A histogram of all of the different sample proportions we might get with n = 10 is given in Figure 8.1B.
  • n = 100: Finally, a histogram of the sample proportions we might get with all possible samples of size 100 is given in Figure 8.1C.
[Figure 8.1 panels: A, Proportion (n=1), axis 0 to 1; B, Proportion (n=10), axis 0 to 0.9; C, Proportion (n=100), axis 0.25 to 0.55]

Figure 8.1 Probability Histograms of all possible proportions when n=1, 10, or 100 in Example 8.1
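The probability histograms in Figure 8.1 can be approximated numerically. This sketch assumes the county population is large enough that drawing 10 or 100 people without replacement behaves like independent draws, so the number of degree holders follows a binomial distribution with success probability 0.384; `proportion_distribution` is a hypothetical helper name. It prints a crude text histogram for n = 1 and n = 10.

```python
from math import comb

def proportion_distribution(n, p):
    """Return {sample proportion: probability} for a sample of size n, assuming each
    person independently has a degree with probability p (binomial model)."""
    return {k / n: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

P = 0.384  # population proportion from Example 8.1
for n in (1, 10):
    print(f"n = {n}")
    for prop, prob in proportion_distribution(n, P).items():
        # one '#' per percentage point of probability
        print(f"  {prop:.1f}  {prob:.3f}  {'#' * round(prob * 100)}")
```

Under the same binomial assumption, `proportion_distribution(100, P)` should give the 101 bar heights that a panel like Figure 8.1C displays.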

With n = 1, the sample proportions go all the way from 0 to 1. With n = 10, most of the sample proportions fall between 0.2 and 0.6. With n = 100, the bulk of the proportions fall between 0.33 and 0.43. As the sample size grows, there is less variation in the proportions we might get; that's the Law of Large Numbers. But another pattern is also emerging in Figure 8.1: as the sample size grows, the histogram of the possible sample proportions takes on a familiar shape. With large samples, it looks like the normal curve!
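The normal shape can be checked by comparing exact probabilities with a normal curve over the ranges quoted above (0.2 to 0.6 for n = 10, 0.33 to 0.43 for n = 100). This sketch assumes the county is large enough that the number of degree holders in the sample is binomial with p = 0.384, and it uses a normal curve centered at p with standard deviation sqrt(p(1 - p)/n). It also applies a continuity correction (widening the interval by half a step, 1/(2n), on each side), which is a choice made here rather than something this lesson prescribes. The function names are hypothetical.

```python
from math import comb, erf, sqrt

def exact_prob(n, p, lo, hi):
    """Exact binomial probability that the sample proportion lies in [lo, hi]."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n + 1) if lo <= k / n <= hi)

def normal_cdf(z):
    """Standard normal cumulative probability, via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_approx_prob(n, p, lo, hi):
    """Normal approximation with mean p and SD sqrt(p(1-p)/n). Assumes lo and hi are
    possible proportions (multiples of 1/n) and widens each end by 1/(2n)."""
    sd = sqrt(p * (1 - p) / n)
    half_step = 0.5 / n
    return normal_cdf((hi + half_step - p) / sd) - normal_cdf((lo - half_step - p) / sd)

P = 0.384
for n, lo, hi in [(10, 0.2, 0.6), (100, 0.33, 0.43)]:
    print(f"n = {n}: exact = {exact_prob(n, P, lo, hi):.3f}, "
          f"normal approx = {normal_approx_prob(n, P, lo, hi):.3f}")
```

For n = 100 the two numbers should agree closely. For n = 1 the comparison would make little sense, since only the proportions 0 and 1 are possible and the histogram looks nothing like a normal curve.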

Objectives

After successfully completing this lesson, you should be able to:

  • Identify and avoid the gambler's fallacy.
  • Understand the concept of a sampling distribution and how it relates the population parameter to the sample statistic.
  • Apply the normal approximation to sample proportions and means.
  • Examine real-world problems and decide when the normal approximation does and does not apply.