2.2 - The Beauty of Sampling

Many sample surveys are used to estimate the percentage of people in a population who have a certain characteristic or opinion. If you follow the news, you might remember hearing that many of these polls are based on samples of 1000 to 1500 people. So, why is a sample size of around 1000 people commonly used in surveying? The answer is based on understanding what is called the margin of error.

The margin of error:

  • measures the reliability of the percent or other estimate based on the survey data
  • is smaller when the sample size (n) is larger
  • does not provide information about bias or other errors in a survey

For a sample size of n = 1000, the margin of error for a sample proportion is around \(\frac{1}{\sqrt{n}}=\frac{1}{\sqrt{1000}}\approx 0.03\), or about 3%. Since other problems inherent in surveys often cause biases of a percentage point or two, pollsters often conclude that it is not worth the expense to achieve the small improvement in the margin of error that might be gained by increasing the sample size further (see section 3.4).
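To make the arithmetic concrete, here is a minimal Python sketch of the \(\frac{1}{\sqrt{n}}\) rule of thumb. The function name `approx_margin_of_error` and the list of sample sizes are illustrative choices, not part of the lesson.

```python
import math

def approx_margin_of_error(n):
    """Rule-of-thumb margin of error for a sample proportion: 1 / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return 1 / math.sqrt(n)

# Sample sizes below are chosen only for illustration.
for n in (100, 500, 1000, 1500, 4000):
    moe = approx_margin_of_error(n)
    print(f"n = {n:>5}: margin of error is about {moe:.1%}")
```

Under this rule, going from 1000 to 4000 respondents lowers the margin of error only from roughly 3.2% to roughly 1.6%, which is the kind of small gain the paragraph above describes.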

The margin of error for most sample estimates is inversely proportional to the square root of the size of the sample, \(\sqrt{n}\). For example, if you have four times as many people in your sample, your margin of error will be cut in half and your survey will be twice as reliable. As long as the population is much larger than the sample, the size of the population does not affect the margin of error. So, a percentage estimated from a sample will have the same margin of error (reliability), regardless of whether the population size is 50,000 or 5 billion. If a survey is conducted using an unbiased methodology, then the margin of error tells us directly about the accuracy of the poll at estimating a population parameter.
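The square-root relationship can also be read in reverse: solving \(\frac{1}{\sqrt{n}} \le m\) for n gives \(n \ge \frac{1}{m^2}\). The sketch below assumes the same rule of thumb; `required_sample_size` is a hypothetical helper name, and it deliberately takes no population-size argument, since the rule does not use one.

```python
import math

def required_sample_size(target_margin):
    """Smallest n for which 1 / sqrt(n) <= target_margin (rule of thumb)."""
    if not 0 < target_margin < 1:
        raise ValueError("target_margin must be a proportion between 0 and 1")
    return math.ceil(1 / target_margin ** 2)

n_three_percent = required_sample_size(0.03)    # target margin of 3%
n_half_of_that = required_sample_size(0.015)    # target margin of 1.5%

print(f"n for a 3% margin:   {n_three_percent}")
print(f"n for a 1.5% margin: {n_half_of_that}")
print(f"ratio of sample sizes: {n_half_of_that / n_three_percent:.2f}")
```

Halving the target margin requires about four times as many respondents, which matches the statement that quadrupling the sample cuts the margin of error in half.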

So what does the margin of error represent?

Interpretation: If one obtains many unbiased samples of the same size from a defined population, the difference between the sample percent and the true population percent will be within the margin of error, at least 95% of the time.

Key Features of the Interpretation of the Margin of Error

  • Even though a pollster obtains only one sample, you should remember that the interpretation of the margin of error is based on what would happen if the survey were conducted repeatedly under identical conditions. The key to statistics is analyzing the quality of the process used to gather data. The margin of error says something about the reliability of that process.
  • The margin of error represents the largest distance that would occur in most unbiased surveys between the sample percent, which is the percent obtained by the poll, and the true population percent, which is unknown because we have not sampled the entire population.
  • When talking about the margin of error, it is just not possible to say that the difference between the sample percent and the population percent will be within the margin of error for 100% of all possible samples. So, statisticians use the laws of probability to ensure that at least 95% of the time, the difference between the sample percent and the population percent will be within the margin of error.
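The "repeated surveys" interpretation can be explored with a small simulation. The sketch below is only an illustration under stated assumptions: the true population proportion (0.5), the number of repetitions (2000), and the seed are made up, each respondent is drawn independently, and the margin of error is the \(\frac{1}{\sqrt{n}}\) rule of thumb.

```python
import math
import random

def simulate_coverage(true_p, n, repetitions, seed=0):
    """Fraction of simulated unbiased surveys whose sample proportion
    lands within 1 / sqrt(n) of the true proportion."""
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    margin = 1 / math.sqrt(n)
    within = 0
    for _ in range(repetitions):
        # Each respondent independently has the characteristic with probability true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        sample_p = successes / n
        if abs(sample_p - true_p) <= margin:
            within += 1
    return within / repetitions

# Hypothetical population in which half of the people have the characteristic.
coverage = simulate_coverage(true_p=0.5, n=1000, repetitions=2000)
print(f"Share of simulated surveys within the margin of error: {coverage:.1%}")
```

With these settings the share should come out close to 95%. Changing `true_p` away from 0.5 tends to raise the share, which is consistent with the "at least 95%" wording above.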

Example 2.2. Margin of Error and the Gallup Emotions Report

The Gallup Global Emotions Report was released in March 2016 and included the results of surveys Gallup conducted in 140 different countries in 2015 to study the emotional well-being of the populations of each country. For example, Gallup’s survey in Paraguay (population size 7 million) included about 1000 interviews and found the adults in that country to be the happiest in the world, with 84% of respondents indicating they had laughed or smiled a lot the day before. On the other hand, Gallup’s survey in Syria (population size 23 million) also included about 1000 interviews and found the adults in that country to be the least happy in the world, with only about 36% of respondents indicating they had laughed or smiled a lot the day before. The surveys in both countries had a margin of error of about 3%.

The results of the poll in Paraguay suggest that 84% ± 3% of all Paraguay adults smile or laugh a lot on any given day. What is the correct interpretation of this margin of error?

Margin of Error Interpretation

Assuming the poll in Paraguay used an unbiased procedure, the difference between our sample percent and the true population percent will be within 3%, at least 95% of the time. This means that we can be quite confident that 84% ± 3%, or 81% to 87%, of all adults in Paraguay smile or laugh a lot each day. Because the entire range of possible values from this poll falls above 72%, we can also say that we are pretty sure that the rate in Paraguay is above the worldwide average of 72%. If any part of the range had been 72% or less, then we would not have been able to make that kind of statement with as much certainty.

The range of values (81% to 87%) is called a 95% confidence interval. Other levels of confidence besides 95% may be used, but 95% is the most typical. We will go into further detail about confidence intervals in Lesson 9.

Importantly, the poll in Syria also had a margin of error of about 3%, despite that country having a population about three times as large. However, the interpretation of the margin of error in Syria should also include the reminder that it reflects only the variability due to the randomness in the survey. The margin of error in that survey does not include any information about the likely bias in the Syria poll that resulted from undercoverage: security concerns made it impossible for Gallup to have access to a third of the population (see section 2.1).
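The interval arithmetic in this example can be sketched in a few lines of Python. The percentages come from the example above; the helper names `interval_from_margin` and `compare_to_benchmark` are hypothetical, and the sketch covers sampling variability only, not bias such as undercoverage.

```python
def interval_from_margin(sample_percent, margin_percent):
    """Return (low, high) for sample_percent plus or minus margin_percent."""
    return (sample_percent - margin_percent, sample_percent + margin_percent)

def compare_to_benchmark(interval, benchmark):
    """Say whether the whole interval sits above or below a benchmark value."""
    low, high = interval
    if low > benchmark:
        return "above"
    if high < benchmark:
        return "below"
    return "inconclusive"

WORLD_AVERAGE = 72   # worldwide average percent quoted in the example
MARGIN = 3           # approximate margin of error reported for both polls

polls = {"Paraguay": 84, "Syria": 36}   # sample percents from the example
for country, sample_percent in polls.items():
    low, high = interval_from_margin(sample_percent, MARGIN)
    verdict = compare_to_benchmark((low, high), WORLD_AVERAGE)
    print(f"{country}: {low}% to {high}% ({verdict} relative to {WORLD_AVERAGE}%)")
```

An "inconclusive" result corresponds to the case described above in which part of the interval is at or on the other side of the benchmark, so no firm statement can be made.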
