2.1 - Defining a Common Language for Sampling

The Gallup Global Emotions Survey in 2015 included interviews with more than 147,000 people in 140 countries. In each country, Gallup is interested in studying the emotional well-being of the population of adults 18 years old and over. It does this by taking a sample of about 1,000 adults in each country, using techniques that involve random selection of people for interviews. In most countries, Gallup can operate quite freely, conducting interviews in the local language about various positive and negative experiences and feelings that respondents have. In Syria, however, where there was an active war in 2015, the security situation made it impossible for Gallup to operate in about a third of the country. The two-thirds of the population that lived in areas without active fighting formed the sampling frame for Gallup’s poll in Syria. Gallup published its Global Emotions report on the 2015 interviews in March 2016. One question asked people, “Did you smile or laugh a lot yesterday?” About 72% of respondents worldwide answered yes. However, only about half of that percentage said “yes” in war-torn Syria.

Let’s examine the general framework of the example above and define a common language for the processes used in sample surveys.

It is first necessary to distinguish between a census and a sample survey. A census is an attempt to collect data from every member of the population, while a sample survey collects data from a subset of the population chosen by the researcher. A sample survey is a type of observational study. A sample survey is usually much easier to conduct than a census. In planning a sample survey, the researcher needs to precisely define the following:

  • Sampling Unit: The individual person, animal, or object on which the measurement (observation) is taken.
  • Population: The entire group of individuals or objects that we wish to know something about. A numerical characteristic of the population is called a parameter.
  • Sampling Frame: The list of sampling units from which those to be contacted for inclusion in the sample are obtained. The sampling frame lies between the population and the sample. Ideally, the sampling frame should match the population, but it rarely does because most populations are too large for all of their members to be listed.
  • Sample: The individuals or objects that provide the data to be collected. Numerical characteristics of the sample are called statistics and are typically used as estimates of population parameters.

[Figure: two panels. Undercoverage: bias can result if the sampling frame does not include major parts of the population. Sampling frame close to population: having a sampling frame that is essentially the same as the population avoids selection bias.]

Figure 2.1 Relationship between Population, Sampling Frame and Sample
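
The relationship between the four terms can be sketched as a small simulation. This is a minimal illustration, not a description of any real survey: the population size, the share of units the frame cannot reach, and the two "yes" rates are all invented assumptions, and the function name simulate_undercoverage is hypothetical.

```python
import random


def simulate_undercoverage(pop_size=9_000, sample_size=1_000, seed=0):
    """Compare a population parameter with a statistic computed from a
    sample whose sampling frame leaves out part of the population."""
    rng = random.Random(seed)

    # Hypothetical population: every third unit lives where the frame cannot reach.
    # The "yes" rates (0.6 reachable, 0.1 unreachable) are invented for illustration.
    population = []
    for i in range(pop_size):
        reachable = (i % 3 != 0)
        p_yes = 0.6 if reachable else 0.1
        population.append({"reachable": reachable, "yes": rng.random() < p_yes})

    # Parameter: the proportion of "yes" in the whole population.
    parameter = sum(unit["yes"] for unit in population) / len(population)

    # Sampling frame: only the units that can actually be listed and contacted.
    frame = [unit for unit in population if unit["reachable"]]

    # Sample: a simple random sample drawn from the frame, not from the population.
    sample = rng.sample(frame, sample_size)

    # Statistic: the proportion of "yes" in the sample, used to estimate the parameter.
    statistic = sum(unit["yes"] for unit in sample) / len(sample)
    return parameter, statistic, len(frame)


if __name__ == "__main__":
    parameter, statistic, frame_size = simulate_undercoverage()
    print(f"frame size:           {frame_size}")
    print(f"population parameter: {parameter:.3f}")
    print(f"sample statistic:     {statistic:.3f}")
```

Because the frame omits a group that answers differently, the statistic lands well above the parameter even though the sample itself is drawn at random. Setting both invented "yes" rates to the same value makes the gap shrink to ordinary sampling variation.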

Example 2.1. Who are those angry women?

(Streitfield, D., 1988 and Wallis, 1987)

Recalling some of the information from Example 1.1 in Lesson 1, in 1987, Shere Hite published a best-selling book called Women and Love: A Cultural Revolution in Progress. This 7-year research project produced a controversial 922-page publication that summarized the results from a survey that was designed to examine how American women felt about their relationships with men. Hite mailed out 100,000 fifteen-page questionnaires to women who were members of a wide variety of organizations across the U.S. Questionnaires were actually sent to the leader of each organization. The leader was asked to distribute questionnaires to all members. Each questionnaire contained 127 open-ended questions with many parts and follow-ups. Part of Hite's directions read as follows: "Feel free to skip around and answer only those questions you choose." Approximately 4500 questionnaires were returned.

In Lesson 1, we determined that:

  • The population was all American women.
  • The sample was the 4,500 women who responded.

It is also easy to identify that the sampling unit was an American woman. So, the key question is "What is the sampling frame?" Some might think that the sampling frame was the 100,000 women who received the questionnaires, but that group is the intended sample. The sampling frame is the list from which the 100,000 women who were sent the survey were obtained. In this instance, the sampling frame included all American women who had some affiliation with an organization, because those are the women who had some possibility of being contacted. If the response rate had been 100%, the sample would have been the 100,000 women who responded to the survey.
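
The gap between the intended sample and the actual sample can be expressed as a response rate. The short calculation below uses only the two counts given in the example (100,000 questionnaires mailed and approximately 4,500 returned); the variable names are illustrative.

```python
# Counts taken from Example 2.1; the returned count is approximate.
questionnaires_mailed = 100_000   # intended sample
questionnaires_returned = 4_500   # actual sample

# Response rate: the share of the intended sample that ended up in the sample.
response_rate = questionnaires_returned / questionnaires_mailed
nonresponse_rate = 1 - response_rate

print(f"Response rate:    {response_rate:.1%}")
print(f"Nonresponse rate: {nonresponse_rate:.1%}")
```

In other words, roughly 1 in 22 of the women in the intended sample provided data.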

You should also remember that ideally, the sampling frame should be as close to the entire population as possible. If this is not possible, the sampling frame should appropriately represent the desired population. In this case, the sampling frame of all American women who were "affiliated with some organization" did not appropriately represent the population of all American women. In Lesson 1, we called this problem selection bias.

This example illustrates three key difficulties that can result in bias in sample surveys:

  1. Using the wrong sampling frame. As discussed above, bias can result when the sampling frame leaves out major portions of the population. This is called undercoverage, which is a type of selection bias.
  2. Not reaching the individuals selected. Because the questionnaire was sent to leaders of organizations, there is no guarantee that these questionnaires actually reached the women who were supposed to be in the sample.
  3. Getting a low response rate. In Lesson 1, we learned that this survey has a problem with nonresponse bias because of the low response rate. This problem can create bias if the people who respond have different views than those who do not.
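
The third difficulty can be illustrated with a small simulation of how a low response rate distorts results when respondents differ from nonrespondents. Every number here is an invented assumption chosen for illustration (the 30% share holding a strong opinion and the 10% and 2% response probabilities); none of them comes from Hite's survey, and the function name simulate_nonresponse is hypothetical.

```python
import random


def simulate_nonresponse(n_intended=100_000, seed=1):
    """Show how the sample can differ from the intended sample when
    people with a strong opinion are more likely to respond."""
    rng = random.Random(seed)

    # Intended sample: True means the person holds the strong opinion.
    # Assumption for illustration: 30% of the intended sample holds it.
    intended = [rng.random() < 0.30 for _ in range(n_intended)]

    # Assumed response probabilities: 10% for those with the strong opinion,
    # 2% for everyone else.
    respondents = [
        strong for strong in intended
        if rng.random() < (0.10 if strong else 0.02)
    ]

    response_rate = len(respondents) / len(intended)
    share_intended = sum(intended) / len(intended)
    share_respondents = sum(respondents) / len(respondents)
    return response_rate, share_intended, share_respondents


if __name__ == "__main__":
    rate, share_intended, share_respondents = simulate_nonresponse()
    print(f"response rate:                     {rate:.1%}")
    print(f"strong opinion in intended sample: {share_intended:.1%}")
    print(f"strong opinion among respondents:  {share_respondents:.1%}")
```

Under these assumptions, the strong opinion is heavily overrepresented among respondents, even though the intended sample was unaffected. If both response probabilities are set to the same value, the two shares agree up to random variation, which is why a low response rate is a problem only when respondents and nonrespondents differ.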

Summary

Focusing on these distinctions is meant to help you think carefully about the process of creating a sample so you can identify issues that might arise in interpreting the results of a sample survey.

The process: You want to know about a POPULATION, but you only really have access to a SAMPLING FRAME from which you can draw an INTENDED SAMPLE, and in the end you only get observations from the actual SAMPLE.

When you read about a sample survey, always try to break down the process used into these component parts.

When a report says that a random sample was used, that usually means that the intended sample was randomly selected from the sampling frame. You must then judge whether the sampling frame was really representative of the population and whether the sample was really representative of the intended sample. When you read the methodology used in high-quality sample surveys, you will find that they go to great lengths to make adjustments to avoid bias from these issues. If no such adjustments are made, survey results can be quite misleading.