2.4 - Simple Random Sampling and Other Sampling Methods
Sampling Methods
Sampling Methods can be classified into one of two categories:
- Probability Sampling: Sample has a known probability of being selected
- Non-probability Sampling: Sample does not have a known probability of being selected, as in convenience or voluntary response surveys
Probability Sampling
In probability sampling, it is possible to determine both which sampling units belong to which sample and the probability that each sample will be selected. The following sampling methods are examples of probability sampling:
- Simple Random Sampling (SRS)
- Stratified Sampling
- Cluster Sampling
- Systematic Sampling
- Multistage Sampling (in which some of the methods above are combined in stages)
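As a rough illustration of what "known probability" means for the simplest of these methods, the sketch below enumerates every possible simple random sample from a tiny population. The six-unit population and the sample size of 2 are hypothetical values chosen for illustration, not taken from the course materials.

```python
import random
from itertools import combinations
from math import comb

# Hypothetical sampling frame: six labeled units (an assumption for illustration).
population = ["A", "B", "C", "D", "E", "F"]
n = 2  # sample size

# Under simple random sampling, every possible sample of size n is equally likely.
all_samples = list(combinations(population, n))
prob_each_sample = 1 / comb(len(population), n)

# Each unit appears in the same share of the possible samples, namely n / N.
inclusion_prob = {
    unit: sum(unit in s for s in all_samples) / len(all_samples)
    for unit in population
}

# Draw one simple random sample without replacement.
rng = random.Random(0)  # fixed seed only so the sketch is repeatable
srs = rng.sample(population, n)

print(len(all_samples), prob_each_sample)
print(inclusion_prob)
print(srs)
```

With a real sampling frame the enumeration step would be impractical; only the `rng.sample` call is needed to draw the sample.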
Of the five methods listed above, students have the most trouble distinguishing between stratified sampling and cluster sampling.
Stratified Sampling is possible when it makes sense to partition the population into groups based on a factor that may influence the variable that is being measured. These groups are then called strata. An individual group is called a stratum. With stratified sampling one should:
- partition the population into groups (strata)
- obtain a simple random sample from each group (stratum)
- collect data on each sampling unit that was randomly sampled from each group (stratum)
Stratified sampling works best when a heterogeneous population is split into fairly homogeneous groups. Under these conditions, stratification generally produces more precise estimates of the population percents than estimates that would be found from a simple random sample. Table 2.2 shows some examples of ways to obtain a stratified sample.
| | Example 1 | Example 2 | Example 3 |
|---|---|---|---|
| Population | All people in the U.S. | All PSU intercollegiate athletes | All elementary students in the local school district |
| Groups (Strata) | 4 Time Zones in the U.S. (Eastern, Central, Mountain, Pacific) | 26 PSU intercollegiate teams | 11 different elementary schools in the local school district |
| Obtain a Simple Random Sample | 500 people from each of the 4 time zones | 5 athletes from each of the 26 PSU teams | 20 students from each of the 11 elementary schools |
| Sample | 4 × 500 = 2000 selected people | 26 × 5 = 130 selected athletes | 11 × 20 = 220 selected students |
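The three stratified sampling steps can be sketched in a few lines of code, here following Example 3 (20 students from each of 11 schools). The roster, the school names, and the enrollment figures are made up for illustration, and `stratified_sample` is a hypothetical helper rather than a library function.

```python
import random

rng = random.Random(1)  # fixed seed only so the sketch is repeatable

# Hypothetical roster: 11 schools with made-up enrollments (not real data).
schools = {
    f"school_{i}": [f"student_{i}_{j}" for j in range(100 + 10 * i)]
    for i in range(1, 12)
}

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample of n_per_stratum units from every stratum."""
    return {
        name: rng.sample(units, n_per_stratum)
        for name, units in strata.items()
    }

sample = stratified_sample(schools, 20, rng)
total = sum(len(units) for units in sample.values())

print(len(sample), "strata sampled,", total, "students selected")
```

Every stratum contributes to the sample, which is the feature that separates stratified sampling from cluster sampling.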
Cluster Sampling is very different from Stratified Sampling. With cluster sampling, one should:
- divide the population into groups (clusters)
- obtain a simple random sample of a chosen number of clusters from all possible clusters
- obtain data on every sampling unit in each of the randomly selected clusters
It is important to note that, unlike with the strata in stratified sampling, the clusters should be microcosms, rather than subsections, of the population. Each cluster should be heterogeneous. Additionally, the statistical analysis used with cluster sampling is not only different but also more complicated than that used with stratified sampling. Table 2.3 shows some examples of ways to obtain a cluster sample.
| | Example 1 | Example 2 | Example 3 |
|---|---|---|---|
| Population | All people in the U.S. | All PSU intercollegiate athletes | All elementary students in a local school district |
| Groups (Clusters) | 4 Time Zones in the U.S. (Eastern, Central, Mountain, Pacific) | 26 PSU intercollegiate teams | 11 different elementary schools in the local school district |
| Obtain a Simple Random Sample | 2 time zones from the 4 possible time zones | 8 teams from the 26 possible teams | 4 elementary schools from the 11 possible elementary schools |
| Sample | every person in the 2 selected time zones | every athlete on the 8 selected teams | every student in the 4 selected elementary schools |
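For contrast with stratified sampling, the cluster sampling steps can be sketched the same way, again following Example 3 (4 schools chosen from 11, with every student in the chosen schools included). The roster and enrollments are invented for illustration, and `cluster_sample` is a hypothetical helper.

```python
import random

rng = random.Random(2)  # fixed seed only so the sketch is repeatable

# Hypothetical roster: 11 schools with made-up enrollments (not real data).
schools = {
    f"school_{i}": [f"student_{i}_{j}" for j in range(100 + 10 * i)]
    for i in range(1, 12)
}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly select n_clusters whole clusters and keep every unit in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

sample = cluster_sample(schools, 4, rng)

for name, students in sample.items():
    print(name, len(students), "students (entire school)")
```

Here the randomness is in which groups are chosen, not in which units are chosen within a group.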
Each of the three examples that are found in Tables 2.2 and 2.3 was used to illustrate how both stratified and cluster sampling could be accomplished. However, there are obviously times when one sampling method is preferred over the other. The following explanations add some clarification about when to use which method.
- With Example 1: Stratified sampling would be preferred over cluster sampling, particularly if the questions of interest are affected by time zone. For example, the percentage of people watching a live sporting event on television might be highly affected by the time zone they are in. Cluster sampling really works best when there are a reasonable number of clusters relative to the entire population. In this case, selecting 2 clusters from 4 possible clusters really does not provide many advantages over simple random sampling.
- With Example 2: Either stratified sampling or cluster sampling could be used, depending on what questions are being asked. For instance, consider the question "Do you agree or disagree that you receive adequate attention from the team of doctors at the Sports Medicine Clinic when injured?" The answer to this question would probably not be team dependent, so cluster sampling would be fine. In contrast, consider the question "Do you agree or disagree that weather affects your performance during an athletic event?" The answer to this question would probably be influenced by whether the sport is played outside or inside. Consequently, stratified sampling would be preferred.
- With Example 3: Cluster sampling would probably be better than stratified sampling if each individual elementary school appropriately represents the entire population, as in a school district where students from throughout the district can attend any school. Stratified sampling could be used if the elementary schools had very different locations and served only their local neighborhoods (e.g., one elementary school is located in a rural setting while another is located in an urban setting). Again, the questions of interest would affect which sampling method should be used.
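A small simulation can make these trade-offs concrete. The sketch below assumes a made-up district in which each school is internally similar but the schools differ sharply from one another, which is the situation where stratification should help and cluster sampling should struggle. All of the numbers (11 schools of 200 students, the yes counts, 2000 repetitions) are assumptions for illustration only.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed only so the sketch is repeatable

# Hypothetical district: 11 schools of 200 students each. A 1 means "yes" to
# some survey question. The number of yes answers rises from 10 to 190 across
# schools, so schools differ sharply from one another.
schools = [[1] * (10 + 18 * i) + [0] * (190 - 18 * i) for i in range(11)]
everyone = [answer for school in schools for answer in school]
true_percent = sum(everyone) / len(everyone)

def srs_estimate():
    """Simple random sample of 220 students from the whole district."""
    picked = rng.sample(everyone, 220)
    return sum(picked) / len(picked)

def stratified_estimate():
    """Simple random sample of 20 students from each of the 11 schools.

    Because all schools are the same size, the plain sample mean is an
    unbiased estimate here; unequal strata would need weighting.
    """
    picked = [a for school in schools for a in rng.sample(school, 20)]
    return sum(picked) / len(picked)

def cluster_estimate():
    """Every student in 4 randomly selected schools."""
    picked = [a for school in rng.sample(schools, 4) for a in school]
    return sum(picked) / len(picked)

def spread(estimator, reps=2000):
    """Standard deviation of the estimate over repeated samples."""
    return statistics.pstdev([estimator() for _ in range(reps)])

sd_srs = spread(srs_estimate)
sd_strat = spread(stratified_estimate)
sd_cluster = spread(cluster_estimate)

print("true proportion:", true_percent)
print("spread of SRS estimates:       ", round(sd_srs, 4))
print("spread of stratified estimates:", round(sd_strat, 4))
print("spread of cluster estimates:   ", round(sd_cluster, 4))
```

A smaller spread means a more precise estimate. If the schools were instead built as microcosms of the district, the ordering would be expected to change, which is the point of the Example 3 discussion.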
The most common method of carrying out a poll today is Random Digit Dialing, in which a machine randomly dials phone numbers. Some polls go even further and have a machine conduct the interview itself rather than just dialing the number! Such "robocall polls" can be very biased because they have extremely low response rates (most people don't like speaking to a machine) and because federal law prevents such calls to cell phones. Since people who have landline phone service tend to be older than people who have cell phone service only, this restriction introduces another potential source of bias. National polling organizations that use random digit dialing to conduct interviewer-based polls are very careful to match the mix of landline and cell phone numbers to the population they are trying to survey.
Non-probability Sampling
The following sampling methods that are listed in your text are types of non-probability sampling that should be avoided:
- volunteer samples
- haphazard (convenience) samples
Since such non-probability sampling methods are based on human choice rather than random selection, statistical theory cannot explain how they might behave, and potential sources of bias are rampant. In your textbook, the two types of non-probability samples listed above are called "sampling disasters."
Read the article: "How Polls are Conducted" by the Gallup organization available in Canvas.
The article provides great insight into how major polls are conducted. When you are finished reading this article you may want to go to the Gallup Poll Website and see the results from recent Gallup polls. Another excellent source of public opinion polls on a wide variety of topics using solid sampling methodology is the Pew Research Center Website. When you read one of the summary reports on the Pew site, there is a link (in the upper right corner) to the complete report giving more detailed results and a full description of their methodology as well as a link to the actual questionnaire used in the survey so you can judge whether there might be bias in the wording of their survey.
It is important to be mindful of the margin of error as discussed in this article. We all need to remember that public opinion on a given topic cannot be appropriately measured with one question that is asked on only one poll. Such results only provide a snapshot at that moment under certain conditions. Repeating procedures over different conditions and times leads to more valuable and durable results. Within this section of the Gallup article, there is also an error: "in 95 out of those 100 polls, his rating would be between 46% and 54%." This should instead say that in an expected 95 out of those 100 polls, the true population percent would be within the confidence interval calculated from that poll. In the other 5 or so polls, the confidence interval would not contain the population percent.
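The corrected interpretation can be checked with a short simulation. The sketch below uses the common normal-approximation formula for a 95% margin of error, which is an assumption here and not necessarily the calculation Gallup uses. The true percent of 50% and the sample size of 600 are hypothetical, chosen because they give a margin of roughly 4 percentage points, in line with the 46% to 54% range quoted above.

```python
import math
import random

def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion
    (normal approximation; an assumed formula, not Gallup's stated method)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

def coverage(true_p, n, polls, rng):
    """Share of simulated polls whose interval p_hat +/- margin contains true_p."""
    hits = 0
    for _ in range(polls):
        # One simulated poll: n independent respondents, each "yes" with chance true_p.
        p_hat = sum(rng.random() < true_p for _ in range(n)) / n
        moe = margin_of_error(p_hat, n)
        if p_hat - moe <= true_p <= p_hat + moe:
            hits += 1
    return hits / polls

rng = random.Random(7)  # fixed seed only so the sketch is repeatable
moe_at_half = margin_of_error(0.5, 600)
share_covering = coverage(true_p=0.5, n=600, polls=2000, rng=rng)

print("margin of error at 50% with n = 600:", round(moe_at_half, 4))
print("share of polls whose interval contains the true percent:", share_covering)
```

Each simulated poll produces its own estimate and its own interval. What stays fixed is the true population percent, and about 95% of the intervals are expected to capture it.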