1.1.2 - Strategies for Collecting Data

How can we get data? How do we select observations or measurements for a study?

There are two types of methods for collecting data: non-probability methods and probability methods.

Non-probability Methods

These might include:

Convenience sampling (haphazard): Collecting data from subjects who are conveniently obtained.
  • Example: surveying students as they pass by in the university's student union building.
Gathering volunteers: Collecting data from subjects who volunteer to provide data.
  • Example: using an advertisement in a magazine or on a website inviting people to complete a form or participate in a study.

Probability Methods

  • Simple random sample: selecting subjects from a population so that each subject (more precisely, each possible sample of a given size) has an equal chance of being selected.
  • Stratified random sample: first identify the population of interest, then divide it into strata (groups) based on some characteristic (e.g., sex, geographic region), and then take a simple random sample from each stratum.
  • Cluster sample: the population of interest is divided into clusters (naturally occurring groups), a random sample of clusters is selected, and every subject within each selected cluster is measured. For instance, to estimate the average salary of faculty members at Penn State - University Park Campus, we could take a simple random sample of departments and record the salary of each faculty member within the sampled departments. This would be our cluster sample.
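
To make the difference between the three probability methods concrete, here is a minimal sketch using only Python's standard library. The population of 12 faculty members, the department names, and the helper functions (`simple_random_sample`, `stratified_sample`, `cluster_sample`, `group_by`) are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict

# Hypothetical population: 12 faculty members, 3 in each of 4 departments.
population = [
    {"id": i, "dept": dept}
    for i, dept in enumerate(["Math", "Stat", "Physics", "Chem"] * 3)
]

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def simple_random_sample(units, n, rng):
    """Draw n units without replacement; every unit is equally likely."""
    return rng.sample(units, n)


def group_by(units, key):
    """Collect units into lists keyed by the given field."""
    groups = defaultdict(list)
    for unit in units:
        groups[unit[key]].append(unit)
    return groups


def stratified_sample(units, key, n_per_stratum, rng):
    """Take a simple random sample of n_per_stratum units within every stratum."""
    sample = []
    for members in group_by(units, key).values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample


def cluster_sample(units, key, n_clusters, rng):
    """Randomly pick whole clusters, then keep every unit in the chosen clusters."""
    groups = group_by(units, key)
    chosen = rng.sample(sorted(groups), n_clusters)
    return [unit for name in chosen for unit in groups[name]]


srs = simple_random_sample(population, 4, rng)          # any 4 faculty members
strat = stratified_sample(population, "dept", 1, rng)   # 1 from every department
clust = cluster_sample(population, "dept", 2, rng)      # everyone in 2 departments

print("simple random:", [u["id"] for u in srs])
print("stratified:   ", [(u["dept"], u["id"]) for u in strat])
print("cluster:      ", [(u["dept"], u["id"]) for u in clust])
```

Notice that the stratified sample always contains every department, while the cluster sample contains only the departments that happened to be selected, but all of their members.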

There are advantages and disadvantages to both types of methods. Non-probability methods are often easier and cheaper to carry out, but the resulting sample is often not representative of the population. If the sample is not representative, you can make generalizations only about the sample, not the population. The primary benefit of probability sampling methods is the ability to make inferences. Random sampling gives us good reason to expect a sample that is representative of the population, so the results can be “extended” or “generalized” to the population from which the sample came.
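
The following toy simulation sketches why a convenience sample may fail to represent the population. Everything in it is an assumption made for illustration: the population of 1,000 students, the "hours on campus" variable, and the premise that only residents pass through the student union.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of 1,000 students (all values are made up).
# Residents spend 20-40 hours per week on campus, commuters 0-20.
population = (
    [{"resident": True, "hours": rng.uniform(20, 40)} for _ in range(500)]
    + [{"resident": False, "hours": rng.uniform(0, 20)} for _ in range(500)]
)
population_mean = mean(s["hours"] for s in population)

# Probability method: every student has the same chance of selection.
srs = rng.sample(population, 200)
srs_mean = mean(s["hours"] for s in srs)

# Non-probability method: survey whoever passes by the student union,
# which in this toy setup means residents only.
passers_by = [s for s in population if s["resident"]]
convenience = passers_by[:200]
convenience_mean = mean(s["hours"] for s in convenience)

print(f"population mean:         {population_mean:.1f}")
print(f"simple random sample:    {srs_mean:.1f}")
print(f"convenience sample:      {convenience_mean:.1f}")
```

In this setup the convenience sample overstates the average because commuters never have a chance to be selected, whereas the simple random sample lands close to the population mean.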

Example 1-1: Survey Methods

Airline Company Survey of Passengers

Let's say that you are the owner of a large airline company and you live in Los Angeles. You want to survey your L.A. passengers on what they like and dislike about traveling on your airline. For each of the methods below, determine whether a non-probability method or a probability method is used, then identify the type of sampling.

  1. Since you live in L.A., you go to the airport and simply interview passengers as they approach your ticket counter.
    Non-probability method; convenience sampling
  2. You have your ticket counter personnel distribute a questionnaire to each passenger, requesting that they complete the survey and return it at the end of the flight.
    Non-probability method; volunteer sampling
  3. You randomly select a set of passengers flying on your airline and question those that you have selected.
    Probability method; simple random sampling
  4. You group your passengers by the class they fly (first, business, economy), and then take a random sample from each of these groups.
    Probability method; stratified sampling
  5. You treat each class (first, business, economy) on each flight as a group, randomly select some of these class-and-flight groups, and survey every passenger in each group selected.
    Probability method; cluster sampling

Think About It!

In predicting the 2008 Iowa Caucus results, a phone survey said that Hillary Clinton would win, but Obama won instead. Where did they go wrong?

The survey was based on landline phones, so the sample was skewed toward older people, who tended to support Clinton. However, many younger people got involved in this election and voted for Obama, and these younger people could be reached only by cell phone.
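
The arithmetic behind this kind of coverage problem can be sketched as a weighted average. All numbers below are invented for illustration and are not estimates from the 2008 Iowa caucuses or from any real poll; the function `overall_support` and the "candidate A" label are hypothetical as well.

```python
# All numbers are invented for illustration; they are NOT real polling data.
share_older = 0.6          # fraction of voters who are older
share_younger = 0.4        # fraction of voters who are younger
support_a_older = 0.55     # candidate A's support among older voters
support_a_younger = 0.30   # candidate A's support among younger voters


def overall_support(groups):
    """Weighted average of support over (share, support) pairs."""
    total_share = sum(share for share, _ in groups)
    return sum(share * support for share, support in groups) / total_share


# Support for candidate A among all voters.
true_support_a = overall_support(
    [(share_older, support_a_older), (share_younger, support_a_younger)]
)

# A landline-only survey that, in this toy setup, reaches older voters only.
landline_support_a = overall_support([(share_older, support_a_older)])

print(f"all voters:      {true_support_a:.2f}")
print(f"landline survey: {landline_support_a:.2f}")
```

With these made-up inputs the landline survey shows candidate A above 50% even though A's support among all voters is below 50%, because the group left out of the survey differs from the group that was reached.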

Looking Ahead

Students interested in pursuing topics related to sampling might explore STAT 506: Sampling Theory. STAT 506 covers sampling design and analysis methods that are useful for research and management in many fields. A well-designed sampling procedure ensures that we can summarize and analyze data with a minimum of assumptions and complications.