# 1.2 - Samples & Populations

We often have questions concerning large **populations**. Gathering information from the entire population is not always possible due to barriers such as time, accessibility, or cost. Instead, we often gather information from a smaller subset of the population, known as a **sample**.

Values concerning a sample are referred to as sample **statistics** while values concerning a population are referred to as population **parameters**.

- **Population**: The entire set of possible cases
- **Sample**: A subset of the population from which data are collected
- **Statistic**: A measure concerning a sample (e.g., sample mean)
- **Parameter**: A measure concerning a population (e.g., population mean)

The process of using sample statistics to make conclusions about population parameters is known as **inferential statistics**. In other words, data from a sample are used to make an inference about a population.

- **Inferential Statistics**: Statistical procedures that use data from an observed sample to make a conclusion about a population

## Example: Student Housing

A survey is carried out at Penn State Altoona to estimate the proportion of all undergraduate students living at home during the current term. Of the 3,838 undergraduate students enrolled at the campus, a random sample of 100 was surveyed.

**Population**: All 3,838 undergraduate students at Penn State Altoona

**Sample**: The 100 undergraduate students surveyed

We can use the data collected from the sample of 100 students to make inferences about the population of all 3,838 students.
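The distinction between a parameter and a statistic in this example can be sketched with a short simulation. The population and sample sizes come from the example; the 30% living-at-home rate is a made-up value used only so the code has something to estimate, since the real proportion is not given.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

POPULATION_SIZE = 3838     # from the example
SAMPLE_SIZE = 100          # from the example
ASSUMED_HOME_RATE = 0.30   # hypothetical; the true value is unknown in practice

# Build a hypothetical population: True means "lives at home".
n_home = round(POPULATION_SIZE * ASSUMED_HOME_RATE)
population = [True] * n_home + [False] * (POPULATION_SIZE - n_home)

# Parameter: the proportion computed from the whole population.
parameter = sum(population) / len(population)

# Statistic: the proportion computed from a random sample
# drawn without replacement.
sample = random.sample(population, SAMPLE_SIZE)
statistic = sum(sample) / len(sample)

print(f"Population parameter: {parameter:.3f}")
print(f"Sample statistic:     {statistic:.3f}")
```

In a real survey only the statistic is observed; the parameter is printed here only because the population was constructed by hand.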

## Example: Polling Teachers

Educational policy researchers randomly selected 400 teachers from the National Science Teachers Association database of members and asked them whether or not they believed that evolution should be taught in public schools. They received responses from 252 teachers.

**Population**: All National Science Teachers Association members

**Sample**: The 252 respondents

The researchers can use the data collected from the 252 teachers who responded to the survey to make inferences about the population of all National Science Teachers Association members.

## Example: Flipping a Coin

A fair coin is flipped 500 times and the number of heads is recorded.

**Population**: All flips of this coin

**Sample**: The 500 flips recorded in this study

We can use data from these 500 flips to make inferences about the population of all flips of this coin.
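As a rough illustration, the 500 flips can be simulated. This sketch assumes a fair coin, as stated in the example, and uses Python's standard `random` module in place of a physical coin.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

N_FLIPS = 500  # from the example

# Each flip is "H" or "T" with equal probability (a fair coin).
flips = [random.choice("HT") for _ in range(N_FLIPS)]

heads = flips.count("H")
sample_proportion = heads / N_FLIPS  # a sample statistic

print(f"Heads: {heads} of {N_FLIPS}")
print(f"Sample proportion of heads: {sample_proportion:.3f}")
```

The sample proportion is a statistic; the parameter it estimates is the proportion of heads across all possible flips of this coin, which is 0.5 for a fair coin.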

# 1.2.1 - Sampling Bias

Recall that the entire group of individuals of interest is called the population. It may be unrealistic or even impossible to gather data from the entire population. The subset of the population from which data are actually gathered is the sample. A sample should be selected from a population randomly; otherwise it may be prone to **bias**. Our goal is to obtain a sample that is **representative** of the population.

- **Representative Sample**: A subset of the population from which data are collected that accurately reflects the population
- **Bias**: The systematic favoring of certain outcomes
- **Sampling Bias**: Systematic favoring of certain outcomes due to the methods employed to obtain the sample

## Example: Weight Loss Study Volunteers

A medical research center is testing a new weight loss treatment. They advertise on a social media site that they are looking for volunteers to participate. There is **sampling bias** because the sample will be limited to people who use the social media site where they advertised. The individuals who choose to participate may be different from the overall population. For example, volunteers may be individuals who are already actively trying to lose weight. This is **not a representative sample** because the sample may have characteristics that are different from the population of interest.

## Example: NYC Advertising Study

The marketing department for a large retail chain wants to survey their customers about a new advertising plan. They go into one of their largest New York City stores on a Tuesday morning and survey the first 50 people who make a purchase. There is **sampling bias** for a number of reasons. They are only sampling at one store, in New York City; there may be differences between the customers at this store and those who shop at their other locations. By conducting their survey on a Tuesday morning, they are limiting themselves to individuals who are out shopping at that time; the sample may lack people who work during the day. Finally, they only survey people who make a purchase; individuals who do not make a purchase, perhaps because they are not satisfied with the store, will not be included in their sample. This is **not a representative sample** because the sample selected may be different from the population of interest.
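The effect of sampling from a single store can be illustrated with a small simulation. Everything numeric here is invented for illustration: the population of 10,000 customers, the 2,000 who shop at the NYC store, and the approval rates of 80% and 40%. The sketch models only the one-store restriction, not the time-of-day or purchase restrictions.

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical population (all numbers invented for illustration):
# 2,000 NYC-store customers, 1,600 of whom approve of the plan, and
# 8,000 customers of other stores, 3,200 of whom approve.
nyc = [{"store": "NYC", "approves": i < 1600} for i in range(2000)]
other = [{"store": "other", "approves": i < 3200} for i in range(8000)]
population = nyc + other


def approval_rate(customers):
    """Proportion of customers who approve of the advertising plan."""
    return sum(c["approves"] for c in customers) / len(customers)


# Biased design: only customers of the one NYC store can be selected.
biased_sample = random.sample(nyc, 50)

# Simple random sample: every customer in the population can be selected.
random_sample = random.sample(population, 50)

print(f"Population approval rate: {approval_rate(population):.2f}")
print(f"NYC-only sample:          {approval_rate(biased_sample):.2f}")
print(f"Simple random sample:     {approval_rate(random_sample):.2f}")
```

Because the NYC-only design can never select customers from other stores, its estimates tend to sit near the NYC approval rate rather than the population rate, no matter how many times the survey is repeated.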

# 1.2.2 - Sampling Methods

There are many different ways to select a sample from a population. Some of these methods are probability-based, such as the **simple random sampling** method, which you'll read about below and in your textbook. Other probability-based methods include *cluster sampling* methods and *stratified sampling* methods. You may learn more about these if you take a research methods course or an advanced statistics course in the future. Other sampling methods are not probability-based, such as **convenience sampling** methods, which you will read about below.

## Simple Random Sampling

To prevent sampling bias and obtain a representative sample, a sample should be selected using a probability-based sampling design that gives each individual a known chance of being selected. The most common probability-based sampling method is the **simple random sampling** method.

Using this method, a sample is selected without replacement. This means that once an individual has been selected to be a part of the sample they cannot be selected a second time. If multiple samples are being taken (e.g., when constructing a sampling distribution in Lesson 4), an individual can appear in more than one sample, but only once in each sample.

- **Simple Random Sampling**: A method of obtaining a sample from a population in which every member of the population has an equal chance of being selected
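Sampling without replacement, and the point about taking multiple samples, can be sketched as follows. The population of 20 numbered individuals is hypothetical and kept small so that overlap between samples is easy to see. Python's `random.sample` draws without replacement.

```python
import random

random.seed(2024)  # fixed seed so the sketch is repeatable

# Hypothetical population of 20 individuals, identified by number.
population_ids = list(range(1, 21))

# Each call draws 5 individuals without replacement, so no ID can
# appear twice within the same sample.
sample_a = random.sample(population_ids, 5)
sample_b = random.sample(population_ids, 5)

# The two draws are independent, so an ID may appear in both samples.
in_both = sorted(set(sample_a) & set(sample_b))

print("Sample A:", sample_a)
print("Sample B:", sample_b)
print("In both samples:", in_both)
```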

## Example: Community Service Attitudes

An institutional researcher is conducting a study of World Campus students’ attitudes toward community service. He takes a list of all 12,242 World Campus students and uses a random number generator to select 30 students whom he contacts to complete the survey. This researcher used **simple random sampling** because participants were selected from the overall population in a way that each individual had an equal chance of being selected.

## Example: Languages

A student wants to learn more about the languages spoken in her town. She has access to the census forms submitted by all 3,500 households in her town. It would take too long for her to go through all 3,500 forms, so she uses a random number generator to select 100 households. She finds those 100 census forms and records data concerning the languages spoken in those households. This is a **simple random sample** because the sample of 100 households was selected in a way that each of the 3,500 households had an equal chance of being selected.

## Convenience Sampling

While probability-based sampling methods are considered better because they can prevent sampling bias, there are times when it is not possible to use one of these methods. For example, a researcher may not have access to the entire population. In cases where probability-based sampling methods are not practical, **convenience samples** are often used.

- **Convenience Sampling**: A method of obtaining a sample from a population by ease of accessibility; such a sample is not random and may not be representative of the intended population.

## Example: Weight Loss Supplements

A weight loss company wants to compare how much weight adults lose on their supplement versus a competitor's supplement. To recruit participants, they post an advertisement in a newspaper asking for adults who want to lose weight. This is an example of a volunteer sample, which is a **convenience sampling method**. The researchers are using a sample of individuals who volunteer to participate.

## Example: Chocolate Preferences

A chocolate company wants to know if customers prefer their dark chocolate with or without peanuts. They set up a table in a grocery store on a Monday morning, offer customers samples of their dark chocolate with and without peanuts, and ask which they prefer. This is an example of a **convenience sampling method**. The sample is not being selected using any probability-based method and may not be representative of the company's intended population. People who grocery shop may be a special subset of the population. For example, people who do not work traditional full-time jobs may be more likely to grocery shop at that time. The researchers are using a sample of individuals who happen to be grocery shopping on a Monday morning and who volunteer to eat their chocolate.

# 1.2.2.1 - Minitab Express: Simple Random Sampling

Using simple random sampling methods, each member of the population has an equal chance of being selected. We can use statistical software to select a simple random sample.

In the example below, we will use Minitab Express to randomly select 10 names from a class list.

## Minitab Express – Random Sampling from a Column

Open the class list data set and randomly select 10 names using Minitab Express:

- On a PC or Mac, select **DATA > Sample from Columns**
- Double-click on the variable *Name*
- In the box labeled "Number of rows in each sample", enter **10**
- Leave the method as the default, "Sample without replacement"
- Click OK

The result should be the following output:

| Input | |
|---|---|
| Source data column | Name |
| Number of rows sampled | 10 |
| Method | Without replacement |

| Output | |
|---|---|
| Sampled data column | C2 |

10 rows were sampled from Name and stored in C2.

Along with a random sample of the names in the second column in the data worksheet:

| | C1 | C2 |
|---|---|---|
| | Name | Sample From Name |
| 1 | Beckman | Qi |
| 2 | Beeson | Song |
| 3 | Boone | Walia |
| 4 | Botero | Gruver |
| 5 | Brooks | Corey |
| 6 | Brown | Cingolani |
| 7 | Campbell | Farooq |
| 8 | Cao | Yan |
| 9 | Chen | O'Donnell |
| 10 | Chen | Wang |
| 11 | Chung | |
| 12 | Cingolani | |
| ⋮ | ⋮ | |

Because we are using simple random sampling, the results will differ each time due to random sampling variation. Try these steps a few times; you should get a different set of 10 names each time.
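For readers without Minitab Express, the same idea can be sketched in Python. This is not how Minitab works internally; it only mirrors the "Sample without replacement" option using the standard library's `random.sample`. The full class list is not reproduced in this lesson, so the `names` list below is an assumption: it contains only the names visible in the worksheet excerpt above. The helper `sample_rows` is a hypothetical name introduced for illustration.

```python
import random

# Partial class list (assumption): only the names visible in the
# worksheet excerpt, not the full data set. "Chen" appears twice
# because two rows of the worksheet contain that name.
names = [
    "Beckman", "Beeson", "Boone", "Botero", "Brooks", "Brown",
    "Campbell", "Cao", "Chen", "Chen", "Chung", "Cingolani",
    "Corey", "Farooq", "Gruver", "O'Donnell", "Qi", "Song",
    "Walia", "Wang", "Yan",
]


def sample_rows(column, n_rows):
    """Return n_rows entries drawn without replacement.

    Rows are selected by position, so each row of the column can be
    chosen at most once per sample.
    """
    return random.sample(column, n_rows)


# No seed is set, so each run (and each call) gives a different sample.
first_sample = sample_rows(names, 10)
second_sample = sample_rows(names, 10)

print("First sample: ", first_sample)
print("Second sample:", second_sample)
```

Running the script several times plays the same role as repeating the Minitab Express steps: the 10 selected names change from run to run.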
