1.2 - Samples & Populations

We often have questions concerning large populations. Gathering information from the entire population is not always possible due to barriers such as time, accessibility, or cost. Instead of gathering information from the whole population, we often gather information from a smaller subset of the population, known as a sample.

Values concerning a sample are referred to as sample statistics while values concerning a population are referred to as population parameters.

Population: The entire set of possible cases

Sample: A subset of the population from which data are collected

Statistic: A measure concerning a sample (e.g., sample mean)

Parameter: A measure concerning a population (e.g., population mean)

The process of using sample statistics to make conclusions about population parameters is known as inferential statistics. In other words, data from a sample are used to make an inference about a population.

Inferential Statistics: Statistical procedures that use data from an observed sample to make a conclusion about a population

Example: Student Housing

A survey is carried out at Penn State Altoona to estimate the proportion of all undergraduate students living at home during the current term. Of the 3,838 undergraduate students enrolled at the campus, a random sample of 100 was surveyed.

Population: All 3,838 undergraduate students at Penn State Altoona
Sample: The 100 undergraduate students surveyed

We can use the data collected from the sample of 100 students to make inferences about the population of all 3,838 students.
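The difference between a parameter and a statistic can be illustrated with a small simulation. The sketch below is only an illustration: the number of students living at home (`ASSUMED_AT_HOME`) is a made-up value, not a result from the survey, and in a real study the population parameter would be unknown.

```python
import random

POPULATION_SIZE = 3838   # from the example
SAMPLE_SIZE = 100        # from the example
ASSUMED_AT_HOME = 1200   # hypothetical count, NOT from the survey

rng = random.Random(2021)  # fixed seed so the run is repeatable

# Build the population: True = lives at home, False = does not.
population = [True] * ASSUMED_AT_HOME + [False] * (POPULATION_SIZE - ASSUMED_AT_HOME)

# Parameter: computed from every member of the population.
population_proportion = sum(population) / len(population)

# Statistic: computed from the sampled students only.
sample = rng.sample(population, SAMPLE_SIZE)
sample_proportion = sum(sample) / len(sample)

print(f"Population parameter: {population_proportion:.3f}")
print(f"Sample statistic:     {sample_proportion:.3f}")
```

The sample proportion will usually be close to, but not exactly equal to, the population proportion. Inferential statistics is concerned with how much we can conclude about the parameter from the statistic.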

Example: Polling Teachers

Educational policy researchers randomly selected 400 teachers from the National Science Teachers Association database of members and asked them whether or not they believed that evolution should be taught in public schools. They received responses from 252 teachers.

Population: All National Science Teachers Association members
Sample: The 252 respondents

The researchers can use the data collected from the 252 teachers who responded to the survey to make inferences about the population of all National Science Teachers Association members.

Example: Flipping a Coin

A fair coin is flipped 500 times and the number of heads is recorded.

Population: All flips of this coin
Sample: The 500 flips recorded in this study

We can use data from these 500 flips to make inferences about the population of all flips of this coin.

1.2.1 - Sampling Bias

Recall that the entire group of individuals of interest is called the population. It may be unrealistic or even impossible to gather data from the entire population. The subset of the population from which data are actually gathered is the sample. A sample should be selected from a population randomly; otherwise, it may be prone to bias. Our goal is to obtain a sample that is representative of the population.

Representative Sample: A subset of the population from which data are collected that accurately reflects the population

Bias: The systematic favoring of certain outcomes

Sampling Bias: Systematic favoring of certain outcomes due to the methods employed to obtain the sample

Example: Weight Loss Study Volunteers

A medical research center is testing a new weight loss treatment. They advertise on a social media site that they are looking for volunteers to participate. There is sampling bias because the sample will be limited to people who use the social media site where they advertised. The individuals who choose to participate may be different from the overall population. For example, volunteers may be individuals who are already actively trying to lose weight. This is not a representative sample because the sample may have characteristics that are different from the population of interest.

Example: NYC Advertising Study

The marketing department for a large retail chain wants to survey their customers about a new advertising plan. They go into one of their largest New York City stores on a Tuesday morning and survey the first 50 people who make a purchase. There is sampling bias for a number of reasons. They are only sampling at one store, in New York City; there may be differences between the customers at this store and those that shop at their other locations. By conducting their survey on a Tuesday morning they are limiting themselves to individuals who are out shopping at that time; the sample may lack people who work during the day. Finally, they only survey people who make a purchase; individuals who do not make a purchase, perhaps because they are not satisfied with the store, will not be included in their sample. This is not a representative sample because the sample selected may be different from the population of interest.
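A short simulation can show how a sampling method like this one can systematically miss the population value. Everything in the sketch below is hypothetical: the population of 10,000 customers, the 20% share of weekday-morning shoppers, and the assumption that morning shoppers rate the chain higher are all invented for illustration.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 customers. We assume (for illustration
# only) that 20% shop on weekday mornings and that this group tends to
# give higher ratings than everyone else.
population = []
for _ in range(10_000):
    weekday_morning = rng.random() < 0.20
    mean_rating = 8.0 if weekday_morning else 6.0
    rating = rng.gauss(mean_rating, 1.0)
    population.append((weekday_morning, rating))

population_mean = statistics.mean(r for _, r in population)

# Biased method: only weekday-morning shoppers can end up in the sample.
morning_shoppers = [r for is_morning, r in population if is_morning]
biased_sample = morning_shoppers[:50]

# Simple random sample of the same size from the whole population.
random_sample = [r for _, r in rng.sample(population, 50)]

print(f"Population mean:    {population_mean:.2f}")
print(f"Biased sample mean: {statistics.mean(biased_sample):.2f}")
print(f"Random sample mean: {statistics.mean(random_sample):.2f}")
```

Under these assumptions the biased sample overstates the average rating, while the random sample lands much closer to the population mean. Taking a larger biased sample would not fix the problem, because the error comes from who can be selected, not from how many are selected.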

1.2.2 - Sampling Methods

There are many different ways to select a sample from a population. Some of these methods are probability-based, such as the simple random sampling method, which you'll read about below and in your textbook. Other probability-based methods include cluster sampling methods and stratified sampling methods. You may learn more about these if you take a research methods course or an advanced statistics course in the future. Other sampling methods are not probability-based, such as convenience sampling methods, which you will read about below.

Simple Random Sampling

To prevent sampling bias and obtain a representative sample, a sample should be selected using a probability-based sampling design which gives each individual a known chance of being selected. The most common probability-based sampling method is the simple random sampling method.

Using this method, a sample is selected without replacement. This means that once an individual has been selected to be a part of the sample they cannot be selected a second time. If multiple samples are being taken (e.g., when constructing a sampling distribution in Lesson 4), an individual can appear in more than one sample, but only once in each sample.

Simple Random Sampling: A method of obtaining a sample from a population in which every member of the population has an equal chance of being selected

Example: Community Service Attitudes

An institutional researcher is conducting a study of World Campus students’ attitudes toward community service. He takes a list of all 12,242 World Campus students and uses a random number generator to select 30 students whom he contacts to complete the survey. This researcher used simple random sampling because participants were selected from the overall population in a way that each individual had an equal chance of being selected.

Example: Languages

A student wants to learn more about the languages spoken in her town. She has access to the census forms submitted by all 3,500 households in her town. It would take too long for her to go through all 3,500 forms, so she uses a random number generator to select 100 households. She finds those 100 census forms and records data concerning the languages spoken in those households. This is a simple random sample because the sample of 100 households was selected in a way that each of the 3,500 households had an equal chance of being selected.
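The selection step in this example could be sketched in code as follows. This is a minimal illustration, assuming the census forms are numbered 1 through 3,500 (the numbering is hypothetical); it uses `random.sample` from the Python standard library, which draws without replacement.

```python
import random

NUM_HOUSEHOLDS = 3500  # from the example
SAMPLE_SIZE = 100      # from the example

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical form numbers 1..3500, one per household.
household_ids = range(1, NUM_HOUSEHOLDS + 1)

# random.sample draws without replacement, so no household
# can appear twice within the same sample.
selected = rng.sample(household_ids, SAMPLE_SIZE)

# A second, independent sample may share households with the first,
# but on its own it still contains no repeats.
selected_again = rng.sample(household_ids, SAMPLE_SIZE)
overlap = set(selected) & set(selected_again)

print(sorted(selected)[:10])
print(f"Households appearing in both samples: {len(overlap)}")
```

Because every household is drawn from the same list with the same procedure, each of the 3,500 households has an equal chance of being selected.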

Convenience Sampling

While probability-based sampling methods are considered better because they can prevent sampling bias, there are times when it is not possible to use one of these methods. For example, a researcher may not have access to the entire population. In cases where probability-based sampling methods are not practical, convenience samples are often used.

Convenience Sampling: A method of obtaining a sample from a population by ease of accessibility; such a sample is not random and may not be representative of the intended population.

Example: Weight Loss Supplements

A weight loss company wants to compare how much weight adults lose on their supplement versus a competitor's supplement. To recruit participants, they post an advertisement in a newspaper asking for adults who want to lose weight. This is an example of a volunteer sample, which is one type of convenience sample. The researchers are using a sample of individuals who volunteer to participate.

Example: Chocolate Preferences

A chocolate company wants to know if customers prefer their dark chocolate with or without peanuts. They set up a table in a grocery store on a Monday morning, offer customers samples of their dark chocolate with and without peanuts, and ask which they prefer. This is an example of a convenience sampling method. The sample is not being selected using any probability-based method and may not be representative of the company's intended population. People who grocery shop on a Monday morning may be a special subset of the population. For example, people who do not work traditional full-time jobs may be more likely to grocery shop at that time. The researchers are using a sample of individuals who happen to be grocery shopping on a Monday morning and who volunteer to eat their chocolate.

1.2.2.1 - Minitab: Simple Random Sampling

At the end of most lessons, there will be a "Minitab" section. These pages will demonstrate how Minitab can be used to create some of the graphs or conduct some of the analyses presented in that lesson. Videos showing where to click will be provided after the step-by-step instructions.

Lesson 1 focused primarily on the design of research studies and data collection. There is just one feature in Minitab that is applicable to this lesson, and that is the Sample from Columns feature. This takes a simple random sample of cases from one or more variables in a dataset.

Minitab® – Random Sampling from a Column

In this example, we have a worksheet containing the names of all of the Department of Statistics' full-time faculty members from the Spring 2021 semester.

These data are in the following files. The file ending in .mwx is a Minitab worksheet file; this can only be opened with Minitab 20. The file ending in .xlsx is an Excel file; this can be opened with any version of Minitab as well as with Excel:

FacultySP21.mwx

FacultySP21.xlsx

If this is your first time opening an .mwx file, you may receive an error message if your computer does not know to open this in Minitab. You should be able to fix this by saving the file to your desktop, opening Minitab, and then opening the worksheet from within Minitab. After the first time, your computer should recognize that .mwx files should be opened with Minitab.

To select a simple random sample of 10 names from this dataset, follow the steps below. At the bottom of this section there is a video that shows where to click.

Open the data in Minitab
From the tool bar, select Calc > Sample from Columns...
In the Number of rows to sample box, enter 10
Click in the From columns box and then double click the Name variable
Click in the Store samples in box and type MySample
Click OK

The third column of your worksheet should now be labeled "MySample" and it should contain 10 names. Since we are using simple random sampling procedures, the results will be different each time due to random sampling variation. Try these steps a few times; you should see that you get a different set of 10 names each time.
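For readers who want to see the same idea outside of Minitab, the steps above can be mimicked in a few lines of Python. This is not Minitab code, and it is only a rough sketch: the worksheet is represented as a dictionary of columns, the names are placeholders rather than the real faculty list, the count of 40 names is an assumption, and `sample_from_column` is a hypothetical helper written for this illustration.

```python
import random

# Hypothetical stand-in for the worksheet: placeholder names,
# not the actual contents of FacultySP21.
worksheet = {"Name": [f"Faculty member {i:02d}" for i in range(1, 41)]}


def sample_from_column(column, n_rows, rng=random):
    """Return a simple random sample of n_rows values, drawn without replacement."""
    return rng.sample(column, n_rows)


# Store the result as a new column, mirroring the "Store samples in" box.
worksheet["MySample"] = sample_from_column(worksheet["Name"], 10)

# No seed is set, so repeating the draw will usually give a different set of names.
second_draw = sample_from_column(worksheet["Name"], 10)

print(worksheet["MySample"])
print(second_draw)
```

As in Minitab, each run selects 10 distinct names, and the particular names selected vary from run to run.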

Video Walkthrough
