1: Collecting Data

Welcome to Lesson 1! In this lesson, we will be learning about collecting data. This is essentially a brief introduction to research methods. Depending on your academic major, you may also be required to take a full course in research methods after this course, which will focus on the methods used in your field. In this lesson, we'll focus on methods that are applied across many fields.

You will be introduced to a lot of the terminology that we will be using throughout the course. Because this lesson is heavy on vocabulary, many students find that making and studying from flashcards is a good strategy to apply this week.

Objectives

Upon successful completion of this lesson, you will be able to:

Identify cases and variables in a research study
Classify variables as categorical or quantitative
Identify explanatory and response variables in a research study
Distinguish between a sample and a population
Determine whether a given sample is representative of the intended population
Identify simple random sampling and convenience sampling methods
Use Minitab to draw a simple random sample from a known population
Identify potential non-response and response bias
Distinguish between experimental and observational designs
Identify confounding variables
Identify randomized experiments
Determine when causal conclusions (as opposed to associations) can be made
Classify samples as being independent or paired
Identify control groups, placebos, and blinding in research studies and explain why each is used

1.1 - Cases & Variables

Throughout the course, we will be using many of the terms introduced in this lesson. Let's start by defining some of the most frequently used terms: case, variable, and constant.

A case is an experimental unit. Cases are the individuals from which data are collected. When data are collected from humans, we sometimes call them participants. When data are collected from animals, the term subjects is often used.

A variable is a characteristic that is measured and can take on different values. In other words, it is something that varies between cases. This is in contrast to a constant, which is the same for all cases in a research study.

Case: An experimental unit from which data are collected

Variable: Characteristic of cases that can take on different values (in other words, something that can vary)

Constant: Characteristic that is the same for all cases in a study

Let's look at a few examples.

Example: Study Time & Grades

A teacher wants to know if third grade students who spend more time reading at home get higher homework and exam grades.

The students are the cases. There are three variables: amount of time spent reading at home, homework grades, and exam grades. The grade-level of the students is a constant because all students are in the third grade.

Example: Dog Food

A researcher wants to know if dogs who are fed only canned food have different body mass indexes (BMI) than dogs who are fed only hard food. They collect BMI data from 50 dogs who eat only canned food and 50 dogs who eat only hard food.

The cases are the dogs. There are two variables: type of food and BMI. A constant would be subspecies, because all cases are domestic dogs.

Example: Age & Weight of Sea Otters

Researchers are studying the relationship between age and weight in a sample of 100 male sea otters (Enhydra lutris).

The 100 otters are the cases. There are two variables: age and weight. Biological sex is a constant because all subjects are male. Species is also a constant.

1.1.1 - Categorical & Quantitative Variables

Variables can be classified as categorical or quantitative. Categorical variables are those that provide groupings that may have no logical order, or a logical order with inconsistent differences between groups (e.g., the difference between 1st place and 2nd place in a race is not equivalent to the difference between 3rd place and 4th place). Quantitative variables have numerical values with consistent intervals.

Categorical variable: Names or labels (i.e., categories) with no logical order or with a logical order but inconsistent differences between groups (e.g., rankings), also known as qualitative.

Quantitative variable: Numerical values with magnitudes that can be placed in a meaningful order with consistent intervals, also known as numerical.

Example: Weight

A team of medical researchers weighs participants in kilograms. Weight in kilograms is a quantitative variable because it takes on numerical values with meaningful magnitudes and equal intervals.

Example: Favorite Ice Cream Flavor

A teacher conducts a poll in her class. She asks her students if they would prefer chocolate, vanilla, or strawberry ice cream at their class party. Preferred ice cream flavor is a categorical variable because the different flavors are categories with no meaningful order of magnitudes.

Example: Birth Location

A survey asks “On which continent were you born?” This is a categorical variable because the different continents represent categories without a meaningful order of magnitudes.

Example: Children per Household

A census asks every household in a city how many children under the age of 18 reside there. Number of children in a household is a quantitative variable because it has a numerical value with a meaningful order and equal intervals.

Example: Highway Mile Markers

When a car breaks down on the highway, the emergency dispatcher may ask for the nearest mile marker. Highway mile marker value is a quantitative variable because it is numeric with a meaningful order of magnitudes and equal intervals.

Example: Running Distance

A runner records the distance he runs each day in miles. Distance in miles is a quantitative variable because it takes on numerical values with meaningful magnitudes and equal intervals.

Example: Highest Level of Education

A census asks residents for the highest level of education they have obtained: less than high school, high school, 2-year degree, 4-year degree, master's degree, doctoral/professional degree. This is a categorical variable. While there is a meaningful order of educational attainment, the differences between each category are not consistent. For example, the difference between high school and 2-year degree is not the same as the difference between a master's degree and a doctoral/professional degree. Because there are not equal intervals, this variable cannot be classified as quantitative.

Example: Online Courses Taught

A survey designed for online instructors asks, "How many online courses have you taught?" Three options are given: "none," "some," or "many." While there is a meaningful order of magnitudes, there are not equal intervals. This is a categorical variable.

If the survey had asked, "How many online courses have you taught? Enter a number." this would be a quantitative variable. Here, participants are answering with the number of online courses they have taught. This is a numerical value with a meaningful order of magnitudes and equal intervals.
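The practical consequence of this classification can be sketched in a few lines of Python (a language not used in this course; the responses below are made up for illustration). Counting cases per category is meaningful for the categorical version of the question, while arithmetic such as a mean is only meaningful for the quantitative version.

```python
from collections import Counter
from statistics import mean

# Made-up responses for illustration only.
category_responses = ["none", "some", "many", "some", "some"]
numeric_responses = [0, 2, 12, 1, 5]

# Categorical: counting how many cases fall in each category is meaningful,
# but an average of "none", "some", and "many" is not defined.
category_counts = Counter(category_responses)
print(category_counts)

# Quantitative: arithmetic such as a mean is meaningful.
mean_courses = mean(numeric_responses)
print(mean_courses)
```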

1.1.2 - Explanatory & Response Variables

In some research studies one variable is used to predict or explain differences in another variable. In those cases, the explanatory variable is used to predict or explain differences in the response variable. In an experimental study, the explanatory variable is the variable that is manipulated by the researcher.

Explanatory Variable: Also known as the independent or predictor variable, it explains variations in the response variable; in an experimental study, it is manipulated by the researcher

Response Variable: Also known as the dependent or outcome variable, its value is predicted or its variation is explained by the explanatory variable; in an experimental study, this is the outcome that is measured following manipulation of the explanatory variable

Example: Panda Fertility Treatments

A team of veterinarians wants to compare the effectiveness of two fertility treatments for pandas in captivity. The two treatments are in-vitro fertilization and male fertility medications. This experiment has one explanatory variable: type of fertility treatment. The response variable is a measure of fertility rate.

Example: Public Speaking Approaches

A public speaking teacher has developed a new lesson that she believes decreases student anxiety in public speaking situations more than the old lesson. She designs an experiment to test if her new lesson works better than the old lesson. Public speaking students are randomly assigned to receive either the new or old lesson; their anxiety levels during a variety of public speaking experiences are measured. This experiment has one explanatory variable: the lesson received. The response variable is anxiety level.

Example: Coffee Bean Origin

A researcher believes that the origin of the beans used to make a cup of coffee affects hyperactivity. He wants to compare coffee from three different regions: Africa, South America, and Mexico. The explanatory variable is the origin of coffee bean; this has three levels: Africa, South America, and Mexico. The response variable is hyperactivity level.

Example: Height & Age

A group of middle school students wants to know if they can use height to predict age. They take a random sample of 50 people at their school, both students and teachers, and record each individual's height and age. This is an observational study. The students want to use height to predict age so the explanatory variable is height and the response variable is age.

Example: Grade & Height

Research question: Do fourth graders tend to be taller than third graders?

This is an observational study. The researcher wants to use grade level to explain differences in height. The explanatory variable is grade level. The response variable is height.

1.2 - Samples & Populations

We often have questions concerning large populations. Gathering information from the entire population is not always possible due to barriers such as time, accessibility, or cost. Instead of gathering information from the whole population, we often gather information from a smaller subset of the population, known as a sample.

Values concerning a sample are referred to as sample statistics while values concerning a population are referred to as population parameters.

Population: The entire set of possible cases

Sample: A subset of the population from which data are collected

Statistic: A measure concerning a sample (e.g., sample mean)

Parameter: A measure concerning a population (e.g., population mean)

The process of using sample statistics to make conclusions about population parameters is known as inferential statistics. In other words, data from a sample are used to make an inference about a population.

Inferential Statistics: Statistical procedures that use data from an observed sample to make a conclusion about a population

Example: Student Housing

A survey is carried out at Penn State Altoona to estimate the proportion of all undergraduate students living at home during the current term. Of the 3,838 undergraduate students enrolled at the campus, a random sample of 100 was surveyed.

Population: All 3,838 undergraduate students at Penn State Altoona
Sample: The 100 undergraduate students surveyed

We can use the data collected from the sample of 100 students to make inferences about the population of all 3,838 students.

Example: Polling Teachers

Educational policy researchers randomly selected 400 teachers from the National Science Teachers Association database of members and asked them whether or not they believed that evolution should be taught in public schools. They received responses from 252 teachers.

Population: All National Science Teachers Association members
Sample: The 252 respondents

The researchers can use the data collected from the 252 teachers who responded to the survey to make inferences about the population of all National Science Teachers Association members.

Example: Flipping a Coin

A fair coin is flipped 500 times and the number of heads is recorded.

Population: All flips of this coin
Sample: The 500 flips recorded in this study

We can use data from these 500 flips to make inferences about the population of all flips of this coin.
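The difference between a parameter and a statistic can be illustrated with a short simulation. The sketch below is written in Python (a language not used in this course) and assumes the coin is fair, so the population parameter is known to be 0.5. The seed and variable names are arbitrary choices for illustration.

```python
import random

random.seed(1)  # arbitrary seed so the run is repeatable

TRUE_PROPORTION = 0.5  # population parameter: assumed known because the coin is fair
N_FLIPS = 500          # sample size from the example

# Simulate the sample: True represents heads
flips = [random.random() < TRUE_PROPORTION for _ in range(N_FLIPS)]

# Sample statistic: proportion of heads in these 500 flips
sample_proportion = sum(flips) / N_FLIPS

print(f"Population parameter: {TRUE_PROPORTION}")
print(f"Sample statistic:     {sample_proportion:.3f}")
```

The sample statistic will usually be close to, but not exactly equal to, the population parameter. In most real studies the parameter is unknown, which is why the statistic is used to make an inference about it.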

1.2.1 - Sampling Bias

Recall that the entire group of individuals of interest is called the population. It may be unrealistic or even impossible to gather data from the entire population. The subset of the population from which data are actually gathered is the sample. A sample should be selected from a population randomly; otherwise, it may be prone to bias. Our goal is to obtain a sample that is representative of the population.

Representative Sample: A subset of the population from which data are collected that accurately reflects the population

Bias: The systematic favoring of certain outcomes

Sampling Bias: Systematic favoring of certain outcomes due to the methods employed to obtain the sample

Example: Weight Loss Study Volunteers

A medical research center is testing a new weight loss treatment. They advertise on a social media site that they are looking for volunteers to participate. There is sampling bias because the sample will be limited to people who use the social media site where they advertised. The individuals who choose to participate may be different from the overall population. For example, volunteers may be individuals who are already actively trying to lose weight. This is not a representative sample because the sample may have characteristics that are different from the population of interest.

Example: NYC Advertising Study

The marketing department for a large retail chain wants to survey their customers about a new advertising plan. They go into one of their largest New York City stores on a Tuesday morning and survey the first 50 people who make a purchase. There is sampling bias for a number of reasons. They are only sampling at one store, in New York City; there may be differences between the customers at this store and those that shop at their other locations. By conducting their survey on a Tuesday morning they are limiting themselves to individuals who are out shopping at that time; the sample may lack people who work during the day. Finally, they only survey people who make a purchase; individuals who do not make a purchase, perhaps because they are not satisfied with the store, will not be included in their sample. This is not a representative sample because the sample selected may be different from the population of interest.

1.2.2 - Sampling Methods

There are many different ways to select a sample from a population. Some of these methods are probability-based, such as the simple random sampling method, which you'll read about below and in your textbook. Other probability-based methods include cluster sampling methods and stratified sampling methods. You may learn more about these if you take a research methods course or an advanced statistics course in the future. Other sampling methods are not probability-based, such as convenience sampling methods, which you will read about below.

Simple Random Sampling

To prevent sampling bias and obtain a representative sample, a sample should be selected using a probability-based sampling design which gives each individual a known chance of being selected. The most common probability-based sampling method is the simple random sampling method.

Using this method, a sample is selected without replacement. This means that once an individual has been selected to be a part of the sample they cannot be selected a second time. If multiple samples are being taken (e.g., when constructing a sampling distribution in Lesson 4), an individual can appear in more than one sample, but only once in each sample.

Simple Random Sampling: A method of obtaining a sample from a population in which every member of the population has an equal chance of being selected

Example: Community Service Attitudes

An institutional researcher is conducting a study of World Campus students’ attitudes toward community service. He takes a list of all 12,242 World Campus students and uses a random number generator to select 30 students whom he contacts to complete the survey. This researcher used simple random sampling because participants were selected from the overall population in a way that each individual had an equal chance of being selected.

Example: Languages

A student wants to learn more about the languages spoken in her town. She has access to the census forms submitted by all 3,500 households in her town. It would take too long for her to go through all 3,500 forms, so she uses a random number generator to select 100 households. She finds those 100 census forms and records data concerning the languages spoken in those households. This is a simple random sample because the sample of 100 households was selected in a way that each of the 3,500 households had an equal chance of being selected.
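The selection step in the Languages example could be sketched as follows. This is a minimal illustration in Python (not a tool used in this course), assuming the households can be numbered 1 through 3,500; the ID scheme and variable names are hypothetical.

```python
import random

# Hypothetical sampling frame: household IDs 1 through 3,500
households = list(range(1, 3501))

# random.sample draws without replacement, so no household appears twice
# and every household has the same chance of being selected
selected = random.sample(households, k=100)

print(sorted(selected)[:10])  # first few selected IDs, in order
```

Because no seed is set, running the code again produces a different set of 100 households.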

Convenience Sampling

While probability-based sampling methods are considered better because they can prevent sampling bias, there are times when it is not possible to use one of these methods. For example, a researcher may not have access to the entire population. In cases where probability-based sampling methods are not practical, convenience samples are often used.

Convenience Sampling: A method of obtaining a sample from a population by ease of accessibility; such a sample is not random and may not be representative of the intended population.

Example: Weight Loss Supplements

A weight loss company wants to compare how much weight adults lose on their supplement versus a competitor's supplement. To recruit participants, they post an advertisement in a newspaper asking for adults who want to lose weight. This is an example of a volunteer sample which is a convenience sampling method. The researchers are using a sample of individuals who volunteer to participate.

Example: Chocolate Preferences

A chocolate company wants to know if customers prefer their dark chocolate with or without peanuts. They set up a table in a grocery store on a Monday morning, offer customers samples of their dark chocolate with and without peanuts, and ask which they prefer. This is an example of a convenience sampling method. The sample is not being selected using any probability-based method and may not be representative of the company's intended population. People who grocery shop may be a special subset of the population. For example, people who do not work traditional full-time jobs may be more likely to grocery shop at that time. The researchers are using a sample of individuals who happen to be grocery shopping on a Monday morning and who volunteer to eat their chocolate.

1.2.2.1 - Minitab: Simple Random Sampling

At the end of most lessons, there will be a "Minitab" section. These pages will demonstrate how Minitab can be used to create some of the graphs or conduct some of the analyses presented in that lesson. Videos showing where to click will be provided after the step-by-step instructions.

Lesson 1 focused primarily on the design of research studies and data collection. There is just one feature in Minitab that is applicable to this lesson, and that is the Sample from Columns feature. This takes a simple random sample of cases from one or more variables in a dataset.

Minitab® – Random Sampling from a Column

In this example, we have a worksheet containing the names of all of the Department of Statistics' full-time faculty members from the Spring 2021 semester.

These data are in the following files. The file ending in .mwx is a Minitab worksheet file; this can only be opened with Minitab 20. The file ending in .xlsx is an Excel file; this can be opened with any version of Minitab as well as with Excel:

FacultySP21.mwx

FacultySP21.xlsx

If this is your first time opening an .mwx file you may receive an error message if your computer does not know to open this in Minitab. You should be able to fix this by saving the file to your desktop, opening Minitab, and then opening the worksheet from within Minitab. After the first time, your computer should recognize that .mwx files should be opened with Minitab.

To select a simple random sample of 10 names from this dataset, follow the steps below. At the bottom of this section there is a video that shows where to click.

Open the data in Minitab
From the tool bar, select Calc > Sample from Columns...
In the Number of rows to sample box, enter 10
Click in the From columns box and then double click the Name variable
Click in the Store samples in box and type MySample
Click OK

The third column of your worksheet should now be labeled "MySample" and it should contain 10 names. Since we are using simple random sampling procedures, the results will be different each time due to random sampling variation. Try these steps a few times; you should see that you get a different set of 10 names each time.

Video Walkthrough

1.3 - Other Sources of Bias

On the previous pages you learned about sampling bias and how simple random sampling methods can be used to avoid sampling bias. Here, we will discuss two other sources of bias: non-response bias and response bias. These are both problems that should be prevented in the design of a research study.

Non-Response Bias: Systematic favoring of certain outcomes that occurs when the individuals who choose to participate in a study differ from the individuals who choose to not participate

Response Bias: Systematic favoring of certain outcomes that occurs when participants do not respond truthfully; they may do so to align with social norms or to appease the researcher

Example: Restaurant Experience Survey

A restaurant invited their recent customers to complete an online survey. Customers who had really strong feelings about their experience, either positive or negative, were very likely to complete the survey while customers who had a neutral experience were much less likely to complete the survey. This is an example of non-response bias because the individuals who chose to participate differed from those who chose to not participate.
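A small simulation can show how this kind of non-response distorts what the restaurant sees. The sketch below is in Python (not a tool used in this course), and every proportion in it is an assumption chosen for illustration, not a value from the example.

```python
import random

random.seed(3)  # arbitrary seed so the run is repeatable

# All proportions below are assumptions chosen for illustration.
P_NEUTRAL = 0.60                                   # share of all customers with a neutral experience
RESPONSE_RATE = {"neutral": 0.05, "strong": 0.50}  # chance of completing the survey

customers = ["neutral" if random.random() < P_NEUTRAL else "strong"
             for _ in range(10000)]
respondents = [c for c in customers if random.random() < RESPONSE_RATE[c]]

share_neutral_all = customers.count("neutral") / len(customers)
share_neutral_respondents = respondents.count("neutral") / len(respondents)

print(f"Neutral among all customers: {share_neutral_all:.0%}")
print(f"Neutral among respondents:   {share_neutral_respondents:.0%}")
```

Under these assumptions, neutral customers make up most of the customer base but only a small share of the survey responses, so the responses overstate how common strong opinions are.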

Example: Retail Store Hours

A retail store was considering expanding their operating hours. To determine if this was a need perceived by their customers, they conducted a survey over the telephone to obtain data. Research assistants called the phone numbers of customers who were randomly selected to participate between the hours of 9AM and 4PM. Individuals who were at work were less likely to answer their phone call or agree to participate in the study than individuals who were at home at that time. This is an example of non-response bias because the individuals who responded to the survey were different from individuals who did not respond in terms of their work schedule.

Example: Sexual Activity Survey

A psychologist is conducting a research study concerning sexual activities. The survey is administered over the phone and many of the questions are personal. Some participants feel uncomfortable and do not answer honestly. This is an example of response bias because the participants are not responding truthfully; instead their responses are biased toward what they perceive as being socially acceptable.

Example: Cheating in Class

Using an anonymous online survey, a professor asks his students “Have you cheated on an exam in my class?” Many of the students who have cheated still answered “no.” This is an example of response bias because the participants are not responding truthfully; instead their responses are biased toward responses that are less likely to get them in trouble.

1.4 - Research Study Design

Experimental and Observational Designs

Research studies are often classified in terms of their designs. Here, we will make the distinction between experimental and observational research designs.

Experimental Research Design: A study in which the researcher manipulates the treatments (i.e., level of the explanatory variable) received by subjects and collects data; also known as a scientific study

Observational Research Design: A study in which the researcher collects data without performing any manipulations; also known as a non-experimental study

Example: Caffeinated Coffee Studies

An organization wants to know if drinking caffeinated coffee causes hyperactivity in college students. To test their research question, they select a sample of college students and give them a survey concerning their intake of caffeinated coffee and their hyperactivity levels. This is an observational study because the researchers are not making any manipulations. They are observing what is happening without intervening. This is not an experiment because no treatment was imposed by the researchers.

Another organization also wants to know if drinking caffeinated coffee causes hyperactivity in college students. They design a different study. They select a random sample of college students and randomly assign them to drink coffee with or without caffeine. The researchers observe the students' behaviors. This is an experimental study because the researchers are manipulating the treatment that each participant receives.

Choosing a Research Study Design

Usually, if there is an option available, experimental studies are preferred over observational studies. Later in this lesson you will learn about randomization, placebos, and blinding, which can all be built into experimental designs to strengthen the conclusions that can be made.

There are times when an experimental design is not possible. If the independent variable is naturally occurring, it may not be possible for a researcher to manipulate it. For example, race, ethnicity, birthplace, age, gender identity, and biological sex are all variables that cannot be randomly assigned to different cases.

On Your Own

A team of researchers wants to know if Advil or Tylenol is more effective.

Think about the following data collection methods, then compare your answers with the explanation that follows each one.

Method 1

Researchers survey a sample of adults and ask if they use Advil or Tylenol. They ask them to rate the effectiveness of the one they use. Is this an observational study or experimental study?

This is an observational study because the researchers observed the difference between two existing groups (Advil and Tylenol users). The researchers did not manipulate the participants' experiences.

Method 2

Researchers obtain a random sample of adults. They randomly assign half of the participants to take Advil and the other half to take Tylenol. They ask each participant to rate the effectiveness of the one that they were assigned to take. Is this an observational study or experimental study?

This is an experimental study because the researchers assigned participants to groups.

1.4.1 - Confounding Variables

Randomized experiments are typically preferred over observational studies or experimental studies that lack randomization because they allow for more control. A common problem in studies without randomization is that there may be other variables influencing the results. These are known as confounding variables. A confounding variable is related to both the explanatory variable and the response variable.

Confounding Variable: Characteristic that varies between cases and is related to both the explanatory and response variables; also known as a lurking variable or a third variable

Example: Ice Cream & Home Invasions

There is a positive relationship between ice cream sales and home invasions (i.e., as ice cream sales increase throughout the year so do home invasions). It is clear that increases in ice cream sales do not cause home invasions to increase, and home invasions do not cause an increase in ice cream sales. There is a third variable at play here: outdoor temperature. When the weather is warmer both ice cream sales and home invasions increase. In this case, outdoor temperature is a confounding variable because it is related to both ice cream sales and home invasions.

Example: Weight & Preferred Beverage

Research question: Do adults who prefer to drink beer, wine, and water differ in terms of their mean weights?

Data were collected from a sample of World Campus students to address the research question above. The researchers found that adults who preferred beer tended to weigh more than those who preferred wine.

A confounding variable in this study was gender identity. Those who identified as men were more likely to prefer beer and those who identified as women were more likely to prefer wine. In the sample, men weighed more than women on average.
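The way a confounding variable can produce an association may be easier to see in a simulation. The sketch below is in Python (not a tool used in this course) and uses made-up numbers rather than the study's data. For simplicity it includes only beer and wine. Weight is generated so that it depends on gender but not on beverage, yet beer drinkers still weigh more on average because gender is related to both variables.

```python
import random
from statistics import mean

random.seed(2)  # arbitrary seed so the run is repeatable

# All numbers below are made up for illustration; they are not from the study.
cases = []
for _ in range(20000):
    gender = random.choice(["man", "woman"])
    # Gender is related to the explanatory variable (preferred beverage)...
    p_beer = 0.7 if gender == "man" else 0.3
    beverage = "beer" if random.random() < p_beer else "wine"
    # ...and to the response variable (weight), which here does NOT depend on beverage
    avg_weight = 85 if gender == "man" else 70
    weight = random.gauss(avg_weight, 10)
    cases.append((gender, beverage, weight))

def mean_weight(beverage, gender=None):
    """Mean weight for one beverage, optionally restricted to one gender."""
    return mean(w for g, b, w in cases
                if b == beverage and (gender is None or g == gender))

overall_gap = mean_weight("beer") - mean_weight("wine")
gap_among_men = mean_weight("beer", "man") - mean_weight("wine", "man")
gap_among_women = mean_weight("beer", "woman") - mean_weight("wine", "woman")

print(f"Overall gap (beer - wine): {overall_gap:.1f} kg")
print(f"Gap among men:             {gap_among_men:.1f} kg")
print(f"Gap among women:           {gap_among_women:.1f} kg")
```

In this simulated data the overall gap is several kilograms, while the gap within each gender is close to zero. The association between beverage and weight comes entirely from the confounding variable.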

1.4.2 - Causal Conclusions

In order to control for confounding variables, participants can be randomly assigned to different levels of the explanatory variable. This act of randomly assigning cases to different levels of the explanatory variable is known as randomization. An experiment that involves randomization may be referred to as a randomized experiment or randomized comparative experiment. When cases are randomly assigned to different conditions, a causal conclusion can be made; in other words, we can say that differences in the response variable are caused by differences in the explanatory variable. Without randomization, an association can be noted, but a causal conclusion cannot be made.

Note that randomization and random sampling are different concepts. Randomization refers to the random assignment of experimental units to different conditions (e.g., different treatment groups). Random sampling refers to probability-based methods for selecting a sample from a population.
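The two steps can be kept apart in code. The following Python sketch (Python is not a tool used in this course; the population size, sample size, and names are hypothetical) first draws a random sample from a population and then randomly assigns the sampled cases to two conditions.

```python
import random

# Hypothetical population of 1,000 case IDs
population = list(range(1, 1001))

# Random sampling: choose which 40 cases take part in the study
sample = random.sample(population, k=40)

# Randomization: shuffle the sampled cases, then split them into two conditions
shuffled = sample[:]  # copy so the original sample is left untouched
random.shuffle(shuffled)
condition_a = shuffled[:20]
condition_b = shuffled[20:]

print("Condition A:", sorted(condition_a))
print("Condition B:", sorted(condition_b))
```

Random sampling determines who is in the study, which affects how well the results generalize to the population. Randomization determines which condition each case receives, which affects whether a causal conclusion can be made.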

Randomization: The act of randomly assigning cases to different levels of the explanatory variable

Causation: Changes in one variable can be attributed to changes in a second variable

Association: A relationship between variables

Example: Fitness Programs

Two teams have designed research studies to compare the weight loss of participants in two different fitness programs. Each team used a different research study design.

The first team surveyed people who already participate in each program. This is an observational study, which means there is no randomization. Each group is comprised of participants who made the personal decision to engage in that fitness program. With this research study design, the researchers can only determine whether or not there is an association between the fitness program and participants' weight loss. A causal conclusion cannot be made because there may be confounding variables. The people in the two groups may be different in some key ways. For example, if the cost of the two programs is different, the two groups may differ in terms of their finances.

The second team of researchers obtained a sample of participants and randomly assigned half to participate in the first fitness program and half to participate in the second fitness program. They measured each participant's weight twice: both at the beginning and end of their study. This is a randomized experiment because the researchers randomly assigned each participant to one of the two programs. Because participants were randomly assigned to groups, the groups should be balanced in terms of any confounding variables and a causal conclusion may be drawn from this study.

1.4.3 - Independent and Paired Samples

In both observational and experimental studies, we often want to compare two or more groups. When comparing two or more groups, cases may be independent or paired.

Independent Groups: Cases in each group are unrelated to one another.

Paired Groups: Cases in each group are meaningfully matched with one another; also known as dependent samples or matched pairs

Example: Exam Scores

An instructor wants to compare students' scores on the midterm and final exam. This is most often done by obtaining a sample of students and recording each student's midterm exam score and final exam score. In other words, there would be two measurements for each student. This is an example of a matched pairs design because data would be paired by student.

Example: Shoes

A shoe company is studying how many shoes Italian men and women own. In one research study they take a random sample of 500 Italian adults and ask each individual if they identify as a man or woman and how many pairs of shoes they own. The men and women in this study are in two independent groups.

In a second study the researchers use a different design. This time they take a random sample of 250 heterosexual married couples in Italy (i.e., 250 husbands and 250 wives). They record the number of pairs of shoes owned by each husband and each wife. This is an example of a matched pairs design. Data are paired by couple.
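The difference between the two shoe study designs shows up in how the data are laid out. The following Python sketch uses small made-up numbers (not results from any real study) and hypothetical variable names to contrast the two layouts: paired data yield one difference per couple, while independent groups are summarized separately and need not be the same size.

```python
from statistics import mean

# Paired design: each tuple is one couple, (husband's pairs, wife's pairs).
# The values are invented for illustration only.
couples = [(5, 12), (8, 20), (3, 9), (10, 15)]

# With paired data we compute one difference per pair, then summarize them.
differences = [wife - husband for husband, wife in couples]
mean_difference = mean(differences)

# Independent design: two unrelated lists; there is no way to match cases up.
men = [5, 8, 3]
women = [12, 20, 9, 15, 7]

# With independent groups we summarize each group, then compare the summaries.
difference_in_means = mean(women) - mean(men)

print("Mean of paired differences:", mean_difference)
print("Difference in group means:", round(difference_in_means, 2))
```

Recognizing which layout a study produces matters later in the course, because paired and independent samples are analyzed with different procedures.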

1.4.4 - Control and Placebo Groups

A control group is an experimental condition that does not receive the actual treatment and may serve as a baseline. A control group may receive a placebo or it may receive no treatment at all. A placebo is something that appears to the participants to be an active treatment, but does not actually contain the active treatment. For example, a placebo pill is a sugar pill that participants may take not knowing that it does not contain any active medicine. This can lead to a psychological phenomenon called the placebo effect, which occurs when participants who are given a placebo treatment experience a change even though they are not receiving any active treatment. Researchers use placebos in the control group to determine if any differences between groups are due to the active medicine or the participants' perceptions (the placebo effect).

Control Group: A level of the explanatory variable that does not receive an active treatment; they may receive no treatment or a placebo

Placebo Group: A group that receives what, to them, appears to be a treatment, but actually is neutral and does not contain any active treatment (e.g., a sugar pill in a medication study)

Example: Vitamin B Energy Study

Researchers want to know if adults who consume a drink that is high in vitamin B-12 have increased energy. They obtain a representative sample of adults. All participants are given a drink that they are told to consume every morning. They are not told what is in the drink. Half are given a drink that is high in vitamin B-12 while the other half are given a drink that tastes the same but contains no vitamin B-12.

The participants who received the drink with no vitamin B-12 are the placebo group. The purpose of the placebo group in this study is to make the two groups equivalent except for the presence of the vitamin B-12. By comparing these two groups, the researchers will be able to determine what impact the vitamin B-12 had on the response variable. We could also say that this served as a control group because this group did not receive any active ingredients.

1.4.5 - Blinding

Blinding techniques are also used to avoid bias. In a single-blind study the participants do not know what treatment groups they are in, but the researchers interacting with them do know. In a double-blind study, the participants do not know what treatment groups they are in and neither do the researchers who are interacting with them directly. Double-blind studies are used to prevent researcher bias.

Blinding: Procedure employed in research to prevent bias in which the participants and/or the researchers interacting with the participants do not know which treatment each case is receiving

Single-Blind Study: Research study in which the participants do not know the treatment group that they have been assigned to

Double-Blind Study: Research study in which neither the participants nor the researchers interacting with them know which cases have been assigned to which treatment groups

Example: Yogurt Tasting

Researchers are comparing a low-fat blueberry yogurt to a high-fat blueberry yogurt. Participants are randomly assigned to receive one type of yogurt. After tasting it, they complete an online survey. The researchers know which yogurt containers are low-fat and which are high-fat, but participants are not told. This is an example of a single-blind study because the researchers know which participants are in the low- and high-fat groups but the participants do not know. A double-blind study may not be necessary in this case since the researchers have only minimal contact with the participants.

Example: Caffeine Energy Study

Researchers want to know if adult males who consume high amounts of caffeine interact more energetically. They obtain a representative sample and randomly assign half of the participants to take a caffeine pill and half to take a placebo pill. The pills are randomly numbered and coded so that, at the time of data collection, the researchers do not know which participants have been given caffeine and which have been given the placebo. All participants are told that they may have been given a caffeine pill. After the participants take the pill, researchers observe them interacting with one another and rate the interactions in terms of level of energy.

This is a double-blind study because neither the researchers nor the participants know who is in which group at the time the data are collected. After the data are collected, researchers can look at the pill codes to determine which groups the participants were in to conduct their analyses. A double-blind study is necessary here because the researchers are observing and rating the participants. If the researchers know who is in the caffeine group they may be more likely to rate their levels of energy as very high because that is consistent with their hypothesis.
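The pill-coding idea can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure from the study: the function `blind_assignments`, the four-digit codes, and the participant labels are all hypothetical. The raters would see only the code attached to each participant, while the key linking codes to treatments stays sealed until the ratings are complete.

```python
import random

def blind_assignments(participant_ids, seed=None):
    """Randomly assign caffeine or placebo and hide the result behind pill codes.

    Hypothetical helper for illustration. Returns two dictionaries:
      pill_code: participant -> code (what the raters see)
      key:       code -> treatment (sealed until data collection ends)
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2

    # First half of the shuffled list gets caffeine, the rest get the placebo.
    treatment = {pid: "caffeine" for pid in ids[:half]}
    treatment.update({pid: "placebo" for pid in ids[half:]})

    # Codes are drawn at random, so a code reveals nothing about the group.
    codes = rng.sample(range(1000, 10000), len(ids))
    pill_code = dict(zip(ids, codes))
    key = {pill_code[pid]: treatment[pid] for pid in ids}
    return pill_code, key

participants = [f"P{i}" for i in range(1, 9)]   # eight hypothetical participants
pill_code, key = blind_assignments(participants, seed=42)

# During data collection the raters work only with pill_code.
# Afterward, the sealed key is opened to recover each participant's group.
groups = {pid: key[code] for pid, code in pill_code.items()}
print(groups)
```

Keeping the key separate from the codes is what makes the design double-blind: neither the participants nor the raters can tell who received caffeine until the analysis stage.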

1.5 - Lesson 1 Summary

Lesson 1: Learning Objectives

Upon successful completion of this lesson, you will be able to:

Identify cases and variables in a research study
Classify variables as categorical or quantitative
Identify explanatory and response variables in a research study
Distinguish between a sample and a population
Determine whether a given sample is representative of the intended population
Identify simple random sampling and convenience sampling methods
Use Minitab to draw a simple random sample from a known population
Identify potential non-response and response bias
Distinguish between experimental and observational designs
Identify confounding variables
Identify randomized experiments
Determine when causal conclusions (as opposed to associations) can be made
Classify samples as being independent or paired
Identify control groups, placebos, and blinding in research studies and explain why each is used

In this lesson you learned about how data are collected. You were introduced to terminology that will be used throughout the course and you examined different types of research study designs (experimental and observational), sampling methods, and sources of bias. You learned that in order to make generalizations from a sample to a population the sample must be representative of the population; ideally the sample should be randomly selected using a probability-based sampling method such as simple random sampling. In order to make a causal conclusion, cases must be randomly assigned to treatment groups, as in a randomized experiment.
