The following examples are meant to illuminate the world of statistics.
Example 1.1: Angry Women
Who are those angry women? (Streitfield, D., 1988 and Wallis, 1987.) In 1987, Shere Hite published a best-selling book called Women and Love: A Cultural Revolution in Progress. This 7-year research project produced a controversial 922-page publication that summarized the results from a survey that was designed to examine how American women feel about their relationships with men. Hite mailed out 100,000 fifteen-page questionnaires to women who were members of a wide variety of organizations across the U.S. These organizations included church, political, volunteer, senior citizen, and counseling groups, among many others. Questionnaires were actually sent to the leader of each organization. The leader was asked to distribute questionnaires to all members. Each questionnaire contained 127 open-ended questions with many parts and follow-ups. Part of Hite's directions read as follows: "Feel free to skip around and answer only those questions you choose." Approximately 4500 questionnaires were returned. Below are a few statements from this 1987 publication.
- 84% of women are not emotionally satisfied with their relationships
- 95% of women reported emotional and psychological harassment from their partners
- 70% of women married 5 years or more are having extramarital affairs
You should notice that this study is an example of a sample survey. The sample consists of the individuals who actually provided the data, while the population is the larger group from which the sample is chosen and which the sample is meant to represent. In this example, the population is all American women (although some people may say all American women who have relationships with men), while the sample is the 4500 respondents who returned the questionnaire.
As you might expect, a sample should appropriately represent the population. However, in this instance, the sample does not represent the population because of two problems. The first problem found in this study is that only "joiners" were allowed to be a part of the sample. Even though Shere Hite defended her methods by saying she sampled from a variety of organizations, the fact remains that only people who were involved in some organization had a chance to be in the sample. This problem is an example of selection bias.
The other problem found with this sample is nonresponse bias. Nonresponse bias can occur when a large number of people who are selected for the study choose not to respond to the survey or to key questions on the survey. The problem is evident here because the response rate was only 4500/100,000 = 4.5%. (Note: most researchers like response rates to be at least 60% or 70%.) Moreover, the directions encouraged the participants to "skip around" and answer only the questions that they liked. As you might expect, people with strong opinions are the ones most likely to take the time to answer a questionnaire that contains 127 open-ended questions. In fact, Shere Hite estimated that, on average, participants took about 4.4 hours to answer the questionnaire. Also, the use of the group leader to distribute the questionnaires meant that there was a gatekeeper who had the power to affect both who responded and the response rate for each organization.
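The response-rate arithmetic above can be checked with a few lines of Python. This is only a sketch: the two counts come from the text, and the 60% and 70% targets are the rule of thumb quoted in the note, not a formal standard.

```python
# Counts taken from the example: questionnaires mailed out and returned.
mailed = 100_000
returned = 4_500

response_rate = returned / mailed

print(f"Response rate: {response_rate:.1%}")  # 4.5%

# Number of returned questionnaires that the 60% and 70% rules of thumb
# would have required (integer arithmetic avoids rounding surprises).
needed_60 = mailed * 60 // 100
needed_70 = mailed * 70 // 100
print(f"Returns needed for 60%: {needed_60:,}")
print(f"Returns needed for 70%: {needed_70:,}")
```

Seen this way, the survey fell short of the lower target by more than 55,000 returned questionnaires.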
So with Example 1.1, the overall conclusion is that even though the sample size is quite large, the sample does not adequately represent the population. Unfortunately, the results from this 7-year study are of little value.
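To see why a large sample does not rescue a biased one, it may help to run a small simulation. The sketch below is not based on Hite's data: the true dissatisfaction rate and the two response probabilities are hypothetical numbers chosen only to illustrate the idea that unhappy people may be more willing to answer a long questionnaire.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is reproducible

# All three values below are hypothetical assumptions, not survey results.
TRUE_RATE = 0.30          # share of the population that is dissatisfied
P_RESPOND_UNHAPPY = 0.12  # chance a dissatisfied person returns the survey
P_RESPOND_HAPPY = 0.02    # chance a satisfied person returns the survey

def mail_survey(n_mailed):
    """Return the answers (True = dissatisfied) of those who chose to respond."""
    answers = []
    for _ in range(n_mailed):
        dissatisfied = rng.random() < TRUE_RATE
        p_respond = P_RESPOND_UNHAPPY if dissatisfied else P_RESPOND_HAPPY
        if rng.random() < p_respond:
            answers.append(dissatisfied)
    return answers

def random_sample(n):
    """Answers from a simple random sample in which everyone responds."""
    return [rng.random() < TRUE_RATE for _ in range(n)]

big_biased = mail_survey(100_000)
small_random = random_sample(500)

biased_estimate = sum(big_biased) / len(big_biased)
random_estimate = sum(small_random) / len(small_random)

print(f"True rate:                       {TRUE_RATE:.0%}")
print(f"Mail survey ({len(big_biased)} responses): {biased_estimate:.0%}")
print(f"Random sample (500 responses):   {random_estimate:.0%}")
```

Under these assumptions, the mail survey collects several thousand responses yet lands far above the true rate (about 72% in expectation, since 0.30 × 0.12 / (0.30 × 0.12 + 0.70 × 0.02) = 0.72), while the much smaller random sample stays close to 30%.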
Example 1.2: Pets and Marriage
Does owning a pet lead to fewer marital problems? (Rubin, 1998)
Karen Allen, a researcher at the University of Buffalo, conducted a study to determine whether or not couples who own cats or dogs have more satisfying marriages and experience less stress than couples who don't own pets. Allen compared 50 pet-owning couples with 50 pet-free couples. The volunteers completed a standard questionnaire that assessed both their relationships and attachments to pets. Each couple also kept track of their social contacts for two weeks. Allen examined stress levels by monitoring the heart rates and blood pressure readings while couples discussed sensitive topics. Pet-owning couples not only started out with lower heart rates and lower blood pressure readings but also had smaller increases in heart rates and blood pressure readings when they quarreled.
The study described above is an example of a comparative study. In this instance, the couples who owned pets are compared with couples who do not own pets as shown in Figure 1.1:
A comparative study can be either an observational study or an experiment. Observational studies collect data on participants in their naturally occurring settings or groupings, while in experiments the researcher assigns the participants to the groups being compared. In a randomized experiment, this assignment is made using a chance mechanism (like flipping a coin).
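A chance mechanism of this kind is easy to sketch in code. The example below is a minimal illustration, not a procedure taken from any of the studies discussed: the function names and participant labels are hypothetical. It shows a literal coin flip per participant, and a shuffle-and-split variant that is often preferred in practice because it guarantees equal group sizes.

```python
import random

def assign_by_coin_flip(participants, seed=None):
    """Flip a fair coin for each participant; group sizes may end up unequal."""
    rng = random.Random(seed)
    groups = {"treatment": [], "control": []}
    for person in participants:
        side = "treatment" if rng.random() < 0.5 else "control"
        groups[side].append(person)
    return groups

def assign_balanced(participants, seed=None):
    """Shuffle the participants, then split the list in half."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical participant labels, for illustration only.
participants = [f"participant_{i}" for i in range(1, 11)]

coin = assign_by_coin_flip(participants, seed=0)
balanced = assign_balanced(participants, seed=0)

print("Coin flip:", {name: len(members) for name, members in coin.items()})
print("Balanced: ", {name: len(members) for name, members in balanced.items()})
```

In both versions the researcher has no say in who ends up in which group, which is the property that matters for a randomized experiment.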
The study found in this example is an observational study because participants are observed in their naturally occurring groupings as either a pet-owning or pet-free couple. It would be difficult to conduct a randomized experiment because the researcher would have to randomly assign couples to either have or not have pets. It is not ethical to impose pet ownership on the couples, nor would it necessarily be good for the pet. So what are the statistical differences between observational studies and experiments? With a randomized experiment, appropriate evidence can support cause-and-effect conclusions (of course, other conditions must also be met, such as demonstrating that differences between groups are not just the result of natural variability). On the other hand, observational studies can only describe associations; they cannot demonstrate a cause-and-effect relationship.
In this study, one cannot say that owning pets causes married couples to have less stress and more satisfying marriages because randomization was not used to cancel out other factors that may affect stress level and marital satisfaction. Factors such as income, number of hours spent working, where the couple lives (e.g., suburbs versus inner city), whether or not there are children, etc., may also be responsible for differences in stress level and marital satisfaction. This study cannot tell us which explanation is correct because it is not a randomized experiment.
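A short simulation can make the role of such outside factors concrete. The sketch below is entirely hypothetical and does not use Allen's data: it assumes that stress depends only on income (never on pets) and that higher-income couples are more likely to own a pet. It then compares the stress gap seen in an observational comparison with the gap seen when pet ownership is decided by a coin flip, something that is possible in a simulation even though it would not be ethical in real life.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is reproducible

def stress(income):
    """Hypothetical stress score: depends on income only, never on pets."""
    return 60 - 10 * income + rng.gauss(0, 5)

def stress_gap(n_couples, randomized):
    """Mean stress of pet owners minus mean stress of non-owners."""
    pet, no_pet = [], []
    for _ in range(n_couples):
        income = rng.random()  # 0 = lowest income, 1 = highest (hypothetical scale)
        if randomized:
            owns_pet = rng.random() < 0.5     # coin flip, unrelated to income
        else:
            owns_pet = rng.random() < income  # richer couples own pets more often
        (pet if owns_pet else no_pet).append(stress(income))
    return mean(pet) - mean(no_pet)

observational_gap = stress_gap(20_000, randomized=False)
randomized_gap = stress_gap(20_000, randomized=True)

print(f"Observational comparison: {observational_gap:+.2f}")
print(f"Randomized comparison:    {randomized_gap:+.2f}")
```

Under these assumptions, the observational comparison shows pet owners with clearly lower stress even though pets have no effect at all, while the randomized comparison shows a gap close to zero. The difference in the first case comes entirely from income.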
The researcher correctly stated the conclusion by indicating that there was a difference in the two groups when considering stress level and marital satisfaction. Appropriately, no "cause and effect" language was used. However, it is not uncommon for people who have no statistical background to incorrectly infer a "cause and effect" conclusion from observational studies. So as you examine other studies that are found in the daily news, first determine if the study is an experiment or an observational study. Next, decide if the conclusions are appropriate for the type of study that was conducted.
Example 1.3: Weights of BALB/c mice
BALB/c mice are a strain of albino mice commonly used in experiments conducted in cancer research laboratories. A sample of five male and five female eight-week-old BALB/c mice gave the following weights in grams:
|Male mice weights (g):|21|16|23|25|19|
|Female mice weights (g):|22|14|17|20|16|
Of course, we know that human males tend to weigh more than females, but what about this strain of mice? Is there enough evidence here to convince you that there is a difference in the weights of male and female BALB/c mice? Figure 1.2 gives a comparative dot plot of the data.
Figure 1.2. Dot plot of the weights of BALB/c mice by gender, sample size = 5
As you can see, the groups overlap, and it’s difficult to tell whether the males or the females are typically heavier in this case. We really don’t have strong evidence one way or the other. Figure 1.3 gives a dot plot for samples of 15 male and 15 female BALB/c mice at the same age.
Figure 1.3. Dot plot of the weights of BALB/c mice by gender, sample size = 15
As you can see, it is much easier to distinguish between the two groups with the larger sample size. There is some variability in the weights from mouse-to-mouse but, by looking at the larger sample size, it becomes clear that the males weigh more on average. The sample size is an important factor to consider when trying to detect differences between groups since overall patterns take a more definitive shape when you have more data.
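The effect of sample size can also be explored numerically. The sketch below first summarizes the five-mouse samples listed earlier (means of 20.8 g for males and 17.8 g for females, with overlapping ranges). It then runs a simulation in which the population means and the standard deviation are hypothetical values chosen for illustration, not measured properties of BALB/c mice, and counts how often a study of a given size would point in the wrong direction.

```python
import random
from statistics import mean

# Weights in grams from the five-mouse samples in this example.
male = [21, 16, 23, 25, 19]
female = [22, 14, 17, 20, 16]

male_mean = mean(male)
female_mean = mean(female)
print(f"Males:   mean {male_mean} g, range {min(male)}-{max(male)} g")
print(f"Females: mean {female_mean} g, range {min(female)}-{max(female)} g")

# Hypothetical population values, assumed only for this simulation.
MALE_MU, FEMALE_MU, SIGMA = 21.0, 18.0, 3.5
rng = random.Random(7)  # fixed seed so the sketch is reproducible

def share_misleading(n, trials=5000):
    """Fraction of simulated studies where females average at least as heavy as males."""
    count = 0
    for _ in range(trials):
        m = sum(rng.gauss(MALE_MU, SIGMA) for _ in range(n)) / n
        f = sum(rng.gauss(FEMALE_MU, SIGMA) for _ in range(n)) / n
        if f >= m:
            count += 1
    return count / trials

share_5 = share_misleading(5)
share_15 = share_misleading(15)
print(f"Misleading studies with  5 mice per group: {share_5:.1%}")
print(f"Misleading studies with 15 mice per group: {share_15:.1%}")
```

With these assumed values, a noticeable share of the five-mouse studies rank the sexes in the wrong order, while the fifteen-mouse studies do so only rarely.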
In summary, the lessons learned so far are that, in order to conduct a proper study, one must:
- Get a representative sample
- Get a large enough sample
- Decide whether the study should be an observational study or an experiment