Lesson 1: Statistics: Benefits, Risks, and Measurements

Lesson 1 Overview

What is statistics? Statistics is about reasoning with data in the context of uncertainty in order to learn and understand things about the world around us. It is the science of learning from data.  We hope you enjoy exploring the topic as you move through the 11 lessons of the course.

Statistical Paradigm

[Figure: the statistical paradigm, which links Population, Samples, Statistical Summaries and Pictures, and Parameters through the four steps below.]

  • Probability: The rules of probability can tell us the likelihood of different types of samples that might arise from a particular population.
  • Describe and Compare: Data is collected from the samples and, with sample data in hand, we attempt to create statistical summaries and pictures that give the salient features of the data collected.
  • Inference: We want to infer what parameter values are most consistent with the sample statistic at hand.
  • Conclude: What does our knowledge of the parameter values tell us about the population?

Key Components of the Statistical Paradigm

Statistics is about how best to collect data to learn something valuable about the real world (from populations to samples: Lesson 1 to Lesson 3). Statistics is about drawing out the salient features in the data collected (describing and comparing samples: Lesson 3 to Lesson 6). Statistics is about learning some fundamental principles and procedures that will help you make intelligent decisions in everyday life when faced with uncertainty (understanding random chance for inference and making conclusions: Lesson 7 to Lesson 11). We will revisit the figure above throughout this course to continually remind ourselves where we are in the "big picture" of the statistical paradigm.

 

Objectives

After successfully completing this lesson, you should be able to:

  • Identify the three conditions needed to conduct a proper study.
  • Recognize the seven pitfalls that can be encountered when asking questions in a survey.
  • Distinguish between measurement variables and categorical variables.
  • Distinguish between continuous variables and discrete variables for those that are measurement variables.
  • Distinguish between validity, reliability, and bias.

1.1 - What is Statistics?

The following examples are meant to illuminate the world of statistics.

Example 1.1: Angry Women

Who are those angry women? (Streitfield, D., 1988 and Wallis, 1987.) In 1987, Shere Hite published a best-selling book called Women and Love: A Cultural Revolution in Progress. This 7-year research project produced a controversial 922-page publication that summarized the results from a survey that was designed to examine how American women feel about their relationships with men. Hite mailed out 100,000 fifteen-page questionnaires to women who were members of a wide variety of organizations across the U.S. These organizations included church, political, volunteer, senior citizen, and counseling groups, among many others. Questionnaires were actually sent to the leader of each organization. The leader was asked to distribute questionnaires to all members. Each questionnaire contained 127 open-ended questions with many parts and follow-ups. Part of Hite's directions read as follows: "Feel free to skip around and answer only those questions you choose." Approximately 4500 questionnaires were returned. Below are a few statements from this 1987 publication.

  • 84% of women are not emotionally satisfied with their relationships
  • 95% of women reported emotional and psychological harassment from their partners
  • 70% of women married 5 years or more are having extramarital affairs

You should notice that this study is an example of a sample survey. The sample consists of the individuals who actually provided the data, while the population is the larger group from which the sample is chosen and which the sample is meant to represent. In this example, the population is all American women (although some people may say all American women who have relationships with men), while the sample is the 4500 respondents who returned the questionnaire.

As you might expect, a sample should appropriately represent the population. However, in this instance, the sample does not represent the population because of two problems. The first problem found in this study is that only "joiners" were allowed to be a part of the sample. Even though Shere Hite tries to defend her methods by saying she sampled from a variety of organizations, the fact remains that only people who were involved in some organization had a chance to be in the sample. This problem is an example of selection bias.

The other problem found with this sample is nonresponse bias. Nonresponse bias can occur when a large number of people who are selected for the study elect not to respond to the survey or to key questions on the survey. This is clearly evident because the response rate was only 4500/100,000 = 4.5%. (Note: most researchers like response rates to be at least 60% or 70%.) Moreover, the directions encouraged the participants to "skip around" and answer only questions that they liked. As you might expect, only people with strong opinions would take the time to answer a questionnaire that contained 127 open-ended questions. In fact, Shere Hite estimated that, on average, participants took about 4.4 hours to answer the questionnaire. Also, the use of the group leader to distribute the questionnaires meant that there was a gatekeeper who had the power to affect both who responded and the response rate for each organization.
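The response rate quoted above is simple arithmetic, and it can be checked with a short Python sketch. The function and variable names here are illustrative choices, not part of the original study.

```python
# Figures reported above for the Hite survey.
questionnaires_mailed = 100_000
questionnaires_returned = 4_500


def response_rate(returned: int, mailed: int) -> float:
    """Return the response rate as a percentage of questionnaires mailed."""
    return 100 * returned / mailed


rate = response_rate(questionnaires_returned, questionnaires_mailed)
print(f"Response rate: {rate:.1f}%")
```

A rate this far below the 60% to 70% that most researchers look for is a warning sign, no matter how many questionnaires came back.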

So with Example 1.1, the overall conclusion is that even though the sample size is quite large, the sample does not adequately represent the population. Unfortunately, the results from this 7-year study are of little value.

Example 1.2: Pets and Marriage

Does owning a pet lead to fewer marital problems? (Rubin, 1998)

Karen Allen, a researcher at the University of Buffalo, conducted a study to determine whether or not couples who own cats or dogs have more satisfying marriages and experience less stress than couples who don't own pets. Allen compared 50 pet-owning couples with 50 pet-free couples. The volunteers completed a standard questionnaire that assessed both their relationships and attachments to pets. Each couple also kept track of their social contacts for two weeks. Allen examined stress levels by monitoring the heart rates and blood pressure readings while couples discussed sensitive topics. Pet-owning couples not only started out with lower heart rates and lower blood pressure readings but also had smaller increases in heart rates and blood pressure readings when they quarreled.

The study described above is an example of a comparative study. In this instance, the couples who owned pets are compared with couples who do not own pets as shown in Figure 1.1:

[Figure: a sample of married couples is divided into 50 married couples with pets and 50 married couples without pets (Sample, Compare); stress levels and marital satisfaction are measured for each group (Measure); and associations are examined (Analysis & Conclusion).]

Figure 1.1. Illustration of a Comparative Study

A comparative study can be either an observational study or an experiment. Observational studies collect data on participants in their naturally occurring settings/groupings, while with experiments, the participants are assigned to the groups being compared by the researcher. In a randomized experiment, this assignment is made using a chance mechanism (like flipping a coin).
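To make the idea of a chance mechanism concrete, here is a minimal Python sketch of random assignment. It shuffles a list instead of flipping a coin for each participant, so that the two groups come out the same size. The participant labels, the function name, and the seed are all hypothetical and chosen only for illustration.

```python
import random


def randomly_assign(participants, seed=None):
    """Split participants into two groups of (nearly) equal size by chance."""
    rng = random.Random(seed)   # a seed makes the example reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)       # the chance mechanism
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical participant labels, not data from any real study.
participants = [f"participant_{i}" for i in range(1, 11)]
treatment, control = randomly_assign(participants, seed=42)
print("Treatment group:", treatment)
print("Control group:  ", control)
```

Because chance alone decides who lands in which group, other factors tend to balance out between the groups on average.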

The study found in this example is an observational study because participants are observed in their naturally occurring groupings as either a pet-owning or pet-free couple. It would be difficult to conduct a randomized experiment because the researcher would have to randomly assign couples to either have or not have pets. It is not ethical to impose pet-ownership on the couples, nor would it necessarily be good for the pet. So what are the statistical differences between observational studies and experiments? With a randomized experiment, appropriate evidence can support cause and effect conclusions (of course other conditions must also be met - such as demonstrating that differences between groups are not just the result of natural variability). On the other hand, observational studies can only describe associations - they cannot demonstrate a cause-and-effect relationship.

In this study, one cannot say that owning pets causes married couples to have less stress and more satisfying marriages because randomization was not used to cancel out other factors that may affect stress level and marital satisfaction. Factors such as income, number of hours spent working, where you live (i.e., suburbs versus inner city), whether or not there are children, etc., may also be responsible for changes in stress level and marital satisfaction. We will never know because the study is not a randomized experiment.

The researcher correctly stated the conclusion by indicating that there was a difference in the two groups when considering stress level and marital satisfaction. Appropriately, no "cause and effect" language was used. However, it is not uncommon for people who have no statistical background to incorrectly infer a "cause and effect" conclusion from observational studies. So as you examine other studies that are found in the daily news, first determine if the study is an experiment or an observational study. Next, decide if the conclusions are appropriate for the type of study that was conducted.

Example 1.3: Weights of BALB/c mice

BALB/c mice are a strain of albino mice commonly used in experiments conducted in cancer research laboratories. A sample of five male and five female eight-week-old BALB/c mice gave the following weights in grams:

Mice Weights in Grams
Male mice weights (g): 21 16 23 25 19
Female mice weights (g): 22 14 17 20 16

Of course, we know that human males tend to weigh more than females, but what about this strain of mice? Is there enough evidence here to convince you that there is a difference in the weights of male and female BALB/c mice? Figure 1.2 gives a comparative dot plot of the data.

Figure 1.2. Dot plot of the weights of BALB/c mice by gender, sample size = 5

As you can see the groups are overlapping and it’s difficult to tell if the males or females are typically heavier in this case. We really don’t have strong evidence one way or the other. Figure 1.3 gives a dot plot for samples of 15 male and 15 female BALB/c mice at the same age.
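A quick numerical summary of the ten weights listed above tells the same story as the dot plot. The sketch below uses Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean

# Weights in grams from the sample of five male and five female mice.
male_weights = [21, 16, 23, 25, 19]
female_weights = [22, 14, 17, 20, 16]

for label, weights in (("Male", male_weights), ("Female", female_weights)):
    print(f"{label}: mean = {mean(weights):.1f} g, "
          f"range = {min(weights)} to {max(weights)} g")

# The ranges overlap if the heaviest female is at least as heavy
# as the lightest male.
ranges_overlap = max(female_weights) >= min(male_weights)
print("Ranges overlap:", ranges_overlap)
```

The male average is higher in this sample, but with only five mice per group and overlapping ranges, that difference alone is weak evidence.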

Figure 1.3. Dot plot of the weights of BALB/c mice by gender, sample size = 15

As you can see, it is much easier to distinguish between the two groups with the larger sample size. There is some variability in the weights from mouse-to-mouse but, by looking at the larger sample size, it becomes clear that the males weigh more on average. The sample size is an important factor to consider when trying to detect differences between groups since overall patterns take a more definitive shape when you have more data.

To summarize the lessons learned so far, in order to conduct a proper study, one must:

  • Get a representative sample
  • Get a large enough sample
  • Decide whether the study should be an observational study or an experiment

1.2 - Asking Research Questions

Overview

Suppose you desire to do a study or administer a survey. As an investigator, the most challenging task that you will confront is to decide what questions to ask and/or what measurements to obtain. In this lesson, you will be introduced to some key concepts associated with obtaining measurements. You will also learn about possible pitfalls found with survey questions.

It's All in the Wording

There are many possible pitfalls that can lead to bias when asking questions in a survey or study. In thinking about the veracity of results from a sample survey ask yourself if any of these pitfalls are likely to be a problem. It is also possible that more than one type of pitfall can happen at the same time. Examine the following examples.

Example 1.4 Deliberate Bias such as One-Sided Statements

People who use a form of deliberate bias often desire to gather support for a specific cause or opinion. Consider two different wordings for a particular question:

Wording 1: It is hard for today's college graduates to have a bright future with the way things are today in the world.

  1. agree
  2. disagree

Wording 2: Today's college graduates will have a bright future.

  1. agree
  2. disagree

Although Wording 1 and Wording 2 are contradictory statements, when both questions are used in the same survey, it is not uncommon to find that people answer "agree" to both questions. This is because respondents tend to agree to one-sided statements. Listed below are revised wordings for these two questions. These choices are preferred because the statements are now at least two-sided.

Revised Wording 1: Do you agree or disagree that it is hard for today's college graduates to have a bright future with the way things are today in the world?

Revised Wording 2: Do you agree or disagree that today's college graduates will have a bright future?

Example 1.5. Filtering

Consider two different choices of answers for a particular question:

Choice 1: What is your opinion of our current President?

  1. favorable
  2. unfavorable

Choice 2: What is your opinion of our current President?

  1. favorable
  2. unfavorable
  3. undecided

This example illustrates the problem of "filtering." Filtering exists when certain choices such as "undecided" or "don't know" are not included in the list of possible answers. People tend to provide an answer of "undecided" or "don't know" only when these choices are included in the list of possible answers.

Example 1.6. Importance of Order

Consider two different wordings for a particular question:

Wording 1: Pick a color: red or blue?

Wording 2: Pick a color: blue or red?

The results in Table 1.1 are from a study conducted in a Statistics class. As you can see, the results vary somewhat based on the order in which the colors are presented. Even though many people probably have a preference for one color over the other, if order does not matter, the percentages should be the same with each wording.

Table 1.1. Bias due to Order of Comparisons

Color Choice | Wording 1 | Wording 2
Red | 59% | 45%
Blue | 41% | 55%

Example 1.7. Anchoring

Consider two different wordings for a particular question:

Wording 1: Knowing that the population of the U.S. is 316 million, what is the population of Canada?

Wording 2: Knowing that the population of Australia is 23 million, what is the population of Canada?

This survey was conducted in Stat 100 classes where both wordings of the question were randomly distributed.  The students did not know that there were two versions of this question so each only answered the question that they received. The results of this survey are found in Figure 1.4.

The stacked dotplots show the answers to two questions about Canada's population in the survey. Most data points for the wording #1 are located at the right part of the graph, and most points for the wording #2 at the left part.

Figure 1.4. STAT 100 Survey Results

As you can see, the students were influenced by the wording of the question that they were asked to answer. People's perceptions can be severely distorted when they are provided with a reference point or an anchor. People tend to stay close to the anchor because of either having limited knowledge about the topic or being distracted by the anchor. You should also consider the following three points:

  • The sample sizes were large enough to detect a difference between the two groups
  • Canada's population is about 35 million
  • The anchor might be less distracting if the following wording were used: "What is the population of Canada when knowing that the population of the U.S. is 316 million?" but it is best to leave out the anchoring statement altogether.

Example 1.8. Unintentional Bias

Consider two different wordings for a particular question:

Wording 1: Do you favor or oppose an ordinance that forbids surveillance cameras to be placed on Beaver Avenue?

Wording 2: Do you favor or oppose an ordinance that does not allow surveillance cameras to be placed on Beaver Avenue?

People will tend to answer "oppose" or "no" to a question that contains words such as forbid, control, ban, outlaw, and restraint regardless of what question is actually being asked. People do not like to be told that they can't do something. So the responses to the two questions would not provide similar results. Wording 2 would be preferred over Wording 1.

Example 1.9. Unnecessary Complexity ("Double-Barreled" Problem)

Consider the following question.

Question: Do you think that health care workers and military personnel should be the first to receive the smallpox vaccination?

The problem with this question is that the respondent must consider both health care workers and military personnel at the same time. The following rewording is much better.

Revised Question: Who should have priority in receiving the smallpox vaccination?

  1. health care workers
  2. military personnel
  3. both health care workers and military personnel
  4. neither

Example 1.10. Asking the Uninformed and Unnecessary Complexity (Double Negative Problem and List Problem)

Consider the following question:

Question: Do you agree or disagree that children who have a Body Mass Index (BMI) at or above the 95th percentile should not be allowed to spend a lot of time watching television, playing computer games, and listening to music?

The first concern with this question is that many people may not clearly understand what the Body Mass Index (BMI) represents. BMI is a measure that is used to identify obesity and is calculated by dividing a person's weight (in kilograms) by the square of their height (in meters). (Note: many Web sites have BMI calculators.) In children and adolescents, obesity is defined as a BMI for age and gender at or above the 95th percentile. This definition should be included prior to the listing of the question on a survey.
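The BMI calculation described above can be written as a short Python function. This is only a sketch of the formula: the function name and the example person are hypothetical, and the percentile for age and gender is not computed here, since that requires reference growth charts.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Return weight in kilograms divided by the square of height in meters."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical example: a person weighing 70 kg who is 1.75 m tall.
example_bmi = body_mass_index(70.0, 1.75)
print(f"BMI: {example_bmi:.1f}")
```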

This question can also cause problems because of a possible "double negative". Specifically, the problem is with the "disagree" choice. This choice produces a double negative because "disagree" and "should not" are both in the statement. Many respondents will not understand what they are really saying. (It is easy to make the mistake of the double negative).

Revised Question-First Revision: Do you agree or disagree that children who have a Body Mass Index (BMI) at or above the 95th percentile should spend less time watching television, playing computer games, and listening to music?

As you examine this revised question, you should also note that there is still a list of three activities embedded in the question. Respondents can sometimes get hung up on such a list (see the "double-barreled" problem above). For example, they may feel that watching television is a bad idea for obese children but listening to music is not.


1.3 - Defining a Common Language

In the previous examples, we mostly considered problems associated with questions that measure opinion.  We need to discern what we want to measure and how we want to measure it in a wider variety of circumstances.  We must also think carefully about the properties of the measurements we gather.

Data is a collection of a number of pieces of information. Each specific piece of information is called an observation.  The observations are measurements of certain characteristics which we call "variables".   The word "variable" is used because the pieces of information, the observations, vary from one person to the next.

[Figure: observed characteristics (variables) divide into categorical variables and measurement (quantitative) variables. Categorical variables are either nominal (e.g., hair color) or ordinal (e.g., agreement with a tax proposal: agree/neutral/disagree). Measurement variables are either discrete or continuous.]

Figure 1.5: Types of Data

Example 1.11 Variables

Consider the following variables:

Table 1.2: Classification of Variables

Number | Variable | Type of Variable
1 | Which are you? Near-sighted, far-sighted, neither | Categorical
2 | What is your height? | Measurement and Continuous
3 | How many phone calls did you make yesterday on a cell phone? | Measurement and Discrete
4 | What is your cholesterol level? | Measurement and Continuous

Hopefully, you find the classification of the first three variables easy to understand.

Variable #1 is a categorical variable because the possible choices are "words" or "categories."

Variable #2 is a measurement variable because the possible choices are "numbers." This variable is also called a continuous variable because it can assume a range of values on a continuum. You need an instrument, such as a tape measure or a ruler, to determine height. With measurement variables that are continuous, it is often necessary to use an instrument to determine the value of the variable. Measurement variables that are continuous can be subdivided into fractional parts (subdivided into smaller and smaller units of measurement). Typically, a continuous measurement variable is expressed as "an amount of" something.

Variable #3 is a measurement variable because the possible choices are numbers. It is also a discrete variable because one can simply count the number of phone calls made on a cell phone in any given day. The possible numbers are only integers such as 0, 1, 2, ... , 50, etc. (Some of you probably make a lot of cell phone calls.) Discrete measurement variables cannot be subdivided into smaller and smaller fractional parts (smaller and smaller units of measurement). Often, a discrete measurement variable is expressed as "a number of" something.

Variable #4 is somewhat ambiguous. Obviously what the variable is measuring (cholesterol levels) can be expressed on a continuum of possible values - but subjects are likely to round off or only know their levels as a discrete value.  Cholesterol levels must be determined by a blood test where an instrument is used to determine the final value. The reported value represents the concentration of cholesterol in the blood. The appropriate units are milligrams per deciliter (mg/dL). What typically happens is that the value of the cholesterol level is rounded to the nearest whole number. Consequently, the cholesterol level might look like a discrete variable - but the raw values are continuous and, since the amount of "discreteness" is not great, a variable like this would be treated the same way as a continuous variable in any analyses.

Example 1.12 Best Way to Determine Heart Rate

Consider an experiment in which heart rate (heartbeats/minute) is measured by three different methods:

Method 1: Count heart beats for 6 seconds & multiply by 10 to get heartbeats/minute

Method 2: Count heart beats for 30 seconds & multiply by 2 to get heartbeats/minute

Method 3: Count heart beats for 60 seconds

We collected six measurements on an individual for each of the three methods. These results are found in Table 1.3.

Table 1.3: Results from the Heart Rate Experiment

Method | Six Results | Heart Rate (Heartbeats/Minute) | Minimum and Maximum Heart Rate | Average Heart Rate
1 | 7, 7, 7, 7, 7, 7 | 70, 70, 70, 70, 70, 70 | 70, 70 | 70
2 | 36, 35, 37, 38, 37, 37 | 72, 70, 74, 76, 74, 74 | 70, 76 | 73.3
3 | 73, 76, 74, 75, 74, 75 | 73, 76, 74, 75, 74, 75 | 73, 76 | 74.5

In this example, we will not explore whether or not heart rate is a valid measure of overall health and fitness. Obviously, it does provide some information about whether or not a person may have some health problems. But by itself, it usually does not provide a complete picture. The questions that we pose now are the following:

Question 1: Which method is the most reliable?

Question 2: Which method is the most biased?

What may surprise you is that the answer to both questions is method 1. Method 1 is the most reliable because every time we took the measurement we observed 7 beats in 6 seconds. The results are consistent. Results from method 1 are also the most biased because it consistently underestimates the individual's true heart rate. If you look at the results from method 3, which is really the best method to determine heart rate, you find that the individual's average heart rate is 74.5 beats/minute. The results from method 1 always fell below this value. What this means is that even though method 1 is reliable, it can still have other problems, which in this case is bias.
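The comparison of the three methods can be reproduced with a short Python sketch using the heart rates from the experiment. Two simplifying assumptions are made here for illustration: reliability is summarized by the spread (maximum minus minimum) of the six readings, and bias is measured against the method 3 average, which stands in for the true heart rate.

```python
from statistics import mean

# Heart rates in beats/minute for the six readings taken with each method.
beats_per_minute = {
    "Method 1": [70, 70, 70, 70, 70, 70],
    "Method 2": [72, 70, 74, 76, 74, 74],
    "Method 3": [73, 76, 74, 75, 74, 75],
}

# Assumption: the method 3 average is treated as the true heart rate.
reference = mean(beats_per_minute["Method 3"])

summary = {}
for method, readings in beats_per_minute.items():
    spread = max(readings) - min(readings)   # smaller spread = more reliable
    bias = mean(readings) - reference        # negative = underestimates
    summary[method] = (spread, bias)
    print(f"{method}: average = {mean(readings):.1f}, "
          f"spread = {spread}, bias = {bias:+.1f}")
```

Method 1 has the smallest spread and the largest bias at once, which is exactly the point: reliability and lack of bias are separate properties.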

Example 1.13 Bias versus Reliability

Suppose you are interested in knowing whether the average price of homes in a certain county has gone up or down this year in comparison with last year. Would you be more interested in having a measure with low bias or a reliable measure of sales?

Ideally, you would like the measure to be both unbiased and reliable. However, a reliable measure that is biased can still often provide some meaningful information. Since the goal is to compare the average price of homes over two years, the measure must be reliable. If the measure is biased by roughly the same amount each year, that bias largely cancels when one year is compared with the next. So, even if the measure is biased, the amount of change from one year to the next may be sufficient information to make a comparison.
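A small numerical sketch shows why a consistent bias matters less for a year-to-year comparison. All of the prices and the size of the bias below are hypothetical values invented for illustration; they are not real housing data.

```python
# Hypothetical true average home prices (in dollars) for two years.
true_last_year = 200_000
true_this_year = 210_000

# Hypothetical measure that reliably overstates the average by the same amount.
consistent_bias = 15_000
measured_last_year = true_last_year + consistent_bias
measured_this_year = true_this_year + consistent_bias

true_change = true_this_year - true_last_year
measured_change = measured_this_year - measured_last_year

print("True change:    ", true_change)
print("Measured change:", measured_change)
```

Each yearly figure is off by the full amount of the bias, yet the change between years is recovered. This only works if the bias stays about the same from year to year, which is what reliability provides.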


1.4 - A World Without Statistics!

Example 1.14: Sample Surveys

Gallup and Miller

The following essays illustrate the importance of statistics in research and everyday life. To do this, we ponder the question: what would things be like in a world without statistics?

During the 1920’s Ola Babcock Miller was active in the women’s suffrage movement and later became famous in Iowa for starting the State Highway Patrol in her first of three terms as Iowa’s Secretary of State. Her election in 1932 came as quite a surprise to the political pundits of the day as she became the first woman, and the first Democrat since the Civil War, to hold statewide political office in Iowa. However, her election did not surprise her son-in-law, George Gallup, who predicted her victory using the first scientifically sampled election poll ever. Three years later Gallup founded the American Institute of Public Opinion and became well-known for correctly predicting that Franklin Roosevelt would defeat Alf Landon in the 1936 Presidential election – in contrast to the predictions made by Literary Digest magazine based on a non-randomly gathered convenience sample more than 40 times as large.

The use of randomly generated samples pioneered by George Gallup is now the staple of hundreds of organizations that try to determine the outcome of elections before they happen. These organizations face severe challenges caused by the difficulty of reaching portions of the population, a response rate of less than 10% amongst those who are reachable, and the unknown demographic composition of the voters in an election yet to occur. New technological advances in reaching potential respondents and improved statistical modeling have helped to address some of these issues – but the level of bias in an individual election poll is typically on the same order as the random error. Luckily, the diversity of the methodology across many pollsters produces industry averages that still have very small errors in predicting the dozens of major races polled in each election cycle. Without statistics, we would have a poor understanding of public opinion … and without statistics, we wouldn’t know who won the election.

Example 1.15: Search Engines

The first Internet search engine was created in 1990 at McGill University in Canada and attempted to create a searchable census of all of the files on FTP sites at that time. But, as the size of the Web grew exponentially, it soon became apparent that a method of sampling the Internet would be needed to produce the indexes required for searching. A solution was soon developed where sampling is done by a web crawler or “spider” – software that takes information from a web page and all of the pages that it links to and all of the pages that they link to and so on.

Statistical issues then arise in the indexing component of the process. What variables should be saved? For example, one variable in these indexes examines the number of inbound links to a page weighted by measures of the quality of the sites that link to the page in question. Interestingly, this is proportional to an estimate of the equilibrium probability of landing on a given page after a large number of clicks in a Markov model of Internet browsing. Next, what data structures and index size make for the quickest computation without sacrificing relevance? Even the large index maintained by Google, which is more than a hundred Petabytes in size, holds just a small fraction of the estimated 30 trillion pages on the Internet.
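The Markov model of browsing mentioned above can be illustrated with a toy Python sketch. It is a textbook-style simplification and not the method of any actual search engine: the four-page link graph is hypothetical, and the `damping` parameter (the chance of following a link instead of jumping to a random page) is an assumption added so that the probabilities settle down.

```python
# Hypothetical four-page web: each page maps to the pages it links to.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}


def equilibrium_probabilities(links, damping=0.85, iterations=100):
    """Estimate the long-run probability of landing on each page."""
    pages = sorted(links)
    n = len(pages)
    prob = {page: 1 / n for page in pages}   # start equally likely anywhere
    for _ in range(iterations):
        # With probability (1 - damping) the surfer jumps to a random page.
        new_prob = {page: (1 - damping) / n for page in pages}
        # Otherwise the surfer follows one of the current page's links.
        for page, outlinks in links.items():
            share = damping * prob[page] / len(outlinks)
            for target in outlinks:
                new_prob[target] += share
        prob = new_prob
    return prob


ranks = equilibrium_probabilities(links)
for page, p in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"Page {page}: {p:.3f}")
```

In this toy graph, the page with the most inbound links from well-visited pages ends up with the highest probability, while the page with no inbound links is reached only by random jumps.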

Finally, searching algorithms must produce results in a split second. Results are rank orders of websites in the index that should be strongly related to the probability that the site is relevant to user intent and needs. Models for predicting relevance are constantly updated – partly to ensure that website owners improve their sites using best practices and not simply to artificially match with search ranking variables. Current models are based on several hundred variables continuously examined using variable selection and model building experiments against user responses to search engine results. Do users click more often on the highest ranked items? Do they stay longer on the sites they go to?

Thus, statistical issues are addressed in the sampling (web crawling), indexing, and ranking phases of the operation of a good search engine. Without statistics, we would have to search the Internet one site at a time.

Example 1.16: Weather Forecasting

William Ernest Cooke

When astronomer William Ernest Cooke was hired to run the new Perth Observatory at the end of the nineteenth century, he was also charged with providing weather forecasts for the surrounding areas in Western Australia. But Mr. Cooke was not satisfied with merely presenting his best forecast - he also wanted to provide the public with a measure of its uncertainty. So, in 1905 his team began attaching uncertainty values on a five-point scale to their twice-daily forecasts. Indeed, the forecasts he gave the highest rating to turned out to be correct 98.5% of the time, while those given a rating in one of the two most uncertain categories were only correct for 56.5% of the forecasts. The century following these pioneering probability-based weather forecasts has brought vast improvements due to the same themes that improve the quality of any statistical enterprise:

  • Systematically Collected Data - remote sensing devices, including satellites, now automatically transfer data with nearly universal coverage of the planet;
  • Sound Statistical Models - now informed by a scientific understanding of the relationships amongst dozens of measured variables;
  • Computational Efficiency - weather forecasts and “cloud” computing make a perfect pair; and
  • Statistical Reporting - forecasts now routinely carry a prediction, and an assessment of the uncertainty over time and space, of multiple aspects of the weather.

If it weren’t for statistics, we wouldn’t know the probability of rain tomorrow and we wouldn’t know how to dress for the weather today.

weather cartoon - without statistics we wouldn't know how to dress for the weather!

Example 1.17: Animal Models in Medicine

Claude Bernard

In his 1865 book An Introduction to the Study of Experimental Medicine, Claude Bernard, the French scientist often called the father of modern medicine, argued against the use of “statistics” in medicine. Interestingly, he was really arguing against the poor statistical practice of the time in observational clinical studies, and for using the scientific method in laboratory investigations based on many ideas that are statistical in nature. He argued against the misuse of statistics that display only averages without understanding the sources of variability in the data. He argued that causative claims emerge more readily from experiments than from observation. He argued that experiments should have an underpinning of clear hypotheses that can be demonstrated or negated. He described how to eliminate sources of bias and was the first to suggest the use of blind experiments to foster objectivity. Claude Bernard often turned to the use of animals, especially mice, in experiments as a model of human physiology. Although mouse models are not appropriate for all human conditions, they have been very fruitful in investigating the mechanisms underlying many disease processes – especially those in cancer. Examples of such experimental systems that today provide great insight into the biology and genetics of human cancers include:

  • purebred mice that lack immune systems;
  • animals with a specific genetic aberration underlying a disease;
  • mice that can have their genome manipulated to remove a specific cancer fighting mechanism; and
  • mice that are amenable to transplantation of a human tumor.

Statistical ideas in handling variation are at the heart of all of these murine experiments. Statistics allows us to quantify the variability in measurements to decide on the scope of an experiment; to reduce the variability in designs through appropriate controls; to examine the variability in analyses through statistical modeling; and to precisely state the inferential conclusions that arise.

Without statistics, pre-clinical medical science would be less efficient and more subject to ambiguous interpretation, and lab mice would have nothing to do.

Mice cartoon - without statistics lab mice would have nothing to do!

Example 1.18: Process Control

W.E. Deming

In 1950 the Japanese Union of Scientists and Engineers (JUSE) hosted W. Edwards Deming for a series of workshops on statistical process control for Japanese engineers and upper-level managers. The workshops focused on improving business processes and reducing the variation in results. They presented the endless feedback loop of “Plan-Do-Check-Act” for improvement pioneered by Deming’s mentor Walter Shewhart:

  • Plan: Establish objectives and design/revise business processes to improve results
  • Do: Implement the plan and systematically collect data
  • Check: Analyze the data, especially differences from planned implementation and expected results to identify weaknesses in the plan
  • Act: Determine root causes for variations from objectives and make changes to improve the process. Restart the cycle for continual improvement
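The "Check" step is often illustrated with a control chart, which flags measurements that fall outside limits computed from a stable baseline. The sketch below is a simplified illustration rather than the procedure taught in the workshops: it assumes three-standard-deviation limits estimated from the sample standard deviation, and the part diameters and function names are hypothetical.

```python
import statistics

def control_limits(baseline):
    """Center line and 3-sigma limits estimated from in-control baseline data."""
    center = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)  # sample standard deviation (a simplification)
    return center - 3 * sigma, center, center + 3 * sigma

def out_of_control(measurements, lower, upper):
    """Return (index, value) pairs that fall outside the control limits."""
    return [(i, x) for i, x in enumerate(measurements) if x < lower or x > upper]

# Hypothetical part diameters in mm: a stable baseline run, then new production.
baseline = [10.02, 9.98, 10.01, 9.99, 10.00, 10.03, 9.97, 10.00, 10.01, 9.99]
new_run = [10.01, 9.98, 10.12, 10.00, 9.99]

lower, center, upper = control_limits(baseline)
flagged = out_of_control(new_run, lower, upper)
print(f"limits: {lower:.3f} to {upper:.3f}; flagged: {flagged}")
```

A flagged point is a prompt to look for a root cause in the "Act" step, not proof that the process has changed.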

Importantly, Deming’s workshops went beyond Shewhart’s methods for improving production processes and providing technical details to staff. He also focused on management processes and insisted that his audiences include executives in a position to make decisions. The results were stunning and led to the worldwide success of companies such as Sony and Toyota that quickly became known for the reliability of their products and the resulting loyalty of their customers.

Deming continued to refine his ideas and synthesized them in his famous 14 points of quality management presented in his 1982 book Out of the Crisis. Adaptations of Dr. Deming’s integrated approach have had a prodigious influence on businesses worldwide.

Without statistics we wouldn’t know that fixing things that are broken is less efficient than avoiding broken things – and automobile warranties just wouldn’t be the same.

Process cartoon - without statistics automobile warranties just wouldn't be the same!

Example 1.19: Astronomy

Hipparchus

Astronomy is perhaps the oldest science and the first to systematically collect data for analysis. For example, the ancient Greek astronomer and mathematician Hipparchus noticed the scatter in Babylonian measurements of the length of the year and wrote about the general problem of combining data to quantify a phenomenon – deciding on the middle of the range. Problems arising from the analysis of astronomical data continued to fuel the development of statistical methods for centuries – including Legendre’s suggestion of the Least Squares approach and Gauss’ presentation of the normal distribution in their studies of the orbits of comets and planets. In the 20th century, astronomy turned toward physics for insights, but important advances in the study of stochastic processes led to new discoveries about the clustering of galaxies and how they are distributed in the universe.

Today, the field of astrostatistics studies important problems like estimating how many Earth-like planets there might be in our Galaxy. It is very hard to find planets orbiting other stars, but since 1995 several thousand have been found by virtue of their tiny effects on the host stars (e.g., Doppler shifts as the planet orbits, or a 0.01% diminution of light as it transits in front of the star). A major goal is estimating the parameter dubbed eta-Earth, the fraction of stars with Earth-like planets in Earth-like orbits. The problem is sensitivity to survey bias: it is easier to detect bigger and more massive planets the size of Jupiter or Neptune, and planets closer to the star (within the orbit of Mercury). Because of this, virtually no known planets have both Earth-like size and an Earth-like orbit, but we can use statistical models and methods to extrapolate from the bigger surveys. Several researchers have done this in the last few years, and perhaps the most important estimate emerged at the end of 2013 based on data from the National Aeronautics and Space Administration’s Kepler mission. The results are converging: about 6%±2% of Sun-like stars have Earth-like planets that may be habitable, neither too close to the star (too hot) nor too far from it (too cold). That means there are billions of “Earths” in the Galaxy, with the closest probably only about 10 light-years away. Thus, without statistics, we wouldn’t know that the Universe is full of Earth-like planets and we wouldn’t know where the planets are.

Astronomy Cartoon - Without statistics we wouldn't know where the planets are!

Example 1.20: Epidemiology and the Effects of Smoking

Bradford Hill

At the end of World War II, Bradford Hill and Richard Doll in the Statistical Research Unit of the Medical Research Council decided to study the alarming increase in cases of lung cancer that was occurring in Great Britain. The two main suspected causes under investigation were increased air pollution and increased use of tar in paving roads. They interviewed lung cancer patients in 20 London hospitals and patients in the same hospitals with a different diagnosis. Smoking, rather than pollution or road tar, turned out to be the factor that distinguished the lung cancer patients.

The results convinced Richard Doll to quit smoking – but others were not convinced, so Doll and Hill sent letters to 59,000 medical doctors asking about their smoking habits and whether they would agree to be followed for health effects over time. About 40,500 replied, and by 1954 the first results were in. Those results convinced Bradford Hill to quit smoking.

These landmark observational studies and others in the 1950s showed that the hallmarks of causation were all there:

  • Time course
  • Dose-effect
  • Strong relationship
  • Unlikely to occur by chance

Today the biological mechanisms underlying the disease-causing effects of smoking have been thoroughly studied in statistically designed and evaluated laboratory experiments examining genetic abnormalities, micro-RNA switches that change the behavior of genes, and even differences in the types of bacteria that inhabit a smoker’s body. But without statistics, the strength of the evidence could not be well evaluated and we wouldn’t know that smoking was bad for you.

Gym Cartoon - Without statistics we wouldn't know that smoking was bad for you!

Example 1.21: Unemployment Data

WPA Poster

Between 1880 and 1935, census takers in several countries began asking questions about employment and unemployment to help their nations’ economic planning. Census enumerators were told to define a person as unemployed if they had generally been gainfully employed previously but were not currently working. Unfortunately, this led to many difficulties such as pinpointing when an individual left the work force voluntarily for retirement or further training. The “previous gainful employment” definition of the workforce also underestimated the true unemployment rate since those who had never held a job, but wanted one, were uncounted.

However, a group of U.S. statisticians working in the Division of Social Research of the Works Progress Administration (WPA) under the direction of John Nye Webb had a better idea. First, they reasoned that a well-constructed survey would provide more accurate results and could be applied more often than a costly census. Second, they developed an operational definition of the unemployed as people who were not working in the previous week but were searching for work. They tested their ideas in a pioneering random sample in 1937 – the first scientifically constructed national sample conducted by the Census Bureau and the first to include confidence intervals in the subsequent reporting of results. The WPA “searching for work” definition of the unemployed performed well and later became the international standard after adoption by the United Nations’ International Labour Organization in 1954.
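To see how a sample survey can report an unemployment rate together with a confidence interval, consider the minimal sketch below. It is not the method used in the 1937 survey: it assumes a simple random sample and the common normal-approximation interval for a proportion, and the sample counts are hypothetical.

```python
import math

def proportion_confidence_interval(count, n, z=1.96):
    """Estimate a proportion and its normal-approximation confidence interval.

    Assumes a simple random sample of size n; z=1.96 gives roughly 95% confidence.
    """
    p_hat = count / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat, p_hat - margin, p_hat + margin

# Hypothetical survey: 2,000 people in the labor force are sampled, and
# 180 of them were not working last week but were searching for work.
rate, low, high = proportion_confidence_interval(180, 2000)
print(f"Estimated unemployment rate: {rate:.1%} (95% CI {low:.1%} to {high:.1%})")
```

The interval narrows as the sample size grows, which is why a well-designed sample of a few thousand people can stand in for a full census on a question like this.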

Without statistics governments wouldn’t understand key facts about their economies – and people wouldn’t know when they are unemployed.

Work Cartoon - Without statistics people wouldn't know when they are unemployed!

Example 1.22: Computer Controlled Devices

Elevator moving to upper floor

The first computer-controlled machines were created soon after World War II, with milling tools developed as an early application of the new technology. Today, an endless list of devices is run by computer software, from cars to appliances to telephones to airplanes to traffic lights to medical instruments to elevators, with statistical algorithms guiding decision making at their core. Designing an algorithm that operates a bank of elevators optimally involves knowing the arrival times of the people who come to use them, along with the matrix of probabilities that the next rider will request a ride from any particular floor to another at that time of day. Optimality criteria then involve making the wait-time probability distribution for riders and the energy usage or runtime distribution for the elevators as small as possible (e.g., small expected value or small probability of exceeding some threshold). Because the data necessary to fully know this matrix of probabilities is unavailable at the time of installation, methods that include a “learning” component, such as those applying Bayesian updates, are a recent approach. Without the statistical modeling and computational advances that underlie these algorithms, wait times for elevators and their energy consumption would increase. Without statistics we’d have to take the stairs.
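One simple way to picture the "learning" component is a Bayesian update of the floor-to-floor request probabilities: start with prior pseudo-counts for each possible trip and add the trips actually observed. The sketch below is only an illustration of that idea, not how any real elevator controller works. The three-floor building, the counts, and the function name are all hypothetical.

```python
def posterior_destination_probs(prior_counts, observed_counts):
    """Posterior mean of trip probabilities from prior pseudo-counts plus data.

    Each row is an origin floor and each column is a destination floor.
    """
    posterior = []
    for prior_row, observed_row in zip(prior_counts, observed_counts):
        combined = [a + c for a, c in zip(prior_row, observed_row)]
        total = sum(combined)
        posterior.append([value / total for value in combined])
    return posterior

# Hypothetical 3-floor building. Prior: one pseudo-count per possible trip.
# The 0 on the diagonal excludes trips from a floor to itself.
prior = [[0, 1, 1],
         [1, 0, 1],
         [1, 1, 0]]

# Hypothetical trips logged one morning: most riders leave the lobby (floor 0).
observed = [[0, 12, 28],
            [3, 0, 1],
            [2, 0, 0]]

probs = posterior_destination_probs(prior, observed)
for floor, row in enumerate(probs):
    print(floor, [round(p, 2) for p in row])
```

With little data (floor 2 has only two logged trips) the prior still has a visible influence, while for the busy lobby the observed trips dominate the estimate.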

Control Cartoon - Without statistics we'd have to take the stairs!

Example 1.23: The Census

Statistics has helped to maintain the United States’ representative form of government. According to the U.S. Constitution (1787), the number of representatives assigned to a particular state is in proportion to its population count, determined by a census that is conducted every ten years. The United Nations recommends that all member states conduct a census at least once every ten years. That standard has been adopted by most nations, with only a few having a census more often (Australia, Canada, Japan, and New Zealand use a five-year interval).

Each decennial census is an attempt to provide complete counts of all people (including limited demographics such as name, age, date of birth, gender, race/ethnicity, household relationships) and all habitable dwellings in the country using statistical methods. In the example of the United States 2010 Census, the data is being used:

  1. To apportion the 435 seats in the U.S. House of Representatives
  2. To help draw boundaries to meet local, state, and federal requirements for representation
  3. To assist in the allocation of hundreds of billions of dollars per year in state and federal funding to local, state, and tribal governments
  4. To plan economic development and assess the need for schools, hospitals, job training, etc.
  5. To plan communities and to predict future needs
  6. To plan the location of roads and public facilities
  7. To analyze social and economic trends
  8. To plan and evaluate government programs and policies
  9. To help meet many local, state, and federal legal requirements

With decennial censuses serving as anchors, nations may also conduct more detailed sample surveys much more frequently (monthly, quarterly, and annually), to attempt to collect information from carefully selected (usually using probability) subsets of all the people in order to estimate specific characteristics using statistical methods.

Without statistics, any nation would lack vital information about its people - not knowing:

  1. Who they are
  2. How many they are
  3. What they do
  4. How they live
  5. Where the people are

Census Cartoon - Without statistics we wouldn't know where the people are!


1.5 - Test Yourself!

Think About It!

Select the answer you think is correct - then click the 'Check' button to see how you did.

Click the right arrow to proceed to the next question. When you have completed all of the questions you will see how many you got right and the correct answers.


1.6 - Have Fun With It!

Have Fun With It!

cartoon about bias and reliability, "I invented a device that quantifies the validity, bias and reliability of any instrument. Now I am trying to figure out how it works!"

J.B. Landers ©

 
Stats Can Be Cool You See

Lyrics © by Michael Posner
May be sung to the tune of "Cooler Than Me" (Mike Posner)

If I could take me a course,
that I would really love,
I would've already chosen stats just because,
It teaches skills for life,
that everybody needs,
and you probably know,
Stats can be cool you see.


There's data everywhere,
that can hide or reveal,
you've got to understand them,
stats can be cool you see
You design and collect,
and then you analyze,
it's probably cause,
stats can be cool you see
 
You got to work hard,
and think on your feet,
with confounders abound,
masking relationships.
And you gotta know,
when samples you pick,
you...do...it...randomly.
 
I got it,
all figured out,
you gotta take this course and you'll seem smart.
Behind that black box
nobody knows...what they do
But now you do.
 
If I could take me a course,
that I would really love,
I would've already chosen stats just because,
It teaches skills for life,
that everybody needs,
and you probably know,
Stats can be cool you see.
 
The population mean,
is what you'll estimate,
but it can't be observed,
it's a mystery.
But say with confidence,
it's within margin of E,
of the sample mean,
stats can be cool you see.
 
You test hypotheses,
when inference you need,
to get your p-value and make decisions.
But you don't know
if the error you'll make,
Is...type...I...or...II.
 
I got it,
all figured out,
you need to know the assumptions of each test.
When you got that,
A statistician... is who you are
I know you are.
 
'Cause it sure seems
(it sure seems)
Data's all around
(all around)
with statistics
(statistics)
You got this all figured out
(out, out, out)
 
If I could take me a course,
that I would really love,
I would've already chosen stats just because,
It teaches skills for life,
that everybody needs,
and you probably know,
Stats can be cool you see.
 
You design your graphs,
just to visualize,
and you show them around
stats can be cool you see.
Correlation, Causality,
You remember their names,
it's probably cuz,
stats can be cool you see.

