2: Explaining Variability


Overview

 Case-Study: Risk of Heart Attack

As a health care professional, Susie is amazed at the large discussion about the value of taking low-dose aspirin to decrease the risk of a heart attack. Recently, she read an article about taking low-dose aspirin to prevent heart attacks. The article stated that the variability between the two groups (taking low-dose aspirin and not taking low-dose aspirin) was statistically significant; however, the calculation of an effect size demonstrated that this “significant difference” did not mean that the effect of taking the aspirin was meaningful. In fact, the effect of taking aspirin was so small that, given the possibility of negative side effects, it was better avoided for most patients. Susie decided to read more about variability and effect sizes to help her understand how simply stating the statistical results may be misleading, and even unethical, when the actual effect size leads to a different conclusion.

Taking a step back from the low-dose aspirin, let’s think about the incidence of heart attacks. Some people have a heart attack despite leading healthy lives, while other people we might expect to have a heart attack do not. The heart (pun intended) of statistics is to understand real-world phenomena; in this example, why some people get heart attacks and some do not. To put this more technically, we seek to understand the variability in the incidence of heart attacks. If everyone suffered a heart attack, there would not be anything to explain! We would already know that everyone will get a heart attack, end of story, and there would be no need for any statistics! In statistical terms, we could say there is no variability in the incidence of heart attacks, or that it is a constant. But our world is full of variability, and curious researchers and people in the field are constantly trying to explain it. So this unit will take a look at variability and what it means. The importance of this unit cannot be overstated. It is the pillar of everything else we will do in this course. So let’s get started!

Objectives

Upon completion of this lesson, you should be able to:

  • Identify variability in data
  • Differentiate within group variability (error) and between group variability
  • Identify small, medium, and large effect sizes
  • Be able to correctly interpret a mean, median, and standard deviation from statistical output
  • Identify the correct percentiles and standard deviations for the Empirical Rule
  • Correctly calculate and interpret a z score

2.1 - Measures of Spread


Variance and standard deviation are measures of variability. The most common way we measure variability is with the standard deviation. However, when working with standard deviations we must first make sure that our data are normally distributed (otherwise we need to modify the way we look at distributions). We most often think of distributions as being "normal" when quantitative data look like the classic "bell curve." However, it is very important for you to understand that the idea of normality applies to categorical data as well, although normality is assessed in slightly different ways.

But let's first take a quick look at standard deviations and variability, and walk through how to calculate them by hand. Subsequently, we will discuss the assessment of normality for both quantitative and categorical data.

The standard deviation is actually the square root of the variance. But wait a minute... what is the variance?

Don't get discouraged; we will walk through these calculations, with Susie, computing the values by hand. After this lesson, you will always compute the standard deviation using software such as Minitab Express.

Let's start step by step: 

Step 1: Compute the sample mean (which we have already done):

\(\overline{x} = \dfrac{\sum x}{n}\)

Step 2: Calculate the deviance (how far away each individual observation is from the mean) by subtracting the sample mean from each individual value:

\(x-\overline{x}\), these are the deviations

Step 3: For each deviation, multiply it by itself (in other words, square the deviation):

\((x-\overline{x})^{2}\), these are the squared deviations

NOTE: This is an important step. If you are curious, you can skip Step 3 and jump right to Step 4. What you will find is that if you add up all of the deviation scores from Step 2, the sum will always equal zero! If you like algebra, you can play with the formula for the mean and the sum of the deviation scores and see why this must be the case.

Step 4: Find the sum of squares by summing all of the squared deviations from Step 3:

\(\sum (x-\overline{x})^{2}\), this is the sum of squares

Step 5: Divide the sum of squares by \(n-1\) (where \(n\) is the total number of observations):

\(\dfrac{\sum (x-\overline{x})^{2}}{n-1}\), this is the sample variance \((s^{2})\)

Voila! You have calculated the variance. That was a lot of math. Fortunately, you never have to do this by hand (but it is very interesting to see that variance is a simple matter of subtracting, multiplying, adding, and dividing!). But this still hasn't gotten us to the standard deviation, so we need to add one last step.

Step 6: Take the square root of the sample variance:

\(\sqrt{\dfrac{\sum (x-\overline{x})^{2}}{n-1}}\), this is the sample standard deviation

Voila! The standard deviation! Just as the definition indicates, this is simply the square root of the variance.

NOTE: The reason we take the square root goes back to Step 3 above. Because we squared the deviation scores, the variance is expressed in squared units, which do not match the original units of measurement in our data. By taking the square root of the variance, the variability of the data can be expressed in the original measurement units: this is the standard deviation.
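To make the arithmetic concrete, here is a minimal sketch in Python that mirrors Steps 1 through 6 on a small, made-up sample (the data values are hypothetical; remember that in practice you will let software such as Minitab Express do this for you):

```python
# Steps 1-6 for the sample variance and standard deviation, spelled out.
from math import sqrt
import statistics

data = [31, 35, 29, 41, 38, 36]                      # hypothetical observations
n = len(data)

mean = sum(data) / n                                  # Step 1: sample mean (35.0)
deviations = [x - mean for x in data]                 # Step 2: deviations from the mean
squared_deviations = [d ** 2 for d in deviations]     # Step 3: square each deviation
ss = sum(squared_deviations)                          # Step 4: sum of squares (98.0)
variance = ss / (n - 1)                               # Step 5: sample variance (19.6)
std_dev = sqrt(variance)                              # Step 6: sample standard deviation (~4.43)

print(sum(deviations))                                # 0.0 -- as the NOTE in Step 3 says, deviations cancel
print(variance, std_dev)
assert abs(statistics.stdev(data) - std_dev) < 1e-9   # the built-in function agrees
```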

So to keep track of some of the vocabulary introduced:

Deviation
An individual score minus the mean.
Sum of Squared Deviations
Deviations squared and added together. This is also known as the sum of squares or SS.
Sum of Squares
\(SS={\sum \left(x-\overline{x}\right)^{2}}\)
Variance
Approximately the average of all of the squared deviations; for a sample represented as \(s^{2}\)
Standard Deviation
Roughly the average difference between individual data values and the mean. The standard deviation of a sample is denoted as \(s\). The standard deviation of a population is denoted as \(\sigma\).
Sample Standard Deviation
\(s=\sqrt{\dfrac{\sum (x-\overline{x})^{2}}{n-1}}\)

2.2 - The Empirical Rule


While the standard deviation gives us a measure of variability (i.e., a variable with a standard deviation of zero has no variability), there are other important ways we can think about the standard deviation.

First, we start with a normal distribution, symmetrical and bell-shaped. 

The Empirical Rule tells us where an observation lies within a normal distribution. It states that approximately 68% of the data will be within one standard deviation of the mean, about 95% will be within two standard deviations of the mean, and about 99.7% will be within three standard deviations of the mean.

[Figure: the normal curve illustrating the Empirical Rule, with about 68% of data within mean ± 1s, 95% within mean ± 2s, and 99.7% within mean ± 3s.]

 Example of Empirical Rule

Suppose Susie observes a sample of n = 200 hospitals with a more or less bell-shaped distribution, a sample mean of \(\overline{x} = 35\) heart attacks per month, and a standard deviation of s = 6.

  • About 68% of the hospitals report heart attack rates in the interval 35 ± 6, which is 29 to 41.

  • About 95% of the hospitals report heart attack rates in the interval 35 ± (2 × 6), which is 23 to 47.

  • About 99.7% of the hospitals report heart attack rates in the interval 35 ± (3 × 6), which is 17 to 53.
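A quick sketch of these interval calculations in Python, using the numbers from Susie's example; the optional scipy lines simply confirm the approximate 68/95/99.7% coverage under a normal curve:

```python
# Empirical Rule intervals for Susie's hospitals: mean = 35, SD = 6.
mean, s = 35, 6
for k, coverage in [(1, "about 68%"), (2, "about 95%"), (3, "about 99.7%")]:
    print(f"{coverage} of hospitals fall between {mean - k * s} and {mean + k * s}")

# Optional check of the coverage percentages for a normal distribution:
from scipy.stats import norm
print([round(norm.cdf(k) - norm.cdf(-k), 4) for k in (1, 2, 3)])   # [0.6827, 0.9545, 0.9973]
```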

Z scores

While the Empirical Rule is a great tool for helping us understand the variability of our data and how extreme any one observation is, calculating the 68%, 95%, and 99.7% intervals, as in the example, is somewhat labor intensive. An easier way to express where a score falls is to use a standardized score, or Z-score.

We can convert any normal distribution into the standard normal distribution in order to find probability and apply the properties of the standard normal. In order to do this, we use the z-value.

Z-value, Z-score, or Z

The Z-value (or sometimes referred to as Z-score or simply Z) represents the number of standard deviations an observation is from the mean for a set of data. To find the z-score for a particular observation we apply the following formula:

\(Z = \dfrac{\text{observed value} - \text{mean}}{\text{SD}}\)

Let's take a look at the idea of a z-score within context.

For a recent final exam in STAT 800, the mean was 68.55 with a standard deviation of 15.45.

  • If you scored an 80%: \(Z = \dfrac{(80 - 68.55)}{15.45} = 0.74\), which means your score of 80 was 0.74 SD above the mean.

  • If you scored a 60%: \(Z = \dfrac{(60 - 68.55)}{15.45} = -0.55\), which means your score of 60 was 0.55 SD below the mean.

Is it always good to have a positive Z score? It depends on the question. For exams, you would want a positive Z-score (indicates you scored higher than the mean). However, if one was analyzing days of missed work then a negative Z-score would be more appealing as it would indicate the person missed less than the mean number of days.
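As a quick sketch, the exam example above can be reproduced in a few lines of Python (the helper function is ours, not from any particular library):

```python
# Z-scores for the STAT 800 final exam example: mean = 68.55, SD = 15.45.
def z_score(observed, mean, sd):
    """Number of standard deviations an observation lies from the mean."""
    return (observed - mean) / sd

exam_mean, exam_sd = 68.55, 15.45
print(round(z_score(80, exam_mean, exam_sd), 2))   # 0.74  -> 0.74 SD above the mean
print(round(z_score(60, exam_mean, exam_sd), 2))   # -0.55 -> 0.55 SD below the mean
```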

Characteristics of Z-scores
  • The scores can be positive or negative.

  • For data that are symmetric (i.e., bell-shaped) or nearly symmetric, a common rule of thumb is to flag as potential outliers any observations with Z-scores beyond ±3.

  • The maximum possible Z-score for a set of data is \(\dfrac{n-1}{\sqrt{n}}\).

From Z-score to Probability and Percentiles

A more frequent application of the standard normal curve is expressing percentiles or probabilities. Because the value of a z-score corresponds to a specific position within the standard normal distribution, it is also possible to find the equivalent percentile for that observation.

For example, an observation that has a z score of zero would be at the 50th percentile of the data.

We can also use these properties to make probability statements about relationships among observations. We will use this information extensively as we progress into hypothesis testing in future modules. In the immediate application, however, a simple example for Susie illustrates the point. If Susie knows that a hospital has a z-score of +4.00, then she can use software or consult a Standard Normal Table to convert this to a probability. For example, with a mean of 50 heart attacks and a standard deviation of 10, a hospital reporting 100 heart attacks has a z-score of +5.00, so Susie can calculate that the probability of a hospital reporting more than 100 heart attacks is very low (100 heart attacks is far beyond the 99.9th percentile, so a hospital reporting more than that would be very UNLIKELY!).

We will not reference the standard normal tables in these notes, and instead rely on software to generate probabilities for us. 
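As a sketch of what relying on software looks like, the snippet below uses scipy's standard normal functions to turn z-scores into percentiles and tail probabilities (the hospital numbers repeat the example above):

```python
# Converting z-scores to percentiles and probabilities without a Standard Normal Table.
from scipy.stats import norm

print(norm.cdf(0))     # 0.5 -> a z-score of 0 sits at the 50th percentile
print(norm.sf(4.0))    # ~3.2e-05 -> the chance of exceeding a z-score of +4 is tiny

# Susie's hospital example: mean = 50 heart attacks, SD = 10.
z = (100 - 50) / 10    # a hospital reporting 100 heart attacks has z = +5
print(norm.sf(z))      # probability of reporting more than 100 (~2.9e-07): very UNLIKELY
```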


2.4 - Measures of Position


 

Measures of position give a range where a certain percentage of the data fall. We briefly referred to percentiles in the last section, but we will also add quartiles to this overview.

Percentiles
The pth percentile of the data set is a measurement such that after the data are ordered from smallest to largest, at most, p% of the data are at or below this value and at most, (100 - p)% at or above it.

A common application of percentiles is their use in determining passing or failure cutoffs for standardized exams such as the GRE. If you have a 95th percentile score then you are at or above 95% of all test takers.

The median is the value where fifty percent of the data values fall at or below it. Therefore, the median is the 50th percentile.

[Figure: normal distribution with the lower 50% shaded, up to the median.]

We can find any percentile we wish, but there are two other especially important percentiles: the 25th percentile, typically denoted Q1, and the 75th percentile, typically denoted Q3. Q1 is commonly called the lower quartile and Q3 is commonly called the upper quartile.

[Figure: normal distribution with the 25th, 50th, and 75th percentiles labeled.]

Finding Quartiles

The method we will demonstrate for calculating Q1 and Q3 may differ from the method described in our textbook and from the method presented in many undergraduate statistics courses. The results shown here will always match Minitab's results, and this is the method we require students to use. There are two steps to follow (a short code sketch follows the note below):

  1. Find the location of the desired quartile. If there are n observations, arranged in increasing order, then the first quartile is at position \(\dfrac{n+1}{4}\), the second quartile (i.e., the median) is at position \(\dfrac{2(n+1)}{4}\), and the third quartile is at position \(\dfrac{3(n+1)}{4}\).
  2. Find the value in that position for the ordered data.

Note! If the position found in Step 1 is not a whole number, interpolate the value.
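Here is a minimal sketch of the two-step method in Python, using made-up data; the helper function and values are ours for illustration, and the last line handles the interpolation mentioned in the note:

```python
# Quartiles via the (n + 1)p position method described above.
def quartile(data, which):
    """which = 1, 2, or 3 for Q1, the median (Q2), or Q3."""
    ordered = sorted(data)
    n = len(ordered)
    position = which * (n + 1) / 4        # Step 1: location of the desired quartile
    k = int(position)                     # whole part of the position
    fraction = position - k               # fractional part (0 if position is whole)
    if fraction == 0:
        return ordered[k - 1]             # positions count from 1, Python lists from 0
    # Step 2 with the Note applied: interpolate between the k-th and (k+1)-th ordered values
    return ordered[k - 1] + fraction * (ordered[k] - ordered[k - 1])

monthly_heart_attacks = [23, 29, 31, 35, 36, 38, 41, 47]   # hypothetical values
print(quartile(monthly_heart_attacks, 1))   # Q1 = 29.5    (position 2.25)
print(quartile(monthly_heart_attacks, 2))   # median = 35.5 (position 4.5)
print(quartile(monthly_heart_attacks, 3))   # Q3 = 40.25   (position 6.75)
```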

2.5 - 5-Number Summary

5-Number Summary
Minimum, \(Q_1\), Median, \(Q_3\), Maximum
\(Q_1\) is the first quartile, this is the 25th percentile
\(Q_3\) is the third quartile, this is the 75th percentile

Five number summaries are used to describe some of the key features of a distribution. Using the values in a five number summary we can also compute the range and interquartile range.

Range

The difference between the maximum and minimum values.

\(Range = Maximum - Minimum\)

Note! The range is heavily influenced by outliers. For this reason, the interquartile range is often preferred because it is resistant to outliers.
Interquartile range (IQR)

The difference between the first and third quartiles.

\(IQR = Q_3 - Q_1\)
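As a small illustration with made-up data, the snippet below assembles the five-number summary plus the range and IQR defined here. Python's statistics.quantiles is used for the quartiles; its default 'exclusive' method uses the (n + 1)p positions from Section 2.4, which we assume matches Minitab's default:

```python
# Five-number summary, range, and IQR for a small, made-up sample.
from statistics import quantiles, median

data = [23, 29, 31, 35, 36, 38, 41, 47]          # hypothetical monthly counts
q1, q2, q3 = quantiles(data, n=4)                # default 'exclusive' ~ (n + 1)p positions

five_number_summary = (min(data), q1, median(data), q3, max(data))
data_range = max(data) - min(data)               # Range = Maximum - Minimum
iqr = q3 - q1                                    # IQR = Q3 - Q1

print(five_number_summary)   # (23, 29.5, 35.5, 40.25, 47)
print(data_range, iqr)       # 24, 10.75
```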


2.6 - Identifying outliers: IQR Method


The IQR Method

Some observations within a set of data may fall outside the general scope of the other observations. Such observations are called outliers. Here, you will learn a more objective method for identifying outliers.

We can use the IQR method of identifying outliers to set up a “fence” outside of Q1 and Q3. Any values that fall outside of this fence are considered outliers. To build this fence we take 1.5 times the IQR and then subtract this value from Q1 and add this value to Q3. This gives us the minimum and maximum fence posts that we compare each observation to. Any observations that are more than 1.5 IQR below Q1 or more than 1.5 IQR above Q3 are considered outliers. This is the method that Minitab Express uses to identify outliers by default.
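Continuing the same hypothetical data (with one suspiciously large value added), here is a minimal sketch of the 1.5 × IQR fences; again, statistics.quantiles is assumed to match the quartile method from Section 2.4:

```python
# Flagging outliers with the 1.5 * IQR fences described above.
from statistics import quantiles

data = [23, 29, 31, 35, 36, 38, 41, 47, 90]      # made-up values; 90 looks extreme
q1, _q2, q3 = quantiles(data, n=4)               # Q1 = 30, Q3 = 44 for these values
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(lower_fence, upper_fence)   # 9.0, 65.0
print(outliers)                   # [90]
```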

 Case Study: Heart Attack Risk

Now that we have learned the basics of variability, along with different strategies for representing variability, let's return to Susie's article and the heart attack example. If everyone has a heart attack, there is zero variability in the number (or percentage) of people having heart attacks, so there is nothing to explain about why some people do and do not get heart attacks! If there is variability and not everyone gets a heart attack, then another way to think about the variability is to think about it as error. In other words, the fact that we do NOT know precisely who will get a heart attack means that, in the absence of any additional information, we will automatically have a certain amount of error when predicting the occurrence of a heart attack, compared to a world in which we KNOW everyone will have one. In this way, variability is the error! This may sound confusing at first, but if you think about it in another context, the bathroom scale, the idea might become clearer. Suppose you have a broken-down old bathroom scale and you get on it first thing in the morning. You see your weight, and because you know it is an old scale you step off and then step back on, and voila, your weight is different. Now you have two different values for your weight, so you try a third time and, voila, you have a third value (yes, this is a really old scale). If you think about the three weights on the old scale, you might want to say your scale has a lot of error, but this is also the variability of the measurement of your weight.

So now that the ideas of error and variability are on your radar, we can begin to differentiate two kinds of variability: within group and between group. We are not going to do much with between group variability yet; we will cover that later in the course. However, what we have been discussing so far actually IS within group variability. Our three scale measurements represent the "within group" variability of the measurement of our weight. In our original example, the fact that not everyone will get a heart attack is within group variability (if you want to get more technical, you could say that the chance, or probability, of getting a heart attack differs for each person). Now, you might be reading this and saying that almost everyone knows there are certain lifestyle or genetic factors impacting the chance of a heart attack! That is exactly the point of statistics and research! Over the years, researchers have worked to help EXPLAIN the variability of heart attacks (by showing the relationship between lifestyle and genetic factors and heart attacks). If you understand this, then you understand why the idea of within group variability is fundamental to statistics!

Susie's article pointed out some of the more recent developments around taking low-dose aspirin. While taking low-dose aspirin has long been accepted as a sound strategy for preventing heart attacks, there are now mixed findings being reported in the scholarly literature and even in the media. One of the reasons, identified in the example above, is that the actual effect of the aspirin is quite small.

Effect Size

How do researchers know this all of a sudden? Weren't the original findings based on science and statistics? Well, yes, the original findings were based on science and statistics, but with a lot more research on low-dose aspirin and heart attacks, researchers can now calculate an effect size to determine the actual impact of the aspirin. What is an effect size? The effect size arose from social science research, where researchers wanted to be able to draw conclusions across studies. Effect sizes measure the impact of a "treatment" on an outcome. We will learn later in the course that this is also what many of our statistical tools do; the advantage of effect sizes is that they parse out the impact of the size of the sample. If this does not make complete sense at this point, that is okay; as we progress, the sample size issue will become clearer.

Adapted from Coe, R. (2002). It’s the effect size, stupid. What effect size is and why it is important. Annual Conference of the British Educational Research Association. Exeter: England. Sept 12-14, 2002.

| Effect Size | Probability that you could guess which group a person was in from knowledge of their ‘score’ | Equivalent correlation, r |
| --- | --- | --- |
| 0.0 | 0.50 | 0.00 |
| 0.1 | 0.52 | 0.05 |
| 0.2 | 0.54 | 0.10 |
| 0.3 | 0.56 | 0.15 |
| 0.4 | 0.58 | 0.20 |
| 0.5 | 0.60 | 0.24 |
| 0.6 | 0.62 | 0.29 |
| 0.7 | 0.64 | 0.33 |
| 0.8 | 0.66 | 0.37 |
| 0.9 | 0.67 | 0.41 |
| 1.0 | 0.69 | 0.45 |
| 1.2 | 0.73 | 0.51 |
| 1.4 | 0.76 | 0.57 |
| 1.6 | 0.79 | 0.62 |
| 1.8 | 0.82 | 0.67 |
| 2.0 | 0.84 | 0.71 |
| 2.5 | 0.89 | 0.78 |
| 3.0 | 0.93 | 0.83 |

The effect size calculation is similar to the z-score calculation; the difference is in the interpretation. Instead of relying on probabilities, the effect size uses widely accepted benchmarks (as summarized in the table) relating the effect to the probability of belonging to a "treatment" group compared to a "control" group (in our example, the group receiving low-dose aspirin is the treatment and those who do not receive it are the control). An effect size of 0 indicates that receiving the treatment (low-dose aspirin) does not change the chance of a heart attack any more than chance alone. In contrast, an effect size of 2 indicates a strong effect, meaning the outcome is affected by the treatment, and it is now possible to predict group membership from the outcome with high probability (with an effect size of 2, you can predict group membership with an 84% probability).
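The values in the table are consistent with two standard conversions from a standardized mean difference (Cohen's d): the "probability that you could guess which group a person was in" column matches \(\Phi(d/2)\), and the equivalent correlation matches \(r = d/\sqrt{d^{2}+4}\). The sketch below reproduces a few rows under that assumption (the formulas are not stated in Coe's table, so treat this purely as an illustration):

```python
# Reproducing a few rows of the effect-size table, assuming the "guessing"
# column is Phi(d / 2) and the equivalent correlation is r = d / sqrt(d^2 + 4).
from math import sqrt
from scipy.stats import norm

def guess_probability(d):
    """Assumed: probability of correctly guessing group membership from a score."""
    return norm.cdf(d / 2)

def equivalent_r(d):
    """Standard conversion from a standardized difference d to a correlation r."""
    return d / sqrt(d ** 2 + 4)

for d in [0.0, 0.5, 1.0, 2.0, 3.0]:
    print(d, round(guess_probability(d), 2), round(equivalent_r(d), 2))
# 0.0 0.5  0.0
# 0.5 0.6  0.24
# 1.0 0.69 0.45
# 2.0 0.84 0.71
# 3.0 0.93 0.83
```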

In recent years, prestigious social science publications have begun requiring the inclusion of effect sizes to distinguish findings that are impactful, beyond being statistically significant. Reporting effect sizes also allows other researchers to aggregate results across studies, a technique called meta-analysis, which we will not cover in this course but which may be of interest to students in the social sciences.


2.7 - Summary


It is of the utmost importance for any student beginning the study of statistics to understand the concept of variability. As any researcher knows, the process of science seeks to answer questions; from this perspective, science seeks to explain variability. The concept of within group variability helps students understand why measures of "spread" are such a critical building block for all future work in statistical study.

 

Now Susie can understand the conclusions around the variability in the low-dose aspirin study. She also has enough knowledge of effect sizes to understand that parsing out the impact of sample sizes across many studies revealed that the actual effect is very small. While the issue of low-dose aspirin may not be settled, the blanket acceptance of aspirin as a preventative measure is now viewed through a different lens, thanks largely to the important work of statisticians in our world!

