2.1 - Measures of Spread

Variance and standard deviation are measures of variability, and the standard deviation is the one we use most often. However, when working with standard deviations we must first make sure that our data are normally distributed (otherwise we need to modify the way we look at distributions). We most often think of a distribution as "normal" when it looks like the classical "bell curve" of quantitative data. However, it is very important to understand that the idea of normality applies to categorical data as well, although normality is assessed in slightly different ways.

But let's first take a quick look at standard deviations and variability. We will take a look at how we calculate these by hand. Subsequently we will discuss the assessment of normality for both quantitative and categorical data.

The standard deviation is actually the square root of the variance. But wait a minute... what is the variance?

Don't get discouraged. We will walk you, and Susie, through computing these values by hand. After this lesson, you will always compute the standard deviation using software such as Minitab Express.

Let's start step by step:

Step 1: Compute the sample mean (which we have already done):
\(\overline{x} = \dfrac{\sum x}{n}\)
Step 2: Calculate the deviation (how far each individual observation is from the mean) by subtracting the sample mean from each individual value:
\(x-\overline{x}\), these are the deviations
Step 3: For each deviation, multiply it by itself (in other words, you are squaring the deviation):
\((x-\overline{x})^{2}\), these are the squared deviations
NOTE: This is an important step. If you are curious, try skipping Step 3 and summing the deviations from Step 2 directly. You will find that the sum is always equal to zero! If you like algebra, you can see why from the formula for the mean: \(\sum (x-\overline{x}) = \sum x - n\overline{x} = 0\), because \(n\overline{x} = \sum x\). Squaring the deviations keeps the positive and negative values from canceling out.
Step 4: Find the sum of squares by summing all of the squared deviations from Step 3:
\(\sum (x-\overline{x})^{2}\), this is the sum of squares
Step 5: Divide the sum of squares by \(n-1\) (where n is the total number of observations):
\(\dfrac{\sum (x-\overline{x})^{2}}{n-1}\), this is the sample variance \((s^{2})\)
Voilà! You have calculated the variance. That was a lot of math. Fortunately, you never have to do this by hand (but it is very interesting to see that variance is a simple matter of subtracting, multiplying, adding, and dividing!).
But this still hasn't got us to the standard deviation. So we need to add one last step.
Step 6: Take the square root of the sample variance:
\(\sqrt{\frac{\sum (x-\overline{x})^{2}}{n-1}}\), this is the sample standard deviation
Voilà! The standard deviation! Just as the definition indicates, this is simply the square root of the variance.
NOTE: We take the square root because of Step 3 above. Because we squared the deviation scores, the variance is expressed in squared units rather than in the original units of measurement of our data. By taking the square root of the variance, the variability of the data can be expressed in the original measurement units by using the standard deviation.
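
To see the six steps in action, here is a minimal sketch in Python. The data values are made up purely for illustration (they do not come from this lesson), and the variable names are our own choices. It also checks the claim from the note in Step 3 that the deviations sum to zero.

```python
import math

# Hypothetical sample, invented for illustration only.
data = [2, 4, 4, 5, 7, 8]
n = len(data)

# Step 1: compute the sample mean.
mean = sum(data) / n                              # 5.0

# Step 2: compute each deviation (value minus the mean).
deviations = [x - mean for x in data]             # these add up to 0.0

# Step 3: square each deviation.
squared_deviations = [d ** 2 for d in deviations]

# Step 4: add the squared deviations to get the sum of squares.
sum_of_squares = sum(squared_deviations)          # 24.0

# Step 5: divide by n - 1 to get the sample variance.
variance = sum_of_squares / (n - 1)               # 4.8

# Step 6: take the square root to get the sample standard deviation.
std_dev = math.sqrt(variance)                     # about 2.19

print("sum of deviations:", sum(deviations))
print("sample variance:", variance)
print("sample standard deviation:", round(std_dev, 2))
```

Notice that the variance (4.8) is in squared units, while the standard deviation (about 2.19) is back in the same units as the data.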

So to keep track of some of the vocabulary introduced:

Deviation: An individual score minus the mean.

Sum of Squared Deviations: Deviations squared and added together. This is also known as the sum of squares or SS.

Sum of Squares: \(SS={\sum \left(x-\overline{x}\right)^{2}}\)

Variance: Approximately the average of all of the squared deviations. The variance of a sample is denoted as \(s^{2}\).

Standard Deviation: Roughly the average difference between individual data values and the mean. The standard deviation of a sample is denoted as \(s\). The standard deviation of a population is denoted as \(\sigma\).

Sample Standard Deviation: \(s=\sqrt{\dfrac{\sum (x-\overline{x})^{2}}{n-1}}\)
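
In practice, software does this arithmetic for you. As a rough sketch, Python's standard `statistics` module can stand in for a package like Minitab Express here. The data are the same made-up values used for illustration, not values from this lesson. Note one assumption beyond the text: the lesson only gives the formula for \(s\); the population version shown last divides by \(n\) instead of \(n-1\).

```python
import math
import statistics

# Hypothetical sample, invented for illustration only.
data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
n = len(data)
mean = statistics.mean(data)

# Sample variance s^2 and sample standard deviation s (both divide by n - 1).
s_squared = statistics.variance(data)
s = statistics.stdev(data)

# The same standard deviation computed from the formula, as a check.
s_by_hand = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

# Population standard deviation (divides by n rather than n - 1).
sigma = statistics.pstdev(data)

print("s^2:", s_squared)
print("s:", s, "by hand:", s_by_hand)
print("sigma:", sigma)
```

For the same data, the population value comes out a little smaller than \(s\), because dividing by \(n\) gives a smaller result than dividing by \(n-1\).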