Mean, Median, and Mode
A measure of central tendency is an important aspect of quantitative data. It is an estimate of a “typical” value.
Three of the many ways to measure central tendency are the mean, median and mode.
There are other measures, such as a trimmed mean, that we do not discuss here.
- Mean
- The mean is the arithmetic average of the data.
- Sample Mean
- Let $x_1, x_2, \ldots, x_n$ be our sample. The sample mean is usually denoted by $\bar{x}$
- \(\bar{x}=\sum_{i=1}^n \dfrac{x_i}{n}=\dfrac{1}{n}\sum_{i=1}^n x_i\)
- where n is the sample size and \(x_i\) are the measurements. One may need to use the sample mean to estimate the population mean since usually only a random sample is drawn and we don't know the population mean.
The sample mean is a statistic and a population mean is a parameter. Review the definitions of statistic and parameter in Lesson 0.2.
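The sample mean formula can be sketched in a few lines of Python. This is only an illustration: the function name `sample_mean` and the four-value sample are assumptions made for the example, not part of the lesson.

```python
def sample_mean(xs):
    """Return the sample mean: the sum of the observations divided by n."""
    n = len(xs)  # n is the sample size
    if n == 0:
        raise ValueError("the sample mean is undefined for an empty sample")
    return sum(xs) / n


# Hypothetical sample, used only for illustration.
sample = [2, 4, 6, 8]
print(sample_mean(sample))  # (2 + 4 + 6 + 8) / 4 = 5.0
```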
Note on Notation
What if we used $y_i$ for our measurements instead of $x_i$? Is this a problem? No. The formula would simply look like this: \(\bar{y}=\sum_{i=1}^n \dfrac{y_i}{n}=\dfrac{1}{n}\sum_{i=1}^n y_i\). The formulas are exactly the same. The letters that you select to denote the measurements are up to you. For instance, many textbooks use $y$ instead of $x$ to denote the measurements. The point is to understand how the calculation expressed in the formula works. In this case, the formula calculates the mean by summing all of the observations and dividing by the number of observations. Some notation you will come to see as standard, e.g., $n$ will always denote the sample size. We will make a point of letting you know what these are. However, when it comes to the variables, these labels can (and do) vary.
- Median
- The median is the middle value of the ordered data.
The most important step in finding the median is to first order the data from smallest to largest.
Steps to finding the median for a set of data:
- Arrange the data in increasing order, i.e. smallest to largest.
- Find the location of the median in the ordered data by \(\frac{n+1}{2}\), where n is the sample size.
- The value at the location found in Step 2 is the median. If the location is not a whole number, the median is the mean of the two ordered values on either side of it.
Note on Odd or Even Sample Sizes
If the sample size is an odd number then the location point will produce a median that is an observed value. If the sample size is an even number, then the location will require one to take the mean of two numbers to calculate the median. The result may or may not be an observed value as the example below illustrates.
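The steps for finding the median, including the odd and even cases, might be sketched as follows. The function name `sample_median` and the two small samples are hypothetical, chosen only to show one odd and one even sample size.

```python
def sample_median(xs):
    """Return the median using the (n + 1) / 2 location rule."""
    ordered = sorted(xs)      # Step 1: arrange smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("the median is undefined for an empty sample")
    location = (n + 1) / 2    # Step 2: 1-based location of the median
    if n % 2 == 1:
        # Odd n: the location is a whole number, so the median is an observed value.
        return ordered[int(location) - 1]
    # Even n: the location ends in .5, so average the two neighboring values.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


# Hypothetical samples, used only for illustration.
print(sample_median([3, 1, 2]))     # odd n: middle value is 2
print(sample_median([4, 1, 3, 2]))  # even n: (2 + 3) / 2 = 2.5
```

In the even case the result, 2.5, is not one of the observed values, which is the situation the note above describes.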
- Mode
- The mode is the value that occurs most often in the data. It is important to note that there may be more than one mode in the dataset.
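Because a data set may have more than one mode, a sketch of the calculation should return every value that ties for the highest count. The helper name `modes` and the sample below are assumptions for illustration.

```python
from collections import Counter


def modes(xs):
    """Return all values that occur most often, in increasing order."""
    counts = Counter(xs)            # value -> number of occurrences
    if not counts:
        return []
    top = max(counts.values())      # the highest frequency in the data
    return sorted(value for value, count in counts.items() if count == top)


# Hypothetical sample with two modes, used only for illustration.
print(modes([1, 2, 2, 3, 3]))  # [2, 3]
```

Note that under this sketch, a data set in which every value occurs exactly once returns every value, which is one reason the mode is less informative for data with few repeats.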
Example 1-5: Test Scores
Consider the aptitude test scores of ten children below:
95, 78, 69, 91, 82, 76, 76, 86, 88, 80
Find the mean, median, and mode.
Answer
Mean
\(\bar{x}=\frac{1}{10}(95+78+69+91+82+76+76+86+88+80)=82.1\)
Median
First, order the data.
69, 76, 76, 78, 80, 82, 86, 88, 91, 95
With n = 10, the median position is (10 + 1) / 2 = 5.5. Thus, the median is the average of the fifth (80) and sixth (82) ordered values: median = (80 + 82) / 2 = 81.
Mode
The most frequent value in this data set is 76. Therefore the mode is 76.
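The hand calculations in this example can be checked with Python's standard `statistics` module. This is just a verification sketch; it assumes Python 3.8 or later, where `multimode` is available.

```python
from statistics import mean, median, multimode

# The ten aptitude test scores from Example 1-5.
scores = [95, 78, 69, 91, 82, 76, 76, 86, 88, 80]

score_mean = mean(scores)        # sum of the scores divided by 10
score_median = median(scores)    # average of the 5th and 6th ordered values
score_modes = multimode(scores)  # list of all most frequent values

print(score_mean, score_median, score_modes)
```

The results agree with the values found above: a mean of 82.1, a median of 81, and a single mode of 76.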
Effects of Outliers
One shortcoming of the mean is that means are easily affected by extreme values. Measures that are not that affected by extreme values are called resistant. Measures that are affected by extreme values are called sensitive.
Example 1-6: Test Scores Cont'd...
Using the data from Example 1-5, how would the mean and median change, if the entry 91 is mistakenly recorded as 9?
Answer
The data set would be
9, 69, 76, 76, 78, 80, 82, 86, 88, 95
Mean
The mean would be \(\bar{x}=\frac{1}{10}(9+69+76+76+78+80+82+86+88+95)=73.9\), which is very different from the original mean of 82.1.
Median
Let us see the effect of the mistake on the median value.
The data set (with 91 coded as 9) in increasing order is:
9, 69, 76, 76, 78, 80, 82, 86, 88, 95
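With n = 10, the median is again the average of the fifth (78) and sixth (80) ordered values, so the median = 79.
The medians of the two sets (81 and 79) are not that different. Therefore, the median is not much affected by the extreme value 9.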
The mean is a sensitive measure (or sensitive statistic) and the median is a resistant measure (or resistant statistic).
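The contrast between a sensitive and a resistant measure can be seen by computing both statistics before and after the recording mistake. The variable names below are illustrative assumptions; the data are the scores from Examples 1-5 and 1-6.

```python
from statistics import mean, median

original = [95, 78, 69, 91, 82, 76, 76, 86, 88, 80]
# Same scores, but with 91 mistakenly recorded as 9.
recorded = [9 if score == 91 else score for score in original]

# How far each measure moves because of the single bad entry.
mean_shift = mean(original) - mean(recorded)
median_shift = median(original) - median(recorded)

print(f"mean:   {mean(original):.1f} -> {mean(recorded):.1f} (shift {mean_shift:.1f})")
print(f"median: {median(original):.1f} -> {median(recorded):.1f} (shift {median_shift:.1f})")
```

One mistaken value moves the mean by about 8.2 points but moves the median by only 2 points.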
After reading this lesson you should know that there are quite a few options when one wants to describe central tendency. In future lessons, we talk mainly about the mean. However, we need to be aware of one of its shortcomings, which is that it is easily affected by extreme values.
Unless data points are known mistakes, one should not remove them from the data set! One should keep the extreme points and use more resistant measures. For example, use the sample median to estimate the population median. We will discuss methods using the median in Lesson 11.
Adding and Multiplying Constants
What happens to the mean and median if we add or multiply each observation in a data set by a constant?
Consider for example if an instructor curves an exam by adding five points to each student’s score. What effect does this have on the mean and the median? The result of adding a constant to each value has the intended effect of altering the mean and median by the constant.
For example, if 5 were added to each of the 10 aptitude scores above, the mean of the new data set would be 87.1 (the original mean of 82.1 plus 5) and the new median would be 86 (the original median of 81 plus 5).
Similarly, if each observed data value were multiplied by a constant, the mean and median would both be multiplied by that constant. Returning to the 10 aptitude scores, if all of the original scores were doubled, then the new mean and new median would be double the original mean and median (164.2 and 162). As we will learn shortly, the effect is not the same on the variance!
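A short check of both claims, using the ten aptitude scores, might look like this. The names `curved` and `doubled` are assumptions made for the sketch.

```python
from statistics import mean, median

scores = [95, 78, 69, 91, 82, 76, 76, 86, 88, 80]

# Add a constant (5) to every score, as in the curved-exam example.
curved = [score + 5 for score in scores]
# Multiply every score by a constant (2).
doubled = [score * 2 for score in scores]

print(mean(scores), median(scores))    # original mean and median
print(mean(curved), median(curved))    # each is 5 more than the original
print(mean(doubled), median(doubled))  # each is twice the original
```

The general reason for the mean is visible in the formula: \(\frac{1}{n}\sum_{i=1}^n (x_i + c) = \bar{x} + c\) and \(\frac{1}{n}\sum_{i=1}^n c\,x_i = c\,\bar{x}\).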
Looking Ahead!
Why would you want to know this? One reason, especially for those moving onward to more applied statistics (e.g. Regression, ANOVA), is transforming data. For many applied statistical methods, a required assumption is that the data is normal, or very nearly bell-shaped. When the data is not normal, statisticians will transform the data using one of numerous techniques, e.g. a logarithmic transformation. We just need to remember that the original data was transformed!
Shape
The shape of the data helps us to determine the most appropriate measure of central tendency. The three most important descriptions of shape are Symmetric, Left-skewed, and Right-skewed. Skewness is a measure of the degree of asymmetry of the distribution.
Symmetric
- mean, median, and mode are all the same here
- no skewness is apparent
- the distribution is described as symmetric
Left-Skewed or Skewed Left
- mean < median
- long tail on the left
Right-Skewed or Skewed Right
- mean > median
- long tail on the right
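The relationship between the mean and the median for each shape can be illustrated with small made-up data sets. Everything below is hypothetical: the helper `compare_mean_median` and the three samples were invented for this sketch. Comparing the mean with the median is only a rough indicator of shape, not a formal measure of skewness.

```python
from statistics import mean, median


def compare_mean_median(xs):
    """Describe how the mean compares with the median for a data set."""
    m, med = mean(xs), median(xs)
    if m > med:
        return "mean > median"
    if m < med:
        return "mean < median"
    # Exact equality is rare in real data; it occurs here by construction.
    return "mean == median"


# Hypothetical data sets, used only for illustration.
symmetric = [1, 2, 3, 4, 5]                   # balanced around 3
right_tail = [20, 22, 23, 25, 27, 30, 95]     # one large value on the right
left_tail = [5, 70, 73, 75, 77, 78, 80]       # one small value on the left

for name, data in [("symmetric", symmetric),
                   ("right tail", right_tail),
                   ("left tail", left_tail)]:
    print(name, compare_mean_median(data))
```

In each skewed sample, the single extreme value pulls the mean toward the long tail while the median stays near the bulk of the data.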
Application: The Skewed Nature of Salary Data
Salary distributions are almost always right-skewed, with a few people that make the most money. To illustrate this, consider your favorite sports team or even the company for which you work. There will be one or two players or personnel that earn the “big bucks”, followed by others who earn less. This will produce a shape that is skewed to the right. Knowing this can be a useful aid in negotiating a higher salary.
When one interviews for a position and the discussion gets around to compensation, it is common that the interviewer states an offer that is “typical for someone in your position”. That is, they are offering you the average salary for someone with your particular skill set (e.g. little experience). But is this average the mode, median, or mean? The company (for whom business is business!) will want to pay you the least they can, while you prefer to earn the most you can. Since salaries tend to be skewed to the right, the offer will most likely reflect the mode or median. You simply need to ask which “average” the offer refers to and what the mean salary is, since for right-skewed data the mean would be the highest of the three values. Once you have these averages, you can begin to negotiate toward the highest number.