Lesson #4: Descriptive Measures of the Strength of a Linear Association

(Pearson) correlation coefficient r

The correlation coefficient r is directly related to the coefficient of determination r². If r² is written in decimal form, e.g., 0.39 or 0.87, then to obtain r we take the square root of r² and attach the appropriate sign:

$$r = \pm\sqrt{r^2}$$

The sign of r depends on the sign of the estimated slope coefficient b₁: if b₁ is negative, then r takes a negative sign; if b₁ is positive, then r takes a positive sign.

That is, the estimated slope and the correlation coefficient r always share the same sign. Furthermore, because r² is always a number between 0 and 1, the correlation coefficient r is always a number between -1 and 1.
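The rule above (take the square root of r², then give it the sign of the estimated slope) can be sketched in a few lines of Python. This is a minimal illustration, not Minitab's procedure; the function name r_from_r_squared and the slope values in the usage lines are hypothetical, while the r² values 0.39 and 0.87 are the examples from the text.

```python
import math

def r_from_r_squared(r_squared: float, b1: float) -> float:
    """Recover r from r^2 and the estimated slope b1 (hypothetical helper).

    The magnitude of r is sqrt(r^2); its sign is copied from b1,
    because r and b1 always share the same sign.
    """
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r^2 must lie between 0 and 1")
    if b1 == 0:
        # A zero slope goes with r = 0; return a plain 0.0 (no signed zero).
        return 0.0
    return math.copysign(math.sqrt(r_squared), b1)

# Hypothetical slopes paired with the r^2 values mentioned in the text.
print(r_from_r_squared(0.87, 2.5))   # positive slope, so r is positive
print(r_from_r_squared(0.39, -1.2))  # negative slope, so r is negative
```

Using math.copysign keeps the magnitude and the sign as two separate steps, mirroring the two-part rule in the text.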

One advantage of r is that it is unitless, allowing researchers to compare correlation coefficients calculated on data sets measured in different units. The "unitless-ness" of the measure can be seen from an alternative formula for r, namely:

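$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}}$$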

If x is the height of an individual measured in inches and y is the weight of the individual measured in pounds, then the units of the numerator are inches × pounds. The units of the denominator are the square root of inches² × pounds², which is again inches × pounds. Because the numerator and denominator have the same units, the units cancel each other out, yielding a "unitless" measure.
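The unit cancellation can be checked numerically: compute r from the deviation formula on heights and weights, re-express the same people in centimeters and kilograms, and compute r again. The height and weight values below are made up purely for illustration, and pearson_r is a hypothetical helper name; the conversion factors (2.54 cm per inch, 0.45359237 kg per pound) are the standard exact definitions.

```python
import math

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation computed from deviations about the means."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    dy = [yi - y_bar for yi in y]
    # Numerator: units of x times units of y (e.g. inches x pounds).
    s_xy = sum(a * b for a, b in zip(dx, dy))
    # Denominator: sqrt(inches^2 x pounds^2), i.e. inches x pounds again.
    s_xx = sum(a * a for a in dx)
    s_yy = sum(b * b for b in dy)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical heights (inches) and weights (pounds), made up for illustration.
height_in = [63.0, 65.0, 67.0, 68.0, 70.0, 72.0]
weight_lb = [127.0, 140.0, 152.0, 149.0, 168.0, 181.0]

# The same individuals re-expressed in centimeters and kilograms.
height_cm = [h * 2.54 for h in height_in]
weight_kg = [w * 0.45359237 for w in weight_lb]

r_imperial = pearson_r(height_in, weight_lb)
r_metric = pearson_r(height_cm, weight_kg)
print(math.isclose(r_imperial, r_metric))  # True: the units have cancelled
```

The formula is also symmetric in x and y, so pearson_r(weight_lb, height_in) returns the same value as pearson_r(height_in, weight_lb).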

Another formula for r that you might see in the regression literature is one that illustrates how the correlation coefficient r is a function of the estimated slope coefficient b₁:

$$r = \frac{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}}{\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} \times b_1$$

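We can readily see from this version of the formula that the estimated slope b₁ and the correlation coefficient r always share the same sign, because the factor multiplying b₁ is a ratio of square roots of sums of squares and is therefore positive. It also follows that if the estimated slope b₁ is 0, then the correlation coefficient r must also be 0.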
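A short numerical check shows that the slope-based formula and the deviation formula give the same r, and that r and b₁ share a sign. This sketch assumes b₁ is the usual least-squares slope, Σ(xᵢ - x̄)(yᵢ - ȳ) / Σ(xᵢ - x̄)²; the helper name slope_and_r and the data values are hypothetical, chosen only to have a downward trend.

```python
import math

def slope_and_r(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Return (b1, r_via_slope, r_direct) for paired data x, y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx                                  # least-squares slope
    r_via_slope = math.sqrt(s_xx) / math.sqrt(s_yy) * b1
    r_direct = s_xy / math.sqrt(s_xx * s_yy)          # deviation formula
    return b1, r_via_slope, r_direct

# Hypothetical data with a downward trend, made up for illustration.
x = [30.0, 33.0, 36.0, 39.0, 42.0, 45.0]
y = [210.0, 195.0, 180.0, 172.0, 150.0, 141.0]

b1, r_slope, r_direct = slope_and_r(x, y)
print(math.isclose(r_slope, r_direct))  # True: both formulas agree
print(b1 < 0 and r_slope < 0)           # True: slope and r share a sign
```

Substituting b₁ = Σ(xᵢ - x̄)(yᵢ - ȳ) / Σ(xᵢ - x̄)² into the slope-based formula and simplifying gives exactly the deviation formula, which is why the two computed values agree.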

That's enough with the formulas! As always, we will let statistical software such as Minitab do the dirty calculations for us. Here's what Minitab's output looks like for the skin cancer mortality and latitude example:

[Minitab output: Pearson correlation between skin cancer mortality and latitude = -0.825]

The output tells us that the correlation between skin cancer mortality and latitude is -0.825 for this data set. Note that the order in which you specify the variables does not matter:

[Minitab output with the two variables specified in the reverse order: correlation = -0.825]

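The output tells us that the correlation between skin cancer mortality and latitude is still -0.825. What does this correlation coefficient tell us? That is, how do we interpret the Pearson correlation coefficient r? In general, there is no nice practical operational interpretation for r as there is for r². You can only use r to make a statement about the strength of the linear relationship between x and y. In general: if r = -1, then there is a perfect negative linear relationship between x and y; if r = 1, then there is a perfect positive linear relationship between x and y; and if r = 0, then there is no linear relationship between x and y.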

All other values of r tell us that the relationship between x and y is not perfect. The closer r is to 0, the weaker the linear relationship. The closer r is to -1, the stronger the negative linear relationship. And the closer r is to 1, the stronger the positive linear relationship. As is true for the r² value, what is deemed a large correlation coefficient r depends greatly on the research area.

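So, what does the correlation of -0.825 between skin cancer mortality and latitude tell us? It tells us that there is a negative linear relationship between latitude and skin cancer mortality (as latitude increases, skin cancer mortality tends to decrease), and that the relationship is fairly strong, since r is much closer to -1 than to 0.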
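To connect r back to the coefficient of determination, squaring the reported correlation gives r² ≈ 0.681. Assuming the usual reading of r² as the proportion of the variation in y accounted for by the linear relationship with x, roughly 68% of the variation in skin cancer mortality here is accounted for by latitude. The sketch below does the arithmetic and then reverses it using the square-root-and-sign rule.

```python
import math

r = -0.825                      # correlation reported in the Minitab output
r_squared = r ** 2              # coefficient of determination
print(round(r_squared, 3))      # 0.681

# Reverse direction: square root of r^2, with the sign taken from the
# negative slope of mortality on latitude.
r_back = -math.sqrt(r_squared)
print(math.isclose(r_back, r))  # True
```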

© 2004 The Pennsylvania State University. All rights reserved.
Materials developed by Dr. Laura J. Simon (Lecturer, Penn State Department of Statistics).