18.5 - Use and Misuse of Correlation Coefficients

Correlation is a widely used analysis tool that is sometimes applied inappropriately. Some caveats regarding the use of correlation methods follow.

The correlation methods discussed in this chapter should be used only with independent observations; they should not be applied to repeated-measures data, where observations on the same subject are not independent. For example, it would not be appropriate to use these measures of correlation to describe the relationship between Week 4 and Week 8 blood pressures in the same patients.
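Caution should be used in interpreting results of correlation analysis when large numbers of variables have been examined, resulting in a large number of correlation coefficients. When many coefficients are computed, some will appear statistically significant by chance alone, even if no true associations exist.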
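As a rough illustration of this multiplicity problem, the sketch below simulates 20 mutually independent variables on 30 subjects and counts how many of the 190 pairwise Pearson coefficients exceed the usual two-sided 5% critical value. The sample sizes, the seed, and the helper name `pearson_r` are assumptions chosen for illustration only.

```python
import random
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


rng = random.Random(1)  # fixed seed so the sketch is repeatable
n_subjects = 30         # hypothetical sample size
n_variables = 20        # hypothetical number of variables examined

# Every variable is independent noise, so no true association exists.
data = [[rng.gauss(0, 1) for _ in range(n_subjects)]
        for _ in range(n_variables)]

# Two-sided 5% critical value of |r| for n = 30,
# derived from the tabled t quantile t(0.975, df = 28) = 2.048.
t_crit = 2.048
r_crit = t_crit / sqrt(t_crit ** 2 + (n_subjects - 2))

pairs = list(combinations(range(n_variables), 2))
n_flagged = sum(1 for i, j in pairs
                if abs(pearson_r(data[i], data[j])) > r_crit)

print(f"{n_flagged} of {len(pairs)} coefficients exceed |r| = {r_crit:.3f}")
```

Roughly 5% of the coefficients would be expected to cross the threshold even though every variable here is pure noise.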
The correlation of two variables that have both been recorded repeatedly over time can be spurious: a shared time trend alone can produce a high correlation between otherwise unrelated series. Time trends should be removed from such data before attempting to measure correlation.
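The following sketch shows one way this can play out, under assumed conditions: two simulated series share nothing except an upward linear trend, and the trend is removed by subtracting a least-squares line fitted against time. The slopes, noise level, seed, and the helper names `pearson_r` and `detrend` are hypothetical; other detrending methods (for example, differencing) may suit real data better.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def detrend(series):
    """Residuals after subtracting the least-squares line fitted against time."""
    n = len(series)
    times = list(range(n))
    mean_t = sum(times) / n
    mean_s = sum(series) / n
    slope = (sum((t - mean_t) * (s - mean_s) for t, s in zip(times, series))
             / sum((t - mean_t) ** 2 for t in times))
    return [s - (mean_s + slope * (t - mean_t)) for t, s in zip(times, series)]


rng = random.Random(2)  # fixed seed so the sketch is repeatable
n_times = 200           # hypothetical number of time points

# Two series with independent noise; the only thing they share is a trend.
series_a = [0.05 * t + rng.gauss(0, 1) for t in range(n_times)]
series_b = [0.10 * t + rng.gauss(0, 1) for t in range(n_times)]

r_raw = pearson_r(series_a, series_b)
r_detrended = pearson_r(detrend(series_a), detrend(series_b))

print(f"raw r = {r_raw:.2f}, detrended r = {r_detrended:.2f}")
```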
To extend correlation results to a given population, the subjects under study must form a representative (i.e., random) sample from that population. The Pearson correlation coefficient can be very sensitive to outlying observations and all correlation coefficients are susceptible to sample selection biases.
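To make the sensitivity to outliers concrete, here is a small sketch with made-up numbers: five points with zero Pearson correlation, followed by the same data with one extreme observation appended. The data values and the helper name `pearson_r` are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up data in which x and y show no linear association.
x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 1, 3]
r_without = pearson_r(x, y)

# The same data with a single outlying observation at (20, 20).
r_with = pearson_r(x + [20], y + [20])

print(f"r without outlier = {r_without:.2f}")
print(f"r with outlier    = {r_with:.2f}")
```

A single point is enough to move the coefficient from zero to above 0.9 in this example, which is one reason to inspect the scatterplot before reporting a value.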
Care should be taken when attempting to correlate two variables where one is a part and one represents the total. For example, we would expect to find a positive correlation between height at age ten and adult height because the second quantity "contains" the first quantity.
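A short derivation shows why a part and its total tend to be positively correlated. It assumes, purely for illustration, that the total \(T\) is the sum of the part \(P\) and a remainder \(R\), and that \(P\) and \(R\) are uncorrelated with variances \(\sigma_P^2\) and \(\sigma_R^2\); these symbols are not from the text above.

\[
\operatorname{Cov}(P, T) = \operatorname{Cov}(P, P + R) = \sigma_P^2, \qquad
\operatorname{Var}(T) = \sigma_P^2 + \sigma_R^2,
\]
\[
\operatorname{Corr}(P, T) = \frac{\sigma_P^2}{\sigma_P \sqrt{\sigma_P^2 + \sigma_R^2}} = \frac{\sigma_P}{\sqrt{\sigma_P^2 + \sigma_R^2}} > 0 .
\]

Under these assumptions the correlation is positive even though the part and the remainder are unrelated, so it says little about any substantive relationship.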
Correlation should not be used to study the relation between an initial measurement, X, and the change in that measurement over time, Y - X. X will typically be negatively correlated with Y - X, even when no real relationship between the initial value and the change exists, because of the regression to the mean phenomenon.
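The simulation below is a minimal sketch of this effect under assumed conditions: each subject has a stable true value, and X and Y are two measurements of it with independent errors, so nothing actually changes over time. The variances, sample size, seed, and the helper name `pearson_r` are hypothetical choices.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


rng = random.Random(3)  # fixed seed so the sketch is repeatable
n = 2000                # hypothetical number of subjects

# Each subject has a stable true value; X and Y add independent errors.
true_value = [rng.gauss(0, 1) for _ in range(n)]
x = [t + rng.gauss(0, 1) for t in true_value]
y = [t + rng.gauss(0, 1) for t in true_value]
change = [yi - xi for xi, yi in zip(x, y)]

r_x_y = pearson_r(x, y)
r_x_change = pearson_r(x, change)

print(f"r(X, Y)     = {r_x_y:.2f}")
print(f"r(X, Y - X) = {r_x_change:.2f}")
```

With these settings the correlation between X and Y - X is clearly negative, although by construction no subject's true value changed.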
Small correlation values do not necessarily indicate that two variables are unassociated. For example, Pearson's \(r_p\) will underestimate the association between two variables that show a quadratic relationship. Scatterplots should always be examined.
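The quadratic case can be checked with a tiny made-up example: y is an exact function of x, yet Pearson's coefficient is zero because the relationship is symmetric about x = 0. The data and the helper name `pearson_r` are illustrative assumptions.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# y is completely determined by x, but the relationship is not linear.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

r_quadratic = pearson_r(x, y)
print(f"Pearson r for y = x^2: {r_quadratic:.2f}")
```

A scatterplot of these points would show the parabola immediately, while the coefficient alone would suggest no association.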
Correlation does not imply causation. If a strong correlation is observed between two variables A and B, there are several possible explanations:
1. A influences B
2. B influences A
3. A and B are influenced by one or more additional variables
4. The relationship observed between A and B is due to chance
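Explanation 3 can be sketched with a simulation in which a third variable C drives both A and B, while A and B have no influence on each other. Everything here is an assumption made for illustration: the model (A = C + noise, B = C + noise), the seed, the sample size, and the helper name `pearson_r`. Because C's effect is known exactly in the simulation, it is removed by simple subtraction; with real data it would have to be estimated.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


rng = random.Random(4)  # fixed seed so the sketch is repeatable
n = 2000                # hypothetical number of subjects

# C influences both A and B; A and B do not influence each other.
c = [rng.gauss(0, 1) for _ in range(n)]
a = [ci + rng.gauss(0, 1) for ci in c]
b = [ci + rng.gauss(0, 1) for ci in c]

r_ab = pearson_r(a, b)

# Remove the (known, simulated) contribution of C from each variable.
a_without_c = [ai - ci for ai, ci in zip(a, c)]
b_without_c = [bi - ci for bi, ci in zip(b, c)]
r_ab_without_c = pearson_r(a_without_c, b_without_c)

print(f"r(A, B)               = {r_ab:.2f}")
print(f"r(A, B) with C removed = {r_ab_without_c:.2f}")
```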
"Regular" correlation coefficients are often published when the researcher really intends to compare two methods of measuring the same quantity with respect to their agreement. This is a misguided analysis because correlation measures only the degree of association; it does not measure agreement. The next section of this lesson will present a measure of agreement.
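The difference between association and agreement can be seen in a small made-up example: a hypothetical second method always reads 15 units higher than the first, so the two are perfectly correlated but never agree. The readings and the helper name `pearson_r` are invented for illustration, and the mean difference printed here is only a simple descriptive check, not a formal measure of agreement.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up readings of the same quantity by two hypothetical methods.
method_a = [100, 110, 120, 130, 140]
method_b = [reading + 15 for reading in method_a]  # constant bias of 15

r_methods = pearson_r(method_a, method_b)
mean_difference = sum(b - a for a, b in zip(method_a, method_b)) / len(method_a)

print(f"r = {r_methods:.2f}, mean difference (B - A) = {mean_difference:.1f}")
```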