7.2 - Correlation

The Correlation Coefficient

If we want to provide a measure of the strength of the linear relationship between home and away batting averages (two quantitative variables), a good way is to report the correlation coefficient between them.

The sample correlation coefficient is typically denoted as \(r\). It is also known as Pearson’s \(r\). The population correlation coefficient is generally denoted as \(\rho\), pronounced “rho.”

Sample Correlation Coefficient

The sample correlation coefficient, \(r\), is calculated using the following formula:

\( r=\dfrac{\sum (x_i-\bar{x})(y_i-\bar{y}) }{\sqrt{\sum (x_i-\bar{x})^2}\sqrt{\sum (y_i-\bar{y})^2}} \)
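As a minimal sketch, the formula above can be translated term by term into Python. The function name `pearson_r` and the `home` and `away` values are made up for illustration; they are not real batting averages.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient, computed directly from the formula for r."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    # Numerator: sum of the products of paired deviation scores
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: square roots of the sums of squared deviation scores
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))

# Hypothetical home and away batting averages (illustrative only)
home = [0.250, 0.270, 0.300, 0.280, 0.260]
away = [0.240, 0.265, 0.290, 0.285, 0.250]

r = pearson_r(home, away)
print(round(r, 3))
```

In practice, statistical software reports \(r\) directly; writing it out by hand here only makes each piece of the formula visible.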

If you have a solid foundation in the material covered in this course up to this point, you should notice that the terms \(x_i-\bar{x}\) and \(y_i-\bar{y}\) are simple deviation scores. As you (should) know, the deviation score is the starting point for calculating the variance of a variable. In the same way, the sum of the products of paired deviation scores in the numerator is the starting point for calculating the covariance of two variables. Thus a correlation coefficient is simply a standardized covariance of two variables!

The advantage of the correlation coefficient over the raw covariance is that the denominator standardizes it: dividing the covariance by the product of the standard deviations of the two variables removes the units of measurement and confines the result to the range from -1 to +1.
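The link between \(r\) and the covariance can be written out explicitly. The symbols \(s_{xy}\) (sample covariance), \(s_x\) and \(s_y\) (sample standard deviations), and \(n\) (number of pairs) are notation assumed here for illustration; they do not appear in the formula above.

\( s_{xy}=\dfrac{\sum (x_i-\bar{x})(y_i-\bar{y})}{n-1}, \qquad s_x=\sqrt{\dfrac{\sum (x_i-\bar{x})^2}{n-1}}, \qquad s_y=\sqrt{\dfrac{\sum (y_i-\bar{y})^2}{n-1}} \)

\( \dfrac{s_{xy}}{s_x s_y}=\dfrac{\frac{1}{n-1}\sum (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\frac{1}{n-1}\sum (x_i-\bar{x})^2}\sqrt{\frac{1}{n-1}\sum (y_i-\bar{y})^2}}=\dfrac{\sum (x_i-\bar{x})(y_i-\bar{y}) }{\sqrt{\sum (x_i-\bar{x})^2}\sqrt{\sum (y_i-\bar{y})^2}}=r \)

The factors of \(\frac{1}{n-1}\) cancel, which is why \(n\) does not appear in the formula for \(r\).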

Properties of the Correlation Coefficient, r

To summarize, some important properties of the correlation coefficient, \(r\):

  1. \(-1\le r\le 1\), i.e. \(r\) takes values between -1 and +1, inclusive.
  2. The sign of \(r\) gives the direction of the linear relationship: it indicates whether the two variables are positively or negatively related.
  3. A correlation of 0 means there is no linear relationship; a nonlinear relationship may still exist.
  4. There are no units attached to \(r\).
  5. The closer the magnitude of \(r\) is to 1, the stronger the linear relationship.
  6. The closer the magnitude of \(r\) is to 0, the weaker the linear relationship.
  7. If we fit the simple linear regression model between Y and X, then \(r\) has the same sign as the estimate of \(\beta_1\), which is the coefficient of X in the linear regression equation (more on this later).
  8. The correlation value is the same regardless of which variable we label X and which we label Y.
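Several of these properties can be checked numerically. The sketch below is illustrative only: the helper names `pearson_r` and `ls_slope` and all data values are assumptions made for the example, and the slope is the usual least-squares estimate, not something defined in this section.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient of paired lists x and y."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ls_slope(x, y):
    """Least-squares slope from regressing y on x."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx

# Hypothetical data with a strong positive linear trend
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)

# Property 8: swapping the roles of X and Y leaves r unchanged
r_swapped = pearson_r(y, x)

# Property 4: r has no units, so a change of units in X does not alter it
x_rescaled = [100 * a + 32 for a in x]
r_rescaled = pearson_r(x_rescaled, y)

# Property 2: reversing the direction of one variable flips the sign of r
r_flipped = pearson_r(x, [-b for b in y])

# Property 7: r and the fitted slope share the same sign
slope = ls_slope(x, y)

# Property 3: y is completely determined by x here, yet r is 0
# because the relationship is not linear
x_sym = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_sq = [a ** 2 for a in x_sym]
r_curved = pearson_r(x_sym, y_sq)

print(r, r_swapped, r_rescaled, r_flipped, slope, r_curved)
```

The last case is a reminder that \(r\) measures only the linear part of a relationship, so a scatterplot is still worth inspecting alongside it.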