# 7.2 - Correlation

## The Correlation Coefficient

If we want to provide a measure of the strength of the linear relationship between home and away batting averages (two quantitative variables), a good way is to report the correlation coefficient between them.

The sample correlation coefficient is typically denoted as $$r$$. It is also known as Pearson’s $$r$$. The population correlation coefficient is generally denoted as $$\rho$$, pronounced “rho.”

Sample Correlation Coefficient

The sample correlation coefficient, $$r$$, is calculated using the following formula:

$$r=\dfrac{\sum (x_i-\bar{x})(y_i-\bar{y}) }{\sqrt{\sum (x_i-\bar{x})^2}\sqrt{\sum (y_i-\bar{y})^2}}$$
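
The formula can be translated almost term by term into code. The following is a minimal sketch in plain Python; the function name `pearson_r` and the `home` and `away` values are made up for illustration and are not data from this lesson.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient from the deviation-score formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # Deviation scores: x_i - x_bar and y_i - y_bar
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    # Numerator: sum of the products of paired deviations
    numerator = sum(a * b for a, b in zip(dx, dy))
    # Denominator: product of the square roots of the sums of squared deviations
    denominator = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("r is undefined when either variable has zero variance")
    return numerator / denominator

# Hypothetical batting averages, invented for this example only
home = [0.270, 0.285, 0.250, 0.300, 0.265]
away = [0.255, 0.270, 0.245, 0.280, 0.260]

r = pearson_r(home, away)
print(round(r, 3))
```

The zero-variance check is a design choice of this sketch: if every value of one variable is identical, the denominator is 0 and $$r$$ is not defined.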

If you have a solid foundation in the material covered in this course up to this point, you should notice that the terms $$x_i-\bar{x}$$ and $$y_i-\bar{y}$$ are simple deviation scores. As you should know, deviation scores are the starting point for calculating the variance of a variable. In the same way, the sum of the products of paired deviation scores in the numerator is the starting point for the covariance of the two variables. Thus the correlation coefficient is simply a standardized covariance: the denominator standardizes the value by dividing the covariance by the product of the standard deviations of the two variables. Dividing both the numerator and the denominator of the formula by $$n-1$$ makes this explicit: $$r=\dfrac{s_{xy}}{s_x s_y}$$, where $$s_{xy}$$ is the sample covariance and $$s_x$$ and $$s_y$$ are the sample standard deviations.

#### Properties of the Correlation Coefficient, $$r$$

To summarize, some important properties of the correlation coefficient, $$r$$:

1. $$-1\le r\le 1$$, i.e., $$r$$ takes values between -1 and +1, inclusive.
2. The sign of $$r$$ gives the direction of the linear relationship: it indicates whether the two variables are positively or negatively related.
3. A correlation of 0 means there is no linear relationship. The variables may still be related in a nonlinear way.
4. There are no units attached to $$r$$.
5. The closer the magnitude of $$r$$ is to 1, the stronger the linear relationship.
6. The closer the magnitude of $$r$$ is to 0, the weaker the linear relationship.
7. If we fit the simple linear regression model between Y and X, then $$r$$ has the same sign as $$\beta_1$$, the coefficient of X in the linear regression equation (more on this later).
8. The correlation is the same regardless of which variable we define as X and which as Y.
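
Several of these properties can be checked numerically. The sketch below uses made-up data and two hypothetical helper functions, `pearson_r` and `slope` (the least-squares estimate of $$\beta_1$$); it illustrates the properties on one data set and is not a proof.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient from sums of deviation products."""
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def slope(xs, ys):
    """Least-squares slope estimate for the regression of y on x."""
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Invented values for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)                            # property 8: swap X and Y
r_rescaled = pearson_r([100 * v for v in x], y)   # property 4: change the units of x
r_flipped = pearson_r(x, [-v for v in y])         # property 2: reverse the direction
b1 = slope(x, y)                                  # property 7: compare signs

# Property 3: a perfect curved relationship (v = u squared) with no linear trend
u = [-2, -1, 0, 1, 2]
v = [a * a for a in u]
r_curve = pearson_r(u, v)

print(r_xy, r_yx, r_rescaled, r_flipped, b1, r_curve)
```

Here `r_xy`, `r_yx`, and `r_rescaled` agree up to floating-point rounding, `r_flipped` has the opposite sign, `b1` has the same sign as `r_xy`, and `r_curve` is 0 even though `v` is completely determined by `u`.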
