9.1.2 - Correlation
To measure the strength of the linear relationship between two quantitative variables, a good approach is to report the correlation coefficient between them.
The sample correlation coefficient is typically denoted as \(r\). It is also known as Pearson’s \(r\). The population correlation coefficient is generally denoted as \(\rho\), pronounced “rho.”
Sample Correlation Coefficient
The sample correlation coefficient, \(r\), is calculated using the following formula:
\( r=\dfrac{\sum (x_i-\bar{x})(y_i-\bar{y}) }{\sqrt{\sum (x_i-\bar{x})^2}\sqrt{\sum (y_i-\bar{y})^2}} \)
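As a minimal sketch, the formula above can be translated term by term into Python. The function name `pearson_r` and the toy data are illustrative choices, not part of the original lesson.

```python
from math import sqrt

def pearson_r(x, y):
    """Sample correlation coefficient r for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    # Numerator: sum of cross-products of deviations from the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator pieces: sums of squared deviations
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return s_xy / (sqrt(s_xx) * sqrt(s_yy))

# Made-up data lying exactly on the line y = 2x + 1, so r should be 1
# (up to floating-point rounding).
r_toy = pearson_r([1, 2, 3, 4], [3, 5, 7, 9])
print(r_toy)
```

The guard for zero variation is a design choice in this sketch: when all values of one variable are equal, the denominator is 0 and \(r\) is not defined.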
Properties of the correlation coefficient, \(r\):
- \(-1\le r\le 1\), i.e. \(r\) takes values between -1 and +1, inclusive.
- The sign of \(r\) gives the direction of the linear relationship: positive when the two variables tend to increase together, negative when one tends to decrease as the other increases.
- A correlation of 0 means there is no linear relationship.
- There are no units attached to \(r\).
- The closer the magnitude of \(r\) is to 1, the stronger the linear relationship.
- The closer the magnitude of \(r\) is to 0, the weaker the linear relationship.
- If we fit the simple linear regression model between Y and X, then \(r\) has the same sign as the estimated slope, \(\hat{\beta}_1\), which is the coefficient of X in the fitted regression equation (more on this later).
- The correlation value is the same regardless of which variable we define as X and which as Y.
Note! The correlation is unit free. We can see this more easily using the equation above. Consider, for example, that we are interested in the correlation between X = height (inches) and Y = weight (pounds). In the equation above, the numerator would have units of \(\text{pounds}\times\text{inches}\). The denominator is the product of the square root of a sum in inches squared and the square root of a sum in pounds squared, which again has units of \(\text{pounds}\times\text{inches}\). Therefore the units cancel out.
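The unit-free and symmetry properties can be checked numerically. The sketch below uses invented height and weight values (they are not data from this lesson) and a small helper, `pearson_r`, written for this illustration.

```python
from math import sqrt, isclose

def pearson_r(x, y):
    """Sample correlation coefficient, following the formula above."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / (sqrt(s_xx) * sqrt(s_yy))

# Hypothetical measurements, invented for this illustration only.
height_in = [62, 65, 68, 70, 73]
weight_lb = [120, 140, 155, 165, 190]

# The same measurements expressed in different units.
height_cm = [h * 2.54 for h in height_in]
weight_kg = [w * 0.45359237 for w in weight_lb]

r_imperial = pearson_r(height_in, weight_lb)
r_metric = pearson_r(height_cm, weight_kg)
r_swapped = pearson_r(weight_lb, height_in)

print(isclose(r_imperial, r_metric))   # True: changing units leaves r unchanged
print(isclose(r_imperial, r_swapped))  # True: swapping X and Y leaves r unchanged
```

Multiplying a variable by a positive constant scales the numerator and the denominator by the same factor, which is why the two values agree.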
Visualizing Correlation
The following four graphs illustrate four possible situations for the values of r. Pay particular attention to graph (d), which shows a strong relationship between y and x but where r = 0. Note that the absence of a linear relationship does not imply that no relationship exists!
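A small numerical example makes the same point. The data below are hypothetical (they are not the data behind graph (d)): y is exactly \(x^2\), so y is completely determined by x, yet the correlation is 0 because the relationship is not linear.

```python
from math import sqrt

def pearson_r(x, y):
    """Sample correlation coefficient, following the formula above."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / (sqrt(s_xx) * sqrt(s_yy))

# Hypothetical data: x is symmetric about 0 and y = x^2.
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]

r_parabola = pearson_r(x, y)
print(r_parabola)  # 0.0
```

The positive and negative cross-products cancel exactly here because the x values are symmetric about their mean.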
Example 9-2: Sales and Advertising (Correlation)
We have collected five months of sales and advertising data for a small company we own. Sales are in thousands of dollars, and advertising is in hundreds of dollars. We want to determine whether a linear relationship exists between sales and advertising. The data are as follows:
Sales (Y) | Advertising (X) |
---|---|
1 | 1 |
1 | 2 |
2 | 3 |
2 | 4 |
4 | 5 |
Find the sample correlation and interpret the value.
\(y_i-\bar{y}\) | \(x_i-\bar{x}\) | \((x_i-\bar{x})(y_i-\bar{y})\) |
---|---|---|
\(1-2=-1\) | \(1-3=-2\) | \((-1)(-2)=2\) |
\(1-2=-1\) | \(2-3=-1\) | \((-1)(-1)=1\) |
\(2-2=0\) | \(3-3=0\) | \((0)(0)=0\) |
\(2-2=0\) | \(4-3=1\) | \((0)(1)=0\) |
\(4-2=2\) | \(5-3=2\) | \((2)(2)=4\) |
From the table we can calculate the following sums...
\(\sum(y_i-\bar{y})^2=(-1)^2+(-1)^2+0+0+2^2=6 \;\text{(sum of first column)}\)
\(\sum(x_i-\bar{x})^2=(-2)^2+(-1)^2+0+1^2+2^2=10 \;\text{(sum of second column)}\)
\(\sum(x_i-\bar{x})(y_i-\bar{y})=2+1+0+0+4=7 \;\text{(sum of third column)}\)
Using these numbers in the formula for r...
\(r=\dfrac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum(x_i-\bar{x})^2}\sqrt{\sum(y_i-\bar{y})^2}}=\dfrac{7}{\sqrt{10}\sqrt{6}}=0.9037\)
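The hand calculation above can be reproduced step by step in Python. This is a sketch that mirrors the table columns; the variable names are chosen for this illustration.

```python
from math import sqrt

sales = [1, 1, 2, 2, 4]        # Y, in thousands of dollars
advertising = [1, 2, 3, 4, 5]  # X, in hundreds of dollars

y_bar = sum(sales) / len(sales)              # 2.0
x_bar = sum(advertising) / len(advertising)  # 3.0

# Deviations from the means (first two columns of the table)
dy = [y - y_bar for y in sales]
dx = [x - x_bar for x in advertising]

s_yy = sum(d ** 2 for d in dy)             # 6.0
s_xx = sum(d ** 2 for d in dx)             # 10.0
s_xy = sum(a * b for a, b in zip(dx, dy))  # 7.0 (sum of third column)

r = s_xy / (sqrt(s_xx) * sqrt(s_yy))
print(round(r, 4))  # 0.9037
```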
Using Minitab to calculate r
To calculate r using Minitab:
- Open Minitab and enter the data. For this example, type the Y data into one column (e.g., C1) and the X data into another (e.g., C2).
- Choose Stat > Basic Statistics > Correlation
- Specify the response and explanatory variables in the dialog box (Y and X in this example).
Minitab output for this example:
Correlation: Y, X
Correlations
Pearson correlation | 0.904 |
P-value | 0.035 |
The sample correlation is 0.904. This value indicates a strong positive linear relationship between sales and advertising.
Try it!
Using the following data, calculate the correlation and interpret the value.
X | Y |
---|---|
2 | 7 |
4 | 11 |
14 | 29 |
13 | 28 |
15 | 32 |
The mean of \(X\) is 9.6 and the mean of \(Y\) is 21.4. The sums are...
\(\sum (x_i-\bar{x})^2=149.2\)
\(\sum (y_i-\bar{y})^2=529.2\)
\(\sum (x_i-\bar{x})(y_i-\bar{y})=280.8\)
Using these sums in the formula for r...
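\(r=\dfrac{\sum(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum(x_i-\bar{x})^2}\sqrt{\sum(y_i-\bar{y})^2}}=\dfrac{280.8}{\sqrt{149.2}\sqrt{529.2}}=0.9993\)
This value indicates a very strong positive linear relationship between X and Y.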
Following the steps for finding correlation with Minitab, you should get the following output:
Pearson correlation | 0.999 |
P-value | 0.000 |
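The means, sums, and correlation reported in this solution can also be checked with a short script. The helper `correlation_sums` is a name made up for this sketch; it returns the three sums that appear in the formula for r.

```python
from math import sqrt

def correlation_sums(x, y):
    """Return (s_xx, s_yy, s_xy), the three sums used in the formula for r."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return s_xx, s_yy, s_xy

x = [2, 4, 14, 13, 15]
y = [7, 11, 29, 28, 32]

s_xx, s_yy, s_xy = correlation_sums(x, y)
r = s_xy / (sqrt(s_xx) * sqrt(s_yy))

# Rounded to absorb floating-point noise in the last digits.
print(round(s_xx, 1), round(s_yy, 1), round(s_xy, 1))  # 149.2 529.2 280.8
print(round(r, 4))                                     # 0.9993
```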