# 9.1.2 - Correlation

If we want to provide a measure of the strength of the linear relationship between two quantitative variables, a good way is to report the correlation coefficient between them.

The sample correlation coefficient is typically denoted as $$r$$. It is also known as Pearson’s $$r$$. The population correlation coefficient is generally denoted as $$\rho$$, pronounced “rho.”

Sample Correlation Coefficient

The sample correlation coefficient, $$r$$, is calculated using the following formula:

$$r=\dfrac{\sum (x_i-\bar{x})(y_i-\bar{y}) }{\sqrt{\sum (x_i-\bar{x})^2}\sqrt{\sum (y_i-\bar{y})^2}}$$
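The formula above can be translated almost term by term into code. The following is a minimal sketch, not a reference implementation: the function name `pearson_r` and the small data set of points lying exactly on a line are assumptions made for illustration only.

```python
from math import sqrt

def pearson_r(x, y):
    """Sample correlation coefficient r for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    # Numerator: sum of cross-products of deviations from the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator pieces: sums of squared deviations.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return s_xy / (sqrt(s_xx) * sqrt(s_yy))

# Made-up points lying exactly on a line, used only to check the extremes.
x = [1, 2, 3, 4]
r_up = pearson_r(x, [2 * xi + 1 for xi in x])     # rising line
r_down = pearson_r(x, [10 - 3 * xi for xi in x])  # falling line
print(r_up, r_down)
```

For points that fall exactly on a rising or falling line, the two printed values should be +1 and -1, up to floating-point rounding.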

## Properties of the correlation coefficient, $$r$$

1. $$-1\le r\le 1$$, i.e. $$r$$ takes values between -1 and +1, inclusive.
2. The sign of the correlation provides the direction of the linear relationship. The sign indicates whether the two variables are positively or negatively related.
3. A correlation of 0 means there is no linear relationship.
4. There are no units attached to $$r$$.
5. The closer the magnitude of $$r$$ is to 1, the stronger the linear relationship.
6. The closer the magnitude of $$r$$ is to 0, the weaker the linear relationship.
7. If we fit the simple linear regression model between Y and X, then $$r$$ has the same sign as the estimate of $$\beta_1$$, the coefficient of X in the linear regression equation (more on this later).
8. The correlation value would be the same regardless of which variable we defined as X and Y.

Note! The correlation is unit free. We can see this more easily using the equation above. Consider, for example, that we are interested in the correlation between X = height (inches) and Y = weight (pounds). In the equation above, the numerator would have units of $$\text{pounds}\times\text{inches}$$. The denominator multiplies the square root of inches squared by the square root of pounds squared, which again gives units of $$\text{pounds}\times\text{inches}$$. Therefore, the units cancel out.
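The unit-free property, together with property 8, can be checked numerically. The sketch below assumes a hypothetical set of five height and weight measurements (invented for this illustration, not taken from the lesson) and a helper named `pearson_r` that implements the formula for $$r$$.

```python
from math import sqrt

def pearson_r(x, y):
    """Sample correlation coefficient from sums of deviations."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / (sqrt(s_xx) * sqrt(s_yy))

# Hypothetical measurements, invented for this illustration only.
height_in = [62, 65, 68, 70, 74]       # inches
weight_lb = [120, 140, 155, 160, 190]  # pounds

# Same observations expressed in centimeters and kilograms.
height_cm = [h * 2.54 for h in height_in]
weight_kg = [w * 0.45359237 for w in weight_lb]

r_imperial = pearson_r(height_in, weight_lb)
r_metric = pearson_r(height_cm, weight_kg)
r_swapped = pearson_r(weight_lb, height_in)  # roles of X and Y exchanged
print(r_imperial, r_metric, r_swapped)
```

All three printed values should agree up to floating-point rounding: rescaling the units multiplies the numerator and the denominator by the same factor, and exchanging X and Y leaves the formula unchanged.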

## Visualizing Correlation

The following four graphs illustrate four possible situations for the values of $$r$$. Pay particular attention to graph (d), which shows a strong relationship between y and x even though $$r = 0$$. The absence of a linear relationship does not imply that no relationship exists!

a) $$r > 0$$

b) $$r < 0$$

c) $$r = 0$$

d) $$r=0$$
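A situation like graph (d) can be reproduced with a small numeric sketch. The data below are an assumption chosen for illustration (a parabola over x values symmetric about zero, which is not necessarily the shape plotted in the graph); `pearson_r` is a hypothetical helper implementing the formula for $$r$$.

```python
from math import sqrt

def pearson_r(x, y):
    """Sample correlation coefficient from sums of deviations."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / (sqrt(s_xx) * sqrt(s_yy))

# y is completely determined by x, but the relationship is curved, not linear.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

r = pearson_r(x, y)
print(r)
```

Here the positive and negative cross-products cancel, so $$r$$ comes out as zero even though y is an exact function of x.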

## Example 9-2: Sales and Advertising (Correlation)

We have collected five months of sales and advertising dollars for a small company we own. Sales units are in thousands of dollars, and advertising units are in hundreds of dollars. Our interest is determining if a linear relationship exists between sales and advertising. The data is as follows:

| Sales (Y) | Advertising (X) |
|---|---|
| 1 | 1 |
| 1 | 2 |
| 2 | 3 |
| 2 | 4 |
| 4 | 5 |

Find the sample correlation and interpret the value.

The mean of sales (Y) is $$\bar{y}=2$$ and the mean of advertising (X) is $$\bar{x}=3$$. We can calculate the sample correlation in steps.

| $$y_i-\bar{y}$$ | $$x_i-\bar{x}$$ | $$(x_i-\bar{x})(y_i-\bar{y})$$ |
|---|---|---|
| $$1-2=-1$$ | $$1-3=-2$$ | $$(-1)(-2)=2$$ |
| $$1-2=-1$$ | $$2-3=-1$$ | $$(-1)(-1)=1$$ |
| $$2-2=0$$ | $$3-3=0$$ | $$(0)(0)=0$$ |
| $$2-2=0$$ | $$4-3=1$$ | $$(0)(1)=0$$ |
| $$4-2=2$$ | $$5-3=2$$ | $$(2)(2)=4$$ |

From the table we can calculate the following sums...

$$\sum(y_i-\bar{y})^2=(-1)^2+(-1)^2+0+0+2^2=6 \;\text{(sum of first column)}$$

$$\sum(x_i-\bar{x})^2=(-2)^2+(-1)^2+0+1^2+2^2=10 \;\text{(sum of second column)}$$

$$\sum(x_i-\bar{x})(y_i-\bar{y})=2+1+0+0+4=7 \;\text{(sum of third column)}$$

Using these numbers in the formula for r...

$$r=\dfrac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum(x_i-\bar{x})^2}\sqrt{\sum(y_i-\bar{y})^2}}=\dfrac{7}{\sqrt{10}\sqrt{6}}=0.9037$$
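The hand calculation can be double-checked with a short script that follows the same steps: deviations, the three sums, then the ratio. This is only a sketch; the variable names are illustrative choices, and the data are the five months of sales and advertising from this example.

```python
from math import sqrt

sales = [1, 1, 2, 2, 4]        # Y, in thousands of dollars
advertising = [1, 2, 3, 4, 5]  # X, in hundreds of dollars

y_bar = sum(sales) / len(sales)
x_bar = sum(advertising) / len(advertising)

# The three columns of the calculation table.
dev_y = [y - y_bar for y in sales]
dev_x = [x - x_bar for x in advertising]
cross = [dx * dy for dx, dy in zip(dev_x, dev_y)]

# Column sums used in the formula for r.
s_yy = sum(d ** 2 for d in dev_y)
s_xx = sum(d ** 2 for d in dev_x)
s_xy = sum(cross)

r = s_xy / (sqrt(s_xx) * sqrt(s_yy))
print(s_yy, s_xx, s_xy, round(r, 4))  # prints: 6.0 10.0 7.0 0.9037
```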

Using Minitab to calculate r

To calculate r using Minitab:

1. Open Minitab and enter the data. For this example, type the Y data into one column (e.g., C1) and the X data into another column (e.g., C2).
2. Choose Stat > Basic Statistics > Correlation
3. Specify the response and explanatory variables in the dialog box (Y and X in this example).

Minitab output for this example:

#### Correlation: Y,X

##### Correlations

| Pearson correlation | P-value |
|---|---|
| 0.904 | 0.035 |

The sample correlation is 0.904. This value indicates a strong positive linear relationship between sales and advertising.

Note! Minitab also provides a p-value. We will discuss this p-value and the test later in the Lesson.

## Try it!

Using the following data, calculate the correlation and interpret the value.

| X | Y |
|---|---|
| 2 | 7 |
| 4 | 11 |
| 14 | 29 |
| 13 | 28 |
| 15 | 32 |

#### By Hand

The mean of $$X$$ is 9.6 and the mean of $$Y$$ is 21.4. The sums are...

$$\sum (x_i-\bar{x})^2=149.2$$

$$\sum (y_i-\bar{y})^2=529.2$$

$$\sum (x_i-\bar{x})(y_i-\bar{y})=280.8$$

Using these sums in the formula for r...

$$r=\dfrac{\sum(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum(x_i-\bar{x})^2}\sqrt{\sum(y_i-\bar{y})^2}}=\dfrac{280.8}{\sqrt{149.2}\sqrt{529.2}}=0.9993$$
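As a quick check of the sums and of $$r$$ for this exercise, the deviations can be computed in a few lines. This is a sketch only; the helper name `deviation_sums` is a hypothetical choice for this illustration.

```python
from math import sqrt

def deviation_sums(x, y):
    """Return (S_xx, S_yy, S_xy): the sums of squared deviations and cross-products."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return s_xx, s_yy, s_xy

x = [2, 4, 14, 13, 15]
y = [7, 11, 29, 28, 32]

s_xx, s_yy, s_xy = deviation_sums(x, y)
r = s_xy / (sqrt(s_xx) * sqrt(s_yy))
print(round(s_xx, 1), round(s_yy, 1), round(s_xy, 1), round(r, 4))
```

The rounded values should match the sums listed above and the hand-calculated correlation of 0.9993.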

#### Using Minitab

Following the steps for finding the correlation with Minitab, you should get the following output.

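#### Correlation: Y,X

##### Correlations

| Pearson correlation | P-value |
|---|---|
| 0.999 | 0.000 |

The sample correlation is 0.999. This value indicates a very strong positive linear relationship between X and Y.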