12.2  Correlation
In this course we have been using Pearson's \(r\) as a measure of the correlation between two quantitative variables. In a sample, we use the symbol \(r\). In a population, we use the symbol \(\rho\) ("rho").
Pearson's \(r\) can easily be computed using Minitab Express. However, understanding the conceptual formula may help you to better understand the meaning of a correlation coefficient.
 Pearson's \(r\): Conceptual Formula

\(r=\dfrac{\sum{z_x z_y}}{n-1}\)
where \(z_x=\dfrac{x - \overline{x}}{s_x}\) and \(z_y=\dfrac{y - \overline{y}}{s_y}\)
When we replace \(z_x\) and \(z_y\) with the \(z\) score formulas and move the \(n-1\) to a separate fraction we get the formula in your textbook: \(r=\dfrac{1}{n-1}\sum{\left(\dfrac{x-\overline x}{s_x}\right) \left( \dfrac{y-\overline y}{s_y}\right)}\)
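To make the conceptual formula concrete, here is a minimal Python sketch that standardizes each variable and averages the products of the \(z\) scores over \(n-1\). The function name `pearson_r` and the small data set are made up for illustration; they are not part of the course materials.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson's r via the conceptual formula: sum of z_x * z_y over n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 denominator)
    z_x = [(x - x_bar) / s_x for x in xs]
    z_y = [(y - y_bar) / s_y for y in ys]
    return sum(zx * zy for zx, zy in zip(z_x, z_y)) / (n - 1)


# Hypothetical data, made up for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))  # 0.7746
```

Because each \(z\) score is already divided by its standard deviation, the result has no units.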
If conducting a test by hand, a \(t\) test statistic with \(df=n-2\) is computed: \(t=\dfrac{r - \rho_{0}}{\sqrt{\dfrac{1-r^2}{n-2}}} \)
In this course you will never need to compute \(r\) or the test statistic by hand; we will always use Minitab Express to perform these calculations.
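For readers curious about what the software does with the test statistic formula, the following Python sketch evaluates it directly. The function name `correlation_t_statistic` and the input values are hypothetical, and the default \(\rho_0=0\) reflects the null hypothesis used in this lesson.

```python
from math import sqrt


def correlation_t_statistic(r, n, rho_0=0.0):
    """t statistic for testing H0: rho = rho_0, with df = n - 2."""
    if n <= 2:
        raise ValueError("n must be greater than 2")
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    standard_error = sqrt((1 - r ** 2) / (n - 2))
    return (r - rho_0) / standard_error


# Hypothetical values: r = 0.5 from a sample of n = 27 pairs
t = correlation_t_statistic(0.5, 27)
df = 27 - 2
print(round(t, 3), df)  # 2.887 25
```

Turning \(t\) into a p-value requires the \(t\) distribution with \(df=n-2\), which is the part left to the statistical software here.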
Minitab Express – Computing Pearson's r
We previously created a scatterplot of quiz averages and final exam scores and observed a linear relationship. Here, we will compute the correlation between these two variables.
 Open the data set:
 On a PC: Select STATISTICS > Correlation > Correlation
 On a Mac: Select Statistics > Regression > Correlation
 Double-click Quiz_Average and Final in the box on the left to insert them into the Variables box
 Click OK
This should result in the following output:
Pearson correlation of Quiz_Average and Final = 0.608630 
P-Value = <0.0001
Properties of Pearson's r
 \(-1\leq r \leq +1\)
 For a positive association, \(r>0\); for a negative association, \(r<0\); if there is no relationship, \(r=0\)
 The closer \(r\) is to 0 the weaker the relationship and the closer to +1 or -1 the stronger the relationship (e.g., \(r=-.88\) is a stronger relationship than \(r=+.60\)); the sign of the correlation provides direction only
 Correlation is unit free; the \(x\) and \(y\) variables do NOT need to be on the same scale (e.g., it is possible to compute the correlation between height in centimeters and weight in pounds)
 It does not matter which variable you label as \(x\) and which you label as \(y\). The correlation between \(x\) and \(y\) is equal to the correlation between \(y\) and \(x\).
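The properties listed above can be checked numerically. The sketch below uses made-up heights and weights (not course data) and a hypothetical helper `pearson_r` to show that changing units or swapping \(x\) and \(y\) leaves \(r\) unchanged, while reversing the direction of one variable only flips the sign.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson's r as the sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum((x - x_bar) / s_x * (y - y_bar) / s_y
               for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical heights (cm) and weights (lb), made up for illustration
height_cm = [150.0, 160.0, 165.0, 172.0, 180.0, 191.0]
weight_lb = [110.0, 130.0, 128.0, 155.0, 170.0, 200.0]

height_in = [h / 2.54 for h in height_cm]        # same heights in inches
weight_kg = [w * 0.45359237 for w in weight_lb]  # same weights in kilograms

r_original = pearson_r(height_cm, weight_lb)
r_rescaled = pearson_r(height_in, weight_kg)               # unit free
r_swapped = pearson_r(weight_lb, height_cm)                # x and y exchanged
r_flipped = pearson_r(height_cm, [-w for w in weight_lb])  # direction reversed

print(abs(r_original - r_rescaled) < 1e-9)
print(abs(r_original - r_swapped) < 1e-9)
print(abs(r_original + r_flipped) < 1e-9)
```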
The following table may serve as a guideline when evaluating correlation coefficients:
Absolute Value of \(r\)  Strength of the Relationship
0 – 0.2  Very weak
0.2 – 0.4  Weak
0.4 – 0.6  Moderate
0.6 – 0.8  Strong
0.8 – 1.0  Very strong
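As an illustration of how the guideline table might be applied, the sketch below maps a correlation to its label. The function name is hypothetical, and because the table does not say which category a boundary value such as 0.4 belongs to, the code assumes it goes in the higher category.

```python
def strength_of_relationship(r):
    """Map |r| to the guideline labels from the table.

    Assumption: a value exactly on a boundary (e.g. |r| = 0.4) is placed
    in the higher category; the table does not specify this.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and +1")
    size = abs(r)  # the sign gives direction only, not strength
    if size < 0.2:
        return "Very weak"
    if size < 0.4:
        return "Weak"
    if size < 0.6:
        return "Moderate"
    if size < 0.8:
        return "Strong"
    return "Very strong"


print(strength_of_relationship(0.608630))  # Strong
print(strength_of_relationship(-0.88))     # Very strong
```

Labels like these are rules of thumb; they describe the size of \(r\) and say nothing about statistical significance.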
12.2.1  Hypothesis Testing
In testing the statistical significance of the relationship between two quantitative variables we will use the five-step hypothesis testing procedure:
Step 1: Check assumptions and write hypotheses
In order to use Pearson's \(r\), both variables must be quantitative and the relationship between \(x\) and \(y\) must be linear.
Research Question  Is the correlation in the population different from 0?  Is the correlation in the population positive?  Is the correlation in the population negative?
Null Hypothesis, \(H_{0}\)  \(\rho=0\)  \(\rho= 0\)  \(\rho = 0\)
Alternative Hypothesis, \(H_{a}\)  \(\rho \neq 0\)  \(\rho > 0\)  \(\rho< 0\)
Type of Hypothesis Test  Two-tailed, non-directional  Right-tailed, directional  Left-tailed, directional
Step 2: Calculate the test statistic
Use Minitab Express to compute \(r\).
Step 3: Determine the p-value
Minitab Express will give you the p-value for a two-tailed test (i.e., \(H_a: \rho \neq 0\)). If you are conducting a one-tailed test and the sample correlation is in the direction stated in \(H_a\), divide the p-value in the output by 2.
Step 4: Make a decision
If \(p \leq \alpha\), reject the null hypothesis; there is evidence of a relationship in the population.
If \(p>\alpha\), fail to reject the null hypothesis; there is not evidence of a relationship in the population.
Step 5: State a conclusion
Based on your decision in Step 4, write a conclusion in terms of the original research question.
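The p-value adjustment and the decision rule can be sketched in a few lines of Python. The function names, the example numbers, and the default \(\alpha = 0.05\) are all assumptions made for illustration. The sketch also covers the case where the sample correlation falls in the direction opposite to \(H_a\), in which the one-tailed p-value is 1 minus half of the two-tailed value rather than half of it.

```python
def one_tailed_p(two_tailed_p, r, alternative):
    """Convert a two-tailed p-value to a one-tailed one.

    alternative: "greater" for H_a: rho > 0, "less" for H_a: rho < 0.
    """
    if alternative not in ("greater", "less"):
        raise ValueError('alternative must be "greater" or "less"')
    same_direction = (r > 0) if alternative == "greater" else (r < 0)
    half = two_tailed_p / 2
    # If the sample r points away from H_a, the one-tailed p-value is large
    return half if same_direction else 1 - half


def decide(p, alpha=0.05):
    """Compare the p-value to the significance level (alpha is an assumed default)."""
    return "reject H0" if p <= alpha else "fail to reject H0"


# Hypothetical output: r = 0.31 with a two-tailed p-value of 0.08
p_right = one_tailed_p(0.08, 0.31, "greater")
print(p_right, decide(p_right))  # 0.04 reject H0
print(decide(0.08))              # fail to reject H0
```

The same sample can lead to different decisions under a one-tailed and a two-tailed test, which is why the direction of the test should be chosen before looking at the data.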
12.2.1.1  Video Example: Quiz & Exam Scores
This example uses the quiz and exam scores dataset.
12.2.1.2  Example: Age & Height
Data concerning body measurements from 507 adults were retrieved from body.dat.txt; for more information see body.txt. In this example we will use the variables of age (in years) and height (in centimeters).
Research question: Is there a relationship between age and height in adults?
Age (in years) and height (in centimeters) are both quantitative variables. From the scatterplot below we can see that the relationship is linear (or at least not nonlinear).
\(H_0: \rho = 0\)
\(H_a: \rho \neq 0\)
From Minitab Express:
Pearson correlation of Height (cm) and Age = 0.067883 
P-Value = 0.1269
\(r=0.067883\)
\(p=.1269\)
\(p > \alpha\), therefore we fail to reject the null hypothesis.
There is not evidence of a relationship between age and height in the population from which this sample was drawn.
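As a check on the software output, the by-hand test statistic formula from earlier in this lesson can be applied to this example. This assumes all 507 adults have both an age and a height recorded, so \(n=507\) and \(df=505\); the rounded intermediate values are worked out here and are not part of the Minitab Express output.
\(t=\dfrac{r-\rho_0}{\sqrt{\dfrac{1-r^2}{n-2}}}=\dfrac{0.067883-0}{\sqrt{\dfrac{1-0.067883^2}{505}}}\approx\dfrac{0.067883}{0.044397}\approx 1.529\)
A \(t\) statistic of about 1.53 with 505 degrees of freedom is consistent with the reported two-tailed p-value of 0.1269.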
12.2.1.3  Example: Temperature & Coffee Sales
Data concerning sales at a student-run cafe were retrieved from cafedata.xls; more information about this data set is available at cafedata.txt. Let's determine if there is a statistically significant relationship between the maximum daily temperature and coffee sales.
Maximum daily temperature and coffee sales are both quantitative variables. From the scatterplot below we can see that the relationship is linear.
\(H_0: \rho = 0\)
\(H_a: \rho \neq 0\)
Pearson correlation of Max Daily Temperature (F) and Coffees = 0.741302 
P-Value = <0.0001
\(r=0.741302\)
\(p<.0001\)
\(p \leq \alpha\), therefore we reject the null hypothesis.
There is evidence of a relationship between the maximum daily temperature and coffee sales in the population.
12.2.2  Correlation Matrix
When examining correlations for more than two variables (i.e., more than one pair), correlation matrices are commonly used. In Minitab Express, if you request the correlations between three or more variables at once, your output will contain a correlation matrix with all of the possible pairwise correlations. For each pair of variables, Pearson's r will be given along with the p value. The following pages include examples of interpreting correlation matrices.
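To illustrate what "all of the possible pairwise correlations" means, the following Python sketch computes \(r\) for every pair among three variables. The data values and the helper `pearson_r` are made up for illustration, and the sketch leaves out the p values that Minitab Express reports alongside each correlation.

```python
from itertools import combinations
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson's r as the sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum((x - x_bar) / s_x * (y - y_bar) / s_y
               for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical measurements for three variables (made up for illustration)
data = {
    "Age": [22, 25, 31, 38, 45, 52],
    "Weight": [60.0, 72.0, 68.0, 80.0, 77.0, 85.0],
    "Height": [165.0, 180.0, 170.0, 178.0, 172.0, 176.0],
}

# One entry per unordered pair of variables, like the lower triangle of a matrix
matrix = {
    (a, b): pearson_r(data[a], data[b])
    for a, b in combinations(data, 2)
}

for (a, b), r in matrix.items():
    print(f"{a} vs {b}: r = {r:.3f}")
```

Only one entry per pair is needed because the correlation between \(x\) and \(y\) equals the correlation between \(y\) and \(x\).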
12.2.2.1  Video Example: Student Survey
This example uses the StudentSurvey.MTW dataset from the Lock^{5} textbook.
12.2.2.2  Example: Body Correlation Matrix
This correlation matrix was constructed using the body dataset. These data are from the Journal of Statistics Education data archive.
Six variables were used: age, weight (kg), height (cm), hip girth, abdominal girth, and wrist girth.
Cell contents grouped by Age, Weight, Height, Hip Girth, and Abdominal Girth; First row: Pearson correlation, Following row: P-Value
                 Age        Weight (kg)  Height (cm)  Hip Girth  Abdominal Girth
Weight (kg)      0.207265
                 <0.0001
Height (cm)      0.067883   0.717301
                 0.1269     <0.0001
Hip Girth        0.227080   0.762969     0.338584
                 <0.0001    <0.0001      <0.0001
Abdominal Girth  0.422188   0.711816     0.313197     0.825892
                 <0.0001    <0.0001      <0.0001      <0.0001
Wrist Girth      0.192024   0.816488     0.690834     0.458857   0.435420
                 <0.0001    <0.0001      <0.0001      <0.0001    <0.0001
This correlation matrix presents 15 different correlations. For each of the 15 pairs of variables, the top box contains the Pearson's r correlation coefficient and the bottom box contains the p value.
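The count of 15 follows from choosing 2 variables out of 6, which can be confirmed with a short Python sketch. The variable labels are copied from the matrix; the code itself is only an illustration.

```python
from itertools import combinations
from math import comb

variables = [
    "Age", "Weight (kg)", "Height (cm)",
    "Hip Girth", "Abdominal Girth", "Wrist Girth",
]

# Every unordered pair of distinct variables appears once in the matrix
pairs = list(combinations(variables, 2))
print(len(pairs), comb(len(variables), 2))  # 15 15
```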
The correlation between age and weight is \(r=0.207265\). This correlation is statistically significant (\(p<0.0001\)). That is, there is evidence of a relationship between age and weight in the population.
The correlation between age and height is \(r=0.067883\). This correlation is not statistically significant (\(p=0.1269\)). There is not evidence of a relationship between age and height in the population.
The correlation between weight and height is \(r=0.717301\). This correlation is statistically significant (\(p<0.0001\)). That is, there is evidence of a relationship between weight and height in the population.
And so on.