12.2  Correlation
In this course, we have been using Pearson's \(r\) as a measure of the correlation between two quantitative variables. In a sample, we use the symbol \(r\). In a population, we use the symbol \(\rho\) ("rho").
Pearson's \(r\) can easily be computed using Minitab. However, understanding the conceptual formula may help you to better understand the meaning of a correlation coefficient.
 Pearson's \(r\): Conceptual Formula

\(r=\dfrac{\sum{z_x z_y}}{n-1}\)
where \(z_x=\dfrac{x - \overline{x}}{s_x}\) and \(z_y=\dfrac{y - \overline{y}}{s_y}\)
When we replace \(z_x\) and \(z_y\) with the \(z\) score formulas and move the \(n-1\) to a separate fraction, we get the formula in your textbook: \(r=\dfrac{1}{n-1}\sum{\left(\dfrac{x-\overline x}{s_x}\right) \left( \dfrac{y-\overline y}{s_y}\right)}\)
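Although Minitab performs this calculation for us, the conceptual formula can be sketched in a few lines of Python. This is only an illustration using the standard library; the function name `pearson_r` and the five data points are made up and are not part of the course files.

```python
import math

def pearson_r(x, y):
    """Pearson's r as the sum of z-score products divided by n - 1."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample standard deviations (divide by n - 1)
    s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
    # z-score of every observation on each variable
    z_x = [(xi - mean_x) / s_x for xi in x]
    z_y = [(yi - mean_y) / s_y for yi in y]
    return sum(zx * zy for zx, zy in zip(z_x, z_y)) / (n - 1)

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))  # 0.775
```

Observations that are above (or below) the mean on both variables contribute positive products, which is why a positive association produces a positive \(r\).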
In this course you will never need to compute \(r\) by hand; we will always use Minitab to perform these calculations.
Minitab® – Computing Pearson's r
We previously created a scatterplot of quiz averages and final exam scores and observed a linear relationship. Here, we will compute the correlation between these two variables.
 Open the Minitab file: Exam.mpx
 Select Stat > Basic Statistics > Correlation
 Double-click Quiz_Average and Final in the box on the left to insert them into the Variables box
 Click OK
This should result in the following output:
Method
Correlation type Pearson
Number of rows used 50
Correlation
Quiz_Average  

Final  0.609 
Properties of Pearson's r
 \(-1\leq r \leq +1\)
 For a positive association, \(r>0\); for a negative association, \(r<0\); if there is no linear relationship, \(r=0\)
 The closer \(r\) is to 0, the weaker the relationship, and the closer to +1 or -1, the stronger the relationship (e.g., \(r=-.88\) is a stronger relationship than \(r=+.60\)); the sign of the correlation provides direction only
 Correlation is unit free; the \(x\) and \(y\) variables do NOT need to be on the same scale (e.g., it is possible to compute the correlation between height in centimeters and weight in pounds)
 It does not matter which variable you label as \(x\) and which you label as \(y\). The correlation between \(x\) and \(y\) is equal to the correlation between \(y\) and \(x\).
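Two of these properties, that correlation is unit free and that it does not depend on which variable is labeled \(x\), can be checked numerically. The sketch below uses made-up heights and weights (not data from this course) and a small helper named `pearson_r` written just for this illustration.

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson's r from z-scores, using sample standard deviations."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    s_x, s_y = statistics.stdev(x), statistics.stdev(y)
    products = [((a - mean_x) / s_x) * ((b - mean_y) / s_y) for a, b in zip(x, y)]
    return sum(products) / (len(x) - 1)

# Hypothetical heights (cm) and weights (lb), for illustration only
height_cm = [160.0, 165.0, 170.0, 175.0, 180.0, 185.0]
weight_lb = [120.0, 150.0, 140.0, 165.0, 160.0, 190.0]

r_cm = pearson_r(height_cm, weight_lb)

# Same heights expressed in inches instead of centimeters
height_in = [h / 2.54 for h in height_cm]
r_in = pearson_r(height_in, weight_lb)

# Same data with the roles of x and y swapped
r_swapped = pearson_r(weight_lb, height_cm)

print(math.isclose(r_cm, r_in))       # True: changing units leaves r unchanged
print(math.isclose(r_cm, r_swapped))  # True: r(x, y) equals r(y, x)
```

Both checks work because each variable is converted to \(z\) scores before the products are taken, and \(z\) scores do not change when a variable is rescaled.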
The following table may serve as a guideline when evaluating correlation coefficients:
Absolute Value of \(r\)  Strength of the Relationship 

0 to 0.2  Very weak 
0.2 to 0.4  Weak 
0.4 to 0.6  Moderate 
0.6 to 0.8  Strong 
0.8 to 1.0  Very strong 
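The guideline table can be written as a small lookup function. This is a sketch only: the function name `strength_label` is hypothetical, and because the ranges in the table share their endpoints, the code assumes that a value exactly on a boundary (for example 0.2) goes to the stronger category.

```python
def strength_label(r):
    """Return the guideline label for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and +1")
    size = abs(r)  # strength depends on the absolute value only
    # Assumption: boundary values are assigned to the stronger category
    if size < 0.2:
        return "Very weak"
    if size < 0.4:
        return "Weak"
    if size < 0.6:
        return "Moderate"
    if size < 0.8:
        return "Strong"
    return "Very strong"

print(strength_label(0.609))  # Strong
print(strength_label(-0.45))  # Moderate (the sign gives direction only)
print(strength_label(0.068))  # Very weak
```

These labels are descriptive guidelines; they say nothing about statistical significance, which depends on the sample size as well.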
12.2.1  Hypothesis Testing
In testing the statistical significance of the relationship between two quantitative variables, we will use the five-step hypothesis testing procedure:
In order to use Pearson's \(r\), both variables must be quantitative and the relationship between \(x\) and \(y\) must be linear.
Research Question  Is the correlation in the population different from 0?  Is the correlation in the population positive?  Is the correlation in the population negative? 

Null Hypothesis, \(H_{0}\)  \(\rho=0\)  \(\rho= 0\)  \(\rho = 0\) 
Alternative Hypothesis, \(H_{a}\)  \(\rho \neq 0\)  \(\rho > 0\)  \(\rho< 0\) 
Type of Hypothesis Test  Two-tailed, non-directional  Right-tailed, directional  Left-tailed, directional 
Minitab will not provide the test statistic for correlation. It will provide the sample statistic, \(r\), along with the p-value (for step 3).
Optional: If you are conducting a test by hand, a \(t\) test statistic is computed in step 2 using the following formula:
\(t=\dfrac{r- \rho_{0}}{\sqrt{\dfrac{1-r^2}{n-2}}} \)
In step 3, a \(t\) distribution with \(df=n-2\) is used to obtain the p-value.
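As a quick illustration of the optional by-hand formula, the sketch below plugs in the values from the quiz and final exam output (\(r=0.609\) from 50 rows). The function name `correlation_t` is made up for this example, and the default \(\rho_0=0\) matches the null hypotheses used in this lesson.

```python
import math

def correlation_t(r, n, rho_0=0.0):
    """t test statistic and degrees of freedom for a sample correlation."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if abs(r) >= 1:
        raise ValueError("r must be strictly between -1 and +1")
    df = n - 2
    t = (r - rho_0) / math.sqrt((1 - r ** 2) / df)
    return t, df

t, df = correlation_t(0.609, 50)
print(round(t, 2), df)  # 5.32 48
```

Note that the same \(r\) produces a larger \(t\) when \(n\) is larger, which is why even a weak correlation can be statistically significant in a large sample.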
Minitab will give you the p-value for a two-tailed test (i.e., \(H_a: \rho \neq 0\)). If you are conducting a one-tailed test, you will need to divide the p-value in the output by 2, provided the sample correlation is in the direction stated in \(H_a\).
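The conversion from a two-tailed to a one-tailed p-value can be sketched as below. Halving applies when the sample correlation falls in the direction of the alternative hypothesis; when it falls in the opposite direction, the one-tailed p-value is 1 minus half of the two-tailed value. The function name `one_tailed_p` and the example numbers are hypothetical.

```python
def one_tailed_p(p_two_tailed, r, alternative):
    """Convert a two-tailed p-value for H0: rho = 0 to a one-tailed p-value.

    alternative is "greater" for Ha: rho > 0 or "less" for Ha: rho < 0.
    """
    if alternative not in ("greater", "less"):
        raise ValueError('alternative must be "greater" or "less"')
    if not 0.0 <= p_two_tailed <= 1.0:
        raise ValueError("a p-value must be between 0 and 1")
    half = p_two_tailed / 2
    # Is the sample correlation on the side named by the alternative?
    in_direction = r > 0 if alternative == "greater" else r < 0
    return half if in_direction else 1 - half

# Hypothetical output: r = 0.25 with a two-tailed p-value of 0.08
print(round(one_tailed_p(0.08, 0.25, "greater"), 3))  # 0.04
print(round(one_tailed_p(0.08, 0.25, "less"), 3))     # 0.96
```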
If \(p \leq \alpha\), reject the null hypothesis; there is evidence of a relationship in the population.
If \(p>\alpha\), fail to reject the null hypothesis; there is not enough evidence of a relationship in the population.
Based on your decision in Step 4, write a conclusion in terms of the original research question.
12.2.1.1  Example: Quiz & Exam Scores
Is there a relationship between students' quiz averages in a course and their final exam scores in the course?
Let's use the five-step hypothesis testing procedure to address this research question.
In order to use Pearson's \(r\) both variables must be quantitative and the relationship between \(x\) and \(y\) must be linear. We can use Minitab to create the scatterplot using the file: Exam.mpx
Note that when creating the scatterplot it does not matter which variable you place on the x axis and which on the y axis. The resulting scatterplot shows a fairly linear relationship.
Our hypotheses:
 Null Hypothesis, \(H_{0}\): \(\rho=0\)
 Alternative Hypothesis, \(H_{a}\): \(\rho\ne0\)
Use Minitab to compute \(r\) and the p-value.
 Open the file in Minitab
 Select Stat > Basic Statistics > Correlation
 Enter the columns Quiz_Average and Final in the Variables box
 Select the Results button and check the Pairwise correlation table in the new window
 Click OK and OK
Pairwise Pearson Correlations
Sample 1  Sample 2  N  Correlation  95% CI for \(\rho\)  P-Value 

Final  Quiz_Average  50  0.609  (0.398, 0.758)  0.000 
Our sample statistic is \(r = 0.609\).
From our output, the p-value is reported as 0.000, which means \(p<0.001\).
Because \(p \leq \alpha\), we reject the null hypothesis.
There is evidence of a relationship between students' quiz averages and their final exam scores in the population.
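Steps 2 through 4 of this example can be reproduced from the summary values alone (\(r=0.609\), \(n=50\)). The sketch below is not how Minitab computes its output; it is a standard-library illustration that obtains the two-tailed p-value by numerically integrating the \(t\) density with Simpson's rule. The function names are made up, and \(\alpha=0.05\) is an assumed significance level.

```python
import math

def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def two_tailed_p(t, df, steps=20000):
    """Two-tailed p-value: 1 minus the area between -|t| and |t|."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The density is symmetric, so double the area from 0 to |t|
    central_area = 2 * total * h / 3
    return max(0.0, 1 - central_area)

def correlation_test(r, n):
    """Return (t, df, two-tailed p) for H0: rho = 0."""
    df = n - 2
    t = r / math.sqrt((1 - r * r) / df)
    return t, df, two_tailed_p(t, df)

ALPHA = 0.05  # assumed significance level
t, df, p = correlation_test(0.609, 50)
print(f"t = {t:.2f}, df = {df}, p < 0.001: {p < 0.001}")
print("reject H0" if p <= ALPHA else "fail to reject H0")
```

The p-value is far below 0.001, which is why Minitab displays it as 0.000 after rounding to three decimal places.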
12.2.1.2  Example: Age & Height
Data concerning body measurements from 507 adults were retrieved from body.dat.txt; for more information, see body.txt. In this example, we will use only the variables age (in years) and height (in centimeters).
For the full data set and variable descriptions, see those original files.
For this example, you can use the following Minitab file: body.dat.mpx
Research question: Is there a relationship between age and height in adults?
Age (in years) and height (in centimeters) are both quantitative variables. From the scatterplot below we can see that the relationship is linear (or at least not nonlinear).
\(H_0: \rho = 0\)
\(H_a: \rho \neq 0\)
From Minitab:
Pairwise Pearson Correlations
Sample 1  Sample 2  N  Correlation  95% CI for \(\rho\)  P-Value 

Height (cm)  Age (years)  507  0.068  (-0.019, 0.154)  0.127 
\(r=0.068\)
\(p=.127\)
\(p > \alpha\) therefore we fail to reject the null hypothesis.
There is not enough evidence of a relationship between age and height in the population from which this sample was drawn.
12.2.1.3  Example: Temperature & Coffee Sales
Data concerning sales at a student-run cafe were retrieved from cafedata.xls; more information about this data set is available at cafedata.txt. Let's determine if there is a statistically significant relationship between the maximum daily temperature and coffee sales.
For this example, you can use the following Minitab file: cafedata.mpx
Maximum daily temperature and coffee sales are both quantitative variables. From the scatterplot below we can see that the relationship is linear.
\(H_0: \rho = 0\)
\(H_a: \rho \neq 0\)
From Minitab:
Pairwise Pearson Correlations
Sample 1  Sample 2  N  Correlation  95% CI for \(\rho\)  P-Value 

Max Daily Temperature (F)  Coffees  47  -0.741  (-0.848, -0.577)  0.000 
\(r=-0.741\)
\(p=.000\)
\(p \leq \alpha\) therefore we reject the null hypothesis.
There is evidence of a relationship between the maximum daily temperature and coffee sales in the population.
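The Minitab output also lists a 95% confidence interval for \(\rho\). The sketch below shows one common way to build such an interval, the Fisher \(z\) transformation. Whether Minitab uses exactly this method is an assumption here, although the approach gives approximately \((-0.848, -0.577)\) for the temperature and coffee sales values \(r=-0.741\) and \(n=47\). The function name `fisher_ci` is made up for this illustration.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, confidence=0.95):
    """Approximate confidence interval for rho via the Fisher z transformation."""
    if n < 4:
        raise ValueError("need at least 4 observations")
    if abs(r) >= 1:
        raise ValueError("r must be strictly between -1 and +1")
    z = math.atanh(r)            # transform r to the z scale
    se = 1 / math.sqrt(n - 3)    # standard error on the z scale
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Build the interval on the z scale, then transform back to the r scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

low, high = fisher_ci(-0.741, 47)
print(f"({low:.3f}, {high:.3f})")  # approximately (-0.848, -0.577)
```

Because the entire interval lies below 0, it agrees with the decision to reject \(H_0: \rho = 0\) in this example.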
12.2.2  Correlation Matrix
When examining correlations for more than two variables (i.e., more than one pair), correlation matrices are commonly used. In Minitab, if you request the correlations between three or more variables at once, your output will contain a correlation matrix with all of the possible pairwise correlations. For each pair of variables, Pearson's \(r\) will be given along with the p-value. The following pages include examples of interpreting correlation matrices.
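To make the idea of a correlation matrix concrete, the sketch below computes Pearson's \(r\) for every pair of variables in a small data set. The measurements are made up for illustration and do not come from any course file; `pearson_r` is a helper written for this example.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson's r computed from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical measurements for five people, for illustration only
data = {
    "Height": [160, 172, 168, 181, 175],
    "Weight": [55, 70, 64, 85, 72],
    "Pulse": [78, 70, 74, 62, 68],
}

# One correlation for every possible pair of variables
matrix = {
    (a, b): pearson_r(data[a], data[b])
    for a, b in combinations(data, 2)
}

for (a, b), r in matrix.items():
    print(f"{a} vs {b}: r = {r:.3f}")
```

Only one triangle of the matrix needs to be stored because the correlation between \(x\) and \(y\) equals the correlation between \(y\) and \(x\).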
12.2.2.1  Example: Student Survey
Construct a correlation matrix to examine the relationship between how many hours per week students exercise, their heights, their weights, and their resting pulse rates.
This example uses the 'StudentSurvey' dataset from the Lock⁵ textbook. The data were collected from a sample of 362 college students.
To construct a correlation matrix in Minitab...
 Open the Minitab file: StudentSurvey.mpx
 Select Stat > Basic Statistics > Correlation
 Enter the variables Exercise, Height, Weight and Pulse into the Variables box
 Select the Graphs... button and select Correlations and p-values from the dropdown
 Select the Results... button and verify that the Correlation matrix and the Pairwise correlation table boxes are checked
 Click OK and OK
This should result in the following output:
Correlation
  Exercise  Height  Weight 

Height  0.118  
Weight  0.118  0.619  
Pulse  -0.172  -0.082  -0.120 
Pairwise Pearson Correlations
Sample 1  Sample 2  N  Correlation  95% CI for ρ  P-Value 

Height  Exercise  354  0.118  (0.014, 0.220)  0.026 
Weight  Exercise  356  0.118  (0.015, 0.220)  0.026 
Pulse  Exercise  361  -0.172  (-0.271, -0.071)  0.001 
Weight  Height  352  0.619  (0.551, 0.680)  0.000 
Pulse  Height  355  -0.082  (-0.184, 0.023)  0.124 
Pulse  Weight  357  -0.120  (-0.221, -0.016)  0.023 
Interpretation
When we look at the matrix graph or the pairwise Pearson correlations table, we see that we have six possible pairwise combinations (every possible pairing of the four variables). Let's say we wanted to examine the relationship between exercise and height. We would find the row in the pairwise Pearson correlations table where these two variables are listed for sample 1 and sample 2. In this case, that is the first row. The correlation between exercise and height is 0.118 and the p-value is 0.026.
If we were conducting a hypothesis test for this relationship, these would be steps 2 and 3 in the five-step process.
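The count of six pairwise combinations can be verified by listing every pairing of the four variables. The short sketch below does this with Python's standard library; the order of the two variables within a pair may differ from the Sample 1 and Sample 2 columns in the Minitab output.

```python
import math
from itertools import combinations

variables = ["Exercise", "Height", "Weight", "Pulse"]

# Every unordered pair of distinct variables
pairs = list(combinations(variables, 2))
for first, second in pairs:
    print(first, "and", second)

print(len(pairs))                     # 6
print(math.comb(len(variables), 2))   # 6, from the formula k(k - 1)/2
```

In general, \(k\) variables produce \(k(k-1)/2\) pairwise correlations, so the number of pairs grows quickly as variables are added.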
12.2.2.2  Example: Body Correlation Matrix
Construct a correlation matrix using the variables age (years), weight (kg), height (cm), hip girth, navel (or abdominal) girth, and wrist girth.
This example is using the body dataset. These data are from the Journal of Statistics Education data archive.
For this example, you can use the following Minitab file: body.dat.mpx
To construct a correlation matrix in Minitab...
 Open the Minitab file: body.dat.mpx
 Select Stat > Basic Statistics > Correlation
 Enter the variables Age (years), Weight (Kg), Height (cm), Hip girth at level of bitrochan, Navel (or "Abdominal") girth, and Wrist minimum girth into the Variables box
 Select the Graphs... button and select Correlations and p-values from the dropdown
 Select the Results... button and verify that the Correlation matrix and Pairwise correlation table boxes are checked
 Click OK and OK
This should result in the following partial output:
Pairwise Pearson Correlations
Sample 1  Sample 2  N  Correlation  95% CI for ρ  P-Value 

Weight (Kg)  Age (years)  507  0.207  (0.122, 0.289)  0.000 
Height (cm)  Age (years)  507  0.068  (-0.019, 0.154)  0.127 
Hip girth at level of bitrochan  Age (years)  507  0.227  (0.143, 0.308)  0.000 
Navel (or "Abdominal") girth at  Age (years)  507  0.422  (0.348, 0.491)  0.000 
Wrist minimum girth  Age (years)  507  0.192  (0.107, 0.275)  0.000 
Height (cm)  Weight (Kg)  507  0.717  (0.672, 0.757)  0.000 
Hip girth at level of bitrochan  Weight (Kg)  507  0.763  (0.724, 0.797)  0.000 
Navel (or "Abdominal") girth at  Weight (Kg)  507  0.712  (0.666, 0.752)  0.000 
Wrist minimum girth  Weight (Kg)  507  0.816  (0.785, 0.844)  0.000 
Hip girth at level of bitrochan  Height (cm)  507  0.339  (0.259, 0.413)  0.000 
Navel (or "Abdominal") girth at  Height (cm)  507  0.313  (0.232, 0.390)  0.000 
Wrist minimum girth  Height (cm)  507  0.691  (0.642, 0.734)  0.000 
Navel (or "Abdominal") girth at  Hip girth at level of bitrochan  507  0.826  (0.796, 0.852)  0.000 
Wrist minimum girth  Hip girth at level of bitrochan  507  0.459  (0.387, 0.525)  0.000 
Wrist minimum girth  Navel (or "Abdominal") girth at  507  0.435  (0.362, 0.503)  0.000 
Cell contents grouped by Age, Weight, Height, Hip Girth, and Abdominal Girth; First row: Pearson correlation, Following row: P-Value
This correlation matrix presents 15 different correlations. For each of the 15 pairs of variables, the 'Correlation' column contains the Pearson's \(r\) correlation coefficient and the last column contains the p-value.
The correlation between age and weight is \(r=0.207\). This correlation is statistically significant (\(p=0.000\)). That is, there is evidence of a relationship between age and weight in the population.
The correlation between age and height is \(r=0.068\). This correlation is not statistically significant (\(p=0.127\)). There is not enough evidence of a relationship between age and height in the population.
The correlation between weight and height is \(r=0.717\). This correlation is statistically significant (\(p<0.001\)). That is, there is evidence of a relationship between weight and height in the population.
And so on.
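The same interpretation can be applied to every row of the pairwise table in a loop. The sketch below copies the three rows discussed above from the Minitab output and applies the decision rule; \(\alpha=0.05\) is an assumed significance level, and the function name `interpret` is made up for this illustration.

```python
ALPHA = 0.05  # assumed significance level

# (sample 1, sample 2, r, p-value as reported) copied from the output above
rows = [
    ("Weight (Kg)", "Age (years)", 0.207, 0.000),
    ("Height (cm)", "Age (years)", 0.068, 0.127),
    ("Height (cm)", "Weight (Kg)", 0.717, 0.000),
]

def interpret(row, alpha=ALPHA):
    """Apply the decision rule to one row of the pairwise table."""
    sample_1, sample_2, r, p = row
    if p <= alpha:
        verdict = "evidence of a relationship"
    else:
        verdict = "not enough evidence of a relationship"
    return f"{sample_1} and {sample_2}: r = {r:.3f}, p = {p:.3f}, {verdict}"

for row in rows:
    print(interpret(row))
```

When many correlations are tested at once, some may be statistically significant by chance alone, so a long list of small p-values should be read with that in mind.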