9.4 - Inference for Correlation
Let’s review the notation for correlation.
\(r\): The sample correlation (Pearson’s correlation)
\(\rho\): “rho” is the population correlation
The sample correlation is found by:
\( r=\dfrac{\sum (x_i-\bar{x})(y_i-\bar{y}) }{\sqrt{\sum (x_i-\bar{x})^2}\sqrt{\sum (y_i-\bar{y})^2}} \)
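As a minimal sketch, the formula above can be computed term by term in plain Python. The function name `pearson_r` is an illustrative choice, and the data are the sales and advertising values that appear later in this section.

```python
import math

def pearson_r(x, y):
    """Sample correlation r computed directly from the defining formula."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    # Numerator: sum of cross-products of deviations from the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator pieces: sums of squared deviations
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / (math.sqrt(s_xx) * math.sqrt(s_yy))

# Sales and advertising values from the example later in this section
advertising = [1, 2, 3, 4, 5]
sales = [1, 1, 2, 2, 4]
r = pearson_r(advertising, sales)
print(round(r, 3))
```

Rounded to three decimals, this should agree with the sample correlation reported for those data.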
In this section, we will present a hypothesis test for the population correlation. Then, we will compare the tests and interpretations for the slope and correlation.
9.4.1 - Hypothesis Testing for the Population Correlation
In this section, we present the test for the population correlation using a test statistic based on the sample correlation.
As with all hypothesis tests, there are underlying assumptions. The assumptions for the test for correlation are:
- There are no outliers in either of the two quantitative variables.
- The two variables should follow a normal distribution.
If there is no linear relationship in the population, then the population correlation would be equal to zero.
\(H_0\colon \rho=0\) (\(X\) and \(Y\) are linearly independent, that is, \(X\) and \(Y\) have no linear relationship)
\(H_a\colon \rho\ne0\) (\(X\) and \(Y\) are linearly dependent)
Research Question | Is there a linear relationship? | Is there a positive linear relationship? | Is there a negative linear relationship? |
---|---|---|---|
Null Hypothesis | \(\rho=0\) | \(\rho=0\) | \(\rho=0\) |
Alternative Hypothesis | \(\rho\ne0\) | \(\rho>0\) | \(\rho<0\) |
Type of Test | Two-tailed, non-directional | Right-tailed, directional | Left-tailed, directional |
Under the null hypothesis and the above assumptions, the test statistic, \(t^*\), is found by:
\(t^*=\dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}\)
which follows a \(t\)-distribution with \(n-2\) degrees of freedom.
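The test statistic and its two-sided p-value can be sketched in plain Python. This is only an illustration, not how Minitab computes the p-value: the tail area is approximated by integrating the \(t\) density with Simpson's rule, and the function names and the values `r = 0.6`, `n = 27` are hypothetical.

```python
import math

def t_statistic(r, n):
    """Test statistic t* = r*sqrt(n-2)/sqrt(1-r^2) for H0: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def t_pdf(t, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def two_sided_p_value(t_star, df, steps=20000):
    """Approximate 2*P(T > |t*|); steps must be even for Simpson's rule."""
    a = abs(t_star)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        # Simpson weights: 4 at odd interior points, 2 at even ones
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3          # approximates P(0 < T < |t*|)
    return max(0.0, 1.0 - 2.0 * central)

# Hypothetical inputs, chosen only to illustrate the calls
r, n = 0.6, 27
t_star = t_statistic(r, n)
p_value = two_sided_p_value(t_star, df=n - 2)
print(t_star, p_value)
```

In practice a statistics package would replace the hand-rolled integration; the stdlib-only version is used here so the sketch runs without extra dependencies.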
As mentioned before, we will use Minitab for the calculations. The output from Minitab previously used to find the sample correlation also provides a p-value. This p-value is for the two-sided test. If the alternative is one-sided, the p-value from the output needs to be adjusted.
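One common way to make that adjustment relies on the symmetry of the \(t\)-distribution: halve the two-sided p-value when the sample correlation points in the direction of the alternative, and use one minus that half otherwise. The sketch below assumes this rule; the function name and the input values are hypothetical.

```python
def one_sided_p_value(two_sided_p, r, alternative):
    """Convert a two-sided p-value for H0: rho = 0 into a one-sided one.

    alternative is "greater" (Ha: rho > 0) or "less" (Ha: rho < 0).
    """
    if alternative not in ("greater", "less"):
        raise ValueError('alternative must be "greater" or "less"')
    half = two_sided_p / 2
    # Does the sign of the sample correlation agree with the alternative?
    agrees = (r > 0) if alternative == "greater" else (r < 0)
    return half if agrees else 1 - half

# Hypothetical values: two-sided p-value 0.04 with a positive sample correlation
p_right = one_sided_p_value(0.04, 0.5, "greater")
p_left = one_sided_p_value(0.04, 0.5, "less")
print(p_right, p_left)
```

With a positive sample correlation, the right-tailed test gets the smaller p-value and the left-tailed test gets a p-value close to 1.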
Example 9-7: Student height and weight (Tests for \(\rho\))
For the height and weight example (university_ht_wt.TXT), conduct a test for correlation with a significance level of 5%.
The output from Minitab is:
Correlation: height, weight
Correlations
Pearson correlation | P-value |
---|---|
0.711 | 0.000 |
For the sake of this example, we will find the test statistic and the p-value rather than just using the Minitab output. There are 28 observations.
The test statistic is:
\begin{align} t^*&=\dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}\\&=\dfrac{(0.711)\sqrt{28-2}}{\sqrt{1-0.711^2}}\\&=5.1556 \end{align}
Next, we need to find the p-value. The p-value for the two-sided test is:
\(\text{p-value}=2P(T>5.1556)<0.0001\)
Therefore, for any reasonable \(\alpha\) level, we can reject the hypothesis that the population correlation coefficient is 0 and conclude that it is nonzero. There is evidence at the 5% level that Height and Weight are linearly dependent.
Try it!
For the sales and advertising example, conduct a test for correlation with a significance level of 5% with Minitab.
Sales units are in thousands of dollars, and advertising units are in hundreds of dollars.
Sales (Y) | Advertising (X) |
---|---|
1 | 1 |
1 | 2 |
2 | 3 |
2 | 4 |
4 | 5 |
Correlation: Y, X
Correlations
Pearson correlation | P-value |
---|---|
0.904 | 0.035 |
The sample correlation is 0.904. This value indicates a strong positive linear relationship between sales and advertising.
For the Sales (Y) and Advertising (X) data, the test statistic is:
\(t^*=\dfrac{(0.904)\sqrt{5-2}}{\sqrt{1-(0.904)^2}}=3.66\)
With \(n-2=3\) degrees of freedom, the two-sided p-value is 0.035. Because this is less than \(\alpha=0.05\), we reject the hypothesis that the population correlation coefficient is 0 and conclude that it is nonzero, i.e., that sales and advertising are linearly dependent.
9.4.2 - Comparing Correlation and Slope
Some of you may have noticed that the hypothesis tests for correlation and slope are very similar. Also, the test statistics for both tests follow the same distribution with the same degrees of freedom, \(n-2\).
This similarity is because the two values are mathematically related. In fact,
\(\hat{\beta}_1=r\dfrac{\sqrt{\sum (y_i-\bar{y})^2}}{\sqrt{\sum(x_i-\bar{x})^2}}\)
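The identity above can be checked numerically. The sketch below uses the sales and advertising data from the previous section and compares the least-squares slope, computed as \(\sum (x_i-\bar{x})(y_i-\bar{y}) / \sum (x_i-\bar{x})^2\), with the value obtained from \(r\); the variable names are illustrative.

```python
import math

x = [1, 2, 3, 4, 5]   # advertising
y = [1, 1, 2, 2, 4]   # sales

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = s_xy / math.sqrt(s_xx * s_yy)

# Least-squares slope computed directly
slope_direct = s_xy / s_xx
# Slope recovered from the correlation, as in the identity above
slope_from_r = r * math.sqrt(s_yy) / math.sqrt(s_xx)

print(slope_direct, slope_from_r)
```

The two values should agree up to floating-point rounding.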
Here is a summary of some of the similarities and differences between the sample correlation and the sample slope.
Similarities
- The test for correlation will lead to the same conclusion as the test for slope.
- The sign of the slope (i.e., negative or positive) will be the same as the sign of the correlation. In other words, both values indicate the direction of the relationship.
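The first similarity can be illustrated with a short sketch: for the sales and advertising data, the \(t\) statistic for the correlation and the \(t\) statistic for the slope come out equal. The slope statistic here is assumed to be the standard simple linear regression form, slope divided by its standard error; the function names are hypothetical.

```python
import math

def sums_of_squares(x, y):
    """Return (S_xx, S_yy, S_xy): squared deviations and cross-products."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xx, s_yy, s_xy

def t_for_correlation(x, y):
    """t* = r*sqrt(n-2)/sqrt(1-r^2), the statistic for H0: rho = 0."""
    s_xx, s_yy, s_xy = sums_of_squares(x, y)
    r = s_xy / math.sqrt(s_xx * s_yy)
    return r * math.sqrt(len(x) - 2) / math.sqrt(1 - r ** 2)

def t_for_slope(x, y):
    """t* = slope / SE(slope), assumed statistic for H0: beta_1 = 0."""
    s_xx, s_yy, s_xy = sums_of_squares(x, y)
    slope = s_xy / s_xx
    sse = s_yy - slope * s_xy      # residual sum of squares
    mse = sse / (len(x) - 2)       # estimate of the error variance
    return slope / math.sqrt(mse / s_xx)

advertising = [1, 2, 3, 4, 5]
sales = [1, 1, 2, 2, 4]
t_corr = t_for_correlation(advertising, sales)
t_slope = t_for_slope(advertising, sales)
print(t_corr, t_slope)
```

Because both statistics are equal and share \(n-2\) degrees of freedom, the two tests give the same p-value.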
Differences
- The value of the correlation indicates the strength of the linear relationship. The value of the slope does not.
- The slope interpretation tells you the change in the response for a one-unit increase in the predictor. Correlation does not have this kind of interpretation.
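A small sketch of these differences: re-expressing advertising in dollars instead of hundreds of dollars changes the slope by a factor of 100 but leaves the correlation unchanged. The helper name `slope_and_r` is hypothetical, and the data are the sales and advertising values from the previous section.

```python
import math

def slope_and_r(x, y):
    """Return the least-squares slope and the sample correlation."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / s_xx, s_xy / math.sqrt(s_xx * s_yy)

sales = [1, 1, 2, 2, 4]                    # thousands of dollars
advertising_hundreds = [1, 2, 3, 4, 5]     # hundreds of dollars
advertising_dollars = [100 * a for a in advertising_hundreds]

slope_h, r_h = slope_and_r(advertising_hundreds, sales)
slope_d, r_d = slope_and_r(advertising_dollars, sales)

# The slope depends on the units of the predictor; the correlation does not
print(slope_h, slope_d)
print(r_h, r_d)
```

The slope answers "how much does the response change per unit of the predictor," so it must change when the unit does, while the unit-free correlation only describes the strength and direction of the linear relationship.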