1.9  Hypothesis Test for the Population Correlation Coefficient
There is one more point we haven't stressed yet in our discussion about the correlation coefficient r and the coefficient of determination \(R^{2}\): the two measures summarize the strength of a linear relationship in samples only. If we obtained a different sample, we would obtain different correlations, different \(R^{2}\) values, and therefore potentially different conclusions. As always, we want to draw conclusions about populations, not just samples. To do so, we either have to conduct a hypothesis test or calculate a confidence interval. In this section, we learn how to conduct a hypothesis test for the population correlation coefficient \(\rho\) (the Greek letter "rho").
In general, a researcher should use the hypothesis test for the population correlation \(\rho\) to learn whether there is a linear association between two variables when it isn't obvious which variable should be regarded as the response. Let's clarify this point with examples of two different research questions.
Consider evaluating whether or not a linear relationship exists between skin cancer mortality and latitude. We will see in Lesson 2 that we can perform either of the following tests:
 t-test for testing \(H_{0} \colon \beta_{1}= 0\)
 ANOVA F-test for testing \(H_{0} \colon \beta_{1}= 0\)
For this example, it is fairly obvious that latitude should be treated as the predictor variable and skin cancer mortality as the response.
By contrast, suppose we want to evaluate whether or not a linear relationship exists between a husband's age and his wife's age (Husband and Wife data). In this case, one could treat the husband's age as the response:
...or one could treat the wife's age as the response:
In cases such as these, we answer our research question concerning the existence of a linear relationship by using the t-test for testing the population correlation coefficient \(H_{0}\colon \rho = 0\).
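To see why the choice of response does not matter here, the following minimal sketch compares the slope t-statistic from regressing the wife's age on the husband's age, the slope t-statistic from the reverse regression, and the correlation test statistic given in Step 2 below. The ages are made up for illustration (they are not the Husband and Wife data), and the helper names `pearson_r` and `slope_t_statistic` are hypothetical.

```python
import math

def _sums(x, y):
    """Return (n, mean_x, mean_y, Sxx, Syy, Sxy) for paired data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return n, mx, my, sxx, syy, sxy

def pearson_r(x, y):
    """Sample correlation coefficient r."""
    _, _, _, sxx, syy, sxy = _sums(x, y)
    return sxy / math.sqrt(sxx * syy)

def slope_t_statistic(x, y):
    """t-statistic for H0: beta1 = 0 when y is regressed on x."""
    n, mx, my, sxx, _, sxy = _sums(x, y)
    b1 = sxy / sxx                      # least squares slope
    b0 = my - b1 * mx                   # least squares intercept
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    mse = sse / (n - 2)                 # error variance estimate
    return b1 / math.sqrt(mse / sxx)    # slope divided by its standard error

# Made-up ages for illustration only (NOT the Husband and Wife data).
husband = [25, 31, 38, 44, 52, 60, 67]
wife = [23, 30, 35, 45, 50, 57, 68]

r = pearson_r(husband, wife)
n = len(husband)
t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
t_wife_response = slope_t_statistic(husband, wife)
t_husband_response = slope_t_statistic(wife, husband)

# All three values agree up to floating-point rounding.
print(t_corr, t_wife_response, t_husband_response)
```

Because the three statistics coincide, the test of \(H_{0}\colon \rho = 0\) gives the same answer whichever variable is treated as the response.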
Let's jump right to it! We follow standard hypothesis test procedures in conducting a hypothesis test for the population correlation coefficient \(\rho\).
Steps for Hypothesis Testing for \(\boldsymbol{\rho}\)

Step 1: Hypotheses
First, we specify the null and alternative hypotheses:
 Null hypothesis \(H_{0} \colon \rho = 0\)
 Alternative hypothesis \(H_{A} \colon \rho \ne 0\) or \(H_{A} \colon \rho < 0\) or \(H_{A} \colon \rho > 0\)

Step 2: Test Statistic
Second, we calculate the value of the test statistic using the following formula:
Test statistic: \(t^*=\dfrac{r\sqrt{n-2}}{\sqrt{1-R^2}}\)
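The formula can be sketched in a few lines of Python. In simple linear regression \(R^{2} = r^{2}\), so the function below takes only r and n. The function name `correlation_t_statistic` and the inputs in the usage line (r = 0.5, n = 27) are hypothetical, chosen only for illustration.

```python
import math

def correlation_t_statistic(r: float, n: int) -> float:
    """Return t* = r * sqrt(n - 2) / sqrt(1 - r**2) for a sample correlation r."""
    if n < 3:
        raise ValueError("need at least 3 pairs so that n - 2 > 0")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# Hypothetical inputs: r = 0.5 observed on n = 27 pairs.
t_star = correlation_t_statistic(0.5, 27)
print(round(t_star, 3))  # 2.887
```

Note that the statistic grows with both the size of r and the sample size n, so a modest correlation can still be statistically significant in a large sample.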

Step 3: P-Value
Third, we use the resulting test statistic to calculate the P-value. As always, the P-value is the answer to the question "how likely is it that we’d get a test statistic t* as extreme as we did if the null hypothesis were true?" The P-value is determined by referring to a t-distribution with n - 2 degrees of freedom.
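As a rough illustration of this step, the sketch below approximates the tail area of a t-distribution by numerical integration using only the Python standard library. It is a stand-in for what statistical software does, not a production routine; in practice one would call a library function such as `scipy.stats.t.sf`. The function names and the labels "two-sided", "greater", and "less" for the three alternatives are assumptions made for this example.

```python
import math

def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """Approximate P(T > t) for a t-distribution with df degrees of freedom.

    The substitution t = sqrt(df) * tan(theta) turns the tail area into
    c * integral of cos(theta)**(df - 1) from theta0 to pi/2, which is
    evaluated with Simpson's rule (steps must be even).
    """
    if df < 1:
        raise ValueError("df must be at least 1")
    if t < 0:
        return 1.0 - t_upper_tail(-t, df, steps)  # symmetry about zero
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    const /= math.sqrt(math.pi)
    theta0 = math.atan(t / math.sqrt(df))
    h = (math.pi / 2 - theta0) / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        # Clamp at zero to guard against rounding just past pi/2.
        total += weight * max(0.0, math.cos(theta0 + i * h)) ** (df - 1)
    return const * total * h / 3

def p_value(t_star: float, df: int, alternative: str = "two-sided") -> float:
    """P-value for H0: rho = 0 given the test statistic and df = n - 2."""
    if alternative == "two-sided":   # H_A: rho != 0
        return min(1.0, 2.0 * t_upper_tail(abs(t_star), df))
    if alternative == "greater":     # H_A: rho > 0
        return t_upper_tail(t_star, df)
    if alternative == "less":        # H_A: rho < 0
        return t_upper_tail(-t_star, df)
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")

# Hypothetical inputs: t* = 2.887 with n - 2 = 25 degrees of freedom.
print(p_value(2.887, 25))
```

The two-sided P-value doubles the one-tail area because values of t* far from zero in either direction count as evidence against the null hypothesis.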

Step 4: Decision
Finally, we make a decision:
 If the P-value is smaller than the significance level \(\alpha\), we reject the null hypothesis in favor of the alternative. We conclude that "there is sufficient evidence at the \(\alpha\) level to conclude that there is a linear relationship in the population between the predictor x and response y."
 If the P-value is larger than the significance level \(\alpha\), we fail to reject the null hypothesis. We conclude "there is not enough evidence at the \(\alpha\) level to conclude that there is a linear relationship in the population between the predictor x and response y."
Example 1-5: Husband and Wife Data
Let's perform the hypothesis test on the husband's age and wife's age data in which the sample correlation based on n = 170 couples is r = 0.939. To test \(H_{0} \colon \rho = 0\) against the alternative \(H_{A} \colon \rho \ne 0\), we obtain the following test statistic:
\begin{align} t^*&=\dfrac{r\sqrt{n-2}}{\sqrt{1-R^2}}\\ &=\dfrac{0.939\sqrt{170-2}}{\sqrt{1-0.939^2}}\\ &=35.39\end{align}
To obtain the P-value, we need to compare the test statistic to a t-distribution with 168 degrees of freedom (since 170 - 2 = 168). In particular, we need to find the probability that we'd observe a test statistic more extreme than 35.39, and then, since we're conducting a two-sided test, multiply the probability by 2. Minitab helps us out here:
Student's t distribution with 168 DF
x  P(X<= x) 

35.3900  1.0000 
The output tells us that the probability of getting a test statistic smaller than 35.39 is greater than 0.999. Therefore, the probability of getting a test statistic greater than 35.39 is less than 0.001. As illustrated in the following video, we multiply by 2 and determine that the P-value is less than 0.002.
Since the P-value is small (smaller than 0.05, say), we can reject the null hypothesis. There is sufficient statistical evidence at the \(\alpha = 0.05\) level to conclude that there is a significant linear relationship between a husband's age and his wife's age.
Incidentally, we can let statistical software like Minitab do all of the dirty work for us. In doing so, Minitab reports:
Correlation: WAge, HAge
Pearson correlation of WAge and HAge = 0.939
P-Value = 0.000
Final Note
One final note: as always, we should clarify when it is okay to use the t-test for testing \(H_{0} \colon \rho = 0\). The guidelines are a straightforward extension of the "LINE" assumptions made for the simple linear regression model. It's okay:
 When it is not obvious which variable is the response.
 When the (x, y) pairs are a random sample from a bivariate normal population.
   For each x, the y's are normal with equal variances.
   For each y, the x's are normal with equal variances.
   Either y can be considered a linear function of x,
   or x can be considered a linear function of y.
   The (x, y) pairs are independent.