Lesson 15: Tests Concerning Regression and Correlation

Overview

In the previous lessons, we learned how to calculate point and interval estimates of the intercept and slope parameters, \(\alpha\) and \(\beta\), of a simple linear regression model:

\(Y_i=\alpha+\beta(x_i-\bar{x})+\epsilon_i\)

with the random errors \(\epsilon_i\) following a normal distribution with mean 0 and variance \(\sigma^2\). In this lesson, we'll learn how to conduct a hypothesis test for testing the null hypothesis that the slope parameter equals some value, \(\beta_0\), say. Specifically, we'll learn how to test the null hypothesis \(H_0:\beta=\beta_0\) using a \(t\)-statistic.

Now, perhaps it is not a point that has been emphasized yet, but if you take a look at the form of the simple linear regression model, you'll notice that the response \(Y\)'s are denoted using a capital letter, while the predictor \(x\)'s are denoted using a lowercase letter. That's because, in the simple linear regression setting, we view the predictors as fixed values, whereas we view the responses as random variables whose distributions depend on the corresponding fixed values of \(x\). Suppose instead that we had a situation in which we thought of the pairs \((X_i, Y_i)\), \(i=1, 2, \ldots, n\), as being a random sample from a bivariate normal distribution with parameters \(\mu_X\), \(\mu_Y\), \(\sigma^2_X\), \(\sigma^2_Y\), and \(\rho\). Then, we might be interested in testing the null hypothesis \(H_0:\rho=0\), because we know that if the correlation coefficient of a bivariate normal distribution is 0, then \(X\) and \(Y\) are independent random variables. For this reason, we'll learn not one, but three (!) possible hypothesis tests for testing the null hypothesis that the correlation coefficient is 0. Then, because we haven't yet derived an interval estimate for the correlation coefficient, we'll also take the time to derive an approximate confidence interval for \(\rho\).


15.1 - A Test for the Slope

Once again, we've already done the bulk of the theoretical work in developing a hypothesis test for the slope parameter \(\beta\) of a simple linear regression model: we did it when we developed a \((1-\alpha)100\%\) confidence interval for \(\beta\). We showed then that:

\(T=\dfrac{\hat{\beta}-\beta}{\sqrt{\frac{MSE}{\sum(x_i-\bar{x})^2}}}\)

follows a \(t_{n-2}\) distribution. Therefore, if we're interested in testing the null hypothesis:

\(H_0:\beta=\beta_0\)

against any of the alternative hypotheses:

\(H_A:\beta \neq \beta_0\), \(H_A:\beta < \beta_0\), \(H_A:\beta > \beta_0\)

we can use the test statistic:

\(t=\dfrac{\hat{\beta}-\beta_0}{\sqrt{\frac{MSE}{\sum(x_i-\bar{x})^2}}}\)

and follow the standard hypothesis testing procedures. Let's take a look at an example.

Example 15-1

In alligators' natural habitat, it is typically easier to observe the length of an alligator than its weight. Our data set contains the log weight (\(y\)) and log length (\(x\)) for 15 alligators captured in central Florida. A scatter plot of the data suggests that there is a linear relationship between the response \(y\) and the predictor \(x\). Therefore, a wildlife researcher is interested in fitting the linear model:

\(Y_i=\alpha+\beta x_i+\epsilon_i\)

to the data. She is particularly interested in testing whether there is a relationship between the length and weight of alligators. At the \(\alpha=0.05\) level, perform a test of the null hypothesis \(H_0:\beta=0\) against the alternative hypothesis \(H_A:\beta \neq 0\).

Answer

The easiest way to perform the hypothesis test is to let Minitab do the work for us! Under the Stat menu, we select Regression, then Regression again, and specify the response logW (for log weight) and the predictor logL (for log length). Doing so, we get:

The regression equation is
logW = - 8.48 + 3.43 logL

Predictor         Coef   SE Coef        T      P
Constant       -8.4761    0.5007   -16.93  0.000
logL            3.4311    0.1330    25.80  0.000

Analysis of Variance

Source            DF        SS        MS        F      P
Regression         1    10.064    10.064   665.81  0.000
Residual Error    13     0.196     0.015
Total             14    10.260
Easy as pie! Minitab tells us that the test statistic is \(t=25.80\) with a \(p\)-value, reported as 0.000, that is less than 0.001. Because the \(p\)-value is less than 0.05, we reject the null hypothesis at the 0.05 level. There is sufficient evidence to conclude that the slope parameter does not equal 0. That is, there is sufficient evidence, at the 0.05 level, to conclude that there is a linear relationship, among the population of alligators, between log length and log weight.

Of course, since we are learning this material for just the first time, perhaps we could go through the calculation of the test statistic at least once. Letting Minitab do some of the dirtier calculations for us, such as calculating:

\(\sum(x_i-\bar{x})^2=0.8548\)

as well as determining that \(MSE=0.015\) and that the slope estimate = 3.4311, we get:

\(t=\dfrac{\hat{\beta}-\beta_0}{\sqrt{\frac{MSE}{\sum(x_i-\bar{x})^2}}}=\dfrac{3.4311-0}{\sqrt{0.015/0.8548}}=25.9\)

which is the test statistic that Minitab calculated... well, with just a bit of round-off error.
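By the way, if you'd rather let Python check the arithmetic, here's a minimal sketch (the variable names are ours) that plugs the same summary numbers into the formula:

import math

# Summary quantities copied from the Minitab output above
beta_hat = 3.4311   # estimated slope
beta_0 = 0          # hypothesized slope under the null hypothesis
mse = 0.015         # mean squared error
sxx = 0.8548        # sum of (x_i - xbar)^2

# t-statistic for testing H0: beta = beta_0
t = (beta_hat - beta_0) / math.sqrt(mse / sxx)
print(round(t, 1))  # 25.9, matching the hand calculation above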


15.2 - Three Tests for Rho

The hypothesis test for the slope \(\beta\) that we developed on the previous page was developed under the assumption that a response \(Y\) is a linear function of a nonrandom predictor \(x\). This situation occurs when the researcher has complete control of the values of the variable \(x\). For example, a researcher might be interested in modeling the linear relationship between the temperature \(x\) of an oven and the moistness \(y\) of chocolate chip muffins. In this case, the researcher sets the oven temperatures (in degrees Fahrenheit) to 350, 360, 370, and so on, and then observes the values of the random variable \(Y\), that is, the moistness of the baked muffins. In this case, the linear model:

\(Y_i=\alpha+\beta x_i+\epsilon_i\)

implies that the average moistness:

\(E(Y)=\alpha+\beta x\)

is a linear function of the temperature setting.

There are other situations, however, in which the variable \(x\) is not nonrandom (yes, that's a double negative!), but rather an observed value of a random variable \(X\). For example, a fisheries researcher may want to relate the age \(Y\) of a sardine to its length \(X\). If a linear relationship could be established, then in the future fisheries researchers could predict the age of a sardine simply by measuring its length. In this case, the linear model:

\(Y_i=\alpha+\beta x_i+\epsilon_i\)

implies that the average age of a sardine, given that its length is \(X=x\):

\(E(Y|X=x)=\alpha+\beta x\)

is a linear function of the length. That is, the conditional mean of \(Y\) given \(X=x\) is a linear function. Now, in this second situation, in which both \(X\) and \(Y\) are deemed random, we typically assume that the pairs \((X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)\) are a random sample from a bivariate normal distribution with means \(\mu_X\) and \(\mu_Y\), variances \(\sigma^2_X\) and \(\sigma^2_Y\), and correlation coefficient \(\rho\). If that's the case, it can be shown that the conditional mean:

\(E(Y|X=x)=\alpha+\beta x\)

must be of the form:

\(E(Y|X=x)=\left(\mu_Y-\rho \dfrac{\sigma_Y}{\sigma_X} \mu_X\right)+\left(\rho \dfrac{\sigma_Y}{\sigma_X}\right)x\)

That is:

\(\beta=\rho \dfrac{\sigma_Y}{\sigma_X}\)

Now, for the case where \((X_i, Y_i)\) has a bivariate normal distribution, the researcher may not necessarily be interested in estimating the linear function:

\(E(Y|X=x)=\alpha+\beta x\)

but rather simply knowing whether \(X\) and \(Y\) are independent. In STAT 414, we've learned that if \((X_i, Y_i)\) follows a bivariate normal distribution, then testing for the independence of \(X\) and \(Y\) is equivalent to testing whether the correlation coefficient \(\rho\) equals 0. We'll now work on developing three different hypothesis tests for testing \(H_0:\rho=0\) assuming \((X_i, Y_i)\) follows a bivariate normal distribution.

A T-Test for Rho

Given our wordy prelude above, this test may be the simplest of all of the tests to develop. That's because we argued above that if \((X_i, Y_i)\) follows a bivariate normal distribution, and the conditional mean is a linear function:

\(E(Y|X=x)=\alpha+\beta x\)

then:

\(\beta=\rho \dfrac{\sigma_Y}{\sigma_X}\)

That suggests, therefore, that testing \(H_0:\rho=0\) against any of the alternative hypotheses \(H_A:\rho\neq 0\), \(H_A:\rho> 0\), and \(H_A:\rho< 0\) is equivalent to testing \(H_0:\beta=0\) against the corresponding alternative hypothesis \(H_A:\beta\neq 0\), \(H_A:\beta>0\), and \(H_A:\beta<0\), respectively. That is, we can simply compare the test statistic:

\(t=\dfrac{\hat{\beta}-0}{\sqrt{MSE/\sum(x_i-\bar{x})^2}}\)

to a \(t\) distribution with \(n-2\) degrees of freedom. It should be noted, though, that the test statistic can instead be written as a function of the sample correlation coefficient:

\(R=\dfrac{\dfrac{1}{n-1} \sum\limits_{i=1}^n (X_i-\bar{X}) (Y_i-\bar{Y})}{\sqrt{\dfrac{1}{n-1} \sum\limits_{i=1}^n (X_i-\bar{X})^2} \sqrt{\dfrac{1}{n-1} \sum\limits_{i=1}^n (Y_i-\bar{Y})^2}}=\dfrac{S_{xy}}{S_x S_y}\)

That is, the test statistic can be alternatively written as:

\(t=\dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}\)

and because of its algebraic equivalence to the first test statistic, it too follows a \(t\) distribution with \(n-2\) degrees of freedom. Huh? How are the two test statistics algebraically equivalent? Well, if the following two statements are true:

  1. \(\hat{\beta}=\dfrac{\dfrac{1}{n-1} \sum\limits_{i=1}^n (X_i-\bar{X}) (Y_i-\bar{Y})}{\dfrac{1}{n-1} \sum\limits_{i=1}^n (X_i-\bar{X})^2}=\dfrac{S_{xy}}{S_x^2}=R\dfrac{S_y}{S_x}\)

  2. \(MSE=\dfrac{\sum\limits_{i=1}^n(Y_i-\hat{Y}_i)^2}{n-2}=\dfrac{\sum\limits_{i=1}^n\left[Y_i-\left(\bar{Y}+\dfrac{S_{xy}}{S_x^2} (X_i-\bar{X})\right) \right]^2}{n-2}=\dfrac{(n-1)S_Y^2 (1-R^2)}{n-2}\)

then simple algebra illustrates that the two test statistics are indeed algebraically equivalent:

\(\displaystyle{t=\frac{\hat{\beta}}{\sqrt{\frac{MSE}{\sum (x_i-\bar{x})^2}}} =\frac{r\left(\frac{S_y}{S_x}\right)}{\sqrt{\frac{(n-1)S^2_y(1-r^2)}{(n-2)(n-1)S^2_x}}}=\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}} \)

Now, for the veracity of those two statements? Well, they are indeed true. The first one requires just some simple algebra. The second one requires a bit of trickier algebra that you'll soon be asked to work through for homework.
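If algebra isn't convincing enough, a quick numerical check (a Python sketch with simulated data; all names are ours) shows the two forms of the test statistic agree to machine precision:

import numpy as np

rng = np.random.default_rng(0)
n = 15
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)

# First form: slope estimate divided by its standard error
xbar, ybar = x.mean(), y.mean()
sxx = np.sum((x - xbar) ** 2)
beta_hat = np.sum((x - xbar) * (y - ybar)) / sxx
resid = y - (ybar + beta_hat * (x - xbar))
mse = np.sum(resid ** 2) / (n - 2)
t1 = beta_hat / np.sqrt(mse / sxx)

# Second form: a function of the sample correlation coefficient
r = np.corrcoef(x, y)[0, 1]
t2 = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)

print(np.isclose(t1, t2))  # True: the two statistics are identical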

An R-Test for Rho

It would be nice to use the sample correlation coefficient \(R\) as a test statistic to test more general hypotheses about the population correlation coefficient:

\(H_0:\rho=\rho_0\)

but the probability distribution of \(R\) is difficult to obtain. It turns out though that we can derive a hypothesis test using just \(R\) provided that we are interested in testing the more specific null hypothesis that \(X\) and \(Y\) are independent, that is, for testing \(H_0:\rho=0\).

Theorem

Provided that \(\rho=0\), the probability density function of the sample correlation coefficient \(R\) is:

\(g(r)=\dfrac{\Gamma[(n-1)/2]}{\Gamma(1/2)\Gamma[(n-2)/2]}(1-r^2)^{(n-4)/2}\)

over the support \(-1<r<1\).

Proof

We'll use the distribution function technique, in which we first find the cumulative distribution function \(G(r)\), and then differentiate it to get the desired probability density function \(g(r)\). The cumulative distribution function is:

\(G(r)=P(R \leq r)=P \left(\dfrac{R\sqrt{n-2}}{\sqrt{1-R^2}}\leq \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}\right)=P\left(T \leq \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}\right)\)

The first equality is just the definition of the cumulative distribution function, while the second and third equalities come from the definition of the \(T\) statistic as a function of the sample correlation coefficient \(R\). Now, using what we know of the p.d.f. \(h(t)\) of a \(T\) random variable with \(n-2\) degrees of freedom, we get:

\(G(r)=\int^{\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}}_{-\infty} h(t)dt=\int^{\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}}_{-\infty} \dfrac{\Gamma[(n-1)/2]}{\Gamma(1/2)\Gamma[(n-2)/2]} \dfrac{1}{\sqrt{n-2}}\left(1+\dfrac{t^2}{n-2}\right)^{-\frac{(n-1)}{2}} dt\)

Now, it's just a matter of taking the derivative of the c.d.f. \(G(r)\) to get the p.d.f. \(g(r)\). Using the Fundamental Theorem of Calculus, in conjunction with the chain rule, we get:

\(g(r)=h\left(\dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}\right) \dfrac{d}{dr}\left(\dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}\right)\)

Focusing first on the derivative part of that equation, using the quotient rule, we get:

\(\dfrac{d}{dr}\left[\dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}\right]=\dfrac{(1-r^2)^{1/2} \cdot \sqrt{n-2}-r\sqrt{n-2}\cdot \frac{1}{2}(1-r^2)^{-1/2} \cdot (-2r) }{(\sqrt{1-r^2})^2}\)

Simplifying, we get:

\(\dfrac{d}{dr}\left[\dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}\right]=\sqrt{n-2}\left[ \dfrac{(1-r^2)^{1/2}+r^2 (1-r^2)^{-1/2} }{1-r^2} \right]\)

Now, if we multiply by 1 in a special way, that is, this way:

\(\dfrac{d}{dr}\left[\dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}\right]=\sqrt{n-2}\left[ \dfrac{(1-r^2)^{1/2}+r^2 (1-r^2)^{-1/2} }{1-r^2} \right]\left(\frac{(1-r^2)^{1/2}}{(1-r^2)^{1/2}}\right) \)

and then simplify, we get:

\(\dfrac{d}{dr}\left[\dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}\right]=\sqrt{n-2}\left[ \dfrac{1-r^2+r^2 }{(1-r^2)^{3/2}} \right]=\sqrt{n-2}(1-r^2)^{-3/2}\)

Now, looking back at \(g(r)\), let's work on the \(h(\cdot)\) part. Replacing the one \(t\) that appears in the p.d.f. of a \(T\) random variable with \(n-2\) degrees of freedom with \(\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}\), we get:

\( h\left(\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}\right)= \frac{\Gamma\left(\frac{n-1}{2}\right)}{\Gamma\left(\frac{1}{2}\right)\Gamma\left(\frac{n-2}{2}\right)}\left(\frac{1}{\sqrt{n-2}}\right)\left[1+\frac{\left(\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}\right)^2}{n-2} \right]^{-\frac{n-1}{2}} \)

Canceling a few things out, we get:

\(h\left(\dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}\right)=\dfrac{\Gamma[(n-1)/2]}{\Gamma(1/2)\Gamma[(n-2)/2]}\cdot \dfrac{1}{\sqrt{n-2}}\left(1+\dfrac{r^2}{1-r^2}\right)^{-\frac{(n-1)}{2}}\)

Now, because:

\(\left(1+\dfrac{r^2}{1-r^2}\right)^{-\frac{(n-1)}{2}}=\left(\dfrac{1-r^2+r^2}{1-r^2}\right)^{-\frac{(n-1)}{2}}=\left(\dfrac{1}{1-r^2}\right)^{-\frac{(n-1)}{2}}=(1-r^2)^{\frac{(n-1)}{2}}\)

we finally get:

\(h\left(\dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}\right)=\dfrac{\Gamma[(n-1)/2]}{\Gamma(1/2)\Gamma[(n-2)/2]}\cdot \dfrac{1}{\sqrt{n-2}}(1-r^2)^{\frac{(n-1)}{2}}\)

We're almost there! We just need to multiply the two parts together. Doing so, we get:

\(g(r)=\left[\frac{\Gamma\left(\frac{n-1}{2}\right)}{\Gamma\left(\frac{1}{2}\right)\Gamma\left(\frac{n-2}{2}\right)}\left(\frac{1}{\sqrt{n-2}}\right)(1-r^2)^{\frac{n-1}{2}}\right]\left[\sqrt{n-2}(1-r^2)^{-3/2}\right]\)

which simplifies to:

\(g(r)=\dfrac{\Gamma[(n-1)/2]}{\Gamma(1/2)\Gamma[(n-2)/2]}(1-r^2)^{(n-4)/2}\)

over the support \(-1<r<1\), as was to be proved.
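Before moving on, a quick numerical sanity check (a Python sketch; the function name is ours) confirms that \(g(r)\) integrates to 1 over the support, as any density must. For example, with \(n=10\):

import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln

def g(r, n):
    # p.d.f. of the sample correlation coefficient R under rho = 0, as derived above
    log_const = gammaln((n - 1) / 2) - gammaln(0.5) - gammaln((n - 2) / 2)
    return np.exp(log_const) * (1 - r ** 2) ** ((n - 4) / 2)

total, _ = quad(g, -1, 1, args=(10,))
print(round(total, 6))  # 1.0, so g(r) is a valid density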

Now that we know the p.d.f. of \(R\), testing \(H_0:\rho=0\) against any of the possible alternative hypotheses just involves integrating \(g(r)\) to find the critical value(s) that make \(\alpha\), the probability of a Type I error, equal to the desired level. For example, to test \(H_0:\rho=0\) against the alternative \(H_A:\rho>0\), we find the value \(r_\alpha(n-2)\) such that:

\(P(R \geq r_\alpha(n-2))=\int_{r_\alpha(n-2)}^1 \dfrac{\Gamma[(n-1)/2]}{\Gamma(1/2)\Gamma[(n-2)/2]}(1-r^2)^{\frac{(n-4)}{2}}dr=\alpha\)

Yikes! Do you have any interest in integrating that function? Well, me neither! That's why we'll instead use an \(R\) Table, such as the one we have in Table IX at the back of our textbook.
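Alternatively, since \(T=R\sqrt{n-2}/\sqrt{1-R^2}\) is a monotone function of \(R\), the critical value for \(R\) can be backed out of the familiar \(t\) critical value: solving the equation for \(r\) gives \(r_\alpha=t_\alpha/\sqrt{n-2+t_\alpha^2}\). A short Python sketch (the variable names are ours):

from math import sqrt
from scipy.stats import t

# Two-sided critical value of R for testing H0: rho = 0
n, alpha = 10, 0.05
t_crit = t.ppf(1 - alpha / 2, df=n - 2)        # 2.306 for 8 degrees of freedom
r_crit = t_crit / sqrt(n - 2 + t_crit ** 2)
print(round(r_crit, 4))  # 0.6319, matching Table IX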

An Approximate Z-Test for Rho

Okay, the derivation for this hypothesis test is going to be MUCH easier than the derivation for that last one. That's because we aren't going to derive it at all! We are going to simply state, without proof, the following theorem.

Theorem

The statistic:

\(W=\dfrac{1}{2}\ln\dfrac{1+R}{1-R}\)

follows an approximate normal distribution with mean \(E(W)=\dfrac{1}{2}\ln\dfrac{1+\rho}{1-\rho}\) and variance \(Var(W)=\dfrac{1}{n-3}\).

The theorem, therefore, allows us to test the general null hypothesis \(H_0:\rho=\rho_0\) against any of the possible alternative hypotheses by comparing the test statistic:

\(Z=\dfrac{\dfrac{1}{2}\ln\dfrac{1+R}{1-R}-\dfrac{1}{2}\ln\dfrac{1+\rho_0}{1-\rho_0}}{\sqrt{\dfrac{1}{n-3}}}\)

to a standard normal \(N(0,1)\) distribution.
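In code, the statistic is nearly a one-liner. Here's a small Python helper (the function name and the illustrative numbers are ours) which, unlike the \(t\)- and \(R\)-tests above, also handles a nonzero \(\rho_0\):

import math

def fisher_z(r, rho0, n):
    # Approximate z-statistic for H0: rho = rho0,
    # based on the transformation W in the theorem above
    w = 0.5 * math.log((1 + r) / (1 - r))
    w0 = 0.5 * math.log((1 + rho0) / (1 - rho0))
    return (w - w0) / math.sqrt(1 / (n - 3))

# For example, testing H0: rho = 0.5 when r = 0.6 and n = 28:
print(round(fisher_z(0.6, 0.5, 28), 2))  # 0.72, not significant at the 0.05 level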

What? We've looked at no examples yet on this page? Let's take care of that by closing with an example that utilizes each of the three hypothesis tests we derived above.

Example 15-2


An admissions counselor at a large public university was interested in learning whether freshmen calculus grades are independent of high school math achievement test scores. The sample correlation coefficient between the mathematics achievement test scores and calculus grades for a random sample of \(n=10\) college freshmen was calculated to be \(r=0.84\).

Does this observed sample correlation coefficient suggest, at the \(\alpha=0.05\) level, that the population of freshmen calculus grades are independent of the population of high school math achievement test scores?

Answer

The admissions counselor is interested in testing:

\(H_0:\rho=0\) against \(H_A:\rho \neq 0\)

Using the \(t\)-statistic we derived, we get:

\(t=\dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}=\dfrac{0.84\sqrt{8}}{\sqrt{1-0.84^2}}=4.38\)

We reject the null hypothesis if the test statistic is greater than 2.306 or less than −2.306.


Because \(t=4.38>2.306\), we reject the null hypothesis in favor of the alternative hypothesis. There is sufficient evidence at the 0.05 level to conclude that the population of freshmen calculus grades are not independent of the population of high school math achievement test scores.

Using the R-statistic, with 8 degrees of freedom, Table IX in the back of the book tells us to reject the null hypothesis if the absolute value of \(R\) is greater than 0.6319. Because our observed \(r=0.84>0.6319\), we again reject the null hypothesis in favor of the alternative hypothesis. There is sufficient evidence at the 0.05 level to conclude that freshmen calculus grades are not independent of high school math achievement test scores.

Using the approximate Z-statistic, we get:

\(z=\dfrac{\dfrac{1}{2}\ln\left(\dfrac{1+0.84}{1-0.84}\right)-\dfrac{1}{2}\ln\left(\dfrac{1+0}{1-0}\right)}{\sqrt{1/7}}=3.23\)

In this case, we reject the null hypothesis if the absolute value of \(Z\) is greater than 1.96. It clearly is, and so we again reject the null hypothesis in favor of the alternative hypothesis. There is sufficient evidence at the 0.05 level to conclude that freshmen calculus grades are not independent of high school math achievement test scores.
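All three calculations are easy to reproduce. Here's a Python sketch (variable names ours; scipy supplies the critical values):

import math
from scipy.stats import t as t_dist, norm

r, n, alpha = 0.84, 10, 0.05

# (1) t-test and its two-sided critical value
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
t_crit = t_dist.ppf(1 - alpha / 2, df=n - 2)
print(round(t_stat, 2), round(t_crit, 3))        # 4.38  2.306

# (2) R-test: critical value recovered by inverting the t map
r_crit = t_crit / math.sqrt(n - 2 + t_crit ** 2)
print(round(r_crit, 4))                           # 0.6319

# (3) approximate z-test against the N(0,1) critical value
z_stat = 0.5 * math.log((1 + r) / (1 - r)) / math.sqrt(1 / (n - 3))
print(round(z_stat, 2), round(norm.ppf(1 - alpha / 2), 2))  # 3.23  1.96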


15.3 - An Approximate Confidence Interval for Rho

To develop an approximate \((1-\alpha)100\%\) confidence interval for \(\rho\), we'll use the normal approximation for the statistic \(Z\) that we used on the previous page for testing \(H_0:\rho=\rho_0\).

Theorem

An approximate \((1-\alpha)100\%\) confidence interval for \(\rho\) is \(L\leq \rho \leq U\) where:

\(L=\dfrac{1+R-(1-R)\text{exp}(2z_{\alpha/2}/\sqrt{n-3})}{1+R+(1-R)\text{exp}(2z_{\alpha/2}/\sqrt{n-3})}\)

and

\(U=\dfrac{1+R-(1-R)\text{exp}(-2z_{\alpha/2}/\sqrt{n-3})}{1+R+(1-R)\text{exp}(-2z_{\alpha/2}/\sqrt{n-3})}\)

Proof

We previously learned that:

\(Z=\dfrac{\dfrac{1}{2}\ln\dfrac{1+R}{1-R}-\dfrac{1}{2}\ln\dfrac{1+\rho}{1-\rho}}{\sqrt{\dfrac{1}{n-3}}}\)

follows at least approximately a standard normal \(N(0,1)\) distribution. So, we can do our usual trick of starting with a probability statement:

\(P\left(-z_{\alpha/2} \leq \dfrac{\dfrac{1}{2}\ln\dfrac{1+R}{1-R}-\dfrac{1}{2}\ln\dfrac{1+\rho}{1-\rho}}{\sqrt{\dfrac{1}{n-3}}} \leq z_{\alpha/2} \right)\approx 1-\alpha\)

and manipulating the quantity inside the parentheses:

\(-z_{\alpha/2} \leq \dfrac{\dfrac{1}{2}\ln\dfrac{1+R}{1-R}-\dfrac{1}{2}\ln\dfrac{1+\rho}{1-\rho}}{\sqrt{\dfrac{1}{n-3}}} \leq z_{\alpha/2}\)

to get ..... can you fill in the details?! ..... the formula for a \((1-\alpha)100\%\) confidence interval for \(\rho\):

\(L\leq \rho \leq U\)

where:

\(L=\dfrac{1+R-(1-R)\text{exp}(2z_{\alpha/2}/\sqrt{n-3})}{1+R+(1-R)\text{exp}(2z_{\alpha/2}/\sqrt{n-3})}\) and \(U=\dfrac{1+R-(1-R)\text{exp}(-2z_{\alpha/2}/\sqrt{n-3})}{1+R+(1-R)\text{exp}(-2z_{\alpha/2}/\sqrt{n-3})}\)

as was to be proved!

Example 15-2 (Continued)


An admissions counselor at a large public university was interested in learning whether freshmen calculus grades are independent of high school math achievement test scores. The sample correlation coefficient between the mathematics achievement test scores and calculus grades for a random sample of \(n=10\) college freshmen was calculated to be \(r=0.84\).

Estimate the population correlation coefficient \(\rho\) with 95% confidence.

Answer

Because we are interested in a 95% confidence interval, we use \(z_{0.025}=1.96\). Therefore, the lower limit of an approximate 95% confidence interval for \(\rho\) is:

\(L=\dfrac{1+0.84-(1-0.84)\text{exp}(2(1.96)/\sqrt{10-3})}{1+0.84+(1-0.84)\text{exp}(2(1.96)/\sqrt{10-3})}=0.447\)

and the upper limit of an approximate 95% confidence interval for \(\rho\) is:

\(U=\dfrac{1+0.84-(1-0.84)\text{exp}(-2(1.96)/\sqrt{10-3})}{1+0.84+(1-0.84)\text{exp}(-2(1.96)/\sqrt{10-3})}=0.961\)

We can be (approximately) 95% confident that the correlation between the population of high school mathematics achievement test scores and freshmen calculus grades is between 0.447 and 0.961. (Not a particularly useful interval, I might say! It might behoove the admissions counselor to collect data on a larger sample, so that he or she can obtain a narrower confidence interval.)
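For what it's worth, the interval is equally quick to compute in code. Here's a minimal Python sketch (the function name is ours) that reproduces the limits above:

import math

def rho_confidence_interval(r, n, z):
    # Approximate CI for rho; z is the z_{alpha/2} critical value
    def limit(sign):
        e = math.exp(sign * 2 * z / math.sqrt(n - 3))
        return ((1 + r) - (1 - r) * e) / ((1 + r) + (1 - r) * e)
    return limit(+1), limit(-1)  # (lower, upper)

print(rho_confidence_interval(0.84, 10, 1.96))  # approximately (0.447, 0.961)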

