Sometimes we are interested in more than one linear combination or variable. In this case, we may be interested in the association between those two linear combinations. More specifically, we can consider the covariance between two linear combinations of the data.
Consider the pair of linear combinations:
\(Y_1 = \sum_{j=1}^{p}c_jX_j \;\;\; \text{and} \;\;\; Y_2 = \sum_{k=1}^{p}d_kX_k\)
Here \(Y_{1}\) and \(Y_{2}\) are two distinct linear combinations, with distinct coefficient vectors c and d. Both \(Y_{1}\) and \(Y_{2}\) are random variables, so they are potentially correlated. We can assess the association between them using the covariance.
The population covariance between \(Y_{1}\) and \(Y_{2}\) is obtained by summing over all pairs of variables. For each pair, we multiply the respective coefficients from the two linear combinations, \(c_{j}\) times \(d_{k}\), by the covariance between variables j and k.
- Population Covariance between two linear combinations
- \(cov(Y_1, Y_2) = \sum_{j=1}^{p}\sum_{k=1}^{p}c_jd_k\sigma_{jk}\)
We can then estimate the population covariance by using the sample covariance. This is obtained by simply substituting the sample covariances between the pairs of variables for the population covariances between the pairs of variables.
- Sample Covariance between two linear combinations
- \(s_{Y_1,Y_2}= \sum_{j=1}^{p}\sum_{k=1}^{p}c_jd_ks_{jk}\)
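The double sum above can be sketched in a few lines of Python. This is only an illustration: the function name `lincomb_cov` and the small 2 x 2 matrix are hypothetical and are not taken from the lesson's data.

```python
def lincomb_cov(c, d, S):
    """Covariance between sum_j c[j]*X_j and sum_k d[k]*X_k.

    S is the (sample or population) variance-covariance matrix,
    given as a list of rows, so S[j][k] plays the role of s_jk.
    """
    p = len(S)
    return sum(c[j] * d[k] * S[j][k] for j in range(p) for k in range(p))


# Hypothetical 2 x 2 covariance matrix, for illustration only.
S_demo = [[4.0, 1.0],
          [1.0, 9.0]]
c_demo = [1.0, 2.0]
d_demo = [3.0, -1.0]

# 1*3*4 + 1*(-1)*1 + 2*3*1 + 2*(-1)*9 = -1
print(lincomb_cov(c_demo, d_demo, S_demo))  # -1.0
```

Passing the same coefficient vector twice, as in `lincomb_cov(c, c, S)`, gives the variance of that single linear combination.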
Correlation
The population correlation between variables \(Y_{1}\) and \(Y_{2}\) can be obtained using the usual formula: the covariance between \(Y_{1}\) and \(Y_{2}\) divided by the product of the standard deviations of the two variables, as shown below.
- Population Correlation between two linear combinations
- \(\rho_{Y_1,Y_2} = \dfrac{\sigma_{Y_1,Y_2}}{\sigma_{Y_1}\sigma_{Y_2}}\)
This population correlation is estimated by the sample correlation, where we simply substitute the sample quantities for the population quantities, as shown below.
- Sample Correlation between two linear combinations
- \(r_{Y_1,Y_2} = \dfrac{s_{Y_1, Y_2}}{s_{Y_1}s_{Y_2}}\)
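Writing the sample correlation out in terms of the coefficients may make the substitution more explicit. This expansion assumes the usual expression for the sample variance of a linear combination, \(s^2_{Y_1} = \sum_{j}\sum_{k}c_jc_ks_{jk}\) (and likewise for \(Y_{2}\) with the coefficients \(d_j\)), which is not restated in this section:

\(r_{Y_1,Y_2} = \dfrac{\sum_{j=1}^{p}\sum_{k=1}^{p}c_jd_ks_{jk}}{\sqrt{\sum_{j=1}^{p}\sum_{k=1}^{p}c_jc_ks_{jk}}\;\sqrt{\sum_{j=1}^{p}\sum_{k=1}^{p}d_jd_ks_{jk}}}\)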
Example 2-5: Women’s Health Survey (Pop. Covariance and Correlation)
Here is the sample variance-covariance matrix of the data, as shown previously.
\(S = \left(\begin{array}{rrrrr}157829.4 & 940.1 & 6075.8 & 102411.1 & 6701.6 \\ 940.1 & 35.8 & 114.1 & 2383.2 & 137.7 \\ 6075.8 & 114.1 & 934.9 & 7330.1 & 477.2 \\ 102411.1 & 2383.2 & 7330.1 & 2668452.4 & 22063.3 \\ 6701.6 & 137.7 & 477.2 & 22063.3 & 5416.3 \end{array}\right)\)
We may wish to define the total intake of vitamins A and C in mg as before.
\(Y _ { 1 } = 0.001 X _ { 4 } + X _ { 5 }\)
and we may also want to take a look at the total intake of calcium and iron:
\(Y _ { 2 } = X _ { 1 } + X _ { 2 }\)
The sample covariance between \(Y_{1}\) and \(Y_{2}\) can then be obtained by taking the covariance between each pair of component variables times the respective coefficients. In this case we pair \(X_{1}\) with \(X_{4}\), \(X_{1}\) with \(X_{5}\), \(X_{2}\) with \(X_{4}\), and \(X_{2}\) with \(X_{5}\), so \(s_{41}\), \(s_{42}\), \(s_{51}\) and \(s_{52}\) all appear in the expression below. The values are taken from the matrix above and substituted into the expression, and the arithmetic is carried out below.
\begin{align} s_{Y_1, Y_2} & = 0.001s_{41} + 0.001s_{42} + s_{51}+s_{52}\\& = 0.001 \times 102411.1 + 0.001 \times 2383.2 + 6701.6 +137.7\\ & = 102.4 + 2.4 + 6701.6 + 137.7\\ & = 6944.1 \end{align}
At this point you should be able to confirm that the sample variance of \(Y_{2}\) is 159,745.4, as shown below:
\begin{align} s^2_{Y_2} & = s_{11}+s_{22}+2s_{12}\\ & = 157829.4 + 35.8 + 2 \times 940.1\\ & = 157829.4 + 35.8 + 1880.2 \\ & = 159745.4 \end{align}
To obtain the sample correlation between \(Y_{1}\) and \(Y_{2}\), we take the sample covariance that we just obtained and divide it by the square root of the product of the two variances: 5463.1 for \(Y_{1}\), which we obtained earlier, and 159745.4 for \(Y_{2}\), which we just obtained above. Following this math through, we end up with a correlation of about 0.235, as shown below.
\begin{align} r_{Y_1,Y_2} &= \dfrac{s_{Y_1, Y_2}}{s_{Y_1}s_{Y_2}}\\ &= \dfrac{6944.1}{\sqrt{5463.1 \times 159745.4}}\\&=0.235 \end{align}
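The arithmetic in this example can be checked with a short Python sketch. The helper name `quad_form` and the variable names are assumptions made for illustration; the matrix and the coefficients are those given above. The variance of \(Y_{1}\) is recomputed here from the same matrix rather than taken from the earlier example.

```python
import math

# Sample variance-covariance matrix S from the example (rows/columns X1..X5).
S = [
    [157829.4,    940.1,  6075.8,  102411.1,  6701.6],
    [   940.1,     35.8,   114.1,    2383.2,   137.7],
    [  6075.8,    114.1,   934.9,    7330.1,   477.2],
    [102411.1,   2383.2,  7330.1, 2668452.4, 22063.3],
    [  6701.6,    137.7,   477.2,   22063.3,  5416.3],
]

c = [0.0, 0.0, 0.0, 0.001, 1.0]  # Y1 = 0.001*X4 + X5
d = [1.0, 1.0, 0.0, 0.0, 0.0]    # Y2 = X1 + X2


def quad_form(a, S, b):
    """Return sum_j sum_k a[j] * b[k] * S[j][k]."""
    p = len(S)
    return sum(a[j] * b[k] * S[j][k] for j in range(p) for k in range(p))


cov_y1_y2 = quad_form(c, S, d)   # sample covariance of Y1 and Y2
var_y1 = quad_form(c, S, c)      # sample variance of Y1
var_y2 = quad_form(d, S, d)      # sample variance of Y2
r = cov_y1_y2 / math.sqrt(var_y1 * var_y2)

print(round(cov_y1_y2, 1))  # 6944.1
print(round(var_y1, 1))     # 5463.1
print(round(var_y2, 1))     # 159745.4
print(round(r, 3))          # 0.235
```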