Lesson 6: Multivariate Conditional Distribution and Partial Correlation

Overview

In a multivariate setting, partial correlations are used to explore the relationships between pairs of variables after we take into account the values of other variables.

For example, in a study of the relationship between blood pressure and blood cholesterol, it might be thought that both of these variables are related to the age of the subject. In that case, we might be interested in looking at the correlation between these two variables for subjects of the same age.

Objectives

Upon completion of this lesson, you should be able to:

Construct a conditional distribution;
Explain the definition of a partial correlation;
Compute partial correlations using SAS and Minitab;
Test the hypothesis that the partial correlation is equal to zero, and draw appropriate conclusions from that test;
Compute and interpret confidence intervals for partial correlations.

6.1 - Conditional Distributions

Partial correlations may only be defined after introducing the concept of conditional distributions. We will restrict ourselves to conditional distributions from multivariate normal distributions only.

If we have a p × 1 random vector \(\mathbf{Z}\), we can partition it into two random vectors \(\mathbf{X}_1\) and \(\mathbf{X}_2\) where \(\mathbf{X}_1\) is a p₁ × 1 vector and \(\mathbf{X}_2\) is a p₂ × 1 vector as shown in the expression below:

\(\textbf{Z} = \left(\begin{array}{c} \textbf{X}_1 \\ \textbf{X}_2\end{array}\right)\)

Conditional Distribution Properties

Further, suppose that we partition the mean vector and covariance matrix in a corresponding manner. That is,

\(\boldsymbol{\mu} = \left(\begin{array}{c}\boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2\end{array}\right)\) and \(\mathbf{\Sigma} = \left(\begin{array}{cc}\mathbf{\Sigma}_{11} & \mathbf{\Sigma}_{12}\\ \mathbf{\Sigma}_{21} & \mathbf{\Sigma}_{22} \end{array}\right)\)

For instance, \(\boldsymbol{\mu}_{1}\) gives the means for the variables in the vector \(\mathbf{X}_{1}\), and \(\mathbf{\Sigma}_{11}\) gives variances and covariances for vector \(\mathbf{X}_{1}\). The matrix \(\mathbf{\Sigma}_{12}\) gives covariances between variables in vector \(\mathbf{X}_{1}\) and vector \(\mathbf{X}_{2}\) (as does matrix \(\mathbf{\Sigma}_{21}\)).

Any distribution for a subset of variables from a multivariate normal, conditional on known values for another subset of variables, is a multivariate normal distribution.

Conditional Distribution: The conditional distribution of \(\mathbf{X}_{1}\) given known values \(\mathbf{X}_2=\mathbf{x}_{2}\) is a multivariate normal with:

\begin{align} \text{mean vector} & = \boldsymbol{\mu}_1 + \mathbf{\Sigma}_{12}\mathbf{\Sigma}_{22}^{-1}(\mathbf{x}_2-\boldsymbol{\mu}_2)\\ \text{covariance matrix} & = \mathbf{\Sigma}_{11} - \mathbf{\Sigma}_{12}\mathbf{\Sigma}_{22}^{-1}\mathbf{\Sigma}_{21} \end{align}

Bivariate Case

Suppose that we have p = 2 variables with a multivariate normal distribution. The conditional distribution of \(X_{1}\) given knowledge of \(x_{2}\) is a normal distribution with

\begin{align} \text{Mean} & = \mu_1 + \frac{\sigma_{12}}{\sigma_{22}}(x_2-\mu_2) \\ \text{Variance} & = \sigma_{11}- \frac{\sigma^2_{12}}{\sigma_{22}}\end{align}

Example 6-1: Conditional Distribution of Weight Given Height for College Men

Suppose that the weights (lbs) and heights (inches) of undergraduate college men have a multivariate normal distribution with mean vector \(\boldsymbol{\mu} = \left(\begin{array}{c} 175\\ 71 \end{array}\right)\) and covariance matrix \(\mathbf{\Sigma} = \left(\begin{array}{cc} 550 & 40\\ 40 & 8 \end{array}\right)\).

The conditional distribution of \(X_{1}\) = weight given \(x_{2}\) = height is a normal distribution with

\begin{align} \text{Mean} &= \mu_1 + \frac{\sigma_{12}}{\sigma_{22}}(x_2-\mu_2)\\[5pt] &= 175 + \frac{40}{8}(x_2-71) \\[5pt] &= -180+5x_2 \end{align}

\begin{align} \text{Variance} &= \sigma_{11}- \frac{\sigma^2_{12}}{\sigma_{22}}\\ &= 550-\frac{40^2}{8} \\[5pt] &= 350 \end{align}

For instance, for men with height = 70 inches, weights are normally distributed with mean = -180 + 5(70) = 170 pounds and variance = 350. (So the standard deviation is \(\sqrt{350} = 18.71\) pounds.)

Notice that we have generated a simple linear regression model that relates weight to height.
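The arithmetic in Example 6-1 can be checked with a short script. This is only a minimal sketch: the function name `conditional_normal_bivariate` and its argument names are illustrative choices, not part of the lesson, and the inputs are the mean vector and covariance matrix given in the example.

```python
import math

def conditional_normal_bivariate(mu1, mu2, s11, s12, s22, x2):
    """Return (mean, variance) of X1 given X2 = x2 for a bivariate normal.

    mu1, mu2 : unconditional means of X1 and X2
    s11, s22 : unconditional variances of X1 and X2
    s12      : covariance between X1 and X2
    """
    slope = s12 / s22                  # regression slope of X1 on X2
    mean = mu1 + slope * (x2 - mu2)    # conditional mean
    variance = s11 - s12 ** 2 / s22    # conditional variance (does not depend on x2)
    return mean, variance

# Values from Example 6-1: X1 = weight (lbs), X2 = height (inches)
mean_70, var_70 = conditional_normal_bivariate(175, 71, 550, 40, 8, x2=70)
sd_70 = math.sqrt(var_70)
print(mean_70, var_70, round(sd_70, 2))  # 170.0 350.0 18.71
```

Note that the conditional variance is the same for every height; only the conditional mean changes with \(x_{2}\).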

Conditional Means, Variances and Covariances

So far, we have only considered unconditional population means, variances, covariances, and correlations. These quantities are defined under the setting in which the subjects are sampled from the entire population. For example, blood pressure and cholesterol may be measured from a sample selected from the population of all adult citizens of the United States.

To understand partial correlations, we must first consider conditional means, variances, and covariances. These quantities are defined for some subset of the population. For example, blood pressure and cholesterol may be measured from a sample of all 51-year-old citizens of the United States. Thus, we may consider the population mean blood pressure of 51-year-old citizens. This quantity is called the conditional mean blood pressure given that the subject is a 51-year-old citizen.

More than one condition may be applied. For example, we may consider the population mean blood pressure of 51-year-old citizens who weigh 190 pounds. This quantity is the conditional mean blood pressure given that the subject is 51 years old and weighs 190 pounds.

Conditional Mean

Let Y denote a vector of variables (e.g., blood pressure, cholesterol, etc.) of interest, and let X denote a vector of variables on which we wish to condition (e.g., age, weight, etc.). Then the conditional mean of Y given that X equals a particular value x (i.e., X = x) is denoted by

\(\mu_{\textbf{Y.x}} = E(\textbf{Y}|\textbf{X=x})\)

This is interpreted as the population mean of the vector Y given a sample from the subpopulation where X = x.

Conditional Variance

Let Y denote a variable of interest, and let X denote a vector of variables on which we wish to condition. Then the conditional variance of Y given that X = x is

\(\sigma^2_{\textbf{Y.x}} = \text{var}(\mathbf{Y}|\textbf{X=x}) = E\{(\mathbf{Y}-\boldsymbol{\mu}_{\textbf{Y.x}})^2|\textbf{X=x}\}\)

Because Y is random, so is \(\left( \mathbf{Y} - \boldsymbol{\mu}_{\textbf{Y.x}} \right) ^ { 2 }\) and hence \(\left( \mathbf{Y} - \boldsymbol{\mu}_{\textbf{Y.x}} \right) ^ { 2 }\) has a conditional mean. This can be interpreted as the variance of Y given a sample from the subpopulation where X = x.

Conditional Covariance

Let \(Y_{i}\) and \(Y_{j}\) denote two variables of interest, and let X denote a vector of variables on which we wish to condition. Then the conditional covariance between \(Y_{i}\) and \(Y_{j}\) given that X = x is

\(\sigma_{i,j.\textbf{x}} = \text{cov}(Y_i, Y_j| \textbf{X=x}) = E\{(Y_i-\mu_{Y_i.x})(Y_j-\mu_{Y_j.x})|\textbf{X=x}\}\)

Because \(Y_{i}\) and \(Y_{j}\) are random, so is \(\left( Y_{ i } - \mu_{ Y_i.x } \right) \left( Y_{ j } - \mu_{ Y_j.x } \right)\) and hence \(\left( Y_{ i } - \mu_{ Y_i.x } \right) \left( Y_{ j } - \mu_{ Y_j.x } \right)\) has a conditional mean. This can be interpreted as the covariance between \(Y_{i}\) and \(Y_{j}\) given a sample from the subpopulation where X = x.

Just as the unconditional variances and covariances can be collected into a variance-covariance matrix \(\mathbf{\Sigma}\), the conditional variances and covariances can be collected into a conditional variance-covariance matrix:

\(\mathbf{\Sigma_{Y.x}}= \text{var}\mathbf{(Y|X=x)} = \left(\begin{array}{cccc}\sigma^2_{Y_1\textbf{.X}} & \sigma_{12\textbf{.X}} & \dots & \sigma_{1p\textbf{.X}}\\ \sigma_{21\textbf{.X}} & \sigma^2_{Y_2 \textbf{.X}} & \dots & \sigma_{2p \textbf{.X}} \\ \vdots & \vdots & \ddots & \vdots\\ \sigma_{p1 \textbf{.X}} & \sigma_{p2 \textbf{.X}} & \dots & \sigma^2_{Y_p\textbf{.X}} \end{array}\right)\)

Partial Correlation

The partial correlation between \(Y_{j}\) and \(Y_{k}\) given X = x is:

\[\rho_{jk\textbf{.X}} = \dfrac{\sigma_{jk\textbf{.X}}}{\sigma_{Y_j\textbf{.X}}\sigma_{Y_k \textbf{.X}}}\]

Note! This is computed in the same way as unconditional correlations, replacing unconditional variances and covariances with conditional variances and covariances. This can be interpreted as the correlation between \(Y_{j}\) and \(Y_{k}\) given a sample from the subpopulation where X = x.

The Multivariate Normal Distribution

Next, let us return to the multivariate normal distribution. Suppose that a random vector Z, partitioned into components X and Y, follows a multivariate normal distribution whose mean vector has corresponding components \(\boldsymbol{\mu}_{X}\) and \(\boldsymbol{\mu}_{Y}\), and whose variance-covariance matrix is partitioned into four parts as shown below:

\(\textbf{Z} = \left(\begin{array}{c}\textbf{X}\\ \textbf{Y} \end{array}\right) \sim N \left(\left(\begin{array}{c}\boldsymbol{\mu}_X\\\boldsymbol{\mu}_Y \end{array}\right), \left(\begin{array}{cc} \mathbf{\Sigma_{X}} & \mathbf{\Sigma_{XY}}\\ \mathbf{\Sigma_{YX}} & \mathbf{\Sigma_Y} \end{array}\right)\right)\)

Here, \(\mathbf{\Sigma_{X}}\) is the variance-covariance matrix for the random vector X, and \(\mathbf{\Sigma_Y}\) is the variance-covariance matrix for the random vector Y. The matrix \(\mathbf{\Sigma_{YX}}\) contains the covariances between the elements of Y and the elements of X, and \(\mathbf{\Sigma_{XY}}\) is its transpose.

Then the conditional distribution of Y given that X takes a particular value x is also going to be a multivariate normal with conditional expectation as shown below:

\(E(\textbf{Y}|\textbf{X=x}) = \mathbf{\mu_Y} + \mathbf{\Sigma_{YX}\Sigma^{-1}_X}(\mathbf{x}-\boldsymbol{\mu}_X)\)

Note that this is equal to the mean of Y plus an adjustment. This adjustment involves the covariances between X and Y, the inverse of the variance-covariance matrix of X, and the difference between the value x and the mean of the random vector X. If x is equal to \(\boldsymbol{\mu}_{X}\), then the conditional expectation of Y given X = x is simply equal to the ordinary mean of Y.

In general, if there are positive covariances between the X's and Y's, then a value of X greater than \(\boldsymbol{\mu}_{X}\) will result in a positive adjustment in the calculation of this conditional expectation. Conversely, if X is less than \(\boldsymbol{\mu}_{X}\), then we will end up with a negative adjustment.

The conditional variance-covariance matrix of Y given that X = x is equal to the variance-covariance matrix for Y minus the term that involves the covariances between X and Y and the variance-covariance matrix for X. For now, we will call this conditional variance-covariance matrix A as shown below:

\(\text{var}(\textbf{Y|X=x}) = \mathbf{\Sigma_Y - \Sigma_{YX}\Sigma^{-1}_X\Sigma_{XY}} = \textbf{A}\)

We are finally now ready to define the partial correlation between two variables \(Y_{j}\) and \(Y_{k}\) given that the random vector X = x. This is shown in the expression below:

\(\rho_{jk\textbf{.x}} = \dfrac{a_{jk}}{\sqrt{a_{jj}a_{kk}}}\)

This is basically the same formula that we would have for the ordinary correlation, in this case, calculated using the conditional variance-covariance matrix in place of the ordinary variance-covariance matrix.

Partial correlations can be estimated by substituting the sample variance-covariance matrices for the population variance-covariance matrices as shown in the expression below:

\(\widehat{\text{var}}(\textbf{Y|X=x}) = \mathbf{S_Y - S_{YX}S^{-1}_X S_{XY}}= \hat{\textbf{A}}\)

where

\(\mathbf{S} = \left(\begin{array}{cc} \mathbf{S_X} & \mathbf{S_{XY}}\\ \mathbf{S_{YX}} & \mathbf{S_Y}\end{array}\right)\)

is the sample variance-covariance matrix of the data.

Then the elements of the estimated conditional variance-covariance matrix can be used to obtain the partial correlation as shown below:

\(r_{jk\textbf{.x}} = \dfrac{\hat{a}_{jk}}{\sqrt{\hat{a}_{jj}\hat{a}_{kk}}}\)
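To make the matrix calculation concrete, here is a minimal sketch for the case of two variables of interest and two conditioning variables (the same layout as c = 2). The function name `partial_corr_2x2` is an illustrative choice, and the covariance blocks below are hypothetical numbers made up for illustration; they are not taken from any data set in this lesson.

```python
import math

def partial_corr_2x2(S_Y, S_YX, S_X):
    """Partial correlation of Y1 and Y2 given two conditioning variables.

    S_Y  : 2x2 sample covariance matrix of (Y1, Y2)
    S_YX : 2x2 matrix of covariances, rows = Y's, columns = X's
    S_X  : 2x2 sample covariance matrix of (X1, X2)
    """
    # Inverse of the 2x2 matrix S_X
    det = S_X[0][0] * S_X[1][1] - S_X[0][1] * S_X[1][0]
    inv = [[S_X[1][1] / det, -S_X[0][1] / det],
           [-S_X[1][0] / det, S_X[0][0] / det]]

    # B = S_YX * S_X^{-1}
    B = [[sum(S_YX[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]

    # A_hat = S_Y - B * S_XY, where S_XY is the transpose of S_YX
    A = [[S_Y[i][j] - sum(B[i][k] * S_YX[j][k] for k in range(2))
          for j in range(2)] for i in range(2)]

    # Correlation computed from the conditional covariance matrix
    return A[0][1] / math.sqrt(A[0][0] * A[1][1])

# Hypothetical covariance blocks (illustration only)
S_Y = [[10.0, 6.0], [6.0, 9.0]]
S_YX = [[4.0, 3.0], [3.0, 2.0]]
S_X = [[8.0, 2.0], [2.0, 5.0]]

r_ordinary = S_Y[0][1] / math.sqrt(S_Y[0][0] * S_Y[1][1])
r_partial = partial_corr_2x2(S_Y, S_YX, S_X)
print(round(r_ordinary, 4), round(r_partial, 4))
```

For larger problems a linear algebra library would normally handle the inverse; the explicit 2 × 2 inverse is used here only to keep the sketch free of dependencies.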

If we are conditioning on just a single variable, then a simpler expression is available. The partial correlation between \(Y_{j}\) and \(Y_{k}\) given \(Y_{i}\) = \(y_{i}\) is estimated from the three ordinary pairwise correlations by:

\(r_{jk.i} = \dfrac{r_{jk}-r_{ij}r_{ik}}{\sqrt{(1-r^2_{ij})(1-r^2_{ik})}}\)
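The single-variable formula translates directly into code. In this sketch the function name `partial_corr_one` and the three input correlations are hypothetical, chosen only to show how a moderately large ordinary correlation can shrink once a common third variable is taken into account.

```python
import math

def partial_corr_one(r_jk, r_ij, r_ik):
    """Partial correlation of variables j and k given a single variable i."""
    return (r_jk - r_ij * r_ik) / math.sqrt((1 - r_ij ** 2) * (1 - r_ik ** 2))

# Hypothetical ordinary correlations (illustration only)
r_jk, r_ij, r_ik = 0.6, 0.7, 0.8
r_jk_given_i = partial_corr_one(r_jk, r_ij, r_ik)
print(round(r_jk_given_i, 4))
```

With these made-up inputs the partial correlation is much closer to zero than the ordinary correlation of 0.6, because both variables are strongly related to variable i.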

6.2 - Example: Wechsler Adult Intelligence Scale

Example 6-2: Wechsler Adult Intelligence Scale

To illustrate these calculations we will return to the Wechsler Adult Intelligence Scale data.

This dataset includes data on n = 37 subjects taking the Wechsler Adult Intelligence Test. This test is broken up into four components:

Information
Similarities
Arithmetic
Picture Completion

Recall from the last lesson that the correlation between Information and Similarities was \(r = 0.77153\).


The partial correlation between Information and Similarities given Arithmetic and Picture Completion may be computed using the SAS program shown below.

Download the SAS program: wechsler2.sas

Download the SAS Output: wechsler2.lst

Explore the code below to find the partial correlation of Information and Similarities given Arithmetic and Picture Completion using the Wechsler Adult Intelligence Test data in SAS.


options ls=78;
title "Partial Correlations - Wechsler Data";  
/*The first two lines define the name of the data set with the name 'wechsler'
* and specify the path where the contents of the data set are read from.
* Since we have a header row, the first observation begins on the 2nd row, 
* and the delimiter option is needed because columns are separated by commas.
* The input statement is where we provide names for the variables in order 
* of the columns in the data set. If any were categorical (not the case here), 
* we would need to put a '$' character after its name.*/


data wechsler;
  infile "D:\Statistics\STAT 505\data\wechsler.csv" firstobs=2 delimiter=',';
  input id info sim arith pict;
  run;

 /*glm stands for 'general linear model'
  * the model statement specifies info and sim
  * as responses and arith and pict as predictors
  * the 'nouni' option suppresses univariate stats
  * the 'manova' statement requests multivariate
  * statistics for info and sim jointly
  * the 'printe' option provides the sum of squares
  * and cross products matrix for error*/

proc glm; 
  model info sim = arith pict / nouni;
  manova / printe;
  run;

Finding the partial correlation

To find the partial correlation of Information and Similarities given Arithmetic and Picture Completion in Minitab:

1. Open the ‘wechsler’ data set in a new worksheet.
2. Stat > Regression > Regression > Fit Regression Model
   a. Highlight and select ‘info’ for the Responses window and both ‘arith’ and ‘pic’ for the Continuous Predictors window.
   b. Under Storage, choose ‘Residuals’ and then ‘OK’ twice. The residuals are displayed in a new column ‘RESI’ in the worksheet.
3. Repeat step 2 above with ‘sim’ substituted for ‘info’ as the response. The new residuals are stored in a new column ‘RESI_1’.
4. Stat > Basic Statistics > Correlation
5. Highlight and select ‘RESI’ and ‘RESI_1’ to move them to the Variables window.
6. Select ‘OK’. The partial correlation is displayed in the results area.
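The residual-based steps above rest on the fact that a partial correlation equals the ordinary correlation between two sets of regression residuals. The following is a minimal sketch of that idea, assuming a single conditioning variable to keep the regression simple (the Wechsler analysis conditions on two). The helper names and the small data set are hypothetical and made up for illustration.

```python
import math

def mean(values):
    return sum(values) / len(values)

def residuals(y, x):
    """Residuals from the least-squares line of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

def corr(u, v):
    """Ordinary (Pearson) sample correlation."""
    u_bar, v_bar = mean(u), mean(v)
    num = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    den = math.sqrt(sum((a - u_bar) ** 2 for a in u) *
                    sum((b - v_bar) ** 2 for b in v))
    return num / den

# Hypothetical toy data (illustration only): x is the conditioning variable
x = [1, 2, 3, 4, 5, 6, 7, 8]
y1 = [2, 3, 5, 4, 6, 8, 7, 9]
y2 = [1, 3, 2, 5, 4, 6, 8, 7]

# Regress each response on x, then correlate the two residual columns
r_partial = corr(residuals(y1, x), residuals(y2, x))
print(round(r_partial, 4))
```

With one conditioning variable, this residual correlation agrees with the value given by the single-variable formula based on the three pairwise correlations.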

Analysis

The output is in two tables. The first table gives the conditional variance-covariance matrix for Information and Similarities given Arithmetic and Picture Completion. The second table gives the partial correlation. Here we can see that the partial correlation is:

\(r = 0.71188\)

Conclusion: Comparing this to the previous value for the ordinary correlation, we can see that the partial correlation is not much smaller than the ordinary correlation. This suggests that little of the relationship between Information and Similarities can be explained by performance on the Arithmetic and Picture Completion portions of the test.

Interpretation

Partial correlations should be compared to the corresponding ordinary correlations. When interpreting partial correlations, three results can potentially occur. Each of these results yields a different interpretation.

Partial and ordinary correlations are approximately equal. This occurred in our present setting. This suggests that the relationship between the variables of interest cannot be explained by the remaining explanatory variables upon which we are conditioning.
Partial correlations are closer to zero than ordinary correlations. This is a common result and often what we anticipate. This suggests that the relationship between the variables of interest might be explained by their common relationships to the explanatory variables upon which we are conditioning. For example, we might find a strong positive ordinary correlation between blood pressure and blood cholesterol, yet a very small partial correlation between these two variables after we have taken into account the age of the subject. If this were the case, it would suggest that both variables are related to age, and that the observed correlation is only due to their common relationship to age.
Partial correlations are farther from zero than ordinary correlations. This rarely happens. This situation would suggest that unless we take into account the explanatory variables upon which we are conditioning, the relationship between the variables of interest is hidden or masked.

6.3 - Testing for Partial Correlation

When discussing ordinary correlations we looked at tests for the null hypothesis that the ordinary correlation is equal to zero, against the alternative that it is not equal to zero. If that null hypothesis is rejected, then we look at confidence intervals for the ordinary correlation. Similar objectives can be considered for the partial correlation.

First, consider testing the null hypothesis that a partial correlation is equal to zero against the alternative that it is not equal to zero. This is expressed below:

\(H_0\colon \rho_{jk\textbf{.x}}=0\) against \(H_a\colon \rho_{jk\textbf{.x}}\ne 0\)

Here we will use a test statistic that is similar to the one we used for an ordinary correlation. This test statistic is shown below:

\(t = r_{jk\textbf{.x}}\sqrt{\frac{n-2-c}{1-r^2_{jk\textbf{.x}}}}\) \(\dot{\sim}\) \(t_{n-2-c}\)

The only difference between this and the previous one is what appears in the numerator of the radical. Before we just took n - 2. Here we take n - 2 - c, where c is the number of variables upon which we are conditioning. In our Adult Intelligence data, we conditioned on two variables so c would be equal to 2 in this case.

Under the null hypothesis, this test statistic will be approximately t-distributed, also with n - 2 - c degrees of freedom.

We would reject \(H_{0}\) if the absolute value of the test statistic exceeded the critical value from the t-table evaluated at \(\alpha\) over 2:

\(|t| > t_{n-2-c, \alpha/2}\)

Example 6-3: Wechsler Adult Intelligence Data

For the Wechsler Adult Intelligence Data, we found a partial correlation of 0.711879, which we enter into the expression for the test statistic as shown below:

\(t = 0.711879 \sqrt{\dfrac{37-2-2}{1-0.711879^2}}=5.82\)

Here the sample size n = 37 and the number of conditioning variables c = 2 are substituted into the formula. Carrying out the arithmetic gives a test statistic of 5.82, as shown above.

Here we want to compare this value to a t-distribution with 33 degrees of freedom for an \(\alpha\) = 0.01 level test. Therefore, we look up the critical value for 0.005 in the table. Because 33 does not appear in the table, we use the closest df that does not exceed 33, which is 30. In this case the critical value is 2.75, so we take \(t _ { ( d f , 1 - \alpha / 2 ) } = t _ { ( 33,0.995 ) } \) to be 2.75.

Note! Some text tables provide the right tail probability (the graph at the top will have the area in the right tail shaded in) while other texts will provide a table with the cumulative probability, where the graph will be shaded to the left. The concept is the same. For example, if alpha is 0.01, then using the first kind of table you would look under 0.005, and using the second kind you would look under 0.995.

Because \(5.82 > 2.75 = t _ { ( 33,0.995 ) }\), we can reject the null hypothesis \(H_{0}\) at the \(\alpha = 0.01\) level and conclude that there is a significant partial correlation between these two variables. In particular, we would conclude that this partial correlation is positive, indicating that even after taking into account Arithmetic and Picture Completion, there is a positive association between Information and Similarities.
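The test statistic in Example 6-3 can be reproduced as follows. This is a minimal sketch: the function name `partial_corr_t` is an illustrative choice, and the critical value 2.75 is simply the table value quoted in the text (read at 30 df), not something the code computes.

```python
import math

def partial_corr_t(r, n, c):
    """t statistic and degrees of freedom for H0: partial correlation = 0.

    r : sample partial correlation
    n : sample size
    c : number of conditioning variables
    """
    df = n - 2 - c
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df

# Wechsler data: r = 0.711879, n = 37, conditioning on c = 2 variables
t_stat, df = partial_corr_t(0.711879, n=37, c=2)
critical = 2.75  # table value quoted in the text for a two-sided 0.01 level test
print(round(t_stat, 2), df, abs(t_stat) > critical)  # 5.82 33 True
```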

Confidence Interval for the partial correlation, \(\rho_{jk\textbf{.x}}\)

The procedure here is very similar to the procedure we used for ordinary correlation.

Steps

Step 1: Compute Fisher's transformation of the partial correlation using the same formula as before.

\(Z_{jk} = \dfrac{1}{2}\log \left( \dfrac{1+r_{jk\textbf{.X}}}{1-r_{jk\textbf{.X}}}\right) \)

For large n, this Fisher-transformed variable is approximately normally distributed. Its mean is the Fisher transform of the population partial correlation, and its variance is 1 / (n - 3 - c).

\(Z_{jk}\) \(\dot{\sim}\) \(N \left( \dfrac{1}{2}\log \dfrac{1+\rho_{jk\textbf{.X}}}{1-\rho_{jk\textbf{.X}}}, \dfrac{1}{n-3-c}\right)\)

Step 2: Compute a \((1 - \alpha) \times 100\%\) confidence interval for the Fisher transform of the population partial correlation:

\( \dfrac{1}{2}\log \dfrac{1+\rho_{jk\textbf{.X}}}{1-\rho_{jk\textbf{.X}}}\)

This yields the bounds \(Z_{l}\) and \(Z_{U}\) as before.

\(\left(\underset{Z_l}{\underbrace{Z_{jk}-\dfrac{Z_{\alpha/2}}{\sqrt{n-3-c}}}}, \underset{Z_U}{\underbrace{Z_{jk}+\dfrac{Z_{\alpha/2}}{\sqrt{n-3-c}}}}\right)\)

Step 3: Back-transform to obtain the desired confidence interval for the partial correlation \(\rho_{jk\textbf{.X}}\):

\(\left(\dfrac{e^{2Z_l}-1}{e^{2Z_l}+1}, \dfrac{e^{2Z_U}-1}{e^{2Z_U}+1}\right)\)

Example 6-3: Wechsler Adult Intelligence Data (Steps Shown)

The confidence interval is calculated by substituting the results from the Wechsler Adult Intelligence Data into the appropriate steps below:

Step 1: Compute the Fisher transform:

\begin{align} Z_{12} &= \dfrac{1}{2}\log \frac{1+r_{12.34}}{1-r_{12.34}}\\[5pt] &= \dfrac{1}{2} \log \frac{1+0.711879}{1-0.711879}\\[5pt] &= 0.89098 \end{align}
Step 2: Compute the 95% confidence interval for \( \frac{1}{2}\log \frac{1+\rho_{12.34}}{1-\rho_{12.34}}\) :

\begin{align} Z_l &= Z_{12}-Z_{0.025}/\sqrt{n-3-c}\\[5pt] & = 0.89098 - \dfrac{1.96}{\sqrt{37-3-2}}\\[5pt] &= 0.5445 \end{align}

\begin{align} Z_U &= Z_{12}+Z_{0.025}/\sqrt{n-3-c}\\[5pt] &= 0.89098 + \dfrac{1.96}{\sqrt{37-3-2}} \\[5pt] &= 1.2375 \end{align}
Step 3: Back-transform to obtain the 95% confidence interval for \(\rho_{12.34}\) :

\(\left(\dfrac{\exp\{2Z_l\}-1}{\exp\{2Z_l\}+1}, \dfrac{\exp\{2Z_U\}-1}{\exp\{2Z_U\}+1}\right)\)

\(\left(\dfrac{\exp\{2\times 0.5445\}-1}{\exp\{2\times 0.5445\}+1}, \dfrac{\exp\{2\times 1.2375\}-1}{\exp\{2\times 1.2375\}+1}\right)\)

\((0.4964, 0.8447)\)

Based on this result, we can conclude that we are 95% confident that the interval (0.4964, 0.8447) contains the partial correlation between Information and Similarities scores given scores on Arithmetic and Picture Completion.
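The three steps of the confidence interval can be combined into one short function. This is a minimal sketch: the function name `partial_corr_ci` and the default normal critical value of 1.96 (for a 95% interval) are illustrative choices, and the inputs are the Wechsler values used above.

```python
import math

def partial_corr_ci(r, n, c, z_crit=1.96):
    """Confidence interval for a partial correlation via Fisher's transform.

    r      : sample partial correlation
    n      : sample size
    c      : number of conditioning variables
    z_crit : standard normal critical value (1.96 for 95%)
    """
    # Step 1: Fisher transform of the sample partial correlation
    z = 0.5 * math.log((1 + r) / (1 - r))
    # Step 2: interval on the transformed scale
    half_width = z_crit / math.sqrt(n - 3 - c)
    z_lower, z_upper = z - half_width, z + half_width

    # Step 3: back-transform each bound to the correlation scale
    def back(v):
        return (math.exp(2 * v) - 1) / (math.exp(2 * v) + 1)

    return back(z_lower), back(z_upper)

lower, upper = partial_corr_ci(0.711879, n=37, c=2)
print(round(lower, 4), round(upper, 4))  # 0.4964 0.8447
```

The back-transformation is the hyperbolic tangent, so `math.tanh` could be used in place of the explicit exponentials.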

6.4 - Summary

In this lesson we learned about:

Conditional means, variances, and covariances
The definition of the partial correlation and how it may be estimated for data sampled from a multivariate normal distribution
Interpretation of the partial correlation
Methods for testing the null hypothesis that there is zero partial correlation
How to compute confidence intervals for the partial correlation
