6.1 - Conditional Distributions

Partial correlations may only be defined after introducing the concept of conditional distributions. We will restrict ourselves to conditional distributions from multivariate normal distributions only.

If we have a p × 1 random vector \(\mathbf{Z}\), we can partition it into two random vectors \(\mathbf{X}_1\) and \(\mathbf{X}_2\) where \(\mathbf{X}_1\) is a p₁ × 1 vector and \(\mathbf{X}_2\) is a p₂ × 1 vector as shown in the expression below:

\(\textbf{Z} = \left(\begin{array}{c} \textbf{X}_1 \\ \textbf{X}_2\end{array}\right)\)

Conditional Distribution Properties

Further, suppose that we partition the mean vector and covariance matrix in a corresponding manner. That is,

\(\boldsymbol{\mu} = \left(\begin{array}{c}\boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2\end{array}\right)\) and \(\mathbf{\Sigma} = \left(\begin{array}{cc}\mathbf{\Sigma}_{11} & \mathbf{\Sigma}_{12}\\ \mathbf{\Sigma}_{21} & \mathbf{\Sigma}_{22} \end{array}\right)\)

For instance, \(\boldsymbol{\mu}_{1}\) gives the means for the variables in the vector \(\mathbf{X}_{1}\), and \(\mathbf{\Sigma}_{11}\) gives the variances and covariances for the vector \(\mathbf{X}_{1}\). The matrix \(\mathbf{\Sigma}_{12}\) gives the covariances between the variables in vector \(\mathbf{X}_{1}\) and those in vector \(\mathbf{X}_{2}\) (as does the matrix \(\mathbf{\Sigma}_{21}\), which is its transpose).

Any distribution for a subset of variables from a multivariate normal, conditional on known values for another subset of variables, is a multivariate normal distribution.

Conditional Distribution: The conditional distribution of \(\mathbf{X}_{1}\) given known values \(\mathbf{X}_2=\mathbf{x}_{2}\) is a multivariate normal with: \begin{align} \text{mean vector} & = \boldsymbol{\mu}_1 + \mathbf{\Sigma}_{12}\mathbf{\Sigma}_{22}^{-1}(\mathbf{x}_2-\boldsymbol{\mu}_2)\\ \text{covariance matrix} & = \mathbf{\Sigma}_{11} - \mathbf{\Sigma}_{12}\mathbf{\Sigma}_{22}^{-1}\mathbf{\Sigma}_{21} \end{align}
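
The two formulas above translate directly into a few lines of linear algebra. The following is a minimal sketch in Python with NumPy, not part of the original notes: the function name `conditional_mvn`, the index arguments, and the three-variable mean vector and covariance matrix are all hypothetical and chosen only for illustration.

```python
import numpy as np

def conditional_mvn(mu, sigma, idx1, idx2, x2):
    """Mean vector and covariance matrix of X1 given X2 = x2 (multivariate normal)."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    mu1, mu2 = mu[idx1], mu[idx2]
    s11 = sigma[np.ix_(idx1, idx1)]
    s12 = sigma[np.ix_(idx1, idx2)]
    s22 = sigma[np.ix_(idx2, idx2)]
    # b = Sigma_12 Sigma_22^{-1}, obtained from a linear solve instead of an explicit inverse
    b = np.linalg.solve(s22, s12.T).T
    cond_mean = mu1 + b @ (x2 - mu2)
    cond_cov = s11 - b @ s12.T
    return cond_mean, cond_cov

# Hypothetical 3-variable example: X1 = first two variables, X2 = third variable
mu = [1.0, 2.0, 3.0]
sigma = [[4.0, 2.0, 1.0],
         [2.0, 3.0, 1.0],
         [1.0, 1.0, 2.0]]
cond_mean, cond_cov = conditional_mvn(mu, sigma, [0, 1], [2], [4.0])
print(cond_mean)  # [1.5 2.5]
print(cond_cov)   # [[3.5 1.5], [1.5 2.5]]
```

Note that the conditional covariance matrix does not depend on the observed value of \(\mathbf{x}_2\); only the conditional mean does.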

Bivariate Case

Suppose that we have p = 2 variables with a multivariate normal distribution. The conditional distribution of \(X_{1}\) given knowledge of \(x_{2}\) is a normal distribution with

\begin{align} \text{Mean} & = \mu_1 + \frac{\sigma_{12}}{\sigma_{22}}(x_2-\mu_2) \\ \text{Variance} & = \sigma_{11}- \frac{\sigma^2_{12}}{\sigma_{22}}\end{align}
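
These expressions are the general result with every block reduced to a scalar: \(\mathbf{\Sigma}_{12} = \mathbf{\Sigma}_{21} = \sigma_{12}\) and \(\mathbf{\Sigma}_{22}^{-1} = 1/\sigma_{22}\). It can also help to rewrite them using the ordinary correlation, written here as \(\rho = \sigma_{12}/\sqrt{\sigma_{11}\sigma_{22}}\) (a symbol introduced only for this remark):

\begin{align} \text{Mean} & = \mu_1 + \rho\sqrt{\frac{\sigma_{11}}{\sigma_{22}}}\,(x_2-\mu_2) \\ \text{Variance} & = \sigma_{11} - \frac{\rho^2\sigma_{11}\sigma_{22}}{\sigma_{22}} = \sigma_{11}(1-\rho^2)\end{align}

In this form it is clear that conditioning on \(x_{2}\) never increases the variance of \(X_{1}\), and that the reduction grows with the strength of the correlation.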

Example 6-1: Conditional Distribution of Weight Given Height for College Men

Suppose that the weights (lbs) and heights (inches) of undergraduate college men have a multivariate normal distribution with mean vector \(\boldsymbol{\mu} = \left(\begin{array}{c} 175\\ 71 \end{array}\right)\) and covariance matrix \(\mathbf{\Sigma} = \left(\begin{array}{cc} 550 & 40\\ 40 & 8 \end{array}\right)\).

The conditional distribution of \(X_{1}\) = weight given \(x_{2}\) = height is a normal distribution with

\begin{align} \text{Mean} &= \mu_1 + \frac{\sigma_{12}}{\sigma_{22}}(x_2-\mu_2)\\[5pt] &= 175 + \frac{40}{8}(x_2-71) \\[5pt] &= -180+5x_2 \end{align}

\begin{align} \text{Variance} &= \sigma_{11}- \frac{\sigma^2_{12}}{\sigma_{22}}\\ &= 550-\frac{40^2}{8} \\[5pt] &= 350 \end{align}

For instance, for men with height = 70 inches, weights are normally distributed with mean = -180 + 5(70) = 170 pounds and variance = 350. (So the standard deviation is \(\sqrt{350} = 18.71\) pounds.)

Notice that we have generated a simple linear regression model that relates weight to height.
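
The arithmetic of Example 6-1 can be checked with a short script. This is only a sketch using the numbers given in the example; the variable and function names are hypothetical.

```python
import math

mu1, mu2 = 175.0, 71.0            # mean weight (lbs), mean height (inches)
s11, s12, s22 = 550.0, 40.0, 8.0  # var(weight), cov(weight, height), var(height)

slope = s12 / s22                 # regression slope of weight on height
intercept = mu1 - slope * mu2     # regression intercept
cond_var = s11 - s12 ** 2 / s22   # conditional variance of weight given height
cond_sd = math.sqrt(cond_var)

def conditional_mean_weight(height):
    """Conditional mean weight (lbs) for a given height (inches)."""
    return intercept + slope * height

print(intercept, slope)             # -180.0 5.0
print(conditional_mean_weight(70))  # 170.0
print(cond_var, round(cond_sd, 2))  # 350.0 18.71
```

The slope \(\sigma_{12}/\sigma_{22}\) and intercept \(\mu_1 - (\sigma_{12}/\sigma_{22})\mu_2\) are exactly the population slope and intercept of the simple linear regression of weight on height.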

Conditional Means, Variances and Covariances

So far, we have only considered unconditional population means, variances, covariances, and correlations. These quantities are defined under the setting in which the subjects are sampled from the entire population. For example, blood pressure and cholesterol may be measured from a sample selected from the population of all adult citizens of the United States.

To understand partial correlations, we must first consider conditional means, variances, and covariances. These quantities are defined for some subset of the population. For example, blood pressure and cholesterol may be measured from a sample of all 51-year-old citizens of the United States. Thus, we may consider the population mean blood pressure of 51-year-old citizens. This quantity is called the conditional mean blood pressure given that the subject is a 51-year-old citizen.

More than one condition may be applied. For example, we may consider the population mean blood pressure of 51-year-old citizens who weigh 190 pounds. This quantity is the conditional mean blood pressure given that the subject is 51 years old and weighs 190 pounds.

Conditional Mean

Let Y denote a vector of variables (e.g., blood pressure, cholesterol, etc.) of interest, and let X denote a vector of variables on which we wish to condition (e.g., age, weight, etc.). Then the conditional mean of Y given that X equals a particular value x (i.e., X = x) is denoted by

\(\mu_{\textbf{Y.x}} = E(\textbf{Y}|\textbf{X=x})\)

This is interpreted as the population mean of the vector Y given a sample from the subpopulation where X = x.

Conditional Variance

Let Y denote a variable of interest, and let X denote a vector of variables on which we wish to condition. Then the conditional variance of Y given that X = x is

\(\sigma^2_{\textbf{Y.x}} = \text{var}(\mathbf{Y}|\textbf{X=x}) = E\{(\mathbf{Y}-\boldsymbol{\mu}_{\textbf{Y.x}})^2|\textbf{X=x}\}\)

Because Y is random, so is \(\left( \mathbf{Y} - \boldsymbol{\mu}_{\textbf{Y.x}} \right) ^ { 2 }\), and hence \(\left( \mathbf{Y} - \boldsymbol{\mu}_{\textbf{Y.x}} \right) ^ { 2 }\) has a conditional mean. This can be interpreted as the variance of Y given a sample from the subpopulation where X = x.

Conditional Covariance

Let \(Y_{i}\) and \(Y_{j}\) denote two variables of interest, and let X denote a vector of variables on which we wish to condition. Then the conditional covariance between \(Y_{i}\) and \(Y_{j}\) given that X = x is

\(\sigma_{i,j.\textbf{x}} = \text{cov}(Y_i, Y_j| \textbf{X=x}) = E\{(Y_i-\mu_{Y_i.x})(Y_j-\mu_{Y_j.x})|\textbf{X=x}\}\)

Because \(Y_{i}\) and \(Y_{j}\) are random, so is \(\left( Y_{ i } - \mu_{ Y_i.x } \right) \left( Y_{ j } - \mu_{ Y_j.x } \right)\) and hence \(\left( Y_{ i } - \mu_{ Y_i.x } \right) \left( Y_{ j } - \mu_{ Y_j.x } \right)\) has a conditional mean. This can be interpreted as the covariance between \(Y_{i}\) and \(Y_{j}\) given a sample from the subpopulation where X = x.

Just as the unconditional variances and covariances can be collected into a variance-covariance matrix \(\mathbf{\Sigma}\), the conditional variances and covariances can be collected into a conditional variance-covariance matrix:

\(\mathbf{\Sigma_{Y.x}}= \text{var}\mathbf{(Y|X=x)} = \left(\begin{array}{cccc}\sigma^2_{Y_1\textbf{.X}} & \sigma_{12\textbf{.X}} & \dots & \sigma_{1p\textbf{.X}}\\ \sigma_{21\textbf{.X}} & \sigma^2_{Y_2 \textbf{.X}} & \dots & \sigma_{2p \textbf{.X}} \\ \vdots & \vdots & \ddots & \vdots\\ \sigma_{p1 \textbf{.X}} & \sigma_{p2 \textbf{.X}} & \dots & \sigma^2_{Y_p\textbf{.X}} \end{array}\right)\)

Partial Correlation

The partial correlation between \(Y_{j}\) and \(Y_{k}\) given X = x is:

\[\rho_{jk\textbf{.X}} = \dfrac{\sigma_{jk\textbf{.X}}}{\sigma_{Y_j\textbf{.X}}\sigma_{Y_k \textbf{.X}}}\]

Note! This is computed in the same way as unconditional correlations, replacing unconditional variances and covariances with conditional variances and covariances. This can be interpreted as the correlation between \(Y_{j}\) and \(Y_{k}\) given a sample from the subpopulation where X = x.

The Multivariate Normal Distribution

Next, let us return to the multivariate normal distribution. Suppose that we have a random vector Z, partitioned into components X and Y, that follows a multivariate normal distribution. Its mean vector is partitioned into the corresponding components \(\boldsymbol{\mu}_{X}\) and \(\boldsymbol{\mu}_{Y}\), and its variance-covariance matrix is partitioned into four parts as shown below:

\(\textbf{Z} = \left(\begin{array}{c}\textbf{X}\\ \textbf{Y} \end{array}\right) \sim N \left(\left(\begin{array}{c}\boldsymbol{\mu}_X\\\boldsymbol{\mu}_Y \end{array}\right), \left(\begin{array}{cc} \mathbf{\Sigma_{X}} & \mathbf{\Sigma_{XY}}\\ \mathbf{\Sigma_{YX}} & \mathbf{\Sigma_Y} \end{array}\right)\right)\)

Here, \(\mathbf{\Sigma_{X}}\) is the variance-covariance matrix for the random vector X, and \(\mathbf{\Sigma_Y}\) is the variance-covariance matrix for the random vector Y. The matrix \(\mathbf{\Sigma_{YX}}\) contains the covariances between each element of Y and each element of X, and \(\mathbf{\Sigma_{XY}}\) is its transpose.

Then the conditional distribution of Y given that X takes a particular value x is also going to be a multivariate normal with conditional expectation as shown below:

\(E(\textbf{Y}|\textbf{X=x}) = \mathbf{\mu_Y} + \mathbf{\Sigma_{YX}\Sigma^{-1}_X}(\mathbf{x}-\boldsymbol{\mu}_X)\)

Note that this is equal to the mean of Y plus an adjustment. This adjustment involves the covariances between X and Y, the inverse of the variance-covariance matrix of X, and the difference between the value x and the mean of the random vector X. If x is equal to \(\boldsymbol{\mu}_{X}\), the adjustment vanishes and the conditional expectation of Y given X = x is simply the ordinary mean of Y.

In general, if there are positive covariances between the X's and Y's, then a value of x greater than \(\boldsymbol{\mu}_{X}\) will result in a positive adjustment in the calculation of this conditional expectation. Conversely, if x is less than \(\boldsymbol{\mu}_{X}\), then we will end up with a negative adjustment.

The conditional variance-covariance matrix of Y given that X = x is equal to the variance-covariance matrix for Y minus the term that involves the covariances between X and Y and the variance-covariance matrix for X. For now, we will call this conditional variance-covariance matrix A as shown below:

\(\text{var}(\textbf{Y|X=x}) = \mathbf{\Sigma_Y - \Sigma_{YX}\Sigma^{-1}_X\Sigma_{XY}} = \textbf{A}\)

We are finally now ready to define the partial correlation between two variables \(Y_{j}\) and \(Y_{k}\) given that the random vector X = x. This is shown in the expression below:

\(\rho_{jk\textbf{.x}} = \dfrac{a_{jk}}{\sqrt{a_{jj}a_{kk}}}\)

This is basically the same formula that we would have for the ordinary correlation, in this case, calculated using the conditional variance-covariance matrix in place of the ordinary variance-covariance matrix.
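
As an illustration of the definition, the sketch below builds the matrix A from a partitioned covariance matrix and rescales it into partial correlations. It assumes Python with NumPy; the function name `partial_corr_matrix` and the 3 × 3 covariance matrix are hypothetical and not taken from the notes.

```python
import numpy as np

def partial_corr_matrix(sigma, x_idx, y_idx):
    """Partial correlations among the Y variables given the X variables."""
    sigma = np.asarray(sigma, dtype=float)
    s_x = sigma[np.ix_(x_idx, x_idx)]
    s_yx = sigma[np.ix_(y_idx, x_idx)]
    s_y = sigma[np.ix_(y_idx, y_idx)]
    # A = Sigma_Y - Sigma_YX Sigma_X^{-1} Sigma_XY, with Sigma_XY = Sigma_YX'
    a = s_y - s_yx @ np.linalg.solve(s_x, s_yx.T)
    d = np.sqrt(np.diag(a))
    # rho_jk.x = a_jk / sqrt(a_jj * a_kk)
    return a / np.outer(d, d)

# Hypothetical covariance matrix: variables 0 and 1 play the role of Y, variable 2 of X
sigma = [[4.0, 2.0, 1.0],
         [2.0, 3.0, 1.0],
         [1.0, 1.0, 2.0]]
rho = partial_corr_matrix(sigma, x_idx=[2], y_idx=[0, 1])
print(round(rho[0, 1], 3))  # 0.507
```

For comparison, the ordinary correlation between the two Y variables in this made-up matrix is \(2/\sqrt{4 \cdot 3} \approx 0.577\), so conditioning on X lowers it slightly here.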

Partial correlations can be estimated by substituting the sample variance-covariance matrices for the population variance-covariance matrices, as shown in the expression below:

\(\widehat{\text{var}}(\textbf{Y|X=x}) = \mathbf{S_Y - S_{YX}S^{-1}_X S_{XY}}= \hat{\textbf{A}}\)

where

\(\mathbf{S} = \left(\begin{array}{cc} \mathbf{S_X} & \mathbf{S_{XY}}\\ \mathbf{S_{YX}} & \mathbf{S_Y}\end{array}\right)\)

is the sample variance-covariance matrix of the data.

Then the elements of the estimated conditional variance-covariance matrix can be used to obtain the partial correlation as shown below:

\(r_{jk\textbf{.x}} = \dfrac{\hat{a}_{jk}}{\sqrt{\hat{a}_{jj}\hat{a}_{kk}}}\)
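
The estimation steps above can be sketched on simulated data. Everything in this example is an assumption made for illustration: the data are randomly generated, the coefficient matrix is arbitrary, and the variable names are hypothetical. As a cross-check, the sketch also computes the correlation between the residuals from least-squares regressions of each Y variable on X (with an intercept), which is a well-known equivalent way to obtain the sample partial correlation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=(n, 2))                        # conditioning variables X (simulated)
coef_true = np.array([[1.0, 0.5], [0.3, -0.7]])    # arbitrary illustrative coefficients
y = x @ coef_true + rng.normal(size=(n, 2))        # variables of interest Y (simulated)

# Sample variance-covariance matrix S, partitioned as in the text
s = np.cov(np.hstack([x, y]), rowvar=False)
s_x, s_xy = s[:2, :2], s[:2, 2:]
s_yx, s_y = s[2:, :2], s[2:, 2:]

# A-hat = S_Y - S_YX S_X^{-1} S_XY
a_hat = s_y - s_yx @ np.linalg.solve(s_x, s_xy)
r_partial = a_hat[0, 1] / np.sqrt(a_hat[0, 0] * a_hat[1, 1])

# Cross-check: correlate residuals after regressing each Y on X
design = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
resid = y - design @ coef
r_resid = np.corrcoef(resid, rowvar=False)[0, 1]

print(np.isclose(r_partial, r_resid))  # True
```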

If we are conditioning on just a single variable, a simpler expression is available that uses only the ordinary sample correlations \(r_{jk}\), \(r_{ij}\), and \(r_{ik}\) among the three variables. The partial correlation between \(Y_{j}\) and \(Y_{k}\) given \(Y_{i}\) = \(y_{i}\) is estimated by:

\(r_{jk.i} = \dfrac{r_{jk}-r_{ij}r_{ik}}{\sqrt{(1-r^2_{ij})(1-r^2_{ik})}}\)
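
A minimal sketch of this single-variable formula follows. The function name `partial_corr_single` and the three input correlations are hypothetical values chosen only to show the calculation.

```python
import math

def partial_corr_single(r_jk, r_ij, r_ik):
    """Partial correlation between variables j and k given variable i."""
    return (r_jk - r_ij * r_ik) / math.sqrt((1 - r_ij ** 2) * (1 - r_ik ** 2))

# Hypothetical ordinary correlations among three variables
r = partial_corr_single(r_jk=0.6, r_ij=0.5, r_ik=0.4)
print(round(r, 3))  # 0.504
```

If variable i is uncorrelated with both j and k (\(r_{ij} = r_{ik} = 0\)), the formula returns \(r_{jk}\) unchanged, as one would expect.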
