Suppose that we have a random vector \(\mathbf{X}\).
\(\mathbf{X} = \left(\begin{array}{c} X_1\\ X_2\\ \vdots \\X_p\end{array}\right)\)
with population variance-covariance matrix
\(\text{var}(\mathbf{X}) = \Sigma = \left(\begin{array}{cccc}\sigma^2_1 & \sigma_{12} & \dots &\sigma_{1p}\\ \sigma_{21} & \sigma^2_2 & \dots &\sigma_{2p}\\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \dots & \sigma^2_p\end{array}\right)\)
Consider the linear combinations
\(\begin{array}{lll} Y_1 & = & e_{11}X_1 + e_{12}X_2 + \dots + e_{1p}X_p \\ Y_2 & = & e_{21}X_1 + e_{22}X_2 + \dots + e_{2p}X_p \\ & & \vdots \\ Y_p & = & e_{p1}X_1 + e_{p2}X_2 + \dots +e_{pp}X_p\end{array}\)
Each of these can be thought of as a linear regression, predicting \(Y_{i}\) from \(X_{1}\), \(X_{2}\), ... , \(X_{p}\). There is no intercept, but \(e_{i1}\), \(e_{i2}\), ..., \(e_{ip}\) can be viewed as regression coefficients.
Collect the coefficients \(e_{ij}\) into the vector
\(\mathbf{e}_i = \left(\begin{array}{c} e_{i1}\\ e_{i2}\\ \vdots \\ e_{ip}\end{array}\right)\)
Note that \(Y_{i}\) is a function of our random data, and so is also random. Therefore it has a population variance
\(\text{var}(Y_i) = \sum\limits_{k=1}^{p}\sum\limits_{l=1}^{p}e_{ik}e_{il}\sigma_{kl} = \mathbf{e}'_i\Sigma\mathbf{e}_i\)
Moreover, \(Y_{i}\) and \(Y_{j}\) have population covariance
\(\text{cov}(Y_i, Y_j) = \sum\limits_{k=1}^{p}\sum\limits_{l=1}^{p}e_{ik}e_{jl}\sigma_{kl} = \mathbf{e}'_i\Sigma\mathbf{e}_j\)
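To make these formulas concrete, here is a minimal numerical check. The covariance matrix \(\Sigma\) and the coefficient vectors below are hypothetical, chosen only for illustration; the point is that the double-sum and quadratic-form expressions for \(\text{var}(Y_i)\) and \(\text{cov}(Y_i, Y_j)\) agree.

```python
import numpy as np

p = 3

# Hypothetical population covariance matrix (symmetric, positive definite)
Sigma = np.array([[4.0, 1.2, 0.8],
                  [1.2, 9.0, 0.3],
                  [0.8, 0.3, 1.0]])

# Hypothetical coefficient vectors e_1 and e_2
e1 = np.array([0.6, 0.3, 0.2])
e2 = np.array([1.0, -1.0, 0.5])

# var(Y_1) as the double sum over k and l ...
var_sum = sum(e1[k] * e1[l] * Sigma[k, l]
              for k in range(p) for l in range(p))
# ... and as the quadratic form e_1' Sigma e_1
var_quad = e1 @ Sigma @ e1
print(var_sum, var_quad)    # identical up to floating point

# cov(Y_1, Y_2) as the double sum and as e_1' Sigma e_2
cov_sum = sum(e1[k] * e2[l] * Sigma[k, l]
              for k in range(p) for l in range(p))
cov_quad = e1 @ Sigma @ e2
print(cov_sum, cov_quad)    # identical up to floating point
```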
First Principal Component (PC1): \(\boldsymbol{Y}_{1}\)
The first principal component is the linear combination of x-variables that has maximum variance (among all linear combinations). It accounts for as much variation in the data as possible.
Specifically, we define the coefficients \(e_{11}, e_{12}, \ldots, e_{1p}\) for the first component so that its variance is maximized, subject to the constraint that the sum of the squared coefficients equals one. This constraint is required so that a unique answer may be obtained.
More formally, select \(e_{11}, e_{12}, \ldots, e_{1p}\) that maximize
\(\text{var}(Y_1) = \sum\limits_{k=1}^{p}\sum\limits_{l=1}^{p}e_{1k}e_{1l}\sigma_{kl} = \mathbf{e}'_1\Sigma\mathbf{e}_1\)
subject to the constraint that
\(\mathbf{e}'_1\mathbf{e}_1 = \sum\limits_{j=1}^{p}e^2_{1j} = 1\)
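It is a standard result (stated here without proof) that this constrained maximum is attained when \(\mathbf{e}_1\) is the unit-length eigenvector of \(\Sigma\) belonging to its largest eigenvalue, and the maximized variance equals that eigenvalue. A quick numerical sketch of this fact, reusing the hypothetical \(\Sigma\) from above:

```python
import numpy as np

Sigma = np.array([[4.0, 1.2, 0.8],
                  [1.2, 9.0, 0.3],
                  [0.8, 0.3, 1.0]])

# eigh handles symmetric matrices; eigenvalues come back in ascending order
eigvals, eigvecs = np.linalg.eigh(Sigma)
e1 = eigvecs[:, -1]                    # eigenvector for the largest eigenvalue

print(e1 @ e1)                         # 1.0: satisfies the unit-norm constraint
print(e1 @ Sigma @ e1, eigvals[-1])    # var(Y_1) equals the largest eigenvalue

# No other unit vector does better: compare against random unit vectors
rng = np.random.default_rng(0)
for _ in range(5):
    u = rng.normal(size=3)
    u /= np.linalg.norm(u)
    assert u @ Sigma @ u <= eigvals[-1] + 1e-12
```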
Second Principal Component (PC2): \(\boldsymbol{Y}_{2}\)
The second principal component is the linear combination of x-variables that accounts for as much of the remaining variation as possible, with the constraint that the correlation between the first and second components is zero.
Select \(e_{21}, e_{22}, \ldots, e_{2p}\) to maximize the variance of this new component,
\(\text{var}(Y_2) = \sum\limits_{k=1}^{p}\sum\limits_{l=1}^{p}e_{2k}e_{2l}\sigma_{kl} = \mathbf{e}'_2\Sigma\mathbf{e}_2\)
subject to the constraint that the sum of the squared coefficients equals one,
\(\mathbf{e}'_2\mathbf{e}_2 = \sum\limits_{j=1}^{p}e^2_{2j} = 1\)
along with the additional constraint that these two components are uncorrelated.
\(\text{cov}(Y_1, Y_2) = \sum\limits_{k=1}^{p}\sum\limits_{l=1}^{p}e_{1k}e_{2l}\sigma_{kl} = \mathbf{e}'_1\Sigma\mathbf{e}_2 = 0\)
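Continuing the eigenvector characterization (again a standard result, not derived in this section), the second component's coefficients are the eigenvector belonging to the second-largest eigenvalue of \(\Sigma\). Because eigenvectors of a symmetric matrix are mutually orthogonal, the uncorrelatedness constraint \(\mathbf{e}'_1\Sigma\mathbf{e}_2 = 0\) then holds automatically, as this sketch with the same hypothetical \(\Sigma\) shows:

```python
import numpy as np

Sigma = np.array([[4.0, 1.2, 0.8],
                  [1.2, 9.0, 0.3],
                  [0.8, 0.3, 1.0]])

eigvals, eigvecs = np.linalg.eigh(Sigma)   # ascending eigenvalues
e1 = eigvecs[:, -1]                        # first PC coefficients
e2 = eigvecs[:, -2]                        # second PC coefficients

print(e2 @ e2)            # 1.0: unit-norm constraint holds
print(e1 @ Sigma @ e2)    # ~0: cov(Y_1, Y_2) = 0, the components are uncorrelated
```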
All subsequent principal components have this same property: each is the linear combination that accounts for as much of the remaining variation as possible while being uncorrelated with all of the preceding principal components. We define each additional component in the same way. For instance:
\(i^{th}\) Principal Component (PCi): \(\boldsymbol{Y}_{i}\)
We select \(e_{i1}, e_{i2}, \ldots, e_{ip}\) to maximize
\(\text{var}(Y_i) = \sum\limits_{k=1}^{p}\sum\limits_{l=1}^{p}e_{ik}e_{il}\sigma_{kl} = \mathbf{e}'_i\Sigma\mathbf{e}_i\)
subject to the constraint that the sum of the squared coefficients equals one, along with the additional constraint that this new component is uncorrelated with all of the previously defined components:
\(\mathbf{e}'_i\mathbf{e}_i = \sum\limits_{j=1}^{p}e^2_{ij} = 1\)
\(\text{cov}(Y_1, Y_i) = \sum\limits_{k=1}^{p}\sum\limits_{l=1}^{p}e_{1k}e_{il}\sigma_{kl} = \mathbf{e}'_1\Sigma\mathbf{e}_i = 0\),
\(\text{cov}(Y_2, Y_i) = \sum\limits_{k=1}^{p}\sum\limits_{l=1}^{p}e_{2k}e_{il}\sigma_{kl} = \mathbf{e}'_2\Sigma\mathbf{e}_i = 0\),
\(\vdots\)
\(\text{cov}(Y_{i-1}, Y_i) = \sum\limits_{k=1}^{p}\sum\limits_{l=1}^{p}e_{i-1,k}e_{il}\sigma_{kl} = \mathbf{e}'_{i-1}\Sigma\mathbf{e}_i = 0\)
Therefore all of the principal components are uncorrelated with one another.
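Putting it all together: if we stack the coefficient vectors as the rows of a matrix \(\mathbf{E}\), so that \(\mathbf{Y} = \mathbf{E}\mathbf{X}\), then \(\text{var}(\mathbf{Y}) = \mathbf{E}\Sigma\mathbf{E}'\) should be a diagonal matrix. A final numerical check with the hypothetical \(\Sigma\) used throughout (taking the rows of \(\mathbf{E}\) to be the eigenvectors of \(\Sigma\), per the standard solution sketched above):

```python
import numpy as np

Sigma = np.array([[4.0, 1.2, 0.8],
                  [1.2, 9.0, 0.3],
                  [0.8, 0.3, 1.0]])

eigvals, eigvecs = np.linalg.eigh(Sigma)
E = eigvecs[:, ::-1].T      # rows are e_1', ..., e_p', in decreasing-variance order

cov_Y = E @ Sigma @ E.T     # covariance matrix of (Y_1, ..., Y_p)
print(np.round(cov_Y, 10))  # diagonal: all off-diagonal covariances are zero
print(eigvals[::-1])        # the diagonal entries are the eigenvalues of Sigma
```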