4.5 - Eigenvalues and Eigenvectors

The next thing that we would like to be able to do is to describe the shape of this ellipse mathematically so that we can understand how the data are distributed in multiple dimensions under a multivariate normal. To do this we first must define the eigenvalues and the eigenvectors of a matrix.

In particular, we will consider the computation of the eigenvalues and eigenvectors of a symmetric matrix \(\textbf{A}\) as shown below:

\(\textbf{A} = \left(\begin{array}{cccc}a_{11} & a_{12} & \dots & a_{1p}\\ a_{21} & a_{22} & \dots & a_{2p}\\ \vdots & \vdots & \ddots & \vdots\\ a_{p1} & a_{p2} & \dots & a_{pp} \end{array}\right)\)

Note: we call the matrix symmetric if the elements \(a_{ij}\) are equal to \(a_{ji}\) for each \(i\) and \(j\).

Usually, \(\textbf{A}\) is taken to be either the variance-covariance matrix \(\Sigma\), the correlation matrix, or their estimates \(\textbf{S}\) and \(\textbf{R}\), respectively.

Eigenvalues and eigenvectors are used for:

  • Computing prediction and confidence ellipses
  • Principal Components Analysis (later in the course)
  • Factor Analysis (also later in this course)

For the present, we will be primarily concerned with eigenvalues and eigenvectors of the variance-covariance matrix.

First of all, let's define what these terms are...

Eigenvalues

If we have a p x p matrix \(\textbf{A}\), we are going to have p eigenvalues, \(\lambda_1, \lambda_2, \dots, \lambda_p\). They are obtained by solving the equation given in the expression below:

\(|\textbf{A}-\lambda\textbf{I}|=0\)

On the left-hand side, we have the matrix \(\textbf{A}\) minus \(λ\) times the Identity matrix. When we calculate the determinant of the resulting matrix, we end up with a polynomial of order p. Setting this polynomial equal to zero, and solving for \(λ\) we obtain the desired eigenvalues. In general, we will have p solutions and so there are p eigenvalues, not necessarily all unique.
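As a quick numerical illustration (using a made-up symmetric matrix, not one from the text), the coefficients of the characteristic polynomial and its roots can be computed with numpy; the roots agree with the eigenvalues returned by the dedicated symmetric-matrix routine:

```python
import numpy as np

# A made-up 2 x 2 symmetric matrix (illustrative values only)
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

# Coefficients of the characteristic polynomial |A - lambda*I| (degree p)
coeffs = np.poly(A)            # here: [1., -7., 11.]

# The eigenvalues are the p roots of that polynomial ...
roots = np.roots(coeffs)

# ... and they agree with the routine for symmetric matrices
print(np.sort(roots))
print(np.linalg.eigvalsh(A))   # ascending order
```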

Eigenvectors

The corresponding eigenvectors \(\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_p\) are obtained by solving the expression below:

\((\textbf{A}-\lambda_j\textbf{I})\textbf{e}_j = \mathbf{0}\)

Here, we take the matrix \(\textbf{A}\) minus the \(j^{th}\) eigenvalue times the Identity matrix, multiply this quantity by the \(j^{th}\) eigenvector, and set it all equal to zero. Solving this yields the eigenvector \(\textbf{e}_{j}\) associated with the eigenvalue \(\lambda_{j}\).

This system does not generally have a unique solution. So, to obtain a unique solution we will often require that \(\textbf{e}_{j}\) transposed times \(\textbf{e}_{j}\) be equal to 1; or, if you like, that the sum of the squared elements of \(\textbf{e}_{j}\) be equal to 1:

\(\textbf{e}'_j\textbf{e}_j = 1\)

Note! Eigenvectors corresponding to different eigenvalues are orthogonal. In situations where two (or more) eigenvalues are equal, the corresponding eigenvectors may still be chosen to be orthogonal.
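For example, numerical routines such as numpy's eigh for symmetric matrices return eigenvectors that already satisfy both conventions: each column has unit length and the columns are mutually orthogonal. A small sketch with a made-up matrix:

```python
import numpy as np

# Made-up 3 x 3 symmetric matrix (illustrative values only)
A = np.array([[2.0, 0.5, 0.3],
              [0.5, 1.0, 0.2],
              [0.3, 0.2, 1.5]])

eigenvalues, E = np.linalg.eigh(A)   # columns of E are e_1, ..., e_p

# Each column has unit length (e_j' e_j = 1) and distinct columns are
# orthogonal, so E'E is the identity matrix.
print(np.allclose(E.T @ E, np.eye(3)))   # True
```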

Example 4-3: Consider the 2 x 2 matrix

To illustrate these calculations consider the correlation matrix R as shown below:

\(\textbf{R} = \left(\begin{array}{cc} 1 & \rho \\ \rho & 1 \end{array}\right)\)

Then, using the definition of the eigenvalues, we must calculate the determinant of \(\textbf{R}\) minus \(\lambda\) times the Identity matrix.

\(\left|\bf{R} - \lambda\bf{I}\bf\right| = \left|\color{blue}{\begin{pmatrix} 1 & \rho \\ \rho & 1\\ \end{pmatrix}} -\lambda \color{red}{\begin{pmatrix} 1 & 0 \\ 0 & 1\\ \end{pmatrix}}\right|\)

So, \(\textbf{R}\) in the expression above is given in blue, and the Identity matrix follows in red, and \(\lambda\) here is the eigenvalue that we wish to solve for. Carrying out the math we end up with the matrix with \(1 - \lambda\) on the diagonal and \(\rho\) on the off-diagonal. Then calculating this determinant we obtain \((1 - \lambda)^{2} - \rho^{2}\), that is, \(1 - \lambda\) squared minus \(\rho^{2}\).

\(\left|\begin{array}{cc}1-\lambda & \rho \\ \rho & 1-\lambda \end{array}\right| = (1-\lambda)^2-\rho^2 = \lambda^2-2\lambda+1-\rho^2\)

Setting this expression equal to zero we end up with the following...

\( \lambda^2-2\lambda+1-\rho^2=0\)

To solve for \(λ\) we use the general result that any solution to the second-order polynomial below:

\(ay^2+by+c = 0\)

is given by the following expression:

\(y = \dfrac{-b\pm \sqrt{b^2-4ac}}{2a}\)

Here, \(a = 1\), \(b = -2\) (the coefficient of \(\lambda\)), and \(c = 1 - \rho^{2}\). Substituting these terms into the equation above, we find that \(\lambda\) must be equal to 1 plus or minus the correlation \(\rho\).

\begin{align} \lambda &= \dfrac{2 \pm \sqrt{2^2-4(1-\rho^2)}}{2}\\ & = 1\pm\sqrt{1-(1-\rho^2)}\\& = 1 \pm \rho \end{align}

Here we will take the following solutions:

\( \begin{array}{ccc}\lambda_1 & = & 1+\rho \\ \lambda_2 & = & 1-\rho \end{array}\)
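For a concrete correlation, say \(\rho = 0.6\) (an illustrative value, not from the text), these solutions can be checked numerically, either as roots of the quadratic above or directly from \(\textbf{R}\):

```python
import numpy as np

rho = 0.6                             # illustrative correlation
R = np.array([[1.0, rho],
              [rho, 1.0]])

# Roots of the characteristic polynomial lambda^2 - 2*lambda + (1 - rho^2):
# the two roots are 1 + rho and 1 - rho.
print(np.roots([1.0, -2.0, 1.0 - rho**2]))

# Direct computation; eigvalsh returns ascending order, so lambda_2 = 1 - rho
# comes first, then lambda_1 = 1 + rho.
print(np.linalg.eigvalsh(R))          # [0.4, 1.6]
```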

Next, to obtain the corresponding eigenvectors, we must solve a system of equations below:

\((\textbf{R}-\lambda\textbf{I})\textbf{e} = \mathbf{0}\)

This is the product of \(\textbf{R} - \lambda\textbf{I}\) and the eigenvector \(\textbf{e}\), set equal to \(\textbf{0}\). In other words, for this specific problem it translates to the expression below:

\(\left\{\left(\begin{array}{cc}1 & \rho \\ \rho & 1 \end{array}\right)-\lambda\left(\begin{array}{cc}1 &0\\0 & 1 \end{array}\right)\right \}\left(\begin{array}{c} e_1 \\ e_2 \end{array}\right) = \left(\begin{array}{c} 0 \\ 0 \end{array}\right)\)

This simplifies as follows:

\(\left(\begin{array}{cc}1-\lambda & \rho \\ \rho & 1-\lambda \end{array}\right) \left(\begin{array}{c} e_1 \\ e_2 \end{array}\right) = \left(\begin{array}{c} 0 \\ 0 \end{array}\right)\)

Yielding a system of two equations with two unknowns:

\(\begin{array}{lcc}(1-\lambda)e_1 + \rho e_2 & = & 0\\ \rho e_1+(1-\lambda)e_2 & = & 0 \end{array}\)

Note! This does not have a unique solution. If \((e_{1}, e_{2})\) is one solution, then a second solution can be obtained by multiplying the first solution by any non-zero constant c, i.e., \((ce_{1}, ce_{2})\). Therefore, we will require the additional condition that the sum of the squared values of \(e_{1}\) and \(e_{2}\) be equal to 1, i.e., \(e^2_1+e^2_2 = 1\).
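As an aside, a numerical counterpart to the hand calculation that follows is to take a unit-length vector from the null space of \(\textbf{R} - \lambda\textbf{I}\); the sketch below uses the SVD with the illustrative values \(\rho = 0.6\) and \(\lambda = 1 + \rho\).

```python
import numpy as np

rho = 0.6
lam = 1.0 + rho                       # take lambda_1 = 1 + rho
R = np.array([[1.0, rho],
              [rho, 1.0]])

M = R - lam * np.eye(2)               # the singular matrix R - lambda*I

# The right singular vector belonging to the (near-)zero singular value
# spans the null space of M; it is already scaled to unit length.
_, s, Vt = np.linalg.svd(M)
e = Vt[-1]                            # a solution of (R - lambda*I) e = 0
# e is proportional to (1/sqrt(2), 1/sqrt(2)), possibly with flipped sign
print(e, np.allclose(M @ e, 0))
```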

Consider the first equation:

\((1-\lambda)e_1 + \rho e_2 = 0\)

Solving this equation for \(e_{2}\), we obtain the following:

\(e_2 = -\dfrac{(1-\lambda)}{\rho}e_1\)

Substituting this into \(e^2_1+e^2_2 = 1\) we get the following:

\(e^2_1 + \dfrac{(1-\lambda)^2}{\rho^2}e^2_1 = 1\)

Recall that \(\lambda = 1 \pm \rho\). In either case we end up finding that \((1-\lambda)^2 = \rho^2\), so that the expression above simplifies to:

\(2e^2_1 = 1\)

Or, in other words:

\(e_1 = \dfrac{1}{\sqrt{2}}\)

Using the expression for \(e_{2}\) which we obtained above,

\(e_2 = -\dfrac{1-\lambda}{\rho}e_1\)

we get

\(e_2 = \dfrac{1}{\sqrt{2}}\) for \(\lambda = 1 + \rho\) and \(e_2 = - \dfrac{1}{\sqrt{2}}\) for \(\lambda = 1-\rho\)

Therefore, the two eigenvectors are given by the two vectors as shown below:

\(\left(\begin{array}{c}\frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} \end{array}\right)\) for \(\lambda_1 = 1+ \rho\) and \(\left(\begin{array}{c}\frac{1}{\sqrt{2}}\\ -\frac{1}{\sqrt{2}} \end{array}\right)\) for \(\lambda_2 = 1- \rho\)
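For the illustrative value \(\rho = 0.6\), a numerical eigendecomposition reproduces these vectors up to sign (an eigenvector is only determined up to a sign change):

```python
import numpy as np

rho = 0.6
R = np.array([[1.0, rho],
              [rho, 1.0]])

eigenvalues, E = np.linalg.eigh(R)    # ascending: [1 - rho, 1 + rho]

# The column for lambda_1 = 1 + rho is the last one; compare it with
# (1/sqrt(2), 1/sqrt(2)), allowing for an overall sign flip.
e1 = E[:, -1]
target = np.array([1.0, 1.0]) / np.sqrt(2.0)
print(np.allclose(e1, target) or np.allclose(e1, -target))   # True
```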

We now consider some properties of the eigenvalues of the variance-covariance matrix. Suppose that \(\lambda_{1}\) through \(\lambda_{p}\) are the eigenvalues of the variance-covariance matrix \(\Sigma\). By definition, the total variation is given by the sum of the variances. It turns out that this is also equal to the sum of the eigenvalues of the variance-covariance matrix. Thus, the total variation is:

\(\sum_{j=1}^{p}\sigma^2_j = \sigma^2_1 + \sigma^2_2 +\dots + \sigma^2_p = \lambda_1 + \lambda_2 + \dots + \lambda_p = \sum_{j=1}^{p}\lambda_j\)

The generalized variance is equal to the product of the eigenvalues:

\(|\Sigma| = \prod_{j=1}^{p}\lambda_j = \lambda_1 \times \lambda_2 \times \dots \times \lambda_p\)
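Both properties are easy to confirm numerically; the sketch below uses a made-up 3 x 3 variance-covariance matrix:

```python
import numpy as np

# Hypothetical 3 x 3 variance-covariance matrix (illustrative values only)
Sigma = np.array([[4.0, 1.2, 0.8],
                  [1.2, 9.0, 2.0],
                  [0.8, 2.0, 1.0]])

lam = np.linalg.eigvalsh(Sigma)

# Total variation: the sum of the variances (the trace of Sigma)
# equals the sum of the eigenvalues.
print(np.isclose(np.trace(Sigma), lam.sum()))        # True

# Generalized variance: the determinant of Sigma equals the product
# of the eigenvalues.
print(np.isclose(np.linalg.det(Sigma), lam.prod()))  # True
```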