Lesson 6: Principal Components Analysis
Introduction
Textbook reading: Consult Course Schedule
Principal Component Analysis (PCA) is a method of dimension reduction. It is not directly related to the prediction problem, but several regression methods depend directly on it. These regression methods, principal components regression (PCR) and partial least squares (PLS), will be considered later. For now, we set up the motivation for dimension reduction.
Notation
The input matrix X has dimension \(N \times p\):
\[\begin{pmatrix}
x_{1,1} & x_{1,2} & \cdots & x_{1,p} \\
x_{2,1} & x_{2,2} & \cdots & x_{2,p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{N,1} & x_{N,2} & \cdots & x_{N,p}
\end{pmatrix}\]
The rows of the above matrix represent the cases or observations (units).
The columns represent the variables, or characteristics, observed on each unit.
Assume that the columns of X are centered, i.e., the estimated column mean is subtracted from each column.
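As a rough illustration of the centering step, the sketch below subtracts each column's sample mean from that column, so that entry \(x_{i,j}\) becomes \(x_{i,j} - \bar{x}_j\). It assumes NumPy is available. The helper name `center_columns` and the small 3 x 2 matrix are made up for this example and do not come from the lesson.

```python
import numpy as np


def center_columns(X):
    """Return a copy of X with each column's sample mean subtracted.

    Hypothetical helper for illustration: rows are observations,
    columns are variables, matching the N x p layout above.
    """
    X = np.asarray(X, dtype=float)
    col_means = X.mean(axis=0)  # one estimated mean per column (length p)
    return X - col_means        # broadcasting subtracts the means row by row


# Toy data (assumed for illustration): N = 3 observations, p = 2 variables.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [6.0, 30.0]])

X_centered = center_columns(X)
print(X_centered)
print(X_centered.mean(axis=0))  # column means after centering
```

After centering, every column mean is zero (up to floating-point rounding), which is the condition assumed for X in the rest of the lesson.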