12.12 - Estimation of Factor Scores

Factor scores are analogous to the principal components of the previous lesson. Just as we plotted principal components against each other, a scatterplot of factor scores against one another can be similarly helpful. We might also use factor scores as explanatory variables, or even as the dependent variable, in a future analysis.

The method for estimating factor scores depends on the method used to estimate the factor loadings. The vectors of common factors \(\mathbf{f}_i\) are of interest: there are \(m\) unobserved factors in our model, and we would like to estimate them. Therefore, given the factor model:

\(\mathbf{Y_i = \boldsymbol{\mu} + Lf_i + \boldsymbol{\epsilon_i}}; i = 1,2,\dots, n,\)

we may wish to estimate the vectors of factor scores

\(\mathbf{f_1, f_2, \dots, f_n}\)

for each observation.

Methods

There are a number of different methods for estimating factor scores from the data. These include:

  • Ordinary Least Squares
  • Weighted Least Squares
  • Regression method

Ordinary Least Squares

By default, this is the method that SAS uses when the factor loadings are estimated by the principal component method. For each subject, we compute the difference between the observed value of the \(j^{th}\) variable and its value under the factor model, where the \(l_{jk}\) are the factor loadings and the \(f\)'s are the unobserved common factors. The estimated vector of common factors for subject \(i\), \( \hat{\mathbf{f}}_i \), is found by minimizing the sum of the squared residuals:

\[\sum_{j=1}^{p}\epsilon^2_{ij} = \sum_{j=1}^{p}(y_{ij}-\mu_j-l_{j1}f_1 - l_{j2}f_2 - \dots - l_{jm}f_m)^2 = (\mathbf{Y_i - \boldsymbol{\mu} - Lf_i})'(\mathbf{Y_i - \boldsymbol{\mu} - Lf_i})\]

This is like a least squares regression, except in this case we already have estimates of the parameters (the factor loadings), but wish to estimate the explanatory common factors. In matrix notation the solution is expressed as:

\(\mathbf{\hat{f}_i = (L'L)^{-1}L'(Y_i-\boldsymbol{\mu})}\)

In practice, we substitute our estimated factor loadings into this expression as well as the sample mean for the data:

\(\mathbf{\hat{f}_i = \left(\hat{L}'\hat{L}\right)^{-1}\hat{L}'(Y_i-\bar{y})}\)

Using the principal component method with the unrotated factor loadings, this yields:

\[\mathbf{\hat{f}_i} = \left(\begin{array}{c} \frac{1}{\sqrt{\hat{\lambda}_1}}\mathbf{\hat{e}'_1(Y_i-\bar{y})}\\  \frac{1}{\sqrt{\hat{\lambda}_2}}\mathbf{\hat{e}'_2(Y_i-\bar{y})}\\ \vdots \\  \frac{1}{\sqrt{\hat{\lambda}_m}}\mathbf{\hat{e}'_m(Y_i-\bar{y})}\end{array}\right)\]

Here, \(\hat{\mathbf{e}}_1\) through \(\hat{\mathbf{e}}_m\) are the eigenvectors corresponding to the first \(m\) eigenvalues \(\hat{\lambda}_1, \hat{\lambda}_2, \dots, \hat{\lambda}_m\).
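To make the computation concrete, here is a minimal numpy sketch. Everything in it is illustrative: the data matrix, the choice of \(m = 2\) factors, and all variable names are placeholders rather than part of the lesson. It builds the principal-component loadings from the sample variance-covariance matrix, computes the OLS scores with the general matrix formula, and checks that they match the eigenvector form above.

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=(50, 5))                   # placeholder data: n = 50 subjects, p = 5 variables
ybar = Y.mean(axis=0)                          # sample mean vector
S = np.cov(Y, rowvar=False)                    # sample variance-covariance matrix
centered = Y - ybar                            # rows are Y_i - ybar

m = 2                                          # assumed number of common factors
vals, vecs = np.linalg.eigh(S)                 # eigenvalues in ascending order
lam = vals[::-1][:m]                           # largest m eigenvalues
E = vecs[:, ::-1][:, :m]                       # corresponding eigenvectors (columns)
L_hat = E * np.sqrt(lam)                       # principal-component loadings, p x m

# General OLS solution: f_i = (L'L)^{-1} L'(Y_i - ybar), computed for all i at once
F_ols = centered @ L_hat @ np.linalg.inv(L_hat.T @ L_hat)

# Eigenvector shortcut for unrotated principal-component loadings:
# k-th score of subject i is (1 / sqrt(lambda_k)) * e_k'(Y_i - ybar)
F_pc = (centered @ E) / np.sqrt(lam)

assert np.allclose(F_ols, F_pc)
```

The two computations agree because, for unrotated principal-component loadings, \(\hat{\mathbf{L}}'\hat{\mathbf{L}}\) is diagonal with entries \(\hat{\lambda}_1, \dots, \hat{\lambda}_m\).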

Weighted Least Squares (Bartlett)

The difference between WLS and OLS is that each squared residual is divided by the corresponding specific variance, as shown below. This gives more weight in the estimation to variables with low specific variances. The factor model fits those variables best, so they should carry more information about the true values of the common factors.

Therefore, for the factor model:

\(\mathbf{Y_i = \boldsymbol{\mu} + Lf_i + \boldsymbol{\epsilon_i}}\)

we want to find \(\boldsymbol{f_i}\) that minimizes

\( \sum\limits_{j=1}^{p}\frac{\epsilon^2_{ij}}{\Psi_j} = \sum\limits_{j=1}^{p}\frac{(y_{ij}-\mu_j - l_{j1}f_1 - l_{j2}f_2 -\dots - l_{jm}f_m)^2}{\Psi_j} = \mathbf{(Y_i-\boldsymbol{\mu}-Lf_i)'\Psi^{-1}(Y_i-\boldsymbol{\mu}-Lf_i)}\)

The solution is given by this expression where \(\mathbf{\Psi}\) is the diagonal matrix whose diagonal elements are equal to the specific variances:

\(\mathbf{\hat{f}_i = (L'\Psi^{-1}L)^{-1}L'\Psi^{-1}(Y_i-\boldsymbol{\mu})}\)

and can be estimated by substituting the following:

\(\mathbf{\hat{f}_i = (\hat{L}'\hat{\Psi}^{-1}\hat{L})^{-1}\hat{L}'\hat{\Psi}^{-1}(Y_i-\bar{y})}\)
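The following sketch applies the Bartlett formula directly, again with placeholder data and principal-component loadings; as an assumption of this illustration, the specific variances are estimated by the diagonal of \(\mathbf{S} - \hat{\mathbf{L}}\hat{\mathbf{L}}'\).

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=(50, 5))                   # placeholder data, n x p
ybar = Y.mean(axis=0)
S = np.cov(Y, rowvar=False)                    # sample variance-covariance matrix
centered = Y - ybar

m = 2
vals, vecs = np.linalg.eigh(S)                 # ascending eigenvalues
lam, E = vals[::-1][:m], vecs[:, ::-1][:, :m]
L_hat = E * np.sqrt(lam)                       # estimated loadings, p x m
psi_hat = np.diag(S - L_hat @ L_hat.T)         # estimated specific variances (length p)

# Bartlett / WLS scores: f_i = (L' Psi^{-1} L)^{-1} L' Psi^{-1} (Y_i - ybar)
Psi_inv = np.diag(1.0 / psi_hat)               # diagonal matrix Psi^{-1}
W = np.linalg.inv(L_hat.T @ Psi_inv @ L_hat) @ L_hat.T @ Psi_inv   # m x p
F_wls = centered @ W.T                         # n x m matrix of factor scores
```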

Regression Method

This method is used when the factor loadings are estimated by maximum likelihood. For the \(i^{th}\) subject, we consider the vector of observed data together with the vector of unobserved common factors.

The joint distribution of the data \(\boldsymbol{Y}_i\) and the factor \(\boldsymbol{f}_i\) is

\(\left(\begin{array}{c}\mathbf{Y_i} \\ \mathbf{f_i}\end{array}\right) \sim N \left[\left(\begin{array}{c}\mathbf{\boldsymbol{\mu}} \\ 0 \end{array}\right), \left(\begin{array}{cc}\mathbf{LL'+\Psi} & \mathbf{L} \\ \mathbf{L'} & \mathbf{I}\end{array}\right)\right]\)

Using this we can calculate the conditional expectation of the common factor score \(\boldsymbol{f}_i\) given the data \(\boldsymbol{Y}_i\) as expressed here:

\(E(\mathbf{f_i|Y_i}) = \mathbf{L'(LL'+\Psi)^{-1}(Y_i-\boldsymbol{\mu})}\)

This suggests the following estimator by substituting in the estimates for L and \(\mathbf{\Psi}\):

\(\mathbf{\hat{f}_i = \hat{L}'\left(\hat{L}\hat{L}'+\hat{\Psi}\right)^{-1}(Y_i-\bar{y})}\)

A small adjustment is often made to reduce the effects of an incorrect determination of the number of factors: the fitted covariance matrix \(\hat{\mathbf{L}}\hat{\mathbf{L}}'+\hat{\boldsymbol{\Psi}}\) is replaced by the sample variance-covariance matrix \(\mathbf{S}\). This tends to give somewhat more stable results:

\(\mathbf{\tilde{f}_i = \hat{L}'S^{-1}(Y_i-\bar{y})}\)
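The sketch below evaluates both versions of the regression-method formula. Note the assumptions: the data are placeholders, and principal-component loadings stand in for the maximum likelihood estimates that would ordinarily accompany this method; only the score formulas themselves come from this section.

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=(50, 5))                      # placeholder data, n x p
ybar = Y.mean(axis=0)
S = np.cov(Y, rowvar=False)                       # sample variance-covariance matrix
centered = Y - ybar

m = 2
vals, vecs = np.linalg.eigh(S)                    # ascending eigenvalues
lam, E = vals[::-1][:m], vecs[:, ::-1][:, :m]
L_hat = E * np.sqrt(lam)                          # stand-in for ML loadings (illustration only)
Psi_hat = np.diag(np.diag(S - L_hat @ L_hat.T))   # diagonal matrix of specific variances

# Regression-method scores: f_i = L'(LL' + Psi)^{-1}(Y_i - ybar)
Sigma_hat = L_hat @ L_hat.T + Psi_hat
F_reg = centered @ np.linalg.solve(Sigma_hat, L_hat)

# Stabilized variant: replace LL' + Psi by the sample covariance matrix S
F_reg_stable = centered @ np.linalg.solve(S, L_hat)
```

Using `np.linalg.solve` avoids forming the inverse explicitly; each variant returns an \(n \times m\) matrix with one row of factor scores per subject.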