Before we proceed, we would like to determine if the model adequately fits the data. The goodness-of-fit test in this case compares the variance-covariance matrix under a parsimonious model to the variance-covariance matrix without any restriction, i.e. under the assumption that the variances and covariances can take any values. The variance-covariance matrix under the assumed model can be expressed as:

\(\mathbf{\Sigma = LL' + \Psi}\)

\(\mathbf{L}\) is the matrix of factor loadings, and the diagonal elements of \(\mathbf{\Psi}\) are the specific variances. This is a very specific structure for the variance-covariance matrix; a more general structure would allow those elements to take any values. To assess goodness-of-fit, we use the Bartlett-corrected likelihood ratio test statistic:

\(X^2 = \left(n-1-\frac{2p+4m+5}{6}\right)\log \frac{|\mathbf{\hat{L}\hat{L}'}+\mathbf{\hat{\Psi}}|}{|\hat{\mathbf{\Sigma}}|}\)

The test is a likelihood ratio test comparing two likelihoods: one under the parsimonious factor model and one without any restrictions. The constant in front of the log is called the Bartlett correction, and the log is the natural log. The numerator contains the determinant of the variance-covariance matrix fitted by the factor model; the denominator contains a sample estimate of the variance-covariance matrix assuming no structure, where:

\(\hat{\boldsymbol{\Sigma}} = \frac{n-1}{n}\mathbf{S}\)

and \(\mathbf{S}\) is the sample variance-covariance matrix. This is just another estimate of the variance-covariance matrix, one that includes a small bias. If the factor model fits well, then these two determinants should be about the same and \(X^2\) will be small. However, if the model does not fit well, then the determinants will differ and \(X^2\) will be large.
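As a concrete illustration, the statistic can be computed directly from the estimated loadings and specific variances. The following Python sketch (the function name and the toy inputs are my own, not from the lesson; it uses the standard Bartlett correction \(n-1-(2p+4m+5)/6\)) forms both determinants and applies the correction:

```python
import numpy as np

def bartlett_statistic(S, L, Psi, n):
    """Bartlett-corrected LRT statistic for an m-factor model.

    S   : p x p sample variance-covariance matrix
    L   : p x m matrix of estimated factor loadings
    Psi : length-p vector of estimated specific variances
    n   : sample size
    """
    p, m = L.shape
    sigma_hat = (n - 1) / n * S                 # ML (slightly biased) estimate
    fitted = L @ L.T + np.diag(Psi)             # covariance implied by the model
    bartlett = n - 1 - (2 * p + 4 * m + 5) / 6  # Bartlett correction factor
    return bartlett * np.log(np.linalg.det(fitted) / np.linalg.det(sigma_hat))

# Toy check: if the fitted model reproduces sigma_hat exactly,
# the two determinants agree and the statistic is zero.
L = np.array([[0.9], [0.8], [0.7], [0.6]])      # p = 4 variables, m = 1 factor
Psi = np.array([0.19, 0.36, 0.51, 0.64])
n = 100
Sigma = L @ L.T + np.diag(Psi)
S = n / (n - 1) * Sigma                         # so (n-1)/n * S equals Sigma
print(round(bartlett_statistic(S, L, Psi, n), 10))  # 0.0
```

In practice the two determinants never match exactly, and the size of the resulting \(X^2\) is what the test evaluates.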

Under the null hypothesis that the factor model adequately describes the relationships among the variables,

\(\mathbf{X}^2 \sim \chi^2_{\frac{(p-m)^2-p-m}{2}} \)

This test statistic has a chi-square distribution with the somewhat unusual degrees of freedom shown above, equal to the difference between the numbers of unique parameters in the two models. We reject the null hypothesis that the factor model adequately describes the data if \(X^2\) exceeds the critical value from the chi-square table.
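The degrees-of-freedom formula is easy to verify numerically. A small Python check (the helper name is mine) reproduces the DF values that SAS reports for these data, which have \(p = 9\) variables:

```python
def factor_model_df(p, m):
    """Degrees of freedom for the factor-model goodness-of-fit test:
    ((p - m)^2 - p - m) / 2."""
    return ((p - m) ** 2 - p - m) // 2

# Places Rated data: p = 9 variables
print(factor_model_df(9, 3))  # 12 -- the 3-factor test
print(factor_model_df(9, 4))  # 6  -- the 4-factor test
```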

**Back to the Output... Looking just past the iteration results, we have...**

Test | DF | Chi-Square | Pr > ChiSq
---|---|---|---
\(H_{o}\colon\) No common factors vs. \(H_{A}\colon\) At least one common factor | 36 | 839.4268 | < 0.0001
\(H_{o}\colon\) 3 Factors are sufficient vs. \(H_{A}\colon\) More Factors are needed | 12 | 92.6652 | < 0.0001

For our Places Rated dataset, we find a significant lack of fit: \(X^2 = 92.67\), d.f. = 12, \(p < 0.0001\). We conclude that the relationships among the variables are not adequately described by the factor model, which suggests that we do not have the correct model.

The only remedy that we can apply in this case is to increase the number *m* of factors until an adequate fit is achieved. Note, however, that *m* must satisfy

\(p(m+1) \le \frac{p(p+1)}{2}\)

In the present example, with \(p = 9\) variables, this means that *m* ≤ 4.

Let's return to the SAS program and change the "nfactors" value from 3 to 4:

Test | DF | Chi-Square | Pr > ChiSq
---|---|---|---
\(H_{o}\colon\) No common factors vs. \(H_{A}\colon\) At least one common factor | 36 | 839.4268 | < 0.0001
\(H_{o}\colon\) 4 Factors are sufficient vs. \(H_{A}\colon\) More Factors are needed | 6 | 41.6867 | < 0.0001

We find that the factor model with *m* = 4 does not fit the data adequately either: \(X^2 = 41.69\), d.f. = 6, \(p < 0.0001\). We conclude that a factor model does not work well for this particular dataset. Something else is going on here, perhaps some non-linearity among the variables. Whatever the case, this does not yield a good-fitting factor model. A next step could be to drop variables from the dataset to obtain a better-fitting model.