12.5 - Communalities

Example 12-1: Continued...

The communalities for the \(i^{th}\) variable are computed by taking the sum of the squared loadings for that variable. This is expressed below:

\(\hat{h}^2_i = \sum\limits_{j=1}^{m}\hat{l}^2_{ij}\)

To understand the computation of communalities, recall the table of factor loadings:

  Factor
Variable 1 2 3
Climate 0.286 0.076 0.841
Housing 0.698 0.153 0.084
Health 0.744 -0.410 -0.020
Crime 0.471 0.522 0.135
Transportation 0.681 -0.156 -0.148
Education 0.498 -0.498 -0.253
Arts 0.861 -0.115 0.011
Recreation 0.642 0.322 0.044
Economics 0.298 0.595 -0.533

Let's compute the communality for Climate, the first variable. We square the factor loadings for Climate (the first row of the table above, here carried to five decimal places), then add the results:

\(\hat{h}^2_1 = 0.28682^2 + 0.07560^2 + 0.84085^2 = 0.7950\)
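The same calculation can be sketched for all nine variables at once. This is a minimal illustration using NumPy and the loadings as printed in the table above; the array and variable names are chosen here for illustration. Because the table shows the loadings rounded to three decimals, the results may differ from the SAS values in the third decimal place.

```python
import numpy as np

variables = ["Climate", "Housing", "Health", "Crime", "Transportation",
             "Education", "Arts", "Recreation", "Economics"]

# Estimated loadings from the table above
# (rows: variables, columns: factors 1 to 3), rounded to three decimals.
loadings = np.array([
    [0.286,  0.076,  0.841],
    [0.698,  0.153,  0.084],
    [0.744, -0.410, -0.020],
    [0.471,  0.522,  0.135],
    [0.681, -0.156, -0.148],
    [0.498, -0.498, -0.253],
    [0.861, -0.115,  0.011],
    [0.642,  0.322,  0.044],
    [0.298,  0.595, -0.533],
])

# Communality of each variable: sum of its squared loadings across the factors.
communalities = (loadings ** 2).sum(axis=1)

for name, h2 in zip(variables, communalities):
    print(f"{name:<15s}{h2:.3f}")
```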

The communalities of the 9 variables can be obtained from page 4 of the SAS output as shown below:

Final Communality Estimates: Total = 5.616885
Climate housing health crime trans educate arts recreate econ
0.79500707 0.51783185 0.72230182 0.51244913 0.50977159 0.56073895 0.75382091 0.51725940 0.72770402

The value 5.616885, shown just above the individual communalities, is the "Total Communality".

Performing factor analysis (principal components extraction)

To perform factor analysis and obtain the communalities:

  1. Open the ‘places_tf.csv’ data set in a new worksheet.
  2. Transform variables. This step is optional but used in the steps below.  
    1. Calc > Calculator
    2. Highlight and select ‘climate’ to move it to the Store result window.
    3. In the Expression window, enter LOGTEN( 'climate') to apply the (base 10) log transformation to the climate variable.
    4. Choose OK. The transformed values replace the originals in the worksheet under ‘climate’.
    5. Repeat sub-steps 1 through 4 for each of the remaining variables, housing through econ.
  3. Stat > Multivariate > Factor Analysis
    1. Highlight and select climate through econ to move all 9 variables to the Variables window.
    2. Choose 3 for the number of factors to extract.
    3. Choose Principal Components for the Method of Extraction.
    4. Under Options, select Correlation as Matrix to Factor.
    5. Under Graphs, select Scree Plot.
  4. Choose OK and OK again. The numeric results are shown in the results area, along with the scree plot. The last column has the communality values.
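For readers working outside the menu-driven software, the steps above can be roughly sketched in Python. This is only an illustration of the principal component method applied to the correlation matrix, not a reproduction of the software's output: the function name `pc_factor_loadings` is hypothetical, and a synthetic data matrix stands in for the places data, since the file's column layout is not shown here. The signs of the loading columns are arbitrary and may differ from those reported by other software.

```python
import numpy as np

def pc_factor_loadings(X, m):
    """Principal component loadings for m factors from an n x p data matrix X."""
    R = np.corrcoef(X, rowvar=False)            # p x p correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:m]       # indices of the m largest eigenvalues
    top_vals = eigvals[order]
    top_vecs = eigvecs[:, order]
    L = top_vecs * np.sqrt(top_vals)            # scale each eigenvector by sqrt(eigenvalue)
    return L, top_vals

# Synthetic stand-in for the 9 variables (assumption: positive-valued raw data).
rng = np.random.default_rng(0)
raw = rng.lognormal(mean=3.0, sigma=0.5, size=(200, 9))

X = np.log10(raw)                               # base-10 log transform, as in step 2
L, top_vals = pc_factor_loadings(X, m=3)        # 3 factors, correlation matrix

communalities = (L ** 2).sum(axis=1)            # row sums of squared loadings
print(communalities.round(3))
print(communalities.sum(), top_vals.sum())      # total communality vs. sum of eigenvalues
```

With this extraction method, the total communality equals the sum of the three largest eigenvalues of the correlation matrix, which the last line prints side by side.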

In summary, the communalities are placed into a table:

 

Variable Communality
Climate 0.795
Housing 0.518
Health 0.722
Crime 0.512
Transportation 0.510
Education 0.561
Arts 0.754
Recreation 0.517
Economics 0.728
Total 5.617

You can think of these values as multiple \(R^{2}\) values for regression models predicting the variables of interest from the 3 factors. The communality for a given variable can be interpreted as the proportion of variation in that variable explained by the three factors. In other words, if we perform multiple regression of climate against the three common factors, we obtain an \(R^{2} = 0.795\), indicating that about 79.5% of the variation in climate is explained by the factor model. The results suggest that the factor analysis does the best job of explaining variations in climate, the arts, economics, and health.

One assessment of how well this model performs can be obtained from the communalities. We want to see values that are close to one, which indicate that the model explains most of the variation in those variables. In this case, the model does better for some variables than for others. The model explains Climate the best and is not bad for other variables such as Economics, Health, and the Arts. However, for other variables such as Crime, Recreation, Transportation, and Housing the model does not do a good job, explaining only about half of the variation.

The sum of all communality values is the total communality value:

\(\sum\limits_{i=1}^{p}\hat{h}^2_i = \sum\limits_{i=1}^{m}\hat{\lambda}_i\)

Here, the total communality is 5.617. The proportion of the total variation explained by the three factors is

\(\dfrac{5.617}{9} = 0.624\)

This is the proportion of variation explained by our model, about 62%, and it could be considered an overall assessment of the model's performance. Note, however, that it is the same as the proportion of variation explained by the first three eigenvalues, obtained earlier. The individual communalities tell how well the model is working for the individual variables, and the total communality gives an overall assessment of performance. These are two different assessments.
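The total communality and the proportion of variation explained can be checked with a short calculation. This sketch simply sums the communality estimates reported in the SAS output and divides by the number of variables; the dictionary name is chosen here for illustration.

```python
# Final communality estimates as reported in the SAS output.
communalities = {
    "Climate": 0.79500707,
    "Housing": 0.51783185,
    "Health": 0.72230182,
    "Crime": 0.51244913,
    "Transportation": 0.50977159,
    "Education": 0.56073895,
    "Arts": 0.75382091,
    "Recreation": 0.51725940,
    "Economics": 0.72770402,
}

# Total communality: sum over all variables.
total = sum(communalities.values())

# Each standardized variable has variance 1, so the total variation is p = 9.
proportion = total / len(communalities)

print(f"Total communality:    {total:.3f}")       # 5.617
print(f"Proportion explained: {proportion:.3f}")  # 0.624
```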

Because the data are standardized, each variable has a variance equal to one. The specific variance of a variable is computed by subtracting its communality from this variance, as expressed below:

\(\hat{\Psi}_i = 1-\hat{h}^2_i\)

For example, the specific variance for Climate is computed as follows:

\(\hat{\Psi}_1 = 1-0.795 = 0.205\)
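As a quick check, the specific variances for all nine variables can be obtained the same way. This is a minimal sketch that assumes standardized variables (variance one) and uses the communality estimates reported in the SAS output; the names are illustrative.

```python
import numpy as np

labels = ["Climate", "Housing", "Health", "Crime", "Trans",
          "Educate", "Arts", "Recreate", "Econ"]

# Final communality estimates as reported in the SAS output.
communalities = np.array([0.79500707, 0.51783185, 0.72230182,
                          0.51244913, 0.50977159, 0.56073895,
                          0.75382091, 0.51725940, 0.72770402])

# Specific variance = variance of the standardized variable (1) minus communality.
specific_variances = 1.0 - communalities

for label, psi in zip(labels, specific_variances):
    print(f"{label:<10s}{psi:.5f}")
```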

The specific variances are found in the SAS output as the diagonal elements in the table on page 5 as seen below:

Residual Correlation with Uniqueness on the Diagonal

  Climate Housing Health crime Trans Educate Arts Recreate Econ
Climate 0.20499 -0.00924 -0.01476 -0.06027 -0.03720 0.18537 -0.07518 -0.12475 0.21735
Housing -0.00924 0.48217 -0.02317 -0.28063 -0.12119 -0.04803 -0.07552 -0.04032 0.04249
Health -0.01476 -0.02317 0.27770 0.05007 -0.15480 -0.11537 -0.00929 -0.09108 0.06527
Crime -0.06027 -0.28063 0.05007 0.48755 0.05497 0.11562 0.00009 -0.18377 -0.10288
Trans -0.03720 -0.12119 -0.15480 0.05497 0.49023 -0.14318 -0.05439 0.01041 -0.12641
Educate 0.18537 -0.04803 -0.11537 0.11562 -0.14318 0.43926 -0.13515 -0.05531 0.14197
Arts -0.07518 -0.07552 -0.00929 0.00009 -0.05439 -0.13515 0.24618 -0.01926 -0.04687
Recreate -0.12475 -0.04032 -0.09108 -0.18377 0.01041 -0.05531 -0.01926 0.48274 -0.18326
Econ 0.21735 0.04249 0.06527 -0.10288 -0.12641 0.14197 -0.04687 -0.18326 0.27230

For example, the specific variance for housing is 0.482.

This model provides an approximation to the correlation matrix.  We can assess the model's appropriateness with the residuals obtained from the following calculation:

\(r_{ij}- \sum\limits_{k=1}^{m}\hat{l}_{ik}\hat{l}_{jk}; \quad i \ne j; \; i, j = 1, 2, \dots, p\)

This is the difference between R and LL', that is, the observed correlation between variables i and j minus the correlation implied by the model. Ideally, these residuals should be as close to zero as possible. For example, the residual between Housing and Climate is -0.00924, which is very close to zero. However, some residuals are noticeably larger: the residual between Climate and Economics is 0.217. These values give an indication of how well the factor model fits the data.
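The residual calculation can be sketched as follows. The full correlation matrix for the places data is not reproduced in this section, so the example uses a small hypothetical correlation matrix and a hypothetical one-factor loading vector purely to show the mechanics; the function name `residual_matrix` is also illustrative.

```python
import numpy as np

def residual_matrix(R, L):
    """Return R - LL'. Off-diagonal entries are the residual correlations;
    diagonal entries are the specific variances (uniquenesses)."""
    R = np.asarray(R, dtype=float)
    L = np.asarray(L, dtype=float)
    return R - L @ L.T

# Hypothetical 3-variable correlation matrix and one-factor loadings (not from the places data).
R = np.array([[1.00, 0.70, 0.65],
              [0.70, 1.00, 0.50],
              [0.65, 0.50, 1.00]])
L = np.array([[0.9], [0.8], [0.7]])

resid = residual_matrix(R, L)

# Largest absolute residual among the off-diagonal entries.
off_diag = resid[~np.eye(R.shape[0], dtype=bool)]
largest = np.abs(off_diag).max()

print(resid.round(3))
print(f"Largest absolute residual: {largest:.3f}")
```

Scanning the off-diagonal entries for the largest absolute value is one simple way to spot the pairs of variables that the model reproduces least well.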

One disadvantage of the principal component method is that it does not provide a test for lack of fit. We can examine these numbers and determine if we think they are small or close to zero, but we really do not have a test for this.  Such a test is available for the maximum likelihood method.