12.5 - Communalities
Example 12-1: Continued...
The communality for the \(i^{th}\) variable is computed by taking the sum of the squared loadings for that variable across the \(m\) factors. This is expressed below:
\(\hat{h}^2_i = \sum\limits_{j=1}^{m}\hat{l}^2_{ij}\)
To understand the computation of communalities, recall the table of factor loadings:
Variable | Factor 1 | Factor 2 | Factor 3 |
Climate | 0.286 | 0.076 | 0.841 |
Housing | 0.698 | 0.153 | 0.084 |
Health | 0.744 | -0.410 | -0.020 |
Crime | 0.471 | 0.522 | 0.135 |
Transportation | 0.681 | -0.156 | -0.148 |
Education | 0.498 | -0.498 | -0.253 |
Arts | 0.861 | -0.115 | 0.011 |
Recreation | 0.642 | 0.322 | 0.044 |
Economics | 0.298 | 0.595 | -0.533 |
Let's compute the communality for Climate, the first variable. We square the factor loadings for Climate (the first row of the table above, taken here to five decimal places), then add the results:
\(\hat{h}^2_1 = 0.28682^2 + 0.07560^2 + 0.84085^2 = 0.7950\)
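The same calculation can be repeated for every variable at once. The following is a minimal Python sketch, assuming NumPy is available; the array and variable names are illustrative and not part of the course materials. It uses the loadings as rounded in the table above, so the results agree with the SAS values only to within roughly 0.001.

```python
import numpy as np

# Rounded factor loadings from the table above
# (rows: variables, columns: factors 1-3).
variables = ["Climate", "Housing", "Health", "Crime", "Transportation",
             "Education", "Arts", "Recreation", "Economics"]
loadings = np.array([
    [0.286,  0.076,  0.841],
    [0.698,  0.153,  0.084],
    [0.744, -0.410, -0.020],
    [0.471,  0.522,  0.135],
    [0.681, -0.156, -0.148],
    [0.498, -0.498, -0.253],
    [0.861, -0.115,  0.011],
    [0.642,  0.322,  0.044],
    [0.298,  0.595, -0.533],
])

# Communality of variable i = sum over factors j of the squared loading l_ij^2.
communalities = (loadings ** 2).sum(axis=1)

for name, h2 in zip(variables, communalities):
    print(f"{name:15s} {h2:.3f}")
```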
Using SAS
The communalities of the 9 variables can be obtained from page 4 of the SAS output as shown below:
Final Communality Estimates: Total = 5.616885 | ||||||||
---|---|---|---|---|---|---|---|---|
Climate | housing | health | crime | trans | educate | arts | recreate | econ |
0.79500707 | 0.51783185 | 0.72230182 | 0.51244913 | 0.50977159 | 0.56073895 | 0.75382091 | 0.51725940 | 0.72770402 |
The value 5.616885, located just above the individual communalities, is the "Total Communality".
Using Minitab
View the video below to see how to get the communalities using the Minitab statistical software application.
In summary, the communalities are placed into a table:
Variable | Communality |
Climate | 0.795 |
Housing | 0.518 |
Health | 0.722 |
Crime | 0.512 |
Transportation | 0.510 |
Education | 0.561 |
Arts | 0.754 |
Recreation | 0.517 |
Economics | 0.728 |
Total | 5.617 |
You can think of these values as multiple \(R^{2}\) values for regression models predicting each variable of interest from the 3 factors. The communality for a given variable can be interpreted as the proportion of variation in that variable explained by the three factors. In other words, if we perform multiple regression of Climate against the three common factors, we obtain an \(R^{2} = 0.795\), indicating that about 80% of the variation in Climate is explained by the factor model. The results suggest that the factor analysis does the best job of explaining variation in Climate, the Arts, Economics, and Health.
One assessment of how well this model performs can be obtained from the communalities. We want to see values that are close to one. This indicates that the model explains most of the variation for those variables. In this case, the model does better for some variables than it does for others. The model explains Climate the best, and is not bad for other variables such as Economics, Health and the Arts. However, for other variables such as Crime, Recreation, Transportation and Housing the model does not do a good job, explaining only about half of the variation.
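The sum of all communality values is the total communality. Under the principal component method, it equals the sum of the \(m\) largest eigenvalues \(\hat{\lambda}_i\) of the correlation matrix: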
\(\sum\limits_{i=1}^{p}\hat{h}^2_i = \sum\limits_{i=1}^{m}\hat{\lambda}_i\)
Here, the total communality is 5.617. The proportion of the total variation explained by the three factors is
\(\dfrac{5.617}{9} = 0.624\)
That is, the model explains about 62% of the total variation. This could be considered an overall assessment of the performance of the model. Note, however, that this percentage is the same as the proportion of variation explained by the first three eigenvalues, obtained earlier. The individual communalities tell how well the model is working for each variable, while the total communality gives an overall assessment of performance. These are two different assessments.
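The arithmetic behind the total communality and the proportion of variation explained can be checked with a short Python sketch. It assumes NumPy, takes the communality estimates directly from the SAS output above, and uses variable names chosen only for illustration.

```python
import numpy as np

# Final communality estimates copied from the SAS output above, in the order
# Climate, Housing, Health, Crime, Trans, Educate, Arts, Recreate, Econ.
communalities = np.array([0.79500707, 0.51783185, 0.72230182, 0.51244913,
                          0.50977159, 0.56073895, 0.75382091, 0.51725940,
                          0.72770402])

p = communalities.size                  # number of standardized variables (9)
total_communality = communalities.sum()

# Each standardized variable has variance 1, so the total variance is p.
proportion_explained = total_communality / p

print(f"Total communality:    {total_communality:.3f}")
print(f"Proportion explained: {proportion_explained:.3f}")
```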
Because the data are standardized, each standardized variable has a variance of one. The specific variances are computed by subtracting the communality from this variance, as expressed below:
\(\hat{\Psi}_i = 1-\hat{h}^2_i\)
For example, the specific variance for Climate is computed as follows:
\(\hat{\Psi}_1 = 1-0.795 = 0.205\)
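As a sketch of the same subtraction applied to all nine variables, the Python snippet below (assuming NumPy; names are illustrative) starts from the communality estimates reported in the SAS output earlier in this section.

```python
import numpy as np

variables = ["Climate", "Housing", "Health", "Crime", "Trans",
             "Educate", "Arts", "Recreate", "Econ"]

# Final communality estimates from the SAS output earlier in this section.
communalities = np.array([0.79500707, 0.51783185, 0.72230182, 0.51244913,
                          0.50977159, 0.56073895, 0.75382091, 0.51725940,
                          0.72770402])

# Specific variance = variance of the standardized variable (1) minus communality.
specific_variances = 1.0 - communalities

for name, psi in zip(variables, specific_variances):
    print(f"{name:10s} {psi:.5f}")
```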
The specific variances are found in the SAS output as the diagonal elements in the table on page 5 as seen below:
Variable | Climate | Housing | Health | Crime | Trans | Educate | Arts | Recreate | Econ |
---|---|---|---|---|---|---|---|---|---|
Climate | 0.20499 | -0.00924 | -0.01476 | -0.06027 | -0.03720 | 0.18537 | -0.07518 | -0.12475 | 0.21735 |
Housing | -0.00924 | 0.48217 | -0.02317 | -0.28063 | -0.12119 | -0.04803 | -0.07552 | -0.04032 | 0.04249 |
Health | -0.01476 | -0.02317 | 0.27770 | 0.05007 | -0.15480 | -0.11537 | -0.00929 | -0.09108 | 0.06527 |
Crime | -0.06027 | -0.28063 | 0.05007 | 0.48755 | 0.05497 | 0.11562 | 0.00009 | -0.18377 | -0.10288 |
Trans | -0.03720 | -0.12119 | -0.15480 | 0.05497 | 0.49023 | -0.14318 | -0.05439 | 0.01041 | -0.12641 |
Educate | 0.18537 | -0.04803 | -0.11537 | 0.11562 | -0.14318 | 0.43926 | -0.13515 | -0.05531 | 0.14197 |
Arts | -0.07518 | -0.07552 | -0.00929 | 0.00009 | -0.05439 | -0.13515 | 0.24618 | -0.01926 | -0.04687 |
Recreate | -0.12475 | -0.04032 | -0.09108 | -0.18377 | 0.01041 | -0.05531 | -0.01926 | 0.48274 | -0.18326 |
Econ | 0.21735 | 0.04249 | 0.06527 | -0.10288 | -0.12641 | 0.14197 | -0.04687 | -0.18326 | 0.27230 |
For example, the specific variance for housing is 0.482.
This model provides an approximation to the correlation matrix. We can assess the model's appropriateness with the residuals obtained from the following calculation:
\(s_{ij}- \sum\limits_{k=1}^{m}\hat{l}_{ik}\hat{l}_{jk}; \quad i \ne j; \quad i, j = 1, 2, \dots, p\)
This is the off-diagonal part of the difference between \(\mathbf{R}\) and \(\hat{\mathbf{L}}\hat{\mathbf{L}}'\): the observed correlation between variables \(i\) and \(j\) minus the value expected under the model. Ideally, these residuals should be as close to zero as possible. For example, the residual between Housing and Climate is -0.00924, which is very close to zero. However, some residuals are considerably larger: the residual between Climate and Economics is 0.217, and the residual between Housing and Crime is -0.281. These values give an indication of how well the factor model fits the data.
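The residual calculation can be sketched in Python as follows. Because the correlation matrix for the example data is not reproduced in this section, the sketch uses a small hypothetical correlation matrix made up purely for illustration. The function name `pc_factor_residuals` is likewise an assumption, and the loadings are obtained with the principal component method (eigenvectors scaled by the square roots of their eigenvalues).

```python
import numpy as np

def pc_factor_residuals(R, m):
    """Principal component factor solution with m factors.

    Returns the p x m loading matrix L and the residual matrix R - L L'.
    """
    eigvals, eigvecs = np.linalg.eigh(R)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:m]        # indices of the m largest
    L = eigvecs[:, order] * np.sqrt(eigvals[order])
    return L, R - L @ L.T

# Hypothetical 4-variable correlation matrix (NOT the course data).
R = np.array([
    [1.0, 0.6, 0.5, 0.1],
    [0.6, 1.0, 0.4, 0.2],
    [0.5, 0.4, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
])

L, resid = pc_factor_residuals(R, m=2)

communalities = (L ** 2).sum(axis=1)
specific_variances = np.diag(resid)              # diagonal: 1 - communality
off_diagonal = resid[~np.eye(len(R), dtype=bool)]

print("Residual matrix:\n", np.round(resid, 5))
print("Largest absolute off-diagonal residual:",
      round(float(np.abs(off_diagonal).max()), 5))
```

In this sketch the diagonal of the residual matrix reproduces the specific variances, while the off-diagonal entries are the residuals to inspect for closeness to zero.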
One disadvantage of the principal component method is that it does not provide a test for lack-of-fit. We can examine these numbers and determine if we think they are small or close to zero, but we really do not have a test for this. Such a test is available for the maximum likelihood method.