12.4 - Example: Places Rated Data - Principal Component Method

Example 12-1: Places Rated Section

Let's revisit the Places Rated Example from Lesson 11.  Recall that the Places Rated Almanac (Boyer and Savageau) rates 329 communities according to nine criteria:

  1. Climate and Terrain
  2. Housing
  3. Health Care & Environment
  4. Crime
  5. Transportation
  6. Education
  7. The Arts
  8. Recreation
  9. Economic

Except for housing and crime, the higher the score the better. For housing and crime, the lower the score the better.

Our objective here is to describe the relationships among the variables.

Before carrying out a factor analysis, we need to determine m, the number of common factors to include in the model. To decide, we first count how many parameters the model will involve.

For p = 9, the variance-covariance matrix \(\Sigma\) contains

\(\dfrac{p(p+1)}{2} = \dfrac{9 \times 10}{2} = 45\)

unique elements or entries. For a factor analysis with m factors, the number of parameters in the factor model is equal to

\(p(m+1) = 9(m+1)\)

Taking m = 4 gives 9(4 + 1) = 45 parameters in the factor model. This equals the number of unique elements in \(\Sigma\), so there would be no dimension reduction. In this case we therefore select m = 3, which yields 9(3 + 1) = 36 parameters in the factor model and thus a dimension reduction in our analysis.
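The parameter count above is simple arithmetic, but a short Python sketch makes the comparison explicit for every candidate m. The function names are illustrative only; the count p(m + 1) follows the text (pm loadings plus p specific variances).

```python
def n_cov_elements(p):
    """Unique entries in a p x p variance-covariance matrix: p(p+1)/2."""
    return p * (p + 1) // 2


def n_factor_params(p, m):
    """Parameters in an m-factor model: p*m loadings plus p specific variances."""
    return p * (m + 1)


def largest_reducing_m(p):
    """Largest m whose factor model has fewer parameters than Sigma has entries."""
    return max(m for m in range(1, p) if n_factor_params(p, m) < n_cov_elements(p))


p = 9
for m in range(1, 6):
    k = n_factor_params(p, m)
    status = "reduction" if k < n_cov_elements(p) else "no reduction"
    print(m, k, status)

print(largest_reducing_m(p))
```

For p = 9 the loop shows 45 parameters at m = 4 (no reduction) and 36 at m = 3, matching the choice made above.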

It is also common to look at the results of the principal components analysis. The output from Lesson 11.6 is below. The first three components explain 62% of the variation. We consider this to be sufficient for the current example and will base future analyses on three components.

Component Eigenvalue Proportion Cumulative
1 3.2978 0.3664 0.3664
2 1.2136 0.1348 0.5013
3 1.1055 0.1228 0.6241
4 0.9073 0.1008 0.7249
5 0.8606 0.0956 0.8205
6 0.5622 0.0625 0.8830
7 0.4838 0.0538 0.9368
8 0.3181 0.0353 0.9721
9 0.2511 0.0279 1.0000
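The Proportion and Cumulative columns follow directly from the eigenvalues: for a correlation matrix the eigenvalues sum to p = 9, so each proportion is the eigenvalue divided by 9. The sketch below recomputes them from the table. The 60% threshold used to pick the number of components is an assumption for illustration, not a rule from the lesson.

```python
import numpy as np

# Eigenvalues of the 9 x 9 correlation matrix, copied from the table above
eigenvalues = np.array([3.2978, 1.2136, 1.1055, 0.9073, 0.8606,
                        0.5622, 0.4838, 0.3181, 0.2511])

# The trace of a correlation matrix is p, so the eigenvalues sum to 9
proportion = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(proportion)

for k, (lam, prop, cum) in enumerate(zip(eigenvalues, proportion, cumulative), start=1):
    print(f"{k} {lam:.4f} {prop:.4f} {cum:.4f}")

# Assumption: 60% is used here as an illustrative "sufficient" threshold.
# np.argmax on a boolean array returns the first index where it is True.
m = int(np.argmax(cumulative >= 0.60)) + 1
print(m)
```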

We need to select m so that a sufficient amount of variation in the data is explained. What is sufficient is, of course, subjective and depends on the example at hand.

Alternatively, often in social sciences, the underlying theory within the field of study indicates how many factors to expect. In psychology, for example, a circumplex model suggests that mood has two factors: positive affect and arousal. So a two-factor model may be considered for questionnaire data regarding the subjects' moods. In many respects, this is a better approach because then you are letting the science drive the statistics rather than the statistics drive the science! If you can, use your or a field expert's scientific understanding to determine how many factors should be included in your model.

The factor analysis is carried out using the program as shown below:

Download the SAS Program here: places2.sas



options ls=78;
title "Factor Analysis - Principal Component Method - Places Rated";

 /* After reading in the places data, the (base 10) log transformations are taken.
  * This is an optional step and not required for the factor analysis.
  */

data places;
  infile "D:\Statistics\STAT 505\data\places.csv" firstobs=2 delimiter=',';
  input climate housing health crime trans educate arts recreate econ id;
  climate=log10(climate);
  housing=log10(housing);
  health=log10(health);
  crime=log10(crime);
  trans=log10(trans);
  educate=log10(educate);
  arts=log10(arts);
  recreate=log10(recreate);
  econ=log10(econ);
run;

 /* Options for the factor statement are
  * method=principal specifies principal component extraction
  * nfactors= specifies the number of factors to retain
  * rotate=varimax specifies the varimax rotation
  * simple outputs several statistics, such as means and std deviations
  * scree displays a scree plot of the eigenvalues
  * ev outputs the eigenvectors
  * preplot displays a scatterplot of the factor pattern before rotation
  * plot displays a scatterplot of the factor pattern after rotation
  * residuals outputs the residual correlation matrix
  */

proc factor data=places method=principal nfactors=3 rotate=varimax simple scree ev preplot
     plot residuals;
  var climate housing health crime trans educate arts recreate econ;
run;

Using Minitab: Performing factor analysis (principal components extraction)

To perform factor analysis and obtain the communalities:

  1. Open the ‘places_tf.csv’ data set in a new worksheet.
  2. Transform variables. This step is optional but used in the steps below.  
    1. Calc > Calculator
    2. Highlight and select ‘climate’ to move it to the Store result window.
    3. In the Expression window, enter LOGTEN( 'climate') to apply the (base 10) log transformation to the climate variable.
    4. Choose OK. The transformed values replace the originals in the worksheet under ‘climate’.
    5. Repeat sub-steps 1) through 4) above for all variables housing through econ.
  3. Stat > Multivariate > Factor Analysis
    1. Highlight and select climate through econ to move all 9 variables to the Variables window.
    2. Choose 3 for the number of factors to extract.
    3. Choose Principal Components for the Method of Extraction.
    4. Under Options, select Correlation as Matrix to Factor.
    5. Under Graphs, select Scree Plot.
  4. Choose OK and OK again. The numeric results are shown in the results area, along with the scree plot graph. The last column has the communality values.
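The log-transform and "Correlation as Matrix to Factor" steps above can be sketched in Python with NumPy. This is an illustrative sketch, not part of the course software: the function name `log10_correlation` is hypothetical, and the random toy data merely stand in for the Places Rated file.

```python
import numpy as np


def log10_correlation(X):
    """Log10-transform each column of a positive data matrix and return the
    p x p correlation matrix of the transformed variables."""
    if np.any(X <= 0):
        raise ValueError("log10 requires strictly positive values")
    Z = np.log10(X)
    # rowvar=False: columns are variables, rows are observations (communities)
    return np.corrcoef(Z, rowvar=False)


# Hypothetical toy data: 20 "communities" scored on 4 variables
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 100.0, size=(20, 4))
R = log10_correlation(X)
print(R.round(3))
```

Factoring the correlation matrix rather than the covariance matrix puts all variables on a common scale, which is why the loadings can later be read as correlations.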

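We begin with the factor loadings. Under the principal component method, the loadings on the ith factor are obtained by multiplying the ith eigenvector \(\hat{e}_{i}\) of the correlation matrix by the square root of the corresponding eigenvalue \(\hat{\lambda}_{i}\):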

\(\hat{e}_{i}\sqrt{ \hat{\lambda}_{i}}\)
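As a minimal sketch, assuming NumPy and a hypothetical 3 x 3 correlation matrix (not the Places Rated data), the expression above can be computed as follows. `pc_factor_loadings` is an illustrative name, not a library function. Eigenvector signs are arbitrary, so software may report a loading column with all of its signs flipped.

```python
import numpy as np


def pc_factor_loadings(R, m):
    """Principal component method: loadings for factor i are e_i * sqrt(lambda_i),
    using the m largest eigenvalues of the correlation matrix R."""
    eigvals, eigvecs = np.linalg.eigh(R)     # eigh returns ascending order for symmetric R
    order = np.argsort(eigvals)[::-1][:m]    # indices of the m largest eigenvalues
    lam, E = eigvals[order], eigvecs[:, order]
    L = E * np.sqrt(lam)                     # scales column i by sqrt(lambda_i)
    return L, lam


# Hypothetical 3 x 3 correlation matrix, for illustration only
R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.4],
              [0.3, 0.4, 1.0]])
L, lam = pc_factor_loadings(R, m=2)
communalities = (L ** 2).sum(axis=1)         # row sums of squared loadings
print(L.round(3))
print(communalities.round(3))
```

Because each eigenvector has unit length, the squared loadings in column i sum to \(\hat{\lambda}_{i}\); keeping all p factors would reproduce R exactly, while keeping m < p gives an approximation.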

These are summarized in the table below. Loadings are reported only for the first three factors because we set m = 3. Because the analysis is based on the correlation matrix, the factor loadings are the correlations between the factors and the variables. For example, the correlation between Arts and the first factor is about 0.86, whereas the correlation between Climate and that factor is only about 0.29.

Variable Factor1 Factor2 Factor3
Climate 0.286 0.076 0.841
Housing 0.698 0.153 0.084
Health 0.744 -0.410 -0.020
Crime 0.471 0.522 0.135
Transportation 0.681 -0.156 -0.148
Education 0.498 -0.498 -0.253
Arts 0.861 -0.115 0.011
Recreation 0.642 0.322 0.044
Economics 0.298 0.595 -0.533
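A quick consistency check on the table: each variable's communality is the sum of its squared loadings across the three factors, and each factor's column of squared loadings should sum to its eigenvalue (3.2978, 1.2136, 1.1055) up to rounding of the loadings. The sketch below, assuming NumPy, uses only the values in the table.

```python
import numpy as np

variables = ["Climate", "Housing", "Health", "Crime", "Transportation",
             "Education", "Arts", "Recreation", "Economics"]

# Loadings for factors 1-3, copied from the table above
L = np.array([
    [0.286,  0.076,  0.841],
    [0.698,  0.153,  0.084],
    [0.744, -0.410, -0.020],
    [0.471,  0.522,  0.135],
    [0.681, -0.156, -0.148],
    [0.498, -0.498, -0.253],
    [0.861, -0.115,  0.011],
    [0.642,  0.322,  0.044],
    [0.298,  0.595, -0.533],
])

communalities = (L ** 2).sum(axis=1)  # row sums: variance explained per variable
ss_loadings = (L ** 2).sum(axis=0)    # column sums: should match the eigenvalues

for name, h2 in zip(variables, communalities):
    print(f"{name:<15}{h2:.3f}")
print(ss_loadings.round(4))
```

Climate, for instance, has a communality of about 0.795, almost all of it coming from the third factor.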

Interpreting factor loadings is similar to interpreting the coefficients in principal component analysis. We need a criterion for deciding which loadings are large, and in many instances that choice is somewhat arbitrary. Here we treat loadings with an absolute value of about 0.5 or more as large. The following statements are based on this criterion:

  1. Factor 1 is correlated most strongly with Arts (0.861) and is also correlated with Health, Housing, Transportation, and Recreation, and to a lesser extent with Crime and Education. You can say that the first factor is primarily a measure of these variables.

  2. Similarly, Factor 2 is correlated most strongly with Crime, Education, and Economics; the correlation with Education is negative. You can say that the second factor is primarily a measure of these variables.

  3. Likewise, Factor 3 is correlated most strongly with Climate (0.841) and Economics (-0.533, a negative correlation). You can say that the third factor is primarily a measure of these variables.
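The "about 0.5" rule can be applied mechanically by flagging, for each factor, the variables whose absolute loading clears a cutoff. In this sketch the cutoff of 0.45 is an assumption: a strict 0.5 would drop Education (0.498) and Crime (0.471), which the statements above still mention. The helper `salient_variables` is hypothetical and keeps each loading's sign so negative correlations stay visible.

```python
import numpy as np

variables = ["Climate", "Housing", "Health", "Crime", "Transportation",
             "Education", "Arts", "Recreation", "Economics"]

# Loadings for factors 1-3 from the table above
L = np.array([
    [0.286,  0.076,  0.841],
    [0.698,  0.153,  0.084],
    [0.744, -0.410, -0.020],
    [0.471,  0.522,  0.135],
    [0.681, -0.156, -0.148],
    [0.498, -0.498, -0.253],
    [0.861, -0.115,  0.011],
    [0.642,  0.322,  0.044],
    [0.298,  0.595, -0.533],
])


def salient_variables(L, names, cutoff):
    """For each factor (column), list variables whose |loading| >= cutoff,
    sorted by decreasing |loading|, keeping the sign of the loading."""
    result = {}
    for j in range(L.shape[1]):
        col = L[:, j]
        idx = [i for i in np.argsort(-np.abs(col)) if abs(col[i]) >= cutoff]
        result[j + 1] = [(names[i], float(col[i])) for i in idx]
    return result


# Assumption: 0.45 stands in for "about 0.5"
salient = salient_variables(L, variables, cutoff=0.45)
for factor, items in salient.items():
    print(factor, items)
```

Under this cutoff Factor 1 picks up Arts, Health, Housing, Transportation, and Recreation, with Education and Crime just above 0.45. Factor 2 picks up Economics, Crime, and Education (negative), and Factor 3 picks up Climate and Economics (negative).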

The interpretation above is very similar to that obtained in the standardized principal component analysis.