Sometimes data are collected on a large number of variables from a single population. As an example, consider the Places Rated dataset below.
Example 11-1: Places Rated
In the Places Rated Almanac, Boyer and Savageau rated 329 communities according to the following nine criteria:
- Climate and Terrain
- Health Care & the Environment
- The Arts
With a large number of variables, the dispersion (variance-covariance) matrix may be too large to study and interpret properly. There would be too many pairwise correlations between the variables to consider. Graphical displays may also not be particularly helpful when the data set is very large. With 12 variables, for example, there are 66 pairwise correlations and 220 possible three-dimensional scatterplots.
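The counts above follow from simple combinatorics: p variables give "p choose 2" pairwise correlations and "p choose 3" three-dimensional scatterplots. A minimal Python sketch (the function name `display_counts` is only illustrative) makes the growth easy to check:

```python
from math import comb

def display_counts(p):
    """Return (pairwise correlations, 3-D scatterplots) for p variables."""
    return comb(p, 2), comb(p, 3)

for p in (9, 12):
    pairs, triples = display_counts(p)
    print(f"{p} variables: {pairs} correlations, {triples} 3-D scatterplots")
```

For the nine Places Rated criteria this gives 36 correlations and 84 three-dimensional scatterplots; for 12 variables it gives 66 and 220.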
To interpret the data in a more meaningful form, it is necessary to reduce the number of variables to a few interpretable linear combinations of the variables. Each linear combination will correspond to a principal component.
(Another very useful data reduction technique, factor analysis, is discussed in a subsequent lesson.)
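To make the idea of "linear combinations" concrete, here is a minimal sketch in Python with NumPy. It is not the SAS or Minitab procedure used in this lesson, and it does not use the Places Rated data: the data matrix is synthetic, and the function name `principal_components` is an assumption made for illustration. The coefficients of each linear combination are taken as the eigenvectors of the sample variance-covariance matrix, ordered by decreasing eigenvalue.

```python
import numpy as np

def principal_components(X):
    """Return (eigenvalues, coefficient vectors, scores) for an n x p data matrix X."""
    Xc = X - X.mean(axis=0)                # center each variable
    S = np.cov(Xc, rowvar=False)           # p x p sample variance-covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]      # reorder so the largest variance comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                  # each column is one linear combination
    return eigvals, eigvecs, scores

# Synthetic stand-in data (assumption): 329 rows and 9 variables,
# driven by 2 hidden factors plus a small amount of noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(329, 2))
loadings = rng.normal(size=(2, 9))
X = latent @ loadings + 0.1 * rng.normal(size=(329, 9))

eigvals, eigvecs, scores = principal_components(X)
print("Proportion of variance per component:")
print(np.round(eigvals / eigvals.sum(), 3))
```

Each column of `eigvecs` holds the coefficients of one linear combination, and the matching column of `scores` holds the value of that combination for every observation. The scores are uncorrelated with one another, and the variance of each score column equals the corresponding eigenvalue.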
Upon completion of this lesson, you should be able to:
- Carry out a principal components analysis using SAS and Minitab;
- Assess how many principal components are needed;
- Interpret principal component scores and describe a subject with a high or low score;
- Determine when a principal component analysis should be based on the variance-covariance matrix or the correlation matrix;
- Use principal component scores in further analyses.