11.7 - Once the Components Are Calculated

One can interpret the results component by component. One way to decide how many components to include is to keep only those that give unambiguous results, i.e., where no variable makes a significant contribution to more than one component (no variable appears as significant in two different columns).
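The "unambiguous results" rule can be written as a small function. This is a minimal sketch, assuming you already have the matrix of correlations between each original variable and each principal component, with components ordered by decreasing eigenvalue. The function name, the `threshold` argument, and the example numbers are hypothetical and are not taken from the lesson's data.

```python
import numpy as np


def unambiguous_component_count(corr, threshold):
    """Return the largest k such that, among the first k components, no
    variable is "significant" for more than one component.

    corr      : (n_variables, n_components) array of correlations between
                each original variable and each principal component.
    threshold : the |correlation| the analyst judges to be important.
    """
    significant = np.abs(np.asarray(corr, dtype=float)) >= threshold
    n_components = significant.shape[1]
    for k in range(1, n_components + 1):
        # How many of the first k components does each variable load on?
        counts = significant[:, :k].sum(axis=1)
        if np.any(counts > 1):
            # Adding component k made some variable ambiguous.
            return k - 1
    return n_components


# Hypothetical correlations for 3 variables (rows) and 3 components (columns).
corr = np.array([
    [0.80, 0.10, 0.00],
    [0.70, 0.20, 0.60],
    [0.10, 0.90, 0.50],
])
print(unambiguous_component_count(corr, threshold=0.5))  # 2
```

The threshold is a subject-matter judgment rather than a statistical test: lowering it to 0.15 in the same example makes the second variable significant on the first two components, so only one component would be kept.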

Note! The primary purpose of this analysis is descriptive; it is not hypothesis testing! So, in many respects, your decision should be based on what gives you a good, concise description of the data.

We have to decide what counts as an important correlation, not necessarily from a statistical hypothesis-testing perspective but, in this case, from an urban-sociological perspective. You have to decide what is important in the context of the problem at hand, and this decision may differ from discipline to discipline. In some disciplines, such as sociology and ecology, the data tend to be inherently 'noisy', so you would expect 'messier' interpretations. In a discipline such as engineering, where everything has to be precise, you might place higher demands on the analysis and require very high correlations. Principal component analysis is most often used in sociological and ecological applications, as well as in marketing research.

As before, you can plot the principal component scores against one another and explore where the data for particular observations lie.
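A plot of one component against another is simply a scatter plot of two columns of the score matrix. The sketch below, assuming a PCA on the correlation matrix, computes the scores and lists the observations that sit farthest from the origin in a chosen pair of components, i.e., the points worth labeling on the plot. The function names and the random placeholder data are hypothetical.

```python
import numpy as np


def pc_scores(X):
    """Principal component scores computed from the correlation matrix.

    Returns (scores, eigenvalues), with components ordered by decreasing
    eigenvalue. Each eigenvector's sign is arbitrary, so a component may
    come out mirrored relative to other software.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each variable so the analysis uses the correlation matrix.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]  # eigh returns ascending order
    return Z @ eigvecs[:, order], eigvals[order]


def most_extreme(scores, i, j, n=5):
    """Row indices of the n observations farthest from the origin in the
    plane of components i and j (0-based), i.e., the points that stand out
    most on a scatter plot of component i against component j."""
    dist = np.hypot(scores[:, i], scores[:, j])
    return np.argsort(dist)[::-1][:n]


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))  # placeholder data, not the lesson's data set
scores, eigvals = pc_scores(X)
print(most_extreme(scores, 0, 1, n=3))
```

To draw the plot itself, pass `scores[:, 0]` and `scores[:, 1]` to a scatter-plot function such as matplotlib's `plt.scatter`, and label the points returned by `most_extreme`.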

Sometimes the principal component scores are used as explanatory variables in a regression. In some regression settings you may have a very large number of potential explanatory variables and little idea of which ones are important. You might perform a principal component analysis first and then regress the response variable on the principal component scores. A nice feature of this approach is that the estimated regression coefficients are uncorrelated with one another, because the component scores are uncorrelated with one another; each coefficient is the same whether or not the other components are in the model. In this case, you can actually say how much of the variation in the response variable is explained by each of the individual components. This is something you cannot normally do in multiple regression, where correlated explanatory variables share the explained variation.
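The claim that each component's contribution can be separated follows from the score columns being centered and mutually orthogonal. The sketch below, assuming a PCA on the correlation matrix of simulated data, regresses `y` on all the scores and splits R² into one share per component. The function name, the simulated coefficients, and the choice to keep every component (in practice you would usually keep only the leading ones) are assumptions for illustration.

```python
import numpy as np


def pcr_by_component(X, y):
    """Regress y on all principal component scores of X (correlation-matrix
    PCA) and return (coefficients, R^2 contributed by each component, scores)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    scores = Z @ eigvecs[:, np.argsort(eigvals)[::-1]]
    yc = y - y.mean()
    ss = np.sum(scores ** 2, axis=0)   # each score column's sum of squares
    cross = scores.T @ yc              # cross-products with centered y
    # Score columns are centered and mutually orthogonal, so the multiple-
    # regression slopes equal the one-at-a-time simple-regression slopes.
    coefs = cross / ss
    # Each component's share of the variation in y; these shares add up.
    r2_each = cross ** 2 / (ss * np.sum(yc ** 2))
    return coefs, r2_each, scores


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))  # simulated data, not from the lesson
y = X @ np.array([1.0, -0.5, 0.0, 2.0, 0.3]) + rng.normal(size=200)
coefs, r2_each, scores = pcr_by_component(X, y)
print(np.round(r2_each, 3), round(r2_each.sum(), 3))
```

With all components included, the shares sum to the R² of an ordinary regression of `y` on the original variables. Dropping components leaves the remaining coefficients unchanged, which is the practical meaning of the coefficients not depending on one another.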

One problem with this analysis is that the interpretation is not as 'clean' as one would like, given all of the numbers involved. For example, looking at the second and third components, the economy is considered significant for both of them. As you can see, this leads to an ambiguous interpretation in our analysis.

An alternative method of data reduction is Factor Analysis where factor rotations are used to reduce the complexity and obtain a cleaner interpretation of the data.
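To give a rough idea of what a rotation does, the sketch below applies varimax, one commonly used orthogonal rotation, to a loading matrix using a common SVD-based iteration (without Kaiser normalization). It is an illustration under stated assumptions, not the procedure of any particular factor analysis program; the function name and the toy loadings (a clean "simple structure" deliberately rotated by 30 degrees so that every variable loads on both columns) are hypothetical.

```python
import numpy as np


def varimax(loadings, gamma=1.0, max_iter=500, tol=1e-8):
    """Orthogonal varimax rotation of a (variables x components) loading
    matrix. Returns the rotated loadings and the rotation matrix."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        d_old = d
        Lam = L @ R
        # Direction that increases the varimax criterion at the current rotation.
        target = Lam ** 3 - (gamma / p) * Lam @ np.diag(np.sum(Lam ** 2, axis=0))
        u, s, vh = np.linalg.svd(L.T @ target)
        R = u @ vh  # nearest orthogonal matrix to the update
        d = np.sum(s)
        if d_old != 0 and d / d_old < 1 + tol:
            break
    return L @ R, R


# Hypothetical loadings: two clean groups of variables, then rotated by 30 degrees.
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.0, 0.9], [0.0, 0.6]])
t = np.deg2rad(30)
mixed = simple @ np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
rotated, R = varimax(mixed)
print(np.round(mixed, 2))
print(np.round(rotated, 2))
```

Because the rotation is orthogonal, each variable's total squared loading is unchanged; only its split between columns moves toward one large loading and one near-zero loading, which is the "cleaner interpretation" that rotation aims for. Columns may come back permuted or with flipped signs.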