3.3 - Example Results

Let's take a look at some results for our earlier example on the number of active physicians in a Standard Metropolitan Statistical Area (SMSA; the data are available on the Welcome page). If I carry out the optimization using the equations, I obtain the following values:

\(\hat{Y}_{i} = -143.89 + 0.341X_{i1} - 0.019X_{i2} + 0.254X_{i3}\)

\(RSS(\hat{\beta})=52,942,438 \)
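
To make the computation concrete, here is a minimal sketch of how fitted coefficients and an RSS value of this kind can be obtained with NumPy. The SMSA data are not reproduced here, so the sketch uses synthetic data; the helper name `fit_least_squares`, the coefficients in `true_beta`, and the noise level are all assumptions made for illustration, and the numbers it produces have nothing to do with the SMSA results above.

```python
import numpy as np

def fit_least_squares(X, y):
    """Return (beta_hat, rss) for a linear model with an intercept.

    Hypothetical helper for illustration; not part of the course materials.
    """
    # Prepend a column of ones so that beta_hat[0] is the intercept.
    design = np.column_stack([np.ones(len(y)), X])
    # lstsq minimizes the residual sum of squares ||y - design @ beta||^2.
    beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta_hat
    return beta_hat, float(residuals @ residuals)

# Synthetic data (assumed values, NOT the SMSA data set).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_beta = np.array([1.0, 2.0, -0.5, 0.3])  # intercept, then three slopes
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.1, size=200)

beta_hat, rss = fit_least_squares(X, y)
print("estimated coefficients:", beta_hat)
print("RSS:", rss)
```

With the real data, `X` would hold the three predictor columns and `y` the number of active physicians.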


Let's take a look at some scatter plots, in which we plot one variable against another. For instance, the upper left-hand plot shows the pairs of \(x_{1}\) and \(y\). These are two-dimensional plots, with each variable plotted individually against every other variable.


STAT 501 on Linear Regression goes deeper into which scatter plots are more helpful than others. Scatter plots can indicate potential problems in your data. For instance, in the plots above you can see that \(x_{3}\) is almost a perfectly linear function of \(x_{1}\). This suggests that there may be problems when you do the optimization: if \(x_{3}\) is a perfectly linear function of \(x_{1}\), then when you solve the linear equations to determine the \(\beta\)'s, there is no unique solution. The scatter plots help to discover such potential problems.
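
The lack of a unique solution can be seen in a small sketch. Assume, purely for illustration, that \(x_{3} = 2x_{1} + 1\) exactly; the data and the two coefficient vectors below are made up and are not taken from the SMSA example. The design matrix then loses a rank, and two different sets of \(\beta\)'s produce identical fitted values.

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=50)
x3 = 2.0 * x1 + 1.0  # exactly linear in x1 (assumed for illustration)

# Columns: intercept, x1, x3.
design = np.column_stack([np.ones(50), x1, x3])

# Only two of the three columns are linearly independent.
rank = np.linalg.matrix_rank(design)

# Two different coefficient vectors that describe the same fit:
#   1 + 4*x1 + 0*x3  and  0 + 2*x1 + 1*x3 = 2*x1 + (2*x1 + 1)
beta_a = np.array([1.0, 4.0, 0.0])
beta_b = np.array([0.0, 2.0, 1.0])
same_fit = np.allclose(design @ beta_a, design @ beta_b)

print("rank of design matrix:", rank)
print("identical fitted values:", same_fit)
```

Because both vectors give the same fitted values, they also give the same RSS, so the optimization cannot prefer one over the other.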

In practice, because there is always measurement error, you rarely get a perfect linear relationship. However, you might get something very close. In this case, the matrix \(X^{T}X\) will be close to singular, causing large numerical errors in computation. Therefore, we would like to have predictor variables that are not so strongly correlated.
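
One common way to quantify "close to singular" is the condition number of \(X^{T}X\); this diagnostic is not part of the discussion above and is offered only as a rough sketch. The function name `gram_condition_number`, the relationship \(x_{3} \approx 2x_{1}\), and the noise levels are assumptions chosen for illustration.

```python
import numpy as np

def gram_condition_number(noise_scale, n=100, seed=2):
    """Condition number of X^T X when x3 = 2*x1 + noise.

    Hypothetical helper: a smaller noise_scale makes x3 closer to an
    exact linear function of x1.
    """
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x3 = 2.0 * x1 + rng.normal(scale=noise_scale, size=n)
    design = np.column_stack([np.ones(n), x1, x3])
    return np.linalg.cond(design.T @ design)

cond_loose = gram_condition_number(1.0)   # x3 only loosely tied to x1
cond_tight = gram_condition_number(1e-3)  # x3 almost exactly linear in x1

print("condition number, weak relationship:  ", cond_loose)
print("condition number, near-perfect linear:", cond_tight)
```

A much larger condition number in the second case means that small perturbations in the data can produce large changes in the estimated \(\beta\)'s.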