10.5 - Uncorrelated Predictors

To get a handle on this multicollinearity thing, let's first investigate the effects that uncorrelated predictors have on regression analyses. To do so, we'll look at a "contrived" data set, in which the predictors are perfectly uncorrelated, and then at a second, "real" data set, in which the predictors are nearly uncorrelated. Together, the two investigations will allow us to summarize the effects that uncorrelated predictors have on regression analyses.

Then, on the next page, we'll investigate the effects that highly correlated predictors have on regression analyses. In doing so, we'll learn — and therefore be able to summarize — the various effects multicollinearity has on regression analyses.

What is the effect on regression analyses if the predictors are perfectly uncorrelated?

Consider the following matrix plot of the response y and two predictors x1 and x2, of a contrived data set (uncorrpreds.txt), in which the predictors are perfectly uncorrelated:

[Matrix plot of the response y and the predictors x1 and x2]

As you can see, there is no apparent relationship at all between the predictors x1 and x2. Indeed, the correlation between x1 and x2 is exactly zero:

[Minitab output: correlation of x1 and x2 = 0]

confirming that the two predictors are perfectly uncorrelated.
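
As a quick check outside of Minitab, the zero correlation can be verified in a few lines of Python. This is a minimal sketch, assuming uncorrpreds.txt is a whitespace-delimited text file with a header row naming the columns y, x1, and x2 (the exact file layout is an assumption here):

    import pandas as pd

    # Load the contrived data set (assumed whitespace-delimited with a header row)
    df = pd.read_csv("uncorrpreds.txt", sep=r"\s+")

    # Pearson correlation between the two predictors; should print 0.0 exactly
    print(df["x1"].corr(df["x2"]))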

Now, let's just proceed quickly through the output of a series of regression analyses, collecting various pieces of information along the way. When we're done, we'll review what we learned by collating the various items in a summary table.

The regression of the response y on the predictor x1:

[Minitab output: regression of y on x1]

yields the estimated coefficient b1 = –1.00, the standard error se(b1) = 1.47, and the regression sum of squares SSR(x1) = 8.000.
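
If you'd rather reproduce these numbers in Python, a sketch along the following lines should work with statsmodels, assuming the df data frame loaded above. One caution: statsmodels stores the regression (explained) sum of squares in the ess attribute; its ssr attribute is the residual sum of squares, not the SSR of our notation.

    import statsmodels.formula.api as smf

    # Fit the simple linear regression of y on x1
    fit_x1 = smf.ols("y ~ x1", data=df).fit()

    print(fit_x1.params["x1"])  # b1 = -1.00
    print(fit_x1.bse["x1"])     # se(b1) = 1.47
    print(fit_x1.ess)           # SSR(x1) = 8.000 (the "explained" sum of squares)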

The regression of the response y on the predictor x2:

[Minitab output: regression of y on x2]

yields the estimated coefficient b2 = –1.75, the standard error se(b2) = 1.35, and the regression sum of squares SSR(x2) = 24.50.

The regression of the response y on the predictors x1 and x2 (in that order):

[Minitab output: regression of y on x1 and x2]

yields the estimated coefficients b1 = –1.00 and b2 = –1.75, the standard errors se(b1) = 1.41 and se(b2) = 1.41, and the sequential sum of squares SSR(x2|x1) = 24.500.

The regression of the response y on the predictors x2 and x1 (in that order):

[Minitab output: regression of y on x2 and x1]

yields the estimated coefficients b1 = –1.00 and b2 = –1.75, the standard errors se(b1) = 1.41 and se(b2) = 1.41, and the sequential sum of squares SSR(x1|x2) = 8.000.
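
The sequential sums of squares for both orderings can be reproduced with a Type I (sequential) ANOVA table. Again a sketch, assuming the df data frame from above:

    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Fit the two-predictor model in each order
    fit_12 = smf.ols("y ~ x1 + x2", data=df).fit()
    fit_21 = smf.ols("y ~ x2 + x1", data=df).fit()

    # typ=1 gives sequential sums of squares: the sum_sq column lists
    # SSR(x1), SSR(x2|x1) for the first fit and SSR(x2), SSR(x1|x2) for the second
    print(sm.stats.anova_lm(fit_12, typ=1))
    print(sm.stats.anova_lm(fit_21, typ=1))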

Okay — as promised — compiling the results in a summary table, we obtain:

Model                b1       se(b1)   b2       se(b2)   Seq SS
x1 only              -1.00    1.47     ---      ---      SSR(x1) = 8.000
x2 only              ---      ---      -1.75    1.35     SSR(x2) = 24.50
x1, x2 (in order)    -1.00    1.41     -1.75    1.41     SSR(x2|x1) = 24.500
x2, x1 (in order)    -1.00    1.41     -1.75    1.41     SSR(x1|x2) = 8.000

What do we observe?

  • The estimated slope coefficients b1 and b2 are the same regardless of the model used.
  • The standard errors se(b1) and se(b2) don't change much at all from model to model.
  • The sum of squares SSR(x1) is the same as the sequential sum of squares SSR(x1|x2). The sum of squares SSR(x2) is the same as the sequential sum of squares SSR(x2|x1).

These all seem to be good things! Because the slope estimates stay the same, the effect on the response ascribed to a predictor doesn't depend on the other predictors in the model. Because SSR(x1) = SSR(x1|x2), the marginal contribution that x1 has in reducing the variability in the response y doesn't depend on the predictor x2. Similarly, because SSR(x2) = SSR(x2|x1), the marginal contribution that x2 has in reducing the variability in the response y doesn't depend on the predictor x1.
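
It's worth pausing to see why the sequential sums of squares behave this way. In general, SSR(x1|x2) = SSR(x1, x2) - SSR(x2). When the predictors are perfectly uncorrelated, the regression sum of squares is additive, that is, SSR(x1, x2) = SSR(x1) + SSR(x2). Plugging in the values from the table:

SSR(x1|x2) = SSR(x1, x2) - SSR(x2) = (8.000 + 24.500) - 24.500 = 8.000 = SSR(x1)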

These are all things we can hope for in a regression analysis. But then reality sets in! Recall that we obtained the above results for a contrived data set, in which the predictors are perfectly uncorrelated. Do we get similar results for real data with only nearly uncorrelated predictors? Let's see!

What is the effect on regression analyses if the predictors are nearly uncorrelated?

To investigate this question, let's go back and take a look at the blood pressure data set (bloodpress.txt). In particular, let's focus on the relationships among the response y = BP and the predictors x3 = BSA and x6 = Stress:

[Matrix plot of BP, BSA, and Stress]

As the above matrix plot and the following correlation matrix suggest:

[Minitab output: correlations among BP, BSA, and Stress]

there appears to be a strong relationship between y = BP and the predictor x3 = BSA (r = 0.866), a weak relationship between y = BP and x6 = Stress (r = 0.164), and an almost non-existent relationship between x3 = BSA and x6 = Stress (r = 0.018). That is, the two predictors are nearly perfectly uncorrelated.
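
These correlations can be checked the same way as before. A sketch, assuming bloodpress.txt is whitespace-delimited with columns named BP, BSA, and Stress (the file layout and column names are assumptions):

    import pandas as pd

    bp = pd.read_csv("bloodpress.txt", sep=r"\s+")

    # Pairwise Pearson correlations among the response and the two predictors
    print(bp[["BP", "BSA", "Stress"]].corr().round(3))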

What effect do these nearly perfectly uncorrelated predictors have on regression analyses? Let's proceed similarly through the output of a series of regression analyses, collecting various pieces of information along the way. When we're done, we'll review what we learned by collating the various items in a summary table.

The regression of the response y = BP on the predictor x6 = Stress:

[Minitab output: regression of BP on Stress]

yields the estimated coefficient b6 = 0.0240, the standard error se(b6) = 0.0340, and the regression sum of squares SSR(x6) = 15.04.

The regression of the response y = BP on the predictor x3 = BSA:

[Minitab output: regression of BP on BSA]

yields the estimated coefficient b3 = 34.44, the standard error se(b3) = 4.69, and the regression sum of squares SSR(x3) = 419.858.

The regression of the response y = BP on the predictors x6 = Stress and x3 = BSA (in that order):

[Minitab output: regression of BP on Stress and BSA]

yields the estimated coefficients b6 = 0.0217 and b3 = 34.33, the standard errors se(b6) = 0.0170 and se(b3) = 4.61, and the sequential sum of squares SSR(x3|x6) = 417.07.

Finally, the regression of the response y = BP on the predictors x3 = BSA and x6 = Stress (in that order):

[Minitab output: regression of BP on BSA and Stress]

yields the estimated coefficients b3 = 34.33 and b6 = 0.0217, the standard errors se(b3) = 4.61 and se(b6) = 0.0170, and the sequential sum of squares SSR(x6|x3) = 12.26.
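
As before, the two orderings can be compared with Type I ANOVA tables. A sketch, assuming the bp data frame from above:

    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    fit_63 = smf.ols("BP ~ Stress + BSA", data=bp).fit()
    fit_36 = smf.ols("BP ~ BSA + Stress", data=bp).fit()

    # The sum_sq columns give SSR(x6), SSR(x3|x6) and SSR(x3), SSR(x6|x3)
    print(sm.stats.anova_lm(fit_63, typ=1))
    print(sm.stats.anova_lm(fit_36, typ=1))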

Again — as promised — compiling the results in a summary table, we obtain:

Model                b6       se(b6)   b3      se(b3)   Seq SS
x6 only              0.0240   0.0340   ---     ---      SSR(x6) = 15.04
x3 only              ---      ---      34.44   4.69     SSR(x3) = 419.858
x6, x3 (in order)    0.0217   0.0170   34.33   4.61     SSR(x3|x6) = 417.07
x3, x6 (in order)    0.0217   0.0170   34.33   4.61     SSR(x6|x3) = 12.26

What do we observe? If the predictors are nearly perfectly uncorrelated:

  • The slope estimates b3 and b6 are not identical, but they are very similar, regardless of which predictors are in the model.
  • The sum of squares SSR(x3) is not identical to the sequential sum of squares SSR(x3|x6), but the two are very similar (419.858 vs. 417.07).
  • The sum of squares SSR(x6) is not identical to the sequential sum of squares SSR(x6|x3), but the two are very similar (15.04 vs. 12.26).

Again, these are all good things! In short, the effect on the response ascribed to a predictor is similar regardless of the other predictors in the model. And, the marginal contribution of a predictor doesn't appear to depend much on the other predictors in the model.