Lesson #8: Overview of Multiple Linear Regression

Note: This portion of the lesson is most important for those students who will continue studying statistics after taking Stat 501. We will only rarely use the material within the remainder of this course. It is, however, particularly important for students who plan on taking Stat 502, 503, 504, or 505.

A matrix formulation of the multiple regression model

In the multiple regression setting, because of the potentially large number of predictors, it is more efficient to use matrices to define the regression model and the subsequent analyses. Here, we review basic matrix algebra, as well as learn some of the more important multiple regression formulas in matrix form.

As always, let's start with the simple case first. Consider the following simple linear regression function:

$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i \quad \text{for } i = 1, \ldots, n$$

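If we actually let i = 1, ..., n, we see that we obtain n equations:

$$
\begin{aligned}
y_1 &= \beta_0 + \beta_1 x_1 + \epsilon_1 \\
y_2 &= \beta_0 + \beta_1 x_2 + \epsilon_2 \\
&\;\;\vdots \\
y_n &= \beta_0 + \beta_1 x_n + \epsilon_n
\end{aligned}
$$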

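Well, that's a pretty inefficient way of writing it all out! As you can see, there is a pattern that emerges. By taking advantage of this pattern, we can instead formulate the above simple linear regression function in matrix notation:

$$
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
=
\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}
+
\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}
$$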

That is, instead of writing out the n equations, using matrix notation, our simple linear regression function reduces to a short and simple statement:

$$Y = X\beta + \epsilon$$

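Now, what does this statement mean? Well, here's the answer: Y is the n × 1 column vector of responses, X is the n × 2 matrix whose first column is all 1s and whose second column holds the x values, β is the 2 × 1 column vector containing the parameters β0 and β1, and ε is the n × 1 column vector of error terms.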

Now, that might not mean anything to you, if you've never studied matrix algebra — or if you have and you forgot it all! So, let's start with a quick and basic review.

Definition of a matrix

An r × c matrix is a rectangular array of symbols or numbers arranged in r rows and c columns. A matrix is almost always denoted by a single capital letter in boldface type.

Here are three examples of simple matrices. The matrix A is a 2 × 2 square matrix containing numbers:

The matrix B is a 5 × 3 matrix containing numbers:

And, the matrix X is a 6 × 3 matrix containing a column of 1s and two columns of various x variables:

Definition of a vector and a scalar

A column vector is an r × 1 matrix, that is, a matrix with only one column. A vector is almost always denoted by a single lowercase letter in boldface type. The following vector q is a 3 × 1 column vector containing numbers:

A row vector is a 1 × c matrix, that is, a matrix with only one row. The vector h is a 1 × 4 row vector containing numbers:

A 1 × 1 "matrix" is called a scalar, but it's just an ordinary number, such as 29 or σ².

Matrix multiplication

Recall that the Xβ that appears in the regression function:

$$Y = X\beta + \epsilon$$

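is an example of matrix multiplication. Now, there are some restrictions: you can't just multiply any two matrices together. Two matrices can be multiplied together only if the number of columns of the first matrix equals the number of rows of the second matrix. Then, when you multiply the two matrices, the number of rows of the resulting matrix equals the number of rows of the first matrix, and the number of columns of the resulting matrix equals the number of columns of the second matrix.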

For example, if A is a 2 × 3 matrix and B is a 3 × 5 matrix, then the matrix multiplication AB is possible. The resulting matrix C = AB has 2 rows and 5 columns. That is, C is a 2 × 5 matrix. Note that the matrix multiplication BA is not possible.

For another example, if X is an n × p matrix and β is a p × 1 column vector, then the matrix multiplication Xβ is possible. The resulting matrix Xβ has n rows and 1 column. That is, Xβ is an n × 1 column vector.

Okay, now that we know when we can multiply two matrices together, how do we do it? Here's the basic rule for multiplying A by B to get C = AB:

The entry in the ith row and jth column of C is the inner product — that is, element-by-element products added together — of the ith row of A with the jth column of B.

For example:

That is, the entry in the first row and first column of C, denoted c11, is obtained by:

And, the entry in the first row and second column of C, denoted c12, is obtained by:

And, the entry in the second row and third column of C, denoted c23, is obtained by:

You might convince yourself that the remaining five elements of C have been obtained correctly.
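
The row-by-column rule can also be sketched in a few lines of code. The following is a minimal illustration in plain Python (matrices stored as lists of rows, no libraries). The function name matmul and the matrices A, B, X, and beta are hypothetical, chosen only for this sketch; they are not the matrices from the example above.

```python
def matmul(a, b):
    """Multiply matrix a (r x k) by matrix b (k x c); both are lists of rows."""
    rows_a, cols_a = len(a), len(a[0])
    rows_b, cols_b = len(b), len(b[0])
    # Multiplication is defined only if columns of a == rows of b.
    if cols_a != rows_b:
        raise ValueError(
            f"cannot multiply a {rows_a} x {cols_a} matrix "
            f"by a {rows_b} x {cols_b} matrix"
        )
    # Entry (i, j) is the inner product of row i of a with column j of b.
    return [
        [sum(a[i][k] * b[k][j] for k in range(cols_a)) for j in range(cols_b)]
        for i in range(rows_a)
    ]


# Hypothetical matrices, for illustration only.
A = [[1, 2, 3],
     [4, 5, 6]]            # 2 x 3
B = [[1, 0],
     [0, 1],
     [2, 2]]               # 3 x 2
C = matmul(A, B)           # 2 x 2
print(C)                   # [[7, 8], [16, 17]]

# An n x p matrix times a p x 1 column vector gives an n x 1 column vector.
X = [[1, 2],
     [1, 3],
     [1, 5]]               # 3 x 2: a column of 1s and a column of x values
beta = [[10],
        [2]]               # 2 x 1
print(matmul(X, beta))     # [[14], [16], [20]]
```

Each entry of the product is one inner product, so the first entry of C here is 1×1 + 2×0 + 3×2 = 7.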

Matrix addition

Recall that the Xβ + ε that appears in the regression function:

$$Y = X\beta + \epsilon$$

is an example of matrix addition. Again, there are some restrictions: you can't just add any two matrices together. Two matrices can be added together only if they have the same number of rows and columns. Then, to add two matrices, simply add the corresponding elements of the two matrices. That is, if C = A + B, then:

$$c_{ij} = a_{ij} + b_{ij}$$

For example:

That is, the entry in the first row and first column of C, denoted c11, is obtained by:

And, the entry in the first row and second column of C, denoted c12, is obtained by:

You might convince yourself that the remaining seven elements of C have been obtained correctly.
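
Element-by-element addition is just as easy to sketch in code. This is a minimal plain-Python illustration; the function name matadd and the vectors fitted and eps are hypothetical values made up for this sketch.

```python
def matadd(a, b):
    """Add two matrices of the same size; both are lists of rows."""
    same_rows = len(a) == len(b)
    same_cols = same_rows and all(len(ra) == len(rb) for ra, rb in zip(a, b))
    if not same_cols:
        raise ValueError("matrices must have the same number of rows and columns")
    # Entry (i, j) of the sum is a[i][j] + b[i][j].
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Hypothetical 3 x 1 vectors standing in for X*beta and the error terms.
fitted = [[14.0], [16.0], [20.0]]
eps = [[0.5], [-1.0], [0.25]]
Y = matadd(fitted, eps)
print(Y)  # [[14.5], [15.0], [20.25]]
```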

PRACTICE PROBLEMS: Matrix formulation of the regression function

These problems are intended to tie together the matrix manipulation techniques we learned above within the framework of the matrix formulation of the multiple regression model.

Directions. Type up your answers to each of the following questions in a Word file named practice08_yourPSUid.doc. Once you have completed all of the practice problems in this lesson, upload your file to the Lesson #8 Practice Problems dropbox.


8.1. Perform the matrix multiplication and matrix addition on the right side of the following equation:

in order to show that it equals the right side of the following n equations:


8.2. What must the size and content of the matrix β be in the following equation?

Least squares estimates in matrix notation

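Here's the punchline: the p × 1 vector containing the estimates of the p parameters of the regression function can be shown to equal:

$$b = (X'X)^{-1} X'Y$$

where (X'X)⁻¹ is the inverse of the X'X matrix, and X' is the transpose of the X matrix.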

As before, that might not mean anything to you, if you've never studied matrix algebra — or if you have and you forgot it all! So, let's go off and review inverses and transposes of matrices.

Definition of the transpose of a matrix

The transpose of a matrix A is a matrix, denoted A' or Aᵀ, whose rows are the columns of A and whose columns are the rows of A, all in the same order. For example, the transpose of the 3 × 2 matrix A:

is the 2 × 3 matrix A':

And, since the X matrix in the simple linear regression setting is:

$$X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}$$

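the X'X matrix in the simple linear regression setting must be:

$$
X'X =
\begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{bmatrix}
\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}
=
\begin{bmatrix} n & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \end{bmatrix}
$$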
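
As a numerical check on the form of X'X in simple linear regression (n in the top-left corner, the sum of the x values off the diagonal, and the sum of the squared x values in the bottom-right corner), here is a minimal plain-Python sketch. The helper names transpose and matmul and the x values are hypothetical, chosen only for illustration.

```python
def transpose(a):
    """Rows of the result are the columns of a, in the same order."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Row-by-column product of two conformable matrices (lists of rows)."""
    return [[sum(r[k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for r in a]


# Hypothetical predictor values, for illustration only.
x = [2, 3, 5, 7]
X = [[1, xi] for xi in x]          # n x 2: a column of 1s and a column of x

XtX = matmul(transpose(X), X)      # 2 x 2
n = len(x)
pattern = [[n, sum(x)],
           [sum(x), sum(xi ** 2 for xi in x)]]

print(XtX)                         # [[4, 17], [17, 87]]
print(XtX == pattern)              # True
```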

Definition of the identity matrix

The square n × n identity matrix, denoted Iₙ, is a matrix with 1's on the diagonal and 0's elsewhere. For example, the 2 × 2 identity matrix is:

$$I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

The identity matrix plays the same role as the number 1 in ordinary arithmetic:

$$AI = IA = A$$

That is, when you multiply a matrix by the identity, you get the same matrix back.

Definition of the inverse of a matrix

The inverse A⁻¹ of a square (!!) matrix A is the unique matrix such that:

$$A^{-1}A = AA^{-1} = I$$

That is, the inverse of A is the matrix A⁻¹ that you have to multiply A by in order to obtain the identity matrix I. Note that I am not just trying to be cute by including (!!) in that first sentence. The inverse only exists for square matrices!

Now, finding inverses is a really messy venture. The good news is that we'll always let computers find the inverses for us. In fact, we won't even know that Minitab is finding inverses behind the scenes!
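
For a 2 × 2 matrix, though, the inverse has a simple closed form, so the definition can be checked directly. The sketch below uses that standard 2 × 2 formula (swap the diagonal entries, negate the off-diagonal entries, divide by the determinant ad − bc) with exact fractions from Python's standard library. The function names and the matrix A are hypothetical, for illustration only; this is not how Minitab computes inverses.

```python
from fractions import Fraction


def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix [[a, b], [c, d]] with integer entries."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular; no inverse exists")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


def matmul(a, b):
    """Row-by-column product of two conformable matrices (lists of rows)."""
    return [[sum(r[k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for r in a]


# Hypothetical matrix, for illustration only.
A = [[2, 1],
     [5, 3]]
A_inv = inverse_2x2(A)

identity = [[1, 0],
            [0, 1]]
# Multiplying in either order gives back the identity matrix.
print(matmul(A, A_inv) == identity)   # True
print(matmul(A_inv, A) == identity)   # True
```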

An example

Ugh! All of these definitions! Let's take a look at an example just to convince ourselves that, yes, indeed the least squares estimates are obtained by the following matrix formula:

$$b = (X'X)^{-1} X'Y$$

Let's consider the data in soapsuds.txt, in which the height of suds (y = suds) in a standard dishpan was recorded for various amounts of soap (x = soap, in grams) (Draper and Smith, 1998, p. 108). Using Minitab to fit the simple linear regression model to these data, we obtain:

minitab output

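Let's see if we can obtain the same answer using the above matrix formula. We previously showed that:

$$X'X = \begin{bmatrix} n & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \end{bmatrix}$$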

Using the calculator function in Minitab, we can easily calculate some parts of this formula:

minitab output

That is, the 2 × 2 matrix X'X is:

And, the 2 × 1 column vector X'Y is:

So, we've determined X'X and X'Y. Now, all we need to do is to find the inverse (X'X)⁻¹. As mentioned before, it is very messy to determine inverses by hand. Letting computer software do the dirty work for us, it can be shown that the inverse of X'X is:

And so, putting all of our work together, we obtain the least squares estimates:

That is, the estimated intercept is b0 = -2.67 and the estimated slope is b1 = 9.51. Aha! Within rounding error, our estimates are the same as those reported by Minitab:

minitab output
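
The same sequence of steps (form X'X and X'Y, invert X'X, then multiply) can be sketched in code. This is a minimal plain-Python illustration with exact fractions, not the Minitab calculation. The helper names and the four data points are hypothetical, made up for this sketch; they are not the soapsuds.txt data.

```python
from fractions import Fraction


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Row-by-column product of two conformable matrices (lists of rows)."""
    return [[sum(r[k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for r in a]


def inverse_2x2(m):
    """Closed-form inverse of a 2 x 2 matrix with integer entries."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("X'X is singular; the estimates are not unique")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4]
y = [2, 3, 5, 6]

X = [[1, xi] for xi in x]      # n x 2 matrix: column of 1s, column of x
Y = [[yi] for yi in y]         # n x 1 column vector

Xt = transpose(X)
XtX = matmul(Xt, X)            # [[n, sum x], [sum x, sum x^2]]
XtY = matmul(Xt, Y)            # [[sum y], [sum x*y]]

b = matmul(inverse_2x2(XtX), XtY)   # b = (X'X)^(-1) X'Y
b0, b1 = b[0][0], b[1][0]
print(float(b0), float(b1))    # 0.5 1.4
```

For these made-up points, the fitted line is y = 0.5 + 1.4x, which agrees with the usual simple linear regression formulas for the intercept and slope.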

PRACTICE PROBLEMS: Least squares in matrix notation

Directions. Type up your answers to the following question in a Word file named practice08_yourPSUid.doc. Once you have completed all of the practice problems in this lesson, upload your file to the Lesson #8 Practice Problems dropbox.


8.3. Show that the inverse of:

is:

by illustrating that when you multiply the two matrices together you get the 2 × 2 identity matrix.

Linear dependence

There is just one more really critical topic that we should address here, and that is linear dependence. We say that the columns of the matrix A:

are linearly dependent, since (at least) one of the columns can be written as a linear combination of another, namely the third column is 4 × the first column. If none of the columns can be written as a linear combination of the other columns, then we say the columns are linearly independent.

Unfortunately, linear dependence is not always obvious. For example, the columns in the following matrix A:

are linearly dependent, because the first column plus the second column equals 5 × the third column.

Now, why should we care about linear dependence? Because the inverse of a square matrix exists only if the columns are linearly independent. The vector of regression estimates b depends on (X'X)⁻¹, and the columns of X'X are linearly dependent whenever the columns of X are. So the parameter estimates b0, b1, and so on cannot be uniquely determined if some of the columns of X are linearly dependent! That is, if the columns of your X matrix (that is, two or more of your predictor variables) are linearly dependent (or nearly so), you will run into trouble when trying to estimate the regression equation.

For example, suppose for some strange reason we multiplied the predictor variable soap by 2 in the dataset soapsuds.txt. That is, we'd have two predictor variables, say soap1 (which is the original soap) and soap2 (which is 2 × the original soap):

If we tried to regress y = suds on x1 = soap1 and x2 = soap2, we see that Minitab spits out trouble:

minitab output
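
The source of the trouble can be seen numerically: when one predictor is exactly twice another, the determinant of X'X is zero, so (X'X)⁻¹ does not exist. The sketch below is a minimal plain-Python illustration. The helper names and the soap1 values are hypothetical, made up for this sketch; they are not the soapsuds.txt data.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Row-by-column product of two conformable matrices (lists of rows)."""
    return [[sum(r[k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for r in a]


def det2(m):
    (a, b), (c, d) = m
    return a * d - b * c


def det3(m):
    """Determinant of a 3 x 3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


# Hypothetical predictor values, for illustration only.
soap1 = [4, 5, 6, 7]
soap2 = [2 * s for s in soap1]          # exactly 2 x soap1

# One predictor: the columns of X are linearly independent.
X_ok = [[1, s] for s in soap1]
print(det2(matmul(transpose(X_ok), X_ok)))     # 20, so the inverse exists

# Two predictors, the second a multiple of the first: linearly dependent.
X_bad = [[1, s1, s2] for s1, s2 in zip(soap1, soap2)]
print(det3(matmul(transpose(X_bad), X_bad)))   # 0, so no inverse exists
```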

In short, the first moral of the story is "don't collect your data in such a way that the predictor variables are perfectly correlated." And, the second moral of the story is "if your software package reports an error message concerning high correlation among your predictor variables, then think about linear dependence and how to get rid of it."

© 2004 The Pennsylvania State University. All rights reserved.
Materials developed by Dr. Laura J. Simon (Lecturer, Penn State Department of Statistics).