6.3 - Sequential (or Extra) Sums of Squares

The numerator of the general linear F-statistic, that is, \(SSE(R)-SSE(F)\), is what is referred to as a "sequential sum of squares" or "extra sum of squares."

Definition

What is a "sequential sum of squares?"
It can be viewed in either of two ways:
  • It is the reduction in the error sum of squares (SSE) when one or more predictor variables are added to the model.
  • Or, it is the increase in the regression sum of squares (SSR) when one or more predictor variables are added to the model.

In essence, when we add a predictor to a model, we hope to explain some of the variability in the response, and thereby reduce some of the error. A sequential sum of squares quantifies how much variability we explain (increase in regression sum of squares) or alternatively how much error we reduce (reduction in the error sum of squares).
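The two views give the same number. A short sketch of why, assuming both the reduced model (R) and the full model (F) are least squares fits with an intercept, so that the total sum of squares splits as \(SSTO = SSR + SSE\) in each, with the same \(SSTO\):

\[
\begin{aligned}
SSR(F) + SSE(F) &= SSTO = SSR(R) + SSE(R) \\
\Rightarrow\quad SSE(R) - SSE(F) &= SSR(F) - SSR(R)
\end{aligned}
\]

The left side of the last line is the reduction in the error sum of squares, and the right side is the increase in the regression sum of squares.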

Notation

The amount of error that remains upon fitting a multiple regression model naturally depends on which predictors are in the model. That is, the error sum of squares (SSE) and, hence, the regression sum of squares (SSR) depend on what predictors are in the model. Therefore, we need a way of keeping track of the predictors in the model for each calculated SSE and SSR value.

We'll just note what predictors are in the model by listing them in parentheses after any SSE or SSR. For example:

  • \(SSE({x}_{1})\) denotes the error sum of squares when \(x_{1}\) is the only predictor in the model.
  • \(SSR({x}_{1}, {x}_{2})\) denotes the regression sum of squares when \(x_{1}\) and \(x_{2}\) are both in the model.

And, we'll use notation like \(SSR(x_{2} | x_{1})\) to denote a sequential sum of squares. \(SSR(x_{2} | x_{1})\) denotes the sequential sum of squares obtained by adding \(x_{2}\) to a model already containing only the predictor \(x_{1}\). The vertical bar "|" is read as "given" — that is, "\(x_{2}\) | \(x_{1}\)" is read as "\(x_{2}\) given \(x_{1}\)." In general, the variables appearing to the right of the bar "|" are the predictors in the original model, and the variables appearing to the left of the bar "|" are the predictors newly added to the model.

Here are a few more examples of the notation:

  • The sequential sum of squares obtained by adding \(x_{1}\) to the model already containing only the predictor \(x_{2}\) is denoted as \(SSR({x}_{1}| {x}_{2})\).
  • The sequential sum of squares obtained by adding \(x_{1}\) to the model in which \(x_{2}\) and \(x_{3}\) are predictors is denoted as \(SSR({x}_{1}| {x}_{2},{x}_{3}) \).
  • The sequential sum of squares obtained by adding \(x_{1}\) and \(x_{2}\) to the model in which \(x_{3}\) is the only predictor is denoted as \(SSR({x}_{1}, {x}_{2}|{x}_{3}) \).
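To make the notation concrete, here is a minimal Python sketch of how a quantity such as \(SSR(x_{1}, x_{2} | x_{3})\) could be computed from raw data. The helper names `sse` and `extra_ss` and the toy data are hypothetical (they are not part of the ACL study), and the fit is assumed to be ordinary least squares with an intercept, solved here through the normal equations.

```python
# Illustrative sketch only: helper names and toy data are hypothetical.

def sse(y, predictors):
    """SSE from a least squares fit of y on an intercept plus the given predictor columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y]
    M = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
         + [sum(X[i][j] * y[i] for i in range(n))] for j in range(p)]
    # Gaussian elimination with partial pivoting
    for j in range(p):
        pivot = max(range(j, p), key=lambda r: abs(M[r][j]))
        M[j], M[pivot] = M[pivot], M[j]
        for r in range(j + 1, p):
            factor = M[r][j] / M[j][j]
            for k in range(j, p + 1):
                M[r][k] -= factor * M[j][k]
    # Back substitution for the coefficient estimates
    b = [0.0] * p
    for j in reversed(range(p)):
        b[j] = (M[j][p] - sum(M[j][k] * b[k] for k in range(j + 1, p))) / M[j][j]
    fitted = [sum(X[i][j] * b[j] for j in range(p)) for i in range(n)]
    return sum((y[i] - fitted[i]) ** 2 for i in range(n))


def extra_ss(y, given, added):
    """SSR(added | given): drop in SSE when `added` joins a model containing `given`."""
    return sse(y, given) - sse(y, given + added)


# Toy data (made up for illustration, not the ACL data)
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
x3 = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0]
y = [2.1, 2.9, 5.2, 5.8, 8.1, 8.7, 11.2, 11.6]

ssr_x2_given_x1 = extra_ss(y, [x1], [x2])          # SSR(x2 | x1)
ssr_x1_given_x2_x3 = extra_ss(y, [x2, x3], [x1])   # SSR(x1 | x2, x3)
ssr_x1_x2_given_x3 = extra_ss(y, [x3], [x1, x2])   # SSR(x1, x2 | x3)
print(ssr_x2_given_x1, ssr_x1_given_x2_x3, ssr_x1_x2_given_x3)
```

The arguments mirror the notation: the `given` list holds what appears to the right of the bar, and the `added` list holds what appears to the left.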

Let's try out the notation and the two alternative definitions of a sequential sum of squares on an example.

Example 6-3: ACL Test Scores

In the Allen Cognitive Level (ACL) Study, David and Riley (1990) investigated the relationship between ACL test scores and level of psychopathology. They collected the following data (Allen Test dataset) on each of the 69 patients in a hospital psychiatry unit:

  • Response y = ACL score
  • Potential predictor \(x_{1}\) = vocabulary ("Vocab") score on Shipley Institute of Living Scale
  • Potential predictor \(x_{2}\) = abstraction ("Abstract") score on Shipley Institute of Living Scale
  • Potential predictor \(x_{3}\) = score on Symbol-Digit Modalities Test ("SDMT")

If we estimate the regression function with y = ACL as the response and \(x_{1}\) = Vocab as the predictor, that is, if we "regress y = ACL on \(\boldsymbol{x_{1}}\) = Vocab," we obtain:

Analysis of Variance

Source         DF  Adj SS  Adj MS  F-Value  P-Value
Regression      1   2.691  2.6906     4.47    0.038
  Vocab         1   2.691  2.6906     4.47    0.038
Error          67  40.359  0.6024
  Lack-of-Fit  24   7.480  0.3117     0.41    0.989
  Pure Error   43  32.879  0.7646
Total          68  43.050

Regression Equation

ACL = 4.225 + 0.0298 Vocab

Noting that \(x_{1}\) is the only predictor in the model, the output tells us that:

  • \(SSR(x_{1}) = 2.691\)
  • \(SSE(x_{1}) = 40.359\)
  • \(SSTO = 43.050\) [There is no need to say \(SSTO(x_{1})\) here since \(SSTO\) does not depend on which predictors are in the model.]

If we regress y = ACL on \(\boldsymbol{x_{1}}\) = Vocab and \(\boldsymbol{x_{3}}\) = SDMT, we obtain:

Analysis of Variance

Source         DF   Adj SS   Adj MS  F-Value  P-Value
Regression      2  11.7778  5.88892    12.43    0.000
  Vocab         1   0.0979  0.09795     0.21    0.651
  SDMT          1   9.0872  9.08723    19.18    0.000
Error          66  31.2717  0.47381
  Lack-of-Fit  64  30.7667  0.48073     1.90    0.406
  Pure Error    2   0.5050  0.25250
Total          68  43.0496

Regression Equation

ACL = 3.845 - 0.0068 Vocab + 0.02979 SDMT

 

Noting that \(x_{1}\) and \(x_{3}\) are the predictors in the model, the output tells us:

  • \(SSR(x_{1}, x_{3}) = 11.7778\)
  • \(SSE(x_{1}, x_{3}) = 31.2717\)
  • \(SSTO = 43.0496\)

Comparing the sums of squares for this model containing \(x_{1}\) and \(x_{3}\) to the previous model containing only \(x_{1}\), we note that:

  • the error sum of squares has been reduced,
  • the regression sum of squares has increased,
  • and the total sum of squares stays the same.

For a given data set, the total sum of squares will always be the same regardless of the number of predictors in the model. Right? The total sum of squares quantifies how much the response varies — it has nothing to do with which predictors are in the model.

Now, how much has the error sum of squares decreased and the regression sum of squares increased? The sequential sum of squares SSR(\(x_{3}\) | \(x_{1}\)) tells us how much. Recall that \(\boldsymbol{SSR}\boldsymbol{({x}_{3}}| \boldsymbol{{x}_{1})}\) is the reduction in the error sum of squares when \(\boldsymbol{x_{3}}\) is added to the model in which \(\boldsymbol{x_{1}}\) is the only predictor. Therefore:

\(\boldsymbol{SSR}\boldsymbol{({x}_{3}}| \boldsymbol{{x}_{1})}=\boldsymbol{SSE}\boldsymbol{({x}_{1})}- \boldsymbol{SSE}\boldsymbol{({x}_{1}}, \boldsymbol{{x}_{3})}\) 

\(\boldsymbol{SSR}\boldsymbol{({x}_{3}}| \boldsymbol{{x}_{1})}=\boldsymbol{40.359 - 31.2717 = 9.087}\)

Alternatively, \(\boldsymbol{SSR}\boldsymbol{({x}_{3}}| \boldsymbol{{x}_{1})}\) is the increase in the regression sum of squares when \(\boldsymbol{x_{3}}\) is added to the model in which \(\boldsymbol{x_{1}}\) is the only predictor:

\(\boldsymbol{SSR}\boldsymbol{({x}_{3}}| \boldsymbol{{x}_{1})}=\boldsymbol{SSR}\boldsymbol{({x}_{1}}, \boldsymbol{{x}_{3})}- \boldsymbol{SSR}\boldsymbol{({x}_{1})} \) 

\(\boldsymbol{SSR}\boldsymbol{({x}_{3}}| \boldsymbol{{x}_{1})}= \boldsymbol{11.7778 - 2.691 = 9.087}\)
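The two hand calculations can be replayed in a few lines of Python. This is only an arithmetic check on the values read from the two Minitab outputs above; the variable names are made up for the illustration.

```python
# Values read from the two Minitab outputs above (ACL example)
sse_x1, ssr_x1 = 40.359, 2.691            # model with x1 = Vocab only
sse_x1_x3, ssr_x1_x3 = 31.2717, 11.7778   # model with x1 = Vocab and x3 = SDMT

via_sse = sse_x1 - sse_x1_x3   # reduction in the error sum of squares
via_ssr = ssr_x1_x3 - ssr_x1   # increase in the regression sum of squares

print(round(via_sse, 3), round(via_ssr, 3))  # both round to 9.087
```

The two differences are not identical to every decimal place because the one-predictor output reports fewer digits than the two-predictor output.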

Aha — we obtained the same answer! Now, even though — for the sake of learning — we calculated the sequential sum of squares by hand, Minitab and most other statistical software packages will do the calculation for you.

When fitting a regression model, Minitab outputs Adjusted (Type III) sums of squares in the Anova table by default. Adjusted sums of squares measure the reduction in the error sum of squares (or increase in the regression sum of squares) when each predictor is added to a model that contains all of the remaining predictors. So, in the Anova table above, the Adjusted SS for \(\boldsymbol{x_{1}}\) = Vocab is \(\boldsymbol{SSR}\boldsymbol{({x}_{1}}| \boldsymbol{{x}_{3})}=\boldsymbol{0.0979}\), while the Adjusted SS for \(\boldsymbol{x_{3}}\) = SDMT is \(\boldsymbol{SSR}\boldsymbol{({x}_{3}}| \boldsymbol{{x}_{1})}=\boldsymbol{9.0872}\). By contrast, let’s look at the output we obtain when we regress y = ACL on \(\boldsymbol{x_{1}}\) = Vocab and \(\boldsymbol{x_{3}}\) = SDMT and change the Minitab Regression Options to use Sequential (Type I) sums of squares instead of the default Adjusted (Type III) sums of squares:

Analysis of Variance

Source         DF   Seq SS  Seq MS  F-Value  P-Value
Regression      2  11.7778  5.8889    12.43    0.000
  Vocab         1   2.6906  2.6906     5.68    0.020
  SDMT          1   9.0872  9.0872    19.18    0.000
Error          66  31.2717  0.4738
  Lack-of-Fit  64  30.7667  0.4807     1.90    0.406
  Pure Error    2   0.5050  0.2525
Total          68  43.0496

Regression Equation

ACL = 3.845 - 0.0068 Vocab + 0.02979 SDMT

Note that the third column in the Anova table is now Sequential sums of squares ("Seq SS") rather than Adjusted sums of squares ("Adj SS"). Do the numbers in the "Seq SS" column look familiar? They should:

  • 2.6906 is the reduction in the error sum of squares — or the increase in the regression sum of squares — when you add \(x_{1}\) = Vocab to a model containing no predictors. That is, 2.6906 is just the regression sum of squares \(\boldsymbol{SSR}\boldsymbol{({x}_{1})}\).
  • 9.0872 is the reduction in the error sum of squares — or the increase in the regression sum of squares — when you add \(x_{3}\) = SDMT to a model already containing \(x_{1}\) = Vocab. That is, 9.0872 is the sequential sum of squares \(\boldsymbol{SSR}\boldsymbol{({x}_{3}}| \boldsymbol{{x}_{1})}\).

In general, the number appearing in each row of the table is the sequential sum of squares for the row's variable given all the other variables that come before it in the table. Except in the last row, these numbers generally differ from the corresponding numbers in the Anova table with Adjusted sums of squares. The last row agrees because the last predictor entered is conditioned on all of the other predictors under either method. So, for the example above, the Adjusted SS and Sequential SS for \(x_{3}\) = SDMT are the same: \(\boldsymbol{SSR}\boldsymbol{({x}_{3}}| \boldsymbol{{x}_{1})}\) = 9.0872.
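The row-by-row rule can be spelled out with a small Python sketch that only builds the notation for each row, given the order in which predictors are entered. The function names `sequential_labels` and `adjusted_labels` are hypothetical helpers for this illustration, not Minitab features.

```python
def sequential_labels(order):
    """Label each row of a Seq SS table for predictors entered in `order`."""
    labels = []
    for i, name in enumerate(order):
        earlier = order[:i]  # predictors listed above this row
        labels.append(f"SSR({name})" if not earlier
                      else f"SSR({name} | {', '.join(earlier)})")
    return labels


def adjusted_labels(order):
    """Label each row of an Adj SS table: each predictor is conditioned on all the others."""
    labels = []
    for name in order:
        others = [other for other in order if other != name]
        labels.append(f"SSR({name})" if not others
                      else f"SSR({name} | {', '.join(others)})")
    return labels


print(sequential_labels(["x1", "x3"]))  # ['SSR(x1)', 'SSR(x3 | x1)']
print(adjusted_labels(["x1", "x3"]))    # ['SSR(x1 | x3)', 'SSR(x3 | x1)']
```

In both lists the last entry is the same quantity, which is why the last row of the two Anova tables matches.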

Order matters

Perhaps you noticed from the previous illustration that the order in which we add predictors to the model determines the sequential sums of squares ("Seq SS") we get. That is, the order is important! Therefore, we'll have to pay attention to it; we'll soon see that the desired order depends on the hypothesis test we want to conduct.

Let's revisit the Allen Cognitive Level Study data to see what happens when we reverse the order in which we enter the predictors in the model. Let's start by regressing y = ACL on \(\boldsymbol{x_{3}}\) = SDMT (using the Minitab default Adjusted or Type III sums of squares):

Analysis of Variance

Source         DF  Adj SS   Adj MS  F-Value  P-Value
Regression      1   11.68  11.6799    24.95    0.000
  SDMT          1   11.68  11.6799    24.95    0.000
Error          67   31.37   0.4682
  Lack-of-Fit  38   14.28   0.3758     0.64    0.904
  Pure Error   29   17.09   0.5893
Total          68   43.05

Regression Equation

ACL = 3.753 + 0.02807 SDMT

Noting that \(x_{3}\) is the only predictor in the model, the resulting output tells us that:

  • SSR(\(x_{3}\)) = 11.68
  • SSE(\(x_{3}\)) = 31.37

Now, regressing y = ACL on \(\boldsymbol{x_{3}}\) = SDMT and \(\boldsymbol{x_{1}}\) = Vocab in that order, that is, specifying \(x_{3}\) first and \(x_{1}\) second, we obtain:

Analysis of Variance

Source         DF   Adj SS   Adj MS  F-Value  P-Value
Regression      2  11.7778  5.88892    12.43    0.000
  SDMT          1   9.0872  9.08723    19.18    0.000
  Vocab         1   0.0979  0.09795     0.21    0.651
Error          66  31.2717  0.47381
  Lack-of-Fit  64  30.7667  0.48073     1.90    0.406
  Pure Error    2   0.5050  0.25250
Total          68  43.0496

Regression Equation

ACL = 3.845 + 0.02979 SDMT - 0.0068 Vocab

Noting that \(x_{1}\) and \(x_{3}\) are the two predictors in the model, the output tells us that:

  • SSR(\(x_{1}\), \(x_{3}\)) = 11.7778
  • SSE(\(x_{1}\), \(x_{3}\)) = 31.2717

How much did the error sum of squares decrease — or alternatively, the regression sum of squares increase? The sequential sum of squares \(\boldsymbol{SSR}\boldsymbol{({x}_{1}}| \boldsymbol{{x}_{3})}\) tells us how much. \(\boldsymbol{SSR}\boldsymbol{({x}_{1}}| \boldsymbol{{x}_{3})}\) is the reduction in the error sum of squares when \(\boldsymbol{x_{1}}\) is added to the model in which \(\boldsymbol{x_{3}}\) is the only predictor:

\(\boldsymbol{SSR}\boldsymbol{({x}_{1}}| \boldsymbol{{x}_{3})}=\boldsymbol{SSE}\boldsymbol{({x}_{3})}- \boldsymbol{SSE}\boldsymbol{({x}_{1}}, \boldsymbol{{x}_{3})}\)

\(\boldsymbol{SSR}\boldsymbol{({x}_{1}}| \boldsymbol{{x}_{3})}=\boldsymbol{31.37 - 31.2717 = 0.098}\)

Alternatively, \(\boldsymbol{SSR}\boldsymbol{({x}_{1}}| \boldsymbol{{x}_{3})}\) is the increase in the regression sum of squares when \(\boldsymbol{x_{1}}\) is added to the model in which \(\boldsymbol{x_{3}}\) is the only predictor:

\(\boldsymbol{SSR}\boldsymbol{({x}_{1}}| \boldsymbol{{x}_{3})}=\boldsymbol{SSR}\boldsymbol{({x}_{1}}, \boldsymbol{{x}_{3})}- \boldsymbol{SSR}\boldsymbol{({x}_{3})} \) 

\(\boldsymbol{SSR}\boldsymbol{({x}_{1}}| \boldsymbol{{x}_{3})}= \boldsymbol{11.7778 - 11.68 = 0.098}\)

Again, we obtain the same answers. Regardless of how we perform the calculation, it appears that taking into account \(\boldsymbol{x_{1}}\) = Vocab doesn't help much in explaining the variability in y = ACL after \(\boldsymbol{x_{3}}\) = SDMT has already been considered.

Once again, we don't have to calculate sequential sums of squares by hand. Minitab does it for us. If we regress y = ACL on \(x_{3}\) = SDMT and \(x_{1}\) = Vocab in that order and use Sequential (Type I) sums of squares, we obtain:

Analysis of Variance

Source         DF   Seq SS   Seq MS  F-Value  P-Value
Regression      2  11.7778   5.8889    12.43    0.000
  SDMT          1  11.6799  11.6799    24.65    0.000
  Vocab         1   0.0979   0.0979     0.21    0.651
Error          66  31.2717   0.4738
  Lack-of-Fit  64  30.7667   0.4807     1.90    0.406
  Pure Error    2   0.5050   0.2525
Total          68  43.0496

Regression Equation

ACL = 3.845 + 0.02979 SDMT - 0.0068 Vocab

 

The Minitab output tells us:

  • SSR(\(x_{3}\)) = 11.6799. That is, the error sum of squares is reduced — or the regression sum of squares is increased — by 11.6799 when you add \(x_{3}\) = SDMT to a model containing no predictors.
  • SSR(\(x_{1}\) | \(x_{3}\)) = 0.0979. That is, the error sum of squares is reduced — or the regression sum of squares is increased — by (only!) 0.0979 when you add \(x_{1}\) = Vocab to a model already containing \(x_{3}\) = SDMT.
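To see the effect of order side by side, the sketch below collects the Seq SS values from the two sequential outputs shown in this example (Vocab first, then SDMT first) together with the Adj SS values. It is only a bookkeeping check on numbers already reported above; the dictionary names are made up for the illustration.

```python
# Seq SS columns from the two sequential (Type I) outputs in this example
seq_vocab_first = {"Vocab": 2.6906, "SDMT": 9.0872}   # order: x1, then x3
seq_sdmt_first = {"SDMT": 11.6799, "Vocab": 0.0979}   # order: x3, then x1
# Adj SS column from the adjusted (Type III) output for the same two-predictor model
adj = {"Vocab": 0.0979, "SDMT": 9.0872}
ssr_full = 11.7778                                    # SSR(x1, x3)

# Either order splits the same overall regression sum of squares
total_vocab_first = sum(seq_vocab_first.values())
total_sdmt_first = sum(seq_sdmt_first.values())

# The credit given to Vocab depends heavily on whether it is entered first or last
share_vocab_first = seq_vocab_first["Vocab"] / ssr_full
share_vocab_last = seq_sdmt_first["Vocab"] / ssr_full

print(round(total_vocab_first, 4), round(total_sdmt_first, 4))
print(round(share_vocab_first, 3), round(share_vocab_last, 3))
```

Each Adj SS value matches the Seq SS value of the same predictor in the order where it is entered last, which is what "adjusted for all the remaining predictors" means.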

Two- (or three- or more-) degree-of-freedom sequential sums of squares

So far, we've only evaluated how much the error and regression sums of squares change when adding one additional predictor to the model. What happens if we simultaneously add two predictors to a model containing only one predictor? We obtain what is called a "two-degree-of-freedom sequential sum of squares." If we simultaneously add three predictors to a model containing only one predictor, we obtain a "three-degree-of-freedom sequential sum of squares," and so on.

There are two ways of obtaining these types of sequential sums of squares. We can:

  • either add up the appropriate one-degree-of-freedom sequential sums of squares
  • or use the definition of a sequential sum of squares

Let's try out these two methods on our Allen Cognitive Level Study example. Regressing, in order, y = ACL on \(\boldsymbol{x_{3}}\) = SDMT and \(\boldsymbol{x_{1}}\) = Vocab and \(\boldsymbol{x_{2}}\) = Abstract, and using sequential (Type I) sums of squares, we obtain:

Analysis of Variance

Source        DF   Seq SS   Seq MS  F-Value  P-Value
Regression     3  12.3009   4.1003     8.67    0.000
  SDMT         1  11.6799  11.6799    24.69    0.000
  Vocab        1   0.0979   0.0979     0.21    0.651
  Abstract     1   0.5230   0.5230     1.11    0.297
Error         65  30.7487   0.4731
Total         68  43.0496

Regression Equation

ACL = 3.946 + 0.02740 SDMT - 0.0174 Vocab + 0.0122 Abstract

The Minitab output tells us:

  • SSR(\(x_{3}\)) = 11.6799
  • SSR(\(x_{1}\) | \(x_{3}\)) = 0.0979
  • SSR(\(x_{2}\) | \(x_{1}\), \(x_{3}\)) = 0.5230

Therefore, the reduction in the error sum of squares when adding \(x_{1}\) and \(x_{2}\) to a model already containing \(x_{3}\) is:

\(\boldsymbol{SSR}\boldsymbol{({x}_{1}}, \boldsymbol{{x}_{2}}|\boldsymbol{{x}_{3})}= \boldsymbol{0.0979 + 0.5230 = 0.621}\)

Alternatively, we can calculate the sequential sum of squares SSR(\(x_{1}\), \(x_{2}\)| \(x_{3}\)) by definition of the reduction in the error sum of squares:

\(\boldsymbol{SSR}\boldsymbol{({x}_{1}}, \boldsymbol{{x}_{2}}|\boldsymbol{{x}_{3})}= \boldsymbol{SSE}\boldsymbol{({x}_{3})}- \boldsymbol{SSE}\boldsymbol{({x}_{1}}, \boldsymbol{{x}_{2}}, \boldsymbol{{x}_{3})}\)

\(\boldsymbol{SSR}\boldsymbol{({x}_{1}}, \boldsymbol{{x}_{2}}|\boldsymbol{{x}_{3})= 31.37 - 30.7487 = 0.621}\)

or the increase in the regression sum of squares:

\(\boldsymbol{SSR}\boldsymbol{({x}_{1}}, \boldsymbol{{x}_{2}}|\boldsymbol{{x}_{3})}=\boldsymbol{SSR}\boldsymbol{({x}_{1}}, \boldsymbol{{x}_{2}}, \boldsymbol{{x}_{3})}- \boldsymbol{SSR}\boldsymbol{({x}_{3})} \)

\(\boldsymbol{SSR}\boldsymbol{({x}_{1}}, \boldsymbol{{x}_{2}}|\boldsymbol{{x}_{3})}=\boldsymbol{12.3009 - 11.68 = 0.621}\)

Note that the Sequential (Type I) sums of squares in the Anova table add up to the (overall) regression sum of squares (SSR): 11.6799 + 0.0979 + 0.5230 = 12.3009 (within rounding error). By contrast, Adjusted (Type III) sums of squares do not have this property.
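The three routes to \(SSR(x_{1}, x_{2} | x_{3})\), and the contrast between sequential and adjusted sums of squares, can be checked with a few lines of Python. The numbers are copied from the Minitab outputs in this example; the variable names are made up for the illustration.

```python
# Seq SS column for the order x3 = SDMT, x1 = Vocab, x2 = Abstract
seq_ss = {"SDMT": 11.6799, "Vocab": 0.0979, "Abstract": 0.5230}
sse_x3, ssr_x3 = 31.37, 11.68          # model with x3 only
sse_full, ssr_full = 30.7487, 12.3009  # model with x3, x1, and x2

by_adding = seq_ss["Vocab"] + seq_ss["Abstract"]  # sum of one-degree-of-freedom pieces
by_sse = sse_x3 - sse_full                        # reduction in SSE
by_ssr = ssr_full - ssr_x3                        # increase in SSR
print(round(by_adding, 3), round(by_sse, 3), round(by_ssr, 3))  # each rounds to 0.621

# Sequential SS add up to the overall SSR (within rounding) ...
seq_total = sum(seq_ss.values())
# ... but the Adj SS from the two-predictor model (Vocab, SDMT) do not add up to its SSR of 11.7778
adj_total_two_predictors = 0.0979 + 9.0872
print(round(seq_total, 4), round(adj_total_two_predictors, 4))
```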

We've now finished our second aside. We can — finally — get back to the whole point of this lesson, namely learning how to conduct hypothesis tests for the slope parameters in a multiple regression model.

Try it!

Sequential sums of squares

These problems review the concept of "sequential (or extra) sums of squares." Sequential sums of squares are useful because they can be used to test:

  • whether one slope parameter is 0 (for example, \(H_{0}\):\(\beta_{1}\) = 0)
  • whether a subset (more than one, but fewer than all) of the slope parameters are 0 (for example, \(H_{0}\):\(\beta_{2}\) = \(\beta_{3}\) = 0)
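As a rough illustration of how a sequential sum of squares feeds such a test, the sketch below forms the general linear F-statistic as (extra sum of squares / its degrees of freedom) divided by (SSE of the full model / its error degrees of freedom). The function name `partial_f` is hypothetical, and the inputs are taken from the ACL outputs in Example 6-3; it computes only the F-statistic, not a P-value.

```python
def partial_f(extra_ss, extra_df, sse_full, df_error_full):
    """General linear F-statistic built from a sequential (extra) sum of squares."""
    return (extra_ss / extra_df) / (sse_full / df_error_full)


# H0: beta_1 = 0 in the model with x1 = Vocab and x3 = SDMT,
# using SSR(x1 | x3) = 0.0979 and SSE(x1, x3) = 31.2717 on 66 error df
f_vocab = partial_f(0.0979, 1, 31.2717, 66)
print(round(f_vocab, 2))  # 0.21, the F-Value in the Vocab row of that output

# H0: beta_1 = beta_2 = 0 in the model with x3, x1, x2,
# using SSR(x1, x2 | x3) = 0.0979 + 0.5230 and SSE(x1, x2, x3) = 30.7487 on 65 error df
f_vocab_abstract = partial_f(0.0979 + 0.5230, 2, 30.7487, 65)
print(f_vocab_abstract)
```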

Again, what is a sequential sum of squares? It can be viewed in either of two ways:

  • It is the reduction in the error sum of squares (SSE) when one or more predictor variables are added to the model.
  • Or, it is the increase in the regression sum of squares (SSR) when one or more predictor variables are added to the model.

Brain size and body size study.

Recall that the IQ Size data set contains data on 38 college students: intelligence as measured by performance IQ scores (y = PIQ) from the revised Wechsler Adult Intelligence Scale, brain size (\(x_{1}\) = brain) based on the count from MRI scans (given as count/10000), and body size measured by height in inches (\(x_{2}\) = height) and weight in pounds (\(x_{3}\) = weight).

  1. Fit the linear regression model with \(x_{1}\) = brain as the only predictor.
    • What is the value of the error sum of squares, denoted SSE(\(X_{1}\)) since \(x_{1}\) is the only predictor in the model?
    • What is the value of the regression sum of squares, denoted SSR(\(X_{1}\)) since \(x_{1}\) is the only predictor in the model?
    • What is the value of the total sum of squares, SSTO? (There is no need to write SSTO(\(X_{1}\)) since this does not depend on which predictors are in the model.)

    SSE(\(X_{1}\)) = 16197
    SSR(\(X_{1}\)) = 2697
    SSTO = 18895

  2. Now, fit the linear regression model with the predictors (in order) \(x_{1}\) = brain and \(x_{2}\) = height in the model.
    • What is the value of the error sum of squares, denoted SSE(\(X_{1}\),\(X_{2}\)) since \(x_{1}\) and \(x_{2}\) are the only predictors in the model?
    • What is the value of the regression sum of squares, denoted SSR(\(X_{1}\),\(X_{2}\)) since \(x_{1}\) and \(x_{2}\) are the only predictors in the model?
    • Confirm that the value of SSTO is unchanged from the previous question.

    SSE(\(X_{1}\), \(X_{2}\)) = 13322
    SSR(\(X_{1}\), \(X_{2}\)) = 5573
    SSTO = 18895

  3. Now, let's use the above definitions to calculate the sequential sum of squares of adding \(X_{2}\) to the model in which \(X_{1}\) is the only predictor. We denote this quantity as SSR(\(X_{2}\)|\(X_{1}\)). (The bar "|" is read as "given.") According to the alternative definitions:
    • SSR(\(X_{2}\)|\(X_{1}\)) is the reduction in the error sum of squares when \(X_{2}\) is added to the model in which \(X_{1}\) is the only predictor. That is, SSR(\(X_{2}\)|\(X_{1}\))= SSE(\(X_{1}\)) – SSE(\(X_{1}\),\(X_{2}\)). What is the value of SSR(\(X_{2}\)|\(X_{1}\)) calculated this way?
    • Alternatively, we can think of SSR(\(X_{2}\)|\(X_{1}\)) as the increase in the regression sum of squares when \(X_{2}\) is added to the model in which \(X_{1}\) is the only predictor. That is, SSR(\(X_{2}\)|\(X_{1}\))= SSR(\(X_{1}\),\(X_{2}\)) – SSR(\(X_{1}\)). What is the value of SSR(\(X_{2}\)|\(X_{1}\)) calculated this way? Did you get the same answer as above? (You should, ignoring small round-off error).

    SSR(\(X_{2}\)|\(X_{1}\)) = SSE(\(X_{1}\)) – SSE(\(X_{1}\), \(X_{2}\)) = 16197 – 13322 = 2875
    SSR(\(X_{2}\)|\(X_{1}\)) = SSR(\(X_{1}\), \(X_{2}\)) – SSR(\(X_{1}\)) = 5573 – 2697 = 2876

  4. Note that Minitab can display a column of sequential sums of squares named "Seq SS" if we change the appropriate setting under "Options." The sequential sums of squares you get depend on the order in which you enter the predictors in the model. Refit the model from question 3 but select "Sequential (Type I)" for "Sum of squares for tests" under "Options."
    • Since you entered \(x_{1}\) = brain first, the number Minitab displays for the Seq SS for brain is SSR(\(X_{1}\)). What is the value Minitab displays for SSR(\(X_{1}\)), and is it consistent with the value of SSR(\(X_{1}\)) you obtained in question (1)? In words, how would you describe the sequential sum of squares SSR(\(X_{1}\))?
    • Since you entered \(x_{2}\) = height second, the number Minitab displays for the Seq SS for height is SSR(\(X_{2}\)|\(X_{1}\)). What is the value Minitab displays for SSR(\(X_{2}\)|\(X_{1}\)), and is it consistent with the value of SSR(\(X_{2}\)|\(X_{1}\)) you obtained in question (3)? In words, how would you describe the sequential sum of squares SSR(\(X_{2}\)|\(X_{1}\))?

    SSR(\(X_{1}\)) = 2697
    SSR(\(X_{2}\)|\(X_{1}\)) = 2876

  5. Let's make sure we see how the sequential sums of squares that we get depend on the order in which we enter the predictors in the model. Fit the linear regression model with the two predictors in the reverse order. That is, when fitting the model, indicate \(x_{2}\) = height first and \(x_{1}\) = brain second. (To do this, click "Model" and re-order the "Terms in the model" using the arrows on the right.)
    • Since you entered \(x_{2}\) = height first, the number Minitab displays for the Seq SS for height is SSR(\(X_{2}\)). What is the value Minitab displays for SSR(\(X_{2}\))?
    • Since you entered \(x_{1}\) = brain second, the number Minitab displays for the Seq SS for brain is SSR(\(X_{1}\)|\(X_{2}\)). What is the value Minitab displays for SSR(\(X_{1}\)|\(X_{2}\))?
    • You can (and should!) verify the value Minitab displays for SSR(\(X_{2}\)) by fitting the linear regression model with \(x_{2}\) = height as the only predictor and verify the value Minitab displays for SSR(\(X_{1}\)|\(X_{2}\)) by using either of the two definitions.

    SSR(\(X_{2}\)) = 164
    SSR(\(X_{1}\)|\(X_{2}\)) = 5409
    SSR(\(X_{1}\)|\(X_{2}\)) = SSE(\(X_{2}\)) – SSE(\(X_{1}\), \(X_{2}\)) = 18731 – 13322 = 5409
    SSR(\(X_{1}\)|\(X_{2}\)) = SSR(\(X_{1}\), \(X_{2}\)) – SSR(\(X_{2}\)) = 5573 – 164 = 5409

  6. Sequential sums of squares can be obtained for any number of predictors that are added sequentially to the model. To see this, now fit the linear regression model with the predictors (in order) \(x_{1}\) = brain and \(x_{2}\) = height and \(x_{3}\) = weight. First:
    • The first two sequential sums of squares values, SSR(\(X_{1}\)) and SSR(\(X_{2}\)|\(X_{1}\)), should be consistent with their previous values, because you entered \(x_{1}\) = brain first and \(x_{2}\) = height second. Are they?
    • Since you entered \(x_{3}\) = weight third, the number Minitab displays for the Seq SS for weight is SSR(\(X_{3}\)|\(X_{1}\),\(X_{2}\)). What is the value Minitab displays for SSR(\(X_{3}\)|\(X_{1}\),\(X_{2}\))? Calculate SSR(\(X_{3}\)|\(X_{1}\),\(X_{2}\)) using either of the two definitions. Is your calculation consistent with the value Minitab displays under the Seq SS column? [Note that SSR(\(X_{3}\)|\(X_{1}\),\(X_{2}\)) happens to be 0.0 to one decimal place for this example, but of course this will not be true in general.]

    SSR(\(X_{1}\)) = 2697
    SSR(\(X_{2}\)|\(X_{1}\)) = 2876
    SSR(\(X_{3}\)|\(X_{1}\), \(X_{2}\)) = 0
    SSR(\(X_{3}\)|\(X_{1}\), \(X_{2}\)) = SSE(\(X_{1}\), \(X_{2}\)) – SSE(\(X_{1}\), \(X_{2}\), \(X_{3}\)) = 13322 – 13322 = 0
    SSR(\(X_{3}\)|\(X_{1}\), \(X_{2}\)) = SSR(\(X_{1}\), \(X_{2}\), \(X_{3}\)) – SSR(\(X_{1}\), \(X_{2}\)) = 5573 – 5573 = 0

  7. All of the sequential sums of squares we considered so far are "one-degree-of-freedom sequential sums of squares," because we have only considered the effect of adding one additional predictor variable to a model. We could, however, quantify the effect of adding two additional predictor variables to a model. For example, we might want to know the effect of adding \(X_{2}\) and \(X_{3}\) to a model that already contains \(X_{1}\) as a predictor. The sequential sum of squares SSR(\(X_{2}\),\(X_{3}\)|\(X_{1}\)) quantifies this effect. It is a "two-degree-of-freedom sequential sum of squares," because it quantifies the effect of adding two additional predictor variables to the model. One-degree-of-freedom sequential sums of squares are used in testing one slope parameter, such as \(H_{0}\) : \(\beta_{1}\) = 0, whereas two-degree-of-freedom sequential sums of squares are used in testing two slope parameters, such as \(H_{0}\) : \(\beta_{1}\) = \(\beta_{2}\) = 0.
    • Use either of the two definitions to calculate SSR(\(X_{2}\),\(X_{3}\)|\(X_{1}\)). That is, calculate SSR(\(X_{2}\),\(X_{3}\)|\(X_{1}\)) by SSR(\(X_{1}\),\(X_{2}\),\(X_{3}\)) – SSR(\(X_{1}\)) or by SSE(\(X_{1}\)) – SSE(\(X_{1}\),\(X_{2}\),\(X_{3}\)).
    • Calculate SSR(\(X_{2}\),\(X_{3}\)|\(X_{1}\)) by adding the proper one-degree-of-freedom sequential sums of squares, that is, SSR(\(X_{2}\)|\(X_{1}\)) + SSR(\(X_{3}\)|\(X_{1}\),\(X_{2}\)). Do you get the same answer?

    SSR(\(X_{2}\), \(X_{3}\)|\(X_{1}\)) = SSR(\(X_{1}\), \(X_{2}\), \(X_{3}\)) – SSR(\(X_{1}\)) = 5573 – 2697 = 2876
    SSR(\(X_{2}\), \(X_{3}\)|\(X_{1}\)) = SSE(\(X_{1}\)) – SSE(\(X_{1}\), \(X_{2}\), \(X_{3}\)) = 16197 – 13322 = 2875
    SSR(\(X_{2}\), \(X_{3}\)|\(X_{1}\)) = SSR(\(X_{2}\)|\(X_{1}\)) + SSR(\(X_{3}\)|\(X_{1}\), \(X_{2}\)) = 2876 + 0 = 2876

    There are two ways of obtaining two-degree-of-freedom sequential sums of squares: by the original definition of a sequential sum of squares, or by adding the proper one-degree-of-freedom sequential sums of squares.

    Incidentally, you can use the same concepts to get three-degree-of-freedom sequential sums of squares, four-degree-of-freedom sequential sums of squares, and so on.

  8. Regression sums of squares can be decomposed into a sum of sequential sums of squares. We can use a decomposition to quantify how important a predictor variable is ("marginally") in reducing the variability in the response (in the presence of the other variables in the model).
    • Fit the linear regression model with y = PIQ and (in order) \(x_{1}\) = brain and \(x_{2}\) = height. Verify that the regression sum of squares obtained, SSR(\(X_{1}\),\(X_{2}\)), is the sum of the two sequential sums of squares SSR(\(X_{1}\)) and SSR(\(X_{2}\)|\(X_{1}\)). This illustrates how SSR(\(X_{1}\),\(X_{2}\)) is "decomposed" into a sum of sequential sums of squares.
    • A regression sum of squares can be decomposed in more than one way. To see this, fit the linear regression model with y = PIQ and (in order) \(x_{2}\) = height and \(x_{1}\) = brain. Verify that the regression sum of squares obtained, SSR(\(X_{1}\),\(X_{2}\)), is now the sum of the two sequential sums of squares SSR(\(X_{2}\)) and SSR(\(X_{1}\)|\(X_{2}\)). That is, we've now decomposed SSR(\(X_{1}\),\(X_{2}\)) in a different way.

    SSR(\(X_{1}\), \(X_{2}\)) = SSR(\(X_{1}\)) + SSR(\(X_{2}\)|\(X_{1}\)) = 2697 + 2876 = 5573
    SSR(\(X_{1}\), \(X_{2}\)) = SSR(\(X_{2}\)) + SSR(\(X_{1}\)|\(X_{2}\)) = 164 + 5409 = 5573
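The answers above can be cross-checked with a short Python sketch. It uses only the rounded values reported in the answers to these problems, so agreement is expected only to within one unit of round-off; the dictionary keys and variable names are made up for the illustration.

```python
# Rounded values reported in the answers above (IQ Size data)
ssto = 18895
sse = {"x1": 16197, "x2": 18731, "x1,x2": 13322}
ssr = {"x1": 2697, "x2": 164, "x1,x2": 5573}

# SSTO = SSR + SSE for every model, up to rounding of the reported integers
gaps = {model: abs(ssr[model] + sse[model] - ssto) for model in sse}
print(gaps)

# The two decompositions of SSR(x1, x2), one per order of entry
x2_given_x1 = ssr["x1,x2"] - ssr["x1"]   # SSR(x2 | x1) = 2876
x1_given_x2 = ssr["x1,x2"] - ssr["x2"]   # SSR(x1 | x2) = 5409

# Share of SSR(x1, x2) credited to brain when entered first versus last
share_brain_first = ssr["x1"] / ssr["x1,x2"]
share_brain_last = x1_given_x2 / ssr["x1,x2"]
print(x2_given_x1, x1_given_x2, round(share_brain_first, 2), round(share_brain_last, 2))
```

In this data set brain receives more credit when entered last than when entered first, the opposite of what happened with Vocab in the ACL example, so the direction of the order effect depends on the data.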