At the beginning of this lesson, we translated three different research questions pertaining to the heart attacks in rabbits study (Cool Hearts dataset) into three sets of hypotheses we can test using the general linear F-statistic. The research questions and their corresponding hypotheses are:
 Hypotheses 1
Is the regression model containing at least one predictor useful in predicting the size of the infarct?
 \(H_{0} \colon \beta_{1} = \beta_{2} = \beta_{3} = 0\)
 \(H_{A} \colon\) At least one \(\beta_{j} \ne 0\) (for j = 1, 2, 3)
 Hypotheses 2
Is the size of the infarct significantly (linearly) related to the area of the region at risk?
 \(H_{0} \colon \beta_{1} = 0 \)
 \(H_{A} \colon \beta_{1} \ne 0 \)
 Hypotheses 3
(Primary research question) Is the size of the infarct area significantly (linearly) related to the type of treatment upon controlling for the size of the region at risk for infarction?
 \(H_{0} \colon \beta_{2} = \beta_{3} = 0\)
 \(H_{A} \colon \) At least one \(\beta_{j} \ne 0\) (for j = 2, 3)
Let's test each of the hypotheses now using the general linear F-statistic:
\(F^*=\left(\dfrac{SSE(R)-SSE(F)}{df_R-df_F}\right) \div \left(\dfrac{SSE(F)}{df_F}\right)\)
To calculate the F-statistic for each test, we first determine the error sum of squares for the reduced and full models — SSE(R) and SSE(F), respectively. The number of error degrees of freedom associated with the reduced and full models — \(df_{R}\) and \(df_{F}\), respectively — is the number of observations, n, minus the number of parameters, p, in the model. That is, in general, the number of error degrees of freedom is \(n - p\). We use statistical software, such as Minitab's F-distribution probability calculator, to determine the P-value for each test.
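Although the lesson carries out these calculations in Minitab, the general linear F-statistic is easy to compute in any environment. Here is a minimal illustrative sketch in Python (assuming scipy is available; the numbers plugged in are the SSE and degrees-of-freedom values for the Cool Hearts models used later in this section):

```python
# Sketch of the general linear F-test: compare a reduced model to a full model.
from scipy import stats

def general_linear_F(sse_r, df_r, sse_f, df_f):
    """Return (F*, P-value) for comparing a reduced model against a full model."""
    f_star = ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)
    p_value = stats.f.sf(f_star, df_r - df_f, df_f)  # upper-tail probability
    return f_star, p_value

# Example: reduced model SSE = 0.8793 with 30 df; full model SSE = 0.54491 with 28 df
f_star, p_value = general_linear_F(0.8793, 30, 0.54491, 28)
print(f"F* = {f_star:.2f}, P-value = {p_value:.4f}")
```

The `stats.f.sf` call gives the upper-tail area of the F-distribution directly, which plays the role of Minitab's F-distribution probability calculator here.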
Testing that all slope parameters equal 0
To answer the research question: "Is the regression model containing at least one predictor useful in predicting the size of the infarct?," we test the hypotheses:
 \(H_{0} \colon \beta_{1} = \beta_{2} = \beta_{3} = 0 \)
 \(H_{A} \colon\) At least one \(\beta_{j} \ne 0 \) (for j = 1, 2, 3)
 The full model
The full model is the largest possible model — that is, the model containing all of the possible predictors. In this case, the full model is:
\(y_i=(\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\beta_3x_{i3})+\epsilon_i\)
The error sum of squares for the full model, SSE(F), is just the usual error sum of squares, SSE, that appears in the analysis of variance table. Because there are 4 parameters in the full model, the number of error degrees of freedom associated with the full model is \(df_{F} = n - 4\).
 The reduced model
The reduced model is the model that the null hypothesis describes. Because the null hypothesis sets each of the slope parameters in the full model equal to 0, the reduced model is:
\(y_i=\beta_0+\epsilon_i\)
The reduced model basically suggests that none of the variation in the response y is explained by any of the predictors. Therefore, the error sum of squares for the reduced model, SSE(R), is just the total sum of squares, SSTO, that appears in the analysis of variance table. Because there is only one parameter in the reduced model, the number of error degrees of freedom associated with the reduced model is \(df_{R} = n - 1\).
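The reason SSE(R) = SSTO here is that the least-squares fit of the intercept-only model is just the sample mean, \(\bar{y}\). A tiny numerical sketch (with hypothetical response values, purely for illustration):

```python
# For the intercept-only model, the best-fitting constant is y-bar,
# so the reduced model's SSE equals the total sum of squares, SSTO.
y = [1.0, 2.0, 4.0, 5.0]            # hypothetical response values
ybar = sum(y) / len(y)

def sse(c):
    """SSE when every fitted value is the constant c."""
    return sum((yi - c) ** 2 for yi in y)

ssto = sse(ybar)                    # SSE(R) for the intercept-only fit = SSTO
# y-bar minimizes the SSE over all constants:
assert sse(ybar) <= min(sse(ybar - 0.1), sse(ybar + 0.1))
print(ssto)
```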
 The test
Upon plugging in the above quantities, the general linear F-statistic:
\(F^*=\dfrac{SSE(R)-SSE(F)}{df_R-df_F} \div \dfrac{SSE(F)}{df_F}\)
becomes the usual "overall F-test":
\(F^*=\dfrac{SSR}{3} \div \dfrac{SSE}{n-4}=\dfrac{MSR}{MSE}\)
That is, to test \(H_{0}\) : \(\beta_{1} = \beta_{2} = \beta_{3} = 0 \), we just use the overall F-test and P-value reported in the analysis of variance table:
Analysis of Variance
Source      DF  Adj SS   Adj MS   F-Value  P-Value
Regression   3  0.95927  0.31976  16.43    0.000
Area         1  0.63742  0.63742  32.75    0.000
X2           1  0.29733  0.29733  15.28    0.001
X3           1  0.01981  0.01981   1.02    0.322
Error       28  0.54491  0.01946
Total       31  1.50418

Regression Equation

Inf = -0.135 + 0.613 Area - 0.2435 X2 - 0.0657 X3
There is sufficient evidence (F = 16.43, P < 0.001) to conclude that at least one of the slope parameters is not equal to 0.
In general, to test that all of the slope parameters in a multiple linear regression model are 0, we use the overall F-test reported in the analysis of variance table.
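The overall F-statistic in the table above can be reproduced from the sums of squares alone. A short illustrative sketch (assuming scipy for the F-distribution tail probability):

```python
from scipy import stats

ssr, sse = 0.95927, 0.54491   # regression and error sums of squares from the ANOVA table
n, p = 32, 4                  # 32 rabbits, 4 parameters in the full model

msr = ssr / (p - 1)           # MSR = SSR / 3
mse = sse / (n - p)           # MSE = SSE / 28
f_star = msr / mse            # the overall F-statistic, about 16.43
p_value = stats.f.sf(f_star, p - 1, n - p)
print(f"F* = {f_star:.2f}, P-value = {p_value:.6f}")
```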
Testing that one slope parameter is 0
Now let's answer the second research question: "Is the size of the infarct significantly (linearly) related to the area of the region at risk?" To do so, we test the hypotheses:
 \(H_{0} \colon \beta_{1} = 0 \)
 \(H_{A} \colon \beta_{1} \ne 0 \)
 The full model
Again, the full model is the model containing all of the possible predictors:
\(y_i=(\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\beta_3x_{i3})+\epsilon_i\)
The error sum of squares for the full model, SSE(F), is just the usual error sum of squares, SSE. Alternatively, because the three predictors in the model are \(x_{1}\), \(x_{2}\), and \(x_{3}\), we can denote the error sum of squares as SSE(\(x_{1}\), \(x_{2}\), \(x_{3}\)). Again, because there are 4 parameters in the model, the number of error degrees of freedom associated with the full model is \(df_{F} = n - 4\).
 The reduced model
Because the null hypothesis sets the first slope parameter, \(\beta_{1}\), equal to 0, the reduced model is:
\(y_i=(\beta_0+\beta_2x_{i2}+\beta_3x_{i3})+\epsilon_i\)
Because the two predictors in the model are \(x_{2}\) and \(x_{3}\), we denote the error sum of squares as SSE(\(x_{2}\), \(x_{3}\)). Because there are 3 parameters in the model, the number of error degrees of freedom associated with the reduced model is \(df_{R} = n - 3\).
 The test
The general linear statistic:
\(F^*=\dfrac{SSE(R)-SSE(F)}{df_R-df_F} \div \dfrac{SSE(F)}{df_F}\)
simplifies to:
\(F^*=\dfrac{SSR(x_1 \vert x_2, x_3)}{1}\div \dfrac{SSE(x_1,x_2, x_3)}{n-4}=\dfrac{MSR(x_1 \vert x_2, x_3)}{MSE(x_1,x_2, x_3)}\)
Getting the numbers from the Minitab output:
Analysis of Variance
Source      DF  Adj SS   Adj MS   F-Value  P-Value
Regression   3  0.95927  0.31976  16.43    0.000
Area         1  0.63742  0.63742  32.75    0.000
X2           1  0.29733  0.29733  15.28    0.001
X3           1  0.01981  0.01981   1.02    0.322
Error       28  0.54491  0.01946
Total       31  1.50418

Regression Equation

Inf = -0.135 + 0.613 Area - 0.2435 X2 - 0.0657 X3
we determine that the value of the F-statistic is:
\(F^* = \dfrac{SSR(x_1 \vert x_2, x_3)}{1} \div \dfrac{SSE(x_1, x_2, x_3)}{28} = \dfrac{0.63742}{0.01946}=32.7554\)
The P-value is the probability — if the null hypothesis were true — that we would get an F-statistic larger than 32.7554. Comparing our F-statistic to an F-distribution with 1 numerator degree of freedom and 28 denominator degrees of freedom, Minitab tells us that the probability is close to 1 that we would observe an F-statistic smaller than 32.7554:
F distribution with 1 DF in Numerator and 28 DF in denominator

x        P ( X ≤ x )
32.7554  1.00000

Therefore, the probability that we would get an F-statistic larger than 32.7554 is close to 0. That is, the P-value is < 0.001. There is sufficient evidence (F = 32.8, P < 0.001) to conclude that the size of the infarct is significantly related to the size of the area at risk after the other predictors \(x_{2}\) and \(x_{3}\) have been taken into account.
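The same tail probability can be checked outside Minitab. For example, a quick illustrative sketch with scipy:

```python
from scipy import stats

# Partial F-statistic for testing beta_1 = 0:
# SSR(x1 | x2, x3) / 1 divided by MSE(x1, x2, x3)
f_star = 0.63742 / 0.01946            # about 32.7554
p_value = stats.f.sf(f_star, 1, 28)   # upper-tail area of F(1, 28)
print(f"F* = {f_star:.4f}, P-value = {p_value:.6f}")  # P-value well below 0.001
```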
But wait a second! Have you been wondering why we couldn't just use the slope's t-statistic to test that the slope parameter, \(\beta_{1}\), is 0? We can! Notice that the P-value (P < 0.001) for the t-test (t* = 5.72):
Coefficients
Term       Coef     SE Coef  T-Value  P-Value  VIF
Constant  -0.135    0.104    -1.29    0.206
Area       0.613    0.107     5.72    0.000    1.14
X2        -0.2435   0.0623   -3.91    0.001    1.44
X3        -0.0657   0.0651   -1.01    0.322    1.57

Regression Equation

Inf = -0.135 + 0.613 Area - 0.2435 X2 - 0.0657 X3
is the same as the P-value we obtained for the F-test. This will always be the case when we test that only one slope parameter is 0. That's because of the well-known relationship between a t-statistic and an F-statistic that has one numerator degree of freedom:
\(t_{(n-p)}^{2}=F_{(1, n-p)}\)
For our example, the square of the t-statistic, 5.72, equals our F-statistic (within rounding error). That is:
\(t^{*2}=5.72^2=32.72=F^*\)
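The identity \(t_{(n-p)}^{2}=F_{(1, n-p)}\) also means the two tests give exactly the same P-value. A quick numerical check (illustrative, assuming scipy):

```python
from scipy import stats

t_star, df = 5.72, 28
p_from_t = 2 * stats.t.sf(abs(t_star), df)   # two-sided t-test P-value on t(28)
p_from_f = stats.f.sf(t_star ** 2, 1, df)    # F-test P-value on F(1, 28)
print(p_from_t, p_from_f)                    # the two P-values agree
```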
So what have we learned in all of this discussion about the equivalence of the F-test and the t-test? Before we summarize, compare the output obtained when \(x_{1}\) = Area is entered into the model last:
Coefficients
Term       Coef     SE Coef  T-Value  P-Value  VIF
Constant  -0.135    0.104    -1.29    0.206
X2        -0.2435   0.0623   -3.91    0.001    1.44
X3        -0.0657   0.0651   -1.01    0.322    1.57
Area       0.613    0.107     5.72    0.000    1.14

Regression Equation

Inf = -0.135 - 0.2435 X2 - 0.0657 X3 + 0.613 Area
to the output obtained when \(x_{1}\) = Area is entered into the model first:
Coefficients
Term       Coef     SE Coef  T-Value  P-Value  VIF
Constant  -0.135    0.104    -1.29    0.206
Area       0.613    0.107     5.72    0.000    1.14
X2        -0.2435   0.0623   -3.91    0.001    1.44
X3        -0.0657   0.0651   -1.01    0.322    1.57

Regression Equation

Inf = -0.135 + 0.613 Area - 0.2435 X2 - 0.0657 X3

The t-statistic and P-value are the same regardless of the order in which \(x_{1}\) = Area is entered into the model. That's because — by its equivalence to the F-test — the t-test for one slope parameter adjusts for all of the other predictors included in the model.
 We can use either the F-test or the t-test to test that only one slope parameter is 0. Because the t-test results can be read right off of the Minitab output, it makes sense that it would be the test that we'll use most often.
 But, we have to be careful with our interpretations! The equivalence of the t-test to the F-test has taught us something new about the t-test. The t-test is a test for the marginal significance of the \(x_{1}\) predictor after the other predictors \(x_{2}\) and \(x_{3}\) have been taken into account. It does not test for the significance of the relationship between the response y and the predictor \(x_{1}\) alone.
Testing whether a subset of the slope parameters is 0
Finally, let's answer the third — and primary — research question: "Is the size of the infarct area significantly (linearly) related to the type of treatment upon controlling for the size of the region at risk for infarction?" To do so, we test the hypotheses:
 \(H_{0} \colon \beta_{2} = \beta_{3} = 0 \)
 \(H_{A} \colon\) At least one \(\beta_{j} \ne 0 \) (for j = 2, 3)
 The full model
Again, the full model is the model containing all of the possible predictors:
\(y_i=(\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\beta_3x_{i3})+\epsilon_i\)
The error sum of squares for the full model, SSE(F), is just the usual error sum of squares, SSE. Alternatively, because the three predictors in the model are \(x_{1}\), \(x_{2}\), and \(x_{3}\), we can denote the error sum of squares as SSE(\(x_{1}\), \(x_{2}\), \(x_{3}\)). Again, because there are 4 parameters in the model, the number of error degrees of freedom associated with the full model is \(df_{F} = n - 4\).
 The reduced model
Because the null hypothesis sets the second and third slope parameters, \(\beta_{2}\) and \(\beta_{3}\), equal to 0, the reduced model is:
\(y_i=(\beta_0+\beta_1x_{i1})+\epsilon_i\)
The ANOVA table for the reduced model is:
Analysis of Variance
Source      DF  Adj SS  Adj MS   F-Value  P-Value
Regression   1  0.6249  0.62492  21.32    0.000
Area         1  0.6249  0.62492  21.32    0.000
Error       30  0.8793  0.02931
Total       31  1.5042

Because the only predictor in the model is \(x_{1}\), we denote the error sum of squares as SSE(\(x_{1}\)) = 0.8793. Because there are 2 parameters in the model, the number of error degrees of freedom associated with the reduced model is \(df_{R} = n - 2 = 32 - 2 = 30\).
 The test
The general linear statistic:
\begin{align} F^*&=\dfrac{SSE(R)-SSE(F)}{df_R-df_F} \div\dfrac{SSE(F)}{df_F}\\&=\dfrac{0.8793-0.54491}{30-28} \div\dfrac{0.54491}{28}\\&= \dfrac{0.33439}{2} \div 0.01946\\&=8.59.\end{align}
Alternatively, we can calculate the F-statistic using a partial F-test:
\begin{align}F^*&=\dfrac{SSR(x_2, x_3 \vert x_1)}{2}\div \dfrac{SSE(x_1,x_2, x_3)}{n-4}\\&=\dfrac{MSR(x_2, x_3 \vert x_1)}{MSE(x_1,x_2, x_3)}.\end{align}
To conduct the test, we regress y = InfSize on \(x_{1}\) = Area, \(x_{2}\), and \(x_{3}\) — in order (and with "Sequential sums of squares" selected under "Options"):
Analysis of Variance
Source      DF  Seq SS   Seq MS   F-Value  P-Value
Regression   3  0.95927  0.31976  16.43    0.000
Area         1  0.62492  0.62492  32.11    0.000
X2           1  0.31453  0.31453  16.16    0.001
X3           1  0.01981  0.01981   1.02    0.322
Error       28  0.54491  0.01946
Total       31  1.50418

Coefficients
Term       Coef     SE Coef  T-Value  P-Value  VIF
Constant  -0.135    0.104    -1.29    0.206
Area       0.613    0.107     5.72    0.000    1.14
X2        -0.2435   0.0623   -3.91    0.001    1.44
X3        -0.0657   0.0651   -1.01    0.322    1.57

Regression Equation

Inf = -0.135 + 0.613 Area - 0.2435 X2 - 0.0657 X3
yielding SSR(\(x_{2} \vert x_{1}\)) = 0.31453, SSR(\(x_{3} \vert x_{1}, x_{2}\)) = 0.01981, and MSE = 0.54491/28 = 0.01946. Therefore, the value of the partial F-statistic is:
\begin{align} F^*&=\dfrac{SSR(x_2, x_3 \vert x_1)}{2}\div \dfrac{SSE(x_1,x_2, x_3)}{n-4}\\&=\dfrac{0.31453+0.01981}{2}\div\dfrac{0.54491}{28}\\&= \dfrac{0.33434}{2} \div 0.01946\\&=8.59,\end{align}
which is identical (within round-off error) to the general F-statistic above. The P-value is the probability — if the null hypothesis were true — that we would observe a partial F-statistic more extreme than 8.59. The following Minitab output:
F distribution with 2 DF in Numerator and 28 DF in denominator
x     P ( X ≤ x )
8.59  0.998767

This tells us that the probability of observing an F-statistic smaller than 8.59 is 0.9988. Therefore, the probability of observing an F-statistic larger than 8.59 is 1 - 0.9988 = 0.0012. The P-value is very small. There is sufficient evidence (F = 8.59, P = 0.0012) to conclude that the type of cooling is significantly related to the extent of damage that occurs — after taking into account the size of the region at risk.
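The partial F computation and its P-value can likewise be reproduced from the sequential sums of squares. A short illustrative sketch (assuming scipy):

```python
from scipy import stats

seq_ss_x2, seq_ss_x3 = 0.31453, 0.01981   # sequential SS for X2 and X3, fitted after Area
mse = 0.54491 / 28                        # MSE(x1, x2, x3)

f_star = ((seq_ss_x2 + seq_ss_x3) / 2) / mse   # partial F-statistic, about 8.59
p_value = stats.f.sf(f_star, 2, 28)            # upper-tail area of F(2, 28)
print(f"F* = {f_star:.2f}, P-value = {p_value:.4f}")
```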
Summary of MLR Testing
For the simple linear regression model, there is only one slope parameter about which one can perform hypothesis tests. For the multiple linear regression model, there are three different hypothesis tests for slopes that one could conduct. They are:
 Hypothesis test for testing that all of the slope parameters are 0.
 Hypothesis test for testing that a subset — more than one, but not all — of the slope parameters are 0.
 Hypothesis test for testing that one slope parameter is 0.
We have learned how to perform each of the above three hypothesis tests. Along the way, we also took two detours — one to learn about the "general linear F-test" and one to learn about "sequential sums of squares." As you now know, knowledge of both is necessary for performing the three hypothesis tests.
The F-statistic and associated P-value in the ANOVA table are used for testing whether all of the slope parameters are 0. In most applications, this P-value will be small enough to reject the null hypothesis and conclude that at least one predictor is useful in the model. For example, for the rabbit heart attacks study, the F-statistic is (0.95927/(4–1)) / (0.54491/(32–4)) = 16.43 with P-value 0.000.
To test whether a subset — more than one, but not all — of the slope parameters are 0, there are two equivalent ways to calculate the F-statistic:
 Use the general linear F-test formula by fitting the full model to find SSE(F) and fitting the reduced model to find SSE(R). Then the numerator of the F-statistic is (SSE(R) – SSE(F)) / ( \(df_{R}\) – \(df_{F}\)).
 Alternatively, use the partial F-test formula by fitting only the full model but making sure the relevant predictors are fitted last and "sequential sums of squares" have been selected. Then the numerator of the F-statistic is the sum of the relevant sequential sums of squares divided by the sum of the degrees of freedom for these sequential sums of squares. The denominator of the F-statistic is the mean squared error in the ANOVA table.
For example, for the rabbit heart attacks study, the general linear F-statistic is ((0.8793 – 0.54491) / (30 – 28)) / (0.54491 / 28) = 8.59 with P-value 0.0012. Alternatively, the partial F-statistic for testing the slope parameters for predictors \(x_{2}\) and \(x_{3}\) using sequential sums of squares is ((0.31453 + 0.01981) / 2) / (0.54491 / 28) = 8.59.
To test whether one slope parameter is 0, we can use an F-test as just described. Alternatively, we can use a t-test, which will have an identical P-value since in this case the square of the t-statistic is equal to the F-statistic. For example, for the rabbit heart attacks study, the F-statistic for testing the slope parameter for the Area predictor is (0.63742/1) / (0.54491/(32–4)) = 32.75 with P-value 0.000. Alternatively, the t-statistic for testing the slope parameter for the Area predictor is 0.613 / 0.107 = 5.72 with P-value 0.000, and \(5.72^{2} = 32.72\).
Incidentally, you may be wondering why we can't just do a series of individual t-tests to test whether a subset of the slope parameters are 0. For example, for the rabbit heart attacks study, we could have done the following:
 Fit the model of y = InfSize on \(x_{1}\) = Area and \(x_{2}\) and \(x_{3}\) and use an individual t-test for \(x_{3}\).
 If the test results indicate that we can drop \(x_{3}\), then fit the model of y = InfSize on \(x_{1}\) = Area and \(x_{2}\) and use an individual t-test for \(x_{2}\).
The problem with this approach is that we're using two individual t-tests instead of one F-test, which means our chance of drawing an incorrect conclusion in our testing procedure is higher. Every time we do a hypothesis test, we can draw an incorrect conclusion by:
 rejecting a true null hypothesis, i.e., making a type I error by concluding the tested predictor(s) should be retained in the model when in truth it/they should be dropped; or
 failing to reject a false null hypothesis, i.e., making a type II error by concluding the tested predictor(s) should be dropped from the model when in truth it/they should be retained.
Thus, in general, the fewer tests we perform, the better. In this case, this means that, wherever possible, using one F-test in place of multiple individual t-tests is preferable.
Try it!
Hypothesis tests for the slope parameters
The problems in this section are designed to review the hypothesis tests for the slope parameters, as well as to give you some practice on models with a three-group qualitative variable (which we'll cover in more detail in Lesson 8). We consider tests for:
 whether one slope parameter is 0 (for example, \(H_{0} \colon \beta_{1} = 0 \))
 whether a subset (more than one but less than all) of the slope parameters are 0 (for example, \(H_{0} \colon \beta_{2} = \beta_{3} = 0 \) against the alternative \(H_{A} \colon \beta_{2} \ne 0 \) or \(\beta_{3} \ne 0 \) or both ≠ 0)
 whether all of the slope parameters are 0 (for example, \(H_{0} \colon \beta_{1} = \beta_{2} = \beta_{3}\) = 0 against the alternative \(H_{A} \colon \) at least one of the \(\beta_{i}\) is not 0)
(Note the correct specification of the alternative hypotheses for the last two situations.)
Sugar beets study
A group of researchers was interested in studying the effects of three different growth regulators (treat, denoted 1, 2, and 3) on the yield of sugar beets (y = yield, in pounds). They planned to plant the beets in 30 different plots and then to randomly treat 10 plots with the first growth regulator, 10 plots with the second growth regulator, and 10 plots with the third growth regulator. One problem, though, is that the amount of available nitrogen in the 30 different plots varies naturally, thereby giving a potentially unfair advantage to plots with higher levels of available nitrogen. Therefore, the researchers also measured and recorded the available nitrogen (\(x_{1}\) = nit, in pounds/acre) in each plot. They are interested in comparing the mean yields of sugar beets subjected to the different growth regulators after taking into account the available nitrogen. The Sugar Beets dataset contains the data from the researchers' experiment.

Preliminary Work

Create a scatter plot with y = yield on the y-axis and x = nit on the x-axis — in doing so, use the qualitative ("grouping") variable treat to denote whether each plot received the first, second, or third growth regulator. Does the plot suggest that it is reasonable to formulate a multiple regression model that would place three parallel lines through the data?
The plot shows a similar positive linear trend within each treatment category, which suggests that it is reasonable to formulate a multiple regression model that would place three parallel lines through the data.

Because the qualitative variable treat distinguishes between the three treatment groups (1, 2, and 3), we need to create two indicator variables, \(x_{2}\) and \(x_{3}\), say, in order to fit a linear regression model to these data. The new indicator variables should be defined as follows:
treat  \(x_2\)  \(x_3\)
1      1        0
2      0        1
3      0        0

Use Minitab's Calc >> Make Indicator Variables command to create the new indicator variables in your worksheet.
Minitab creates an indicator variable for each treatment group, but we can use only two of them — here, those for treatment groups 1 and 2 (treatment group 3 is the reference level).

Then, if we assume the trend in the data can be summarized by this regression model:
\(y_{i} = \beta_{0} + \beta_{1}x_{i1} + \beta_{2}x_{i2} + \beta_{3}x_{i3} + \epsilon_{i}\)
where \(x_{1}\) = nit and \(x_{2}\) and \(x_{3}\) are defined as above, what is the mean response function for plots receiving treatment 3? for plots receiving treatment 1? for plots receiving treatment 2? Are the three regression lines that arise from our formulated model parallel? What does the parameter \(\beta_{2}\) quantify? And, what does the parameter \(\beta_{3}\) quantify?
The fitted equation from Minitab is Yield = 84.99 + 1.3088 Nit - 2.43 \(x_{2}\) - 2.35 \(x_{3}\), which means that the equations for each treatment group are:
 Group 1: Yield = 84.99 + 1.3088 Nit - 2.43(1) = 82.56 + 1.3088 Nit
 Group 2: Yield = 84.99 + 1.3088 Nit - 2.35(1) = 82.64 + 1.3088 Nit
 Group 3: Yield = 84.99 + 1.3088 Nit
The three estimated regression lines are parallel since they have the same slope, 1.3088.
The regression parameter for \(x_{2}\) represents the difference between the estimated intercept for treatment 1 and the estimated intercept for the reference treatment 3.
The regression parameter for \(x_{3}\) represents the difference between the estimated intercept for treatment 2 and the estimated intercept for the reference treatment 3.
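The indicator coding in the table above can also be built directly in code. A minimal sketch (the treat values below are hypothetical and simply mirror the coding; pandas is an assumption here, not part of the lesson's Minitab workflow):

```python
import pandas as pd

# A few hypothetical treat values; treatment 3 serves as the reference level
treat = pd.Series([1, 2, 3, 1, 2, 3])

x2 = (treat == 1).astype(int)  # 1 when treat = 1, else 0
x3 = (treat == 2).astype(int)  # 1 when treat = 2, else 0
print(pd.DataFrame({"treat": treat, "x2": x2, "x3": x3}))
```

With this coding, the intercept for treatment 3 is \(\beta_{0}\), while treatments 1 and 2 shift the intercept by \(\beta_{2}\) and \(\beta_{3}\), respectively.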


Testing whether all of the slope parameters are 0

The researchers are interested in answering the following research question: "Is the regression model containing at least one predictor useful in predicting the size of sugar beet yield?" To answer this research question, how should the researchers specify their null and alternative hypotheses?
\(H_0 \colon \beta_1 = \beta_2 = \beta_3 = 0\) against the alternative \(H_A \colon \) at least one of the \(\beta_i\) is not 0.

Fit the linear regression model with y = yield and \(x_{1}\) = nit and \(x_{2}\) and \(x_{3}\) as predictors. To test \(H_{0} \colon \beta_{1} = \beta_{2} = \beta_{3} = 0 \), we can use the "overall F" test, constructed as:
\(F=\dfrac{SSR(X_1,X_2,X_3)\div3}{SSE(X_1,X_2,X_3)\div(n-4)}=\dfrac{MSR(X_1,X_2,X_3)}{MSE(X_1,X_2,X_3)}\)
\(F = \dfrac{\frac{16039.5}{3}}{\frac{1078.0}{30-4}} = \dfrac{5346.5}{41.46} = 128.95\)

This is just the F-test and associated P-value reported in the analysis of variance table. Make a decision for the researchers at the \(\alpha = 0.05\) level.
Since the P-value for this F-statistic is reported as 0.000, we reject \(H_{0}\) in favor of \(H_{A}\) and conclude that at least one of the slope parameters is not zero, i.e., the regression model containing at least one predictor is useful in predicting the size of sugar beet yield.


Tests for whether one slope parameter is 0

The researchers are interested in answering the following research question: "Is sugar beet yield significantly linearly related to the available nitrogen?" (The answer to this question will educate the researchers on whether they need to worry about differences in nitrogen in future experiments.) To answer this research question, how should the researchers specify their null and alternative hypotheses?
\(H_0 \colon \beta_1= 0\) against the alternative \(H_A \colon \beta_1 \ne 0\)

Fit the linear regression model with y = yield and (in order) \(x_{2}\) and \(x_{3}\) and \(x_{1}\) = nit as predictors. (In Minitab click "Model" and use the arrows to reorder the "Terms in the model." Also click "Options" and select "Sequential (Type I)" for "Sum of squares for tests.") To test \(H_{0} \colon \beta_{1} = 0 \), we know that we can use the t-test that Minitab displays as a default. What is the value of the t-statistic and its associated P-value? What does this P-value tell the scientists about nitrogen?
t-statistic = 19.60, P-value = 0.000, so we reject \(H_{0}\) in favor of \(H_{A}\) and conclude that the slope parameter for \(x_{1}\) = nit is not zero, i.e., sugar beet yield is significantly linearly related to the available nitrogen (controlling for treatment).

Alternatively, note that we can use a "partial F" test, constructed as:
\(F=\dfrac{SSR(X_1 \vert X_2,X_3)\div1}{SSE(X_1,X_2,X_3)\div(n-4)}=\dfrac{MSR(X_1 \vert X_2,X_3)}{MSE(X_1,X_2,X_3)}\)
Use the Minitab output to calculate the value of this F-statistic. Does the value you obtain equal \(t^{2}\), the square of the t-statistic, as we might expect?
\(F = \dfrac{\frac{15934.5}{1}}{\frac{1078.0}{30-4}} = \dfrac{15934.5}{41.46} = 384.32\), which is the same as \(19.60^{2}\) (within rounding error).
Because \(t^{2}\) will equal the partial F-statistic whenever you test whether one slope parameter is 0, it makes sense to just use the t-statistic and P-value that Minitab displays as a default. But note that we've just learned something new about the meaning of the t-test in the multiple regression setting: it tests for the ("marginal") significance of the \(x_{1}\) predictor after \(x_{2}\) and \(x_{3}\) have already been taken into account.


Tests for whether a subset of the slope parameters are 0

The researchers are interested in answering the following research question: "Is there a significant difference in the mean yields of sugar beets subjected to the different growth regulators after taking into account the available nitrogen?" To answer this research question, how should the researchers specify their null and alternative hypotheses?
\(H_0 \colon \beta_2=\beta_3= 0\) against the alternative \(H_A \colon \beta_2 \ne 0\) or \(\beta_3 \ne 0\) or both \(\ne 0\).

Fit the linear regression model with y = yield and (in order) \(x_{1}\) = nit and \(x_{2}\) and \(x_{3}\) as predictors. To test \(H_{0} \colon \beta_{2} = \beta_{3} = 0 \), we can use a "partial F" test, constructed as:
\(F=\dfrac{SSR(X_2,X_3 \vert X_1)\div2}{SSE(X_1,X_2,X_3)\div(n-4)}=\dfrac{MSR(X_2,X_3 \vert X_1)}{MSE(X_1,X_2,X_3)}\)
Note that the sequential mean square due to regression, MSR(\(X_{2}\), \(X_{3}\) \(\vert\) \(X_{1}\)), is obtained by dividing the sequential sum of squares by its degrees of freedom (2, in this case, since two additional predictors, \(X_{2}\) and \(X_{3}\), are considered). Use the Minitab output to calculate the value of this F-statistic, and use Minitab to get the associated P-value. Answer the researchers' question at the \(\alpha = 0.05\) level.

\(F = \dfrac{\frac{10.4+27.5}{2}}{\frac{1078.0}{30-4}} = \dfrac{18.95}{41.46} = 0.46\)

F distribution with 2 DF in Numerator and 26 DF in denominator

x     P ( X ≤ x )
0.46  0.363677

P-value \(= 1 - 0.363677 = 0.636\), so we fail to reject \(H_{0}\) in favor of \(H_{A}\) and conclude that we cannot rule out \(\beta_2 = \beta_3 = 0\), i.e., there is no significant difference in the mean yields of sugar beets subjected to the different growth regulators after taking into account the available nitrogen.
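As a check, the partial F-statistic and its P-value for this sugar beets test can be reproduced with a few lines of Python (illustrative, assuming scipy; the sequential sums of squares 10.4 and 27.5 are the values from the Minitab output described above):

```python
from scipy import stats

mse = 1078.0 / 26                     # MSE(X1, X2, X3), with n - 4 = 30 - 4 = 26 df
f_star = ((10.4 + 27.5) / 2) / mse    # partial F-statistic, about 0.46
p_value = stats.f.sf(f_star, 2, 26)   # upper-tail area of F(2, 26), about 0.64
print(f"F* = {f_star:.2f}, P-value = {p_value:.3f}")
```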
