8.2.4 - Hypothesis Test for the Population Slope

As mentioned, the test for the slope follows the logic of a one-sample hypothesis test for the mean. Typically, and throughout this course, we test the null hypothesis that the slope is equal to zero. However, it is also possible to test the null hypothesis that the slope is zero or less than zero, or the null hypothesis that the slope is zero or greater than zero.

 

Research Question      | Is there a linear relationship? | Is there a positive linear relationship? | Is there a negative linear relationship?
Null Hypothesis        | \(\beta_1=0\)                   | \(\beta_1=0\)                            | \(\beta_1=0\)
Alternative Hypothesis | \(\beta_1\ne0\)                 | \(\beta_1>0\)                            | \(\beta_1<0\)
Type of Test           | Two-tailed, non-directional     | Right-tailed, directional                | Left-tailed, directional

The test statistic for the test of population slope is:

\(t^*=\dfrac{\hat{\beta}_1}{\hat{SE}(\hat{\beta}_1)}\)

where \(\hat{SE}(\hat{\beta}_1)\) is the estimated standard error of the sample slope (found in Minitab output). Under the null hypothesis and with the assumptions shown in the previous section, \(t^*\) follows a \(t\)-distribution with \(n-2\) degrees of freedom.

Take another look at the output from Bob’s data.

Coefficients

Predictor      | Coef   | SE Coef | T-Value | P-Value | VIF
Constant       | 49.542 | 0.560   | 88.40   | 0.000   |
Critical Areas | 10.417 | 0.115   | 90.92   | 0.000   | 1.00

Regression Equation

Cost = 49.542 + 10.417 Critical Areas

Here we can see that the “T-Value” for Critical Areas is 90.92, a very large t value, indicating that the slope calculated by the least-squares method (10.417) is very far from the null value for the slope (zero) relative to its standard error. If the null hypothesis were true, a t value this extreme would be very unlikely (the P-Value is less than 0.05), so Bob can reject the null hypothesis and conclude that the slope is not zero. Therefore, the number of critical areas significantly predicts the cost of development.

He can be more specific and conclude that for every one-unit increase in critical areas, the predicted cost of development increases by 10.417, on average.
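As a quick check on the test statistic formula, the short Python sketch below divides the slope estimate by its standard error, using the rounded values printed in the Coefficients table. The variable names are ours, not Minitab's. The result is close to, but not exactly, the 90.92 that Minitab reports, most likely because Minitab works with unrounded values internally.

```python
# Rounded values copied from the "Critical Areas" row of the output
coef = 10.417      # sample slope (Coef)
se_coef = 0.115    # estimated standard error of the slope (SE Coef)

# Test statistic for H0: beta_1 = 0 is the slope divided by its standard error
t_star = coef / se_coef

print(round(t_star, 2))  # 90.58, versus 90.92 in the Minitab output
```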

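Note! In this class, we will have Minitab perform the calculations for this test. Minitab's output gives the results of two-tailed tests for \(\beta_1\) and \(\beta_0\). If you wish to perform a one-sided test, you have to adjust the p-value Minitab provides: when the sample slope falls in the direction stated by the alternative hypothesis, the one-sided p-value is half of the two-tailed p-value.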
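The adjustment for a one-sided test can be sketched in a few lines of Python. The function name `one_sided_p` and the example numbers are hypothetical and are not taken from Bob's output. The rule used is the standard one: halve the two-tailed p-value when the t value points in the direction of the alternative, and otherwise use one minus that half.

```python
def one_sided_p(two_sided_p, t_value, alternative):
    """Convert a two-tailed p-value for the slope into a one-sided p-value.

    alternative: "greater" for H_a: beta_1 > 0, "less" for H_a: beta_1 < 0.
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    half = two_sided_p / 2
    # Does the sample slope (sign of t) agree with the alternative?
    if alternative == "greater":
        in_direction = t_value > 0
    else:
        in_direction = t_value < 0
    return half if in_direction else 1 - half


# Hypothetical example: two-tailed p = 0.04 with a positive t value
print(one_sided_p(0.04, 2.1, "greater"))  # 0.02
print(one_sided_p(0.04, 2.1, "less"))     # 0.98
```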

As with most of our calculations, we need to allow some room for imprecision in our estimate. We return to the concept of confidence intervals to build in some error around the estimate of the slope.

The \( (1-\alpha)100\%\) confidence interval for \(\beta_1\) is:

\(\hat{\beta}_1\pm t_{\alpha/2}\left(\hat{SE}(\hat{\beta}_1)\right)\)

where \(t\) has \(n-2\) degrees of freedom.

Note! The degrees of freedom of t depends on the number of independent variables. The degrees of freedom is \(n - 2\) when there is only one independent variable.
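To see how the interval formula works, here is a small Python sketch that applies it to the slope and standard error from Bob's output. The sample size is not shown in this section, so the critical value below is an assumption: it is the table value \(t_{0.025}=2.069\) for a hypothetical \(n=25\) (23 degrees of freedom). With a different sample size, a different critical value would be needed.

```python
def slope_confidence_interval(slope, se_slope, t_crit):
    """Return (lower, upper) for slope +/- t_crit * SE(slope)."""
    margin = t_crit * se_slope
    return slope - margin, slope + margin


# Slope and SE from the Minitab output; t_crit assumes a hypothetical n = 25
lower, upper = slope_confidence_interval(10.417, 0.115, t_crit=2.069)
print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
```

Under that assumed sample size, the interval runs from about 10.18 to 10.65. It does not contain zero, which agrees with the result of the hypothesis test.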

The final piece of output from Minitab is the Least Squares Regression Equation. Remember that Bob is interested in being able to predict the development cost of land given the number of critical areas. Bob can use the equation to do this.

If a given piece of land has 10 critical areas, Bob can “plug in” the value of 10 for X in the regression equation:

\(Cost = 49.542 + 10.417 * 10\)

This results in a predicted cost of:

\(153.712 = 49.542 + 10.417 * 10\)

So, if Bob knows a piece of land has 10 critical areas, he can predict the development cost will be about 153.71 dollars!
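The same plug-in calculation can be written as a small Python function. The function name `predict_cost` is ours; the intercept and slope are the rounded coefficients from the regression equation.

```python
def predict_cost(critical_areas, intercept=49.542, slope=10.417):
    """Predicted development cost from the fitted regression equation."""
    return intercept + slope * critical_areas


print(round(predict_cost(10), 3))  # 153.712
```

Keep in mind that predictions like this are generally only trustworthy for numbers of critical areas within the range of the data used to fit the line.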

Using the 10 critical areas allowed Bob to predict the development cost, but there is an important distinction between predicting an average cost and predicting a specific cost. These are represented by confidence intervals versus prediction intervals for new observations. (Notice that here we are referring to a new observation, as opposed to above, where we used a confidence interval for the estimate of the slope!)

The mean response at a given X value is given by:

\(E(Y)=\beta_0+\beta_1X\)

Inferences about Outcome for New Observation

  • The point estimate for the outcome at \(X = x\) is the value predicted by the regression equation, as calculated above.
  • The interval used to estimate the mean response is called the confidence interval. Minitab calculates this for us.
  • The interval used to estimate (or predict) an individual outcome is called the prediction interval.

For a given x value, the prediction interval and the confidence interval have the same center, but the prediction interval is wider than the confidence interval. That makes good sense, since it is harder to estimate a value for a single subject (for example, a particular piece of land in Bob’s town that may have some unique features) than it is to estimate the average for all pieces of land. Again, Minitab will calculate this interval as well.
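The Python sketch below illustrates the comparison on a tiny made-up data set (it is not Bob's data). It uses the standard textbook formulas for simple linear regression, which this section leaves to Minitab: the confidence interval half-width is \(t \cdot s\sqrt{1/n + (x-\bar{x})^2/S_{xx}}\), and the prediction interval adds 1 under the square root. The names `areas`, `costs`, `fit_line`, and `intervals_at` are hypothetical, and the critical value 3.182 is the table value \(t_{0.025}\) for \(n-2=3\) degrees of freedom.

```python
import math

# Hypothetical toy data, for illustration only
areas = [1, 2, 3, 4, 5]
costs = [60, 71, 80, 92, 101]


def fit_line(x, y):
    """Least-squares fit; returns intercept, slope, s, mean of x, and Sxx."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard error, n - 2 df
    return b0, b1, s, x_bar, sxx


def intervals_at(x_new, x, y, t_crit):
    """Point estimate, confidence interval, and prediction interval at x_new."""
    b0, b1, s, x_bar, sxx = fit_line(x, y)
    n = len(x)
    y_hat = b0 + b1 * x_new
    base = 1 / n + (x_new - x_bar) ** 2 / sxx
    ci_half = t_crit * s * math.sqrt(base)      # mean response
    pi_half = t_crit * s * math.sqrt(1 + base)  # single new observation
    return (y_hat,
            (y_hat - ci_half, y_hat + ci_half),
            (y_hat - pi_half, y_hat + pi_half))


y_hat, ci, pi = intervals_at(3, areas, costs, t_crit=3.182)
print("point estimate:", y_hat)
print("confidence interval:", ci)
print("prediction interval:", pi)
```

Both intervals are centered on the same point estimate, and the extra 1 under the square root is what makes the prediction interval wider.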