# Common Procedures in Minitab

# Calculate a T-Interval for a Population Mean


## Procedure

A *t*-interval for a population mean provides an interval of estimates of the unknown population mean μ.

- Select **Stat** > **Basic Statistics** > **1 Sample t...**
- Use the pull-down options to select '**Samples in columns**'.
- Select the variable you want to analyze (by double-clicking, or by highlighting and clicking once on '**Select**') so it appears in the box labeled '**Variables**'.
- Select '**Options...**'. Type the desired confidence level (the default is 95.0) in the box labeled '**Confidence level**'. (Ignore the box labeled '**Alternative**'.)
- Select **OK**.
- Select **OK**. The output will appear in the session window.

## Example

The US National Research Council currently recommends that females between the ages of 11 and 50 consume 15 milligrams of iron daily. The iron intakes of a random sample of 25 such American females are found in the dataset irondef.txt. With 95% confidence, what is the mean iron intake of all such American females?

## Sample Minitab Output

### One-Sample T: iron

Variable | N | Mean | StDev | SE Mean | 95% CI |
---|---|---|---|---|---|
iron | 25 | 14.300 | 2.367 | 0.473 | (13.323, 15.277) |
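
The interval in the output can be checked by hand from the summary statistics. The sketch below is illustrative Python, not Minitab code. It assumes the usual formula, sample mean ± t × s/√n, and takes the critical value for 24 degrees of freedom from a standard t table instead of computing it.

```python
import math

# Summary statistics copied from the Minitab output above
n = 25
mean = 14.300
stdev = 2.367

# Assumption: 97.5th percentile of the t distribution with n - 1 = 24
# degrees of freedom, read from a standard t table.
t_crit = 2.0639

se_mean = stdev / math.sqrt(n)   # standard error of the mean
margin = t_crit * se_mean        # half-width of the 95% interval
lower, upper = mean - margin, mean + margin

print(f"SE Mean = {se_mean:.3f}")
print(f"95% CI  = ({lower:.3f}, {upper:.3f})")
```

Because the inputs are rounded to three decimals, the result should agree with Minitab's interval only to about that precision.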

# Code Numeric to Numeric Data


## Procedure

Minitab can be used to translate or "code" a column of numbers into another column of numbers. The procedure is particularly useful for creating dummy indicator variables for the qualitative predictor variables that you'd like to include in your regression model.

- In Minitab, select **Data** >> **Recode** >> **to Numeric...**
- In the box labeled **Recode values in the following columns**, specify the name of the numeric variable that you want to code.
- In the box labeled **Method**, specify a method for recoding the values specified above.
- For instance, to recode a range of values, type the numeric values in the boxes labeled **Lower endpoint** and **Upper endpoint** and the **Recoded value** that you want this range to represent. Make sure you do this for every possible value of the variable that you want to code.
- Select **OK**. The new variable should appear in your worksheet.

**Note:** if you have more than one numeric variable to create, you have to code each one separately.

## Example

*Sports Illustrated* published results of a study designed to determine how well professional golfers putt. The data set puttgolf.txt contains data on the lengths of putts (*x*) and the percentage of successful putts (*y*) made by professional golfers during 15 tournaments. Only putts that were 2 to 20 feet from the hole are included in the data set. When fitting a two-piece piecewise linear regression function (connected at *x* = 10) to the data, you have to create a new numeric dummy variable, say "*dummy*", that takes on value 0 if *x* ≤ 10 and 1 if *x* > 10. Use Minitab to code the numeric variable *length* into the numeric variable *dummy*.
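
The coding rule in this example can be sketched outside Minitab as follows. This is illustrative Python, and the putt lengths are hypothetical values chosen to cover the 2 to 20 foot range, not the contents of puttgolf.txt.

```python
def code_length(length, cutoff=10):
    """Return 0 if length <= cutoff and 1 if length > cutoff."""
    return 0 if length <= cutoff else 1

# Hypothetical putt lengths in feet (illustration only)
lengths = [2, 5, 10, 11, 15, 20]

# One recoded value per original value, like the new worksheet column
dummy = [code_length(x) for x in lengths]

for x, d in zip(lengths, dummy):
    print(x, d)
```

Note that the boundary value *x* = 10 is coded 0, matching the rule *x* ≤ 10.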


# Code a Text Variable into a Numeric Variable


## Procedure

Minitab can be used to translate or "code" a column of text values into another column of numeric values.

- In Minitab, select **Data** >> **Recode** >> **to Numeric...**
- In the box labeled **Recode values in the following columns**, specify the name of the text variable that you want to code.
- Under '**Method**', select the option '**Recode individual values**'.
- For each text value of the variable that you want to code, type the corresponding numeric value in the box labeled **Recoded value**. Make sure you do this for every possible value of the text variable that you want to code.
- Select **OK**. The new numeric variable should appear in your worksheet. You can rename the column in your worksheet with a more descriptive label if you want.

**Note:** if you have more than one text variable to create, you have to code each one separately.

## Example

The data set birthsmokers2.txt contains data on the birthweight (*y* = *Wgt*), gestation length (*x*_{1} = Gest) and mother's smoking status (*x*_{2} = Smokes, yes or no) of babies born to 32 mothers. If you wanted to fit a multiple regression model that included smoking status, you'd first have to create a numeric variable in your worksheet, *dummy* say, that equals 1 if Smokes = yes and equals 0 if Smokes = no. Create the *dummy* variable in your worksheet.
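
As a rough illustration of the same recoding outside Minitab, the Python sketch below maps each text value to a number. The `smokes` list is a hypothetical column, not the contents of birthsmokers2.txt.

```python
# Recoding rule from the example: yes -> 1, no -> 0
codes = {"yes": 1, "no": 0}

# Hypothetical Smokes column (illustration only)
smokes = ["yes", "no", "no", "yes"]

# A KeyError here would flag a text value missing from the recoding rule,
# which mirrors the advice to code every possible value.
dummy = [codes[value] for value in smokes]

print(dummy)
```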


# Conduct Best Subsets Regression

## Procedure

- Select **Stat** >> **Regression** >> **Best Subsets...**
- In the box labeled **Response**, specify the response.
- In the box labeled **Free predictors**, specify the predictors that you want considered for the model. (Do not include predictors that you specify in the following **Predictors in all models** box.)
- (Optional) In the box labeled **Predictors in all models**, specify all of the predictors that must be included in every model considered.
- Select **OK**. The output will appear in the session window.

## Example

Researchers were interested in learning how the composition of cement affected the heat evolved during the hardening of the cement. Therefore, they measured and recorded the following data (cement.txt) on 13 batches of cement:

- Response *y*: heat evolved in calories during hardening of cement on a per gram basis
- Predictor *x*_{1}: % of tricalcium aluminate
- Predictor *x*_{2}: % of tricalcium silicate
- Predictor *x*_{3}: % of tetracalcium alumino ferrite
- Predictor *x*_{4}: % of dicalcium silicate

Perform a best subsets regression. In doing so, require that the predictor *x*_{2} be included in all models considered.

### Sample Output

### Best Subsets Regression: y versus x1, x3, x4, x2

Response is y

The following variables are included in all models: x2

Vars | R-Sq | R-Sq(adj) | Mallows Cp | S | x1 | x3 | x4 |
---|---|---|---|---|---|---|---|
1 | 97.9 | 97.4 | 2.7 | 2.4063 | X | | |
1 | 84.7 | 81.6 | 62.4 | 6.4455 | | X | |
2 | 98.2 | 97.6 | 3.0 | 2.3087 | X | | X |
2 | 98.2 | 97.6 | 3.0 | 2.3121 | X | X | |
3 | 98.2 | 97.4 | 5.0 | 2.4460 | X | X | X |

# Conduct Regression Error Normality Tests

## Procedure

If you haven't already done so, store the residuals on which you want to conduct the normality test (for example, the Ryan-Joiner correlation test).

- Select **Stat** > **Regression** > **Regression** > **Fit Regression Model...**
- Specify the response and the predictor variable(s).
- Select **Storage...** Under **Diagnostic Measures**, select the type of residuals (and/or influence measures) that you want to store. Select **OK**.
- Select **OK**. The requested residuals (and/or influence measures) will be stored in your worksheet.

Once Minitab has stored the residuals in your worksheet:

- Select **Stat** > **Basic Statistics** > **Normality Test...**
- In the box labeled **Variable**, specify the name of the variable containing the residuals (Minitab names it something like RESI1, RESI2, ...).
- Under **Tests for Normality**, select **Anderson-Darling**, **Ryan-Joiner**, or **Kolmogorov-Smirnov**.
- Select **OK**. A new graph window containing the requested normal probability plot should appear.

## Example

The data set adaptive.txt contains the Gesell adaptive scores and ages (in months) of *n* = 21 children with cyanotic heart disease. Upon regressing the response *y* = score on the predictor *x* = age, use the resulting residuals to test whether or not the error terms are normally distributed.
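
To make the "store the residuals" step concrete, here is a minimal Python sketch (not Minitab code) that fits the least squares line of score on age and computes the ordinary residuals. The 21 (age, score) pairs are copied from the adaptive.txt listing in the Display Data section of this document. The variable names are illustrative, and the normality test itself is not reproduced here.

```python
# (age, score) pairs copied from the adaptive.txt listing in this document
data = [
    (15, 95), (26, 71), (10, 83), (9, 91), (15, 102), (20, 87), (18, 93),
    (11, 100), (8, 104), (20, 94), (7, 113), (9, 96), (10, 83), (11, 84),
    (11, 102), (10, 100), (12, 105), (42, 57), (17, 121), (11, 86), (10, 100),
]

n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n

# Least squares estimates for the simple linear regression of score on age
sxx = sum((x - mean_x) ** 2 for x, _ in data)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Ordinary residuals: observed score minus fitted score
residuals = [y - (intercept + slope * x) for x, y in data]

print(f"score = {intercept:.3f} + ({slope:.3f}) age")
print([round(e, 2) for e in residuals])
```

These residuals are the values the normality test would then be run on.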


# Conduct Stepwise Regression

## Procedure

- Select **Stat** >> **Regression** >> **Regression** >> **Fit Regression Model...**
- In the box labeled **Response**, specify the response.
- In the box labeled **Continuous Predictors**, specify all the predictors that you want to be considered for the model.
- Click on the **Stepwise** button.
- Choose '**Stepwise**' from among the **Method** pull-down options.
- (Optional) Use the buttons below the box labeled **Potential terms** to indicate the terms to include in every model, that is, the predictors that must be included in every model considered.
- (Optional) Specify the **Alpha to enter** and **Alpha to remove** significance levels. The default for both is 0.15.
- Check the box labeled '**Display the table of model selection details**' and use the pull-down to select '**Include details for each step**'.
- Select **OK**.
- Select **OK**. The output will appear in the session window.

## Example

Researchers were interested in learning how the composition of cement affected the heat evolved during the hardening of the cement. Therefore, they measured and recorded the following data (cement.txt) on 13 batches of cement:

- Response *y*: heat evolved in calories during hardening of cement on a per gram basis
- Predictor *x*_{1}: % of tricalcium aluminate
- Predictor *x*_{2}: % of tricalcium silicate
- Predictor *x*_{3}: % of tetracalcium alumino ferrite
- Predictor *x*_{4}: % of dicalcium silicate

Perform stepwise regression on the data set. Let *α*_{E} = *α*_{R} = 0.15. In doing so, require that the predictor *x*_{2} be included in all models considered.

### Sample Output

### Regression analysis: y versus x1, x2, x3, x4

Stepwise Selection of Terms

Candidate terms: x1, x2, x3, x4

Terms | Step 1 Coef | Step 1 P | Step 2 Coef | Step 2 P |
---|---|---|---|---|
Constant | 57.42 | | 52.58 | |
x2 | 0.789 | 0.001 | 0.6623 | 0.000 |
x1 | | | 1.468 | 0.000 |
S | 9.07713 | | 2.40634 | |
R-sq | 66.63% | | 97.87% | |
R-sq(adj) | 63.59% | | 97.44% | |
R-sq(pred) | 55.74% | | 96.54% | |
Mallows' Cp | 142.49 | | 2.68 | |

\(\alpha\) to enter = 0.15, \(\alpha\) to remove = 0.15

At your request, the stepwise procedure included these terms in every model: x2

# Conduct a Lack of Fit Test


## Procedure

- Select **Stat** >> **Regression** >> **Regression** >> **Fit Regression Model...**
- Specify the response and the predictor(s).
- Minitab automatically recognizes replicates of data and produces the Lack of Fit test with Pure error by default.
- Select **OK**. The output will appear in the session window.

## Example

The data set bluegills.txt contains the lengths (in mm) and ages (in years) of *n* = 78 bluegill fish. Is there sufficient evidence to conclude that there is a lack of linear fit between *y* = *length* and *x* = *age* of bluegill fish?

### Sample Output

### Regression Analysis: length versus age

Analysis of Variance

Source | DF | Adj SS | Adj MS | F-value | P-value |
---|---|---|---|---|---|
Regression | 1 | 32966 | 32965.8 | 210.66 | 0.000 |
age | 1 | 32966 | 32965.8 | 210.66 | 0.000 |
Error | 76 | 11893 | 156.5 | | |
Lack-of-Fit | 4 | 3080 | 770.0 | 6.29 | 0.000 |
Pure Error | 72 | 8813 | 122.4 | | |
Total | 77 | 44859 | | | |
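
The lack-of-fit *F*-statistic in the table can be recomputed from the sums of squares and degrees of freedom shown. The Python sketch below is only a check of the arithmetic, assuming the standard definition F = MS(Lack-of-Fit) / MS(Pure Error). The variable names are illustrative.

```python
# Values copied from the Analysis of Variance table above
ss_error, df_error = 11893, 76
ss_lack_of_fit, df_lack_of_fit = 3080, 4
ss_pure_error, df_pure_error = 8813, 72

# The error sum of squares splits into lack-of-fit and pure-error parts
assert ss_lack_of_fit + ss_pure_error == ss_error
assert df_lack_of_fit + df_pure_error == df_error

ms_lack_of_fit = ss_lack_of_fit / df_lack_of_fit
ms_pure_error = ss_pure_error / df_pure_error
f_stat = ms_lack_of_fit / ms_pure_error

print(f"MS Lack-of-Fit = {ms_lack_of_fit:.1f}")
print(f"MS Pure Error  = {ms_pure_error:.1f}")
print(f"F = {f_stat:.2f}")
```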

# Conducting a Hypothesis Test for the Population Correlation Coefficient ρ

There is one more point we haven't stressed yet in our discussion about the correlation coefficient *r* and the coefficient of determination *r*^{2}: the two measures summarize the strength of a linear relationship *in samples only*. If we obtained a different sample, we would obtain different correlations, different *r*^{2} values, and therefore potentially different conclusions. As always, we want to *draw conclusions about populations*, not just samples. To do so, we either have to conduct a hypothesis test or calculate a confidence interval. In this section, we learn how to conduct a hypothesis test for the population correlation coefficient *ρ* (the Greek letter "rho").

Incidentally, where does this topic fit in among the four regression analysis steps?

- Model formulation
- Model estimation
- Model evaluation
- **Model use**

It's a situation in which we *use the model* to answer a specific research question, namely whether or not a linear relationship exists between two quantitative variables.

In general, a researcher should use the hypothesis test for the population correlation *ρ* to learn of a linear association between two variables, when it isn't obvious which variable should be regarded as the response. Let's clarify this point with examples of two different research questions.

We previously learned that to evaluate whether or not a linear relationship exists between skin cancer mortality and latitude, we can perform either of the following tests:

- *t*-test for testing *H*_{0}: *β*_{1} = 0
- ANOVA *F*-test for testing *H*_{0}: *β*_{1} = 0

That's because it is fairly obvious that latitude should be treated as the predictor variable and skin cancer mortality as the response. But suppose we want to evaluate whether or not a linear relationship exists between a husband's age and his wife's age. In this case, one could treat the husband's age as the response:

`Pearson correlation of HAge and WAge = 0.939`

or one could treat wife's age as the response:

`Pearson correlation of HAge and WAge = 0.939`

In cases such as these, we answer our research question concerning the existence of a linear relationship by using the **t-test for testing the population correlation coefficient**, *H*_{0}: *ρ* = 0.

Let's jump right to it! We follow standard hypothesis test procedures in conducting a hypothesis test for the population correlation coefficient *ρ*. First, we specify the null and alternative hypotheses:

- Null hypothesis *H*_{0}: *ρ* = 0
- Alternative hypothesis *H*_{A}: *ρ* ≠ 0 or *H*_{A}: *ρ* < 0 or *H*_{A}: *ρ* > 0

Second, we calculate the value of the test statistic using the following formula:

Test statistic: \(t^*=\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}\)

Third, we use the resulting test statistic to calculate the *P*-value. As always, the *P*-value is the answer to the question "how likely is it that we'd get a test statistic *t** as extreme as we did if the null hypothesis were true?" The *P*-value is determined by referring to a *t*-distribution with *n* - 2 degrees of freedom.

Finally, we make a decision:

- If the *P*-value is smaller than the significance level α, we reject the null hypothesis in favor of the alternative. We conclude "there is sufficient evidence at the α level to conclude that there is a linear relationship in the population between the predictor *x* and response *y*."
- If the *P*-value is larger than the significance level α, we fail to reject the null hypothesis. We conclude "there is not enough evidence at the α level to conclude that there is a linear relationship in the population between the predictor *x* and response *y*."

Let's perform the hypothesis test on the husband's age and wife's age data in which the sample correlation based on *n* = 170 couples is *r* = 0.939. To test *H*_{0}: *ρ* = 0 against the alternative *H*_{A}: *ρ* ≠ 0, we obtain the following test statistic:

\[t^*=\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}=\frac{0.939\sqrt{170-2}}{\sqrt{1-0.939^2}}=35.39\]

To obtain the *P*-value, we need to compare the test statistic to a *t*-distribution with 168 degrees of freedom (since 170 - 2 = 168). In particular, we need to find the probability that we'd observe a test statistic more extreme than 35.39, and then, since we're conducting a two-sided test, multiply the probability by 2. Minitab helps us out here:

Student's t distribution with 168 DF

x | P ( X <= x ) |
---|---|
35.3900 | 1.0000 |

The output tells us that the probability of getting a test statistic smaller than 35.39 is greater than 0.999. Therefore, the probability of getting a test statistic greater than 35.39 is less than 0.001. We multiply by 2 and determine that the *P*-value is less than 0.002. Since the *P*-value is small (smaller than 0.05, say), we can reject the null hypothesis. There is sufficient statistical evidence at the α = 0.05 level to conclude that there is a significant linear relationship between a husband's age and his wife's age.
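
The test statistic used above is easy to recompute. The following Python sketch simply evaluates the formula for *t** with the reported *r* = 0.939 and *n* = 170. The function name is illustrative and not part of Minitab.

```python
import math

def correlation_t_statistic(r, n):
    """t* = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r, n = 0.939, 170
t_star = correlation_t_statistic(r, n)

print(f"t* = {t_star:.2f} with {n - 2} degrees of freedom")
```

Because *r* is rounded to three decimals, the result matches the hand calculation only to about two decimal places.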

Incidentally, we can let statistical software like Minitab do all of the dirty work for us. In doing so, Minitab reports:

`Pearson correlation of WAge and HAge = 0.939`

`P-Value = 0.000`

It should be noted that the three hypothesis tests we learned for testing the existence of a linear relationship — the *t*-test for *H*_{0}: *β*_{1}= 0, the ANOVA *F*-test for *H*_{0}: *β*_{1}= 0, and the *t*-test for *H*_{0}: *ρ* = 0 — will always yield the same results. For example, if we treat the husband's age ("HAge") as the response and the wife's age ("WAge") as the predictor, each test yields a *P*-value of 0.000... < 0.001:

The regression equation is HAge = 3.59 + 0.967 WAge

170 cases used; 48 cases contain missing values

Predictor | Coef | SE Coef | T | P |
---|---|---|---|---|
Constant | 3.590 | 1.159 | 3.10 | 0.002 |
WAge | 0.96670 | 0.02742 | 35.25 | 0.000 |

S = 4.069, R-Sq = 88.1%, R-Sq(adj) = 88.0%

Analysis of Variance

Source | DF | SS | MS | F | P |
---|---|---|---|---|---|
Regression | 1 | 20577 | 20577 | 1242.51 | 0.000 |
Error | 168 | 2782 | 17 | | |
Total | 169 | 23359 | | | |

Pearson correlation of WAge and HAge = 0.939, P-Value = 0.000

And similarly, if we treat the wife's age ("WAge") as the response and the husband's age ("HAge") as the predictor, each test yields a *P*-value of 0.000... < 0.001:

The regression equation is WAge = 1.57 + 0.911 HAge

170 cases used; 48 cases contain missing values

Predictor | Coef | SE Coef | T | P |
---|---|---|---|---|
Constant | 1.574 | 1.150 | 1.37 | 0.173 |
HAge | 0.91124 | 0.02585 | 35.25 | 0.000 |

S = 3.951, R-Sq = 88.1%, R-Sq(adj) = 88.0%

Analysis of Variance

Source | DF | SS | MS | F | P |
---|---|---|---|---|---|
Regression | 1 | 19396 | 19396 | 1242.51 | 0.000 |
Error | 168 | 2623 | 17 | | |
Total | 169 | 22019 | | | |

Pearson correlation of WAge and HAge = 0.939, P-Value = 0.000

Technically, then, it doesn't matter what test you use to obtain the *P*-value. You will always get the same *P*-value. But, you should report the results of the test that make sense for your particular situation:

- If one of the variables can be clearly identified as the response, report the results of a *t*-test or *F*-test for testing *H*_{0}: *β*_{1} = 0. (Does it make sense to use *x* to predict *y*?)
- If it is not obvious which variable is the response, report that you conducted a *t*-test for testing *H*_{0}: *ρ* = 0. (Does it only make sense to look for an association between *x* and *y*?)

One final note: as always, we should clarify when it is okay to use the *t*-test for testing *H*_{0}: *ρ* = 0. The guidelines are a straightforward extension of the "LINE" assumptions made for the simple linear regression model. It's okay:

- When it is not obvious which variable is the response.
- When the (*x*, *y*) pairs are a random sample from a bivariate normal population.
  - For each *x*, the *y*'s are normal with equal variances.
  - For each *y*, the *x*'s are normal with equal variances.
  - Either *y* can be considered a linear function of *x*,
  - or *x* can be considered a linear function of *y*.
- When the (*x*, *y*) pairs are independent.

# Create Interaction Variables

## Procedure

To enter interaction terms into a regression model in Minitab, you first have to create column(s) in the worksheet that contain the interaction term(s).

- Select **Calc** >> **Calculator...**
- In the box labeled **Store the result in variable**, specify the column (or the name of the new variable, *x1x2*, for example) in which you want to store the interaction term.
- In the box labeled **Expression**, multiply the two predictor variables that go into the interaction term. For example, if you want to create an interaction between *x1* and *x2*, use the calculator to multiply them together: `'x1' * 'x2'`.
- Select **OK**. The new variable, *x1x2*, should appear in your worksheet.

## Example

The data set birthsmokers.txt contains data on the birthweight (*y* = *Wgt*), gestation length (*x*_{1} = *Gest*) and smoking status (*x*_{2} = *Smoke*, 1 if mother smoked, 0 if not) of babies born to 32 mothers. If you wanted to fit a multiple regression model that allowed interaction between gestation length and smoking, you'd first have to create a variable in your worksheet, *GestSmoke* say, that contained the interaction term. Use Minitab's calculator to create the interaction term in your worksheet.
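
An interaction column is just the row-by-row product of the two predictors. The Python sketch below illustrates this with hypothetical values. The `gest` and `smoke` lists are made up for illustration and are not the contents of birthsmokers.txt.

```python
# Hypothetical rows (illustration only)
gest = [38, 40, 36, 41]   # gestation length
smoke = [1, 0, 1, 0]      # 1 if mother smoked, 0 if not

# Row-by-row product, the analogue of the calculator expression 'Gest' * 'Smoke'
gest_smoke = [g * s for g, s in zip(gest, smoke)]

print(gest_smoke)
```

With a 0/1 indicator, the interaction column equals the gestation length for smokers and 0 for non-smokers.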


# Create Residual Plots

## Procedure

- Select **Stat** >> **Regression** >> **Regression** >> **Fit Regression Model...**
- Specify the response and the predictor(s).
- Under **Graphs...**
  - Under **Residuals for Plots**, select either **Regular** or **Standardized**.
  - Under **Residuals Plots**, select the desired types of residual plots. If you want to create a residuals vs. predictor plot, specify the predictor variable in the box labeled **Residuals versus the variables**.
  - Select **OK**.
- Select **OK**. The standard regression output will appear in the session window, and the residual plots will appear in new windows.

## Example

The data set bluegills.txt contains the lengths (in mm) and ages (in years) of *n* = 78 bluegill fish. Treating *y* = length as the response and *x* = age as the predictor, request a normal plot of the standardized residuals and a standardized residuals vs. fits plot.


# Create a Basic Scatter Plot

## Procedure

The basic "scatter plot" command creates a simple scatter plot of a response variable *y* against a predictor variable *x*.

- Select **Graph** >> **Scatterplot...**
- Select the graph type "**Simple**."
- Specify your **Y variable** and your **X variable** in the boxes provided.
- Select **OK**. A new window containing the scatter plot will appear.

## Example

*Sports Illustrated* published results of a study designed to determine how well professional golfers putt. The data set puttgolf.txt contains data on the lengths of putts and the percentage of successful putts made by professional golfers during 15 tournaments. Only putts that were 2 to 20 feet from the hole are included in the data set.

What does the plot of *y* = *success* against *x* = *length* suggest about the relationship between the two variables?


# Create a Fitted Line Plot

## Procedure

The "fitted line plot" command is one way of obtaining the estimated regression function between a response *y* and a predictor *x*. The "fitted line plot" command provides not only the estimated regression function but also a scatter plot of the data adorned with the estimated regression function.

- Select **Stat** >> **Regression** >> **Fitted Line Plot...**
- In the box labeled "**Response (Y)**", specify the desired response variable.
- In the box labeled "**Predictor (X)**", specify the desired predictor variable.
- Select **OK**. A new window containing the fitted line plot will appear.

## Example

*Sports Illustrated* published results of a study designed to determine how well professional golfers putt. The data set puttgolf.txt contains data on the lengths of putts and the percentage of successful putts made by professional golfers during 15 tournaments. Only putts that were 2 to 20 feet from the hole are included in the data set.

What is the estimated linear relationship between *y* = *success* and *x* = *length*?


# Create a Fitted Line Plot with Confidence and Prediction Bands

## Procedure

- Select **Stat** >> **Regression** >> **Fitted Line Plot...**
- Specify the response and the predictor.
- Select **Options...** Under **Display Options**, select **Display confidence interval** and select **Display prediction interval**. Specify the desired confidence level (95% is the default). Select **OK**.
- Select **OK**. A new window containing the fitted line plot will appear.

## Example

For people of the same age and gender, height is often considered a good predictor of weight. The data set htwtmales.txt contains the heights (*ht*, in cm) and weights (*wt*, in kg) of a sample of 14 males between the ages of 19 and 26 years.

- Find a 95% prediction band for the weight of a randomly selected male, aged 19 to 26.
- Find a 95% confidence band for the average weight of all males, aged 19 to 26.


# Create a Simple Matrix of Scatter Plots

## Procedure

Creating a matrix of scatter plots between a set of variables is a good way to visualize the relationship between each pair of variables.

- Select **Graph** >> **Matrix Plot...**
- Under **Matrix of plots**, select the **Simple** plot.
- In the box labeled **Graph variables**, specify the variables you want to include in your plot.
- Select **OK**. A new graph window should appear containing the scatter plot matrix.

## Example

Using the dataset iqsize.txt, create a matrix of scatter plots between each pair of the four variables.


# Creating a Correlation Matrix

## Procedure (v.16 & v.17)

- Select **Stat** >> **Basic Statistics** >> **Correlation...**
- In the box labeled **Variables**, specify the two (or more) variables for which you want the correlation coefficient(s) calculated.
- If you would like a *P*-value so that you can test that each population correlation is 0, put a checkmark in the box labeled **Display p-values** by clicking once on the box.
- Select **OK**. The output will appear in the session window.

## Example

Using the iqsize.txt data set, estimate the correlations among each pair of the four variables.

### Resulting Minitab Output

### Correlations: PIQ, Brain, Height, Weight

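Variable | PIQ | Brain | Height |
---|---|---|---|
Brain | 0.378 | | |
(P-Value) | 0.019 | | |
Height | -0.093 | 0.588 | |
(P-Value) | 0.578 | 0.000 | |
Weight | 0.003 | 0.513 | 0.700 |
(P-Value) | 0.988 | 0.001 | 0.000 |

Cell contents: Pearson correlation, with its P-Value on the row beneath.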

# Display Data

## Procedure

- In Minitab, select **Data** >> **Display Data...**
- In the box labeled **Columns, constants, and matrices to display**, specify the variables that you would like displayed.
- Select **OK**. The data will be displayed in the session window.

## Example

Display the data contained in the adaptive.txt data set.

### Sample output

### Results for: Worksheet 2

Data Display

Row | case | age | score |
---|---|---|---|
1 | 1 | 15 | 95 |
2 | 2 | 26 | 71 |
3 | 3 | 10 | 83 |
4 | 4 | 9 | 91 |
5 | 5 | 15 | 102 |
6 | 6 | 20 | 87 |
7 | 7 | 18 | 93 |
8 | 8 | 11 | 100 |
9 | 9 | 8 | 104 |
10 | 10 | 20 | 94 |
11 | 11 | 7 | 113 |
12 | 12 | 9 | 96 |
13 | 13 | 10 | 83 |
14 | 14 | 11 | 84 |
15 | 15 | 11 | 102 |
16 | 16 | 10 | 100 |
17 | 17 | 12 | 105 |
18 | 18 | 42 | 57 |
19 | 19 | 17 | 121 |
20 | 20 | 11 | 86 |
21 | 21 | 10 | 100 |

# Find a Confidence Interval and a Prediction Interval for the Response

## Procedure

- Select **Stat** >> **Regression** >> **Regression** >> **Fit Regression Model...**
- Specify the response and the predictor(s).
- Select **OK**. The output will appear in the session window.

Next, back up to the Main Menu *having just run this regression*:

- Select **Stat** >> **Regression** >> **Regression** >> **Predict...**
- Specify the response.
- Specify either the *x* value ("**Enter individual values**") or a column name ("**Enter columns of values**") containing multiple *x* values.
- Select **Options...** Specify the **Confidence level** (the default is 95%). Select **OK**.
- Select **OK**. The output will appear in the session window.

## Example

For people of the same age and gender, height is often considered a good predictor of weight. The data set htwtmales.txt contains the heights (*ht*, in cm) and weights (*wt*, in kg) of a sample of 14 males between the ages of 19 and 26 years.

- Find a 95% prediction interval for the weight of a randomly selected male, aged 19 to 26, who is 170 centimeters tall.
- Find a 95% confidence interval for the average weight of all males, aged 19 to 26, who are 170 centimeters tall.

### Resulting Sample Minitab Output

### Prediction for wt

Regression Equation

wt = -60.6 + 0.755 ht

Variable | Setting |
---|---|
ht | 170 |

Fit | SE Fit | 95% CI | 95% PI |
---|---|---|---|
67.6694 | 2.83819 | (61.4855, 73.8533) | (47.0975, 88.2413) |
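
The confidence interval in this output can be recovered from the reported Fit and SE Fit. The Python sketch below assumes the usual form, fit ± t × SE Fit, with *n* - 2 = 12 degrees of freedom, and takes the critical value from a standard t table. It is a check of the arithmetic, not Minitab code.

```python
# Values copied from the Minitab output above
fit = 67.6694
se_fit = 2.83819
n = 14

# Assumption: 97.5th percentile of the t distribution with n - 2 = 12
# degrees of freedom, read from a standard t table.
t_crit = 2.1788

half_width = t_crit * se_fit
ci_lower, ci_upper = fit - half_width, fit + half_width

print(f"95% CI = ({ci_lower:.3f}, {ci_upper:.3f})")
```

The prediction interval is wider because it also accounts for the variability of an individual response around the mean. Reproducing it requires the regression mean squared error, which is not shown in this output.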

# Find a t Critical Value

## Procedure

You may need to find a *t* critical value if you are using the critical value approach to conduct a hypothesis test that uses a *t*-statistic.

- Select **Calc** >> **Probability Distributions** >> **t...**
- Click the button labeled '**Inverse cumulative probability**'. (Ignore the box labeled '**Noncentrality parameter**'. That is, leave the default value of 0 as is.)
- Type the number of degrees of freedom in the box labeled '**Degrees of freedom**'.
- Click the button labeled '**Input constant**'. In the box, type the cumulative probability for which you want to find the associated *t*-value.
- Select **OK**. The *t*-value will appear in the session window.

## Example

The US National Research Council currently recommends that females between the ages of 11 and 50 consume 15 milligrams of iron daily.

Is there evidence that the population of American females is, on average, getting less than the recommended 15 mg of iron? That is, should we reject the null hypothesis *H*_{0}: μ = 15 against the alternative *H*_{A}: μ < 15?

The iron intakes (irondef.txt) of a random sample of 25 such American females yielded a *t*-statistic of -1.48.

If we were interested in conducting the test at the α = 0.05 level, what is the appropriate *t* critical value to which we should compare the *t*-statistic?

### Sample Minitab Output

### Inverse Cumulative Distribution Function

Student's t distribution with 24 DF

P ( X ≤ x) | x |
---|---|
0.05 | -1.71088 |
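
For readers who want to see what the inverse cumulative probability is doing, here is a self-contained Python sketch. It is not how Minitab computes the value. It assumes a simple numerical approach: integrate the *t* density with Simpson's rule to get the cumulative probability, then search by bisection for the *x* whose cumulative probability is 0.05. All function names are illustrative.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x), integrating the density over [0, |x|] with Simpson's rule."""
    if x == 0:
        return 0.5
    a = abs(x)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # approximates P(0 <= T <= |x|)
    return 0.5 + area if x > 0 else 0.5 - area

def t_inverse_cdf(p, df, lo=-50.0, hi=50.0):
    """Value x with P(T <= x) = p, found by bisection on t_cdf."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

critical_value = t_inverse_cdf(0.05, 24)
print(f"t critical value = {critical_value:.5f}")
```

For this lower-tail test, the null hypothesis would be rejected only if the *t*-statistic fell below the critical value. The observed -1.48 does not.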

# Find a t-based P-value

Find a t-based P-value##
Minitab^{®}
– Procedure

You may need to find a P-value if you are using the *P*-value approach to conduct a hypothesis test that uses a *t*-statistic.

- Select
**Calc**>>**Probability Distributions**>>**t**... - Click the button labeled '
**Cumulative probability**'. - Type the number of degrees of freedom in the box labeled '
**Degrees of freedom**'. - Click the button labeled '
**Input constant**'. In the box, type the test statistic for which you want to find the associated cumulative probability. - Select OK. The probability that a t-distributed random variable with this number of degrees of freedom is less than or equal to the test statistic will appear in the session window.
- The
*P*-value is this probability for a lower-tail test or one minus this probability for an upper-tail test. For a two-tail test multiply the one-tail probability by two.

##
Example

The US National Research Council currently recommends that females between the ages of 11 and 50 intake 15 milligrams of iron daily.

Is there evidence that the population of American females is, on average, getting less than the recommended 15 mg of iron? That is, should we reject the null hypothesis H_{0}: μ = 15 against the alternative H_{A}: μ < 15?

The iron intakes (irondef.txt) of a random sample of 25 such American females yielded a *t*-statistic of -1.48.

If we were interested in conducting the test at the α = 0.05 level, what is the *P*-value that we should compare to α?

## Minitab Dialog Box

### Sample Minitab Output

### Cumulative Distribution Function

Student's t distribution with 24 DF

x | P ( X ≤ x) |
---|---|
-1.48 | 0.0759408 |
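The step from cumulative probability to *P*-value can be sketched in Python with the standard library alone. This is an illustrative approximation rather than Minitab's method: `t_pdf` and `t_cdf` are hypothetical helper names, and the CDF is obtained by Simpson's rule.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """Approximate P(T <= x) with Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


t_stat, df = -1.48, 24
cumulative = t_cdf(t_stat, df)

p_lower = cumulative                        # alternative: mu < mu_0
p_upper = 1 - cumulative                    # alternative: mu > mu_0
p_two_sided = 2 * min(p_lower, p_upper)     # alternative: mu != mu_0

print(round(p_lower, 4), round(p_upper, 4), round(p_two_sided, 4))
```

For the lower-tail alternative in this example, the relevant value is `p_lower`, which should be close to the 0.0759 reported by Minitab. Since that exceeds α = 0.05, the null hypothesis would not be rejected.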

# Find an F Critical Value

## Minitab® – Procedure

You may need to find an *F* critical value if you are using the critical value approach to conduct a hypothesis test that uses an *F*-statistic.

- Select **Calc** >> **Probability Distributions** >> **F**...
- Click the button labeled **Inverse cumulative probability**. (Ignore the box labeled **Noncentrality parameter**. That is, leave the default value of 0.0 as is.)
- Type in the number of numerator degrees of freedom in the box labeled **Numerator degrees of freedom**.
- Type in the number of denominator degrees of freedom in the box labeled **Denominator degrees of freedom**.
- Click the button labeled **Input Constant**. In the box, type the cumulative probability for which you want to find the associated *F*-value.
- Select **OK**. The *F*-value will appear in the session window.

## Example

Some researchers at UCLA conducted a study on cyanotic heart disease in children. They measured the age at which the child spoke his or her first word (*x*, in months) and the Gesell adaptive score (*y*) on a sample of 21 children.

Is there evidence of a relationship between age at first word and Gesell adaptive score? That is, should we reject the null hypothesis *H*_{0}: *β*_{1} = 0 against the alternative hypothesis *H*_{A}: *β*_{1} ≠ 0 at the 0.05 level? The resulting data (adaptive.txt) yield an ANOVA *F*-statistic of 13.20.

## Minitab Dialog Box

Because the *F*-statistic is large regardless of whether the population slope is positive or negative, the *F*-test is always a one-sided (upper-tail) test. Therefore, because we want to conduct the hypothesis test at the 0.05 level, the appropriate **cumulative probability** to enter is 1 − 0.05 = 0.95. The number of **numerator degrees of freedom** is always 1 for a simple linear regression model with one predictor. Because there are 21 measurements in the sample, the appropriate number of **denominator degrees of freedom** is 21 − 2 = 19. Therefore, your Minitab dialog box should look like:

### Sample Minitab Output

In this case, Minitab tells us that the *F*-critical value is:

### Inverse Cumulative Distribution Function

F distribution with 1 DF in numerator and 19 DF in denominator

P ( X ≤ x) | x |
---|---|
0.95 | 4.38075 |
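As a rough cross-check of this value, the sketch below relies on a special case: with 1 numerator degree of freedom, an *F* random variable is the square of a *t* random variable with the denominator degrees of freedom, so the upper 5% *F* point equals the square of the 0.975 quantile of *t*. The helper names are hypothetical, the *t* CDF is approximated with Simpson's rule, and the approach does not extend to other numerator degrees of freedom.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """Approximate P(T <= x) with Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_inverse_cdf(p, df, lo=-50.0, hi=50.0):
    """Find x with P(T <= x) = p by bisection."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


denominator_df = 19
# P(F <= x) = 0.95 with 1 numerator DF means P(|T| <= sqrt(x)) = 0.95,
# i.e. sqrt(x) is the 0.975 quantile of t with 19 DF.
t_quantile = t_inverse_cdf(0.975, denominator_df)
f_critical = t_quantile ** 2
print(round(f_critical, 4))
```

The result should land near the 4.38075 shown in the Minitab output. Since the observed *F*-statistic of 13.20 is larger, the null hypothesis would be rejected at the 0.05 level.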

# Find an F-based P-value

## Minitab® – Procedure

You may need to find a *P*-value if you are using the *P*-value approach to conduct a hypothesis test that uses an *F*-statistic.

- Select **Calc** >> **Probability Distributions** >> **F**...
- Click the button labeled **Cumulative probability**. (Leave the noncentrality parameter set as the default of 0.)
- Type the number of numerator degrees of freedom in the box labeled **Numerator degrees of freedom**, and type the number of denominator degrees of freedom in the box labeled **Denominator degrees of freedom**.
- Click the button labeled **Input constant**. In the box, type the value of your *F*-statistic for which you want to find the associated cumulative probability.
- Select **OK**. The cumulative probability will appear in the session window. The *P*-value is 1 minus the reported cumulative probability.

## Example

The coolhearts.txt data set contains the following data on 32 rabbits subjected to a heart attack:

- *y*_{i} is the size of the infarcted area (in grams) of rabbit *i*
- *x*_{i1} is the size of the region at risk (in grams) of rabbit *i*
- *x*_{i2} = 1 if early cooling of rabbit *i*, 0 if not
- *x*_{i3} = 1 if late cooling of rabbit *i*, 0 if not

It can be shown that the partial *F*-statistic for testing *H*_{0} : *β*_{2} = *β*_{3} = 0 is 8.59 with 2 numerator and 28 denominator degrees of freedom. Find the *F*-based *P*-value so that you can draw a conclusion about the hypothesis.

## Minitab Dialog Box

### Sample Minitab Output

### Cumulative Distribution Function

F distribution with 2 DF in numerator and 28 DF in denominator

x | P ( X ≤ x) |
---|---|
8.59 | 0.998767 |

The *P*-value is therefore 1 - 0.9988 or 0.0012.
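The arithmetic above can be checked by hand for this particular example, because an *F* distribution with exactly 2 numerator degrees of freedom has a closed-form upper tail: P(F > f) = (1 + 2f/d)^(−d/2), where d is the denominator degrees of freedom. The Python sketch below uses that special case only; the function name is hypothetical, and other numerator degrees of freedom would need the incomplete beta function instead.

```python
def f_upper_tail_two_numerator_df(f_stat, denominator_df):
    """P(F > f_stat) for an F distribution with 2 numerator DF.

    Valid only for 2 numerator degrees of freedom, where the tail
    probability reduces to (1 + 2 f / d) ** (-d / 2).
    """
    return (1 + 2 * f_stat / denominator_df) ** (-denominator_df / 2)


f_stat, denominator_df = 8.59, 28
p_value = f_upper_tail_two_numerator_df(f_stat, denominator_df)
cumulative = 1 - p_value   # the quantity Minitab reports as P(X <= x)

print(round(cumulative, 6), round(p_value, 4))
```

The cumulative probability computed this way should agree with the session-window value to several decimal places, and the *P*-value of about 0.0012 leads to rejecting *H*_{0} : *β*_{2} = *β*_{3} = 0 at any conventional significance level.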

# Generate Random Normally Distributed Data

## Minitab® – Procedure

Minitab can be used to generate random data. In this example, we use Minitab to create a random set of data that is normally distributed.

- Select **Calc** >> **Random Data** >> **Normal...**
- In the box labeled **Generate ... rows of data**, type in the number of rows of data that you would like to generate.
- In the box labeled **Store in Column(s):**, enter the column name(s) where you want Minitab to store the data.
- In the boxes labeled **Mean:** and **Standard deviation:**, type in the mean and standard deviation of your desired normal distribution. The default is the standard normal distribution with mean = 0 and standard deviation = 1.
- Select **OK**. The new data will appear in the worksheet window.

## Example

First, generate a column of 200 random numbers from a standard normal distribution with a mean of 0 and a standard deviation of 1. Then, generate 20 more columns, each containing 200 random numbers from a standard normal distribution with a mean of 0 and a standard deviation of 1.
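For comparison, the same task can be sketched outside Minitab in a few lines of Python. The seed value and the variable names are arbitrary choices made for illustration, and the generated numbers will not match any particular Minitab run.

```python
import random

random.seed(2024)  # arbitrary seed so the sketch is repeatable

n_rows = 200

# One column of 200 standard normal values (mean 0, standard deviation 1).
one_column = [random.gauss(0.0, 1.0) for _ in range(n_rows)]

# Twenty columns, each holding 200 standard normal values.
twenty_columns = [
    [random.gauss(0.0, 1.0) for _ in range(n_rows)]
    for _ in range(20)
]

mean_c1 = sum(one_column) / n_rows
print(len(one_column), len(twenty_columns), round(mean_c1, 3))
```

The sample mean of any one column should be near 0 but will rarely equal it exactly, which is the same behavior you would see in the Minitab worksheet.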

## Minitab Dialog Boxes

First method - one column:

Second method - multiple columns:

### Resulting Minitab Worksheet

Based on the first dialog box above, one column (C1) of (standard) normally distributed data appears in the worksheet:

...

Results based on specifying 20 columns (C1-C20) of (standard) normally distributed data as it will appear in the worksheet:

# Obtain a Sample Correlation

## Minitab® – Procedure

- Select **Stat** >> **Basic statistics** >> **Correlation ...**
- Specify the two (or more) variables for which you want the correlation coefficient(s) calculated.
  - Pearson correlation is the default. An optional Spearman rho method is also available.
- If it isn't already checked, put a checkmark in the box labeled **Display p-values** by clicking once on the box.
- Select **OK**. The output will appear in the session window.

## Example

For people of the same age and gender, height is often considered a good predictor of weight. The data set htwtmales.txt contains the heights (*ht*, in cm) and weights (*wt*, in kg) of a sample of 14 males between the ages of 19 and 26 years.

- What is the sample correlation coefficient between *ht* and *wt*?
- Is there sufficient evidence to conclude that the population correlation coefficient between *ht* and *wt* is significantly different from 0?

## Minitab Dialog Box

### Resulting Sample Minitab Output

### Correlation: wt, ht

Pearson correlation of wt and ht = 0.689

P-Value = 0.006
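To see where a *P*-value like this comes from, the sketch below converts the reported correlation into the usual *t*-statistic for testing that the population correlation is 0, namely t = r·√(n − 2)/√(1 − r²) with n − 2 degrees of freedom. It assumes this standard test is the one behind the displayed *P*-value, and it works from the rounded *r* = 0.689 rather than from the raw data.

```python
import math

r = 0.689   # Pearson correlation reported in the output
n = 14      # sample size from the example

df = n - 2
t_stat = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

print(df, round(t_stat, 2))
```

A *t*-statistic of roughly 3.29 on 12 degrees of freedom can then be fed into the *t*-based *P*-value procedure (doubling the upper-tail probability for the two-sided test), which is consistent with the small *P*-value shown above.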

# Perform a Basic Regression Analysis

## Minitab® – Procedure

The "basic regression analysis" command outputs:

- the estimated regression function
- a table of estimated coefficients (Coef), which also includes standard errors of the coefficients (SE Coef), and *t*-statistics (T) and *P*-values (P) for testing whether the parameters differ from 0
- the coefficient of determination *r*^{2}
- the analysis of variance table
- a table of unusual observations

- Select **Stat** >> **Regression** >> **Regression** >> **Fit Regression Model ...**
- In the box labeled "**Response**", specify the desired response variable.
- In the box labeled "**Predictors**", specify the desired predictor variable.
- Select **OK**. The basic regression analysis output will be displayed in the session window.

### Regression Through the Origin

To fit a regression through the origin (RTO) model, click "Model" in the regular regression window and uncheck "Include the constant term in the model".

## Example

*Sports Illustrated* published results of a study designed to determine how well professional golfers putt. The data set puttgolf.txt contains data on the lengths of putts and the percentage of successful putts made by professional golfers during 15 tournaments. Only putts that were 2 to 20 feet from the hole are included in the data set.

Is there a significant linear relationship between the response *y* = *success* and the predictor *x* = *length*?

## Minitab Basic Regression Analysis Dialog Box

### Sample Output

### Regression Analysis: success versus length

Analysis of Variance

Source | DF | Adj SS | Adj MS | F-Value | P-Value |
---|---|---|---|---|---|
Regression | 1 | 9529 | 9529.29 | 113.28 | 0.000 |
length | 1 | 9529 | 9529.29 | 113.28 | 0.000 |
Error | 17 | 1430 | 84.12 | | |
Total | 18 | 10959 | | | |

Model Summary

S | R-sq | R-sq (adj) | R-sq(pred) |
---|---|---|---|
9.17166 | 86.95% | 86.18% | 82.51% |

Regression Equation

success = 83.61 - 4.089 length

Fits and Diagnostics for Unusual Observations

Obs | success | Fit | Resid | Std Resid | |
---|---|---|---|---|---|
1 | 93.30 | 75.43 | 17.87 | 2.17 | R |

R Large residual
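Several numbers in this output are simple functions of the analysis of variance entries, and recomputing them is a useful way to read the table. The Python sketch below uses only values printed above; the regression sum of squares is taken as 9529.29 (the Adj MS with 1 DF), the total is the rounded 10959, and small rounding differences are therefore expected.

```python
import math

# Entries read from the Analysis of Variance table above.
ms_regression = 9529.29   # Adj MS for Regression (1 DF, so also its SS)
ms_error = 84.12          # Adj MS for Error (MSE)
ss_regression = 9529.29
ss_total = 10959.0        # rounded in the output

f_value = ms_regression / ms_error               # F = MSR / MSE
r_sq_percent = 100 * ss_regression / ss_total    # R-sq = SSR / SSTO
s = math.sqrt(ms_error)                          # S = sqrt(MSE)

print(round(f_value, 2), round(r_sq_percent, 2), round(s, 2))
```

These should reproduce the reported F-Value of 113.28, R-sq of 86.95%, and S of about 9.17. With a *P*-value of 0.000 for the *F*-test, there is strong evidence of a linear relationship between *success* and *length*.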

# Perform a Linear Regression Analysis

## Minitab® – Procedure

- Select **Stat** >> **Regression** >> **Regression** >> **Fit Regression Model ...**
- Specify the response and the predictor(s).
- (For standard residual plots) Under **Graphs**..., select the desired residual plots.
- Minitab automatically recognizes replicates of data and produces the Lack of Fit test with Pure error by default.
- Select **OK**.

Next, having just run this regression, return to the main menu:

- (To get a prediction interval) Select **Stat** >> **Regression** >> **Regression** >> **Predict**...
- Specify the response.
- Specify either the x value ("**Enter individual values**") or a column name ("**Enter columns of values**") containing multiple x values.
- Select **Options**... Specify the **Confidence level** (the default is 95%). Select **OK**.
- Select **OK**. The output will be displayed in the session window.

### Regression Through the Origin

To fit a regression through the origin (RTO) model, click "**Model**" and uncheck "Include the constant term in the model".

## Example

The iqsize.txt data set contains data on the IQ (*y* = *PIQ*), brain size (*x*_{1} = *Brain*), height (*x*_{2} = *Height*), and weight (*x*_{3} = *Weight*) of *n* = 38 college students. Fit the multiple linear regression model treating *PIQ* as the response, and *Brain*, *Height*, and *Weight* as the predictors. In doing so, request a lack of fit test. Also, with 95% confidence, predict the *PIQ* of a randomly selected college student whose *Brain* = 90, *Height* = 70 and *Weight* = 150.

## Minitab Dialog Boxes

### Resulting Minitab Output

### Regression Analysis: PIQ versus Brain, Height, Weight

Analysis of Variance

Source | DF | Adj SS | Adj MS | F-Value | P-Value |
---|---|---|---|---|---|
Regression | 3 | 5572.7 | 1857.58 | 4.74 | 0.007 |
Brain | 1 | 5239.2 | 5239.23 | 13.37 | 0.001 |
Height | 1 | 1934.7 | 1934.71 | 4.94 | 0.033 |
Weight | 1 | 0.0 | 0.0 | 0.00 | 0.998 |
Error | 34 | 13321.8 | 391.82 | | |
Total | 37 | 18894.6 | | | |

Model Summary

S | R-sq | R-sq (adj) | R-sq(pred) |
---|---|---|---|
19.7944 | 29.49% | 23.27% | 12.76% |

Coefficients

Term | Coef | SE Coef | T-Value | P-Value | VIF |
---|---|---|---|---|---|
Constant | 111.4 | 63.0 | 1.77 | 0.086 | |
Brain | 2.060 | 0.563 | 3.66 | 0.001 | 1.58 |
Height | -2.73 | 1.23 | -2.22 | 0.033 | 2.28 |
Weight | 0.001 | 0.197 | 0.00 | 0.998 | 2.02 |

Regression Equation

PIQ = 111.4 + 2.060 Brain - 2.73 Height + 0.001 Weight

Fits and Diagnostics for Unusual Observations

Obs | PIQ | Fit | Resid | Std Resid | |
---|---|---|---|---|---|
13 | 147.00 | 95.31 | 51.69 | 2.72 | R |

R Large residual

Prediction for PIQ

Regression Equation

PIQ = 111.4 + 2.060 Brain - 2.73 Height + 0.001 Weight

Variable | Setting |
---|---|
Brain | 90 |
Height | 70 |
Weight | 150 |

Fit | SE Fit | 95% CI | 95% PI |
---|---|---|---|
105.636 | 3.90554 | (97.6986, 113.573) | (64.6330, 146.638) |
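The confidence and prediction intervals in the last row are linked through the same *t* multiplier: the CI uses SE Fit, while the PI uses √(S² + SE Fit²), which adds the variability of a single new observation. The Python sketch below illustrates that relationship using only numbers printed in the output; rather than assuming a table value, it backs the multiplier out of the reported CI.

```python
import math

# Values read from the Minitab output above.
fit = 105.636
se_fit = 3.90554
ci_upper = 113.573
s = 19.7944            # S from the Model Summary

# t multiplier implied by the reported 95% CI (34 error DF).
t_mult = (ci_upper - fit) / se_fit

# Standard error for predicting one new response at these settings.
se_pred = math.sqrt(s ** 2 + se_fit ** 2)

pi_lower = fit - t_mult * se_pred
pi_upper = fit + t_mult * se_pred

print(round(t_mult, 4), round(pi_lower, 2), round(pi_upper, 2))
```

The recomputed limits should agree with the reported 95% PI of (64.6330, 146.638) up to rounding, which shows why the prediction interval is so much wider than the confidence interval for the mean response.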

# Perform a t-test for a Population Mean µ

## Minitab® – Procedure

- Select **Stat** >> **Basic Statistics** >> **1 Sample t**...
- If it is not already selected, use the pull-down options to select '**Samples in columns**'.
- Select the variable you want to analyze, by double-clicking or by highlighting and clicking once on '**Select**', so it appears in the box labeled '**Samples in columns**'.
- In the box labeled '**Test mean**', type the assumed value of the mean under the null hypothesis.
- Select **Options**... (Ignore the box labeled '**Confidence level**'.) For the box labeled '**Alternative**', use the pull-down options to select the direction of the alternative hypothesis (less than, not equal, greater than).
- Select **OK**.
- Select **OK**. The output will appear in the session window.

## Example

The US National Research Council currently recommends that females between the ages of 11 and 50 intake 15 milligrams of iron daily. The iron intakes of a random sample of 25 such American females are found in the dataset irondef.txt. Is there evidence that the population of American females is, on average, getting less than the recommended 15 mg of iron? That is, should we reject the null hypothesis H_{0}: μ = 15 against the alternative H_{A}: μ < 15? Using Minitab, determine a 95% confidence interval for μ, the mean iron intake of all women in the population.

## Minitab Dialog Boxes

### Sample Minitab Output

### One-Sample T: iron

Test of \(\mu=15 \text { vs }<15\)

Variable | N | Mean | StDev | SE Mean | 95% Upper Bound | T | P |
---|---|---|---|---|---|---|---|
iron | 25 | 14.300 | 2.367 | 0.473 | 15.110 | -1.48 | 0.076 |
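The SE Mean, T, and upper bound columns follow directly from the summary statistics in the same row. As a minimal sketch, the Python below recomputes them from the rounded values shown; the multiplier 1.71088 is the absolute value of the 0.05 quantile of the *t* distribution with 24 DF, as reported in the critical-value example earlier in this document.

```python
import math

# Summary statistics from the output row for "iron".
n = 25
mean = 14.300
st_dev = 2.367
mu_0 = 15.0          # hypothesized mean under the null hypothesis

se_mean = st_dev / math.sqrt(n)
t_stat = (mean - mu_0) / se_mean

# One-sided 95% upper bound: mean + t(0.95, 24 DF) * SE Mean.
t_crit = 1.71088     # from the inverse CDF output for 24 DF
upper_bound = mean + t_crit * se_mean

print(round(se_mean, 3), round(t_stat, 2), round(upper_bound, 3))
```

Because the upper bound of about 15.110 lies above 15, and the *P*-value of 0.076 exceeds 0.05, both views lead to the same conclusion: the data do not provide sufficient evidence at the 0.05 level that the mean iron intake is below 15 mg.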

# Randomly Sample Data with Replacement from Columns

## Minitab® – Procedure

Random sampling from a data set allows one to analyze a subset of the data rather than the entire data set. When you randomly sample "with replacement," you allow the same data point to be selected more than once. Sampling in this way helps to ensure that the selected data points are independent.

- Select **Calc** >> **Random data** >> **Sample from columns...**
- In the box labeled "**From columns:**", specify the number of data points you want to sample.
- In the larger box under the "**Sample ... rows from columns**" label, specify from which (two) columns you want to sample.
- In the box labeled "**Store samples in...**", specify two unused columns to store your selected data points.
- Select (put a checkmark in) the box labeled "**Sample with replacement**."
- Select **OK**. The randomly sampled data points will appear in the worksheet.

## Example

*Sports Illustrated* published results of a study designed to determine how well professional golfers putt. The data set puttgolf.txt contains data on the lengths of putts and the percentage of successful putts made by professional golfers during 15 tournaments. Only putts that were 2 to 20 feet from the hole are included in the data set.

Randomly sample 5 golfers (with replacement) from the data set.
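The idea of sampling rows with replacement can be sketched in Python as follows. The rows here are made-up placeholders, not the contents of puttgolf.txt, and the seed is arbitrary; the point is that whole rows are drawn, so each sampled length stays paired with its success percentage.

```python
import random

random.seed(7)  # arbitrary seed so the sketch is repeatable

# Made-up (length, success) rows standing in for the real data set.
rows = [
    (2, 90.0),
    (3, 85.0),
    (4, 80.0),
    (5, 75.0),
    (6, 70.0),
    (7, 65.0),
]

# random.choices draws with replacement, so a row may appear more than once.
sample = random.choices(rows, k=5)

for length, success in sample:
    print(length, success)
```

Sampling the row tuples, rather than sampling each column separately, mirrors the way Minitab keeps the two selected columns aligned when it stores the sampled data points.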

## Minitab Sample from Columns Dialog Box

### Sample Random Selection of 5 Data Points

# Split the Worksheet Based on the Value of a Variable

## Minitab® – Procedure

- Select **Data** >> **Split Worksheet...**
- In the box labeled **By variables**, specify the variable on which you want the split to be based.
- Select **OK**. The new worksheets, based on the original worksheet, will appear.

## Example

A laboratory tested the relationship between operating cost per mile (*y* = *cost*) and cruising speed (*x* = *speed*) for two different makes (0, 1) of truck tires. The resulting data are stored in tiretesting.txt (Neter, Kutner, *et al*, 1996, p. 493). Split the worksheet into two worksheets based on the value of the variable *make*.

## Minitab Dialog Box

### Resulting Sample Minitab Output

The worksheet is split into two worksheets, one for each make of truck tire.
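Conceptually, splitting a worksheet amounts to grouping rows by the value of one variable. The short Python sketch below shows that grouping on made-up (make, speed, cost) rows; the numbers are placeholders rather than values from tiretesting.txt, and the name `worksheets` is chosen only for illustration.

```python
from collections import defaultdict

# Made-up (make, speed, cost) rows standing in for the real data set.
rows = [
    (0, 10, 9.8),
    (1, 10, 15.0),
    (0, 20, 12.5),
    (1, 20, 14.5),
    (0, 30, 14.9),
]

# One list of (speed, cost) rows per distinct value of make.
worksheets = defaultdict(list)
for make, speed, cost in rows:
    worksheets[make].append((speed, cost))

for make in sorted(worksheets):
    print(make, worksheets[make])
```

Each key of `worksheets` plays the role of one of the new Minitab worksheets, so a separate regression of cost on speed could then be fit within each make.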

# Store Residuals, Leverages, and Influence Measures

## Minitab® – Procedure

- Select **Stat** >> **Regression** >> **Regression** >> **Fit Regression Model ...**
- Specify the response and the predictor variable(s).
- Select **Storage...** Under **Diagnostic Measures**, select the type of residuals (and/or influence measures) that you want to be stored. Select **OK**.
- Select **OK**. The requested residuals (and/or influence measures) will be stored in your worksheet.

## Example

The data set adaptive.txt contains the Gesell adaptive scores and ages (in months) of *n* = 21 children with cyanotic heart disease. Upon regressing the response *y* = score on the predictor *x* = age, store the resulting standardized residuals in the worksheet.
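To make clear what gets stored, the sketch below computes residuals, leverages, and standardized residuals for a simple linear regression in plain Python. It assumes the usual definition, in which each residual is divided by S·√(1 − h), where h is the leverage of that observation; the (age, score) pairs are made-up placeholders, not the adaptive.txt data.

```python
import math

# Made-up (age, score) values standing in for the real data set.
x = [10.0, 12.0, 15.0, 18.0, 20.0, 26.0]
y = [100.0, 105.0, 95.0, 93.0, 94.0, 80.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)

# Least squares slope and intercept.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

# Ordinary residuals and the residual standard error S = sqrt(MSE).
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Leverages for simple linear regression, then standardized residuals.
leverages = [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]
std_residuals = [
    e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverages)
]

for xi, e, h, r in zip(x, residuals, leverages, std_residuals):
    print(xi, round(e, 3), round(h, 3), round(r, 3))
```

Observations with large leverage have their residuals scaled up the most, which is why a standardized residual can flag a point that looks unremarkable on the raw residual scale.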

## Minitab Dialog Boxes