# Minitab Help 8: Categorical Predictors

### Birthweight and smoking (2-level categorical predictor, additive model)

• Create an indicator variable for Smoke: select Calc > Make Indicator Variables, move "Smoke" to the box labeled "Indicator variables for," and click "OK."
• Perform a linear regression analysis of Wgt on Gest + Smoke_yes. You can either put "Gest" and "Smoke_yes" in the box labeled "Continuous predictors" or, alternatively, put "Gest" in the box labeled "Continuous predictors" and "Smoke" in the box labeled "Categorical predictors." Either way, click "Storage" and select "Fits" before clicking "OK."
• Select Calc > Calculator, type "FITS0" in the box labeled "Store result in variable," type "if(Smoke="no",FITS)" in the box labeled "Expression," and click "OK." Repeat to create FITS1 as "if(Smoke="yes",FITS)."
• Create a basic scatterplot but select "With Groups" instead of "Simple." Plot "Wgt" vs "Gest" with "Smoke" as the "Categorical variable for grouping."
• To add parallel regression lines representing Smoke=0 and Smoke=1 to the scatterplot, select the scatterplot, select Editor > Add > Calculated Line, select "FITS0" for the "Y column" and "Gest" for the "X column." Repeat to add the "FITS1" line.
• To display confidence intervals for the regression parameters, click "Results" in the Regression Dialog and select "Expanded tables" for "Display of results."
• To display confidence intervals for expected Wgt at Gest=38 (for Smoke=1 and Smoke=0), follow the procedure for finding a confidence interval and a prediction interval for the response.
• Select Data > Split Worksheet to create separate worksheets for Smoke=0 and Smoke=1.
• To repeat the analysis using (1, -1) coding, click "Coding" in the Regression Dialog and select "(-1, 0, +1)" as the "Coding for categorical predictors."
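
Outside Minitab, the additive model and the two coding schemes can be sketched in a few lines of Python. This is only an illustration: it assumes NumPy is available, and the data are synthetic values invented for the example, not the birthweight data set. The variable names (`smoke_yes`, `smoke_pm`, `fits_no`, `fits_yes`) are hypothetical. The sketch shows that the two fitted lines are parallel, and that switching from (1, 0) to (1, -1) coding changes the coefficients but not the fitted values.

```python
import numpy as np

# Synthetic data for illustration only (NOT the birthweight data set).
rng = np.random.default_rng(0)
n = 32
gest = rng.uniform(34, 42, size=n)            # gestation length in weeks
smoke = np.array(["yes", "no"] * (n // 2))    # smoking status
wgt = -2400 + 143 * gest - 245 * (smoke == "yes") + rng.normal(0, 100, size=n)


def fit(X, y):
    """Ordinary least squares; returns the coefficient vector."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


ones = np.ones(n)

# (1, 0) coding: Smoke_yes is 1 for smokers and 0 for non-smokers.
smoke_yes = (smoke == "yes").astype(float)
X10 = np.column_stack([ones, gest, smoke_yes])
b10 = fit(X10, wgt)

# (1, -1) coding: +1 for smokers and -1 for non-smokers.
smoke_pm = np.where(smoke == "yes", 1.0, -1.0)
Xpm = np.column_stack([ones, gest, smoke_pm])
bpm = fit(Xpm, wgt)

# Parallel lines: one common slope, intercepts differ by the Smoke_yes coefficient.
fits_no = b10[0] + b10[1] * gest              # line for Smoke = no
fits_yes = (b10[0] + b10[2]) + b10[1] * gest  # line for Smoke = yes

print("slope on Gest:", b10[1])
print("intercept shift for smokers, (1, 0) coding:", b10[2])
print("smoke coefficient, (1, -1) coding:", bpm[2])
```

Under (1, -1) coding the smoke coefficient is half the (1, 0) coefficient, because the two groups sit two coded units apart rather than one.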

### Depression treatment (3-level categorical predictor, interaction model)

• Create a basic scatterplot but select "With Groups" instead of "Simple." Plot "y" (treatment effectiveness) vs "age" with "TRT" as the "Categorical variable for grouping."
• Perform a linear regression analysis of y on age (continuous) + TRT (categorical). Before clicking "OK," click "Model," select age and TRT together in the "Predictors" box, change "Interactions through order" to "2," and click "Add"; "age*TRT" should appear in the box labeled "Terms in the model." Then click "Coding," select "(1, 0)" for the "Coding for categorical predictors," and change the "Reference level" to "C."
• To display non-parallel regression lines representing each of the three treatments, create a basic scatterplot but select "With Regression and Groups" instead of "Simple." Plot "y" (treatment effectiveness) vs "age" with "TRT" as the "Categorical variable for grouping."
• Create residual plots and select "Residuals versus fits" (with regular residuals).
• Conduct regression error normality tests and select Anderson-Darling.
• Click "Options" in the regression dialog and select Sequential (Type I) sums of squares. The resulting Anova table allows you to calculate an F-statistic to test whether at least one of x2, x3, age.x2, and age.x3 is useful (i.e., whether the regression functions differ).
• Use the same Anova table to calculate an F-statistic to test whether at least one of age.x2 and age.x3 is useful (i.e., whether the regression functions have different slopes).
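
The two F-statistics built from the sequential sums of squares can be sketched as follows. This is an illustration under stated assumptions: NumPy is available, the data are synthetic values invented for the example, and the indicators are taken to be x2 = 1 for treatment A and x3 = 1 for treatment B (with C as the reference level). The helper `sse` is hypothetical. Each sequential sum of squares is the drop in error sum of squares when a block of terms is added in order: age, then x2 and x3, then age.x2 and age.x3.

```python
import numpy as np

# Synthetic data for illustration only (NOT the depression treatment data set).
rng = np.random.default_rng(1)
n = 36
age = rng.uniform(20, 70, size=n)
trt = np.array(["A", "B", "C"] * (n // 3))
x2 = (trt == "A").astype(float)   # assumed coding: 1 if treatment A
x3 = (trt == "B").astype(float)   # assumed coding: 1 if treatment B
y = (6 + 1.0 * age + 41 * x2 + 22 * x3
     - 0.7 * age * x2 - 0.5 * age * x3 + rng.normal(0, 4, size=n))


def sse(columns, y):
    """Error sum of squares for a least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y))] + columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


# Fit nested models in the order the terms enter the model.
sse_age = sse([age], y)
sse_trt = sse([age, x2, x3], y)
sse_full = sse([age, x2, x3, age * x2, age * x3], y)

# Sequential (Type I) sums of squares for the two blocks of terms.
seq_ss_trt = sse_age - sse_trt    # x2, x3 given age
seq_ss_int = sse_trt - sse_full   # age.x2, age.x3 given age, x2, x3

# Mean squared error of the full model (6 estimated parameters).
mse = sse_full / (n - 6)

# Do the regression functions differ at all? (tests x2, x3, age.x2, age.x3)
f_all = ((seq_ss_trt + seq_ss_int) / 4) / mse

# Do the slopes differ? (tests age.x2, age.x3)
f_slopes = (seq_ss_int / 2) / mse

print("F for x2, x3, age.x2, age.x3:", f_all)
print("F for age.x2, age.x3:", f_slopes)
```

The first statistic is compared with an F distribution on 4 and n - 6 degrees of freedom, the second with an F distribution on 2 and n - 6 degrees of freedom.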

### Real estate air conditioning (2-level categorical predictor, interaction model, transformations)

• Perform a linear regression analysis of SalePrice on SqFeet and Air (both continuous) but before clicking "OK," click "Model," select SqFeet and Air together in the "Predictors" box, change "Interactions through order" to "2" and click "Add." You should see "SqFeet*Air" appear in the box labeled "Terms in the model."
• To display a scatterplot of SalePrice vs SqFeet with points marked by Air and non-parallel regression lines representing Air=0 and Air=1, create a basic scatterplot but select "With Regression and Groups" instead of "Simple." Plot "SalePrice" vs "SqFeet" with "Air" as the "Categorical variable for grouping."
• Create residual plots and select "Residuals versus fits" (with regular residuals).
• Select Calc > Calculator to create log(SalePrice) and log(SqFeet) variables and repeat preceding instructions to fit a multiple linear regression model of log(SalePrice) on log(SqFeet) + Air + log(SqFeet).Air.
• Repeat preceding instructions to display a scatterplot of log(SalePrice) vs log(SqFeet) with points marked by Air and non-parallel regression lines representing Air=0 and Air=1.
• Create residual plots and select "Residuals versus fits" (with regular residuals).
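
A minimal Python sketch of the log-transformed interaction model follows. It assumes NumPy is available and that "log" means the natural logarithm; the data are synthetic values invented for the example, not the real estate data set, and all variable names are hypothetical. It also shows why the "With Regression and Groups" lines match the interaction model: with a 2-level predictor and a full interaction term, the fitted line for each group equals the line obtained by regressing within that group alone.

```python
import numpy as np

# Synthetic data for illustration only (NOT the real estate data set).
rng = np.random.default_rng(2)
n = 60
sqfeet = rng.uniform(1000, 4000, size=n)
air = np.array([0.0, 1.0] * (n // 2))          # 1 = air conditioning, 0 = none
sale_price = np.exp(3.0 + 1.1 * np.log(sqfeet) - 0.5 * air
                    + 0.1 * air * np.log(sqfeet) + rng.normal(0, 0.2, size=n))

# Transform both the response and the continuous predictor (natural log assumed).
ln_y = np.log(sale_price)
ln_x = np.log(sqfeet)

# Model: ln_y = b0 + b1*ln_x + b2*Air + b3*(ln_x * Air)
X = np.column_stack([np.ones(n), ln_x, air, ln_x * air])
b, *_ = np.linalg.lstsq(X, ln_y, rcond=None)

# Group-specific lines implied by the interaction model.
int0, slope0 = b[0], b[1]                      # Air = 0
int1, slope1 = b[0] + b[2], b[1] + b[3]        # Air = 1

# Separate simple regressions within each group, for comparison.
s0, i0 = np.polyfit(ln_x[air == 0], ln_y[air == 0], 1)
s1, i1 = np.polyfit(ln_x[air == 1], ln_y[air == 1], 1)

# Fitted values and residuals for a residuals-versus-fits plot.
fits = X @ b
resid = ln_y - fits

print("Air = 0 line: intercept", int0, "slope", slope0)
print("Air = 1 line: intercept", int1, "slope", slope1)
```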

### Hospital infection risk (4-level categorical predictor, additive model)

• Use Data > Subset Worksheet to select only hospitals with Stay < 14.
• Perform a linear regression analysis of InfctRsk on Stay and Xray (continuous) and Region (categorical) but before clicking "OK," click "Coding," select "(1, 0)" for the "Coding for categorical predictors," and change the "Reference level" to "1."
• The resulting Anova table displays an F-statistic to test whether at least one of i2, i3, and i4 is useful (conclusion: the regression functions differ by region).
• To calculate an F-statistic to test whether at least one of i2 and i3 is useful, you'll first need to create indicator variables for Region by selecting Calc > Make Indicator Variables. To find the reduced model results, perform a linear regression analysis of InfctRsk on Stay, Xray, and Region_4 (all continuous).
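
The full-versus-reduced comparison for i2 and i3 can be sketched in Python as follows. This is an illustration under stated assumptions: NumPy is available, the data are synthetic values invented for the example (not the hospital infection risk data set), and the helper `sse` and the dictionary `ind` are hypothetical names. The indicators i2, i3, and i4 correspond to Regions 2, 3, and 4, with Region 1 as the reference level.

```python
import numpy as np

# Synthetic data for illustration only (NOT the hospital infection risk data set).
rng = np.random.default_rng(3)
n_all = 120
stay = rng.uniform(7, 20, size=n_all)
xray = rng.uniform(40, 130, size=n_all)
region = rng.integers(1, 5, size=n_all)        # regions coded 1, 2, 3, 4
infct = (1 + 0.3 * stay + 0.01 * xray + 0.4 * (region == 2)
         + 0.2 * (region == 3) + 1.0 * (region == 4)
         + rng.normal(0, 1, size=n_all))

# Keep only hospitals with Stay < 14, as in the subset step.
keep = stay < 14
stay, xray, region, infct = stay[keep], xray[keep], region[keep], infct[keep]
n = len(infct)

# Indicator variables with Region 1 as the reference level.
ind = {k: (region == k).astype(float) for k in (2, 3, 4)}


def sse(columns, y):
    """Error sum of squares for a least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y))] + columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


# Full model: Stay, Xray, i2, i3, i4 (6 parameters).
sse_full = sse([stay, xray, ind[2], ind[3], ind[4]], infct)

# Reduced model: drop i2 and i3, keeping only the Region 4 indicator.
sse_reduced = sse([stay, xray, ind[4]], infct)

# General linear F-statistic: 2 dropped terms, n - 6 error degrees of freedom.
f_stat = ((sse_reduced - sse_full) / 2) / (sse_full / (n - 6))

print("F for i2 and i3:", f_stat)
```

The statistic is compared with an F distribution on 2 and n - 6 degrees of freedom, where n is the number of hospitals remaining after subsetting.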