Perhaps somewhere along the way in our most recent discussion, you thought "why not just fit two separate regression functions — one for the smokers and one for the nonsmokers?" (If you didn't think of it, I thought of it for you!) Are there advantages to including both the binary and quantitative predictor variables within one multiple regression model? The answer is yes! In this section, we explore the two primary advantages.
The first advantage
An easy way of discovering the first advantage is to analyze the data three times: once using the data on all 32 subjects, once using the data on only the 16 nonsmokers, and once using the data on only the 16 smokers. Then, we can investigate the effects of the different analyses on important things such as the sizes of standard errors of the coefficients and the widths of confidence intervals. Let's try it!
Here's the Minitab output for the analysis using a (0,1) indicator variable and the data on all 32 subjects. Let's just run through the output and collect information on various values obtained:
Coefficients
Term  Coef  SE Coef  T-Value  P-Value  VIF

Constant  -2390  349  -6.84  0.000
Gest  143.10  9.13  15.68  0.000  1.06
Smoke  -244.5  42.0  -5.83  0.000  1.06
Regression Equation
Wgt = -2390 + 143.10 Gest - 244.5 Smoke
The standard error of the Gest coefficient is 9.13. Recall that this value quantifies how much the estimated Gest coefficient would vary from sample to sample. And, the following output:
Variable  Setting

Gest  38
Smoke  1
Fit  SE Fit  95% CI  95% PI

2803.69  30.8496  (2740.60, 2866.79)  (2559.13, 3048.26)
Variable  Setting

Gest  38
Smoke  0
Fit  SE Fit  95% CI  95% PI

3048.24  28.9051  (2989.12, 3107.36)  (2804.67, 3291.81)
tells us that for mothers with a 38-week gestation, the width of the confidence interval for the mean birth weight is 126.2 for smoking mothers and 118.2 for nonsmoking mothers.
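To see how the indicator variable works in the combined equation, here is a small Python sketch (not part of the Minitab analysis) that plugs Gest = 38 into the estimated equation for each group and computes the interval widths from the reported endpoints. It assumes the intercept and the Smoke coefficient are negative, which is what the reported fits of 2803.69 and 3048.24 imply. The function and variable names are made up for illustration. Because the printed coefficients are rounded, the computed fits agree with Minitab's only to within about half a unit.

```python
# Coefficients as printed (rounded) in the output for all 32 subjects.
# Assumption: the intercept and the Smoke coefficient are negative,
# as implied by the reported fits at Gest = 38.
B0, B_GEST, B_SMOKE = -2390.0, 143.10, -244.5


def predicted_weight(gest_weeks, smoker):
    """Estimated mean birth weight from the combined model (hypothetical helper)."""
    smoke = 1 if smoker else 0  # the (0,1) indicator variable
    return B0 + B_GEST * gest_weeks + B_SMOKE * smoke


fit_smoker = predicted_weight(38, smoker=True)
fit_nonsmoker = predicted_weight(38, smoker=False)

# Widths of the 95% confidence intervals, from the reported endpoints.
width_smoker = 2866.79 - 2740.60
width_nonsmoker = 3107.36 - 2989.12

print(f"Smoke = 1: fit = {fit_smoker:.1f}, CI width = {width_smoker:.1f}")
print(f"Smoke = 0: fit = {fit_nonsmoker:.1f}, CI width = {width_nonsmoker:.1f}")
```

The only difference between the two fits is the Smoke coefficient: the indicator shifts the line down by 244.5 for smokers while leaving the slope unchanged.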
Let's do that again, but this time for the Minitab output on just the 16 nonsmoking mothers:
Coefficients
Term  Coef  SE Coef  T-Value  P-Value  VIF

Constant  -2546  457  -5.57  0.000
Gest_0  147.2  12.0  12.29  0.000  1.00
Regression Equation
Wgt_0 = -2546 + 147.2 Gest_0
The standard error of the Gest coefficient is 12.0. And:
Variable  Setting 

Gest_0  38 
Fit  SE Fit  95% CI  95% PI 

3047.72  26.7748  (2990.30, 3105.15)  (2811.30, 3284.15) 
This output tells us that for nonsmoking mothers with a 38-week gestation, the width of the confidence interval for the mean birth weight is 114.9.
And, let's do the same thing one more time for the Minitab output on just the 16 smoking mothers:
Coefficients
Term  Coef  SE Coef  T-Value  P-Value  VIF

Constant  -2475  554  -4.47  0.001
Gest_1  139.0  14.1  9.85  0.000  1.00
Regression Equation
Wgt_1 = -2475 + 139.0 Gest_1
The standard error of the Gest coefficient is 14.1. And:
Variable  Setting 

Gest_1  38 
Fit  SE Fit  95% CI  95% PI 

2808.53  35.8088  (2731.73, 2885.33)  (2526.39, 3090.67) 
This output tells us that for smoking mothers with a 38-week gestation, the width of the confidence interval for the mean birth weight is 153.6.
Here's a summary of what we've gleaned from the three pieces of output:
Model estimated using…  SE(Gest)  Width of CI for \(\mu_Y\)

all 32 data points  9.13  (NS) 118.2, (S) 126.2
16 nonsmokers  12.0  114.9
16 smokers  14.1  153.6
Let's see what we learn from this investigation:
 The standard error of the Gest coefficient, SE(Gest), is the smallest for the estimated model based on all 32 data points. Therefore, confidence intervals for the Gest coefficient will be narrower if calculated using the analysis based on all 32 data points. (This is a good thing!)
 The confidence interval for the mean weight of babies born to smoking mothers is narrower for the estimated model based on all 32 data points (126.2 compared to 153.6), and not substantially different for nonsmoking mothers (118.2 compared to 114.9). (Another good thing!)
In short, there appears to be an advantage in "pooling" and analyzing the data all at once rather than breaking it apart and conducting different analyses for each group. Our regression model assumes that the slope for the two groups is equal. It also assumes that the variances of the error terms are equal. Therefore, it makes sense to use as much data as possible to estimate these quantities.
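Part of the gain from pooling comes from the error degrees of freedom. A 95% confidence interval for a mean response is the fit plus or minus a t multiplier times SE Fit, so the multiplier each analysis used can be backed out of the reported output. The Python sketch below does that and then uses the same multiplier to get the width of a 95% confidence interval for the Gest slope. The numbers are copied from the outputs above; the dictionary layout and variable names are made up for illustration, and the error degrees of freedom are taken as the sample size minus the number of estimated coefficients.

```python
# Values copied from the three Minitab outputs in this section.
# Each entry: (n, number of estimated coefficients, SE(Gest), SE Fit, CI lower, CI upper),
# with the fit taken at Gest = 38 (and Smoke = 1 for the combined model).
outputs = {
    "all 32 data points": (32, 3, 9.13, 30.8496, 2740.60, 2866.79),
    "16 nonsmokers": (16, 2, 12.0, 26.7748, 2990.30, 3105.15),
    "16 smokers": (16, 2, 14.1, 35.8088, 2731.73, 2885.33),
}

summary = {}
for name, (n, n_coefs, se_gest, se_fit, lower, upper) in outputs.items():
    error_df = n - n_coefs                  # degrees of freedom for error
    t_mult = (upper - lower) / 2 / se_fit   # CI half-width divided by SE Fit
    gest_ci_width = 2 * t_mult * se_gest    # width of a 95% CI for the Gest slope
    summary[name] = (error_df, t_mult, gest_ci_width)
    print(f"{name:20s} df = {error_df:2d}  t = {t_mult:.3f}  "
          f"Gest CI width = {gest_ci_width:.1f}")
```

Under these assumptions the multiplier comes out at about 2.05 with 29 error degrees of freedom and about 2.14 with 14, so the pooled analysis gains twice: a smaller standard error and a smaller multiplier.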
The second advantage
An easy way of discovering the second advantage of fitting one "combined" regression function using all of the data is to consider how you'd answer the research question if you broke apart the data and conducted two separate analyses obtaining:
Nonsmokers
Coefficients
Term  Coef  SE Coef  T-Value  P-Value  VIF

Constant  -2546  457  -5.57  0.000
Gest_0  147.2  12.0  12.29  0.000  1.00
Regression Equation
Wgt_0 = -2546 + 147.2 Gest_0
Smokers
Coefficients
Term  Coef  SE Coef  T-Value  P-Value  VIF

Constant  -2475  554  -4.47  0.001
Gest_1  139.0  14.1  9.85  0.000  1.00
Regression Equation
Wgt_1 = -2475 + 139.0 Gest_1
How could you use these results to determine if the mean birth weight of babies differs between smoking and nonsmoking mothers, after taking into account the length of gestation? Not completely obvious, is it?! It could be done, but only with much more (and more complicated) work than is necessary if you analyze the data as a whole and fit one combined regression function:
Coefficients
Term  Coef  SE Coef  T-Value  P-Value  VIF

Constant  -2390  349  -6.84  0.000
Gest  143.10  9.13  15.68  0.000  1.06
Smoke  -244.5  42.0  -5.83  0.000  1.06
Regression Equation
Wgt = -2390 + 143.10 Gest - 244.5 Smoke
As we previously discussed, answering the research question merely involves testing the null hypothesis \(H_0 \colon \beta_2 = 0\) against the alternative \(H_A \colon \beta_2 \ne 0\). The P-value for the Smoke coefficient is < 0.001. There is sufficient evidence to conclude that there is a statistically significant difference in the mean birth weight of all babies of smoking mothers and the mean birth weight of all babies of nonsmoking mothers, after taking into account the length of gestation.
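For readers who want to see where the P-value comes from, here is a Python sketch (standard library only, not part of the Minitab analysis) that forms the t-statistic for the Smoke coefficient as Coef divided by SE Coef and gets a two-sided P-value by numerically integrating the t density with 32 - 3 = 29 error degrees of freedom. It assumes the Smoke coefficient is -244.5, as the reported fits imply. The helper names, the integration limit of 60, and the number of Simpson steps are illustrative choices, not anything from the output.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, upper=60.0, steps=20000):
    """Two-sided P-value: twice the area beyond |t_stat|, by Simpson's rule.

    The tail is truncated at `upper`; `steps` must be even.
    """
    a = abs(t_stat)
    h = (upper - a) / steps
    total = t_pdf(a, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    return 2 * total * h / 3


coef_smoke, se_smoke = -244.5, 42.0   # rounded values from the output
error_df = 32 - 3                     # 32 subjects, 3 estimated coefficients

t_stat = coef_smoke / se_smoke
p_value = two_sided_p(t_stat, error_df)

print(f"t = {t_stat:.2f}, two-sided P-value = {p_value:.2e}")
```

The t-statistic computed from the rounded values differs slightly from the one Minitab prints, since Minitab works with unrounded estimates, but the conclusion is the same: the P-value is far below 0.001.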
In summary, "pooling" your data and fitting one combined regression function allows you to easily and efficiently answer research questions concerning the binary predictor variable.