9.4 - Equal Slopes Model: Salary Example


Using Technology

Using our Salary example and the data in the table below, we can run through the steps for the ANCOVA.

Females          Males
Salary  Years    Salary  Years
80      5        78      3
50      3        43      1
30      2        103     5
20      1        48      2
60      4        80      4

 

  1. Step 1: Are all regression slopes = 0?

    A simple linear regression can be run for each treatment group, Males and Females. Running these procedures using statistical software we get the following:

    Males

    Use the following SAS code:

    data equal_slopes;
    input gender $ salary years;
    datalines;
    m 78 3
    m 43 1
    m 103 5
    m 48 2
    m 80 4
    f 80 5
    f 50 3
    f 30 2
    f 20 1
    f 60 4
    ;
    proc reg data=equal_slopes;
    where gender='m';
    model salary=years;
    title 'Males';
    run; quit;
    
    

    And here is the output that you get:

    The REG Procedure
    Model: MODEL1
    Dependent Variable: salary

    Number of Observations Read 5
    Number of Observations Used 5

    Analysis of Variance

    Source DF Sum of Squares Mean Square F Value Pr > F
    Model 1 2310.40000 2310.40000 44.78 0.0068
    Error 3 154.80000 51.60000    
    Corrected Total 4 2465.20000      

    Females

    Use the following SAS code:

    data equal_slopes;
    input gender $ salary years;
    datalines;
    m 78 3
    m 43 1
    m 103 5
    m 48 2
    m 80 4
    f 80 5
    f 50 3
    f 30 2
    f 20 1
    f 60 4
    ;
    proc reg data=equal_slopes;
    where gender='f';
    model salary=years;
    title 'Females';
    run; quit;
    
    

    And here is the output for this run:

    The REG Procedure
    Model: MODEL1
    Dependent Variable: salary

    Number of Observations Read 5
    Number of Observations Used 5

    Analysis of Variance

    Source DF Sum of Squares Mean Square F Value Pr > F
    Model 1 2250.00000 2250.00000 225.00 0.0006
    Error 3 30.00000 10.00000    
    Corrected Total 4 2280.00000      

    In both cases, the simple linear regression is significant (p = 0.0068 for males, p = 0.0006 for females), so we conclude that neither slope is 0.
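
    As a quick check on the two ANOVA tables above, each F value is the model mean square divided by the error mean square. The worked calculation below uses only numbers that appear in the SAS output:

    \[ F_{\text{males}} = \frac{2310.4}{51.6} \approx 44.78, \qquad F_{\text{females}} = \frac{2250.0}{10.0} = 225.00 \]

    Each value is compared with an F distribution on 1 and 3 degrees of freedom, which gives the p-values shown in the tables.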

  2. Step 2: Are the slopes equal?

    We can test for this using our statistical software.

    In SAS we now use proc mixed and include the covariate in the model.

    We will also include a ‘treatment × covariate’ interaction term and the significance of this term answers our question. If the slopes differ significantly among treatment levels, the interaction p-value will be < 0.05.

    data equal_slopes;
    input gender $ salary years;
    datalines;
    m 78 3
    m 43 1
    m 103 5
    m 48 2
    m 80 4
    f 80 5
    f 50 3
    f 30 2
    f 20 1
    f 60 4
    ;
    proc mixed data=equal_slopes method=type3;
    class gender;
    model salary = gender years gender*years;
    run;
    
    
    Note! In SAS, we specify the treatment in the class statement, indicating that these are categorical levels. By NOT including the covariate in the class statement, it will be treated as a continuous variable for regression in the model statement.

    The Mixed Procedure
     Type 3 Tests of Fixed Effects

    Effect Num DF Den DF F Value Pr > F
    years 1 6 148.06 <.0001
    gender 1 6 7.01 0.0381
    years*gender 1 6 0.01 0.9384

    Here we see that the interaction term is not significant (p = 0.9384), so there is no evidence that the slopes differ. In a plot of the regressions, the two fitted lines are nearly parallel.
    plot

To obtain the plot in SAS, we can use the following SAS code:

ods graphics on;
proc sgplot data=equal_slopes;
styleattrs datalinepatterns=(solid);
reg y=salary x=years / group=gender;
run;
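
The interaction F value in Step 2 can also be read as an extra-sum-of-squares comparison between the unequal-slopes and equal-slopes models. The sketch below assumes the error sums of squares reported in the Minitab and R output on this page: 184.8 on 6 df for the model with the interaction, and 185.0 on 7 df for the model without it.

\[ F = \frac{(185.0 - 184.8)/(7 - 6)}{184.8/6} = \frac{0.2}{30.8} \approx 0.0065 \]

This agrees with the reported F value of 0.01 after rounding. Note also that 184.8 is the sum of the two error sums of squares from Step 1 (154.8 + 30.0), because the unequal-slopes model fits a separate line to each group.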

  3. Step 3: Fit an Equal Slopes Model

    We can now proceed to fit an Equal Slopes model by removing the interaction term. Again, we will use our statistical software SAS.

    data equal_slopes;
    input gender $ salary years;
    datalines;
    m 78 3
    m 43 1
    m 103 5
    m 48 2
    m 80 4
    f 80 5
    f 50 3
    f 30 2
    f 20 1
    f 60 4
    ;
    proc mixed data=equal_slopes method=type3;
    class gender;
    model salary = gender years;
    lsmeans gender / pdiff adjust=tukey;
    /* Tukey unnecessary with only two treatment levels */
    title 'Equal Slopes Model';
    run;
    
    

    We obtain the following results:

    The Mixed Procedure
     Type 3 Tests of Fixed Effects

    Effect Num DF Den DF F Value Pr > F
    years 1 7 172.55 <.0001
    gender 1 7 47.46 0.0002

    Least Squares Means

    Effect gender Estimate Standard Error DF t Value Pr > |t|
    gender f 48.0000 2.2991 7 20.88 <.0001
    gender m 70.4000 2.2991 7 30.62 <.0001

    Note! In SAS, the model statement automatically includes an intercept. This produces the correct table for testing the effects (using the adjusted sums of squares). However, including the intercept technically over-parameterizes the ANCOVA model, so an additional calculation step is needed to obtain the regression equations. To get the intercept for each gender directly, we can re-parameterize the model by suppressing the intercept (noint) and requesting the parameter estimates (solution). This re-parameterization should only be used to find the equation estimates, not to test effects.

    Here is what the SAS code looks like for this:

    data equal_slopes;
    input gender $ salary years;
    datalines;
    m 78 3
    m 43 1
    m 103 5
    m 48 2
    m 80 4
    f 80 5
    f 50 3
    f 30 2
    f 20 1
    f 60 4
    ;
    proc mixed data=equal_slopes method=type3;
    class gender;
    model salary = gender years / noint solution;
    ods select SolutionF;
    title 'Equal Slopes Model';
    run;
    
    

    Here is the output:

    Solution for Fixed Effects
    Effect gender Estimate Standard Error DF t Value Pr > |t|
    gender f 2.7000 4.1447 7 0.65 0.5356
    gender m 25.1000 4.1447 7 6.06 0.0005
    years   15.1000 1.1495 7 13.14 <.0001

    The output above reports a separate intercept for each gender (the ‘Estimate’ value in each gender row) and a common slope for both genders (the ‘years’ row).

    Thus, the estimated regression equation for Females is \(\hat{y} = 2.7 + 15.1 \text{(Years)}\), and for males it is \(\hat{y} = 25.1 + 15.1 \text{(Years)}\)
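
    The least squares means reported earlier can be recovered from these equations by evaluating each one at the overall mean of the covariate. The worked check below assumes a mean of 3 years, which is the mean of Years in the data table (and happens to be the mean within each gender as well):

    \[ \hat{y}_{f} = 2.7 + 15.1(3) = 48.0, \qquad \hat{y}_{m} = 25.1 + 15.1(3) = 70.4 \]

    These values match the Least Squares Means table.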

    To this point in this analysis, we can see that 'gender' is now significant. By removing the impact of the covariate, we went from

    Type 3 Tests of Fixed Effects
    Effect Num DF Den DF F Value Pr > F
    gender 1 8 2.11 0.1840

    (without covariate consideration)

    to

    gender 1 7 47.46 0.0002

    (adjusting for the covariate)

Using our Salary example and the data in the table below, we can run through the steps for the ANCOVA. On this page, we will go through the steps using Minitab.

Females          Males
Salary  Years    Salary  Years
80      5        78      3
50      3        43      1
30      2        103     5
20      1        48      2
60      4        80      4

  1. Step 1: Are all regression slopes = 0?

    A simple linear regression can be run for each treatment group, Males and Females. To perform regression analysis on each gender group in Minitab, we will have to subdivide the salary data manually and separately, saving the male data into the Male Salary Dataset and the female data into the Female Salary dataset.

    Running these procedures using statistical software we get the following:

    Males

    Open the Male dataset in the Minitab project file (Male Salary Dataset).

    Then, from the menu bar, select Stat > Regression > Regression > Fit Regression Model

    In the pop-up window, select salary into Response and years into Predictors as shown below.

    Minitab Regression window with Responses=salary, Continuous predictors=years

    Click OK, and Minitab will output the following.

    Regression Analysis: Salary versus years

    Regression Equation

    salary = 24.8 + 15.2 years

    Coefficients

    Term Coef SE Coef T-Value P-Value VIF
    Constant 24.80 7.53 3.29 0.046  
    years 15.20 2.27 6.69 0.007 1.00

    Model Summary

    S R-sq R-sq(adj) R-sq(pred)
    7.18331 93.7% 91.6% 85.94%

    Analysis of Variance

    Source DF SS MS F-Value P-Value
    Regression 1 2310.4 2310.40 44.78 0.007
         years 1 2310.4 2310.40 44.78 0.007
    Residual Error 3 154.8 51.6    
    Total 4 2465.2      

    Females

    Open Minitab dataset Female Salary Dataset. Follow the same procedure as was done for the Male dataset and Minitab will output the following:

    Regression Analysis: Salary versus years

    Regression Equation

    salary = 3.00 + 15.00 years

    Coefficients

    Term Coef SE Coef T-Value P-Value VIF
    Constant 3.00 3.32 0.90 0.432  
    years 15.00 1.00 15.00 0.001 1.00

    Model Summary

    S R-sq R-sq(adj) R-sq(pred)
    3.16228 98.68% 98.25% 95.92%

    Analysis of Variance

    Source DF SS MS F-Value P-Value
    Regression 1 2250.0 2250.0 225.00 0.001
         years 1 2250.0 2250.0 225.00 0.001
    Residual Error 3 30.0 10.0    
    Total 4 2280.0      

    In both cases, the simple linear regressions are significant, so the slopes are not = 0.

  2. Step 2: Are the slopes equal?

    We can test for this using our statistical software. In Minitab, we must now use GLM (general linear model) and be sure to include the covariate in the model. We will also include a ‘treatment x covariate’ interaction term and the significance of this term is what answers our question. If the slopes differ significantly among treatment levels, the interaction p-value will be < 0.05.

    First, open the dataset in the Minitab project file Salary Dataset. Then, from the menu select Stat > ANOVA > General Linear Model > Fit General Linear Model

    In the dialog box, select salary into Responses, gender into Factors, and years into Covariates.

    Minitab General Linear Model window with Responses= salary, Factors=gender and Covariates=years

    To add the interaction term, first click Model…. Then, use the shift key to highlight gender and years, and click Add. Click OK, then OK again, and Minitab will display the following output.

    Analysis of Variance

    Source DF Adj SS Adj MS F-Value P-Value
    years 1 4560.20 4560.20 148.06 0.000
    gender 1 216.02 216.02 7.01 0.038
    years*gender 1 0.20 0.20 0.01 0.938
    Error 6 184.80 30.80    
    Total 9 5999.60      
     

    It is clear the interaction term is not significant. This suggests the slopes are equal. In a plot of the regressions, we can also see that the lines are parallel.

    plot
  3. Step 3: Fit an Equal Slopes Model

    We can now proceed to fit an Equal Slopes model by removing the interaction term. This can be easily accomplished by starting again with STAT > ANOVA > General Linear Model > Fit General Linear Model 

    GLM: Model window with default X circled

    Click OK, then OK again, and Minitab will display the following output.

    Analysis of Variance

    Source DF Adj SS Adj MS F-Value P-Value
    years 1 4560.20 4560.20 172.55 0.000
    gender 1 1254.4 1254.40 47.46 0.000
    Error 7 185.0 26.43    
    Total 9 5999.6      

    To generate the mean comparisons select  STAT > ANOVA > General Linear Model > Comparisons... and fill in the dialog box as seen below.

    Minitab Comparisons window with Response=salary, Type of Comparison=Pairwise and Method=Tukey

    Click OK and Minitab will produce the following output.

    Comparison of salary

    Tukey Pairwise Comparisons: gender

    Grouping information Using the Tukey Method and 95% Confidence

    gender N Mean Grouping
    Male 5 70.4 A
    Female 5 48.0 B

    Means that do not share a letter are significantly different.
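
    With only two groups, the Tukey comparison is equivalent to a single t-test on the difference between the adjusted means. The sketch below uses the standard ANCOVA standard error for that difference; the symbols \(E_{xx}\) (the within-group sum of squares of years, 10 + 10 = 20) and \(\bar{x}_m, \bar{x}_f\) (the group means of years, both 3) are notation introduced here and computed from the data table, not taken from the Minitab output.

    \[ SE = \sqrt{MSE\left(\frac{1}{n_m} + \frac{1}{n_f} + \frac{(\bar{x}_m - \bar{x}_f)^2}{E_{xx}}\right)} = \sqrt{26.43\left(\frac{1}{5} + \frac{1}{5} + \frac{(3 - 3)^2}{20}\right)} \approx 3.25 \]

    \[ t = \frac{70.4 - 48.0}{3.25} \approx 6.89 \]

    Squaring t gives roughly 47.5, which agrees with the F value of 47.46 for gender up to rounding. Because both groups have the same mean years, the covariate term inside the square root is zero, which is also why the adjusted means equal the raw group means in this example.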

In R, we can first input the data manually.

gender = c(rep("m",5),rep("f",5))
salary = c(78,43,103,48,80,80,50,30,20,60)
years = c(3,1,5,2,4,5,3,2,1,4)
salary_data = data.frame(salary,gender,years)

We can apply a simple linear regression for the male subset of the data and display the results using summary.

male_data = subset(salary_data,gender=="m")
lm_male = lm(salary~years,male_data)
summary(lm_male)
Call:
lm(formula = salary ~ years, data = male_data)

Residuals:
   1    2    3    4    5 
 7.6  3.0  2.2 -7.2 -5.6 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)   24.800      7.534   3.292  0.04602 * 
years         15.200      2.272   6.691  0.00681 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7.183 on 3 degrees of freedom
Multiple R-squared:  0.9372,	Adjusted R-squared:  0.9163 
F-statistic: 44.78 on 1 and 3 DF,  p-value: 0.006809

Next, we apply a simple linear regression for the female subset.

female_data = subset(salary_data,gender=="f")
lm_female = lm(salary~years,female_data)
summary(lm_female)
Call:
lm(formula = salary ~ years, data = female_data)

Residuals:
 6  7  8  9 10 
 2  2 -3  2 -3 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)    3.000      3.317   0.905 0.432389    
years         15.000      1.000  15.000 0.000643 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.162 on 3 degrees of freedom
Multiple R-squared:  0.9868,	Adjusted R-squared:  0.9825 
F-statistic:   225 on 1 and 3 DF,  p-value: 0.0006431
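
The two group-wise fits can also be obtained in one pass. The following is a minimal sketch using only base R; the object name group_coefs is introduced here for illustration and is not part of the original analysis.

```r
# Rebuild the example data (same values as the data table)
salary_data <- data.frame(
  gender = c(rep("m", 5), rep("f", 5)),
  salary = c(78, 43, 103, 48, 80, 80, 50, 30, 20, 60),
  years  = c(3, 1, 5, 2, 4, 5, 3, 2, 1, 4)
)

# Split the data by gender, fit salary ~ years within each group,
# and collect the intercept and slope of each fit as a column
group_coefs <- sapply(
  split(salary_data, salary_data$gender),
  function(d) coef(lm(salary ~ years, data = d))
)

# Columns are the genders (f, m); rows are the intercept and slope
print(round(group_coefs, 2))
```

The slopes (15.0 for females, 15.2 for males) are close to each other, which anticipates the equal-slopes test that follows.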

It is clear the regression for both treatments is significant. We continue to test for unequal slopes in the full dataset using an interaction term.

options(contrasts=c("contr.sum","contr.poly"))
lm_unequal = lm(salary~gender+years+gender:years,salary_data)
aov3_unequal = car::Anova(lm_unequal,type=3)
aov3_unequal
Anova Table (Type III tests)

Response: salary
             Sum Sq Df  F value    Pr(>F)    
(Intercept)   351.3  1  11.4055   0.01491 *  
gender        216.0  1   7.0136   0.03811 *  
years        4560.2  1 148.0584 1.874e-05 ***
gender:years    0.2  1   0.0065   0.93839    
Residuals     184.8  6                       
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
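
As a cross-check that does not depend on the car package, the same interaction test can be sketched as a nested-model comparison with base R's anova(). The object name slope_test is an illustrative choice, and the data are rebuilt so the block runs on its own.

```r
# Rebuild the example data (same values as the data table)
salary_data <- data.frame(
  gender = c(rep("m", 5), rep("f", 5)),
  salary = c(78, 43, 103, 48, 80, 80, 50, 30, 20, 60),
  years  = c(3, 1, 5, 2, 4, 5, 3, 2, 1, 4)
)

# Reduced model: common slope; full model: a separate slope per gender
lm_equal   <- lm(salary ~ gender + years, data = salary_data)
lm_unequal <- lm(salary ~ gender * years, data = salary_data)

# F test for the extra term (gender:years) in the full model
slope_test <- anova(lm_equal, lm_unequal)
print(slope_test)
```

The F value and p-value in the second row should agree with the gender:years line of the Type III table, since both test whether the interaction term can be dropped.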

The interaction term is not significant, suggesting the slopes do not differ significantly. We simplify the model to the equal-slope model without an interaction term.

lm_equal = lm(salary~gender+years,salary_data)
aov3_equal = car::Anova(lm_equal,type=3)
aov3_equal
Anova Table (Type III tests)

Response: salary
            Sum Sq Df F value    Pr(>F)    
(Intercept)  351.3  1  13.292 0.0082232 ** 
gender      1254.4  1  47.464 0.0002335 ***
years       4560.2  1 172.548 3.458e-06 ***
Residuals    185.0  7                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
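
To see how much the covariate adjustment matters, the gender test can be computed with and without years in the model. This is a sketch in base R under the same data; the names f_unadjusted and f_adjusted are introduced here for illustration.

```r
# Rebuild the example data (same values as the data table)
salary_data <- data.frame(
  gender = c(rep("m", 5), rep("f", 5)),
  salary = c(78, 43, 103, 48, 80, 80, 50, 30, 20, 60),
  years  = c(3, 1, 5, 2, 4, 5, 3, 2, 1, 4)
)

# One-way ANOVA on gender alone, ignoring the covariate
lm_no_cov    <- lm(salary ~ gender, data = salary_data)
f_unadjusted <- anova(lm_no_cov)["gender", "F value"]

# Gender adjusted for years: compare the years-only model with the ANCOVA model
lm_years   <- lm(salary ~ years, data = salary_data)
lm_ancova  <- lm(salary ~ gender + years, data = salary_data)
f_adjusted <- anova(lm_years, lm_ancova)$F[2]

# Unadjusted F is about 2.11; adjusted F is about 47.46
print(round(c(unadjusted = f_unadjusted, adjusted = f_adjusted), 2))
```

The gender sum of squares is the same in both fits here; the F value grows because the covariate removes most of the error variation.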

This is the final model since all terms are significant. We can then produce the LS means for the gender levels.

aov1_equal = aov(lm_equal)
lsmeans_gender = emmeans::emmeans(aov1_equal,~gender) 
lsmeans_gender
 gender emmean  SE df lower.CL upper.CL
 f        48.0 2.3  7     42.6     53.4
 m        70.4 2.3  7     65.0     75.8

Confidence level used: 0.95 
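
The LS means can be cross-checked without emmeans by predicting from the equal-slopes model at the overall mean of years. This is a minimal base R sketch; new_points and adj_means are illustrative names.

```r
# Rebuild the example data (same values as the data table)
salary_data <- data.frame(
  gender = c(rep("m", 5), rep("f", 5)),
  salary = c(78, 43, 103, 48, 80, 80, 50, 30, 20, 60),
  years  = c(3, 1, 5, 2, 4, 5, 3, 2, 1, 4)
)

# Equal-slopes ANCOVA model
lm_equal <- lm(salary ~ gender + years, data = salary_data)

# One row per gender, with years fixed at its overall mean (3 in these data)
new_points <- data.frame(gender = c("f", "m"),
                         years  = mean(salary_data$years))

# Adjusted means: fitted salary for each gender at the mean of years
adj_means <- predict(lm_equal, newdata = new_points)
names(adj_means) <- new_points$gender
print(adj_means)
```

The predictions (48.0 for f and 70.4 for m) match the emmean column above.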

We can also find the regression equation coefficients. In the output below, the female level is the reference level, so the intercept (2.7) is the female intercept and the male intercept is 2.7 + 22.4 = 25.1.

  aov1_equal$coefficients
(Intercept)     genderm       years 
        2.7        22.4        15.1 
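
To read both intercepts directly, the intercept can be suppressed so that each gender gets its own coefficient. This mirrors the noint re-parameterization described in the SAS walkthrough and, as noted there, is intended only for reading off the equations, not for testing effects. The name lm_cellmeans is an illustrative choice.

```r
# Rebuild the example data (same values as the data table)
salary_data <- data.frame(
  gender = c(rep("m", 5), rep("f", 5)),
  salary = c(78, 43, 103, 48, 80, 80, 50, 30, 20, 60),
  years  = c(3, 1, 5, 2, 4, 5, 3, 2, 1, 4)
)

# "0 +" removes the overall intercept, so each gender gets its own intercept
lm_cellmeans <- lm(salary ~ 0 + gender + years, data = salary_data)

# genderf and genderm are the two intercepts; years is the common slope
print(coef(lm_cellmeans))
```

The coefficients (2.7, 25.1, and 15.1) reproduce the two regression equations: salary = 2.7 + 15.1 years for females and salary = 25.1 + 15.1 years for males.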
