## Two-Factor Factorial

Let's return to our greenhouse example and further think about the addition of the plant species impacting the plant height. We could have run a multi-factor experiment to also compare 2 different species (Species A and Species B). We would then need to assign combinations of fertilizer and species levels to 48 pots to have 6 replications in the greenhouse. This would be a referred to as 2 × 4 factorial treatment design.

The data might look like this:

Fertilizer Treatment | |||||
---|---|---|---|---|---|

Control | F1 | F2 | F3 | ||

Species | A |
21.0 | 32.0 | 22.5 | 28.0 |

19.5 | 30.5 | 26.0 | 27.5 | ||

22.5 | 25.0 | 28.0 | 31.0 | ||

21.5 | 27.5 | 27.0 | 29.5 | ||

20.5 | 28.0 | 26.5 | 30.0 | ||

21.0 | 28.6 | 25.2 | 29.2 | ||

B |
23.7 | 30.1 | 30.6 | 36.1 | |

23.8 | 28.9 | 31.1 | 36.6 | ||

23.7 | 34.4 | 34.9 | 37.1 | ||

22.8 | 32.7 | 30.1 | 36.8 | ||

22.8 | 32.7 | 30.1 | 36.8 | ||

24.4 | 32.7 | 25.5 | 37.1 |

The ANOVA table would now be constructed as follows:

Source |
df |
SS |
MS |
F |

Fertilizer | (4 -1) = 3 | |||

Species | (2 -1) = 1 | |||

Fertilizer × Species | (2 -1)(4 -1) = 3 | |||

Error | 47 - 7 = 40 | |||

Total | N - 1 = 47 |

In SAS or Minitab, we need first to re-format the data into a ‘stacked’ format. You can do this in an Excel worksheet and then copy and paste the stacked data into either SAS or Minitab.

```
data greenhouse_2way;
input fert $ species $ height;
datalines;
control SppA 21.0
control SppA 19.5
control SppA 22.5
control SppA 21.5
control SppA 20.5
control SppA 21.0
control SppB 23.7
control SppB 23.8
control SppB 23.8
control SppB 23.7
control SppB 22.8
control SppB 24.4
f1 SppA 32.0
f1 SppA 30.5
f1 SppA 25.0
f1 SppA 27.5
f1 SppA 28.0
f1 SppA 28.6
f1 SppB 30.1
f1 SppB 28.9
f1 SppB 30.9
f1 SppB 34.4
f1 SppB 32.7
f1 SppB 32.7
f2 SppA 22.5
f2 SppA 26.0
f2 SppA 28.0
f2 SppA 27.0
f2 SppA 26.5
f2 SppA 25.2
f2 SppB 30.6
f2 SppB 31.1
f2 SppB 28.1
f2 SppB 34.9
f2 SppB 30.1
f2 SppB 25.5
f3 SppA 28.0
f3 SppA 27.5
f3 SppA 31.0
f3 SppA 29.5
f3 SppA 30.0
f3 SppA 29.2
f3 SppB 36.1
f3 SppB 36.6
f3 SppB 38.7
f3 SppB 37.1
f3 SppB 36.8
f3 SppB 37.1
;
run;
proc mixed data=greenhouse_2way method=type3;
class fert species;
model height = fert species fert*species;
store out2way;
run;
ods graphics on;
ods html style=statistical sge=on;
proc plm restore=out2way;
lsmeans fert*species / adjust=tukey plot=meanplot cl lines;
/* Because the 2-factor interaction is significant, we need to work with the treatment combination means */
ods exclude diffs diffplot;
run; title; run;
```

The piece of the SAS code above that runs the two factor factorial model is:

```
proc mixed data=greenhouse_2way method=type3;
class fert species;
model height = fert species fert*species;
store out2way;
run;
```

Proc mixed is the same SAS procedure we used for the single factor ANOVA. Again we specify the data. The method is type 3, which is the way the F test is calculated. We specify the two factors as class variables because they are categorical. The model statement should look familiar. Compare this to the "theoretical" illustration in 4.1, do you see the similarities? The store command stores the elements for our LS interval plot.

As a means of review, think about where the sums of squares are coming from. We have covered the sums of squares for the individual factors in our previous lessons, but now you can see from the output that we have sums of squares for the interaction term. The following (partial) output shows the interaction term in the model:

Type 3 Analysis of Variance | ||||||||
---|---|---|---|---|---|---|---|---|

Source | DF | Sum of Squares | Mean Square | Expected Mean Square | Error Term | Error DF | F Value | Pr > F |

fert | 3 | 745.437500 | 248.479167 | Var(Residual)+Q(fert,fert*species) | MS(Residual) | 40 | 73.10 | <.0001 |

species | 1 | 236.740833 | 236.740833 | Var(Residual)+Q(species,fert*species) | MS(Residual) | 40 | 69.65 | <.0001 |

fert*species | 3 | 50.584167 | 16.861389 | Var(Residual)+Q(fert*species) | MS(Residual) | 40 | 4.96 | 0.0051 |

Residual | 40 | 135.970000 | 3.399250 | Var(Residual) |

or summarized in the table of Type 3 Tests of Fixed Effects:

Type 3 Tests of Fixed Effects | ||||
---|---|---|---|---|

Effect | Num DF | Den DF | F Value | Pr > F |

fert | 3 | 40 | 73.10 | <.0001 |

species | 1 | 40 | 69.65 | <.0001 |

fert*species | 3 | 40 | 4.96 | 0.0051 |

So what is our conclusion about this data? Interpret the interaction term first. Is this interaction term significant? Yes! So you do NOT interpret the individual factors (fert or species).

***** RULE *****Interpreting the interaction term first makes sense if you think about it. If there is a significant interaction (as we see above), then it tells us that the effect of the levels of FactorA depends upon what level of FactorB you are considering. Likewise, in order to evaluate the effect of FactorB, you need to specify what level of Factor A you are considering. In other words, in the presence of a significant interaction, you can’t unambiguously interpret the main effects for treatments involved in the interaction. In the case where an interaction is not significant we can drop the interaction term from our model (See the next page in this lesson: 4.1.1a -The Additive Model (No Interaction)).

So now that we have looked at the ANOVA output and see the significant interaction term, we know that we want to generate the LSmeans for the interaction effect (i.e., the treatment combinations) for mean comparisons and plotting our figure. The SAS code (from the program greenhouse_2way.sas) that generates these results look like:

```
ods graphics on;
ods html style=statistical sge=on;
proc plm restore=out2way;
lsmeans fert*species / adjust=tukey plot=meanplot cl lines;
/* Because the 2-factor interaction is significant, we need to work with the treatment combination means */
ods exclude diffs diffplot;
run; title; run;
```

SAS Output for the LSmeans:

fert*species Least Squares Means | |||||||||
---|---|---|---|---|---|---|---|---|---|

fert | species | Estimate | Standard Error | DF | t Value | Pr > |t| | Alpha | Lower | Upper |

control | SppA | 21.0000 | 0.7527 | 40 | 27.90 | <.0001 | 0.05 | 19.4788 | 22.5212 |

control | SppB | 32.7000 | 0.7527 | 40 | 31.49 | <.0001 | 0.05 | 22.1788 | 25.2212 |

f1 | SppA | 28.6000 | 0.7527 | 40 | 38.00 | <.0001 | 0.05 | 27.0788 | 30.1212 |

f1 | SppB | 31.6167 | 0.7527 | 40 | 42.00 | <.0001 | 0.05 | 30.0954 | 33.1379 |

f2 | SppA | 25.8667 | 0.7527 | 40 | 34.37 | <.0001 | 0.05 | 24.3454 | 27.3879 |

f2 | SppB | 30.0500 | 0.7527 | 40 | 39.92 | <.0001 | 0.05 | 28.5288 | 31.5712 |

f3 | SppA | 29.2000 | 0.7527 | 40 | 38.79 | <.0001 | 0.05 | 27.6788 | 30.7212 |

f3 | SppB | 37.0667 | 0.7527 | 40 | 49.25 | <.0001 | 0.05 | 35.5454 | 38.5879 |

Note that the *p*-values here (*Pr* > *t*) are testing the hypotheses that the estimates of the means = 0. This is not usually of interest. Notice also that we see a single value for the standard error based on the MSE from the ANOVA, rather than a separate standard error for each mean (as we would get from Proc Summary for the sample means). Again in this example, with equal sample sizes and no covariates, the LSmeans will be identical with the ordinary means we would see with the Summary Procedure.

The results of the Tukey test were:

Tukey Grouping for fert*species Least Squares Means (Alpha=0.05) |
||||
---|---|---|---|---|

LS-means with the same letter are not significantly different | ||||

fert | species | Estimate | ||

f3 | SppB | 37.0667 | A | |

f1 | SppB | 31.6167 | B | |

B | ||||

f2 | SppB | 30.0500 | B | |

B | ||||

f3 | SppA | 29.2000 | C | B |

C | B | |||

f1 | SppA | 28.6000 | C | B |

C | ||||

f2 | SppA | 25.8667 | C | D |

D | ||||

control | SppB | 23.7000 | E | D |

E | ||||

control | SppA | 21.0000 | E |

The PLM procedure produces the graph which we can then label, using the Graphics Editor, with the mean comparison lettering:

If a traditional bar chart is desired, you can create the graph in Excel by first pasting the LSmeans and standard errors into an Excel spreadsheet, sorting the LSmeans by species, then creating two series: one for SppA, and then another series for SppB. When we plot the means by series, we get a graph of the response (Height) vs. Fertilizer with the second crossed factor (Species) indicated in a legend:

*Figure Caption: Mean plant height for the four fertilizer levels for each of the two plant species. Bar height indicates the mean and error bars are +/- 1 standard error. Means sharing the same letter do not differ significantly at the 95% confidence level based on the Tukey mean comparison method. *

You can see here, as a finished product, we elected to show the standard errors associated with the LSmeans. Because LSmeans can differ from the original sample treatment level means, a safe bet is to plan on using the LSmeans and associated standard errors from the ANOVA for the finished plot. To be sure how best to represent means, you should plan on checking with refereed journal *Instructions to Authors* for style guidelines.

Using the information from the ANOVA, LSmeans, and mean comparisons to identify differences between treatment levels is an important final step. In other words, the ANOVA doesn't end with a *p*-value for an *F*-test.

Here is a brief discussion (video) of this and a couple of other thoughts on this topic.