3.6 - Greenhouse Example in Minitab & R

One-Way ANOVA

Example
Example

Step 1: Import the data
The data (Lesson1 Data) can be copied and pasted from a word processor into a worksheet in Minitab:
Step 2: Run the ANOVA
To run the ANOVA, we use the sequence of tool-bar tabs: Stat > ANOVA > One-way…

You then get the pop-up box seen below. Be sure to select from the drop-down in the upper right 'Response data are in a separate column for each factor level':

Then we double-click from the left-hand list of factor levels to the input box labeled ‘Responses’, and then click on the box labeled Comparisons.

We check the box for Tukey and then exit by clicking on OK . To generate the Diagnostics, we then click on the box for Graphs and select the "Three in one" option:

You can now ‘back out' by clicking on OK in each nested panel.

Step 3: Results

Now in the Session Window, we see the ANOVA table along with the results of the Tukey Mean Comparison:

One-Way ANOVA: Control, F1, F2, F3

Method

Null Hypothesis: All means are equal

Alternative Hypothesis: Not all means are equal

Significance Level: \(\alpha=0.05\)

Equal variances were assumed for the analysis.

Factor Information

Factor	Levels	Values
Factor	4	Control, F1, F2, F3

Analysis of Variance

Source	DF	Adj SS	Adj MS	F-Value	P-Value
Factor	3	251.44	83.813	27.46	0.000
Error	20	61.03	3.052
Total	23	312.47

(Extracted from the output that follows from above.)

Grouping Information Using Tukey Method

N		Mean	Grouping
F3	6	29.200	A
F1	6	28.600	A	B
F2	6	25.867		B
Control	6	21.000			C

Means that do not share a letter are significantly different.

As can be seen, Minitab provides a difference in means plot, which can be conveniently used to identify the significantly different means by following the rule: if the confidence interval does not cross the vertical zero line, then the difference between the two associated means is statistically significant.

The diagnostic (residual) plots, as we asked for them, are in one figure:

Residual plots graphs for Control F1 F2 F3 variables

Note that the Normal Probability plot is reversed (i.e, the axes are switched) compared to the SAS output. Assessing straight line adherence is the same, and the residual analysis provided is comparable to SAS output.

First, load the greenhouse data in R. Note the greenhouse data are in separate columns.

setwd("~/path-to-folder/")
greenhouse_data<-read.table("greenhouse_data.txt",header=T)

We put our data in a stacked format; the first column has the response variable (height) and the second column has the treatment levels (fert). We can then attach the stacked dataset.

my_data<-stack(greenhouse_data)
colnames(my_data)<-c("height","fert")
attach(my_data)

To calculate the overall mean, standard deviation, and standard error we can use the following commands:

overall_mean<-mean(height)
overall_mean

[1] 26.16667

overall_sd<-sd(height) 
overall_sd

[1] 3.685892

overall_standard_error<-overall_sd/sqrt(length(height)) 
overall_standard_error

[1] 0.7523795

Next, we calculate the mean, standard deviation, and standard error for each group.

aggregate(height~fert, FUN=mean)

     fert   height
1 Control 21.00000
2      F1 28.60000
3      F2 25.86667
4      F3 29.20000

aggregate(height~fert, FUN=sd)

     fert   height
1 Control 1.000000
2      F1 2.437212
3      F2 1.899123
4      F3 1.288410

std_error_fn = function(x){ sd(x)/sqrt(length(x)) }
aggregate(height~fert, FUN=std_error_fn)

     fert    height
1 Control 0.4082483
2      F1 0.9949874
3      F2 0.7753136
4      F3 0.5259911

Next, we produce a boxplot.

boxplot(height~fert,xlab="Fertilizer",ylab="Plant Height",
         main="Distribution of Plant Heights by Fertilizer")

Boxplot of Distribution of Plant Height by Fertilizer

Next, we run the ANOVA model. The most commonly used function for ANOVA in R is aov. However, this function implements a sequential sum of squares (Type 1). To implement Type 3 sum of squares, we utilize the Anova function within the car package. Type 3 hypotheses are only used with effects coding for categorical variables (which we will learn more about in later lessons). Therefore before we obtain the ANOVA table, we must change the coding options to contr.sum, i.e. effect coding.

options(contrasts=c("contr.sum","contr.poly"))

library(car)
lm1 = lm(height ~ fert, data=my_data)
aov3 = Anova(lm1, type=3)
aov3

Anova Table (Type III tests)

Response: height
             Sum Sq Df  F value    Pr(>F)    
(Intercept) 16432.7  1 5384.817 < 2.2e-16 ***
fert          251.4  3   27.465 2.712e-07 ***
Residuals      61.0 20                       
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

We can see the sum of squares in the first column, the degrees of freedom in the second column, the F-test statistic in the third column, and finally, we can see the p-value in the fourth column.

Note the output doesn't give the SSTotal. To find it, use the identity \(SSTotal=SSTreatment+SSResidual=251.44+61.03=312.47\). Similarly for the df associated with \(SSTotal\), we can add the df of \(SSTreatment\) and \(SSResidual\).

We can examine the LS means interval plot using the emmeans function (from the emmeans package) applied to the output of the aov function.

aov1 = aov(lm1) # equivalent to aov1 = aov(height ~ fert)
library(emmeans)
lsmeans = emmeans(aov1, ~fert)
plot(lsmeans, xlab="height LS-Means with 95% CIs")

To obtain the Tukey pairwise comparisons of means with a 95% family-wise confidence level we use the following commands:

tpairs = TukeyHSD(aov1)
tpairs
plot(tpairs)

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = lm1)

$fert
                diff        lwr         upr     p adj
F1-Control  7.600000  4.7770648 10.42293521 0.0000016
F2-Control  4.866667  2.0437315  7.68960188 0.0005509
F3-Control  8.200000  5.3770648 11.02293521 0.0000005
F2-F1      -2.733333 -5.5562685  0.08960188 0.0598655
F3-F1       0.600000 -2.2229352  3.42293521 0.9324380
F3-F2       3.333333  0.5103981  6.15626854 0.0171033

Alternatively, the emmeans package can be used as follows (output not shown):

lsmeans = emmeans(aov1, ~fert)
tpairs2 = pairs(lsmeans,adjust="Tukey")
tpairs2
plot(tpairs2)

Finally, we can produce diagnostic (residual) plots as follows:

par(mfcol=c(2,2),mar=c(4.5,4.5,2,0)) # produces plots in one figure
plot(aov1,1)
plot(aov1,2)
hist(aov1$residuals,main="Residual Histogram",xlab="residuals")

combined image showing residuals vs fitted plot, residual histogram and Normal Q-Q plot