3.6 - Greenhouse Example in Minitab & R

3.6 - Greenhouse Example in Minitab & R

One-Way ANOVA

  1. Step 1: Import the data

    The data (Lesson1 Data) can be copied and pasted from a word processor into a worksheet in Minitab: 

    &lesson 1 data in a table
  2. Step 2: Run the ANOVA

    To run the ANOVA, we use the sequence of tool-bar tabs: Stat > ANOVA > One-way…

    ANOVA window showing the STAT to ANOVA and One Way... choice tree

    You then get the pop-up box seen below. Be sure to select from the drop-down in the upper right 'Response data are in a separate column for each factor level':

    One Way ANOVA window with Responses field equal to Control F1 F2 F3

    Then we double-click from the left-hand list of factor levels to the input box labeled ‘Responses’, and then click on the box labeled Comparisons.

    One Way ANOVA Comparisons window with with Tukey checkbox marked

    We check the box for Tukey and then exit by clicking on OK . To generate the Diagnostics, we then click on the box for Graphs and select the "Three in one" option:

    One Way ANOVA: Graphs window with 'three in one' option selected

    You can now ‘back out' by clicking on OK in each nested panel.

  3. Step 3: Results

    Now in the Session Window, we see the ANOVA table along with the results of the Tukey Mean Comparison:

    One-Way ANOVA: Control, F1, F2, F3

    Method

    Null Hypothesis: All means are equal

    Alternative Hypothesis: Not all means are equal

    Significance Level: \(\alpha=0.05\)

    Equal variances were assumed for the analysis.

    Factor Information

    Factor Levels Values
    Factor 4 Control, F1, F2, F3

    Analysis of Variance

    Source DF Adj SS Adj MS F-Value P-Value
    Factor 3 251.44 83.813 27.46 0.000
    Error 20 61.03 3.052    
    Total 23 312.47      
     

    (Extracted from the output that follows from above.)

    Grouping Information Using Tukey Method

    N Mean Grouping
    F3 6 29.200 A    
    F1 6 28.600 A B  
    F2 6 25.867   B  
    Control 6 21.000     C

    Means that do not share a letter are significantly different.

    Tukey Simultatious 95% CIs graph

As can be seen, Minitab provides a difference in means plot, which can be conveniently used to identify the significantly different means by following the rule: if the confidence interval does not cross the vertical zero line, then the difference between the two associated means is statistically significant.

The diagnostic (residual) plots, as we asked for them, are in one figure:

Residual plots graphs for Control F1 F2 F3 variables

 

Note that the Normal Probability plot is reversed (i.e, the axes are switched) compared to the SAS output. Assessing straight line adherence is the same, and the residual analysis provided is comparable to SAS output.

First, load the greenhouse data in R. Note the greenhouse data are in separate columns.

setwd("~/path-to-folder/")
greenhouse_data<-read.table("greenhouse_data.txt",header=T)

We put our data in a stacked format; the first column has the response variable (height) and the second column has the treatment levels (fert). We can then attach the stacked dataset.

my_data<-stack(greenhouse_data)
colnames(my_data)<-c("height","fert")
attach(my_data)

To calculate the overall mean, standard deviation, and standard error we can use the following commands:

overall_mean<-mean(height)
overall_mean
[1] 26.16667
overall_sd<-sd(height) 
overall_sd
[1] 3.685892
overall_standard_error<-overall_sd/sqrt(length(height)) 
overall_standard_error
[1] 0.7523795

Next, we calculate the mean, standard deviation, and standard error for each group.

aggregate(height~fert, FUN=mean) 

     fert   height
1 Control 21.00000
2      F1 28.60000
3      F2 25.86667
4      F3 29.20000
aggregate(height~fert, FUN=sd) 
     fert   height
1 Control 1.000000
2      F1 2.437212
3      F2 1.899123
4      F3 1.288410
std_error_fn = function(x){ sd(x)/sqrt(length(x)) }
aggregate(height~fert, FUN=std_error_fn) 
     fert    height
1 Control 0.4082483
2      F1 0.9949874
3      F2 0.7753136
4      F3 0.5259911

Next, we produce a boxplot.

boxplot(height~fert,xlab="Fertilizer",ylab="Plant Height",
         main="Distribution of Plant Heights by Fertilizer")
Boxplot of Distribution of Plant Height by Fertilizer

Next, we run the ANOVA model. The most commonly used function for ANOVA in R is aov. However, this function implements a sequential sum of squares (Type 1). To implement Type 3 sum of squares, we utilize the Anova function within the car package. Type 3 hypotheses are only used with effects coding for categorical variables (which we will learn more about in later lessons). Therefore before we obtain the ANOVA table, we must change the coding options to contr.sum, i.e. effect coding.

options(contrasts=c("contr.sum","contr.poly"))

library(car)
lm1 = lm(height ~ fert, data=my_data)
aov3 = Anova(lm1, type=3)
aov3
Anova Table (Type III tests)

Response: height
             Sum Sq Df  F value    Pr(>F)    
(Intercept) 16432.7  1 5384.817 < 2.2e-16 ***
fert          251.4  3   27.465 2.712e-07 ***
Residuals      61.0 20                       
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

We can see the sum of squares in the first column, the degrees of freedom in the second column, the F-test statistic in the third column, and finally, we can see the p-value in the fourth column.

Note the output doesn't give the SSTotal. To find it, use the identity \(SSTotal=SSTreatment+SSResidual=251.44+61.03=312.47\). Similarly for the df associated with \(SSTotal\), we can add the df of \(SSTreatment\) and \(SSResidual\).

We can examine the LS means interval plot using the emmeans function (from the emmeans package) applied to the output of the aov function.

aov1 = aov(lm1) # equivalent to aov1 = aov(height ~ fert)
library(emmeans)
lsmeans = emmeans(aov1, ~fert)
plot(lsmeans, xlab="height LS-Means with 95% CIs")
LS means plot for height vs fert

 

To obtain the Tukey pairwise comparisons of means with a 95% family-wise confidence level we use the following commands:

tpairs = TukeyHSD(aov1)
tpairs
plot(tpairs)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = lm1)

$fert
                diff        lwr         upr     p adj
F1-Control  7.600000  4.7770648 10.42293521 0.0000016
F2-Control  4.866667  2.0437315  7.68960188 0.0005509
F3-Control  8.200000  5.3770648 11.02293521 0.0000005
F2-F1      -2.733333 -5.5562685  0.08960188 0.0598655
F3-F1       0.600000 -2.2229352  3.42293521 0.9324380
F3-F2       3.333333  0.5103981  6.15626854 0.0171033
95% family-wise confidence level plot

Alternatively, the emmeans package can be used as follows (output not shown):

lsmeans = emmeans(aov1, ~fert)
tpairs2 = pairs(lsmeans,adjust="Tukey")
tpairs2
plot(tpairs2)

Finally, we can produce diagnostic (residual) plots as follows:

par(mfcol=c(2,2),mar=c(4.5,4.5,2,0)) # produces plots in one figure
plot(aov1,1)
plot(aov1,2)
hist(aov1$residuals,main="Residual Histogram",xlab="residuals")

combined image showing residuals vs fitted plot, residual histogram and Normal Q-Q plot


Legend
[1]Link
Has Tooltip/Popover
 Toggleable Visibility