3.1 - The Model | STAT 502

The effects model for one-way ANOVA is a linear additive statistical model that relates the response to the treatment and can be expressed as

\(Y_{ij}=\mu+\tau_i+\epsilon_{ij}\)

where \(\mu\) is the grand mean, \(\tau_i\) (\(i = 1, 2, \ldots, T\)) are the deviations from the grand mean due to the treatment levels, and \(\epsilon_{ij}\) are the error terms. The quantities \(\tau_i\) sum to zero and are also referred to as the treatment level effects. The errors account for the amount ‘left over’ after considering the grand mean and the effect of a particular treatment level.
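As a rough illustration of this decomposition, the sketch below splits each observation into an estimated grand mean, an estimated treatment effect, and a residual. The plant heights and treatment names are hypothetical values invented for the example, and the design is assumed to be balanced so that the estimated effects sum to zero around the overall sample mean.

```python
# Hypothetical plant heights (balanced design: 3 treatment levels, 4 plants each).
# These numbers are made up for illustration; they are not from the course data.
heights = {
    "control": [20.0, 21.0, 19.0, 20.0],
    "fert_a": [24.0, 25.0, 23.0, 24.0],
    "fert_b": [28.0, 27.0, 29.0, 28.0],
}


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Estimate of mu: the overall sample mean of all observations.
all_obs = [y for ys in heights.values() for y in ys]
grand_mean = mean(all_obs)

# Estimates of tau_i: treatment level mean minus the overall mean.
effects = {name: mean(ys) - grand_mean for name, ys in heights.items()}

# Estimates of epsilon_ij: what is left over after mu and tau_i.
residuals = {
    name: [y - grand_mean - effects[name] for y in ys]
    for name, ys in heights.items()
}

print("grand mean:", grand_mean)
print("treatment effects:", effects)
print("sum of effects:", sum(effects.values()))
```

Adding the grand mean, the effect for a treatment level, and a residual recovers the original observation, which is exactly the additive structure of the model.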

Here is the analogy in terms of the greenhouse experiment. Imagine someone unaware that different fertilizers were used inquiring about the plant heights on average. The overall sample mean, an estimate of the true grand mean, will be a suitable response to this inquiry. On the other hand, the overall mean would not be satisfactory information for the experimenter of the study who obviously suspects that there will be height differences among different fertilizer types. Instead, more informative to the experimenter are the plant height estimates after including the effect of the treatment \(\tau_i\).

Note: The actual plant height can never be known because there is an unknown measurement error associated with any observation. The unknown error associated with the ith treatment level and the jth observation is denoted \(\epsilon_{ij}\) (\(i = 1,2,..., T, j = 1,2,...,n_i\)). It is a random component (noise) that reflects the unexplained variability among plants within treatment levels.

Under the null hypothesis, where all treatment effects are zero, the reduced model can be written \(Y_{ij}=\mu+\epsilon_{ij}\).

Under the alternative hypothesis, where the treatment effects are not zero for at least one treatment level, the full model can be written \(Y_{ij}=\mu +\tau_i+\epsilon_{ij}\).

If \(SSE(R)\) denotes the error sum of squares associated with the reduced model and \(SSE(F)\) denotes the error sum of squares associated with the full model, we can use the General Linear Test approach to test the null hypothesis using the test statistic:

\(F=\frac{\left(\dfrac{SSE(R)-SSE(F)}{df_R-df_F}\right)}{\left(\dfrac{SSE(F)}{df_F}\right)}\)

Under the null hypothesis, this statistic has an F-distribution with numerator and denominator degrees of freedom equal to \(df_R-df_F\) and \(df_F\), respectively, where \(df_R\) is the degrees of freedom associated with \(SSE(R)\) and \(df_F\) is the degrees of freedom associated with \(SSE(F)\). The reduced model estimates a single mean, while the full model estimates one mean per treatment level, so \(df_R=N-1\) and \(df_F=N-T\), where \(N=\sum_{i=1}^{T} n_{i}\).

Also,

\(SSE(R) = \sum_i \sum_j (Y_{ij} - \bar{Y}_{..})^2=SS_{Total}\) (See Section 2.2)
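The corresponding quantity for the full model can be written out in the same way. Under the full model, the least-squares fitted value for each observation is its treatment level mean, written here as \(\bar{Y}_{i.}\) (a notation assumed by analogy with \(\bar{Y}_{..}\)), so

\(SSE(F) = \sum_i \sum_j (Y_{ij} - \bar{Y}_{i.})^2 = SSE\)

and the difference between the two error sums of squares is the treatment sum of squares:

\(SSE(R) - SSE(F) = \sum_i n_i (\bar{Y}_{i.} - \bar{Y}_{..})^2 = SS_{Trt}\), with \(df_R - df_F = (N-1)-(N-T) = T-1\).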

Therefore, the test statistic can be rewritten in terms of \(SS_{Total}\) and the error sum of squares from the full model (dropping the "\((F)\)" notation),

\begin{aligned} F &= \frac{\left(\dfrac{SS_{Total}-SSE}{T-1}\right)}{\left(\dfrac{SSE}{N-T}\right)} \\ &= \frac{\left(\dfrac{SS_{Trt}}{df_{Trt}}\right)}{\left(\dfrac{SSE}{df_{Error}}\right)} \\ &= \frac{MS_{Trt}}{MSE} \end{aligned}

Note that this is the same test statistic derived in Section 2.2 for testing the treatment significance. If we fail to reject the null hypothesis, then there is not sufficient evidence that the treatment effect is significant. If we reject the null hypothesis, then we conclude that the treatment effect is significant, which means that at least one treatment level mean differs from the others.
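The General Linear Test calculation described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library. The plant heights, treatment names, and the helper `general_linear_test` are hypothetical and introduced only for this example.

```python
# Hypothetical plant heights (3 treatment levels, 4 plants each).
# These numbers are made up for illustration; they are not from the course data.
heights = {
    "control": [20.0, 21.0, 19.0, 20.0],
    "fert_a": [24.0, 25.0, 23.0, 24.0],
    "fert_b": [28.0, 27.0, 29.0, 28.0],
}


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def general_linear_test(groups):
    """Return (F, numerator df, denominator df) for one-way ANOVA.

    Compares the reduced model (one common mean) with the
    full model (one mean per treatment level).
    """
    all_obs = [y for ys in groups.values() for y in ys]
    n_total = len(all_obs)   # N
    n_levels = len(groups)   # T

    # SSE(R): squared deviations from the overall mean (equals SS_Total).
    grand_mean = mean(all_obs)
    sse_reduced = sum((y - grand_mean) ** 2 for y in all_obs)

    # SSE(F): squared deviations from each treatment level mean.
    sse_full = sum(
        (y - mean(ys)) ** 2 for ys in groups.values() for y in ys
    )

    df_reduced = n_total - 1
    df_full = n_total - n_levels

    ms_trt = (sse_reduced - sse_full) / (df_reduced - df_full)
    mse = sse_full / df_full
    return ms_trt / mse, df_reduced - df_full, df_full


f_stat, df_num, df_den = general_linear_test(heights)
print(f"F = {f_stat:.2f} on ({df_num}, {df_den}) degrees of freedom")
```

For these made-up data, \(SSE(R)=134\) and \(SSE(F)=6\), giving an F statistic of about 96 on 2 and 9 degrees of freedom. A p-value would come from the upper tail of the corresponding F-distribution, for example with `scipy.stats.f.sf(f_stat, df_num, df_den)` if SciPy is available.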