13.3  Theoretical Results
So far, in an attempt to understand the analysis of variance method conceptually, we've been waving our hands at the theory behind the method. We can't procrastinate any further... we now need to address some of the theory behind the method. Specifically, we need to address the distribution of the error sum of squares (SSE), the distribution of the treatment sum of squares (SST), and the distribution of the all-important F-statistic.
The Error Sum of Squares (SSE)
Recall that the error sum of squares:
\(SS(E)=\sum\limits_{i=1}^{m}\sum\limits_{j=1}^{n_i} (X_{ij}-\bar{X}_{i.})^2\)
quantifies the error remaining after explaining some of the variation in the observations \(X_{ij}\) by the treatment means. Let's see what we can say about SSE. Well, the following theorem enlightens us as to the distribution of the error sum of squares.
If:

the \(j^{th}\) measurement of the \(i^{th}\) group, that is, \(X_{ij}\), is an independently and normally distributed random variable with mean \(\mu_i\) and variance \(\sigma^2\)

and \(W^2_i=\dfrac{1}{n_i-1}\sum\limits_{j=1}^{n_i} (X_{ij}-\bar{X}_{i.})^2\) is the sample variance of the \(i^{th}\) sample
Then:
\(\dfrac{SSE}{\sigma^2}\)
follows a chi-square distribution with n−m degrees of freedom.
Proof
A theorem we learned (way) back in Stat 414 tells us that if the two conditions stated in the theorem hold, then:
\(\dfrac{(n_i-1)W^2_i}{\sigma^2}\)
follows a chi-square distribution with \(n_i-1\) degrees of freedom. Another theorem we learned back in Stat 414 states that if we add up a bunch of independent chi-square random variables, then we get a chi-square random variable with the degrees of freedom added up, too. So, let's add up the above quantity for each of the m groups, that is, for \(i = 1\) to m. Doing so, we get:
\(\sum\limits_{i=1}^{m}\dfrac{(n_i-1)W^2_i}{\sigma^2}=\dfrac{\sum\limits_{i=1}^{m}\sum\limits_{j=1}^{n_i} (X_{ij}-\bar{X}_{i.})^2}{\sigma^2}=\dfrac{SSE}{\sigma^2}\)
Because we assume independence of the observations \(X_{ij}\), we are adding up independent chi-square random variables. (By the way, the assumption of independence is a perfectly fine assumption as long as we take a random sample when we collect the data.) Therefore, the theorem tells us that \(\dfrac{SSE}{\sigma^2}\) follows a chi-square distribution with:
\((n_1-1)+(n_2-1)+\cdots+(n_m-1)=n-m\)
degrees of freedom... as was to be proved.
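The theorem lends itself to a quick sanity check by simulation. The sketch below (group sizes, means, and \(\sigma\) are all made-up illustration values) repeatedly draws normal data for m = 3 groups, computes \(SSE/\sigma^2\), and compares its simulated mean and variance to those of a chi-square random variable with n − m degrees of freedom:

```python
import numpy as np

rng = np.random.default_rng(0)

sizes = [4, 6, 5]          # n_i for m = 3 groups, so n = 15
mus = [10.0, 12.0, 15.0]   # group means mu_i (SSE's distribution doesn't depend on them)
sigma = 2.0                # common standard deviation
n, m = sum(sizes), len(sizes)

reps = 20000
vals = np.empty(reps)
for r in range(reps):
    sse = 0.0
    for ni, mui in zip(sizes, mus):
        x = rng.normal(mui, sigma, ni)
        sse += np.sum((x - x.mean()) ** 2)   # sum over j of (X_ij - Xbar_i.)^2
    vals[r] = sse / sigma**2

# A chi-square(n - m) random variable has mean n - m = 12 and variance 2(n - m) = 24
print(vals.mean(), vals.var())
```

With 20,000 replications, the simulated mean and variance land close to 12 and 24, consistent with a chi-square(12) distribution.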
Now, what can we say about the mean square error MSE? Well, one thing is...
Recall that to show that MSE is an unbiased estimator of \(\sigma^2\), we need to show that \(E(MSE) = \sigma^2\). Also, recall that the expected value of a chi-square random variable is its degrees of freedom. The results of the previous theorem, therefore, suggest that:
\(E\left[ \dfrac{SSE}{\sigma^2}\right]=n-m\)
That said, here's the crux of the proof:
\(E[MSE]=E\left[\dfrac{SSE}{n-m} \right]=E\left[\dfrac{\sigma^2}{n-m} \cdot \dfrac{SSE}{\sigma^2} \right]=\dfrac{\sigma^2}{n-m}(n-m)=\sigma^2\)
The first equality comes from the definition of MSE. The second equality comes from multiplying MSE by 1 in a special way. The third equality comes from taking the expected value of \(\dfrac{SSE}{\sigma^2}\). And, the fourth and final equality comes from simple algebra.
Because \(E(MSE) = \sigma^2\), we have shown that, no matter what, MSE is an unbiased estimator of \(\sigma^2\)... always!
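The "no matter what" part is worth seeing empirically. In this sketch (made-up sizes, means, and \(\sigma\)), the group means are deliberately unequal, yet the long-run average of MSE still settles at \(\sigma^2\):

```python
import numpy as np

rng = np.random.default_rng(1)

sizes = [5, 7, 8]       # n_i for m = 3 groups
mus = [1.0, 4.0, 9.0]   # unequal means -- MSE is unbiased either way
sigma = 3.0             # so sigma^2 = 9 is the target
n, m = sum(sizes), len(sizes)

reps = 20000
mse = np.empty(reps)
for r in range(reps):
    sse = 0.0
    for ni, mui in zip(sizes, mus):
        x = rng.normal(mui, sigma, ni)
        sse += np.sum((x - x.mean()) ** 2)
    mse[r] = sse / (n - m)   # MSE = SSE / (n - m)

print(mse.mean())   # close to sigma^2 = 9 despite the unequal means
```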
The Treatment Sum of Squares (SST)
Recall that the treatment sum of squares:
\(SS(T)=\sum\limits_{i=1}^{m}\sum\limits_{j=1}^{n_i}(\bar{X}_{i.}-\bar{X}_{..})^2\)
quantifies the distance of the treatment means from the grand mean. We'll just state the distribution of SST without proof.
If the null hypothesis:
\(H_0: \text{all }\mu_i \text{ are equal}\)
is true, then:
\(\dfrac{SST}{\sigma^2}\)
follows a chi-square distribution with m−1 degrees of freedom.
When we investigated the mean square error MSE above, we were able to conclude that MSE was always an unbiased estimator of \(\sigma^2\). Can the same be said for the mean square due to treatment MST = SST/(m−1)? Well...
The mean square due to treatment is an unbiased estimator of \(\sigma^2\) only if the null hypothesis is true, that is, only if the m population means are equal.
Answer
Since MST is a function of the sum of squares due to treatment SST, let's start with finding the expected value of SST. We learned, on the previous page, that the definition of SST can be written as:
\(SS(T)=\sum\limits_{i=1}^{m}n_i\bar{X}^2_{i.}-n\bar{X}_{..}^2\)
Therefore, the expected value of SST is:
\(E(SST)=E\left[\sum\limits_{i=1}^{m}n_i\bar{X}^2_{i.}-n\bar{X}_{..}^2\right]=\left[\sum\limits_{i=1}^{m}n_iE(\bar{X}^2_{i.})\right]-nE(\bar{X}_{..}^2)\)
Now, because, in general, \(E(X^2)=Var(X)+\mu^2\), we can do some substituting into that last equation, which simplifies to:
\(E(SST)=\left[\sum\limits_{i=1}^{m}n_i\left(\dfrac{\sigma^2}{n_i}+\mu_i^2\right)\right]-n\left[\dfrac{\sigma^2}{n}+\bar{\mu}^2\right]\)
where:
\(\bar{\mu}=\dfrac{1}{n}\sum\limits_{i=1}^{m}n_i \mu_i\)
because:
\(E(\bar{X}_{..})=\dfrac{1}{n}\sum\limits_{i=1}^{m}\sum\limits_{j=1}^{n_i} E(X_{ij})=\dfrac{1}{n}\sum\limits_{i=1}^{m}\sum\limits_{j=1}^{n_i} \mu_i=\dfrac{1}{n}\sum\limits_{i=1}^{m}n_i \mu_i=\bar{\mu}\)
Simplifying our expectation yet more, we get:
\(E(SST)=\left[\sum\limits_{i=1}^{m}\sigma^2\right]+\left[\sum\limits_{i=1}^{m}n_i\mu^2_i\right]-\sigma^2-n\bar{\mu}^2\)
And, simplifying yet again, we get:
\(E(SST)=\sigma^2(m-1)+\left[\sum\limits_{i=1}^{m}n_i(\mu_i-\bar{\mu})^2\right]\)
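This last simplification rests on two small algebra facts, sketched here for completeness: summing the constant \(\sigma^2\) over the m groups gives \(m\sigma^2\), so \(m\sigma^2-\sigma^2=\sigma^2(m-1)\); and, using \(\sum n_i\mu_i = n\bar{\mu}\) and \(\sum n_i = n\), the weighted sum of squared deviations expands as:

```latex
\sum_{i=1}^{m} n_i(\mu_i-\bar{\mu})^2
  = \sum_{i=1}^{m} n_i\mu_i^2 - 2\bar{\mu}\sum_{i=1}^{m} n_i\mu_i + \bar{\mu}^2\sum_{i=1}^{m} n_i
  = \sum_{i=1}^{m} n_i\mu_i^2 - 2n\bar{\mu}^2 + n\bar{\mu}^2
  = \sum_{i=1}^{m} n_i\mu_i^2 - n\bar{\mu}^2
```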
Okay, so we've simplified E(SST) as far as is probably necessary. Let's use it now to find E(MST).
Well, if the null hypothesis is true, \(\mu_1=\mu_2=\cdots=\mu_m=\bar{\mu}\), say, the expected value of the mean square due to treatment is:
\(E[MST]=E\left[\dfrac{SST}{m-1}\right]=\sigma^{2}+\dfrac{1}{m-1} \color{red}\overbrace{\color{black}\sum\limits_{i=1}^{m} n_{i}\left(\mu_{i}-\bar{\mu}\right)^{2}}^0 \color{black}=\sigma^{2}\)
On the other hand, if the null hypothesis is not true, that is, if not all of the \(\mu_i\) are equal, then:
\(E(MST)=E\left[\dfrac{SST}{m-1}\right]=\sigma^2+\dfrac{1}{m-1}\sum\limits_{i=1}^{m} n_i(\mu_i-\bar{\mu})^2>\sigma^2\)
So, in summary, we have shown that MST is an unbiased estimator of \(\sigma^2\) if the null hypothesis is true, that is, if all of the means are equal. On the other hand, we have shown that, if the null hypothesis is not true, that is, if not all of the means are equal, then MST is a biased estimator of \(\sigma^2\) because E(MST) is inflated above \(\sigma^2\). Our proof is complete.
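A short simulation makes the contrast concrete (all numbers below are made-up illustration values): with equal means the long-run average of MST settles near \(\sigma^2\), while unequal means inflate it by exactly \(\sum n_i(\mu_i-\bar{\mu})^2/(m-1)\).

```python
import numpy as np

rng = np.random.default_rng(2)

def avg_mst(mus, sizes, sigma, reps=20000):
    """Average MST = SST/(m-1) over many simulated data sets."""
    m = len(sizes)
    total = 0.0
    for _ in range(reps):
        groups = [rng.normal(mu, sigma, ni) for mu, ni in zip(mus, sizes)]
        grand = np.concatenate(groups).mean()
        sst = sum(ni * (g.mean() - grand) ** 2 for g, ni in zip(groups, sizes))
        total += sst / (m - 1)
    return total / reps

sizes, sigma = [6, 6, 6], 2.0    # n = 18, m = 3, sigma^2 = 4

# H0 true: E(MST) = sigma^2 = 4
print(avg_mst([5.0, 5.0, 5.0], sizes, sigma))

# H0 false: E(MST) = sigma^2 + sum(n_i * (mu_i - mubar)^2) / (m - 1)
#         = 4 + (6*4 + 0 + 6*4) / 2 = 28
print(avg_mst([3.0, 5.0, 7.0], sizes, sigma))
```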
Our work on finding the expected values of MST and MSE suggests a reasonable statistic for testing the null hypothesis:
\(H_0: \text{all }\mu_i \text{ are equal}\)
against the alternative hypothesis:
\(H_A: \text{at least one of the }\mu_i \text{ differs from the others}\)
is:
\(F=\dfrac{MST}{MSE}\)
Now, why would this F be a reasonable statistic? Well, we showed above that \(E(MSE) = \sigma^2\). We also showed that under the null hypothesis, when the means are assumed to be equal, \(E(MST) = \sigma^2\), and under the alternative hypothesis when the means are not all equal, E(MST) is inflated above \(\sigma^2\). That suggests then that:

If the null hypothesis is true, that is, if all of the population means are equal, we'd expect the ratio MST/MSE to be close to 1.

If the alternative hypothesis is true, that is, if at least one of the population means differs from the others, we'd expect the ratio MST/MSE to be inflated above 1.
Now, just two questions remain:
 Why do you suppose we call MST/MSE an F-statistic?
 And, how inflated would MST/MSE have to be in order to reject the null hypothesis in favor of the alternative hypothesis?
Both of these questions are answered by knowing the distribution of MST/MSE.
The F-statistic
If \(X_{ij} \sim N(\mu, \sigma^2)\), then:
\(F=\dfrac{MST}{MSE}\)
follows an F distribution with m−1 numerator degrees of freedom and n−m denominator degrees of freedom.
Answer
It can be shown (we won't) that SST and SSE are independent. Then, it's just a matter of recalling that an F random variable is defined to be the ratio of two independent chi-square random variables, each divided by its degrees of freedom. That is:
\(F=\dfrac{SST/(m-1)}{SSE/(n-m)}=\dfrac{MST}{MSE} \sim F(m-1,n-m)\)
as was to be proved.
Now this all suggests that we should reject the null hypothesis of equal population means:
if \(F\geq F_{\alpha}(m-1,n-m)\) or if \(P=P(F(m-1,n-m)\geq F)\leq \alpha\)
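As a concrete sketch (the measurements below are made-up illustration values, not data from the course), the F-statistic and its p-value can be computed directly from the sums of squares and cross-checked against scipy's built-in one-way ANOVA:

```python
import numpy as np
from scipy import stats

# Made-up measurements for m = 3 treatment groups of 5 observations each
g1 = np.array([18.2, 20.1, 19.6, 17.8, 18.9])
g2 = np.array([22.4, 21.7, 23.1, 20.9, 22.0])
g3 = np.array([19.5, 18.7, 20.2, 19.9, 18.4])
groups = [g1, g2, g3]

n = sum(len(g) for g in groups)
m = len(groups)
grand = np.concatenate(groups).mean()

sst = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)  # between groups
sse = sum(np.sum((g - g.mean()) ** 2) for g in groups)       # within groups

F = (sst / (m - 1)) / (sse / (n - m))   # MST / MSE
p = stats.f.sf(F, m - 1, n - m)         # P(F(m-1, n-m) >= observed F)
print(F, p)

# scipy.stats.f_oneway performs the same test and agrees with the hand computation
F_sp, p_sp = stats.f_oneway(g1, g2, g3)
```

Rejecting the null hypothesis at level \(\alpha\) then amounts to checking whether \(p \leq \alpha\).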
If you go back and look at the assumptions that we made in deriving the analysis of variance F-test, you'll see that the F-test for the equality of means depends on three assumptions about the data:
 independence
 normality
 equal group variances
That means that you'll want to use the F-test only if there is evidence to believe that the assumptions are met. That said, as is the case with the two-sample t-test, the F-test works quite well even if the underlying measurements are not normally distributed, unless the data are highly skewed or the variances are markedly different. If the data are highly skewed, or if there is evidence that the variances differ greatly, we have two analysis options at our disposal. We could attempt to transform the observations (take the natural log of each value, for example) to make the data more symmetric with more similar variances. Alternatively, we could use nonparametric methods (that are unfortunately not covered in this course).