13.1 - Random Effects Models

Imagine that we randomly select \(a\) of the possible levels of the factor of interest. In this case, we say that the factor is random. Typically, random factors are categorical. While continuous covariates may be measured at random levels, we usually think of their effects as systematic (such as linear, quadratic, or even exponential). Random effects are not systematic. The model helps make this clear.

As before, the usual single-factor ANOVA model applies:

\(y_{ij}=\mu +\tau_i+\varepsilon_{ij}
\left\{\begin{array}{c}
i=1,2,\ldots,a \\
j=1,2,\ldots,n
\end{array}\right. \)

However, in the random effects model both the error term and the treatment effects are random variables; that is,

\(\varepsilon_{ij}\ \mbox{is }NID(0,\sigma^2)\mbox{ and }\tau_i\ \mbox{is }NID(0,\sigma^2_{\tau})\)

Also, \(\tau_{i}\) and \(\varepsilon_{ij}\) are independent. The variances \(\sigma^{2}_{\tau}\) and \(\sigma^{2}\) are called variance components.
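
To make these distributional assumptions concrete, here is a minimal simulation sketch in Python. The parameter values (\(\mu = 95\), \(\sigma_{\tau} = 2\), \(\sigma = 1\)) and the variable names are illustrative choices, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

a, n = 4, 4                              # randomly selected treatment levels, observations per level
mu, sigma_tau, sigma = 95.0, 2.0, 1.0    # illustrative values, not from the text

# tau_i ~ NID(0, sigma_tau^2): one random effect per selected treatment level
tau = rng.normal(0.0, sigma_tau, size=a)

# epsilon_ij ~ NID(0, sigma^2), independent of tau_i
eps = rng.normal(0.0, sigma, size=(a, n))

# y_ij = mu + tau_i + epsilon_ij
y = mu + tau[:, None] + eps
print(y.round(2))
```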

There might be some confusion about the differences between noise factors and random factors. Noise factors may be fixed or random. In Robust Parameter Designs we treat them as random because, although we control them in our experiment, they are not controlled under the conditions under which our system will normally be run. Factors are random when we think of them as a random sample from a larger population and their effect is not systematic.

It is not always clear when the factor is random. For example, if a company is interested in the effects of implementing a management policy at its stores and the experiment includes all 5 of its existing stores, it might consider "store" to be a fixed factor, because the levels are not a random sample. But if the company has 100 stores and picks 5 for the experiment, or if the company is considering a rapid expansion and is planning to implement the selected policy at the new locations as well, then "store" would be considered a random factor. We seldom consider random factors in \( 2^k\) or \( 3^k\) designs because 2 or 3 levels are not sufficient for estimating variances.

In the fixed effects model we test the equality of the treatment means. However, that is no longer appropriate here because the treatments are randomly selected and we are interested in the population of treatments rather than any individual one. The appropriate hypotheses for a random effect are:

\(H_0 \colon \sigma^2_{\tau}=0\)
\(H_1 \colon \sigma^2_{\tau}>0\)

The standard ANOVA partition of the total sum of squares still works and leads to the usual ANOVA display. However, as before, the form of the appropriate test statistic depends on the expected mean squares. In this case, the appropriate test statistic is

\(F_0=MS_{Treatments}/MS_E\)

which follows an \(F\) distribution with \(a-1\) and \(N-a\) degrees of freedom. Furthermore, we are also interested in estimating the variance components \(\sigma^{2}_{\tau}\) and \(\sigma^{2}\). To do so, we use the analysis of variance method, which consists of equating the expected mean squares to their observed values:

\({\hat{\sigma}}^2=MS_E\ \mbox{and}\ {\hat{\sigma}}^2+n{\hat{\sigma}}^2_{\tau}=MS_{Treatments}\)

Solving these equations for the variance components gives

\({\hat{\sigma}}^2_{\tau}=\dfrac{MS_{Treatments}-MS_E}{n}\)

\({\hat{\sigma}}^2=MS_E\)

A potential problem that may arise here is that the estimated treatment variance component may be negative. In such a case, it is proposed either to set the negative estimate to zero or to use another estimation method that always yields a positive estimate. A negative estimate of the treatment variance component can also be viewed as evidence that the linear model is not appropriate, which suggests looking for a better one.
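
As a sketch of the analysis of variance method just described, the following hypothetical helper function computes both estimates and truncates a negative treatment-variance estimate at zero, as discussed above.

```python
def variance_components(ms_treatments, ms_error, n):
    """ANOVA-method estimates for the single-factor random effects model.

    sigma^2 is estimated by MS_E; sigma_tau^2 by (MS_Treatments - MS_E)/n,
    set to zero if the difference is negative.
    """
    sigma2_hat = ms_error
    sigma2_tau_hat = max((ms_treatments - ms_error) / n, 0.0)
    return sigma2_hat, sigma2_tau_hat

# Using the mean squares from the loom example below:
print(variance_components(29.729, 1.896, n=4))   # -> (1.896, ~6.958)
```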

Example 3.11 from the text discusses a single random factor case concerning differences among looms in a textile weaving company. Four looms were chosen randomly from the population of looms within a weaving shed, and four observations of fabric strength were made on each loom. The data obtained from the experiment are below.

Loom   Obs 1   Obs 2   Obs 3   Obs 4   Row sum
  1      98      97      99      96      390
  2      91      90      93      92      366
  3      96      95      97      95      383
  4      95      96      99      98      388
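
Before turning to the Minitab output, here is a short Python sketch (the variable names are hypothetical) that computes the sums of squares, mean squares, \(F_0\), and the variance component estimates directly from these data.

```python
import numpy as np
from scipy import stats

# Fabric strength data: one row per loom, four observations per loom
y = np.array([
    [98, 97, 99, 96],
    [91, 90, 93, 92],
    [96, 95, 97, 95],
    [95, 96, 99, 98],
], dtype=float)

a, n = y.shape
N = a * n

grand_mean = y.mean()
ss_total = ((y - grand_mean) ** 2).sum()
ss_loom = n * ((y.mean(axis=1) - grand_mean) ** 2).sum()
ss_error = ss_total - ss_loom

ms_loom = ss_loom / (a - 1)
ms_error = ss_error / (N - a)

f0 = ms_loom / ms_error
p_value = stats.f.sf(f0, a - 1, N - a)

sigma2_hat = ms_error
sigma2_tau_hat = (ms_loom - ms_error) / n

print(f"SS_Loom = {ss_loom:.3f}, SS_E = {ss_error:.3f}")
print(f"MS_Loom = {ms_loom:.3f}, MS_E = {ms_error:.3f}")
print(f"F0 = {f0:.2f}, p = {p_value:.4f}")
print(f"sigma2_hat = {sigma2_hat:.3f}, sigma2_tau_hat = {sigma2_tau_hat:.3f}")
```

These calculations agree with the Minitab output shown below.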

Here is the Minitab output for this example, obtained using the Stat > ANOVA > Balanced ANOVA command.

Factor   Type     Levels  Values
Loom     random        4  1 2 3 4

Analysis of Variance for y

Source     DF        SS       MS      F      P
Loom        3    89.188   29.729  15.68  0.000
Error      12    22.750    1.896
Total      15   111.938

Source     Variance    Error   Expected Mean Square for Each Term
           component   term    (using unrestricted model)
1 Loom        6.958       2    (2) + 4(1)
2 Error       1.896            (2)

The interpretation of the ANOVA table is as before. With the p-value reported as 0.000, it is clear that the looms in the plant are significantly different, or more accurately stated, the variance component among the looms is significantly larger than zero. Confidence intervals can also be found for the variance components. The \(100(1-\alpha)\%\) confidence interval for \(\sigma^2\) is

\(\dfrac{(N-a)MS_E}{\chi^2_{\alpha/2,N-a}} \leq \sigma^2\leq \dfrac{(N-a)MS_E}{\chi^2_{1-\alpha/2,N-a}}\)
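
As a quick illustration, here is a minimal sketch that evaluates this interval for the loom example (\(N-a=12\), \(MS_E=1.896\), \(\alpha=0.05\)) using SciPy's chi-square quantile function. Note that \(\chi^2_{\alpha/2,N-a}\) in the formula denotes an upper-tail percentage point, which corresponds to chi2.ppf(1 - alpha/2, N - a) in SciPy.

```python
from scipy.stats import chi2

alpha = 0.05
df_error = 12        # N - a, error degrees of freedom from the loom example
ms_error = 1.896

# Upper-tail chi-square percentage points, matching the textbook notation
lower = df_error * ms_error / chi2.ppf(1 - alpha / 2, df_error)
upper = df_error * ms_error / chi2.ppf(alpha / 2, df_error)
print(f"95% CI for sigma^2: ({lower:.3f}, {upper:.3f})")   # roughly (0.97, 5.17)
```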

Confidence intervals for the other variance components are provided in the textbook. It should be noted that a closed-form expression for the confidence interval on some parameters may not be obtainable.