# Lesson 13: Experiments with Random Factors

Lesson 13: Experiments with Random Factors

## Introduction

Throughout most parts of this course we have discussed experiments with fixed factors. That is, the levels used for the factors are those of interest by the experimenter and the inference made is confined to those specific levels.  However, when factor levels are chosen at random from a larger population of potential levels, the factor is called a random factor. In this case, the statistical inference applies to the whole population of levels. Random factor models have many industrial applications including measurement system studies.

## Objectives

Upon successful completion of this lesson, you should be able to:

• Understanding the concept of random effect
• Getting familiar with random effect models and components of variance in each model
• Learning how to deal with models containing two random factors
• Getting familiar with how to analyze experiments where one of the factors is fixed and the other one is random
• Finding the expected mean squares using a simple algorithm

# 13.1 - Random Effects Models

13.1 - Random Effects Models

Imagine that we randomly select a of the possible levels of the factor of interest. In this case, we say that the factor is random. Typically random factors are categorical. While continuous covariates may be measured at random levels, we usually think of the effects as being systematic (such as linear, quadratic or even exponential) effects. Random effects are not systematic. The model helps make this clear.

As before, the usual single factor ANOVA applies which is

$$y_{ij}=\mu +\tau_i+\varepsilon_{ij} \left\{\begin{array}{c} i=1,2,\ldots,a \\ j=1,2,\ldots,n \end{array}\right.$$

However, herein, both the error term and treatment effects are random variables, that is

$$\varepsilon_{ij}\ \mbox{is }NID(0,\sigma^2)\mbox{ and }\tau_i\mbox{is }NID(0,\sigma^2_{\tau})$$

Also, $$\tau_{i}$$ and $$\epsilon_{ij}$$ are independent. The variances $$\sigma^{2}_{\tau}$$ and $$\sigma^{2}$$ are called variance components.

There might be some confusion about the differences between noise factors and random factors. Noise factors may be fixed or random. In Robust Parameter Designs we treat them as random because, although we control them in our experiment, they are not controlled under the conditions under which our system will normally be run. Factors are random when we think of them as a random sample from a larger population and their effect is not systematic.

It is not always clear when the factor is random. For example, if a company is interested in the effects of implementing a management policy at its stores and the experiment includes all 5 of its existing stores, it might consider "store" to be a fixed factor, because the levels are not a random sample. But if the company has 100 stores and picks 5 for the experiment, or if the company is considering a rapid expansion and is planning to implement the selected policy at the new locations as well, then "store" would be considered a random factor. We seldom consider random factors in $$2^k$$ or $$3^k$$ designs because 2 or 3 levels are not sufficient for estimating variances.

In the fixed effect models we test the equality of the treatment means. However, this is no longer appropriate because treatments are randomly selected and we are interested in the population of treatments rather than any individual one. The appropriate hypothesis test for a random effect is:

$$H_0 \colon \sigma^2_{\tau}=0$$
$$H_1 \colon \sigma^2_{\tau}>0$$

The standard ANOVA partition of the total sum of squares still works; and leads to the usual ANOVA display. However, as before, the form of the appropriate test statistic depends on the Expected Mean Squares. In this case, the appropriate test statistic would be

$$F_0=MS_{Treatments}/MS_E$$

which follows an F distribution with a-1 and N-a degrees of freedom. Furthermore, we are also interested in estimating the variance components $$\sigma^{2}_{\tau}$$ and $$\sigma^{2}$$. To do so, we use the analysis of variance method which consists of equating the expected mean squares to their observed values.

$${\hat{\sigma}}^2=MS_E\ \mbox{and}\ {\hat{\sigma}}^2+n{\hat{\sigma}}^2_{\tau}=MS_{Treatments}$$

$${\hat{\sigma}}^2_{\tau}=\frac{MS_{Treatment}-MS_E}{n}$$

$${\hat{\sigma}}^2=MS_E$$

Potential problem that may arise here is that the estimated treatment variance component may be negative. It such a case, it is proposed to either consider zero in case of a negative estimate or use another method which always results in a positive estimate. A negative estimate for the treatment variance component can also be viewed as a evidence that the linear model in not appropriate, which suggests looking for a better one.

Example 3.11 from the text discusses a single random factor case about the difference of looms in a textile weaving company. Four looms have been chosen randomly from a population of looms within a weaving shed and four observations of fabric strength were made on each loom.The data obtained from the experiment are below.

 Loom Obs 1 Obs 2 Obs 3 Obs 4 row sum 1 98 97 99 96 390 2 91 90 93 92 366 3 96 95 97 95 383 4 95 96 99 98 388

Here is the Minitab output for this example using Stat > ANOVA > Balanced ANOVA command.

Factor Types Levels Values
Loom random 4 1 2 3 4
##### Analysis of Variance for y
Source DF SS MS F P
Loom 3 89.188 29.729 15.68 0.000
Error 12 22.750 1.896
Total 15 111.938
Source Variance Component Error term Expected Mean Square for Each Term (using unrestricted model)
1 Loom 6.958 2 (2) + 4(1)
2 Error 1.896   (2)

The interpretation made from the ANOVA table is as before. With the p-value equal to 0.000 it is obvious that the looms in the plant are significantly different, or more accurately stated, the variance component among the looms is significantly larger than zero. And confidence intervals can be found for the variance components. The $$100(1-\alpha)\%$$ confidence interval for $$\sigma^2$$ is

$$\dfrac{(N-a)MS_E}{\chi^2_{\alpha/2,N-a}} \leq \sigma^2\leq \dfrac{(N-a)MS_E}{\chi^2_{1-\alpha/2,N-a}}$$

Confidence intervals for other variance components are provided in the textbook. It should be noted that a closed form expression for the confidence interval on some parameters may not be obtained.

# 13.2 - Two Factor Factorial with Random Factors

13.2 - Two Factor Factorial with Random Factors

Imagine that we have two factors, say A and B, that both have a large number of levels which are of interest. We will choose a random levels of factor A and b random levels for factor B and n observations are made at each treatment combination. The corresponding linear model for this case and the respective variance components would be

$$y_{ijk}=\mu+\tau_i+\beta_j+(\tau\beta)_{ij}+\varepsilon_{ijk} \left\{\begin{array}{c} i=1,2,\ldots,a \\ j=1,2,\ldots,b \\ k=1,2,\ldots,n \end{array}\right.$$

$$V(\tau_i)=\sigma^2_\tau,\ V(\beta_j)=\sigma^2_\beta,\ V((\tau\beta)_{ij})=\sigma^2_{\tau\beta},\ V(\varepsilon_{ijk})=\sigma^2$$

$$V(y_{ijk})=\sigma^2_\tau+\sigma^2_\beta+\sigma^2_{\tau\beta}+\sigma^2$$

Where $$\tau_i, \beta_j , (\tau \beta)_{ij}$$ and \epsilon_{ijk}\) are all NID random variables with mean zero and variance as shown above. The relevant hypotheses that we are interested in testing are:

$$H_0 \colon \sigma^2_{\tau}=0\quad H_0 \colon \sigma^2_{\beta}=0\quad H_0 \colon \sigma^2_{\tau\beta}=0$$
$$H_1 \colon \sigma^2_{\tau}>0\quad H_1 \colon \sigma^2_{\beta}>0\quad H_1 \colon \sigma^2_{\tau\beta}>0$$

The numerical calculations for the analysis of variance are exactly like in the fixed effect case. However, we state once again, that to form the test statistics, the expected mean squares should be taken into account. We state the expected mean squares (EMS) here and assuming the hypothesis is true, we form the F test statistics, so that under that assumption, both the numerator and denominator of the F statistic have the same expectation. Note that the test for the main effects are no longer what they were in the fixed factor situation.

$$E(MS_A)=\sigma^2+n\sigma^2_{\tau\beta}+bn\sigma^2_\tau \Longrightarrow F_0=\frac{MS_A}{MS_{AB}}$$

$$E(MS_B)=\sigma^2+n\sigma^2_{\tau\beta}+an\sigma^2_\beta \Longrightarrow F_0=\frac{MS_B}{MS_{AB}}$$

$$E(MS_{AB})=\sigma^2+n\sigma^2_{\tau\beta}\qquad\quad \Longrightarrow F_0=\frac{MS_{AB}}{MS_{E}}$$

$$E(MS_E)=\sigma^2$$

Furthermore, variance components can again be estimated using the analysis of variance method by equating the expected mean squares to their observed values.

$${\hat{\sigma}}^2_{\tau}=\frac{MS_A-MS_{AB}}{bn}$$

$${\hat{\sigma}}^2_{\beta}=\frac{MS_B-MS_{AB}}{an}$$

$${\hat{\sigma}}^2_{\tau\beta}=\frac{MS_{AB}-MS_E}{n}$$

$${\hat{\sigma}}^2=MS_E$$

Example 13.2 in the textbook discusses a two-factor factorial with random effects on a measurement system capability study. These studies are often called gauge capability studies or gauge repeatability and reproducibility (R&R) studies. In this example three randomly selected operators are selected to measure twenty randomly selected parts, each part twice. Data obtained from the experiment is shown in Table 13.3. The variance components are

$$\sigma^2_y=\sigma^2_\tau+\sigma^2_\beta+\sigma^2_{\tau\beta}+\sigma^2$$

Typically, $$\sigma^2$$ is called gauge repeatability because it shows the variation of the same part measured by the same operator and $$\sigma^{2}_{\beta} + \sigma^{2}_{\tau \beta}$$

which reflects the variation resulting from operators is called gauge reproducibility. Table 13.4 shows the analysis using Minitab’s Balanced ANOVA command.

#### Table 13-4 Analysis of variance (Minitab Balanced ANOVA) for Example 13-2

##### Analysis of Variance (Balanced Design)
Factor Type Levels Values
part random 20 1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20
operator random 3 1 2 3
##### Analysis of Variance for Y
Source DF SS MS F P
part 19 1185.425 62.391 78.65 0.000
operator 2 2.617 1.308 1.84 0.173
part*operator 38 27.050 0.712 0.72 0.861
Error 60 59.500 0.992
Total 119 1274.592
Source Variance Component Error Term Expected mean Square for Each term (using unrestricted model)
1 part 10.2798 3 (4) + 2(3) + 6(1)
2 operator 0.0149 3 (4) + 2(3) + 40(2)
3 part*operator -0.1399 4 (4) + 2(3)
4 Error 0.9917   (4)
Table 13.4 (Design and Analysis of Experiments, Douglas C. Montgomery, 7th Edition)

As it can be seen, the only significant effect is part. Estimates for components of variance and expected mean square for each term are given at the lower part of the table. Notice that the estimated variance for interaction term part*Operator is negative. The fact that the p-value for the interaction term is large along with the negative estimate of its variance is a good sign that the interaction term is actually zero. Therefore, we can proceed and fit a reduced model without part*operator term. The analysis of variance for the reduced model can be found in Table 13.5.

#### Table 13-5 Analysis of Variance for the Reduced Model, Example 13-2

##### Analysis of Variance (Balanced Design)
Factor Type Levels Values
part random 20 1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20
operator random 3 1 2 3
##### Analysis of Variance for y
Source DF SS MS F P
part 19 1185.425 62.391 70.64 0.000
operator 2 2.617 1.308 1.48 0.232
Error 98 86.550 0.883
Total 119 1274.592
Source Variance Component Error Term Expected mean Square for Each term (using unrestricted model)
1 part 10.2513 3 (3) + 6(1)
2 operator 0.0106 3 (3) + 40(2)
3 Error 0.8832   (3)
Table 13.5 (Design and Analysis of Experiments, Douglas C. Montgomery, 7th Edition)

Since the interaction term is zero, both of the effects is tested against the error term. Estimates of the variance component are given below at lower part of the table. Furthermore, as mentioned before, estimate of the variance of the gauge can be achieved as

$$\hat{\sigma}^2_{gauge}=\hat{\sigma}^2+\hat{\sigma}^2_{\beta}=0.88+0.01=0.89$$

which is relatively small comparing to the variability of the product.

# 13.3 - The Two Factor Mixed Models

13.3 - The Two Factor Mixed Models

Next, consider the case that one of the factors is fixed, say A, and the other one (B) is a random factor. This case is called the two-factor mixed model and the linear statistical model and respective components of variance is

$$y_{ijk}=\mu+\tau_i+\beta_j+(\tau\beta)_{ij}+\varepsilon_{ijk} \left\{\begin{array}{c} i=1,2,\ldots,a \\ j=1,2,\ldots,b \\ k=1,2,\ldots,n \end{array}\right.$$

$$V(\beta_j)=\sigma^2_\beta,\ V((\tau\beta)_{ij})=((a-1)/a)\sigma^2_{\tau\beta},\ V(\varepsilon_{ijk})=\sigma^2$$

$$\sum\limits_{i=1}^a \tau_i=0$$

Here $$\tau_i$$ is a fixed effect but $$\beta_j$$ and $$(\tau \beta)_{ij}$$ are assumed to be random effects and $$\epsilon_{ijk}$$ is a random error. Furthermore, $$\beta_{j}$$ and $$\epsilon_{ijk}$$ are NID. The interaction effect is also normal but not independent. There often is a restriction imposed on the interaction which is

$$\sum\limits_{i=1}^a(\tau\beta)_{ij}=(\tau\beta)_{.j}=0 \qquad j=1,2,\ldots,b$$

Because of the sum of interaction effects over the levels of the fixed factor equals zero, this version of the mixed model is called the restricted model. There exists another model which does not include such a restriction and is discussed later. Neither of these models is "correct" or "wrong" - they are both theoretical models for how the data behave. They have different implications for the meanings of the variance components. The restricted model is often used in the ANOVA setting. The unrestricted model is often used for more general designs that include continuous covariates and repeated or spatially correlated measurements.

Once again the tests of hypotheses for the mixed-model are:

$$H_0 \colon \tau_i=0\qquad H_0 \colon \sigma^2_{\beta}=0\qquad H_0 \colon \sigma^2_{\tau\beta}=0$$
$$H_1 \colon \tau_i\neq 0\qquad H_1 \colon \sigma^2_{\beta}>0\qquad H_1 \colon \sigma^2_{\tau\beta}>0$$

Furthermore, test statistics which are based on the expected mean squares are summarized as follows

$$E(MS_A)=\sigma^2+n\sigma^2_{\tau\beta}+\frac{bn\sum\limits_{i=1}^a \tau^2_i}{a-1} \Longrightarrow F_0=\frac{MS_A}{MS_{AB}}$$

$$E(MS_B)=\sigma^2+an\sigma^2_{\beta}\qquad\qquad \Longrightarrow F_0=\frac{MS_B}{MS_E}$$

$$E(MS_{AB})=\sigma^2+n\sigma^2_{\tau\beta}\qquad\qquad \Longrightarrow F_0=\frac{MS_{AB}}{MS_{E}}$$

$$E(MS_E)=\sigma^2$$

In the mixed model, it is possible to estimate the fixed factor effects as before which are shown here:

$$\hat{\mu}=\bar{y}_{..}$$

$$\hat{\tau}_i=\bar{y}_{i..}-\bar{y}_{...}$$

The variance components can be estimated using the analysis of variance method by equating the expected mean squares to their observed values:

$${\hat{\sigma}}^2_{\beta}=\frac{MS_B-MS_E}{an}$$

$${\hat{\sigma}}^2_{\tau\beta}=\frac{MS_{AB}-MS_E}{n}$$

$${\hat{\sigma}}^2=MS_E$$

Example 13.3 is the measurement system capability experiment where here we assume the operator has become a fixed factor while part is left as a random factor. Assuming the restricted version of the mixed effect model, Minitab’s balanced ANOVA routine output is given as follows.

#### The Restricted Model is Assumed.

##### Analysis of Variance (Balanced Design)
Factor Type Levels Values
part random 20 1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20
operator random 3 1 2 3
##### Analysis of Variance for Y
Source DF SS MS F P
part 19 1185.425 62.391 62.92 0.000
operator 2 2.617 1.308 1.84 0.173
part*operator 38 27.050 0.712 0.72 0.861
Error 60 59.500 0.992
Total 119 1274.592
Source Variance Component Error Term Expected mean Square for Each term (using unrestricted model)
1 part 10.2332 4 (4) + 6(1)
2 operator   3 (4) + 2(3) + 40Q
3 part*operator -0.1399 4 (4) + 2(3)
4 Error 0.9917   (4)
Table 13.6 (Design and Analysis of Experiments, Douglas C. Montgomery, 7th Edition)

Like before, there exists a large effect of parts, small operator effect and no part*operator interaction. Notice that again the variance component estimate for the part*operator interaction is negative, which considering its insignificant effect, leads us to assume it is zero and to delete this term from the model.

As mentioned before, there exist alternative analyses for the mixed effect models which are called the unrestricted mixed models. The linear statistical model and components of variance for the unrestricted mixed model are given as:

$$y_{ijk}=\mu+\alpha_i+\gamma_j+(\alpha\gamma)_{ij}+\varepsilon_{ijk} \left\{\begin{array}{c} i=1,2,\ldots,a \\ j=1,2,\ldots,b \\ k=1,2,\ldots,n \end{array}\right.$$

$$V(\gamma_j)=\sigma^2_\beta,\ V((\alpha\gamma)_{ij})=\sigma^2_{\alpha\gamma},\ V(\varepsilon_{ijk})=\sigma^2$$

$$\sum\limits_{i=1}^a \alpha_i=0$$

In the unrestricted mixed model, all of the random terms are assumed to be Normally and independently distributed (NID) and there is not a restriction on the interaction term which was previously imposed. As before, the relevant tests of hypotheses are given by:

$$H_0 \colon \alpha_i=0\qquad H_0 \colon \sigma^2_{\gamma}=0\qquad H_0 \colon \sigma^2_{\alpha\gamma}=0$$
$$H_1 \colon \alpha_i\neq 0\qquad H_1 \colon \sigma^2_{\gamma}>0\qquad H_1 \colon \sigma^2_{\alpha\gamma}>0$$

And the expected mean squares which determine the test statistics are

$$E(MS_A)=\sigma^2+n\sigma^2_{\alpha\gamma}+\frac{bn\sum\limits_{i=1}^a \alpha^2_i}{a-1} \Longrightarrow F_0=\frac{MS_A}{MS_{AB}}$$

$$E(MS_B)=\sigma^2+n\sigma^2_{\alpha\gamma}+an\sigma^2_\gamma\quad \Longrightarrow F_0=\frac{MS_B}{MS_{AB}}$$

$$E(MS_{AB})=\sigma^2+n\sigma^2_{\alpha\gamma}\qquad\qquad \Longrightarrow F_0=\frac{MS_{AB}}{MS_{E}}$$

$$E(MS_E)=\sigma^2$$

Again, to estimate the variance components, the analysis of variance method is used and the expected mean squares are equated to their observed values which result in:

$${\hat{\sigma}}^2_{\gamma}=\frac{MS_B-MS_{AB}}{an}$$

$${\hat{\sigma}}^2_{\alpha\gamma}=\frac{MS_{AB}-MS_E}{n}$$

$${\hat{\sigma}}^2=MS_E$$

Example 13.4 uses the unrestricted mixed model to analyze the measurement systems capability experiment. The Minitab solution for this unrestricted model is given here:

#### Table 13-7 Analysis of the Experiment in Example 13-3 Using the Unrestricted Model

##### Analysis of Variance (Balanced Design)
Factor Type Levels Values
part random 20 1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20
operator random 3 1 2 3
##### Analysis of Variance for y
Source DF SS MS F P
part 19 1185.425 62.391 87.65 0.000
operator 2 2.617 1.308 1.84 0.173
part*operator 38 27.050 0.712 0.72 0.861
Error 60 59.500 0.992
Total 119 1274.592
Source Variance Component Error Term Expected mean Square for Each term (using unrestricted model)
1 part 10.2798 3 (4) + 2(3) + 6(1)
2 operator   3 (4) + 2(3) + Q
3 part*operator -0.1399 4 (4) + 2(3)
4 Error 0.9917   (4)
Table 13.7 (Design and Analysis of Experiments, Douglas C. Montgomery, 7th Edition)

It is difficult to provide guidelines for when the restricted or unrestricted mixed model should be used, because statisticians do not fully agree on this. Fortunately, the inference for the fixed effects does not differ for the 2 factor mixed model which is most often seen, and is usually the same in more complicated models as well.

# 13.4 - Finding Expected Mean Squares

13.4 - Finding Expected Mean Squares

As we have demonstrated, determining the appropriate test statistics in the analysis of variance method depends on finding the expected mean squares. In complex design situations and in the presence of random or mixed models it is tedious to apply the expectation operator. Therefore, it would be helpful to have a formal procedure by which we could derive the expected mean squares for the different terms in the model. Page 523 has listed a set of rules which works for any set of balanced models to derive the expected mean squares. These rules are consistent with the restricted mixed model and can be modified to incorporate the unrestricted model assumptions, as well.

It is worth mentioning that the test statistic is a ratio of two mean squares where the expected value of the numerator mean square differs from the expected value of the denominator mean square by the variance component or the fixed factor in which we are interested. Therefore, under the assumption of the null hypothesis, both the numerator and the denominator of the F ratio have the same EMS.

# 13.5 - Approximate F Tests

13.5 - Approximate F Tests

Sometimes in factorial experiments with three or more factors involving a random or mixed model, we determine that there is no exact test statistic for certain effects in the model. Satterthwaite (1946) proposed a test procedure which uses the linear combinations of the original mean squares to form the F-ratio. These linear combinations of the original mean squares are sometimes called “synthetic” mean squares. Details on how to build the test statistic and adjustments made to degrees of freedom based on Satterthwaite procedure can be found in Section 13.6.

Minitab will analyze these experiments and derive “synthetic” mean squares, although their “synthetic” mean squares are not always the best choice. Approximate tests based on large samples (which use modified versions of the Central Limit Theorem) are also available. Unfortunately, this is another case in which it is not clear that there is a best method.

  Link ↥ Has Tooltip/Popover Toggleable Visibility