14.1 - The Two-Stage Nested Design

When factor B is nested in factor A, the levels of B do not carry the same meaning under each level of A, even though they may share the same labels. For example, if A is school and B is teacher, teacher 1 is a different person at each school. Keep this in mind when deciding whether a design is crossed or nested: for the design to be crossed, the same teachers would need to teach at all the schools.

As another example, consider a company that purchases material from three suppliers, where the material comes in batches. We might have 4 batches from each supplier, but the batches do not have the same quality characteristics when purchased from different suppliers, so batches are nested within suppliers. To represent a nested factor in the model, the index of the batch must always carry the index of the factor in which it is nested. The linear statistical model for the two-stage nested design is:

\(y_{ijk}=\mu+\tau_i+\beta_{j(i)}+\varepsilon_{k(ij)}
\left\{\begin{array}{c}
i=1,2,\ldots,a \\
j=1,2,\ldots,b \\
k=1,2,\ldots,n
\end{array}\right. \)

The subscript \(j(i)\) indicates that the \(j^{th}\) level of factor B is nested under the \(i^{th}\) level of factor A. Furthermore, it is useful to think of replicates as being nested under the treatment combinations; thus, \(k(ij)\) is used for the error term. Because not every level of B appears with every level of A, there is no interaction between A and B. (In most of our designs, the error is nested in the treatments, but we only use this notation for error when there are other nested factors in the design.)
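To make the index \(j(i)\) concrete, here is a minimal simulation sketch of the model above. The function name `simulate_nested` and every parameter value are hypothetical, chosen only for illustration; it assumes factor A is fixed and factor B is random with normally distributed effects.

```python
import random

def simulate_nested(a, b, n, mu, tau, sigma_beta, sigma, seed=0):
    """Simulate y[i][j][k] = mu + tau[i] + beta_{j(i)} + eps_{k(ij)}.

    tau holds the a fixed effects of factor A. The b levels of factor B
    are drawn afresh inside every level of A, which is what nesting means.
    """
    rng = random.Random(seed)
    y = []
    for i in range(a):
        level_a = []
        for j in range(b):
            # beta_{j(i)}: a new random effect for level j of B within level i of A
            beta_ji = rng.gauss(0.0, sigma_beta)
            # eps_{k(ij)}: n replicate errors nested in the cell (i, j)
            cell = [mu + tau[i] + beta_ji + rng.gauss(0.0, sigma) for _ in range(n)]
            level_a.append(cell)
        y.append(level_a)
    return y

# Hypothetical settings: 3 levels of A, 4 levels of B within each, 3 replicates.
y = simulate_nested(a=3, b=4, n=3, mu=0.0, tau=[0.0, 0.0, 0.0],
                    sigma_beta=1.3, sigma=1.6)
print(len(y), len(y[0]), len(y[0][0]))
```

Note that "level 1 of B" under `i = 0` and "level 1 of B" under `i = 1` receive unrelated effects, which is why a B main effect or an A × B interaction has no meaning here.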

When B is a random factor nested in A, we think of its levels as the replicates for A. So whether factor A is fixed or random, the error term for testing the hypothesis about A is the mean square due to B(A), which is read "B nested in A". Table 14.1 displays the expected mean squares in the two-stage nested design for different combinations of factors A and B being fixed or random.

E(MS)	A Fixed	A Fixed	A Random
E(MS)	B Fixed	B Random	B Random
\(E(MS_A)\)	\(\sigma^{2}+\dfrac{b n \sum \tau_{i}^{2}}{a-1}\)	\(\sigma^{2}+n \sigma_{\beta}^{2}+\dfrac{b n \sum \tau_{i}^{2}}{a-1}\)	\(\sigma^{2}+n \sigma_{\beta}^{2}+b n \sigma_{\tau}^{2}\)
\(E(MS_{B(A)})\)	\(\sigma^2 + \dfrac{n \sum \sum \beta_{j(i)}^2}{a(b - 1)}\)	\(\sigma^{2}+n \sigma_{\beta}^{2}\)	\(\sigma^{2}+n \sigma_{\beta}^{2}\)
\(E(MS_E)\)	\(\sigma^2\)	\(\sigma^2\)	\(\sigma^2\)
Table 14.1 (Design and Analysis of Experiments, Douglas C. Montgomery, 7th and 8th Edition)
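
Read column by column, the expected mean squares in Table 14.1 suggest which ratio to use for each test. The following is a sketch of that reasoning, not a result quoted from the table. When B is random, setting \(\tau_i=0\) (A fixed) or \(\sigma_{\tau}^{2}=0\) (A random) reduces \(E(MS_A)\) to \(\sigma^{2}+n\sigma_{\beta}^{2}=E(MS_{B(A)})\), so the natural test statistics are

\(F_0=\dfrac{MS_A}{MS_{B(A)}}\) with \(a-1\) and \(a(b-1)\) degrees of freedom, and

\(F_0=\dfrac{MS_{B(A)}}{MS_E}\) with \(a(b-1)\) and \(ab(n-1)\) degrees of freedom.

When A and B are both fixed, \(E(MS_A)\) reduces to \(\sigma^{2}\) under the null hypothesis, so both A and B(A) are tested against \(MS_E\).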

The analysis of variance table is shown in Table 14.2.

Source of Variation	Sum of Squares	Degrees of Freedom	Mean Square
A	\(b n \sum\left(\overline{y}_{i . .}-\overline{y}_{\dots}\right)^{2}\)	a - 1	\(MS_A\)
B within A	\(n \sum \sum\left(\overline{y}_{i j .}-\overline{y}_{i . .}\right)^{2}\)	a(b - 1)	\(MS_{B(A)}\)
Error	\(\sum \sum \sum\left(y_{i j k}-\overline{y}_{i j .}\right)^{2}\)	ab(n - 1)	\(MS_E\)
Total	\(\sum \sum \sum\left(y_{i j k}-\overline{y}_{\dots}\right)^{2}\)	abn - 1
Table 14.2 (Design and Analysis of Experiments, Douglas C. Montgomery, 7th and 8th Edition)
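
The sums of squares in Table 14.2 can be computed directly from the cell, factor-A, and grand means. The sketch below assumes a balanced design stored as a nested list `y[i][j][k]`. The function name `nested_anova` and the toy data are hypothetical; the toy data are not the purity data of the example.

```python
def nested_anova(y):
    """Balanced two-stage nested ANOVA for y[i][j][k].

    i indexes levels of A, j indexes levels of B within A, k indexes replicates.
    Returns dictionaries of sums of squares, degrees of freedom, and mean squares.
    """
    a, b, n = len(y), len(y[0]), len(y[0][0])
    cell_mean = [[sum(cell) / n for cell in level] for level in y]  # ybar_ij.
    a_mean = [sum(row) / b for row in cell_mean]                     # ybar_i..
    grand = sum(a_mean) / a                                          # ybar_...

    ss_a = b * n * sum((m - grand) ** 2 for m in a_mean)
    ss_b_in_a = n * sum((cell_mean[i][j] - a_mean[i]) ** 2
                        for i in range(a) for j in range(b))
    ss_e = sum((obs - cell_mean[i][j]) ** 2
               for i in range(a) for j in range(b) for obs in y[i][j])
    ss_t = sum((obs - grand) ** 2
               for level in y for cell in level for obs in cell)

    ss = {"A": ss_a, "B(A)": ss_b_in_a, "Error": ss_e, "Total": ss_t}
    df = {"A": a - 1, "B(A)": a * (b - 1),
          "Error": a * b * (n - 1), "Total": a * b * n - 1}
    ms = {key: ss[key] / df[key] for key in ("A", "B(A)", "Error")}
    return ss, df, ms

# Toy data (hypothetical): a = 2, b = 2, n = 2.
toy = [[[1.0, 3.0], [4.0, 6.0]],
       [[2.0, 2.0], [7.0, 9.0]]]
ss, df, ms = nested_anova(toy)
print(ss, df, ms)
```

A useful check on any implementation is that the A, B(A), and Error sums of squares add up to the Total, and likewise for the degrees of freedom.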

Another way to think about this is to note that the batch is the experimental unit for the factor 'supplier'. Does it matter how many measurements you make on each batch? Yes: more measurements improve the precision with which each batch is measured. However, the variability among the batches from a supplier is the appropriate measure of variability for testing factor A, the suppliers.

Essentially the question that we want to answer is, "Is the purity of the material the same across suppliers?"

In this example the model assumes that the batches are random samples from each supplier, i.e. suppliers are fixed, the batches are random, and the observations are random.

Experimental design: Select four batches at random from each of three suppliers. Make three purity determinations from each batch. See the schematic representation of this design in Figure 14.1.

Figure 14.1 A two-stage nested design (Design and Analysis of Experiments, Douglas C. Montgomery, 7th and 8th Edition)

It is the average of the batches and the variability across the batches that are most important. When analyzing these data, we want to decide which supplier the company should use. This will depend on both the supplier mean and the variability among batches.

Here is the design question: How many batches should you take and how many measurements should you make on each batch? This will depend on the cost of performing a measurement versus the cost of getting another batch. If measurements are expensive, one could get many batches and take just a few measurements on each batch; if it is costly to get a new batch, you may want to spend more on taking many measurements per batch.

At a minimum, you need two measurements per batch (n = 2) so that you can estimate the variability among your measurements, \(\sigma^2\), and two batches per supplier (b = 2) so you can estimate the variability among batches, \(\sigma^{2}_{\beta}\). Some would say that you need at least three in order to be sure!

To repeat the design question: how large should b and n be, or, how many batches versus how many samples per batch? This will be a function of the cost of taking a measurement and the cost of getting another batch. In order to answer these questions, you need to know these cost functions. It will also depend on the variance among batches versus the variance of the measurements within batches.
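
The trade-off between b and n can be explored numerically. With batches random, the variance of a supplier mean based on b batches and n measurements per batch is \(\sigma_{\beta}^{2}/b+\sigma^{2}/(bn)\). The sketch below searches over (b, n) under an assumed linear cost model, where each batch costs `cost_batch` and each measurement costs `cost_meas`. The cost model, the function names, and every number are hypothetical.

```python
def supplier_mean_variance(b, n, var_batch, var_meas):
    """Variance of a supplier mean from b random batches with n measurements each."""
    return var_batch / b + var_meas / (b * n)

def best_allocation(budget, cost_batch, cost_meas, var_batch, var_meas,
                    max_b=50, max_n=50):
    """Return (variance, b, n) minimizing the supplier-mean variance within budget.

    Assumes total cost = b * (cost_batch + n * cost_meas), with b >= 2 and n >= 2
    so that both variance components remain estimable.
    """
    best = None
    for b in range(2, max_b + 1):
        for n in range(2, max_n + 1):
            cost = b * (cost_batch + n * cost_meas)
            if cost > budget:
                break  # a larger n only costs more for this b
            v = supplier_mean_variance(b, n, var_batch, var_meas)
            if best is None or v < best[0]:
                best = (v, b, n)
    return best

# Hypothetical inputs: batches cost ten times as much as a measurement.
best = best_allocation(budget=100, cost_batch=10, cost_meas=1,
                       var_batch=1.7, var_meas=2.6)
print(best)
```

Because \(\sigma_{\beta}^{2}/b\) does not shrink as n grows, extra measurements per batch help only up to a point; when the batch-to-batch variance dominates, the search tends to favor more batches.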

Minitab can provide the estimates of these variance components.

Minitab's General Linear Model (unlike SAS GLM) bases its F tests on the error term that the expected mean squares identify as appropriate. The program will tell us that when we test the hypothesis of no supplier effect, we should use the variation among batches (since Batch is random) as the error for the test.

Run the example given in Minitab Example14-1.mpx to see the test statistic, which is distributed as an F-distribution with 2 and 9 degrees of freedom.

Example 14.1: Practical Interpretation

There is no significant difference (p-value = 0.416) in purity among suppliers, but significant variation exists (p-value = 0.017) in purity among batches (within suppliers).

What are the practical implications of this conclusion?

Examine the residual plots. The plot of residuals versus supplier is very important (why?).

An assumption in the Analysis of Variance is that the variances are all equal. The measurement error should not depend on the batch means, i.e., the variation in measurement error is probably the same for a high-quality batch as it is for a low-quality batch. We also assume the variability among batches, \(\sigma^{2}_{\beta}\), is the same for all suppliers. This is an assumption that you will want to check, because the whole reason one supplier might be better than another could be that it has lower variation among its batches. We always need to know what assumptions we are making and whether they are true or not. Learning that an assumption has failed is often the most important thing to learn!
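
One informal way to look at the equal-batch-variance assumption is to compare, supplier by supplier, the sample variance of the batch means. This is only a sketch: the function name and the toy data are hypothetical, and within a supplier the variance of batch means estimates \(\sigma^{2}_{\beta}+\sigma^{2}/n\) rather than \(\sigma^{2}_{\beta}\) alone, so it is a rough screen, not a formal test.

```python
from statistics import mean, variance

def batch_mean_variance_by_supplier(y):
    """For y[i][j][k], return the sample variance of the batch means within each supplier."""
    return [variance([mean(batch) for batch in supplier]) for supplier in y]

# Toy data (hypothetical): 2 suppliers, 3 batches each, 2 measurements per batch.
toy_purity = [[[1.0, 3.0], [4.0, 6.0], [2.0, 4.0]],
              [[0.0, 2.0], [9.0, 11.0], [4.0, 6.0]]]
spread = batch_mean_variance_by_supplier(toy_purity)
print(spread)
```

With only a few batches per supplier these variances are very noisy, so large apparent differences should be read cautiously alongside the residual plots.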

What if we had incorrectly analyzed this experiment as a crossed factorial rather than a nested design? The inappropriate analysis of variance for crossed effects is shown in Table 14.5.

Source of Variation	Sum of Squares	Degrees of Freedom	Mean Square	\(F_0\)	P-Value
Suppliers (S)	15.06	2	7.53	1.02	0.42
Batches (B)	25.64	3	8.55	3.24	0.04
\(S \times B\) Interaction	44.28	6	7.38	2.80	0.03
Error	63.33	24	2.64
Total	148.31	35
Table 14.5 (Design and Analysis of Experiments, Douglas C. Montgomery, 7th and 8th Edition)

This analysis indicates that batches differ significantly and that there is significant interaction between batch and supplier. However, neither the main effect of Batch nor the interaction is meaningful, since batches are not the same across suppliers. Note that adding the Batch and S × B lines, for both the Sums of Squares and the Degrees of Freedom, gives the Batch(Supplier) line in the correct table.
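
The pooling of the Batches and S × B lines can be checked with a few lines of arithmetic on the entries of Table 14.5. The sketch below uses only the numbers printed in that table; the variable names are hypothetical. It assumes suppliers are fixed and batches are random, so suppliers are tested against Batch(Supplier) and batches against error.

```python
# Rows of Table 14.5 (the inappropriate crossed analysis): (sum of squares, df)
suppliers = (15.06, 2)
batches = (25.64, 3)
interaction = (44.28, 6)
error = (63.33, 24)

# Pool Batches and S x B to recover the Batch(Supplier) line of the nested analysis.
ss_b_in_s = batches[0] + interaction[0]
df_b_in_s = batches[1] + interaction[1]

ms_s = suppliers[0] / suppliers[1]
ms_b_in_s = ss_b_in_s / df_b_in_s
ms_e = error[0] / error[1]

# Nested F ratios with batches random.
f_suppliers = ms_s / ms_b_in_s   # 2 and 9 degrees of freedom
f_batches = ms_b_in_s / ms_e     # 9 and 24 degrees of freedom

# With 2 numerator df, the F upper-tail probability has the closed form
# (1 + 2 F / d2) ** (-d2 / 2), so no statistics library is needed for this test.
p_suppliers = (1 + 2 * f_suppliers / df_b_in_s) ** (-df_b_in_s / 2)

print(round(ss_b_in_s, 2), df_b_in_s)
print(round(f_suppliers, 2), round(f_batches, 2), round(p_suppliers, 3))
```

The supplier p-value computed this way agrees with the 0.416 reported in the practical interpretation. The p-value for batches needs a general F distribution function (for example `scipy.stats.f.sf`, if SciPy is available) because the closed form above holds only for 2 numerator degrees of freedom.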

For the model in which factor A is also a random effect, the analysis of variance method can be used to estimate all three components of variance:

\({\hat{\sigma}}^2=MS_E\)

\({\hat{\sigma}}^2_{\beta}=\frac{MS_{B(A)}-MS_E}{n}\)

and

\({\hat{\sigma}}^2_{\tau}=\frac{MS_A-MS_{B(A)}}{bn}\)
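
As a worked illustration of these three estimators, the sketch below plugs in mean squares derived from Table 14.5, with b = 4 batches per supplier and n = 3 determinations per batch. In the example itself suppliers are fixed, so treating A as random here is an assumption made purely to exercise the formulas. The function name is hypothetical.

```python
def variance_components(ms_a, ms_b_in_a, ms_e, b, n):
    """ANOVA-method estimates (sigma^2, sigma_beta^2, sigma_tau^2), A and B both random."""
    var_error = ms_e
    var_beta = (ms_b_in_a - ms_e) / n
    var_tau = (ms_a - ms_b_in_a) / (b * n)
    return var_error, var_beta, var_tau

# Mean squares derived from Table 14.5:
#   MS_A    = 15.06 / 2
#   MS_B(A) = (25.64 + 44.28) / 9   (Batches and S x B pooled)
#   MS_E    = 63.33 / 24
est = variance_components(15.06 / 2, (25.64 + 44.28) / 9, 63.33 / 24, b=4, n=3)
print([round(v, 2) for v in est])
```

Here \(MS_A < MS_{B(A)}\), so the estimate of \(\sigma^{2}_{\tau}\) comes out slightly negative. The analysis of variance method can produce negative estimates; a common convention is to report such an estimate as zero.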