6.2 - Estimated Effects and the Sum of Squares from the Contrasts

How can we apply what we learned in the preceding section?

In general for \(2^k\) factorials the effect of each factor and interaction is:

\(\text{Effect} = \dfrac{1}{2^{k-1}n}\,[\text{contrast of the totals}]\)

We also defined the variance as follows:

\(\text{Variance(Effect)} = \dfrac{\sigma^2}{2^{k-2}n}\)

The true but unknown residual variance \(\sigma^2\), which is also called the within-cell variance, can be estimated by the MSE.

If we want to test whether an effect is zero, say A = 0, we can construct a t-test: the test statistic is the estimated effect divided by the square root of its estimated variance, as follows:

\(t^{\ast}=\dfrac{\text{Effect}}{\sqrt{\dfrac{\text{MSE}}{n2^{k-2}}}} \sim t(2^k (n-1))\)

where ~ means that it has a t distribution with \(2^{k}(n-1)\) degrees of freedom.

Finally, to complete the story, here is the equation for the sum of squares due to an effect:

\(\text{SS(Effect)} = \dfrac{(\text{contrast of totals})^{2}}{2^{k}n}\)
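As a minimal sketch of how these formulas fit together, the Python below computes the effect, its estimated variance, the t statistic, and the sum of squares from a single contrast of totals. The function name `effect_summary`, the \(2^2\) treatment totals, and the MSE value are all made up for illustration; none of them come from the text.

```python
import math

def effect_summary(contrast, k, n, mse):
    """Summarize one effect in a 2^k design with n replicates.

    `contrast` is the contrast of the treatment totals and `mse` is the
    estimate of sigma^2. Illustrative helper, not part of any library.
    """
    effect = contrast / (2 ** (k - 1) * n)      # Effect = contrast / (2^(k-1) n)
    var_effect = mse / (2 ** (k - 2) * n)       # estimated Var(Effect)
    t_stat = effect / math.sqrt(var_effect)     # t* = Effect / SE(Effect)
    ss = contrast ** 2 / (2 ** k * n)           # SS(Effect)
    df_error = 2 ** k * (n - 1)                 # degrees of freedom for t*
    return {"effect": effect, "var_effect": var_effect,
            "t": t_stat, "ss": ss, "df_error": df_error}

# Hypothetical 2^2 design with n = 3 replicates (totals are invented).
totals = {"(1)": 80, "a": 100, "b": 60, "ab": 90}
contrast_A = totals["a"] + totals["ab"] - totals["(1)"] - totals["b"]

assumed_mse = 4.0  # assumed value, for illustration only
result = effect_summary(contrast_A, k=2, n=3, mse=assumed_mse)
print(result)
```

Squaring `result["t"]` and comparing it with `result["ss"] / assumed_mse` gives a quick numerical check that the t-test and the one-degree-of-freedom F-test agree.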

Where does all of this come from? Each effect in a \(2^k\) model has one degree of freedom. In the simplest case, the \(2^2\) design, we have two main effects and one interaction, and each has 1 degree of freedom. The t statistic is the ratio of the effect to its estimated standard error (the estimated standard deviation of the effect). You will recall that if you square a t statistic with \(\nu\) degrees of freedom, you get an F distribution with one and \(\nu\) degrees of freedom.

\(t^2(\nu)=F(1,\nu)\)

We can use this fact to confirm the formulas just developed. Squaring the t statistic gives

\((t^{\ast}(\nu))^2=\dfrac{(\text{Effect})^2}{\text{MSE}/(n2^{k-2})}\)

and from the definition of an F-test, when the numerator has 1 degree of freedom:

\(F(1,\nu)=\dfrac{\text{SS(Effect)}/1}{\text{MSE}}=\dfrac{(\text{contrast})^2}{2^kn\,\text{MSE}}\)

But from the definition of an effect, we can write \((\text{Effect})^2 = (\text{contrast})^2 / (n2^{k-1})^{2}\), and thus \(F(1, \nu) = (t^{\ast}(\nu))^2\), which you can show with some algebra or by working through an example.

Hint: Multiply \(F(1, \nu)\) by \((2^{(k-1)}n)^{2} / (2^{(k-1)}n)^{2}\) and simplify.
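Following the hint, one way to write out the algebra step by step, using only the definitions above, is:

\(F(1,\nu)=\dfrac{(\text{contrast})^2}{2^{k}n\,\text{MSE}}=\dfrac{(\text{contrast})^2}{(2^{k-1}n)^2}\cdot\dfrac{(2^{k-1}n)^2}{2^{k}n\,\text{MSE}}\)

\(\phantom{F(1,\nu)}=(\text{Effect})^2\cdot\dfrac{2^{k-2}n}{\text{MSE}}=\dfrac{(\text{Effect})^2}{\text{MSE}/(n2^{k-2})}=(t^{\ast}(\nu))^2\)

The middle step uses \((2^{k-1}n)^2/(2^{k}n) = 2^{2k-2}n^2/(2^{k}n) = 2^{k-2}n\).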

Once you have these contrasts, you can easily calculate the effect, the estimated variance of the effect, and the sum of squares due to the effect.

Creating a Factorial Design in Minitab

Let's use Minitab to help us create a factorial design and then add data so that we can analyze it. The data come from Figure 6.1.

The video demonstrations are based on Minitab v19.

In Minitab, we use the commands under Stat > Design of Experiments to create our full factorial design. We will come back to this menu another time to look at fractional factorial and other types of factorial designs.

In the example shown above, we did not randomize the runs but kept them in standard order so that the order of the runs is easier to see. In practice, you would want to randomize the run order when you design the experiment.
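To make the difference between standard order and a randomized run order concrete, here is a small Python sketch that builds a replicated two-level full factorial in coded units and then shuffles it. This is not how Minitab does it internally; the function name `full_factorial` and the seed value are arbitrary choices for illustration.

```python
import itertools
import random

def full_factorial(factors, replicates):
    """Runs of a two-level full factorial in standard order, coded -1/+1.

    Illustrative helper: the first factor changes fastest within each
    replicate, giving the order (1), a, b, ab, c, ac, bc, abc for k = 3.
    """
    k = len(factors)
    runs = []
    for rep in range(1, replicates + 1):
        # itertools.product varies the LAST position fastest, so reverse
        # each tuple to make the FIRST factor vary fastest instead.
        for levels in itertools.product((-1, 1), repeat=k):
            run = dict(zip(factors, reversed(levels)))
            run["replicate"] = rep
            runs.append(run)
    return runs

design = full_factorial(["A", "B", "C"], replicates=2)  # 16 runs

# Completely randomize the run order; the seed is only for reproducibility.
rng = random.Random(503)
run_order = design[:]
rng.shuffle(run_order)

for i, run in enumerate(run_order, start=1):
    print(i, run)
```

The shuffle here randomizes all 16 runs together. If the replicates were run as separate blocks, you would instead shuffle within each replicate.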

Once we have created a factorial design within the Minitab worksheet, we need to add the response data so that the design can be analyzed. These response data, Yield, are the individual observations, not the totals. We then go back to the Stat > DOE > Factorial menu to analyze the data set from the factorial design.

We began with the full model with all the terms included, both the main effects and all of the interactions. From here we were able to determine which effects were significant and should remain in the model and which effects were not significant and can be removed to form a simpler reduced model.

A Second Example - The Plasma Etch Experiment

As in the previous example, this second industrial process example has three factors, A = Gap, B = Flow, and C = Power, and the response is y = Etch Rate. (The data are from Table 6-4 in the text.) Once again in Minitab we create a full factorial design for three factors with two replicates, which gives us 16 observations. Next, we add the response data, Etch Rate, to this worksheet and analyze the data set. These are the results we get:

Factorial Fit: EtchRate versus A, B, C

Estimated Effects and Coefficients for EtchRate (coded units)

Term        Effect     Coef  SE Coef      T      P
Constant             776.06    11.87  65.41  0.000
A          -101.62   -50.81    11.87  -4.28  0.003
B             7.37     3.69    11.87   0.31  0.764
C           306.13   153.06    11.87  12.90  0.000
A*B         -24.88   -12.44    11.87  -1.05  0.325
A*C        -153.63   -76.81    11.87  -6.47  0.000
B*C          -2.12    -1.06    11.87  -0.09  0.931
A*B*C         5.62     2.81    11.87   0.24  0.819

S = 47.4612   R-Sq = 96.61%   R-Sq(adj) = 93.64%

Analysis of Variance for EtchRate (coded units)

Source               DF  Seq SS  Adj SS  Adj MS      F      P
Main Effects          3  416378  416378  138793  61.62  0.000
2-Way Interactions    3   96896   96896   32299  14.34  0.001
3-Way Interactions    1     127     127     127   0.06  0.819
Residual Error        8   18020   18020    2253
Pure Error            8   18020   18020    2253
Total                15  531421

The table of estimated effects shows the individual effects and the coefficients (which are half of the effects), along with the corresponding t-tests. From these results we can see that the A and C effects are highly significant, while the B effect is not. Among the interactions, AB, BC, and ABC are not significant. The remaining interaction, AC, is significant.
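The SE Coef and T columns can be reproduced from the formulas earlier in this section. The sketch below takes the effects and the residual mean square as printed in the Minitab output (so they are already rounded) and recomputes each coefficient and t statistic. The variable names are my own.

```python
import math

k, n = 3, 2        # three factors, two replicates (from the text)
mse = 2253.0       # Adj MS for Residual Error, as printed in the output

# Estimated effects as printed in the full-model output (rounded values).
effects = {"A": -101.62, "B": 7.37, "C": 306.13, "A*B": -24.88,
           "A*C": -153.63, "B*C": -2.12, "A*B*C": 5.62}

# SE(Effect) = sqrt(MSE / (n 2^(k-2))); a coefficient is half an effect,
# so its standard error is half of SE(Effect).
se_effect = math.sqrt(mse / (n * 2 ** (k - 2)))
se_coef = se_effect / 2

t_values = {}
for term, effect in effects.items():
    coef = effect / 2
    t_values[term] = effect / se_effect   # identical to coef / se_coef
    print(f"{term:6s} coef = {coef:8.2f}   t = {t_values[term]:6.2f}")

print(f"SE Coef = {se_coef:.2f}")
```

Because the inputs are rounded, the recomputed values should match the printed table only up to rounding in the last digit.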

This is a nice example to illustrate the purpose of a screening design. You want to test a number of factors to see which ones are important. So what have we learned here? Two of these factors, A and C, are clearly important. But B appears not to be important, either as a main effect or within any interaction. It simply looks like random noise. B was the rate of gas flow across the etching process, and it does not seem to be an important factor in this process, at least for the levels of the factor used in the experiment.

The analysis of variance summary table results show us that the main effects overall are significant. That is because two of them, A and C, are highly significant. The two-way interactions overall are significant. That is because one of them is significant. So, just looking at this summary information wouldn't tell us what to do except that we could drop the 3-way interaction.
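To see how the grouped lines of the ANOVA table are built from the individual effects, the sketch below converts each printed effect into a one-degree-of-freedom sum of squares and then pools them by order (main effects, 2-way, 3-way). It relies on the identity \(\text{SS(Effect)} = n2^{k-2}(\text{Effect})^2\), which follows from substituting contrast \(= 2^{k-1}n \cdot \text{Effect}\) into the SS formula. Inputs are the rounded values from the Minitab output, and the variable names are illustrative.

```python
k, n = 3, 2
mse = 2253.0   # Adj MS for Residual Error, as printed

effects = {"A": -101.62, "B": 7.37, "C": 306.13, "A*B": -24.88,
           "A*C": -153.63, "B*C": -2.12, "A*B*C": 5.62}

def ss_effect(effect):
    """One-df sum of squares: SS = n * 2^(k-2) * Effect^2."""
    return n * 2 ** (k - 2) * effect ** 2

# Pool the single-df sums of squares by the order of the term:
# 1 = main effect, 2 = two-way interaction, 3 = three-way interaction.
group_ss = {1: 0.0, 2: 0.0, 3: 0.0}
group_df = {1: 0, 2: 0, 3: 0}
for term, effect in effects.items():
    order = term.count("*") + 1
    group_ss[order] += ss_effect(effect)
    group_df[order] += 1

group_f = {}
for order in (1, 2, 3):
    mean_square = group_ss[order] / group_df[order]
    group_f[order] = mean_square / mse
    print(f"order {order}: df = {group_df[order]}, "
          f"SS = {group_ss[order]:.0f}, F = {group_f[order]:.2f}")
```

The pooled F for the main effects is large because the A and C terms dominate the sum, which is why the grouped test alone cannot tell you that B is negligible.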

Now we can go back to Minitab, use the Analyze command under Design of Experiments, and remove all the effects that appear unimportant, namely every term involving B. Running this reduced model, we get:

Factorial Fit: EtchRate versus A, C

Estimated Effects and Coefficients for EtchRate (coded units)

Term        Effect     Coef  SE Coef      T      P
Constant             776.06    10.42  74.46  0.000
A          -101.62   -50.81    10.42  -4.88  0.000
C           306.13   153.06    10.42  14.69  0.000
A*C        -153.63   -76.81    10.42  -7.37  0.000

S = 41.6911   R-Sq = 96.08%   R-Sq(adj) = 95.09%

Analysis of Variance for EtchRate (coded units)

Source               DF  Seq SS  Adj SS  Adj MS       F      P
Main Effects          2  416161  416161  208080  119.71  0.000
2-Way Interactions    1   94403   94403   94403   54.31  0.000
Residual Error       12   20858   20858    1738
Pure Error           12   20858   20858    1738
Total                15  531421

For this model, all three terms are significant.
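One way to understand the reduced-model output is that the sums of squares of the dropped terms (B, AB, BC, ABC) are pooled into the residual, which raises the error degrees of freedom from 8 to 12. The sketch below reconstructs the reduced-model residual and the standard error of a coefficient from the rounded full-model values. The variable names are my own, and small discrepancies from the printed output are due to rounding.

```python
import math

k, n = 3, 2
ss_error_full, df_error_full = 18020.0, 8   # Residual Error, full model

# Effects of the terms removed from the model (rounded, full-model output).
dropped = {"B": 7.37, "A*B": -24.88, "B*C": -2.12, "A*B*C": 5.62}

# Each dropped term contributes SS = n * 2^(k-2) * Effect^2 and 1 df.
ss_dropped = sum(n * 2 ** (k - 2) * e ** 2 for e in dropped.values())
ss_error_reduced = ss_error_full + ss_dropped
df_error_reduced = df_error_full + len(dropped)
mse_reduced = ss_error_reduced / df_error_reduced

# SE of a coefficient = sqrt(MSE / (n 2^k)); coefficient = Effect / 2.
se_coef_reduced = math.sqrt(mse_reduced / (n * 2 ** k))
t_A = (-101.62 / 2) / se_coef_reduced

print(f"Residual SS = {ss_error_reduced:.0f} on {df_error_reduced} df")
print(f"MSE = {mse_reduced:.0f}, SE Coef = {se_coef_reduced:.2f}, "
      f"t(A) = {t_A:.2f}")
```

The effect estimates themselves do not change when terms are dropped, because the columns of a \(2^k\) design are orthogonal. Only the error estimate, and therefore the t statistics, change.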