4.7 - Incomplete Block Designs
In incomplete block designs we use the notation t = # of treatments and define the block size as k. As you will see, in an incomplete block design k is less than t, so you cannot assign all of the treatments to each block. In short,
t = # of treatments,
k = block size,
b = # of blocks,
\(r_i\) = # of replicates for treatment i, in the entire design.
Remember that an equal number of replications is the best way to be sure that you have minimum variance if you're looking at all possible pairwise comparisons. If \(r_i = r\) for all treatments, the total number of observations in the experiment is N where:
\(N = tr = bk\)
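The counting identity can be used directly. Given t, k and b, the common replication r is bk/t. Below is a minimal Python sketch, assuming equal replication. The helper name `replication` is hypothetical.

```python
from fractions import Fraction

def replication(t: int, k: int, b: int) -> Fraction:
    """Hypothetical helper: r = bk / t, implied by N = t*r = b*k (equal replication assumed)."""
    return Fraction(b * k, t)

# Example 1 of this section: t = 4 treatments, blocks of size k = 3, b = 4 blocks.
t, k, b = 4, 3, 4
r = replication(t, k, b)
N = b * k
print(f"r = {r}, N = {N}")  # r = 3, N = 12
assert t * r == N  # counting observations by treatment or by block gives the same N
```

If r comes out fractional, equal replication is impossible for that choice of t, k and b.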
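The incidence matrix, which defines the design of the experiment, gives the number of observations \(n_{ij}\) of the \(i^{th}\) treatment in the \(j^{th}\) block. In general it looks like this:
\[
\begin{array}{c|cccc}
 & \text{Block } 1 & \text{Block } 2 & \cdots & \text{Block } b \\ \hline
\text{Treatment } 1 & n_{11} & n_{12} & \cdots & n_{1b} \\
\text{Treatment } 2 & n_{21} & n_{22} & \cdots & n_{2b} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\text{Treatment } t & n_{t1} & n_{t2} & \cdots & n_{tb}
\end{array}
\]
Here the rows are treatments 1, 2, ..., t and the columns are blocks 1, 2, ..., b. In a complete block design each treatment occurs once in every block, so every entry of this matrix is 1. In an incomplete block design the entries are 0s and 1s, simply indicating whether or not a treatment occurs in a block.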
Example 1
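The example that we will look at is Table 4.22 (4.21 in the 7th ed.). Here is the incidence matrix for this example:
\[
\begin{array}{c|cccc}
 & \text{Block } 1 & \text{Block } 2 & \text{Block } 3 & \text{Block } 4 \\ \hline
\text{Treatment } 1 & 1 & 1 & 0 & 1 \\
\text{Treatment } 2 & 0 & 1 & 1 & 1 \\
\text{Treatment } 3 & 1 & 1 & 1 & 0 \\
\text{Treatment } 4 & 1 & 0 & 1 & 1
\end{array}
\]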
Here we have t = 4 and b = 4 (four rows and four columns) and k = 3, so each block holds only three of the four treatments and leaves one treatment out. In this case the row sums (\(r_i\)) and the column sums (k) all equal 3.
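To make the incidence matrix concrete, here is a minimal Python sketch. It builds the matrix from a list of the treatments in each block, then computes the row sums \(r_i\) and the column sums k. The block contents come from the Example 1 data in the SAS program later in this section. The variable names are illustrative.

```python
# Block contents for Example 1, read off the data used in the SAS program later in this section.
blocks = {1: [1, 3, 4], 2: [1, 2, 3], 3: [2, 3, 4], 4: [1, 2, 4]}
treatments = [1, 2, 3, 4]

# Incidence matrix: rows are treatments, columns are blocks; n_ij = 1 if treatment i is in block j.
incidence = [[1 if i in blocks[j] else 0 for j in sorted(blocks)] for i in treatments]

for i, row in zip(treatments, incidence):
    print(f"trt {i}: {row}")

row_sums = [sum(row) for row in incidence]          # r_i, the replicates of each treatment
col_sums = [sum(col) for col in zip(*incidence)]    # k, the size of each block
print("r_i:", row_sums, " k:", col_sums)
```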
In general, the number of treatments is specified, and the block size k (the number of experimental units per block) is given; this is usually a constraint imposed by the experimental situation. The researcher must then decide how many blocks to run, and hence how many replicates the design provides, in order to achieve the precision or power wanted for the test.
Example 2
Here is another example of an incidence matrix for allocating treatments and replicates in an incomplete block design. Let's take k = 2, still with t = 4 and b = 4. That gives r = bk/t = 2. In this case we could design our incidence matrix so that treatments 1 and 3 share two of the blocks and treatments 2 and 4 share the other two.
This example has two observations per block so k = 2 in each case and for all treatments r = 2.
Balanced Incomplete Block Design (BIBD)
A BIBD is an incomplete block design in which every pair of treatments occurs together within a block the same number of times, \(\lambda\). In general, we write \(\lambda_{ii^\prime}\) for the number of blocks in which treatment \(i\) occurs together with treatment \(i^\prime\).
Let's look at the previous cases. How many times do treatments one and two occur together in the first example design?
They occur together in block 2 and again in block 4, so \(\lambda_{12} = 2\). Treatments one and three occur together in blocks one and two, so \(\lambda_{13} = 2\). In this design you can check all possible pairs: 1 and 4 occur together twice, as do 2 and 3, 2 and 4, and 3 and 4. For this design \(\lambda_{ii^\prime} = 2\) for every pair \(ii^\prime\), and this equality across all pairs is what we mean by balance in an incomplete block design.
If the number of times treatments occur together within a block is equal across the design for all pairs of treatments then we call this a balanced incomplete block design (BIBD).
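Counting the \(\lambda_{ii^\prime}\) by hand gets tedious as designs grow. Below is a minimal Python sketch that counts co-occurrences from a list of blocks and checks balance. The function names are hypothetical. The Example 1 blocks come from the SAS data later in this section.

```python
from itertools import combinations
from collections import Counter

def pair_counts(blocks):
    """lambda_{ii'}: the number of blocks containing both treatment i and treatment i'."""
    counts = Counter()
    for block in blocks:
        for pair in combinations(sorted(block), 2):
            counts[pair] += 1
    return counts

def is_balanced(blocks, treatments):
    """True if every pair of treatments (including pairs that never meet) has the same count."""
    counts = pair_counts(blocks)  # a Counter returns 0 for pairs it has never seen
    return len({counts[pair] for pair in combinations(sorted(treatments), 2)}) == 1

# Example 1, with block contents taken from the SAS data later in this section.
example1 = [[1, 3, 4], [1, 2, 3], [2, 3, 4], [1, 2, 4]]
print(sorted(pair_counts(example1).items()))
print(is_balanced(example1, [1, 2, 3, 4]))  # True: every lambda_{ii'} = 2
```

Now consider an arrangement like the one in Example 2, such as `[[1, 3], [2, 4], [1, 3], [2, 4]]` (the block order is assumed). For it, `is_balanced` returns False.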
Now look at the incidence matrix for the second example.
We can see that:
\(\lambda_{12} = 0\)
\(\lambda_{13} = 2\)
\(\lambda_{14} = 0\)
\(\lambda_{23} = 0\)
\(\lambda_{24} = 2\)
\(\lambda_{34} = 0\)
Here we have two pairs occurring together 2 times and the other four pairs occurring together 0 times. Therefore, this is not a balanced incomplete block design (BIBD).
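What else can we say about BIBDs?
We can express \(\lambda\) in terms of the design parameters when the block size k is constant and replication is equal, \(r_i = r\). Each treatment appears in r blocks and, in each of them, shares the block with k - 1 other treatments. That gives r(k - 1) pairings, which in a BIBD are spread equally over the t - 1 other treatments. So for a given t, k, and r:
\(\lambda = \dfrac{r(k-1)}{t-1}\)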
So, for the first example that we looked at earlier - let's plug in the values and calculate \(\lambda\):
\(\lambda = 3 (3 - 1) / (4 -1) = 2\)
Here is the key: if \(\lambda\) is not an integer, no balanced incomplete block design exists. An integer \(\lambda\) is necessary for a BIBD, although it does not by itself guarantee that one exists. Let's apply the formula to the second example, with \(t = 4\), \(k = 2\), \(r = 2\) and \(b = 4\):
\(\lambda = 2 (2 - 1) / (4 - 1) = 2/3 \approx 0.667\)
Since \(\lambda\) is not an integer, no balanced incomplete block design exists for this experiment. We would need either more replicates or a larger block size. The block size here is fixed, so instead we add blocks, and hence replicates, until \(\lambda\) is a whole number of at least 1. With r = 3 (b = 6 blocks), \(\lambda = 3(2 - 1)/(4 - 1) = 1\), and a balanced incomplete block design becomes possible.
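The integer check is easy to automate. Below is a minimal Python sketch using exact fractions so that 2/3 is not rounded. The helper name `bibd_lambda` is hypothetical. As noted above, an integer result is necessary but not sufficient for a BIBD.

```python
from fractions import Fraction

def bibd_lambda(t: int, k: int, r: int) -> Fraction:
    """Hypothetical helper: lambda = r(k - 1) / (t - 1), kept exact as a fraction."""
    return Fraction(r * (k - 1), t - 1)

for label, (t, k, r) in {"Example 1": (4, 3, 3), "Example 2": (4, 2, 2)}.items():
    lam = bibd_lambda(t, k, r)
    # An integer lambda is necessary for a BIBD (not sufficient on its own).
    verdict = "integer, BIBD possible" if lam.denominator == 1 else "not an integer, no BIBD"
    print(f"{label}: lambda = {lam} ({verdict})")
```

Calling `bibd_lambda(4, 2, 3)`, the six-block version of Example 2, gives 1.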
We will talk about partially balanced designs later. In this case, though, a balanced design doesn't exist, so what is the best partially balanced design? That is the question to ask if you can afford only four blocks of size two. Is the design in Example 2 the best we can construct? In the best partially balanced design, each \(\lambda_{ii^\prime}\) should be one of the integers nearest the \(\lambda\) we calculated. Here that means each \(\lambda_{ii^\prime}\) should be either 0 or 1, the integers nearest 0.667. Example 2, with values of 0 and 2, is not as close to balanced as it could be. In fact, it is not even connected. In a connected design you can get from any treatment to any other through a chain of treatments that share blocks. Here, treatments 1 and 3 never share a block with treatments 2 or 4. More about this later...
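Connectedness can be checked with a simple graph search. Treatments are nodes, and two treatments are linked when they share a block. Below is a minimal Python sketch. The block order for Example 2 is hypothetical; it is chosen only to match the \(\lambda_{ii^\prime}\) values listed above. The cyclic alternative is simply one connected design with every \(\lambda_{ii^\prime}\) equal to 0 or 1, not a claim about the best possible design.

```python
def is_connected(blocks, treatments):
    """True if every treatment can be reached from every other through a chain
    of treatments that share a block (depth-first search on the co-occurrence graph)."""
    neighbours = {i: set() for i in treatments}
    for block in blocks:
        for i in block:
            neighbours[i].update(j for j in block if j != i)
    start = treatments[0]
    seen, stack = {start}, [start]
    while stack:
        for j in neighbours[stack.pop()]:
            if j not in seen:
                seen.add(j)
                stack.append(j)
    return seen == set(treatments)

# Hypothetical block order for Example 2: 1 and 3 share two blocks, 2 and 4 share the other two.
example2 = [[1, 3], [2, 4], [1, 3], [2, 4]]
print(is_connected(example2, [1, 2, 3, 4]))      # False

# A cyclic alternative with the same t = 4, b = 4, k = 2, where each lambda is 0 or 1.
alternative = [[1, 2], [2, 3], [3, 4], [4, 1]]
print(is_connected(alternative, [1, 2, 3, 4]))   # True
```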
How do you construct a BIBD?
In some situations it is easy to construct the best IBD; in others it can be quite difficult, and we will look the design up in a reference.
Let's say that we want six blocks, with 4 treatments and a block size of 2 as before. Then r = bk/t = 3 and \(\lambda = r(k - 1) / (t - 1) = 3(1)/3 = 1\). Because \(\lambda = 1\), every pair of treatments must appear together exactly once. So we take all possible combinations of the four treatments {1, 2, 3, 4} two at a time. We could set up the incidence matrix for the design, or we could simply list the treatment labels in each block, one pair per block: {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4}.
However, building a BIBD from all possible combinations is not always practical. If the number of combinations is too large, you need to find a balanced subset of them, which is not always easy to do. Sometimes, however, you can use Latin squares to construct a BIBD. For example, take any 3 columns of a 4 × 4 Latin square. Each row then forms a block of three treatments, and this subset of columns gives a BIBD. Not every subset of columns of a Latin square gives a BIBD, though.
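The all-pairs construction is easy to generate and verify. Below is a minimal Python sketch using `itertools.combinations`. The variable names are illustrative.

```python
from itertools import combinations
from collections import Counter

t, k = 4, 2
blocks = list(combinations(range(1, t + 1), k))   # every pair of treatments becomes one block
print(len(blocks), blocks)   # 6 [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]

# Check: each treatment is replicated r = 3 times and each pair appears once (lambda = 1).
replicates = Counter(trt for block in blocks for trt in block)
pairs = Counter(pair for block in blocks for pair in combinations(block, 2))
print(dict(replicates), set(pairs.values()))
```

The same two lines of checking work for any list of blocks. That is useful once the full set of combinations is too large to use and you must pick a subset.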
Let's look at an example with t = 7, b = 7, and k = 3, so that r = bk/t = 3. Here is the 7 × 7 Latin square:
We want to select k = 3 columns of this square such that each treatment occurs together with every other treatment exactly once, because \(\lambda = 3(3 - 1) / (7 - 1) = 1\).
We could try the first three columns and check whether any treatment pair is repeated across the rows (blocks).
The first three columns contain some pairs more than once, so let's keep columns 1 and 2 and try a different third column, say the fourth. If you look at all the pairs within each row, each treatment pair now occurs exactly once.
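Checking column choices by hand is error-prone, so here is a minimal Python sketch. ASSUMPTION: the Latin square is cyclic (row i holds i, i+1, ..., i+6 mod 7). The square pictured in the lesson may be arranged differently, although a cyclic square reproduces the behaviour described above. The helper names are hypothetical.

```python
from itertools import combinations
from collections import Counter

T = 7
# ASSUMPTION: a cyclic 7 x 7 Latin square (row i holds i, i+1, ..., i+6 mod 7, labelled 1..7).
# The square pictured in the lesson may be arranged differently.
square = [[(i + j) % T + 1 for j in range(T)] for i in range(T)]

def blocks_from_columns(square, cols):
    """Each row, restricted to the chosen 1-based columns, becomes one block."""
    return [[row[c - 1] for c in cols] for row in square]

def pair_summary(blocks):
    """Return (largest number of times any pair co-occurs, number of distinct pairs seen)."""
    pairs = Counter(tuple(sorted(p)) for block in blocks for p in combinations(block, 2))
    return max(pairs.values()), len(pairs)

print(pair_summary(blocks_from_columns(square, [1, 2, 3])))  # (2, 14): some pairs repeat
print(pair_summary(blocks_from_columns(square, [1, 2, 4])))  # (1, 21): all 21 pairs exactly once
```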
What if we could afford a block size of 4 instead of 3? With t = 7, b = 7, and k = 4, we get r = 4 and \(\lambda = r(k - 1) / (t - 1) = 4(3)/6 = 2\). Since \(\lambda\) is an integer, a BIBD may exist, and here one does: we can select 4 columns (or rows) of a Latin square. Let's look at the columns again... can you select the correct 4?
Now consider the case with 8 treatments in blocks of 4. There are \(\binom{8}{4} = 70\) possible combinations of 8 treatments taken 4 at a time, and from these 70 sets you have to choose 14 blocks; wow, this is a big job! At this point we should simply look the design up in an appropriate reference. Here is a handout, a catalog that will help you with this selection process, taken from Cochran & Cox, Experimental Designs, pp. 469-482.
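A brute-force search over column sets is feasible for a square this small. Below is a minimal Python sketch. ASSUMPTION: the Latin square is cyclic (row i holds i, i+1, ..., i+6 mod 7), and the square in the lesson may differ. The helper `is_bibd` is hypothetical.

```python
from itertools import combinations
from collections import Counter

T = 7
# ASSUMPTION: a cyclic 7 x 7 Latin square; the square pictured in the lesson may differ.
square = [[(i + j) % T + 1 for j in range(T)] for i in range(T)]

def is_bibd(blocks, n_treatments, lam):
    """True if every one of the C(n, 2) treatment pairs shares a block exactly lam times."""
    pairs = Counter(tuple(sorted(p)) for block in blocks for p in combinations(block, 2))
    return len(pairs) == n_treatments * (n_treatments - 1) // 2 and set(pairs.values()) == {lam}

# Brute-force search over all C(7, 4) = 35 choices of 4 columns (the rows are the blocks).
good = [cols for cols in combinations(range(1, T + 1), 4)
        if is_bibd([[row[c - 1] for c in cols] for row in square], T, 2)]
print(len(good), "of 35 column sets give a BIBD with lambda = 2")
```

Under this assumed square, columns 3, 5, 6 and 7 form one set that works. They are exactly the columns left over after taking 1, 2 and 4.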
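The arithmetic behind "a big job" can be made explicit with a small Python sketch. It uses only `math.comb` and exact fractions, and it assumes blocks of size 4 as above.

```python
from math import comb
from fractions import Fraction

t, k, b = 8, 4, 14
n_sets = comb(t, k)                  # candidate blocks: C(8, 4)
r = Fraction(b * k, t)               # replicates per treatment, from N = tr = bk
lam = r * (k - 1) / (t - 1)          # lambda = r(k - 1) / (t - 1)
n_choices = comb(n_sets, b)          # ways to pick 14 blocks from the 70 candidates
print(n_sets, r, lam)                # 70 7 3
print(f"{n_choices:.3e} possible selections of {b} blocks")
```

Trying every selection is clearly hopeless, which is why a published catalog is the practical route.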
Analysis of BIBDs
When data are missing, or when not every treatment appears in every block, the block averages are taken over different sets of treatments, and this affects the treatment means. With complete data, block and treatment effects are orthogonal, so each drops out of the estimation of the other. With missing data, or with incomplete block designs (even a BIBD) where this orthogonality is lost, the analysis requires the general linear model (GLM), which codes the data as we did previously. Here GLM fits blocks first and then treatments.
The sequential sums of squares (Seq SS) for blocks are not the same as the adjusted sums of squares (Adj SS).
We have the following:
Seq SS
\(SS(\beta | \mu) = 55.00\)
\(SS(\tau | \mu, \beta) = 22.75\)
Adj SS
\(SS(\beta | \mu, \tau) = 66.08\)
\(SS(\tau | \mu, \beta) = 22.75\)
Now switch the order: fit treatments first and then blocks.
Seq SS
\(SS(\tau | \mu) = 11.67\)
\(SS(\beta | \mu, \tau) = 66.08\)
Adj SS
\(SS(\tau | \mu, \beta) = 22.75\)
\(SS(\beta | \mu, \tau) = 66.08\)
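The four sums of squares above can be reproduced from the standard intrablock formulas for a BIBD:

- Start with the unadjusted block and treatment sums of squares, computed from the totals.
- Compute the adjusted treatment sum of squares as \(k\sum_j Q_j^2/(\lambda t)\). Here \(Q_j\) is the total for treatment j minus 1/k times the totals of the blocks that contain it.
- Get \(SS(\beta|\mu,\tau)\) from the identity \(SS(\beta|\mu) + SS(\tau|\mu,\beta) = SS(\tau|\mu) + SS(\beta|\mu,\tau)\), since both sides equal the model sum of squares.

Below is a minimal Python sketch with exact fractions. It uses these closed-form formulas in place of a general least squares fit.

```python
from fractions import Fraction

# (block, treatment, response) for the Example 1 data used in the SAS program of this section.
data = [(1, 1, 73), (1, 3, 73), (1, 4, 75), (2, 1, 74), (2, 2, 75), (2, 3, 75),
        (3, 2, 67), (3, 3, 68), (3, 4, 72), (4, 1, 71), (4, 2, 72), (4, 4, 75)]
t, b, k, r, lam = 4, 4, 3, 3, 2
N = len(data)

grand = sum(y for _, _, y in data)
block_tot = {j: sum(y for blk, _, y in data if blk == j) for j in range(1, b + 1)}
trt_tot = {i: sum(y for _, trt, y in data if trt == i) for i in range(1, t + 1)}
correction = Fraction(grand ** 2, N)

ss_blocks = sum(Fraction(v ** 2, k) for v in block_tot.values()) - correction   # SS(beta | mu)
ss_trt_unadj = sum(Fraction(v ** 2, r) for v in trt_tot.values()) - correction  # SS(tau | mu)

# Adjusted treatment totals: Q_i = y_i. - (1/k) * (sum of totals of the blocks containing i).
Q = {i: trt_tot[i] - Fraction(sum(block_tot[blk] for blk, trt, _ in data if trt == i), k)
     for i in trt_tot}
ss_trt_adj = k * sum(q ** 2 for q in Q.values()) / (lam * t)                    # SS(tau | mu, beta)
ss_blocks_adj = ss_blocks + ss_trt_adj - ss_trt_unadj                          # SS(beta | mu, tau)

for name, ss in [("SS(beta|mu)", ss_blocks), ("SS(tau|mu,beta)", ss_trt_adj),
                 ("SS(tau|mu)", ss_trt_unadj), ("SS(beta|mu,tau)", ss_blocks_adj)]:
    print(f"{name:16s} {float(ss):.2f}")
```

The printed values are 55.00, 22.75, 11.67 and 66.08. These match the Type I and Type III sums of squares in the PROC GLM output later in this section.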
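The least squares means come from the fitted model. Regardless of the pattern of missing data or the design, we can think of the data as represented by the model:
\(Y_{ij} = \mu + \beta_i + \tau_j + e_{ij}\)
\(i = 1, \dots, b\), \(j = 1, \dots, t\), where only the block-treatment combinations that actually occur in the design are observed.
You obtain the least squares means from the parameter estimates of the least squares fit. The LS mean for treatment j is \(\hat\mu + \hat\tau_j + \frac{1}{b}\sum_{i=1}^{b}\hat\beta_i\), the fitted value averaged over all blocks. This is why each block coefficient is 0.25 in the GLM output below.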
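For a BIBD with the effects constrained to sum to zero, \(\hat\mu = \bar y_{..}\) and the intrablock estimate is \(\hat\tau_j = kQ_j/(\lambda t)\). Here \(Q_j\) is the total for treatment j minus 1/k times the totals of the blocks containing it. The LS mean is then \(\bar y_{..} + \hat\tau_j\). Below is a minimal pure-Python sketch using exact fractions. These standard closed-form intrablock formulas stand in for a general least squares solver.

```python
from fractions import Fraction

# (block, treatment, response) for the Example 1 data used in the SAS program of this section.
data = [(1, 1, 73), (1, 3, 73), (1, 4, 75), (2, 1, 74), (2, 2, 75), (2, 3, 75),
        (3, 2, 67), (3, 3, 68), (3, 4, 72), (4, 1, 71), (4, 2, 72), (4, 4, 75)]
t, k, lam = 4, 3, 2

grand_mean = Fraction(sum(y for _, _, y in data), len(data))
block_tot = {}
for blk, _, y in data:
    block_tot[blk] = block_tot.get(blk, 0) + y

lsmeans = {}
for j in range(1, t + 1):
    y_j = sum(y for _, trt, y in data if trt == j)                    # treatment total
    blocks_with_j = [blk for blk, trt, _ in data if trt == j]
    q_j = y_j - Fraction(sum(block_tot[blk] for blk in blocks_with_j), k)  # adjusted total Q_j
    tau_j = k * q_j / (lam * t)                                       # intrablock estimate of tau_j
    lsmeans[j] = grand_mean + tau_j
    print(f"trt {j}: tau_hat = {float(tau_j):+.3f}, LS mean = {float(lsmeans[j]):.3f}")
```

The LS means printed here (71.375, 71.625, 72.000, 75.000) agree with the LSMEAN column of the PROC GLM output later in this section.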
Optional Section
See the discussion of Recovery of Interblock Information in the text, p. 154. This procedure extracts additional information from a BIBD when the blocks are a random effect; this section is optional reading. We illustrate the analysis using PROC MIXED in SAS (L03_sas_Ex_4_5.sas):
data;
  input blk trt Y;
  cards;
1 1 73
1 3 73
1 4 75
2 1 74
2 2 75
2 3 75
3 2 67
3 3 68
3 4 72
4 1 71
4 2 72
4 4 75
;;;;
/* This data is from Example 4-5 in Montgomery, Design and Analysis of Experiments, 6th edition, */
/* Wiley, 2005, pages 147-154. This demonstrates the recovery of interblock information when */
/* the blocks are considered random. */

/* Intrablock analysis: blocks and treatments both fixed */
proc glm;
  class trt blk;
  model Y = blk trt;
  lsmeans trt / e stderr pdiff;
run;
quit;

/* Combined intra- and interblock analysis: blocks random */
proc mixed;
  class trt blk;
  model Y = trt;
  random blk;
  lsmeans trt / e pdiff;
  /* The next 4 estimate statements calculate the treatment effects from the solution */
  estimate "trt effect 1" trt +.75 -.25 -.25 -.25 / e;
  estimate "trt effect 2" trt -.25 +.75 -.25 -.25 / e;
  estimate "trt effect 3" trt -.25 -.25 +.75 -.25 / e;
  estimate "trt effect 4" trt -.25 -.25 -.25 +.75 / e;
  /* The next 3 contrast statements show one set of orthogonal contrasts */
  contrast "trt1 vs trt2-4" trt 3 -1 -1 -1;
  contrast "trt2 vs trt3-4" trt 0 2 -1 -1;
  contrast "trt3 vs trt4" trt 0 0 1 -1;
run;
The SAS System 12:49 Friday, August 15, 2008 1
The GLM Procedure
Class Level Information
Class Levels Values
trt 4 1 2 3 4
blk 4 1 2 3 4
Number of Observations Read 12
Number of Observations Used 12
Dependent Variable: Y
Source DF Sum of Squares Mean Square F Value Pr > F
Model 6 77.75000000 12.95833333 19.94 0.0024
Error 5 3.25000000 0.65000000
Corrected Total 11 81.00000000
R-Square Coeff Var Root MSE Y Mean
0.959877 1.112036 0.806226 72.50000
Source DF Type I SS Mean Square F Value Pr > F
blk 3 55.00000000 18.33333333 28.21 0.0015
trt 3 22.75000000 7.58333333 11.67 0.0107
Source DF Type III SS Mean Square F Value Pr > F
blk 3 66.08333333 22.02777778 33.89 0.0010
trt 3 22.75000000 7.58333333 11.67 0.0107
Least Squares Means
Coefficients for trt Least Square Means
trt Level
Effect 1 2 3 4
Intercept 1 1 1 1
blk 1 0.25 0.25 0.25 0.25
blk 2 0.25 0.25 0.25 0.25
blk 3 0.25 0.25 0.25 0.25
blk 4 0.25 0.25 0.25 0.25
trt 1 1 0 0 0
trt 2 0 1 0 0
trt 3 0 0 1 0
trt 4 0 0 0 1
trt Y LSMEAN Standard Error Pr > |t| LSMEAN Number
1 71.3750000 0.4868051 <.0001 1
2 71.6250000 0.4868051 <.0001 2
3 72.0000000 0.4868051 <.0001 3
4 75.0000000 0.4868051 <.0001 4
Least Squares Means for effect trt
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: Y
i/j 1 2 3 4
1 . 0.7349 0.4117 0.0035
2 0.7349 . 0.6142 0.0047
3 0.4117 0.6142 . 0.0077
4 0.0035 0.0047 0.0077 .
NOTE: To ensure overall protection level, only probabilities associated with pre-planned
comparisons should be used.
The Mixed Procedure
Model Information
Data Set WORK.DATA1
Dependent Variable Y
Covariance Structure Variance Components
Estimation Method REML
Residual Variance Method Profile
Fixed Effects SE Method Model-Based
Degrees of Freedom Method Containment
Class Level Information
Class Levels Values
trt 4 1 2 3 4
blk 4 1 2 3 4
Dimensions
Covariance Parameters 2
Columns in X 5
Columns in Z 4
Subjects 1
Max Obs Per Subject 12
Number of Observations
Number of Observations Read 12
Number of Observations Used 12
Number of Observations Not Used 0
Iteration History
Iteration Evaluations -2 Res Log Like Criterion
0 1 44.37333968
1 1 34.22046396 0.00000000
Convergence criteria met.
Covariance Parameter Estimates
Cov Parm Estimate
blk 8.0167
Residual 0.6500
Fit Statistics
-2 Res Log Likelihood 34.2
AIC (smaller is better) 38.2
AICC (smaller is better) 40.6
BIC (smaller is better) 37.0
Type 3 Tests of Fixed Effects
Effect Num DF Den DF F Value Pr > F
trt 3 5 11.41 0.0113
Coefficients for trt effect 1
Effect trt Row1
Intercept
trt 1 0.75
trt 2 -0.25
trt 3 -0.25
trt 4 -0.25
Coefficients for trt effect 2
Effect trt Row1
Intercept
trt 1 -0.25
trt 2 0.75
trt 3 -0.25
trt 4 -0.25
Coefficients for trt effect 3
Effect trt Row1
Intercept
trt 1 -0.25
trt 2 -0.25
trt 3 0.75
trt 4 -0.25
Coefficients for trt effect 4
Effect trt Row1
Intercept
trt 1 -0.25
trt 2 -0.25
trt 3 -0.25
trt 4 0.75
Estimates
Label Estimate Standard Error DF t Value Pr > |t|
trt effect 1 -1.0869 0.4269 5 -2.55 0.0515
trt effect 2 -0.8836 0.4269 5 -2.07 0.0932
trt effect 3 -0.5000 0.4269 5 -1.17 0.2942
trt effect 4 2.4705 0.4269 5 5.79 0.0022
Contrasts
Label Num DF Den DF F Value Pr > F
trt1 vs trt2-4 1 5 6.48 0.0515
trt2 vs trt3-4 1 5 9.58 0.0270
trt3 vs trt4 1 5 18.16 0.0080
Coefficients for trt Least Squares Means
Effect trt Row1 Row2 Row3 Row4
Intercept 1 1 1 1
trt 1 1 0 0 0
trt 2 0 1 0 0
trt 3 0 0 1 0
trt 4 0 0 0 1
Least Squares Means
Effect trt Estimate Standard Error DF t Value Pr > |t|
trt 1 71.4131 1.4968 5 47.71 <.0001
trt 2 71.6164 1.4968 5 47.84 <.0001
trt 3 72.0000 1.4968 5 48.10 <.0001
trt 4 74.9705 1.4968 5 50.09 <.0001
Differences of Least Squares Means
Effect trt _trt Estimate Standard Error DF t Value Pr > |t|
trt 1 2 -0.2033 0.6971 5 -0.29 0.7823
trt 1 3 -0.5869 0.6971 5 -0.84 0.4382
trt 1 4 -3.5574 0.6971 5 -5.10 0.0038
trt 2 3 -0.3836 0.6971 5 -0.55 0.6058
trt 2 4 -3.3541 0.6971 5 -4.81 0.0048
trt 3 4 -2.9705 0.6971 5 -4.26 0.0080
Note that the least squares means for treatments from PROC MIXED correspond to the combined intra- and interblock estimates of the treatment effects.
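PROC MIXED estimated the block variance component as 8.0167. For a BIBD, a standard method-of-moments estimate from the intrablock ANOVA is \(\hat\sigma^2_\beta = [MS_{Blocks(adj)} - MS_E](b - 1)/[t(r - 1)]\). Below is a small Python sketch that computes it from the Type III block SS (66.0833, written exactly as 793/12) and the error mean square (0.65) in the GLM output. For these data it agrees with the REML value. In general, REML and moment estimates need not coincide.

```python
from fractions import Fraction

t, b, r = 4, 4, 3
ss_blocks_adj = Fraction(793, 12)           # Type III SS for blk in the GLM output (66.0833)
ms_blocks_adj = ss_blocks_adj / (b - 1)     # 22.0278
ms_error = Fraction(325, 500)               # 3.25 / 5 = 0.65
# Moment estimate: E[MS_Blocks(adj)] = sigma^2 + t(r - 1) sigma_beta^2 / (b - 1) in a BIBD.
sigma2_block = (ms_blocks_adj - ms_error) * (b - 1) / (t * (r - 1))
print(f"{float(sigma2_block):.4f}")         # 8.0167
```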
Random Effect Factor
So far we have discussed experimental designs with fixed factors, that is, the levels of the factors are fixed and constrained to some specific values. However, this is often not the case. In some cases, the levels of the factors are selected at random from a larger population. In this case, the inference made on the significance of the factor can be extended to the whole population but the factor effects are treated as contributions to variance.
Minitab's General Linear Model command handles random factors appropriately as long as you are careful to specify which factors are fixed and which are random.