3.4 - The Optimum Allocation for the Dunnett Test


The Dunnett test for comparing means is a multiple comparison procedure designed specifically to test t treatments against a control.

We compared the Dunnett test to the Bonferroni procedure and found only a slight difference, reflecting the fact that the Bonferroni procedure is an approximation. This is a situation where we have a = t + 1 groups: a control group and t treatments.

I like to think of an example where we have a standard therapy (a control group) and we want to test t new treatments against the existing acceptable therapy. This is a case where we are not so much interested in comparing the treatments with each other; instead, we want to find out whether each of the new treatments is better than the original control treatment.

We have \(Y_{ij}\) distributed with mean \(\mu_i\) and variance \(\sigma^{2}\), where \(i = 1, \dots , t\) and \(j = 1, \dots , n_i\), for the t treatment groups, and a control group with mean \(\mu_0\) and variance \(\sigma^2\).

We are assuming equal variance among all treatment groups.

The question that I want to address here is the design question.

The Dunnett procedure is based on t comparisons for testing \(H_0\) that \(\mu_i = \mu_0\), for \(i = 1, \dots , t\). This is really t different tests where t = a - 1.

The \(H_A\) is that the \(\mu_i\) are not equal to \(\mu_0\).

Or, viewing this as an estimation problem, we want to estimate the t differences \(\mu_i - \mu_0\).

How Should We Allocate Our Observations?

This is the question we are trying to answer. We have a fixed set of resources and a budget that allows for only N observations. So, how should we allocate our resources?

Should we assign half to the control group and the rest spread out among the treatments? Or, should we assign an equal number of observations among all treatments and the control? Or what?

We want to answer this question by seeing how we can maximize the power of these tests with the N observations that we have available. We approach this using an estimation approach where we want to estimate the t differences \(\mu_i - \mu_0\). Let's estimate the variance of these differences.

What we want to do is minimize the total variance. Remember that the variance of \((\bar{y}_i-\bar{y}_0)\) is \(\sigma^{2} / n_i + \sigma^{2} / n_0\). The total variance is the sum of these t parts.

We need to find \(n_0\), and \(n_i\) that will minimize this total variance. However, this is subject to a constraint, the constraint being that \(N = n_0 + (t \times n)\), if the \(n_i = n\) for all treatments, an assumption we can reasonably make when all treatments are of equal importance.

Given N observations and a groups, where \(a = t + 1\):

the model is:

\(y_{ij} = \mu_i + \epsilon_{ij}\), where \(i = 0, 1, \dots , t\) and \(j = 1, \dots , n_i\)

sample mean: \(\bar{y}_{i.}=\dfrac{1}{n_i} \sum\limits_{j=1}^{n_i} y_{ij}\) and \(Var(\bar{y}_{i.})=\dfrac{\sigma^2}{n_i}\)

Furthermore, \(Var(\bar{y}_{i.}-\bar{y}_{0.})=\dfrac{\sigma^2}{n_i}+\dfrac{\sigma^2}{n_0}\)

Use \(\hat{\sigma}^2=MSE\) and assume \(n_i= n\) for \(i = 1, \dots , t\).

Then the Total Sample Variance (TSV) is \(TSV=\sum\limits_{i=1}^t \widehat{var} (\bar{y}_{i.}-\bar{y}_{0.})=t\left(\dfrac{\sigma^2}{n}+\dfrac{\sigma^2}{n_0}\right)\)

We want to minimize \(t\sigma^2(\frac{1}{n}+\frac{1}{n_0})\) where \(N = tn + n_0\)

This is a Lagrange multiplier problem (calculus): minimize \((\ast) = TSV + \lambda(N - tn - n_0)\) with respect to \(n\) and \(n_0\).

Solve:

1) \(\dfrac{\partial(\ast)}{\partial n}=\dfrac{-t\sigma^2}{n^2}-\lambda t=0\)

2) \(\dfrac{\partial(\ast)}{\partial n_0}=\dfrac{-t\sigma^2}{n_0^2}-\lambda =0\)

From 2) \(\lambda=\dfrac{-t\sigma^2}{n_0^2}\) we can then substitute into 1) as follows:

\(\dfrac{-t\sigma^2}{n^2}=\lambda t=\dfrac{-t^2\sigma^2}{n_0^2}
\Longrightarrow n^2=\dfrac{n_0^2}{t}
\Longrightarrow n=\dfrac{n_0}{\sqrt{t}}
\Longrightarrow n_0=n \sqrt{t}\)

Therefore, from \(N=tn+n_0=tn+\sqrt{t} n=n(t+\sqrt{t})\Longrightarrow n=\dfrac{N}{(t+\sqrt{t})}\)

When this is all worked out we have a nice simple rule to guide our decision about how to allocate our observations:

\(n_{0}=n\sqrt{t}\)

Or, the number of observations in the control group should be the square root of the number of treatments times the number of observations in the treatment groups.

If we want to get the exact n based on our resources, let \(n=N/(t+\sqrt{t})\) and \(n_{0}=\sqrt{t}\times n\), and then round to the nearest integers, checking that \(tn + n_0\) does not exceed N.
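The rule above is easy to turn into a small helper. What follows is only a sketch, not part of the original notes: the function name `dunnett_allocation` is made up for illustration, and the choice to hand any observations left over after rounding to the control group is an assumption, since the notes only say to round to the nearest integers.

```python
import math


def dunnett_allocation(N, t):
    """Split N observations between t treatment groups and one control.

    Applies the square-root rule n0 = n * sqrt(t) with N = t*n + n0
    and returns integer group sizes (n, n0).
    """
    if t < 1 or N < t + 1:
        raise ValueError("need t >= 1 and at least one observation per group")

    n_exact = N / (t + math.sqrt(t))   # exact, possibly fractional, n
    n = max(1, round(n_exact))         # rounded treatment group size

    # Guard: rounding up must never leave the control group empty.
    while N - t * n < 1:
        n -= 1

    # Assumption: whatever is left after the t treatment groups goes to
    # the control, so the sizes always add up to exactly N.
    n0 = N - t * n
    return n, n0


print(dunnett_allocation(60, 4))
```

For N = 60 and t = 4 this returns n = 10 and n_0 = 20. When N is not a multiple of \(t+\sqrt{t}\), the rounded sizes only approximate the optimum, so it may be worth comparing the neighbouring integer choices as well.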

Back to our example...

In our example, we had N = 60 and t = 4. Plugging these values into the equations above gives us \(n = 60/(4+\sqrt{4}) = 10\) and \(n_0 = \sqrt{4}\times 10 = 20\). We should allocate 20 observations to the control and 10 observations to each of the treatments. The purpose is not to compare the new drugs with each other but rather to answer whether or not each new drug is better than the control.

These calculations demonstrate once again that the design principles we use in this course are almost always based on minimizing the variance and maximizing the power of the experiment. Here is a case where equal allocation is not optimal because you are not equally interested in all comparisons. You are interested in specific comparisons, i.e., treatments versus the control, so the control takes on special importance. In this case, we allocate additional observations to the control group for the purpose of minimizing the total variance.
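To see how much is gained, we can compare the total variance under equal allocation with the total variance under the square-root rule for N = 60 and t = 4. This is an illustrative sketch: the helper `total_variance` and its `sigma2` argument are hypothetical names, and \(\sigma^2\) is set to 1 so the results read as multiples of \(\sigma^2\).

```python
def total_variance(t, n, n0, sigma2=1.0):
    """TSV = t * sigma^2 * (1/n + 1/n0) for the t treatment-vs-control differences."""
    return t * sigma2 * (1.0 / n + 1.0 / n0)


N, t = 60, 4

# Equal allocation: N / (t + 1) = 12 observations in every group.
equal = total_variance(t, 12, 12)

# Square-root rule: n = 10 per treatment, n0 = 20 in the control.
optimal = total_variance(t, 10, 20)

# Brute-force check over every integer n that leaves at least one
# observation for the control group (n0 = N - t*n).
best_n = min(
    (n for n in range(1, N // t + 1) if N - t * n >= 1),
    key=lambda n: total_variance(t, n, N - t * n),
)

print(f"equal allocation: TSV / sigma^2 = {equal:.4f}")
print(f"square-root rule: TSV / sigma^2 = {optimal:.4f}")
print(f"best integer n by search: {best_n}, n0 = {N - t * best_n}")
```

Equal allocation gives a total variance of about \(0.667\sigma^2\), while the square-root rule gives \(0.600\sigma^2\), a reduction of about 10% with the same 60 observations. The brute-force search lands on the same n = 10, n_0 = 20.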

