Suppose that a comparative treatment efficacy (CTE) trial consists of comparing two independent treatment groups with respect to the means of the primary clinical endpoint. Let \(\mu_1\) and \(\mu_2\) denote the unknown population means of the two groups, and let \(\sigma\) denote the known standard deviation common to both groups. Also, let \(n_1\) and \(n_2\) denote the sample sizes of the two groups.

The treatment difference in means is \(\Delta = \mu_1 -\mu_2\) and the null hypothesis is \(H_0\colon \Delta = 0\). The test statistic is

\( Z = \dfrac{\bar{Y}_1 - \bar{Y}_2}{\sigma \sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} \)

where \(\bar{Y}_1\) and \(\bar{Y}_2\) are the sample means of the two groups. \(Z\) follows a standard normal distribution when the null hypothesis is true. If the alternative hypothesis is two-sided, i.e., \(H_1 \colon \Delta \ne 0\), then the null hypothesis is rejected for large values of \(|Z|\).
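As a rough illustration of the test just described, the following Python sketch computes \(Z\) and a two-sided p-value from two samples. The function name `two_sample_z` and the toy data are hypothetical, and \(\sigma\) is assumed known, as in the text.

```python
from math import sqrt
from statistics import NormalDist, fmean


def two_sample_z(y1, y2, sigma):
    """Return (Z, two-sided p-value) for H0: mu1 - mu2 = 0 with known sigma."""
    n1, n2 = len(y1), len(y2)
    # Standard error of the difference in sample means
    se = sigma * sqrt(1 / n1 + 1 / n2)
    z = (fmean(y1) - fmean(y2)) / se
    # Two-sided p-value: probability of a |Z| at least this large under H0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Illustrative data only (not from any trial)
z, p = two_sample_z([3, 3, 3, 3], [1, 1, 1, 1], sigma=2)
print(z, p)
```

The null hypothesis would be rejected at level \(\alpha\) when the returned p-value falls below \(\alpha\).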

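Under a particular alternative with true difference \(\Delta = \mu_1 - \mu_2 \ne 0\), the centered statistic

\( \dfrac{\bar{Y}_1 - \bar{Y}_2 - \Delta}{\sigma \sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} \)

follows a standard normal distribution, so \(Z\) itself is normally distributed with mean \(\Delta \big/ \sigma \sqrt{\frac{1}{n_1}+\frac{1}{n_2}}\) and unit variance.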

Let \(AR = \dfrac{n_1}{n_2}\) denote the allocation ratio (in most cases we will set \(AR = 1\) to get equal sample sizes). If we wish to have a large enough sample size to detect an effect size \(\Delta\) with a two-sided, \(\alpha\)-significance level test with \(100 \left(1 - \beta \right)\%\) statistical power, then

\( n_2 = \left( \frac{AR+1}{AR}\right) \left( z_{1-\alpha/2}+z_{1-\beta} \right)^2\sigma^2/\Delta^2 \)

and \(n_1 = AR \times n_2\).

Note this formula matches the sample size formula in our FFDRG text on p. 180, assuming equal allocation to the two treatment groups and multiplying the result here by 2 to get 2N, which FFDRG uses to denote the total sample size.

If the alternative hypothesis is one-sided, then \(z_{1 - \alpha}\) replaces \(z_{1 - \alpha/2}\) in either formula.
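The formulas above, including the one-sided variant, can be sketched in Python using only the standard library. The function name `sample_sizes`, its parameter names, and the choice to round up to whole subjects are assumptions made for illustration; the input values are arbitrary.

```python
from math import ceil
from statistics import NormalDist


def sample_sizes(delta, sigma, alpha=0.05, power=0.90, ar=1.0, two_sided=True):
    """Return (n1, n2) from the z-based formula, rounded up to whole subjects."""
    z = NormalDist().inv_cdf  # standard normal quantile function
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)  # power = 1 - beta
    n2 = ceil((ar + 1) / ar * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)
    n1 = ceil(ar * n2)  # n1 = AR x n2 (assumption: applied after rounding n2)
    return n1, n2


print(sample_sizes(delta=5, sigma=10))                   # (85, 85)
print(sample_sizes(delta=5, sigma=10, two_sided=False))  # (69, 69)
```

With equal allocation, the total sample size is simply `n1 + n2`.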

Notice that the sample size expression contains \(\left(\dfrac{\sigma}{\Delta}\right)^2\), the squared reciprocal of the effect size expressed in standard deviation units. Thus, *sample size is inversely proportional to the square of the effect size and directly proportional to the variance*. The standard deviation therefore has a quadratic effect on the sample size: doubling \(\sigma\) quadruples it. Likewise, reducing the effect size by one-half quadruples the required sample size.
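A quick numerical check of this scaling, as a minimal sketch: the helper `n2_raw` is a hypothetical name that evaluates the two-sided formula without rounding, and the input values are arbitrary.

```python
from statistics import NormalDist


def n2_raw(delta, sigma, alpha=0.05, power=0.90, ar=1.0):
    """Unrounded n2 from the two-sided z-based sample size formula."""
    z = NormalDist().inv_cdf
    return (ar + 1) / ar * (z(1 - alpha / 2) + z(power)) ** 2 * (sigma / delta) ** 2


base = n2_raw(delta=5, sigma=10)
half_delta = n2_raw(delta=2.5, sigma=10)   # effect size cut in half
double_sigma = n2_raw(delta=5, sigma=20)   # standard deviation doubled

print(half_delta / base)    # ratio of required sample sizes
print(double_sigma / base)
```

Both ratios come out at 4 (up to floating-point error), because only \(\left(\sigma/\Delta\right)^2\) changes between the calls.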

Although this sample size formula assumes that the standard deviation is known so that a z-test can be applied, it works relatively well when the standard deviation must be estimated and a t-test applied. A preliminary guess of \(\sigma\) must be available, however, either from a small pilot study or a report in the literature. For smaller sample sizes \(\left(n_1 \le 30, n_2 \le 30 \right)\), percentiles from a t distribution can be substituted. Because the degrees of freedom \(n_1 + n_2 - 2\) depend on the sample sizes, both sides of the formula then involve \(n_2\), so it must be solved iteratively:

\( n_2 = \left( \dfrac{AR+1}{AR}\right) \left( t_{n_1+n_2-2,1-\alpha/2}+t_{n_1+n_2-2,1-\beta} \right)^2\sigma^2/\Delta^2 \)
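One possible way to carry out this iteration is sketched below. It is an illustration under stated assumptions, not a prescribed algorithm: it starts from the z-based answer and increases \(n_2\) until \(n_2\) is at least as large as the right-hand side evaluated at that \(n_2\). To stay within the Python standard library, the t quantile is obtained by Simpson's rule and bisection; in practice a statistics library routine such as `scipy.stats.t.ppf` would normally be used instead. All function names here are hypothetical.

```python
from math import ceil, exp, lgamma, log, log1p, pi
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log1p(x * x / df))


def t_cdf(x, df, steps=1000):
    """CDF for x >= 0: 0.5 plus Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Quantile for 0.5 < p < 1, found by bisection on t_cdf."""
    lo, hi = 0.0, 2.0
    while t_cdf(hi, df) < p:  # widen the bracket until it contains the quantile
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def n2_with_t(delta, sigma, alpha=0.05, power=0.90, ar=1.0):
    """Smallest n2 with n2 >= right-hand side of the t-based formula.

    Assumes a two-sided test with alpha < 1 and power > 0.5.
    """
    k = (ar + 1) / ar * (sigma / delta) ** 2
    z = NormalDist().inv_cdf
    # Start from the z-based sample size, which cannot exceed the t-based one
    n2 = max(2, ceil(k * (z(1 - alpha / 2) + z(power)) ** 2))
    while True:
        df = max(1, round(ar * n2) + n2 - 2)  # n1 + n2 - 2 with n1 = AR * n2
        needed = k * (t_quantile(1 - alpha / 2, df) + t_quantile(power, df)) ** 2
        if n2 >= needed:
            return n2
        n2 += 1


print(n2_with_t(delta=10, sigma=10))
```

Searching upward rather than repeatedly substituting the result back into the formula avoids the possibility of the iteration bouncing between two neighbouring integers.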