6a.10 - Adjustment Factors for Sample Size Calculations

When calculating a sample size, we may need to adjust our calculations for multiple primary comparisons, for nonadherence to therapy, or for the anticipated dropout rate.

  • If there is more than one primary outcome variable (for example, co-primary outcomes) or more than one primary comparison (for example, 3 treatment groups), then the significance level should be adjusted to account for the multiple comparisons in order not to inflate the overall false-positive rate.

For example, suppose a clinical trial will involve two treatment groups and a placebo group. The investigator may decide that there are two primary comparisons of interest, namely, each treatment group compared to placebo. The simplest adjustment to the significance level for each test is the Bonferroni correction, which uses \(\dfrac{\alpha}{2}\) instead of \(\alpha\).

In general, if there are K comparisons of primary interest, then the Bonferroni correction is to use a significance level of \(\dfrac{\alpha}{K}\) for each of the K comparisons. The Bonferroni correction is not the most powerful or most sophisticated multiple comparison adjustment, but it is a conservative approach and easy to apply.
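To see how the Bonferroni correction inflates the required sample size, we can feed the adjusted significance level into the standard two-sample normal-approximation formula, \(n = 2\,(z_{1-\alpha/2} + z_{1-\beta})^{2}\,\sigma^{2}/\Delta^{2}\). A minimal sketch (the effect size \(\Delta\) and standard deviation \(\sigma\) below are illustrative inputs, not values from the text):

```python
import math
from statistics import NormalDist

def bonferroni_alpha(alpha: float, k: int) -> float:
    """Per-comparison significance level for K primary comparisons."""
    return alpha / k

def n_per_group(alpha: float, power: float, delta: float, sigma: float) -> int:
    """Two-sided, two-sample sample size per group (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for two-sided test
    z_beta = z.inv_cdf(power)            # quantile for the desired power
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)                  # round up to a whole subject

# One primary comparison vs. two, at overall alpha = 0.05:
print(n_per_group(0.05, 0.80, delta=0.5, sigma=1.0))                        # 63
print(n_per_group(bonferroni_alpha(0.05, 2), 0.80, delta=0.5, sigma=1.0))   # 77
```

With two primary comparisons, each test is run at \(\alpha/2 = 0.025\), and the per-group sample size for each comparison grows accordingly.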

In the case of multiple primary endpoints, an adjustment to the significance level may not be necessary, depending on how the investigator plans to interpret the results. For example, suppose there are two primary outcome variables. If the investigator plans to claim “success of the trial” if either endpoint yields a statistically significant treatment effect, then an adjustment to the significance level is warranted. If the investigator plans to claim “success of the trial” only if both endpoints yield statistically significant treatment effects, then an adjustment to the significance level is not necessary. Thus, an adjustment to the significance level in the presence of multiple primary endpoints depends on whether it is an “or” or an “and” situation.

(A composite outcome, such as "time to stroke, MI or major cardiovascular event", is different from a co-primary outcome. In this situation, when the composite results in one statistical analysis, there is no need for adjustment.)

  • Another consideration is nonadherence to the protocol (noncompliance). All participants randomized to therapy are expected to be included in the primary statistical analysis, an intention-to-treat analysis. Intention-to-treat analysis compares the treatments using all data from subjects in the group to which they were originally assigned, regardless of whether or not they followed the protocol, stayed on therapy, etc. Some participants will choose to withdraw from a trial before it is complete. Every effort will be made to continue obtaining data from all randomized subjects; for those who withdraw from the study completely and do not provide data, an imputation procedure may be required to represent their missing data in subsequent data analyses. Some participants assigned to active therapy discontinue therapy but continue to provide data (therapeutic drop-outs). Some on a placebo or control add an active therapy (drop-ins) and continue to be observed. Nonadherence (noncompliance) can dilute the treatment effect, lowering the power of the study and biasing estimates of treatment effects.

Thus, a further adjustment to the sample size estimate may be made based on the anticipated drop-out and drop-in rates in each arm (see Wittes, 2002; a similar formula appears on p. 179 of FFDRG).

\(N^{*} = \dfrac{N}{(1 - R_O - R_I)^{2}}\) where N is the sample size calculated without regard to nonadherence and N* is the adjusted number for that treatment arm.

\(R_O\) and \(R_I\) represent the proportion of participants anticipated to discontinue test therapy and the proportion in the control who will add or change to a more effective therapy, respectively.

  • In other situations, an adjustment may be made to increase the sample size to account for the anticipated number of subjects who will drop out of the study altogether, so that there is sufficient power with the remaining observations to detect a certain difference. This adjustment is made by dividing the calculated sample size N by \((1 - W)\), where W is the proportion expected to withdraw.
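Both inflation factors are one-line calculations. A minimal sketch (the function names are mine):

```python
def adjust_for_nonadherence(n: float, r_out: float, r_in: float) -> float:
    """Wittes (2002) inflation for drop-outs/drop-ins: N* = N / (1 - R_O - R_I)^2."""
    return n / (1 - r_out - r_in) ** 2

def adjust_for_withdrawal(n: float, w: float) -> float:
    """Inflation for a proportion W expected to withdraw entirely: N / (1 - W)."""
    return n / (1 - w)

# 10% discontinue test therapy, 20% drop in on control, from a base of 200/group:
print(adjust_for_nonadherence(200, r_out=0.10, r_in=0.20))  # about 408.2 per group
# 15% expected to withdraw before providing the key measurement:
print(adjust_for_withdrawal(200, w=0.15))                   # about 235.3 per group
```

Either function returns a possibly fractional size; rounding up to whole subjects (and, for totals, up to a multiple of the number of treatment groups) is done as a final step.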


Let's work an example.


Suppose a study has two treatment groups and will compare test therapy to placebo. With only one primary comparison, we do not need to adjust the significance level for multiple comparisons. Suppose that the sample size for a certain power, significance level, and clinically important difference works out to be 200 participants/group, or 400 total.

  1. We plan an intention-to-treat analysis as our primary analysis and our concern is dilution of the true treatment effect due to these deviations from the assigned therapy. To adjust for noncompliance/nonadherence, we must estimate the proportion from the placebo group who will begin an active therapy before the study is complete. Let's estimate these 'drop-ins' to be 0.20. In the test therapy group, we estimate 0.10 will discontinue active therapy.

    To adjust for noncompliance, we calculate \(N^{*} = 200/(1 - 0.2 - 0.1)^{2} = 200/0.49 \approx 409\) per group, or 818 total. What an increase in sample size to maintain the power! (Note that whether we use n/group, 200/0.49, or total n, 400/0.49, we get the same sample sizes; just remember what your N represents. If there is any fraction at the end of a sample size calculation, round UP to the next number divisible by the number of treatment groups.)

  2. Suppose instead of dilution of treatment effect in an ITT analysis, our concern is the proportion of subjects who will leave the study without providing a key observation on the primary outcome. If we anticipate a 15% rate of discontinuing before this measurement, we may want to increase the sample size accordingly: 200/0.85 = 235.3, or 236/group (rounded up).
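The two worked adjustments above can be checked in a few lines. The rounding helper (a name I've chosen for illustration) enforces the rule of rounding a total up to the next multiple of the number of treatment groups:

```python
import math

def round_up_total(n_total: float, groups: int) -> int:
    """Round a total sample size up to the next multiple of the group count."""
    return groups * math.ceil(n_total / groups)

# 1) Nonadherence: 10% discontinue test therapy, 20% drop in on placebo.
#    Start from the total of 400 and inflate by 1/(1 - 0.10 - 0.20)^2.
n_adj = 400 / (1 - 0.10 - 0.20) ** 2
print(round_up_total(n_adj, 2))   # 818 total, i.e. 409 per group

# 2) Withdrawal: 15% expected to leave before the key measurement.
n_adj = 400 / (1 - 0.15)
print(round_up_total(n_adj, 2))   # 472 total, i.e. 236 per group
```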

These are relatively simple calculations to introduce the idea of adjusting for noncompliance, multiple comparisons or the withdrawal rate. More complicated processes can be modeled.

Finally, when estimating a sample size for a study, an iterative process may be followed (adapted from Wittes, 2002):

  1. Determine the null and alternative hypotheses as related to the primary outcome.
  2. What are the desired type I error rate and power? If there is more than one primary outcome or comparison, make the required adjustments to the type I error rate.
  3. Determine the population that will be studied. What information is there about the variability of the primary outcome in this population? What would constitute a clinically important difference?
  4. If the study is measuring time to failure, how long is the follow-up period? What assumptions should be made about recruitment?
  5. Consider ranges of rates or events, loss to follow-up, competing risks, and noncompliance.
  6. Calculate sample size over a range of reasonable assumptions.
  7. Select a sample size. Plot power curves as the parameters range over reasonable values.
  8. Iterate as needed.

Which of these adjustments (or others, such as modeling dropout rates that are not independent of outcome) is important for a particular study depends on the study objectives. Not only must we consider whether there is more than one primary outcome or multiple primary comparisons, we must also consider the nature of the trial. For example, if the study results are headed to a regulatory agency and the primary analysis is intention-to-treat, it is important to demonstrate an effect of a certain magnitude, so adjusting the sample size to account for nonadherence is sensible. On the other hand, in a comparative effectiveness study, the objective may be to estimate the difference in effect when the intervention is prescribed vs. the control, regardless of adherence. In this situation, the dilution of effect due to nonadherence may be of little concern.

As we noted at the beginning of this lesson, sample size calculations are estimates! When stating a required sample size, always state any assumptions that have been made in the calculations.