Lesson 6: Sample Size and Power - Part b

Overview

This week we continue exploring the issues of sample size and power, this time with regard to the differing purposes of clinical trials. Often the objective of the trial is to establish that a therapy is efficacious, but what is the proper control group? Can superiority to placebo be clearly established when there are other effective therapies on the market? These questions lead to special considerations based on whether the trial has an objective of establishing superiority, equivalence, or non-inferiority. So, let’s move ahead…

Objectives

Upon completion of this lesson, you should be able to:

Distinguish between superiority, non-inferiority and equivalence trials in terms of
- objectives
- control group
- hypotheses tested and
- formation of confidence intervals.
Recognize the characteristics of a clinical trial with high external validity
Define which data are included in an intention-to-treat analysis
Recognize the major considerations for designing an equivalence or non-inferiority trial.
Perform sample size calculations for some equivalence and non-inferiority trials, using SAS programs.

References:

Berger RL, Hsu JC. Bioequivalence trials, intersection-union tests, and equivalence confidence sets. Statistical Science 1996, 11: 283-319.

Piantadosi Steven. (2005) Sample size and power. In: Piantadosi Steven. Clinical Trials: A Methodologic Perspective. 2nd ed. Hoboken, NJ: John Wiley and Sons, Inc.

6b.1 - Control Groups

Placebo-Control

Placebo-controlled trials typically provide an unambiguous statement of the research hypothesis: either we want to show that the experimental treatment is superior to placebo (one-sided alternative) or, as is more often the case, we want to show that the experimental treatment is different from placebo (two-sided alternative). For this reason, we frequently refer to a placebo-controlled trial as a confirmatory trial, or in the most recent language, it is called a superiority trial (even if we are using a two-sided alternative).

Active Control

Active control groups are often used because placebo control groups are unethical, such as when:

the disease is life-threatening or debilitating, and/or
an effective therapy already exists and is considered standard-of-care.

Investigators can use an active control group in a superiority trial, an equivalence trial, or a non-inferiority trial. The new treatment may be preferred due to less cost, fewer side effects, or less impact on the quality of life. Or the new treatment may have superior efficacy.

Equivalence trials and non-inferiority trials have different objectives than superiority trials. The objective of an equivalence trial is to demonstrate that a therapy is “equivalent” to the active control (it is not inferior to and not superior to the active control). Equivalent might not be the best word choice for this type of trial, as we will see later. The objective of a non-inferiority trial is to demonstrate that a therapy is not inferior to the active control, i.e., it is not worse than the active control.

There are a number of issues related to the design and analysis of equivalence and non-inferiority trials that are not well understood by clinical investigators. We will examine these issues using examples of such trials in this lesson.

6b.2 - Combination Therapy Trials

Combination therapy trials are an example where the appropriateness of a placebo control must be carefully considered.

Suppose that for a particular disease or condition, there exists a standard therapy that is accepted as the best available treatment (standard-of-care). The standard-of-care could be a drug, a medical device, a surgical procedure, diet, exercise, etc., or some combination of these various regimens.

In the MIRACLE trial, the standard-of-care for the eligible patients with heart failure during the course of the trial consisted of some combination of the following medications:

Diuretic
Angiotensin-converting enzyme (ACE) inhibitor or angiotensin-receptor blocker
Digitalis
Beta-blocker

Suppose that the experimental therapy is of a different modality or different mechanism of action than the standard-of-care. If so, then it may be possible to use the experimental therapy in combination with the standard-of-care and we can consider designing a two-armed trial that compares:

standard-of-care + experimental therapy

versus

standard-of-care + placebo therapy

This situation is comparable to a superiority trial because the research objective is to demonstrate superiority of the combination therapy to the standard-of-care.

In the MIRACLE trial, for example, the comparison consisted of:

standard-of-care + pacemaker

versus

standard-of-care + inactive pacemaker

In other situations, a superiority trial is not feasible. If the experimental therapy is a similar modality as one of the components of standard-of-care, then it may not be appropriate to combine the experimental therapy with the standard-of-care.

There are two other possibilities to consider for designing the clinical trial, namely, an equivalence trial and a non-inferiority trial.

6b.3 - Equivalence Trials

For an equivalence trial, it is necessary to determine a “zone of clinical equivalence” prior to the trial onset.

For example, consider standard (active control) and experimental antihypertensive drugs (a drug that controls blood pressure). Suppose that the standard drug yields a mean reduction of 5 mm Hg in diastolic blood pressure for a certain patient population. The investigator may decide that the experimental drug is clinically equivalent to the standard drug if its mean reduction in diastolic blood pressure is 3-7 mm Hg. This is based on clinical judgment and there may be differences of opinion on this 'arbitrary' level of equivalence.

Thus, the difference in means between the two therapies does not exceed 2 mm Hg. Let's suppose that we are willing to accept this level.

In general, the zone of equivalence is defined by \(±\Psi\). The difference in population means between the experimental therapy and the active control, \(\mu_E - \mu_A\), should lie within \((-\Psi, +\Psi)\). Differences in response less than \(\Psi\) are considered 'equally effective' or 'noninferior'.

In nearly every equivalence trial, the selection of \(\Psi\) is arbitrary and can be controversial. Some researchers recommend that \(\Psi\) be selected as less than one-half of the magnitude of the effect observed from the superiority trials comparing the active control to placebo.

Given what we know in the antihypertensive example above, \(\Psi = 2\) satisfies this requirement \((\dfrac{2}{5} = 0.4 < 0.5)\), but why not select \(\Psi = 1\)?

Here is a second issue to consider...

Unlike a placebo-controlled trial, an equivalence trial does not provide a natural check for internal validity because equivalence of the experimental and active control therapies does not necessarily imply that either of them is effective. In other words, if a third treatment arm of placebo had been included in the trial, it is possible that neither the experimental therapy nor the active control therapy would demonstrate superiority over placebo. There is no direct establishment of superiority inherent in the way the trial is set up.

The investigator needs to select an active control therapy for the equivalence trial that has been proven to be superior to the placebo. An important assumption is that the active control would be superior to placebo (had placebo been a treatment arm in the current trial).

In the past, a few equivalence trials incorporated appropriate active controls, but at doses less than recommended (rendering them ineffective). It is important to select the proper control and use it at an appropriate dose level.

One way to ascertain internal validity is through an external validity check, e.g., compare the experimental and active control therapies of the current study to published reports for comparative trials that involve the active control therapy versus a placebo control. Are similar results observed for the active therapy in the equivalence trial as in the published study against a placebo?

External comparisons should examine response levels, patient compliance, withdrawal rates, use of rescue medications, etc. An external validity check is only possible if the chosen active control therapy for the equivalence trial was determined effective in a superiority trial. An under-dosed or over-dosed regimen for the active control therapy in an equivalence trial can bias the results and interpretations. In addition, the design for the equivalence trial should mimic (within reason) the design for the superiority trial. Some of this advice is difficult to follow and may be impossible to implement.

(Another aspect of internal validity, of course, is the quality of the trial, in terms of inclusion/exclusion criteria, dosing regimens, quality control, etc. Do not run a sloppy study!)

The U.S. Food and Drug Administration (FDA) and the National Institutes of Health (NIH) typically require intent-to-treat (ITT) analyses in placebo-controlled trials. In an ITT analysis, data on all randomized patients are included in the analysis, regardless of protocol violations, lack of adherence, withdrawal, incorrectly taking the other treatment, etc. The ITT analysis reflects what will happen in the real world, outside the realm of a controlled clinical trial.

Is this appropriate? In a superiority trial, the ITT analysis usually is conservative because it tends to diffuse the difference between the treatment arms. There is more 'noise' in an ITT study. This is due to the increased variability from protocol violations, lack of adherence, withdrawal, etc. You can overcome this noise by increasing the sample size.

In an equivalence trial, the ITT analysis still is appropriate. There is a misconception that the ITT analysis will have the opposite effect in an equivalence trial, i.e., it will be easier to demonstrate equivalence. This is not so. Even with an ITT analysis in an equivalence trial, it still is important to conduct a well-designed study with a sufficient sample size and good quality control.

An alternative to an intent-to-treat analysis is a protocol analysis (often called a per-protocol analysis), whereby subjects are analyzed according to the treatment received. A protocol analysis excludes subjects who did not satisfy the inclusion criteria, did not comply with taking study medications, violated the protocol, etc.; that is, data from patients who did not follow the protocol are left out of the analysis. A protocol analysis is expected to enhance differences between treatments, so it usually will be conservative for an equivalence trial. Obviously, a protocol analysis is susceptible to many biases and must be performed very carefully. You may think that you are removing all of the biases, when in fact you may not be. A protocol analysis could be considered as supplemental to the ITT analysis. The U.S. FDA moved to ITT studies years ago to avoid biases introduced when researchers selectively excluded patients from analysis because of various protocol deviations. For the same reasons, many of the major medical journals will accept only ITT studies.

6b.4 - Non-Inferiority Trials

A non-inferiority trial is similar to an equivalence trial. The research question in a non-inferiority trial is whether the experimental therapy is not inferior to the active control (whereas the experimental therapy in an equivalence trial should not be inferior to, nor superior to, the active control). Thus, a non-inferiority trial is one-sided, whereas an equivalence trial is two-sided. (For non-inferiority, we want experimental therapy to be not inferior to the active control.)

Assume that the larger response is the better response. The one-sided zone of non-inferiority is defined by \(-\Psi\), i.e., the difference in population means between the experimental therapy and the active control, \(\mu_E - \mu_A\), should lie within \(\left(-\Psi, + ∞\right)\).

Many of the same issues that are critical for designing an equivalence trial also are critical for designing a non-inferiority trial, namely, appropriate selection of an active control and appropriate selection of the “zone of clinical non-inferiority” defined by \(\Psi\).

Hypertensive Example

Consider the previous example with the standard and experimental antihypertensive therapies.

The researchers may decide that the experimental drug is clinically not inferior to the standard drug if its mean reduction in diastolic blood pressure is at least 3 mm Hg \(\left(\Psi = 2\right)\). Thus, the difference in population means between the experimental therapy and the active control therapy, \(\mu_E - \mu_A\), should lie within \(\left(-\Psi, + ∞\right)\). It does not matter if the experimental drug is much better than the active control drug, provided that it is not inferior to the active control drug.

Because a non-inferiority trial design allows for the possibility that the experimental therapy is superior to the active control therapy, the non-inferiority design is preferred over the equivalence design. The equivalence design is useful when evaluating generic drugs.

6b.5 - Statistical Inference - Hypothesis Testing

Statisticians construct the null hypothesis and the alternative hypothesis for statistical hypothesis testing such that the research hypothesis is the alternative hypothesis:

\(H_0: \left\{ \text{non-equivalence}\right\} \text{ vs. } H_1: \left\{ \text{equivalence}\right\} \)

\(H_0: \left\{ \text{inferiority}\right\} \text{ vs. } H_1: \left\{ \text{non-inferiority}\right\} \)

In terms of the population means, the hypotheses for testing equivalence are expressed as:

\(H_0: \left\{ \mu_{E} - \mu_{A} \le -\Psi \text{ or } \mu_{E} - \mu_{A} \ge \Psi \right\}\)

vs.

\(H_1: \left\{-\Psi < \mu_{E} - \mu_{A}< \Psi \right\}\)

also expressed as

\(H_0: \left\{|\mu_{E} - \mu_{A}| \ge \Psi \right\} \text{ vs. } H_1: \left\{|\mu_{E} - \mu_{A}| < \Psi \right\}\)

In terms of the population means, the hypotheses for testing non-inferiority are expressed as

\(H_0: \left\{\mu_{E} - \mu_{A} \le -\Psi \right\} \text{ vs. } H_1: \left\{\mu_{E} - \mu_{A} > -\Psi \right\}\)

The null and alternative hypotheses for an equivalence trial can be decomposed into two distinct hypothesis testing problems, one for non-inferiority:

\(H_{01}: \left\{\mu_{E} - \mu_{A} \le -\Psi \right\} \text{ vs. } H_{11}: \left\{\mu_{E} - \mu_{A} > -\Psi \right\}\)

and one for non-superiority

\(H_{02}: \left\{\mu_{E} - \mu_{A} \ge \Psi \right\} \text{ vs. } H_{12}: \left\{\mu_{E} - \mu_{A} <\Psi \right\}\)

The null hypothesis of non-equivalence is rejected if and only if the null hypothesis of non-inferiority \(\left(H_{01}\right)\) is rejected AND the null hypothesis of non-superiority \(\left(H_{02}\right)\) is rejected.

This rationale leads to what is called two one-sided tests (TOST). If the data are approximately normally distributed, then two-sample t tests can be applied. If normality is suspect, then Wilcoxon rank-sum tests can be applied.

With respect to two-sample t tests, reject the null hypothesis of inferiority \(\left(H_{01}\right)\) if:

\( t_{inf}= \dfrac{\bar{Y}_E - \bar{Y}_A + \Psi}{s \sqrt{\frac {1}{n_E}+\frac {1}{n_A}}} > t_{n_{E}+n_{A}-2, 1-\alpha}\)

and reject the null hypothesis of superiority \(\left(H_{02}\right)\) if:

\( t_{sup}= \dfrac{\bar{Y}_E - \bar{Y}_A - \Psi}{s \sqrt{\frac {1}{n_E}+\frac {1}{n_A}}} < -t_{n_{E}+n_{A}-2, 1-\alpha}\)

where s is the pooled sample estimate of the standard deviation, calculated as the square-root of the pooled sample estimate of the variance:

\( s^2 = \left(\sum_{i=1}^{n_E}\left(Y_{Ei}-\bar{Y}_E\right)^2+\sum_{j=1}^{n_A}\left(Y_{Aj}-\bar{Y}_A\right)^2\right) / \left(n_E + n_A -2\right) \)

NOTE! Each one-sided t test is conducted at the \(\alpha\) significance level.
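As a rough illustration of the two one-sided tests, the sketch below computes \(t_{inf}\) and \(t_{sup}\) from summary statistics and applies the two rejection rules. It is written in Python rather than SAS so that it can be run anywhere; the function names are hypothetical, the critical value is supplied by hand rather than computed, and the input values are the illustrative numbers used in the worked example later in this lesson.

```python
import math

def tost_statistics(mean_e, mean_a, s_pooled, n_e, n_a, psi):
    """Return (t_inf, t_sup) for the two one-sided tests (hypothetical helper)."""
    se = s_pooled * math.sqrt(1.0 / n_e + 1.0 / n_a)  # standard error of the difference
    diff = mean_e - mean_a
    t_inf = (diff + psi) / se  # tests H01: mu_E - mu_A <= -psi
    t_sup = (diff - psi) / se  # tests H02: mu_E - mu_A >= +psi
    return t_inf, t_sup

def tost_decision(t_inf, t_sup, t_crit):
    """Apply both one-sided rejection rules at the same alpha level."""
    reject_inferiority = t_inf > t_crit
    reject_superiority = t_sup < -t_crit
    # Equivalence is claimed only if BOTH null hypotheses are rejected.
    return reject_inferiority, reject_superiority, reject_inferiority and reject_superiority

# Illustrative inputs: n_E = n_A = 30, s = 6.5, psi = 4.
t_inf, t_sup = tost_statistics(17.4, 20.6, 6.5, 30, 30, 4.0)
t_crit = 1.67  # t percentile with 58 df at 0.95, entered by hand (assumption)
result = tost_decision(t_inf, t_sup, t_crit)
print(round(t_inf, 3), round(t_sup, 3), result)
```

With these inputs the non-superiority null is rejected but the inferiority null is not, so equivalence cannot be claimed.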

6b.6 - Statistical Inference - Confidence Intervals

Confidence intervals can be used in place of the statistical tests. Reporting of confidence intervals is more informative because it indicates the magnitude of the treatment difference and how close it approaches the equivalence zone.

The \(100(1 - \alpha)\%\) confidence interval that corresponds to testing the null hypothesis of non-equivalence versus the alternative hypothesis of equivalence at the \(\alpha\) significance level has the following limits

\( \text{lower limit } = \text{min } \left [ 0, \left( \bar{Y}_{E} - \bar{Y}_{A} \right) - s \sqrt{\frac{1}{n_E}+\frac{1}{n_A}} t_{n_{E}+n_{A}-2, 1- \alpha} \right ] \)

\( \text{upper limit } = \text{max } \left [ 0, \left( \bar{Y}_{E} - \bar{Y}_{A} \right) + s \sqrt{\frac{1}{n_E}+\frac{1}{n_A}} t_{n_{E}+n_{A}-2, 1- \alpha} \right ] \)

This confidence interval does provide \(100(1 - \alpha)\%\) coverage - (see Berger RL, Hsu JC. Bioequivalence trials, intersection-union tests, and equivalence confidence sets. Statistical Science 1996, 11: 283-319).

Some researchers mistakenly believe that a \(100(1 - 2\alpha)\%\) confidence interval is consistent with testing the null hypothesis of non-equivalence versus the alternative hypothesis of equivalence at the \(\alpha\) significance level. Note that the Berger and Hsu \(100(1 - \alpha)\%\) confidence interval is similar to the \(100(1 - 2\alpha)\%\) confidence interval in its construction except that (1) the lower limit, if positive, is set to zero, and (2) the upper limit, if negative, is set to zero.

If the \(100(1 - \alpha)\%\) confidence interval lies entirely within \(\left(-\Psi, +\Psi \right)\), then the null hypothesis of non-equivalence is rejected in favor of the alternative hypothesis of equivalence at the \(\alpha\) significance level.

For a non-inferiority trial, the two-sample t statistic labeled \(t_{inf}\), previously discussed, can be applied to test:

\( H_0: \left\{ \mu_E - \mu_A \le - \Psi \right\} \text{ vs. } H_1: \left\{ \mu_E - \mu_A > - \Psi \right\}\)

Because a non-inferiority design reflects a one-sided situation, only the \(100(1 - \alpha)\%\) lower confidence limit is of interest:

If the \(100(1 - \alpha)\%\) lower confidence limit lies within \(\left(-\Psi, +∞\right)\), then the null hypothesis of inferiority is rejected in favor of the alternative hypothesis of non-inferiority at the \(\alpha\) significance level.

The FDA typically is more stringent than is required in non-inferiority tests. The FDA typically requires companies to use \(\alpha = 0.025\) for a non-inferiority trial, so that the one-sided test or lower confidence limit is comparable to what would be used in a two-sided superiority trial.

[Figure: example confidence intervals illustrating equivalence, non-equivalence, non-inferiority, and inferiority.]

Example

As an example, suppose an investigator conducted an equivalence trial with 30 patients in each of the experimental therapy and active control groups \(\left(n_E = n_A = 30\right)\). He defines the zone of equivalence with \(\Psi = 4\). The sample means and the pooled sample standard deviation are

\( \bar{Y}_E = 17.4, \bar{Y}_A = 20.6, s = 6.5 \)

The t percentile, \(t_{58,0.95}\), can be found from the TINV function in SAS as TINV(0.95,58), which yields \(t_{58,0.95} = 1.67\). The observed difference in means is \(17.4 - 20.6 = -3.2\) and the half-width of the interval is \(1.67 \times 6.5 \times \sqrt{\frac{1}{30}+\frac{1}{30}} \approx 2.8\). Thus, using the formulas in the section above, the lower limit = min{0, -3.2 - 2.8} = min{0, -6.0} = -6.0 and the upper limit = max{0, -3.2 + 2.8} = max{0, -0.4} = 0.0. The resulting 95% confidence interval for \(\mu_E - \mu_A\) for testing equivalence is (-6.0, 0.0). Because the 95% confidence interval for \(\mu_E - \mu_A\) does not lie entirely within \(\left(-\Psi, +\Psi \right) = \left(-4, +4 \right)\), the null hypothesis of non-equivalence is not rejected at the 0.05 significance level. Hence, the investigator cannot conclude that the experimental therapy is equivalent to the active control.

Suppose this had been conducted as a non-inferiority trial instead of an equivalence trial, with the zone of non-inferiority defined by \(\Psi = 4\), i.e., \(\left(-4, +∞ \right)\). The 95% lower confidence limit for \(\mu_E - \mu_A\) is -6.0, which does not lie within \(\left(-4, +∞\right)\). Therefore, the investigator cannot claim non-inferiority of the experimental therapy to the active control.
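The arithmetic in this example can be checked with a few lines of code. The following is a minimal Python sketch (not part of the SAS material for this lesson); the function name is hypothetical and the t percentile 1.67 is entered by hand, as in the text.

```python
import math

def equivalence_ci(mean_e, mean_a, s_pooled, n_e, n_a, t_crit):
    """Confidence limits in the Berger and Hsu form (hypothetical helper)."""
    diff = mean_e - mean_a
    half_width = t_crit * s_pooled * math.sqrt(1.0 / n_e + 1.0 / n_a)
    lower = min(0.0, diff - half_width)  # a positive lower limit is set to zero
    upper = max(0.0, diff + half_width)  # a negative upper limit is set to zero
    return lower, upper, diff - half_width

psi = 4.0
lower, upper, lower_limit = equivalence_ci(17.4, 20.6, 6.5, 30, 30, 1.67)

# Equivalence: the whole interval must sit inside (-psi, +psi).
equivalent = (-psi < lower) and (upper < psi)
# Non-inferiority: only the one-sided lower limit matters.
non_inferior = lower_limit > -psi

print(round(lower, 1), round(upper, 1), equivalent, non_inferior)
```

Both flags come out false, which agrees with the conclusions reached above for the equivalence and non-inferiority versions of the example.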

A real example of a non-inferiority trial is the VALIANT trial in patients with myocardial infarction and heart failure. Patients were randomized to valsartan monotherapy \(\left(n_V = 4,909\right)\), captopril monotherapy \(\left(n_C = 4,909\right)\), or valsartan + captopril combination therapy \(\left(n_{VC} = 4,885\right)\). The primary outcome was death from any cause. One objective of the VALIANT trial was to determine if the combination therapy is superior to each of the monotherapies. Another objective of the trial was to determine if valsartan is non-inferior to captopril, defined by \(\Psi = 2.5\%\) in the overall death rate.

Switching Objectives

Suppose that in a non-inferiority trial, the 95% lower confidence limit for \(\mu_E - \mu_A\) not only lies within \(\left(-\Psi, +∞\right)\) to establish non-inferiority, but also lies within \(\left(0, +∞\right)\). It is safe to claim the superiority of the experimental therapy to the active control in such a situation (without any statistical penalty).

[Figure: confidence interval illustrating non-inferiority and superiority.]

In a superiority trial, suppose that the 95% lower confidence limit for \(\mu_E - \mu_A\) does not lie within \(\left(0, +∞\right)\), indicating that the experimental therapy is not superior to the active control. If the protocol had specified non-inferiority as a secondary objective and specified an appropriate value of \(\Psi\), then it is safe to claim non-inferiority if the 95% lower confidence limit for \(\mu_E - \mu_A\) lies within \(\left(-\Psi, +∞\right)\).

[Figure: confidence interval illustrating non-inferiority.]

6b.7 - Sample Size and Power

For a continuous outcome that is approximately normally distributed in an equivalence trial, the number of patients needed in the active control arm, \(n_A\), where \(AR = \dfrac{n_E}{n_A}\) is the allocation ratio, to achieve \(100 \left(1 - \beta \right)\%\) statistical power with an \(\alpha\)-level significance test is approximated by:

\( n_A = \left( \frac{AR+1}{AR}\right) \left(t_{n_{E}+n_{A}-2, 1-\alpha}+ t_{n_{E}+n_{A}-2, 1-\beta}\right)^2 \sigma^2 / \left(\Psi - |\Delta|\right)^2 \)

Notice the difference in the t percentiles between this formula and that for a superiority comparison, described earlier. The difference is due to the two one-sided testing that is performed.

Most investigators assume that the true difference in population means, \(\Delta = \mu_E - \mu_A\), is null in this sample size formula. This is an optimistic assumption and may not be realistic.

NOTE! The formula above simplifies to the formula on p. 189 in the FFDRG text if \(AR =1\), \(\Delta = 0\), and Z is substituted for t.
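To make the formula concrete, here is a small Python sketch of the calculation. It is only an approximation under a stated assumption: normal percentiles are substituted for the t percentiles, which avoids the circularity of the degrees of freedom depending on the unknown sample size. The function name is hypothetical, and the inputs are the values from the asthma example in the SAS section of this lesson.

```python
import math
from statistics import NormalDist

def equivalence_n_continuous(alpha, power, sigma, psi, delta=0.0, ar=1.0):
    """Approximate (n_A, n_E) for a normally distributed outcome.

    Assumption: z percentiles stand in for t percentiles.
    """
    if abs(delta) >= psi:
        raise ValueError("|delta| must be smaller than psi")
    z = NormalDist().inv_cdf
    z_sum = z(1.0 - alpha) + z(power)  # z_{1-alpha} + z_{1-beta}
    n_a = ((ar + 1.0) / ar) * z_sum ** 2 * sigma ** 2 / (psi - abs(delta)) ** 2
    n_a = math.ceil(n_a)       # round up to whole patients
    n_e = math.ceil(ar * n_a)  # experimental arm follows the allocation ratio
    return n_a, n_e

# 1:1 allocation, alpha = 0.05, power = 0.90, sigma = 0.75, psi = 0.10, delta = 0.05
n_a, n_e = equivalence_n_continuous(0.05, 0.90, 0.75, 0.10, delta=0.05)
print(n_a, n_e, n_a + n_e)

# 2:1 allocation with power 0.95
n_a2, n_e2 = equivalence_n_continuous(0.05, 0.95, 0.75, 0.10, delta=0.05, ar=2.0)
print(n_a2, n_e2, n_a2 + n_e2)
```

Because unrounded percentiles are used, the results can differ by a few patients from a hand calculation with rounded values and from the exact method in PROC POWER.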

For a binary outcome, the zone of equivalence for the difference in population proportions between the experimental therapy and the active control, \(p_E - p_A\), is defined by the interval \((-\Psi, +\Psi)\). The number of patients needed in the active control arm, \(n_A\), where \(AR = \dfrac{n_E}{n_A}\), to achieve \(100 \left(1 - \beta \right)\%\) statistical power with an \(\alpha\) significance test is approximated by:

\( n_A = \left( \frac{AR+1}{AR}\right) \left(z_{1-\alpha}+ z_{1-\beta}\right)^2 \bar{p}(1-\bar{p}) / \left(\Psi - |p_E-p_A|\right)^2 \)

where

\( \bar{p}= \left( AR \cdot p_E+p_A\right) / (AR+1) \)

How does this formula compare to FFDRG p. 189? Our text uses the control group value for p, assuming that \(p_E - p_A = 0\).
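The binary formula can be sketched in the same way. The Python function below is a hypothetical helper, not a SAS feature; it uses unrounded normal percentiles, and the inputs are the illustrative values from the binomial example in the SAS section (70% success in both arms, \(\Psi = 0.05\), a 0.025 significance level and 90% power).

```python
import math
from statistics import NormalDist

def equivalence_n_binary(alpha, power, p_e, p_a, psi, ar=1.0):
    """Approximate n_A for a binary outcome (hypothetical helper)."""
    if abs(p_e - p_a) >= psi:
        raise ValueError("|p_E - p_A| must be smaller than psi")
    z = NormalDist().inv_cdf
    z_sum = z(1.0 - alpha) + z(power)
    p_bar = (ar * p_e + p_a) / (ar + 1.0)  # allocation-weighted average proportion
    n_a = ((ar + 1.0) / ar) * z_sum ** 2 * p_bar * (1.0 - p_bar)
    n_a /= (psi - abs(p_e - p_a)) ** 2
    return math.ceil(n_a)

n_a = equivalence_n_binary(alpha=0.025, power=0.90, p_e=0.70, p_a=0.70, psi=0.05)
print(n_a, 2 * n_a)  # control-arm size and total under 1:1 allocation
```

The result is close to, but not identical with, a hand calculation that rounds the percentiles to 1.96 and 1.28.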

For a time-to-event outcome, the zone of equivalence for the hazard ratio between the experimental therapy and the active control, \(\Lambda\), is defined by the interval \(\left(\dfrac{1}{\Psi}, +\Psi\right)\), where \(\Psi\) is chosen > 1. The number of patients who need to experience the event to achieve \(100 \left(1 - \beta \right)\%\) statistical power with an \(\alpha\)-level significance test is approximated by

\(E = \left( \frac{(AR+1)^2}{AR}\right) \left(z_{1-\alpha}+ z_{1-\beta}\right)^2 / \left( log_{e} \left(\Psi / \Lambda \right)\right)^2 \)

If \(p_E\) and \(p_A\) represent the anticipated failure rates in the two treatment groups, then the sample sizes can be determined from \(n_A = \dfrac{E}{\left(AR\times p_E + p_A\right)}\) and \(n_E = AR \times n_A\)

If a hazard function is assumed to be constant during the follow-up period [0, T], then it can be expressed as \(\lambda(t) = \lambda = \dfrac{-\text{log}_e(1 - p)}{T}\). In such a situation, the hazard ratio for comparing two groups is \(\Lambda = \dfrac{\text{log}_e\left(1 - p_E\right)}{\text{log}_e\left(1 - p_A\right)}\). The same formula can be applied, with different values of \(p_E\) and \(p_A\), to determine \(\Psi\).
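The steps for a time-to-event outcome (convert failure rates to a hazard ratio, compute the required number of events, then convert events to patients) can be strung together as follows. This is a Python sketch under the constant-hazard assumption stated above; the function names are hypothetical and the inputs are the values from the time-to-infection example in the SAS section.

```python
import math
from statistics import NormalDist

def hazard_ratio_from_rates(p_e, p_a):
    """Hazard ratio implied by two failure rates under constant hazards."""
    return math.log(1.0 - p_e) / math.log(1.0 - p_a)

def events_required(alpha, power, psi, hr=1.0, ar=1.0):
    """Approximate number of events E (hypothetical helper)."""
    z = NormalDist().inv_cdf
    z_sum = z(1.0 - alpha) + z(power)
    e = ((ar + 1.0) ** 2 / ar) * z_sum ** 2 / math.log(psi / hr) ** 2
    return math.ceil(e)

def patients_from_events(events, p_e, p_a, ar=1.0):
    """Convert events to (n_A, n_E) using the anticipated failure rates."""
    n_a = math.ceil(events / (ar * p_e + p_a))
    return n_a, math.ceil(ar * n_a)

psi = hazard_ratio_from_rates(0.25, 0.20)  # margin: p_E = 0.25 versus p_A = 0.20
hr = hazard_ratio_from_rates(0.20, 0.20)   # anticipated truth: no difference
events = events_required(alpha=0.025, power=0.90, psi=psi, hr=hr)
n_a, n_e = patients_from_events(events, p_e=0.20, p_a=0.20)
print(round(psi, 2), events, n_a, n_e)
```

Since the percentiles and \(\Psi\) are not rounded here, the event count comes out a little higher than one obtained with values rounded to two decimals.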

For a continuous outcome that is approximately normally distributed in a non-inferiority trial, the number of subjects needed in the active control arm, \(n_A\), where \(AR = \dfrac{n_E}{n_A}\), to achieve \(100 \left(1 - \beta \right)\%\) statistical power with an \(\alpha\)-level significance test is approximated by:

\( n_A = \left( \frac{AR+1}{AR}\right) \left(t_{n_{E}+n_{A}-2, 1-\alpha}+ t_{n_{E}+n_{A}-2, 1-\beta}\right)^2 \sigma^2 / \left(\Psi - |\Delta|\right)^2 \)

Notice that the sample size formulae for non-inferiority trials are exactly the same as the sample size formulae for equivalence trials. This is because of the one-sided testing for both types of designs (even though an equivalence trial involves two one-sided tests). Also notice that the choice of \(z_{1-\alpha}\) in the formulas above assumes a one-sided test or two one-sided tests, whereas the requirements of regulatory agencies and the approach in our FFDRG text are to use the Z value that would have been used for a two-sided hypothesis test. In homework, be sure to state any assumptions and the approach you are taking.

6b.8 - SAS Examples

SAS® Example

Sample Size for Comparing Two Normal Means

An investigator wants to determine the sample size for an asthma equivalence trial with an experimental therapy and an active control. The primary outcome is forced expiratory volume in one second \(\left(FEV_1\right)\). The investigator desires a 0.05-significance level test with 90% statistical power and decides that the zone of equivalence is \(\left(-\Psi, +\Psi\right) = \left(-0.1 \text{ L}, +0.1 \text{ L}\right)\) and that the true difference in means does not exceed \(\Delta = 0.05\) L. The standard deviation reported in the literature for a similar population is \(\sigma = 0.75\) L. The investigator plans to have equal allocation to the two treatment groups \(\left(AR = 1\right)\).

***********************************************************************
* This is a program that illustrates the use of PROC POWER to         *
* calculate sample size when comparing two normal means in an         *
* equivalence trial.                                                  *
***********************************************************************;

proc power;
twosamplemeans dist=normal groupweights=(1 1) alpha=0.05 power=0.9 stddev=0.75 
   lower=-0.10 upper=0.10 meandiff=0.05 test=equiv_diff ntotal=.;
plot min=0.1 max=0.9;
title "Sample Size Calculation for Comparing Two Normal Means (1:1 Allocation) in an Equivalence Trial";
run;

Assuming that \(FEV_{1}\) has an approximate normal distribution, the approximate number of patients required for the active control group is:

\(n_A = \dfrac{\left(2\right)\left(1.645 + 1.28\right)^{2}\left(0.75\right)^{2}}{\left(0.1 - 0.05\right)^{2}} = 3,851\)

The total sample size required is \(n_E + n_A = 3,851 + 3,851 = 7,702\).

SAS PROC POWER yields \(n_E + n_A = 3,855 + 3,855 = 7,710\).

Try It!

What happens to the total sample size if the power is to be 0.95 and the investigator uses 2:1 allocation?

n = 10,959

Here is the SAS output that you should have gotten:

`The POWER Procedure`
`Equivalence Test for Mean Differences`

`Fixed Scenario Elements`
`Distribution`	`Normal`
`Method`	`Exact`
`Lower Equivalence Bound`	`-0.1`
`Upper Equivalence Bound`	`0.1`
`Alpha`	`0.05`
`Mean Difference`	`0.05`
`Standard Deviation`	`0.75`
`Group 1 Weight`	`1`
`Group 2 Weight`	`2`
`Nominal Power`	`0.95`

`Computed N Total`
`Actual Power`	`N Total`
`0.950`	`10959`


SAS® Example

Sample Size for Comparing Two Binomial Proportions

An investigator wants to compare an experimental therapy to an active control in a non-inferiority trial when the response is treatment success. She desires a 0.025 significance level test and 90% statistical power. She knows 70% of the active control patients will experience success, so she decides that the experimental therapy is not inferior if it yields at least 65% success. Thus, \(\Psi = 0.05\) and she assumes that the true difference is \(p_E - p_A = 0\).

***********************************************************************
* This is a program that illustrates the use of PROC POWER to         *
* calculate sample size when comparing two binomial proportions in a  *
* non-inferiority trial.                                              *
***********************************************************************;
    
proc power;
twosamplefreq groupweights=(1 1) groupps=(0.65 0.70) alpha=0.025 power=0.9
   test=PChi sides=1 ntotal=.;
plot min=0.1 max=0.9;
title "Sample Size Calculation for Comparing Two Binomial Proportions (1:1 Allocation)";
title2 "in a Non-Inferiority Trial"; 
run;

With equal allocation, the number of patients in the active control group is:

\(n_A = \dfrac{\left(2\right)\left(1.96 + 1.28\right)^{2}{0.7\left(1 - 0.7\right)}}{\left(0.05\right)^{2}} = 1,764\)

Thus, \(n_E = n_A = 1,764\) patients for a total of 3,528 patients.

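SAS PROC POWER does not contain a feature for an equivalence trial or a non-inferiority trial with binary outcomes, so the program above treats the problem as a one-sided superiority comparison of 0.65 versus 0.70. Fisher’s exact test for a superiority trial can be adapted to yield \(n_E = n_A = 1,882\) for a total of 3,764 patients; Pearson’s chi-square test, as requested by test=PChi in the program, gives a total of 3,684. The discrepancy from the hand calculation is due to the superiority trial using \(\bar{p} = 0.675\) instead of 0.7.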

Try It!

Suppose the proportions were 0.65 and 0.75. How does the required sample size, n, change?

n = 880 instead of 3684 with Pearson’s Chi Square.

Here is the output for the proportions 0.65 and 0.75.

`The POWER Procedure`
`Pearson Chi-Square Test for Two Proportions`

`Fixed Scenario Elements`
`Distribution`	`Asymptotic Normal`
`Method`	`Normal Approximation`
`Number of Sides`	`1`
`Alpha`	`0.025`
`Group 1 Proportion`	`0.65`
`Group 2 Proportion`	`0.75`
`Group 1 Weight`	`1`
`Group 2 Weight`	`1`
`Nominal Power`	`0.9`
`Null Proportion Differences`	`0`

`Computed N Total`
`Actual Power`	`N Total`
`0.900`	`880`


SAS® Example

Sample Size for Comparing Two Hazard Functions

An investigator wants to compare an experimental therapy to an active control in a non-inferiority trial. The response is time to infection. He desires a 0.025-significance level test with 90% statistical power and \(AR =1\). Follow-up for each patient is one year and he expects 20% of the active control group will get an infection \(\left(p_A = 0.2\right)\). Although he believes that \(p_E = 0.2\), he considers the experimental therapy to be non-inferior if \(p_E ≤ 0.25\). The SAS program below, for a one-sided superiority trial, may approximate the required sample size.

***********************************************************************
* This is a program that illustrates the use of PROC POWER to         *
* calculate sample size when comparing two hazard functions in a      *
* non-inferiority trial.                                              *
***********************************************************************;
    
proc power;
twosamplesurvival groupweights=(1 1) alpha=0.025 power=0.9 sides=1
   test=logrank curve("Placebo")=(1.01):(0.8) curve("Therapy")=(1.01):(0.75) 
   groupsurvival="Placebo"|"Therapy" accrualtime=0.01 followuptime=1 ntotal=.;
plot min=0.1 max=0.9;
title "Sample Size Calculation for Comparing Two Hazard Functions (1:1 Allocation)";
title2 "in a Non-Inferiority Trial"; 
run;

The sample size can be worked out by hand as follows:

Assuming constant hazard functions, the effect size with \(p_E = p_A = 0.2\) is \(\Lambda = 1\). With \(p_E = 0.25\) and \(p_A = 0.2\), the zone of non-inferiority is defined by:

\(\Psi = \dfrac{\text{log}_e\left(0.75\right)}{\text{log}_e\left(0.8\right)} = 1.29\)

The number of events is \(E = \dfrac{\left(4\right)\left(1.96 + 1.28\right)^{2}}{\left(\text{log}_e\left(1.29\right)\right)^{2}} = 648\)

and the sample sizes are \(n_A = \dfrac{E}{\left(AR \times p_E + p_A \right)} = \dfrac{648}{\left(0.2 + 0.2\right)} = 1,620\) and \(n_E = 1,620\)

Since SAS PROC POWER does not contain a feature for an equivalence trial or a non-inferiority trial with time-to-event outcomes, the results from the logrank test for a superiority trial were adapted to yield \(n_E = n_A = 1,457\). The discrepancy in numbers between the program and the calculated n is due to the superiority trial using \(p_E = 0.25\) instead of 0.2 in \(n_A = \dfrac{E}{\left(AR \times p_E + p_A \right)}\).

Notice that the resultant sample sizes in the three SAS examples above all are relatively large. This is because the zone of equivalence or non-inferiority is defined by a small value of \(\Psi\). Generally, equivalence trials and non-inferiority trials will require larger sample sizes than superiority trials.

None of these SAS examples accounted for withdrawals. If a withdrawal rate of \(\gamma\) is anticipated, then the sample size should be increased by the factor \(\dfrac{1}{\left(1 - \gamma\right)}\).
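As a quick illustration of the withdrawal adjustment, the Python sketch below inflates a per-arm sample size by \(\dfrac{1}{1 - \gamma}\). The function name and the 10% withdrawal rate are hypothetical choices made only for this example; the starting value of 1,620 is the per-arm size from the time-to-infection calculation above.

```python
import math

def inflate_for_withdrawals(n, gamma):
    """Increase a sample size n to allow for a withdrawal rate gamma."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1)")
    # The tiny tolerance keeps floating-point noise from adding a spurious patient.
    return math.ceil(n / (1.0 - gamma) - 1e-9)

n_adjusted = inflate_for_withdrawals(1620, 0.10)  # gamma = 0.10 is an assumed rate
print(n_adjusted)
```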

6b.9 - Summary

In this lesson, among other things, we learned how to:

Distinguish between superiority, non-inferiority and equivalence trials in terms of
- objectives
- control group
- hypotheses tested and
- formation of confidence intervals.
Recognize the characteristics of a clinical trial with high external validity
Define which data are included in an intention-to-treat analysis
Recognize the major considerations for designing an equivalence or non-inferiority trial.
Perform sample size calculations for some equivalence and non-inferiority trials, using SAS programs.
