Lesson 6: Sample Size and Power, Part B
Overview
This week we continue exploring the issues of sample size and power, this time with regard to the differing purposes of clinical trials. Often the objective of the trial is to establish that a therapy is efficacious, but what is the proper control group? Can superiority to placebo be clearly established when there are other effective therapies on the market? These questions lead to special considerations based on whether the trial has an objective of establishing superiority, equivalence, or noninferiority. So, let’s move ahead…
Objectives
 Distinguish between superiority, noninferiority and equivalence trials in terms of
 objectives
 control group
 hypotheses tested and
 formation of confidence intervals.
 Recognize the characteristics of a clinical trial with high external validity
 Define which data are included in an intention-to-treat analysis
 Recognize the major considerations for designing an equivalence or noninferiority trial.
 Perform sample size calculations for some equivalence and noninferiority trials, using SAS programs.
References:
Berger RL, Hsu JC. Bioequivalence trials, intersection-union tests, and equivalence confidence sets. Statistical Science 1996; 11: 283-319.
Piantadosi S. Sample size and power. In: Piantadosi S. Clinical Trials: A Methodologic Perspective. 2nd ed. Hoboken, NJ: John Wiley & Sons; 2005.
6b.1  Control Groups
Placebo Control
Placebo-controlled trials typically provide an unambiguous statement of the research hypothesis: either we want to show that the experimental treatment is superior to placebo (one-sided alternative) or, as is more often the case, we want to show that the experimental treatment is different from placebo (two-sided alternative). For this reason, we frequently refer to a placebo-controlled trial as a confirmatory trial or, in the most recent language, a superiority trial (even if we are using a two-sided alternative).
Active Control
Active control groups are often used because placebo control groups are unethical, such as when:
 the disease is life-threatening or debilitating, and/or
 an effective therapy already exists and is considered standard of care.
Investigators can use an active control group in a superiority trial, an equivalence trial, or a noninferiority trial. The new treatment may be preferred due to less cost, fewer side effects, or less impact on the quality of life. Or the new treatment may have superior efficacy.
Equivalence trials and noninferiority trials have different objectives than superiority trials. The objective of an equivalence trial is to demonstrate that a therapy is "equivalent" to the active control (it is not inferior to and not superior to the active control). Equivalent might not be the best word choice for this type of trial, as we will see later. The objective of a noninferiority trial is to demonstrate that a therapy is not inferior to the active control, i.e., it is not worse than the active control.
There are a number of issues related to the design and analysis of equivalence and noninferiority trials that are not well understood by clinical investigators. We will examine these issues using examples of such trials in this lesson.
6b.2  Combination Therapy Trials
Combination therapy trials are an example where the appropriateness of a placebo control must be carefully considered.
Suppose that for a particular disease or condition, there exists a standard therapy that is accepted as the best available treatment (standard of care). The standard of care could be a drug, a medical device, a surgical procedure, diet, exercise, etc., or some combination of these various regimens.
In the MIRACLE trial, the standard of care for the eligible patients with heart failure during the course of the trial consisted of some combination of the following medications:
 Diuretic
 Angiotensin-converting enzyme (ACE) inhibitor or angiotensin-receptor blocker
 Digitalis
 Beta-blocker
Suppose that the experimental therapy is of a different modality or different mechanism of action than the standard of care. If so, then it may be possible to use the experimental therapy in combination with the standard of care, and we can consider designing a two-armed trial that compares:
standard of care + experimental therapy
versus
standard of care + placebo therapy
This situation is comparable to a superiority trial because the research objective is to demonstrate superiority of the combination therapy to the standard of care.
In the MIRACLE trial, for example, the comparison consisted of:
standard of care + pacemaker
versus
standard of care + inactive pacemaker
In other situations, a superiority trial is not feasible. If the experimental therapy is of a similar modality to one of the components of the standard of care, then it may not be appropriate to combine the experimental therapy with the standard of care.
There are two other possibilities to consider for designing the clinical trial, namely, an equivalence trial and a noninferiority trial.
6b.3  Equivalence Trials
For an equivalence trial, it is necessary to determine a "zone of clinical equivalence" prior to the trial onset.
For example, consider standard (active control) and experimental antihypertensive drugs (a drug that controls blood pressure). Suppose that the standard drug yields a mean reduction of 5 mm Hg in diastolic blood pressure for a certain patient population. The investigator may decide that the experimental drug is clinically equivalent to the standard drug if its mean reduction in diastolic blood pressure is 3 to 7 mm Hg. This is based on clinical judgment, and there may be differences of opinion on this 'arbitrary' level of equivalence.
Thus, the difference in means between the two therapies does not exceed 2 mm Hg. Let's suppose that we are willing to accept this level.
In general, the zone of equivalence is defined by \(\pm\Psi\). The difference in population means between the experimental therapy and the active control, \(\mu_E - \mu_A\), should lie within \((-\Psi, +\Psi)\). Differences in response less than \(\Psi\) in magnitude are considered 'equally effective' or 'noninferior'.
In nearly every equivalence trial, the selection of \(\Psi\) is arbitrary and can be controversial. Some researchers recommend that \(\Psi\) be selected as less than one-half of the magnitude of the effect observed in the superiority trials comparing the active control to placebo.
Given what we know in the antihypertensive example above, \(\Psi = 2\) satisfies this requirement \((\dfrac{2}{5} = 0.4 < 0.5)\), but why not select \(\Psi = 1\)?
Here is a second issue to consider...
Unlike a placebocontrolled trial, an equivalence trial does not provide a natural check for internal validity because equivalence of the experimental and active control therapies does not necessarily imply that either of them is effective. In other words, if a third treatment arm of placebo had been included in the trial, it is possible that neither the experimental therapy nor the active control therapy would demonstrate superiority over placebo. There is no direct establishment of superiority inherent in the way the trial is set up.
The investigator needs to select an active control therapy for the equivalence trial that has been proven to be superior to the placebo. An important assumption is that the active control would be superior to placebo (had placebo been a treatment arm in the current trial).
In the past, a few equivalence trials incorporated appropriate active controls, but at doses less than recommended (rendering them ineffective). It is important to select the proper control and use it at an appropriate dose level.
One way to ascertain internal validity is through an external validity check, e.g., compare the experimental and active control therapies of the current study to published reports for comparative trials that involve the active control therapy versus a placebo control. Are similar results observed for the active therapy in the equivalence trial as in the published study against a placebo?
External comparisons should examine response levels, patient compliance, withdrawal rates, use of rescue medications, etc. An external validity check is only possible if the chosen active control therapy for the equivalence trial was determined effective in a superiority trial. An underdosed or overdosed regimen for the active control therapy in an equivalence trial can bias the results and interpretations. In addition, the design for the equivalence trial should mimic (within reason) the design for the superiority trial. Some of this advice is difficult to follow and may be impossible to implement.
(Another aspect of internal validity, of course, is the quality of the trial, in terms of inclusion/exclusion criteria, dosing regimens, quality control, etc. Do not run a sloppy study!)
The U.S. Food and Drug Administration (FDA) and the National Institutes of Health (NIH) typically require intent-to-treat (ITT) analyses in placebo-controlled trials. In an ITT analysis, data from all randomized patients are included in the analysis, regardless of protocol violations, lack of adherence, withdrawal, incorrectly taking the other treatment, etc. The ITT analysis reflects what will happen in the real world, outside the realm of a controlled clinical trial.
Is this appropriate? In a superiority trial, the ITT analysis usually is conservative because it tends to diffuse the difference between the treatment arms. There is more 'noise' in an ITT study. This is due to the increased variability from protocol violations, lack of adherence, withdrawal, etc. You can overcome this noise by increasing the sample size.
In an equivalence trial, the ITT analysis still is appropriate. There is a misconception that the ITT analysis will have the opposite effect in an equivalence trial, i.e., it will be easier to demonstrate equivalence. This is not so. Even with an ITT analysis in an equivalence trial, it still is important to conduct a welldesigned study with a sufficient sample size and good quality control.
An alternative to an intent-to-treat analysis is a protocol analysis, whereby subjects are analyzed according to the treatment received. A protocol analysis excludes subjects who did not satisfy the inclusion criteria, did not comply with taking study medications, violated the protocol, etc. In other words, data from patients who do not follow the protocol are excluded from the analysis. A protocol analysis is expected to enhance differences between treatments, so it usually will be conservative for an equivalence trial. Obviously, a protocol analysis is susceptible to many biases and must be performed very carefully; you may think that you are removing all of the biases when in fact you are not. A protocol analysis could be considered as supplemental to the ITT analysis. The U.S. FDA moved to ITT analyses years ago to avoid biases introduced when researchers selectively excluded patients from the analysis because of various protocol deviations. Many of the major medical journals will only accept ITT analyses for these reasons as well.
6b.4  NonInferiority Trials
A noninferiority trial is similar to an equivalence trial. The research question in a noninferiority trial is whether the experimental therapy is not inferior to the active control (whereas the experimental therapy in an equivalence trial should be neither inferior nor superior to the active control). Thus, a noninferiority trial is one-sided, whereas an equivalence trial is two-sided. (For noninferiority, we want the experimental therapy to be not inferior to the active control.)
Assume that the larger response is the better response. The one-sided zone of noninferiority is defined by \(\Psi\), i.e., the difference in population means between the experimental therapy and the active control, \(\mu_E - \mu_A\), should lie within \(\left(-\Psi, +\infty\right)\).
Many of the same issues that are critical for designing an equivalence trial also are critical for designing a noninferiority trial, namely, appropriate selection of an active control and appropriate selection of the “zone of clinical noninferiority” defined by \(\Psi\).
Hypertensive Example
Consider the previous example with the standard and experimental antihypertensive therapies.
The researchers may decide that the experimental drug is clinically not inferior to the standard drug if its mean reduction in diastolic blood pressure is at least 3 mm Hg \(\left(\Psi = 2\right)\). Thus, the difference in population means between the experimental therapy and the active control therapy, \(\mu_E - \mu_A\), should lie within \(\left(-\Psi, +\infty\right)\). It does not matter if the experimental drug is much better than the active control drug, provided that it is not inferior to the active control drug.
Because a noninferiority trial design allows for the possibility that the experimental therapy is superior to the active control therapy, the noninferiority design is preferred over the equivalence design. The equivalence design is useful when evaluating generic drugs.
6b.5  Statistical Inference  Hypothesis Testing
Statisticians construct the null hypothesis and the alternative hypothesis for statistical hypothesis testing such that the research hypothesis is the alternative hypothesis:
\(H_0: \left\{ \text{nonequivalence}\right\} \text{ vs. } H_1: \left\{ \text{equivalence}\right\} \)
or
\(H_0: \left\{ \text{inferiority}\right\} \text{ vs. } H_1: \left\{ \text{noninferiority}\right\} \)
In terms of the population means, the hypotheses for testing equivalence are expressed as:
\(H_0: \left\{ \mu_{E} - \mu_{A} \le -\Psi \text{ or } \mu_{E} - \mu_{A} \ge +\Psi \right\}\)
vs.
\(H_1: \left\{ -\Psi < \mu_{E} - \mu_{A} < +\Psi \right\}\)
also expressed as
\(H_0: \left\{ \left| \mu_{E} - \mu_{A} \right| \ge \Psi \right\} \text{ vs. } H_1: \left\{ \left| \mu_{E} - \mu_{A} \right| < \Psi \right\}\)
In terms of the population means, the hypotheses for testing noninferiority are expressed as
\(H_0: \left\{ \mu_{E} - \mu_{A} \le -\Psi \right\} \text{ vs. } H_1: \left\{ \mu_{E} - \mu_{A} > -\Psi \right\}\)
The null and alternative hypotheses for an equivalence trial can be decomposed into two distinct hypothesis testing problems, one for noninferiority:
\(H_{01}: \left\{ \mu_{E} - \mu_{A} \le -\Psi \right\} \text{ vs. } H_{11}: \left\{ \mu_{E} - \mu_{A} > -\Psi \right\}\)
and one for nonsuperiority
\(H_{02}: \left\{ \mu_{E} - \mu_{A} \ge +\Psi \right\} \text{ vs. } H_{12}: \left\{ \mu_{E} - \mu_{A} < +\Psi \right\}\)
The null hypothesis of nonequivalence is rejected if and only if the null hypothesis of noninferiority \(\left(H_{01}\right)\) is rejected AND the null hypothesis of nonsuperiority \(\left(H_{02}\right)\) is rejected.
This rationale leads to what is called two one-sided testing (TOST). If the data are approximately normally distributed, then two-sample t tests can be applied. If normality is suspect, then Wilcoxon rank-sum tests can be applied.
With respect to twosample t tests, reject the null hypothesis of inferiority if:
\( t_{inf}= \left(\bar{Y}_E - \bar{Y}_A + \Psi \right) / \left( s \sqrt{\frac {1}{n_E}+\frac {1}{n_A}} \right) > t_{n_{E}+n_{A}-2, 1-\alpha}\)
and reject the null hypothesis of superiority if:
\( t_{sup}= \left(\bar{Y}_E - \bar{Y}_A - \Psi \right) / \left( s \sqrt{\frac {1}{n_E}+\frac {1}{n_A}} \right) < -t_{n_{E}+n_{A}-2, 1-\alpha}\)
where s is the pooled sample estimate of the standard deviation, calculated as the square root of the pooled sample estimate of the variance:
\( s^2 = \left(\sum_{i=1}^{n_E}\left(Y_{Ei}-\bar{Y}_E\right)^2+\sum_{j=1}^{n_A}\left(Y_{Aj}-\bar{Y}_A\right)^2\right) / \left(n_E + n_A - 2\right) \)
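The TOST decision rule above can be sketched in a few lines of code. This is a minimal Python illustration (the course itself uses SAS; the function name `tost` and the data used below are hypothetical), assuming equal-variance samples and a user-supplied critical value \(t_{n_E+n_A-2,\,1-\alpha}\):

```python
import math

def tost(y_e, y_a, psi, t_crit):
    """Two one-sided tests for equivalence with margin psi.

    Returns (t_inf, t_sup, equivalent): reject inferiority if
    t_inf > t_crit, reject superiority if t_sup < -t_crit, and
    declare equivalence only when both rejections occur.
    """
    n_e, n_a = len(y_e), len(y_a)
    mean_e = sum(y_e) / n_e
    mean_a = sum(y_a) / n_a
    # pooled variance with n_E + n_A - 2 degrees of freedom
    ss = (sum((y - mean_e) ** 2 for y in y_e)
          + sum((y - mean_a) ** 2 for y in y_a))
    s = math.sqrt(ss / (n_e + n_a - 2))
    se = s * math.sqrt(1 / n_e + 1 / n_a)
    t_inf = (mean_e - mean_a + psi) / se
    t_sup = (mean_e - mean_a - psi) / se
    return t_inf, t_sup, (t_inf > t_crit) and (t_sup < -t_crit)
```

With Wilcoxon rank-sum tests, the same shift-by-\(\pm\Psi\) logic applies, but the t statistics are replaced by the rank-based statistics.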
6b.6  Statistical Inference  Confidence Intervals
Confidence intervals can be used in place of the statistical tests. Reporting of confidence intervals is more informative because it indicates the magnitude of the treatment difference and how closely it approaches the equivalence zone.
The \(100(1 - \alpha)\%\) confidence interval that corresponds to testing the null hypothesis of nonequivalence versus the alternative hypothesis of equivalence at the \(\alpha\) significance level has the following limits:
\( \text{lower limit } = \text{min } \left [ 0, \left( \bar{Y}_{E} - \bar{Y}_{A} \right) - s \sqrt{\frac{1}{n_E}+\frac{1}{n_A}}\, t_{n_{E}+n_{A}-2, 1-\alpha} \right ] \)
\( \text{upper limit } = \text{max } \left [ 0, \left( \bar{Y}_{E} - \bar{Y}_{A} \right) + s \sqrt{\frac{1}{n_E}+\frac{1}{n_A}}\, t_{n_{E}+n_{A}-2, 1-\alpha} \right ] \)
This confidence interval does provide \(100(1 - \alpha)\%\) coverage (see Berger RL, Hsu JC. Bioequivalence trials, intersection-union tests, and equivalence confidence sets. Statistical Science 1996; 11: 283-319).
Some researchers mistakenly believe that a \(100(1 - 2\alpha)\%\) confidence interval is consistent with testing the null hypothesis of nonequivalence versus the alternative hypothesis of equivalence at the \(\alpha\) significance level. Note that the Berger and Hsu \(100(1 - \alpha)\%\) confidence interval is similar to the \(100(1 - 2\alpha)\%\) confidence interval in its construction, except that (1) the lower limit, if positive, is set to zero, and (2) the upper limit, if negative, is set to zero.
If the \(100(1 - \alpha)\%\) confidence interval lies entirely within \(\left(-\Psi, +\Psi \right)\), then the null hypothesis of nonequivalence is rejected in favor of the alternative hypothesis of equivalence at the \(\alpha\) significance level.
For a noninferiority trial, the two-sample t statistic labeled \(t_{inf}\), previously discussed, can be applied to test:
\( H_0: \left\{ \mu_E - \mu_A \le -\Psi \right\} \text{ vs. } H_1: \left\{ \mu_E - \mu_A > -\Psi \right\}\)
Because a noninferiority design reflects a one-sided situation, only the \(100(1 - \alpha)\%\) lower confidence limit is of interest:
If the \(100(1 - \alpha)\%\) lower confidence limit lies within \(\left(-\Psi, +\infty\right)\), then the null hypothesis of inferiority is rejected in favor of the alternative hypothesis of noninferiority at the \(\alpha\) significance level.
The FDA typically is more stringent than is strictly required in noninferiority tests: it usually requires companies to use \(\alpha = 0.025\) for a noninferiority trial, so that the one-sided test or lower confidence limit is comparable to what would be used in a two-sided superiority trial.
[Figure: illustrative confidence intervals for the cases of equivalence, non-equivalence, noninferiority, and inferiority]
Example
As an example, suppose an investigator conducted an equivalence trial with 30 patients in each of the experimental therapy and active control groups \(\left(n_E = n_A = 30\right)\). He defines the zone of equivalence with \(\Psi = 4\). The sample means and the pooled sample standard deviation are
\( \bar{Y}_E = 17.4, \bar{Y}_A = 20.6, s = 6.5 \)
The t percentile, \(t_{58,0.95}\), can be found from the TINV function in SAS as TINV(0.95,58), which yields \(t_{58,0.95} = 1.67\). Thus, using the formulas in the section above, the lower limit = min{0, -3.2 - 2.8} = min{0, -6.0} = -6.0, and the upper limit = max{0, -3.2 + 2.8} = max{0, -0.4} = 0.0. This yields (-6.0, 0.0) as the 95% confidence interval for testing equivalence of \(\mu_E - \mu_A\). Because the 95% confidence interval for \(\mu_E - \mu_A\) does not lie entirely within \(\left(-\Psi, +\Psi \right) = \left(-4, +4 \right)\), the null hypothesis of nonequivalence is not rejected at the 0.05 significance level. Hence, the investigator cannot conclude that the experimental therapy is equivalent to the active control.
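The arithmetic in this worked example is easy to cross-check. This is a Python sketch (the course calculations use SAS, and the t percentile 1.67 is taken from the text rather than recomputed):

```python
import math

# Worked example: n_E = n_A = 30, Ybar_E = 17.4, Ybar_A = 20.6, s = 6.5
n_e = n_a = 30
diff = 17.4 - 20.6                     # observed difference = -3.2
t_crit = 1.67                          # t_{58, 0.95}, quoted in the text
half_width = 6.5 * math.sqrt(1 / n_e + 1 / n_a) * t_crit
lower = min(0.0, diff - half_width)    # Berger-Hsu lower limit -> -6.0
upper = max(0.0, diff + half_width)    # Berger-Hsu upper limit -> 0.0
```

Note how the Berger-Hsu construction sets the upper limit to zero here, because the unconstrained upper limit (-0.4) is negative.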
Suppose this had been conducted as a noninferiority trial instead of an equivalence trial, with the zone of noninferiority defined by \(\Psi = 4\), i.e., \(\left(-4, +\infty \right)\). The 95% lower confidence limit for \(\mu_E - \mu_A\) is -6.0, which does not lie within \(\left(-4, +\infty\right)\). Therefore, the investigator cannot claim noninferiority of the experimental therapy to the active control.
A real example of a noninferiority trial is the VALIANT trial in patients with myocardial infarction and heart failure. Patients were randomized to valsartan monotherapy \(\left(n_V = 4,909\right)\), captopril monotherapy \(\left(n_C = 4,909\right)\), or valsartan + captopril combination therapy \(\left(n_{VC} = 4,885\right)\). The primary outcome was death from any cause. One objective of the VALIANT trial was to determine if the combination therapy is superior to each of the monotherapies. Another objective of the trial was to determine if valsartan is noninferior to captopril, defined by \(\Psi = 2.5\%\) in the overall death rate.
Switching Objectives
Suppose that in a noninferiority trial, the 95% lower confidence limit for \(\mu_E - \mu_A\) not only lies within \(\left(-\Psi, +\infty\right)\) to establish noninferiority, but also lies within \(\left(0, +\infty\right)\). It is safe to claim superiority of the experimental therapy to the active control in such a situation (without any statistical penalty).
NonInferiority and Superiority
In a superiority trial, suppose that the 95% lower confidence limit for \(\mu_E - \mu_A\) does not lie within \(\left(0, +\infty\right)\), indicating that the experimental therapy is not superior to the active control. If the protocol had specified noninferiority as a secondary objective and specified an appropriate value of \(\Psi\), then it is safe to claim noninferiority if the 95% lower confidence limit for \(\mu_E - \mu_A\) lies within \(\left(-\Psi, +\infty\right)\).
6b.7  Sample Size and Power
For a continuous outcome that is approximately normally distributed in an equivalence trial, the number of patients needed in the active control arm, \(n_A\), where \(AR = \dfrac{n_E}{n_A}\), to achieve \(100 \left(1 - \beta \right)\%\) statistical power with an \(\alpha\)-level significance test is approximated by:
\( n_A = \left( \frac{AR+1}{AR}\right) \left(t_{n_{1}+n_{2}-2, 1-\alpha}+ t_{n_{1}+n_{2}-2, 1-\beta}\right)^2 \sigma^2 / \left(\Psi - \Delta\right)^2 \)
Notice the difference in the t percentiles between this formula and that for a superiority comparison, described earlier. The difference is due to the two one-sided testing that is performed.
Most investigators assume that the true difference in population means, \(\Delta = \mu_E  \mu_A\), is null in this sample size formula. This is an optimistic assumption and may not be realistic.
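The formula above can be sketched as a small Python helper (an illustration, not the course's SAS code; it substitutes normal (z) percentiles for the t percentiles as a first approximation, and the function name is made up):

```python
import math

def n_active_continuous(ar, z_alpha, z_beta, sigma, psi, delta=0.0):
    """Approximate n_A for an equivalence or noninferiority trial
    with a normally distributed outcome; ar = n_E / n_A."""
    return math.ceil(((ar + 1) / ar) * (z_alpha + z_beta) ** 2
                     * sigma ** 2 / (psi - delta) ** 2)
```

For example, with equal allocation, one-sided \(\alpha = 0.025\) (z = 1.96), 90% power (z = 1.28), \(\sigma = 1\), \(\Psi = 0.5\), and \(\Delta = 0\), this gives \(n_A = 84\). Note how a nonzero \(\Delta\) shrinks the denominator and inflates the sample size.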
For a binary outcome, the zone of equivalence for the difference in population proportions between the experimental therapy and the active control, \(p_E - p_A\), is defined by the interval \((-\Psi, +\Psi)\). The number of patients needed in the active control arm, \(n_A\), where \(AR = \dfrac{n_E}{n_A}\), to achieve \(100 \left(1 - \beta \right)\%\) statistical power with an \(\alpha\)-level significance test is approximated by:
\( n_A = \left( \frac{AR+1}{AR}\right) \left(z_{1-\alpha}+ z_{1-\beta}\right)^2 \bar{p}(1-\bar{p}) / \left(\Psi - \left|p_E-p_A\right|\right)^2 \)
where
\( \bar{p}= \left( AR \cdot p_E+p_A\right) / (AR+1) \)
How does this formula compare to FFDRG p. 189? The choice of the value for p in our text is to use the control group value, assuming that \(p_E - p_A = 0\).
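The binary-outcome formula, including the weighted average \(\bar{p}\), can be sketched the same way (a hedged Python illustration; the function name is hypothetical):

```python
import math

def n_active_binary(ar, z_alpha, z_beta, p_e, p_a, psi):
    """Approximate n_A for a binary outcome; ar = n_E / n_A."""
    p_bar = (ar * p_e + p_a) / (ar + 1)   # weighted average proportion
    return math.ceil(((ar + 1) / ar) * (z_alpha + z_beta) ** 2
                     * p_bar * (1 - p_bar) / (psi - abs(p_e - p_a)) ** 2)
```

For example, with equal allocation, \(z_{1-\alpha} = 1.96\), \(z_{1-\beta} = 1.28\), \(p_E = p_A = 0.5\), and \(\Psi = 0.1\), this gives \(n_A = 525\).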
For a time-to-event outcome, the zone of equivalence for the hazard ratio between the experimental therapy and the active control, \(\Lambda\), is defined by the interval \(\left(\dfrac{1}{\Psi}, +\Psi\right)\), where \(\Psi\) is chosen > 1. The number of patients who need to experience the event to achieve \(100 \left(1 - \beta \right)\%\) statistical power with an \(\alpha\)-level significance test is approximated by:
\(E = \left( \frac{(AR+1)^2}{AR}\right) \left(z_{1-\alpha}+ z_{1-\beta}\right)^2 / \left( \log_{e} \left(\Psi / \Lambda \right)\right)^2 \)
If \(p_E\) and \(p_A\) represent the anticipated failure rates in the two treatment groups, then the sample sizes can be determined from \(n_A = \dfrac{E}{\left(AR\times p_E + p_A\right)}\) and \(n_E = AR \times n_A\)
If a hazard function is assumed to be constant during the follow-up period [0, T], then it can be expressed as \(\lambda(t) = \lambda = -\dfrac{\log_e(1 - p)}{T}\), where p is the probability of experiencing the event by time T. In such a situation, the hazard ratio for comparing two groups is \(\Lambda = \dfrac{\log_e\left(1 - p_E\right)}{\log_e\left(1 - p_A\right)}\). The same formula can be applied, with different values of \(p_E\) and \(p_A\), to determine \(\Psi\).
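Under the constant-hazard assumption, \(\Lambda\) and the required number of events can be computed as follows (a Python illustration; the function names are hypothetical):

```python
import math

def hazard_ratio(p_e, p_a):
    """Hazard ratio under constant hazards, from event probabilities
    by the end of follow-up (Lambda = log(1-p_E) / log(1-p_A))."""
    return math.log(1 - p_e) / math.log(1 - p_a)

def n_events(ar, z_alpha, z_beta, psi, lam=1.0):
    """Approximate number of events E for the zone (1/psi, psi), psi > 1."""
    return math.ceil(((ar + 1) ** 2 / ar) * (z_alpha + z_beta) ** 2
                     / math.log(psi / lam) ** 2)
```

For example, \(p_E = 0.25\) versus \(p_A = 0.2\) gives \(\Lambda \approx 1.29\), and with equal allocation, z values of 1.96 and 1.28, \(\Psi = 1.5\), and \(\Lambda = 1\), about 256 events are needed.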
For a continuous outcome that is approximately normally distributed in a noninferiority trial, the number of subjects needed in the active control arm, \(n_A\), where \(AR = \dfrac{n_E}{n_A}\), to achieve \(100 \left(1 - \beta \right)\%\) statistical power with an \(\alpha\)-level significance test is approximated by:
\( n_A = \left( \frac{AR+1}{AR}\right) \left(t_{n_{1}+n_{2}-2, 1-\alpha}+ t_{n_{1}+n_{2}-2, 1-\beta}\right)^2 \sigma^2 / \left(\Psi - \Delta\right)^2 \)
Notice that the sample size formulae for noninferiority trials are exactly the same as the sample size formulae for equivalence trials. This is because of the one-sided testing for both types of designs (even though an equivalence trial involves two one-sided tests). Also notice that the choices of \(z_\alpha\) in the formulas above assume a one-sided test or two one-sided tests, but the requirement of regulatory agencies, and the approach in our FFDRG text, is to use the z value that would have been used for a two-sided hypothesis test. In homework, be sure to state any assumptions and the approach you are taking.
6b.8  SAS Examples
SAS® Example
Sample Size for Comparing Two Normal Means
An investigator wants to determine the sample size for an asthma equivalence trial with an experimental therapy and an active control. The primary outcome is forced expiratory volume in one second \(\left(FEV_1\right)\). The investigator desires a 0.05 significance level test with 90% statistical power and decides that the zone of equivalence is \(\left(-\Psi, +\Psi\right) = \left(-0.1 \text{ L}, +0.1 \text{ L}\right)\) and that the true difference in means does not exceed \(\Delta = 0.05\) L. The standard deviation reported in the literature for a similar population is \(\sigma = 0.75\) L. The investigator plans to have equal allocation to the two treatment groups \(\left(AR = 1\right)\).
***********************************************************************
* This is a program that illustrates the use of PROC POWER to *
* calculate sample size when comparing two normal means in an *
* equivalence trial. *
***********************************************************************;
proc power;
twosamplemeans dist=normal groupweights=(1 1) alpha=0.05 power=0.9 stddev=0.75
lower=-0.10 upper=0.10 meandiff=0.05 test=equiv_diff ntotal=.;
plot min=0.1 max=0.9;
title "Sample Size Calculation for Comparing Two Normal Means (1:1 Allocation) in an Equivalence Trial";
run;
Assuming that \(FEV_{1}\) has an approximate normal distribution, the approximate number of patients required for the active control group is:
\(n_A = \dfrac{\left(2\right)\left(1.645 + 1.28\right)^{2}\left(0.75\right)^{2}}{\left(0.1 - 0.05\right)^{2}} = 3,851\)
The total sample size required is \(n_E + n_A = 3,851 + 3,851 = 7,702\).
SAS PROC POWER yields \(n_E + n_A = 3,855 + 3,855 = 7,710\).
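The hand calculation above is easy to reproduce in Python (a cross-check of the arithmetic only, using the z approximations 1.645 and 1.28 rather than the exact t-based computation SAS performs, which explains the small difference from the PROC POWER result):

```python
import math

# n_A = 2 (z_0.95 + z_0.90)^2 sigma^2 / (psi - delta)^2
n_a = 2 * (1.645 + 1.28) ** 2 * 0.75 ** 2 / (0.1 - 0.05) ** 2
n_a = math.ceil(n_a)     # 3851 patients in the active control group
total = 2 * n_a          # 7702 patients in all (equal allocation)
```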
n = 10,959

Here is the SAS output that you should have gotten:

The POWER Procedure

Fixed Scenario Elements
Distribution: Normal
Method: Exact
Lower Equivalence Bound: -0.1
Upper Equivalence Bound: 0.1
Alpha: 0.05
Mean Difference: 0.05
Standard Deviation: 0.75
Group 1 Weight: 1
Group 2 Weight: 2
Nominal Power: 0.95

Computed N Total
Actual Power: 0.950
N Total: 10959
What happens to the total sample size if the power is to be 0.95 and the investigator uses 2:1 allocation? Come up with an answer to this question by yourself and then click on the icon to the left to reveal the solution.
SAS® Example
Sample Size for Comparing Two Binomial Proportions
An investigator wants to compare an experimental therapy to an active control in a noninferiority trial when the response is treatment success. She desires a 0.025 significance level test and 90% statistical power. She knows 70% of the active control patients will experience success, so she decides that the experimental therapy is not inferior if it yields at least 65% success. Thus, \(\Psi = 0.05\) and she assumes that the true difference is \(p_E - p_A = 0\).
***********************************************************************
* This is a program that illustrates the use of PROC POWER to *
* calculate sample size when comparing two binomial proportions in a *
* noninferiority trial. *
***********************************************************************;
proc power;
twosamplefreq groupweights=(1 1) groupps=(0.65 0.70) alpha=0.025 power=0.9
test=PChi sides=1 ntotal=.;
plot min=0.1 max=0.9;
title "Sample Size Calculation for Comparing Two Binomial Proportions (1:1 Allocation)";
title2 "in a NonInferiority Trial";
run;
With equal allocation, the number of patients in the active control group is:
\(n_A = \dfrac{\left(2\right)\left(1.96 + 1.28\right)^{2}\,0.7\left(1 - 0.7\right)}{\left(0.05\right)^{2}} = 1,764\)
Thus, \(n_E = n_A = 1,764\) patients for a total of 3,528 patients.
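Again, the arithmetic can be checked quickly in Python (illustration only; \(\bar{p} = 0.7\) here because the true difference is assumed to be zero):

```python
import math

# n_A = 2 (z_0.975 + z_0.90)^2 pbar(1 - pbar) / psi^2, with pbar = 0.7
n_a = 2 * (1.96 + 1.28) ** 2 * 0.7 * (1 - 0.7) / 0.05 ** 2
n_a = math.ceil(n_a)     # 1764 patients per group
total = 2 * n_a          # 3528 patients in all
```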
SAS PROC POWER does not contain a feature for an equivalence trial or a noninferiority trial with binary outcomes. Fisher’s exact test for a superiority trial can be adapted to yield \(n_E = n_A = 1,882\) for a total of 3,764 patients. The discrepancy is due to the superiority trial using pbar = 0.675 instead of 0.7.
n = 880 instead of 3,684 with Pearson's chi-square.
Here is the output for the proportions 0.65 and 0.75.
The POWER Procedure

Fixed Scenario Elements
Distribution: Asymptotic Normal
Method: Normal Approximation
Number of Sides: 1
Alpha: 0.025
Group 1 Proportion: 0.65
Group 2 Proportion: 0.75
Group 1 Weight: 1
Group 2 Weight: 1
Nominal Power: 0.9
Null Proportion Difference: 0

Computed N Total
Actual Power: 0.900
N Total: 880
Suppose the proportions were 0.65 and 0.75. How does the required sample size, n, change? Come up with an answer to this question by yourself and then click on the icon to the left to reveal the solution.
SAS® Example
Sample Size for Comparing Two Hazard Functions
An investigator wants to compare an experimental therapy to an active control in a noninferiority trial. The response is time to infection. He desires a 0.025 significance level test with 90% statistical power and \(AR = 1\). Follow-up for each patient is one year, and he expects 20% of the active control group will get an infection \(\left(p_A = 0.2\right)\). Although he believes that \(p_E = 0.2\), he considers the experimental therapy to be noninferior if \(p_E \le 0.25\). The SAS program below, for a one-sided superiority trial, may approximate the required sample size.
***********************************************************************
* This is a program that illustrates the use of PROC POWER to *
* calculate sample size when comparing two hazard functions in a *
* noninferiority trial. *
***********************************************************************;
proc power;
twosamplesurvival groupweights=(1 1) alpha=0.025 power=0.9 sides=1
test=logrank curve("Placebo")=(1.01):(0.8) curve("Therapy")=(1.01):(0.75)
groupsurvival="Placebo" | "Therapy" accrualtime=0.01 followuptime=1 ntotal=.;
plot min=0.1 max=0.9;
title "Sample Size Calculation for Comparing Two Hazard Functions (1:1 Allocation)";
title2 "in a NonInferiority Trial";
run;
The sample size can be worked out exactly as follows:
Assuming constant hazard functions, the effect size with \(p_E = p_A = 0.2\) is \(\Lambda = 1\). With \(p_E = 0.25\) and \(p_A = 0.2\), the zone of noninferiority is defined by:
\(\Psi = \dfrac{\text{log}_e\left(0.75\right)}{\text{log}_e\left(0.8\right)} = 1.29\)
The number of events is \(E = \dfrac{\left(4\right)\left(1.96 + 1.28\right)^{2}}{\left(\text{log}_e\left(1.29\right)\right)^{2}} = 648\)
and the sample sizes are \(n_A = \dfrac{E}{\left(AR \times p_E + p_A \right)} = \dfrac{648}{\left(0.2 + 0.2\right)} = 1,620\) and \(n_E = 1,620\)
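The event and sample-size arithmetic above can be verified in Python (using the rounded value \(\Psi = 1.29\), as the text does):

```python
import math

psi = math.log(0.75) / math.log(0.8)    # ~1.29, as computed in the text
events = 4 * (1.96 + 1.28) ** 2 / math.log(1.29) ** 2
events = math.ceil(events)              # 648 events needed
n_a = math.ceil(events / (0.2 + 0.2))   # 1620 patients per group
```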
Since SAS PROC POWER does not contain a feature for an equivalence trial or a noninferiority trial with time-to-event outcomes, the results from the log-rank test for a superiority trial were adapted to yield \(n_E = n_A = 1,457\). The discrepancy between the program and the calculated n is due to the superiority trial using \(p_E = 0.25\) instead of 0.2 in \(n_A = \dfrac{E}{\left(AR \times p_E + p_A \right)}\).
Notice that the resultant sample sizes in SAS Examples 7.7-7.9 all are relatively large. This is because the zone of equivalence or noninferiority is defined by a small value of \(\Psi\). Generally, equivalence trials and noninferiority trials will require larger sample sizes than superiority trials.
None of SAS Examples 7.7-7.9 accounted for withdrawals. If a withdrawal rate of \(\gamma\) is anticipated, then the sample size should be increased by the factor \(\dfrac{1}{\left(1 - \gamma\right)}\).
6b.9  Summary
In this lesson, among other things, we learned how to:
 Distinguish between superiority, noninferiority and equivalence trials in terms of
 objectives
 control group
 hypotheses tested and
 formation of confidence intervals.
 Recognize the characteristics of a clinical trial with high external validity
 Define which data are included in an intentiontotreat analysis
 Recognize the major considerations for designing an equivalence or noninferiority trial.
 Perform sample size calculations for some equivalence and noninferiority trials, using SAS programs.