Lesson 15: Crossover Designs

Overview

A crossover design is a repeated measurements design such that each experimental unit (patient) receives different treatments during the different time periods, i.e., the patients cross over from one treatment to another during the course of the trial. This is in contrast to a parallel design in which patients are randomized to a treatment and remain on that treatment throughout the duration of the trial.

The reason to consider a crossover design when planning a clinical trial is that it could yield a more efficient comparison of treatments than a parallel design, i.e., fewer patients might be required in the crossover design in order to attain the same level of statistical power or precision as a parallel design. (This will become more evident later in this lesson.) Intuitively, this seems reasonable because each patient serves as his/her own matched control: every patient receives both treatment A and treatment B, and the comparison is made within each patient between the response on A and the response on B. Crossover designs are popular in medicine, agriculture, manufacturing, education, and many other disciplines.

Although the concept of patients serving as their own controls is very appealing to biomedical investigators, crossover designs are not preferred routinely because of the problems that are inherent with this design. In medical clinical trials, the disease should be chronic and stable, and the treatments should not result in total cures but only alleviate the disease condition. If treatment A cures the patient during the first period, then treatment B will not have the opportunity to demonstrate its effectiveness when the patient crosses over to treatment B in the second period. Therefore this type of design works only for those conditions that are chronic, such as asthma where there is no cure and the treatments attempt to improve quality of life.

Crossover designs are the designs of choice for bioequivalence trials. The objective of a bioequivalence trial is to determine whether test and reference pharmaceutical formulations yield equivalent blood concentration levels. In these types of trials, we are not interested in whether there is a cure; the aim is to demonstrate that a new formulation (for instance, a new generic drug) results in the same concentration in the blood. Thus, it is highly desirable to administer both formulations to each subject, which translates into a crossover design.

Objectives

Upon completion of this lesson, you should be able to:

Distinguish between situations where a crossover design would or would not be advantageous.
Use the following terms appropriately: first-order carryover, sequence, period, washout, aliased effect.
State why an adequate washout period is essential between periods of a crossover study in terms of aliased effects.
Evaluate a crossover design as to its uniformity and balance and state the implications of these characteristics.
Understand and modify SAS programs for analysis of data from 2 × 2 crossover trials with continuous or binary data.
Provide an approach to analysis of event time data from a crossover study.
Distinguish between population bioequivalence, average bioequivalence and individual bioequivalence.
Relate the different types of bioequivalence to prescribability and switchability.

Reference:

Piantadosi Steven. (2005) Crossover Designs. In: Piantadosi Steven. Clinical Trials: A Methodologic Perspective. 2nd ed. Hoboken, NJ: John Wiley and Sons, Inc.

15.1 - Overview of the Crossover Designs

The order of treatment administration in a crossover experiment is called a sequence and the time of a treatment administration is called a period. Typically, the treatments are designated with capital letters, such as A, B, etc.

The sequences should be determined a priori and the experimental units are randomized to sequences. The most popular crossover design is the 2-sequence, 2-period, 2-treatment crossover design, with sequences AB and BA, sometimes called the 2 × 2 crossover design.

In this particular design, experimental units that are randomized to the AB sequence receive treatment A in the first period and treatment B in the second period, whereas experimental units that are randomized to the BA sequence receive treatment B in the first period and treatment A in the second period.

We express this particular design as AB|BA or diagram it as:

Design 1
	Period 1	Period 2
Sequence AB	A	B
Sequence BA	B	A

Examples of 3-period, 2-treatment crossover designs are:

Design 2
	Period 1	Period 2	Period 3
Sequence ABB	A	B	B
Sequence BAA	B	A	A

and

Design 3
	Period 1	Period 2	Period 3
Sequence AAB	A	A	B
Sequence ABA	A	B	A
Sequence BAA	B	A	A

Examples of 3-period, 3-treatment crossover designs are

Design 4
	Period 1	Period 2	Period 3
Sequence ABC	A	B	C
Sequence BCA	B	C	A
Sequence CAB	C	A	B

and

Design 5
	Period 1	Period 2	Period 3
Sequence ABC	A	B	C
Sequence BCA	B	C	A
Sequence CAB	C	A	B
Sequence ACB	A	C	B
Sequence BAC	B	A	C
Sequence CBA	C	B	A

Some designs even incorporate non-crossover sequences such as Balaam's design:

Design 6
	Period 1	Period 2
Sequence AB	A	B
Sequence BA	B	A
Sequence AA	A	A
Sequence BB	B	B

Balaam’s design is unusual, with elements of both parallel and crossover design. There are advantages and disadvantages to all of these designs; we will discuss some and the implications for statistical analysis as we continue through this lesson.

15.2 - Disadvantages

The main disadvantage of a crossover design is that carryover effects may be aliased (confounded) with direct treatment effects, in the sense that these effects cannot be estimated separately. You think you are estimating the effect of treatment A but there is also a bias from the previous treatment to account for. Significant carryover effects can bias the interpretation of data analysis, so an investigator should proceed cautiously whenever he/she is considering the implementation of a crossover design.

A carryover effect is defined as the effect of the treatment from the previous time period on the response at the current time period. In other words, if a patient receives treatment A during the first period and treatment B during the second period, then measurements taken during the second period could be a result of the direct effect of treatment B administered during the second period, and/or the carryover or residual effect of treatment A administered during the first period. These carryover effects yield statistical bias.

What can we do about this carryover effect?

The incorporation of lengthy washout periods in the experimental design can diminish the impact of carryover effects. A washout period is defined as the time between treatment periods. Instead of stopping the first treatment and immediately starting the next one, there is an interval during which the treatment from the previous period is washed out of the patient's system.

The rationale for this is that the previously administered treatment is “washed out” of the patient and, therefore, it can not affect the measurements taken during the current period. This may be true, but it is possible that the previously administered treatment may have altered the patient in some manner so that the patient will react differently to any treatment administered from that time onward. An example is when a pharmaceutical treatment causes permanent liver damage so that the patients metabolize future drugs differently. Another example occurs if the treatments are different types of educational tests. Then subjects may be affected permanently by what they learned during the first period.

How long of a washout period should there be?

In a trial involving pharmaceutical products, the length of the washout period usually is determined as some multiple of the half-life of the pharmaceutical product within the population of interest. For example, an investigator might implement a washout period equivalent to 5 (or more) times the length of the half-life of the drug concentration in the blood. The figure below depicts the half-life of a hypothetical drug.
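The half-life rule of thumb can be made concrete with a short calculation. The sketch below is illustrative only: it assumes simple exponential elimination (so the concentration halves every half-life), and the function names and the 6-hour half-life are hypothetical. Python is used purely for illustration; it is not part of the lesson's SAS material.

```python
def fraction_remaining(n_half_lives):
    """Fraction of the starting concentration left after n half-lives,
    assuming exponential (first-order) elimination."""
    return 0.5 ** n_half_lives


def washout_length(half_life, multiple=5):
    """Washout duration as a multiple of the half-life (same time units)."""
    return multiple * half_life


if __name__ == "__main__":
    for k in range(1, 8):
        print(f"{k} half-lives: {fraction_remaining(k):.1%} of the drug remains")
    # Hypothetical drug with a 6-hour half-life and a 5 half-life washout.
    print("washout (hours):", washout_length(6))
```

Under this assumption, after 5 half-lives about 3% of the drug remains, which is one way to see why a multiple of 5 or more is a common choice.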

Actually, it is not the presence of carryover effects per se that leads to aliasing with direct treatment effects in the AB|BA crossover, but rather the presence of differential carryover effects, i.e., the carryover effect due to treatment A differs from the carryover effect due to treatment B. If the carryover effects for A and B are equivalent in the AB|BA crossover design, then this common carryover effect is not aliased with the treatment difference. So, for crossover designs, when the carryover effects are different from one another, this presents us with a significant problem.

In the example of the educational tests, differential carryover effects could occur if test A leads to more learning than test B. Another situation where differential carryover effects may occur is in clinical trials where an active drug (A) is compared to placebo (B) and the washout period is of inadequate length. The patients in the AB sequence might experience a strong A carryover during the second period, whereas the patients in the BA sequence might experience a weak B carryover during the second period.

The recommendation for crossover designs is to avoid the problems caused by differential carryover effects at all costs by employing lengthy washout periods and/or designs where treatment and carryover are not aliased or confounded with each other. It is always much more prudent to address a problem a priori by using a proper design rather than a posteriori by applying a statistical analysis that may require unreasonable assumptions and/or perform unsatisfactorily. You will see this later on in this lesson...

For example, one approach for the statistical analysis of the 2 × 2 crossover is to conduct a preliminary test for differential carryover effects. If this is significant, then only the data from the first period are analyzed because the first period is free of carryover effects. Essentially you are throwing out half of your data!

If the preliminary test for differential carryover is not significant, then the data from both periods are analyzed in the usual manner. Recent work, however, has revealed that this 2-stage analysis performs poorly because the unconditional Type I error rate operates at a much higher level than desired. We won't go into the specific details here, but part of the reason for this is that the test for differential carryover and the test for treatment differences in the first period are highly correlated and do not act independently.

Even worse, this two-stage approach could lead to losing one-half of the data. If differential carryover effects are of concern, then a better approach would be to use a study design that can account for them.

Prior to the development of a general statistical model and investigations into its implications, we require more definitions.

15.3 - Definitions with a Crossover Design

First-order and Higher-order Carryover Effects

Within time period \(j, j = 2, \dots, p\), it is possible that there are carryover effects from treatments administered during periods \(1, \dots, j - 1\). Usually in period j we only consider first-order carryover effects (from period \(j - 1\)) because:

if first-order carryover effects are negligible, then higher-order carryover effects usually are negligible;
the designs needed for eliminating the aliasing between higher-order carryover effects and treatment effects are very cumbersome and not practical. Therefore, we usually assume that these higher-order carryover effects are negligible.

In actuality, the length of the washout periods between treatment administrations may be the determining factor as to whether higher-order carryover effects should be considered. We focus on designs for dealing with first-order carryover effects, but the development can be generalized if higher-order carryover effects need to be considered. The design properties defined next are uniformity, balance, and strong balance.

Uniformity

A crossover design is labeled as:

uniform within sequences if each treatment appears the same number of times within each sequence, and
uniform within periods if each treatment appears the same number of times within each period.

For example, AB|BA is uniform within sequences and within periods (each sequence and each period has one A and one B), while ABA|BAB is uniform within periods but not uniform within sequences because the sequences differ in the numbers of A and B.

If a design is uniform within sequences and uniform within periods, then it is said to be uniform. If the design is uniform within periods, you will be able to remove the period effects. If the design is uniform within sequences, you will also be able to remove the sequence effects. An example of a uniform crossover is ABC|BCA|CAB.

Latin Squares

Latin squares historically have provided the foundation for r-period, r-treatment crossover designs because they yield uniform crossover designs in that each treatment occurs only once within each sequence and once within each period. As will be demonstrated later, Latin squares also serve as building blocks for other types of crossover designs. Latin squares for 4-period, 4-treatment crossover designs are:

Design 7
	Period 1	Period 2	Period 3	Period 4
Sequence ABCD	A	B	C	D
Sequence BCDA	B	C	D	A
Sequence CDAB	C	D	A	B
Sequence DABC	D	A	B	C

and

Design 8
	Period 1	Period 2	Period 3	Period 4
Sequence ABCD	A	B	C	D
Sequence BDAC	B	D	A	C
Sequence CADB	C	A	D	B
Sequence DCBA	D	C	B	A

Latin squares are uniform crossover designs, uniform both within periods and within sequences. Although with 4 periods and 4 treatments there are \(4! = (4)(3)(2)(1) = 24\) possible sequences from which to choose, the Latin square only requires 4 sequences.

Balanced Designs

The Latin square in [Design 8] has an additional property that the Latin square in [Design 7] does not have: each treatment precedes every other treatment the same number of times (once). For example, how many times is treatment A followed by treatment B? Only once. How many times is treatment B followed by treatment C? Only once. This is an advantageous property for Design 8. The same property does not hold for [Design 7], where A is followed by B three times but never by C or D. When the property holds, as in [Design 8], the crossover design is said to be balanced with respect to first-order carryover effects.

Try it!

Look back through each of the designs that we have looked at thus far and determine whether or not it is balanced with respect to first-order carryover effects.

The designs that are balanced with respect to first order carryover effects are:

Designs 1, 2, 3, 5, 6, 8.

When r is an even number, only 1 Latin square is needed to achieve balance in the r-period, r-treatment crossover. When r is an odd number, 2 Latin squares are required. For example, the design in [Design 5] is a 6-sequence, 3-period, 3-treatment crossover design that is balanced with respect to first-order carryover effects because each treatment precedes every other treatment twice.

Strongly Balanced Designs

A crossover design is said to be strongly balanced with respect to first-order carryover effects if each treatment precedes every other treatment, including itself, the same number of times. A strongly balanced design can be constructed by repeating the last period in a balanced design.

Here is an example:

Design 9
	Period 1	Period 2	Period 3	Period 4	Period 5
Sequence ABCDD	A	B	C	D	D
Sequence BDACC	B	D	A	C	C
Sequence CADBB	C	A	D	B	B
Sequence DCBAA	D	C	B	A	A

This is a 4-sequence, 5-period, 4-treatment crossover design that is strongly balanced with respect to first-order carryover effects because each treatment precedes every other treatment, including itself, once. Obviously, the uniformity of the Latin square design disappears because the design in [Design 9] is no longer uniform within sequences.

Uniform and Strongly Balanced Design

Latin squares yield uniform crossover designs, but strongly balanced designs constructed by replicating the last period of a balanced design are not uniform crossover designs. The following 4-sequence, 4-period, 2-treatment crossover design is an example of a strongly balanced and uniform design.

Design 10
	Period 1	Period 2	Period 3	Period 4
Sequence ABBA	A	B	B	A
Sequence BAAB	B	A	A	B
Sequence AABB	A	A	B	B
Sequence BBAA	B	B	A	A
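
The definitions of uniformity, balance, and strong balance can be checked mechanically. The following is a minimal sketch, not part of the original lesson: the function name `design_properties` is hypothetical, and it assumes that "uniform within sequences" means the count of each treatment is the same in every sequence (and likewise for periods), which is the reading used in the ABA|BAB example above.

```python
from itertools import product


def design_properties(sequences):
    """Classify a crossover design given as a list of strings, e.g. ["AB", "BA"].

    Assumes all sequences have the same number of periods.
    """
    treatments = sorted(set("".join(sequences)))
    n_periods = len(sequences[0])

    def counts(cells):
        # How often each treatment occurs in a collection of cells.
        return tuple(cells.count(t) for t in treatments)

    seq_counts = {counts(list(s)) for s in sequences}
    period_counts = {counts([s[j] for s in sequences]) for j in range(n_periods)}

    # Count how often each treatment immediately precedes each treatment.
    follows = {pair: 0 for pair in product(treatments, repeat=2)}
    for s in sequences:
        for prev, cur in zip(s, s[1:]):
            follows[(prev, cur)] += 1

    distinct = {n for (a, b), n in follows.items() if a != b}
    every = set(follows.values())
    return {
        "uniform_within_sequences": len(seq_counts) == 1,
        "uniform_within_periods": len(period_counts) == 1,
        "balanced": len(distinct) == 1 and 0 not in distinct,
        "strongly_balanced": len(every) == 1 and 0 not in every,
    }


if __name__ == "__main__":
    designs = {
        "Design 1": ["AB", "BA"],
        "Design 6": ["AB", "BA", "AA", "BB"],
        "Design 7": ["ABCD", "BCDA", "CDAB", "DABC"],
        "Design 8": ["ABCD", "BDAC", "CADB", "DCBA"],
        "Design 10": ["ABBA", "BAAB", "AABB", "BBAA"],
    }
    for name, seqs in designs.items():
        print(name, design_properties(seqs))
```

Note that under these definitions a strongly balanced design is also reported as balanced, which matches how the lesson lists Designs 2 and 6 among the balanced designs.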

15.4 - Statistical Bias

Why are these properties important in statistical analysis?

We now investigate statistical bias issues. In other words, does a particular crossover design have any nuisance effects, such as sequence, period, or first-order carryover effects, aliased with direct treatment effects? We consider first-order carryover effects only. If the design incorporates washout periods of inadequate length, then treatment effects could be aliased with higher-order carryover effects as well, but let us assume the washout period was adequate for eliminating carryover beyond 1 treatment period.

The approach is very simple in that the expected value of each cell in the crossover design is expressed in terms of a direct treatment effect and the assumed nuisance effects. Then these expected values are averaged and/or differenced to construct the desired effects.

For example, in the 2 × 2 crossover design in [Design 1], if we include nuisance effects for sequence, period, and first-order carryover, then the model for this would look like:

Design 11
	Period 1	Period 2
Sequence AB	\(\mu_A + \nu + \rho\)	\(\mu_B + \nu - \rho + \lambda_A\)
Sequence BA	\(\mu_B - \nu + \rho\)	\(\mu_A - \nu - \rho + \lambda_B\)

where \(\mu_A\) and \(\mu_B\) represent population means for the direct effects of treatments A and B, respectively, \(\nu\) represents a sequence effect, \(\rho\) represents a period effect, and \(\lambda_A\) and \(\lambda_B\) represent carryover effects of treatments A and B, respectively.

A natural choice of an estimate of \(\mu_A\) (or \(\mu_B\)) is simply the average over all cells where treatment A (or B) is assigned: [12]

\(\hat{\mu}_A=\dfrac{1}{2}\left( \bar{Y}_{AB, 1}+ \bar{Y}_{BA, 2}\right) \text{ and } \hat{\mu}_B=\dfrac{1}{2}\left( \bar{Y}_{AB, 2}+ \bar{Y}_{BA, 1}\right)\)

Will this give us a good estimate of the means across the treatment? Not quite...

The mathematical expectations of these estimates are as follows: [13]

\(E(\hat{\mu}_A)=\dfrac{1}{2}\left( \mu_A+\nu+\rho+\mu_A-\nu-\rho+ \lambda_B \right)=\mu_A +\dfrac{1}{2}\lambda_B\)

\(E(\hat{\mu}_B)=\dfrac{1}{2}\left( \mu_B+\nu-\rho+\mu_B-\nu+\rho+ \lambda_A \right)=\mu_B +\dfrac{1}{2}\lambda_A\)

\(E(\hat{\mu}_A-\hat{\mu}_B) = ( \mu_A-\mu_B) - \dfrac{1}{2}( \lambda_A- \lambda_B) \)

From [13] it is observed that the direct treatment effects and the treatment difference are not aliased with sequence or period effects, but are aliased with the carryover effects.

The treatment difference, however, is not aliased with carryover effects when the carryover effects are equal, i.e., \(\lambda_A = \lambda_B\). The results in [13] are due to the fact that the AB|BA crossover design is uniform and balanced with respect to first-order carryover effects. Any crossover design which is uniform and balanced with respect to first-order carryover effects, such as the designs in [Design 5] and [Design 8], also exhibits these results.

Example

Consider the ABB|BAA design, which is uniform within periods, not uniform within sequences, and is strongly balanced.

[14]	Period 1	Period 2	Period 3
Sequence ABB	\(\mu_A + \nu + \rho_1\)	\(\mu_B + \nu + \rho_2 + \lambda_A\)	\(\mu_B + \nu - \rho_1 - \rho_2 + \lambda_B\)
Sequence BAA	\(\mu_B - \nu + \rho_1\)	\(\mu_A - \nu + \rho_2 + \lambda_B\)	\(\mu_A - \nu - \rho_1 - \rho_2 + \lambda_A\)

A natural choice of an estimate of \(\mu_A\) (or \(\mu_B\)) is simply the average over all cells where treatment A (or B) is assigned: [15]

\(\hat{\mu}_A=\dfrac{1}{3}\left( \bar{Y}_{ABB, 1}+ \bar{Y}_{BAA, 2}+ \bar{Y}_{BAA, 3}\right) \text{ and } \hat{\mu}_B=\dfrac{1}{3}\left( \bar{Y}_{ABB, 2}+ \bar{Y}_{ABB, 3}+ \bar{Y}_{BAA, 1}\right)\)

The mathematical expectations of these estimates are solved to be: [16]

\( E(\hat{\mu}_A)=\mu_A+\dfrac{1}{3}(\lambda_A+ \lambda_B-\nu)\)

\( E(\hat{\mu}_B)=\mu_B+\dfrac{1}{3}(\lambda_A+ \lambda_B+\nu)\)

\( E(\hat{\mu}_A-\hat{\mu}_B)=(\mu_A-\mu_B)-\dfrac{2}{3}\nu\)

From [16], the direct treatment effects are aliased with the sequence effect and the carryover effects, whereas the treatment difference only is aliased with the sequence effect. The results in [16] are due to the ABB|BAA crossover design being uniform within periods and strongly balanced with respect to first-order carryover effects.
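
The expectations in [13] and [16] can be checked numerically by plugging arbitrary values into the cell-mean models and averaging the cells for each treatment. The sketch below is an illustration under stated assumptions: the function name and all parameter values are hypothetical, effects are passed as integers so that exact fractions can be used, and sequence and period effects are coded to sum to zero as in the models above.

```python
from fractions import Fraction


def expected_difference(sequences, mu, seq_effect, period_effect, carryover):
    """Expected value of (mean of A cells) - (mean of B cells) under the
    model: treatment mean + sequence effect + period effect
    + first-order carryover of the previous treatment. Integer inputs only."""
    totals = {t: 0 for t in mu}
    cells = {t: 0 for t in mu}
    for i, seq in enumerate(sequences):
        for j, trt in enumerate(seq):
            cell_mean = mu[trt] + seq_effect[i] + period_effect[j]
            if j > 0:  # no carryover into the first period
                cell_mean += carryover[seq[j - 1]]
            totals[trt] += cell_mean
            cells[trt] += 1
    est = {t: Fraction(totals[t], cells[t]) for t in mu}
    return est["A"] - est["B"]


# Hypothetical values: mu_A - mu_B = 6, nu = 3, unequal carryover effects.
mu = {"A": 10, "B": 4}
carry = {"A": 2, "B": 8}

# AB|BA: (mu_A - mu_B) - (lambda_A - lambda_B)/2 = 6 - (-3) = 9
d_2x2 = expected_difference(["AB", "BA"], mu, [3, -3], [5, -5], carry)
# ABB|BAA: (mu_A - mu_B) - (2/3) nu = 6 - 2 = 4, free of carryover
d_abb = expected_difference(["ABB", "BAA"], mu, [3, -3], [5, -1, -4], carry)
print(d_2x2, d_abb)  # prints: 9 4
```

With these made-up values the 2 × 2 design is biased by the differential carryover, while the ABB|BAA result depends only on the sequence effect, in agreement with [13] and [16].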

15.5 - Higher-order Carryover Effects

The lack of aliasing between the treatment difference and the first-order carryover effects does not guarantee that the treatment difference and higher-order carryover effects also will not be aliased or confounded. For example, consider the design in [Design 2] and let \(\lambda_{2A}\) and \(\lambda_{2B}\) denote the second-order carryover effects of treatments A and B, respectively. (A second-order carryover effect is the carryover from the treatment administered two periods earlier.) The model then becomes:

Design 17
	Period 1	Period 2	Period 3
Sequence ABB	\(\mu_A + \nu + \rho_1\)	\(\mu_B + \nu + \rho_2 + \lambda_A\)	\(\mu_B + \nu - \rho_1 - \rho_2 + \lambda_B + \lambda_{2A}\)
Sequence BAA	\(\mu_B - \nu + \rho_1\)	\(\mu_A - \nu + \rho_2 + \lambda_B\)	\(\mu_A - \nu - \rho_1 - \rho_2 + \lambda_A + \lambda_{2B}\)

[18] \( E(\hat{\mu}_A-\hat{\mu}_B)=(\mu_A-\mu_B)-\dfrac{2}{3}\nu-\dfrac{1}{3}(\lambda_{2A}-\lambda_{2B}) \)

The expectation of the treatment mean difference indicates that it is aliased with second-order carryover effects.

Summary of Impacts of Design Types

The ensuing remarks summarize the impact of various design features on the aliasing of direct treatment and nuisance effects.

If the crossover design is uniform within sequences, then sequence effects are not aliased with treatment differences.
If the crossover design is uniform within periods, then period effects are not aliased with treatment differences.
If the crossover design is balanced, but not strongly balanced, with respect to first-order carryover effects, then unequal carryover effects are aliased with treatment differences. If the carryover effects are equal, then carryover effects are not aliased with treatment differences.
If the crossover design is strongly balanced with respect to first-order carryover effects, then carryover effects are not aliased with treatment differences.

Complex Carryover

The type of carryover effects we modeled here is called “simple carryover” because it is assumed that the treatment in the current period does not interact with the carryover from the previous period. “Complex carryover” refers to the situation in which such an interaction is modeled. For example, suppose we have a crossover design and want to model carryover effects. With simple carryover in a two-treatment design, there are two carryover parameters, namely, \(\lambda_A\) and \(\lambda_B\).

With complex carryover, however, there are four carryover parameters, namely, \(\lambda_{AB}, \lambda_{BA}, \lambda_{AA}\) and \(\lambda_{BB}\), where \(\lambda_{AB}\) represents the carryover effect of treatment A into a period in which treatment B is administered, \(\lambda_{BA}\) represents the carryover effect of treatment B into a period in which treatment A is administered, etc. As you might imagine, this will certainly complicate things!

15.6 - Implementation Overview

Obviously, it appears that an “ideal” crossover design is uniform and strongly balanced.

There are situations, however, where it may be reasonable to assume that some of the nuisance parameters are null, so that resorting to a uniform and strongly balanced design is not necessary (although it provides a safety net if the assumptions do not hold).

For example, some researchers argue that sequence effects should be null or negligible because they represent randomization effects. Another example occurs in bioequivalence trials where some researchers argue that carryover effects should be null. This is because blood concentration levels of the drug or active ingredient are monitored and any residual drug administered from an earlier period would be detected.

The message to be emphasized is that every proposed crossover trial should be examined to determine which, if any, nuisance effects may play a role. Once this determination is made, then an appropriate crossover design should be employed that avoids aliasing of those nuisance effects with treatment effects. This is a decision that the researchers should be prepared to address.

For example, an investigator wants to conduct a two-period crossover design, but is concerned that he will have unequal carryover effects so he is reluctant to invoke the 2 × 2 crossover design. If the investigator is not as concerned about sequence effects, then Balaam’s design in [Design 6] may be appropriate. Balaam’s design is uniform within periods but not within sequences, and it is strongly balanced. Therefore, Balaam’s design will not be adversely affected in the presence of unequal carryover effects.

Some researchers consider randomization in a crossover design to be a minor issue because a patient eventually undergoes all of the treatments (this is true in most crossover designs). Obviously, randomization is very important if the crossover design is not uniform within sequences because the underlying assumption is that the sequence effect is negligible. Randomization is important in crossover trials even if the design is uniform within sequences because biases could result from investigators assigning patients to treatment sequences.

At a minimum, it always is recommended to invoke a design that is uniform within periods because period effects are common. Period effects can be due to:

increased patient comfort in later periods with trial processes;
increased patient knowledge in later periods;
improvement in skill and technique of those researchers taking the measurements.

The following is a listing of various crossover designs with some, all, or none of the properties.


Uniform within Sequences	Uniform within Periods	Balanced	Strongly Balanced	Examples
no	no	no	no	AAB|ABB, ABCC|BCAA
yes	no	no	no	ABB|BAB, ABC|CBA
no	yes	no	no	ABCC|BCAA|CABB
no	no	yes	no	ABAA|BAAB
no	no	yes	yes	AABBA|BAABB
yes	yes	no	no	ABC|BCA|CAB
yes	no	yes	no	AABA|ABAA
no	yes	yes	no	ABA|BAB
yes	no	yes	yes	AABBA|ABBAA
no	yes	yes	yes	ABB|BAA, AB|BA|AA|BB
yes	yes	yes	no	AB|BA
yes	yes	yes	yes	ABBA|BAAB|AABB|BBAA

It would be a good idea to go through each of these designs and diagram out what these would look like, the degree to which they are uniform and/or balanced. Make sure you see how these principles come into play!

15.7 - Statistical Precision

Now that we have examined statistical biases that can arise in crossover designs, we next examine statistical precision.

During the design phase of a trial, the question may arise as to which crossover design provides the best precision. For our purposes, we label one design as more precise than another if it yields a smaller variance for the estimated treatment mean difference.

Although a comparison of treatment means may be the primary interest of the experimenter, there may be other circumstances that affect the choice of an appropriate design. For example, later we will compare designs with respect to which designs are best for estimating and comparing variances.

At the moment, however, we focus on differences in estimated treatment means in two-period, two-treatment designs.

The two-period, two-treatment designs we consider here are the 2 × 2 crossover design AB|BA in [Design 1], Balaam's design AB|BA|AA|BB in [Design 6], and the two-period parallel design AA|BB.

In order for the resources to be equitable across designs, we assume that the total sample size, n, is a positive integer divisible by 4. Then:

\(\dfrac{1}{2}\)n patients will be randomized to each sequence in the AB|BA design
\(\dfrac{1}{2}\)n patients will be randomized to each sequence in the AA|BB design, and
\(\dfrac{1}{4}\)n patients will be randomized to each sequence in the AB|BA|AA|BB design.

Because the designs we are considering involve repeated measurements on patients, the statistical modeling must account for between-patient variability and within-patient variability.

Between-patient variability accounts for the dispersion in measurements from one patient to another. Within-patient variability accounts for the dispersion in measurements from one time point to another within a patient. Within-patient variability tends to be smaller than between-patient variability.

The variance components we model are as follows:

\(W_{AA}\) = between-patient variance for treatment A;
\(W_{BB}\) = between-patient variance for treatment B;
\(W_{AB}\) = between-patient covariance between treatments A and B;
\(\sigma_{AA}\) = within-patient variance for treatment A;
\(\sigma_{BB}\) = within-patient variance for treatment B.

The following table provides expressions for the variance of the estimated treatment mean difference for each of the two-period, two-treatment designs:

Design	Variance
Crossover	\(\dfrac{\sigma^2}{n} = \dfrac{1.0(W_{AA} + W_{BB}) - 2.0(W_{AB}) + (\sigma_{AA} + \sigma_{BB})}{n}\)
Balaam	\(\dfrac{\sigma^2}{n} = \dfrac{1.5(W_{AA} + W_{BB}) - 1.0(W_{AB}) + (\sigma_{AA} + \sigma_{BB})}{n}\)
Parallel	\(\dfrac{\sigma^2}{n} = \dfrac{2.0(W_{AA} + W_{BB}) - 0.0(W_{AB}) + (\sigma_{AA} + \sigma_{BB})}{n}\)

Under most circumstances, \(W_{AB}\) will be positive, so we assume this is so for the sake of comparison. Not surprisingly, the 2 × 2 crossover design yields the smallest variance for the estimated treatment mean difference, followed by Balaam's design and then the parallel design.

The investigator needs to consider other design issues, however, prior to selecting the 2 × 2 crossover. In particular, if there is any concern over the possibility of differential first-order carryover effects, then the 2 × 2 crossover is not recommended. In this situation, the parallel design would be a better choice than the 2 × 2 crossover design. Balaam's design is strongly balanced so that the treatment difference is not aliased with differential first-order carryover effects, so it also is a better choice than the 2 × 2 crossover design.

With respect to a sample size calculation, the total sample size, n, required for a two-sided, \(\alpha\) significance level test with \(100 \left(1 - \beta \right)\%\) statistical power and effect size \(\mu_A - \mu_B\) is:

\(n=\dfrac{(z_{1-\alpha/2}+z_{1-\beta})^2 \sigma^2}{(\mu_A -\mu_B)^2} \)

Suppose that an investigator wants to conduct a two-period trial but is not sure whether to invoke a parallel design, a crossover design, or Balaam's design. He wants to use a 0.05 significance level test with 90% statistical power for detecting the effect size of \(\mu_A - \mu_B= 10\). From published results, the investigator assumes that:

\(W_{AA} = W_{BB} = W_{AB} = 400\), and

\(\sigma_{AA} = \sigma_{BB} = 100\)

The sample sizes for the three different designs are as follows:

Parallel n = 190

Balaam n = 105

Crossover n = 21

The crossover design yields a much smaller sample size because the within-patient variances are one-fourth that of the inter-patient variances (which is not unusual).
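
The sample sizes above follow from substituting each design's \(\sigma^2\) into the sample size formula. The sketch below reproduces that arithmetic; the function names are hypothetical, and the coefficient pairs are copied from the variance table in this section. The unrounded results are roughly 21.0, 105.1, and 189.1, which agree with the reported values of 21, 105, and 190 up to the rounding convention used.

```python
from statistics import NormalDist

# (multiplier of W_AA + W_BB, multiplier of W_AB) from the variance table.
COEFFICIENTS = {
    "crossover": (1.0, 2.0),
    "balaam": (1.5, 1.0),
    "parallel": (2.0, 0.0),
}


def design_sigma2(design, w_aa, w_bb, w_ab, s_aa, s_bb):
    """Numerator sigma^2 of the variance of the estimated treatment difference."""
    between, covariance = COEFFICIENTS[design]
    return between * (w_aa + w_bb) - covariance * w_ab + (s_aa + s_bb)


def total_sample_size(sigma2, delta, alpha=0.05, power=0.90):
    """Unrounded total n for a two-sided level-alpha test."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(power)) ** 2 * sigma2 / delta ** 2


if __name__ == "__main__":
    for design in ("parallel", "balaam", "crossover"):
        s2 = design_sigma2(design, 400, 400, 400, 100, 100)
        print(f"{design}: sigma^2 = {s2:.0f}, n = {total_sample_size(s2, 10):.1f}")
```

In practice the total would then be rounded up to a multiple of 4 so that patients can be allocated equally to the sequences of every design under comparison.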

Another issue in selecting a design is whether the experimenter wishes to compare the within-patient variances \(\sigma_{AA}\) and \(\sigma_{BB}\).

For the 2 × 2 crossover design, the within-patient variances can be estimated by imposing restrictions on the between-patient variances and covariances. The resultant estimators of \(\sigma_{AA}\) and \(\sigma_{BB}\), however, may lack precision and be unstable. Hence, the 2 × 2 crossover design is not recommended when comparing \(\sigma_{AA}\) and \(\sigma_{BB}\) is an objective.

The parallel design provides an optimal estimation of the within-unit variances because it has ½ n patients who can provide data in estimating each of \(\sigma_{AA}\) and \(\sigma_{BB}\), whereas Balaam's design has ¼ n patients who can provide data in estimating each of \(\sigma_{AA}\) and \(\sigma_{BB}\). Again, Balaam's design is a compromise between the 2 × 2 crossover design and the parallel design.

15.8 - Analysis - Continuous Outcome

The statistical analysis of normally-distributed data from a 2 × 2 crossover trial, under the assumption that the carryover effects are equal \(\left(\lambda_A = \lambda_B = \lambda\right)\), is relatively straightforward.

Remember the statistical model we assumed for continuous data from the 2 × 2 crossover trial:

Design 11
	Period 1	Period 2
Sequence AB	\(\mu_A + \nu + \rho\)	\(\mu_B + \nu - \rho + \lambda_A\)
Sequence BA	\(\mu_B - \nu + \rho\)	\(\mu_A - \nu - \rho + \lambda_B\)

For a patient in the AB sequence, the Period 1 vs. Period 2 difference has expectation \(\mu_{AB} = \mu_A - \mu_B + 2\rho - \lambda\).

For a patient in the BA sequence, the Period 1 vs. Period 2 difference has expectation \(\mu_{BA} = \mu_B - \mu_A + 2\rho - \lambda\).

Therefore, we construct these differences for every patient and compare the two sequences with respect to these differences using a two-sample t test or a Wilcoxon rank sum test. Thus, we are testing:

\(H_0 \colon \mu_{AB} - \mu_{BA} = 0\)

Subtracting the two expectations cancels the period and carryover terms:

\(\mu_{AB} - \mu_{BA} = 2\left( \mu_A - \mu_B \right)\)

so testing \(H_0 \colon \mu_{AB} - \mu_{BA} = 0\) is equivalent to testing:

\(H_0 \colon \mu_A - \mu_B = 0\)

To get a confidence interval for \(\mu_A - \mu_B\), simply multiply each difference by ½ prior to constructing the confidence interval for the difference in population means for two independent samples.

SAS® Example

Analysis of the data from a 2x2 crossover using SAS

(16.1_-_2x2_crossover__contin.sas )

This is an example of an analysis of the data from a 2 × 2 crossover trial. The example is taken from Example 3.1 from Senn's book (Senn S. Cross-over Trials in Clinical Research , Chichester, England: John Wiley & Sons, 1993). The data set consists of 13 children enrolled in a trial to investigate the effects of two bronchodilators, formoterol and salbutamol, in the treatment of asthma. The outcome variable is peak expiratory flow rate (liters per minute) and was measured eight hours after treatment. There was a one-day washout period between treatment periods.

*************************************************************************
* This is an example of an analysis of the data from a 2x2 crossover    *
* using SAS.  The example is taken from Example 3.1 of                  *
*    Senn, S.  (1993).  Cross-over Trials in Clinical Research.         *
*    Chichester, England:  John Wiley & Sons.                           *
*                                                                       *
* The data set consists of 13 children enrolled in a trial to           *
* investigate the effects of two bronchodilators in the treatment of    *
* asthma.  The outcome variable is peak expiratory flow rate (liters    *
* per minute) and was measured eight hours after treatment.  There was  *
* a one-day washout period between treatment periods.                   *
*************************************************************************;

proc format;
value trtfmt 1='Salbutamol' 2='Formoterol';
run;

data senn;
input patient sequence $ salbutamol formoterol;
cards;
01 FS 270 310
02 SF 370 385
03 SF 310 400
04 FS 260 310
05 SF 380 410
06 FS 300 370
07 FS 390 410
09 SF 290 320
10 FS 210 250
11 FS 350 380
12 SF 260 340
13 SF  90 220
14 FS 365 330
;
run;

proc print data=senn;
title 'Formoterol vs. Salbutamol in the 2x2 Crossover Trial';
run;

*************************************************************************
* Construct the intra-subject differences (times one-half) within each  *
* sequence.  Then perform the statistical analysis.                     *
*************************************************************************;

data diff;
set senn;
if sequence='FS' then diff=0.5*(formoterol-salbutamol);
if sequence='SF' then diff=0.5*(salbutamol-formoterol);
run;

proc sort data=diff;
by sequence;
run;

proc univariate data=diff normal plot;
by sequence;
var diff;
title2 'Descriptive Statistics and Graphics for Treatment Difference';
run;

proc ttest data=diff;
class sequence;
var diff;
title2 'Parametric Analysis';
run;

proc npar1way data=diff wilcoxon;
class sequence;
var diff;
title2 'Nonparametric Analysis';
run;

The estimated treatment mean difference was 46.6 L/min in favor of formoterol \(\left(p = 0.0012\right)\) and the 95% confidence interval for the treatment mean difference is (22.9, 70.3). The Wilcoxon rank sum test also indicated a statistically significant difference between the treatments \(\left(p = 0.0276\right)\).
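For readers without SAS, the parametric part of this analysis can be sketched in plain Python. The data are copied from the data step above. The critical value 2.201 (the 0.975 quantile of the t distribution with 11 degrees of freedom) is hard-coded from a standard table, because the Python standard library has no t distribution; the variable names are illustrative only.

```python
import math
from statistics import mean, variance

# (patient, sequence, salbutamol, formoterol), copied from the data step above
rows = [
    (1, "FS", 270, 310), (2, "SF", 370, 385), (3, "SF", 310, 400),
    (4, "FS", 260, 310), (5, "SF", 380, 410), (6, "FS", 300, 370),
    (7, "FS", 390, 410), (9, "SF", 290, 320), (10, "FS", 210, 250),
    (11, "FS", 350, 380), (12, "SF", 260, 340), (13, "SF", 90, 220),
    (14, "FS", 365, 330),
]

# One-half of the Period 1 minus Period 2 difference for each patient.
half_diff = {"FS": [], "SF": []}
for _, seq, sal, form in rows:
    d = 0.5 * (form - sal) if seq == "FS" else 0.5 * (sal - form)
    half_diff[seq].append(d)

fs, sf = half_diff["FS"], half_diff["SF"]

# Two-sample pooled-variance t statistic comparing the two sequences.
estimate = mean(fs) - mean(sf)  # estimates formoterol minus salbutamol
df = len(fs) + len(sf) - 2
pooled_var = ((len(fs) - 1) * variance(fs) + (len(sf) - 1) * variance(sf)) / df
std_err = math.sqrt(pooled_var * (1 / len(fs) + 1 / len(sf)))
t_stat = estimate / std_err

T_CRIT_975_DF11 = 2.201  # tabled value, hard-coded (assumption of this sketch)
ci = (estimate - T_CRIT_975_DF11 * std_err,
      estimate + T_CRIT_975_DF11 * std_err)

print(f"estimate = {estimate:.1f}, t = {t_stat:.2f} on {df} df")
print(f"95% CI = ({ci[0]:.1f}, {ci[1]:.1f})")
```

The estimate and interval should agree with the PROC TTEST results quoted above, up to rounding.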

15.9 - Analysis - Binary Outcome

Suppose that the response from a crossover trial is binary and that there are no period effects. Then the probabilities of response are:

	Failure on B	Success on B	marginal probabilities
Failure on A	\(p_{00}\)	\(p_{01}\)	\(p_{0.}\)
Success on A	\(p_{10}\)	\(p_{11}\)	\(p_{1.}\)
marginal probabilities	\(p_{.0}\)	\(p_{.1}\)

The probability of success on treatment A is \(p_{1.}\) and the probability of success on treatment B is \(p_{.1}\). Testing the null hypothesis:

\(H_{0} : p_{1.} - p_{.1} = 0\)

is the same as testing:

\(H_{0} : p_{1.} - p_{.1} = (p_{10} + p_{11}) - (p_{01} + p_{11}) = p_{10} - p_{01} = 0\)

This indicates that only the patients who display a (1,0) or (0,1) response contribute to the treatment comparison. A patient who failed on both treatments, or succeeded on both, provides no information about which treatment is better. Therefore we will let:

	Failure on B	Success on B
Failure on A	\(n_{00}\)	\(n_{01}\)
Success on A	\(n_{10}\)	\(n_{11}\)

denote the frequency of responses from the study data instead of the probabilities listed above.

McNemar's test for this situation is as follows. Given the number of patients who displayed a treatment preference, \(n_{10} + n_{01}\), the count \(n_{10}\) follows a binomial distribution with \(n_{10} + n_{01}\) trials and success probability \(p\), and the null hypothesis reduces to testing:

\(H_{0} : p = 0.5\)

i.e., under the null hypothesis we would expect a 50-50 split between the patients who succeeded only on A and the patients who succeeded only on B. Only the cells with success on one treatment and failure on the other are used; the cells for success on both treatments or failure on both treatments are ignored.

SAS® Example

Analysis of the data from a 2x2 crossover for a binary outcome, assuming null period effects

(16.2_-_2x2_crossover__binary.sas )

This is an example of an analysis of the data from a 2 × 2 crossover trial with a binary outcome of failure/success. Fifty patients were randomized and the following results were observed:

	Failure on B	Success on B
Failure on A	21	15
Success on A	7	7

Thus, 22 patients displayed a treatment preference, of which 7 preferred A and 15 preferred B. McNemar's test, however, indicated that this was not statistically significant (exact \(p = 0.1338\)).

*************************************************************************
* This is an example of an analysis of the data from a 2x2 crossover    *
* for a binary outcome, assuming null period effects.                   *
*************************************************************************;

proc format;
value outfmt 0='Failure' 1='Success';
run;

data example;
input patient sequence $ treatment_A treatment_B;
format treatment_A treatment_B outfmt.;
cards;
01 AB 0 0
02 AB 1 0
03 AB 0 0
04 AB 0 1
05 AB 1 0
06 AB 0 0
07 AB 0 0
08 AB 0 0
09 AB 1 1
10 AB 0 1
11 AB 0 0
12 AB 1 0
13 AB 0 0
14 AB 0 0
15 AB 0 1
16 AB 0 0
17 AB 1 1
18 AB 0 1
19 AB 1 1
20 AB 0 1
21 AB 1 1
22 AB 0 1
23 AB 0 0
24 AB 1 1
25 AB 0 1
26 BA 0 1
27 BA 0 0
28 BA 0 0
29 BA 1 0
30 BA 0 1
31 BA 0 0
32 BA 0 1
33 BA 1 0
34 BA 0 1
35 BA 0 0
36 BA 1 0
37 BA 0 0
38 BA 0 1
39 BA 0 0
40 BA 0 1
41 BA 1 0
42 BA 0 1
43 BA 0 0
44 BA 1 1
45 BA 0 0
46 BA 0 1
47 BA 0 0
48 BA 1 1
49 BA 0 0
50 BA 0 0
;
run;

proc freq data=example;
tables treatment_A*treatment_B/agree;
exact McNem;
title "McNemar's Test for a Binary Outcome in a 2 x 2 Crossover Trial";
run;
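The exact McNemar p-value can also be reproduced by hand from the binomial distribution described above. The following Python sketch does this for the discordant counts in the example (7 patients preferred A, 15 preferred B). The function name `exact_mcnemar_p` is hypothetical, and the two-sided p-value is obtained by doubling the smaller tail, which is appropriate here because the binomial distribution with \(p = 0.5\) is symmetric.

```python
from math import comb


def exact_mcnemar_p(n10, n01):
    """Two-sided exact binomial p-value for H0: p = 0.5 among discordant pairs."""
    n = n10 + n01
    k = min(n10, n01)
    # P(X <= k) for X ~ Binomial(n, 0.5)
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the smaller tail; cap at 1 for the case n10 == n01.
    return min(1.0, 2 * lower_tail)


p_value = exact_mcnemar_p(n10=7, n01=15)
print(f"exact McNemar p = {p_value:.4f}")
```

The concordant cells (21 failures on both treatments and 7 successes on both) do not enter the calculation at all.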

A problem with applying McNemar's test to the binary outcome from a 2 × 2 crossover trial arises when there are non-negligible period effects. If that is the case, then the treatment comparison should account for them. This is possible via logistic regression analysis.

The Rationale:

A 50-50 split between treatment A and treatment B preferences under the null hypothesis is equivalent to the odds of a treatment A preference versus a treatment B preference being 1.0. Because logistic regression analysis models the natural logarithm of the odds, testing whether there is a 50-50 split between treatment A preference and treatment B preference is comparable to testing whether the intercept term is zero in a logistic regression analysis.
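To make the link between the 50-50 split and the intercept concrete, the sketch below uses the fact that in a logistic model with an intercept only, the fitted intercept is the log-odds of the observed proportion. It uses the discordant counts from the earlier example (7 preferred A, 15 preferred B); the variable names are illustrative, and no period term is included at this stage.

```python
import math

n_prefer_a, n_prefer_b = 7, 15  # discordant counts from the earlier example

# Observed proportion preferring A among patients with a preference.
p_hat = n_prefer_a / (n_prefer_a + n_prefer_b)

# Intercept of an intercept-only logistic model: the log-odds of p_hat,
# which simplifies to log(n_prefer_a / n_prefer_b).
intercept = math.log(p_hat / (1 - p_hat))

# Under an exact 50-50 split the odds are 1 and the log-odds are 0.
null_intercept = math.log(0.5 / 0.5)

print(f"observed log-odds = {intercept:.3f}, null value = {null_intercept:.1f}")
```

A negative intercept here reflects that fewer patients preferred A than B; the test asks whether it differs from zero by more than chance would allow.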

To account for the possible period effect in the 2 × 2 crossover trial, a term for period can be included in the logistic regression analysis.

SAS® Example

Analysis of data from a 2x2 crossover for a binary outcome, assuming nonnull period effects

(16.3_-_2x2_crossover__binary.sas )

This example uses the same data set as SAS Example 16.2, but now the patients are partitioned by sequence:

Sequence AB	Failure on B	Success on B
Failure on A	10	7
Success on A	3	5

Sequence BA	Failure on B	Success on B
Failure on A	11	8
Success on A	4	2

*************************************************************************
* This is an example of an analysis of the data from a 2x2 crossover    *
* for a binary outcome, assuming nonnull period effects.                *
*************************************************************************;

proc format;
value outfmt 0='Failure' 1='Success';
value preffmt 1='A' -1='B';
run;

data example;
input patient sequence $ treatment_A treatment_B;
cards;
01 AB 0 0
02 AB 1 0
03 AB 0 0
04 AB 0 1
05 AB 1 0
06 AB 0 0
07 AB 0 0
08 AB 0 0
09 AB 1 1
10 AB 0 1
11 AB 0 0
12 AB 1 0
13 AB 0 0
14 AB 0 0
15 AB 0 1
16 AB 0 0
17 AB 1 1
18 AB 0 1
19 AB 1 1
20 AB 0 1
21 AB 1 1
22 AB 0 1
23 AB 0 0
24 AB 1 1
25 AB 0 1
26 BA 0 1
27 BA 0 0
28 BA 0 0
29 BA 1 0
30 BA 0 1
31 BA 0 0
32 BA 0 1
33 BA 1 0
34 BA 0 1
35 BA 0 0
36 BA 1 0
37 BA 0 0
38 BA 0 1
39 BA 0 0
40 BA 0 1
41 BA 1 0
42 BA 0 1
43 BA 0 0
44 BA 1 1
45 BA 0 0
46 BA 0 1
47 BA 0 0
48 BA 1 1
49 BA 0 0
50 BA 0 0
;
run;

data example2;
set example;
preference=treatment_A - treatment_B;
if preference=0 then delete;
format preference preffmt.; 
if sequence='AB' & preference=1 then period2=0;
if sequence='AB' & preference=-1 then period2=1;
if sequence='BA' & preference=1 then period2=1;
if sequence='BA' & preference=-1 then period2=0;
run;

proc logistic data=example2;
model preference=period2;
exact 'intercept' intercept;
exact 'period2' period2;
title "Logistic Regression Analysis for a Binary Outcome in a 2 x 2 Crossover Trial";
run;

The logistic regression analysis yielded a nonsignificant result for the treatment comparison (exact \(p = 0.2266\)). There is still no statistically significant difference to report.

15.10 - Analysis - Time-to-Event Outcome

You don't often see a crossover design used in a time-to-event trial. If the event is death, the patient would not be able to cross over to a second treatment. Even when the event is treatment failure, this often implies that patients must be watched closely and perhaps rescued with other medicines when treatment failure occurs.

When it is implemented, a time-to-event outcome within the context of a 2 × 2 crossover trial actually can reduce to a binary outcome score of preference. Suppose that in a clinical trial, time to treatment failure is determined for each patient when receiving treatment A and treatment B.

If the time to treatment failure on A equals that on B, then the patient is assigned a (0,0) score and displays no preference.
If the time to treatment failure on A is less than that on B, then the patient is assigned a (0,1) score and prefers B.
If the time to treatment failure on B is less than that on A, then the patient is assigned a (1,0) score and prefers A.
If the patient does not experience treatment failure on either treatment, then the patient is assigned a (1,1) score and displays no preference.

Hence, we can use the procedures which we implemented with binary outcomes.
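The scoring rules above can be sketched as a small Python function. Two things here are assumptions rather than part of the lesson: a missing failure time is encoded as `None`, and a patient who fails on only one treatment is scored as preferring the other treatment (the time on the treatment without failure is taken to be the longer one). The function name `preference_score` is hypothetical.

```python
from typing import Optional, Tuple


def preference_score(time_a: Optional[float],
                     time_b: Optional[float]) -> Tuple[int, int]:
    """Map times to treatment failure on A and B to a binary (A, B) score.

    None means no treatment failure was observed (assumed encoding).
    """
    if time_a is None and time_b is None:
        return (1, 1)  # no failure on either treatment: no preference
    if time_a is None:
        return (1, 0)  # failed only on B: prefers A (assumption)
    if time_b is None:
        return (0, 1)  # failed only on A: prefers B (assumption)
    if time_a == time_b:
        return (0, 0)  # equal failure times: no preference
    # The treatment with the longer time to failure is preferred.
    return (0, 1) if time_a < time_b else (1, 0)


scores = [preference_score(a, b)
          for a, b in [(10, 10), (5, 9), (9, 5), (None, None)]]
print(scores)
```

The resulting (1,0) and (0,1) scores can then be tabulated and analyzed with McNemar's test or logistic regression exactly as in the binary case.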

15.11 - Analysis - More Complex Designs

The analysis of continuous, binary, and time-to-event outcome data from a design more complex than the 2 × 2 crossover is not as straightforward as that for the 2 × 2 crossover design.

With respect to a continuous outcome, the analysis involves a mixed-effects linear model (SAS PROC MIXED) to account for the repeated measurements that yield period, sequence, and carryover effects and to model the various sources of intra-patient and inter-patient variability.

With respect to a binary outcome, the analysis involves generalized estimating equations (SAS PROC GENMOD) to account for the repeated measurements that yield period, sequence, and carryover effects and to model the various sources of intra-patient and inter-patient variability.

In either case, with a design more complex than the 2 × 2 crossover, extensive modeling is required.

15.12 - Bioequivalence Trials

The objective of a bioequivalence trial is to determine whether test (T) and reference (R) formulations of a pharmaceutical product are "equivalent" with respect to blood concentration × time profiles.

Bioequivalence trials are of interest in two basic situations:

Company A demonstrates the safety and efficacy of a drug formulation, but wishes to market a more convenient formulation (e.g., an injection vs. a time-release capsule). This situation is less common.
Company B wishes to market a drug formulation similar to the approved formulation of Company A with an expired patent. Company B has to prove that its formulation delivers the same amount of active drug into the bloodstream as the approved formulation does.

Pharmaceutical scientists use crossover designs for such trials in order for each trial participant to yield a profile for both formulations. The blood concentration × time profile is a multivariate response and is a surrogate measure of therapeutic response. The pharmaceutical company does not need to demonstrate the safety and efficacy of the drug because that already has been established.

Are the reference and test blood concentration × time profiles similar? The test formulation could be toxic if it yields concentration levels higher than the reference formulation. On the other hand, the test formulation could be ineffective if it yields concentration levels lower than the reference formulation.

Typically, pharmaceutical scientists summarize the rate and extent of drug absorption with summary measurements of the blood concentration × time profile, such as area under the curve (AUC), maximum concentration (CMAX), etc. These summary measurements are subjected to statistical analysis (not the profiles) and inferences are drawn as to whether or not the formulations are bioequivalent.

There are numerous definitions for what is meant by bioequivalence:

population bioequivalence - the formulations are equivalent with respect to their underlying probability distributions, i.e., you want to see that the AUC or CMAX distributions are similar.
average bioequivalence - the formulations are equivalent with respect to the means (medians) of their probability distributions.
individual bioequivalence - the formulations are equivalent for a large proportion of individuals in the population, i.e., how well do the AUC and CMAX values compare within patients?

Prescribability means that a patient is ready to embark on a treatment regimen for the first time, so that either the reference or test formulations can be chosen. Switchability means that a patient, who already has established a regimen on either the reference or test formulation, can switch to the other formulation without any noticeable change in efficacy and safety.

Prescribability requires that the test and reference formulations are population bioequivalent, whereas switchability requires that the test and reference formulations have individual bioequivalence.

Currently, the USFDA only requires pharmaceutical companies to establish that the test and reference formulations are average bioequivalent. It is felt that most consumers, however, assume bioequivalence refers to individual bioequivalence, and that switching formulations does not lead to any health problems.

The hypothesis testing problem for assessing average bioequivalence is stated as:

\(H_0 : \dfrac{\mu_T}{\mu_R} \le \Psi_1 \text{ or } \dfrac{\mu_T}{\mu_R} \ge \Psi_2\) vs. \(H_1 : \Psi_1 < \dfrac{\mu_T}{\mu_R} < \Psi_2\)

where \(\mu_T\) and \(\mu_R\) represent the population means for the test and reference formulations, respectively, and \(\Psi_1\) and \(\Psi_2\) are chosen constants.

The FDA recommended values are \(\Psi_1 = 0.80\) and \(\Psi_2 = 1.25\) (i.e., the ratios 4/5 and 5/4) for responses such as AUC and CMAX, which typically follow lognormal distributions.

Thus, a logarithmic transformation typically is applied to the summary measure, the statistical analysis is performed for the crossover experiment, and then the two one-sided testing approach or corresponding confidence intervals are calculated for the purposes of investigating average bioequivalence.
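As a worked illustration of why the limits 4/5 and 5/4 are convenient, the alternative hypothesis can be restated on the logarithmic scale, treating the ratio as the exponentiated difference of the log-scale means (an assumption of this sketch):

```latex
\[
\ln \Psi_1 = \ln(0.80) \approx -0.2231,
\qquad
\ln \Psi_2 = \ln(1.25) \approx 0.2231
\]
\[
H_1 : \ln \Psi_1 < \ln \mu_T - \ln \mu_R < \ln \Psi_2
\]
```

The limits are symmetric about zero on the log scale. Carrying out the two one-sided tests at the 0.05 level each corresponds to checking whether the 90% confidence interval for the log-scale difference lies entirely within \((-0.2231, 0.2231)\), or equivalently whether the back-transformed interval lies within \((0.80, 1.25)\).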

SAS® Example

Assessment of average bioequivalence from a 2x2 crossover design

( 16.4_-_bioequivalence.sas )

Test and reference formulations were studied in a bioequivalence trial that used a 2 × 2 crossover design. There were 28 healthy volunteers (instead of patients with disease), who were randomized 14 each to the TR and RT sequences. AUC and CMAX were measured and transformed via the natural logarithm.

*************************************************************************
* This is a SAS program that illustrates the assessment of average      *
* bioequivalence from a 2x2 crossover design                            *
*************************************************************************;

data bioequiv;
input subject sequence $ AUC_T CMAX_T AUC_R CMAX_R;
logAUC_T=log(AUC_T);
logCMAX_T=log(CMAX_T);
logAUC_R=log(AUC_R);
logCMAX_R=log(CMAX_R);
cards;
01 TR  4321.7  360.11  4415.3  278.48
02 RT  2139.5  326.19  8846.2  605.45
03 RT  8295.5  578.66  7399.1  608.67
04 TR  5974.9  412.69  3071.2  196.19
05 TR  5121.9  423.98  7513.4  681.34
06 TR 18750.4 1359.86 17096.2 1191.91
07 RT  5609.7 1046.37  4498.8  392.45
08 RT  9040.9  733.80  7847.2  571.51
09 TR 21120.1 2403.67 22603.2 1449.63
10 RT  4865.3  669.17  8127.3  738.77
11 RT  7244.6  717.01  6249.4  364.80
12 TR  5543.7  409.05  4915.3  468.31
13 RT  9748.2  907.64 12218.5  884.73
14 RT 10257.4  995.68  6021.4  557.16
15 TR 10996.2 2434.96 15284.0 2509.14
16 TR  8031.0  476.31  2495.1  265.09
17 TR  6527.5  565.66 14736.4 1070.06
18 RT  6385.5  579.72  5118.7  483.04
19 RT  4077.1  297.07  2068.3  123.90
20 TR  7088.0 1334.96  6363.2 1064.23
21 RT 10689.3 1702.07 36108.5 2552.32
22 RT  4615.1  656.33  4551.9  403.34
23 TR  3792.8  441.33 10444.7 2057.38
24 TR  9429.5 1458.82 21239.7 1899.42
25 TR  4272.4  500.45  7247.8  945.48
26 RT  9442.7  905.35  3872.8  247.49
27 RT  7373.2  688.03  5522.3  449.56
28 TR  1883.5  139.61  3299.3  213.63
;
run;

*************************************************************************
* Construct the intra-subject differences (times one-half) within each  *
* sequence.  Then perform the statistical analysis.                     *
*************************************************************************;

data diff;
set bioequiv;
if sequence='TR' then logAUC_diff=0.5*(logAUC_T-logAUC_R);
if sequence='RT' then logAUC_diff=0.5*(logAUC_R-logAUC_T);
if sequence='TR' then logCMAX_diff=0.5*(logCMAX_T-logCMAX_R);
if sequence='RT' then logCMAX_diff=0.5*(logCMAX_R-logCMAX_T);
run;

proc ttest data=diff alpha=0.10;
class sequence;
var logAUC_diff logCMAX_diff;
title2 'Parametric Analysis';
run;

The analysis yielded the following results:

	AUC	CMAX
Est for \(\text{log}_e \dfrac{\mu_R}{\mu_T}\)	0.0893	-0.104
Est for \(\dfrac{\mu_R}{\mu_T}\)	1.09	0.90
90% CI for \(\text{log}_e \dfrac{\mu_R}{\mu_T}\)	(-0.113, 0.294)	(-0.289, 0.080)
90% CI for \(\dfrac{\mu_R}{\mu_T}\)	(0.89, 1.34)	(0.75, 1.08)

Neither 90% confidence interval lies within the limits (0.80, 1.25) specified by the USFDA. Therefore, bioequivalence cannot be concluded in this example, and the USFDA would not allow this company to market its generic drug. Both AUC and CMAX are assessed because together they summarize the extent and rate of drug absorption.
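The AUC part of this assessment can be sketched in plain Python as follows. The AUC values are copied from the data step above (CMAX is omitted to keep the example short). The critical value 1.706 (the 0.95 quantile of the t distribution with 26 degrees of freedom) is hard-coded from a standard table, since the Python standard library has no t distribution, and the variable names are illustrative only. The sign convention follows the SAS program, so the estimate is for \(\text{log}_e(\mu_R / \mu_T)\).

```python
import math
from statistics import mean, variance

# (subject, sequence, AUC_T, AUC_R), copied from the data step above
auc = [
    (1, "TR", 4321.7, 4415.3), (2, "RT", 2139.5, 8846.2),
    (3, "RT", 8295.5, 7399.1), (4, "TR", 5974.9, 3071.2),
    (5, "TR", 5121.9, 7513.4), (6, "TR", 18750.4, 17096.2),
    (7, "RT", 5609.7, 4498.8), (8, "RT", 9040.9, 7847.2),
    (9, "TR", 21120.1, 22603.2), (10, "RT", 4865.3, 8127.3),
    (11, "RT", 7244.6, 6249.4), (12, "TR", 5543.7, 4915.3),
    (13, "RT", 9748.2, 12218.5), (14, "RT", 10257.4, 6021.4),
    (15, "TR", 10996.2, 15284.0), (16, "TR", 8031.0, 2495.1),
    (17, "TR", 6527.5, 14736.4), (18, "RT", 6385.5, 5118.7),
    (19, "RT", 4077.1, 2068.3), (20, "TR", 7088.0, 6363.2),
    (21, "RT", 10689.3, 36108.5), (22, "RT", 4615.1, 4551.9),
    (23, "TR", 3792.8, 10444.7), (24, "TR", 9429.5, 21239.7),
    (25, "TR", 4272.4, 7247.8), (26, "RT", 9442.7, 3872.8),
    (27, "RT", 7373.2, 5522.3), (28, "TR", 1883.5, 3299.3),
]

# One-half of the Period 1 minus Period 2 difference on the log scale.
half_diff = {"TR": [], "RT": []}
for _, seq, auc_t, auc_r in auc:
    log_t, log_r = math.log(auc_t), math.log(auc_r)
    d = 0.5 * (log_t - log_r) if seq == "TR" else 0.5 * (log_r - log_t)
    half_diff[seq].append(d)

rt, tr = half_diff["RT"], half_diff["TR"]
log_ratio = mean(rt) - mean(tr)  # estimates log(mu_R / mu_T)
df = len(rt) + len(tr) - 2
pooled_var = ((len(rt) - 1) * variance(rt) + (len(tr) - 1) * variance(tr)) / df
std_err = math.sqrt(pooled_var * (1 / len(rt) + 1 / len(tr)))

T_CRIT_95_DF26 = 1.706  # tabled value, hard-coded (assumption of this sketch)
ratio_ci = (math.exp(log_ratio - T_CRIT_95_DF26 * std_err),
            math.exp(log_ratio + T_CRIT_95_DF26 * std_err))

# Average bioequivalence requires the whole 90% CI inside (0.80, 1.25).
bioequivalent = 0.80 < ratio_ci[0] and ratio_ci[1] < 1.25

print(f"log ratio = {log_ratio:.4f}, ratio = {math.exp(log_ratio):.2f}")
print(f"90% CI for ratio = ({ratio_ci[0]:.2f}, {ratio_ci[1]:.2f})")
print("bioequivalent" if bioequivalent else "bioequivalence not shown")
```

Because the limits 0.80 and 1.25 are reciprocals, the conclusion is the same whether the ratio is expressed as reference over test or test over reference.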

15.13 - Summary

In this lesson, among other things, we learned how to:

Distinguish between situations where a crossover design would or would not be advantageous.
Use the following terms appropriately: first-order carryover, sequence, period, washout, aliased effect.
State why an adequate washout period is essential between periods of a crossover study in terms of aliased effects.
Evaluate a crossover design as to its uniformity and balance and state the implications of these characteristics.
Understand and modify SAS programs for analysis of data from 2x2 crossover trials with continuous or binary data.
Provide an approach to analysis of event time data from a crossover study.
Distinguish between population bioequivalence, average bioequivalence and individual bioequivalence.
Relate the different types of bioequivalence to prescribability and switchability.
