Lesson 3: Clinical Trial Designs
Experimental design originated in agricultural research and influenced laboratory and industrial research before being applied to trials of pharmaceuticals in humans. Experimental design is characterized by control of the experimental process to reduce experimental error, replication of the experiment to estimate variability in the response, and randomization. For example, in comparing the yields of two varieties of corn, the experimenter uses the same type of corn planter and the same fertilizer and weed control methods in each test plot. Multiple plots of ground are planted with the two varieties of corn. The assignment of a seed variety to a test plot is randomized.
Clinical trial design has its roots in classical experimental design, yet has some different features. The clinical investigator is not able to control as many sources of variability through design as a laboratory or industrial experimenter. Human responses to medical treatments display greater variability than observations from experiments in genetically identical plants and animals or measurements of the effects of tightly controlled physical and chemical processes. And of course, ethical issues are paramount in clinical research. To study a clinical response with adequate precision, a trial may require lengthy periods for patient accrual and follow-up. It is unlikely that all of the study subjects can be enrolled on the same day. Study volunteers may also decide to no longer participate.
Each of these issues will be considered as we extend the classical experimental design to clinical trials.
Let's get started!
- State 6 general objectives that will be met with proper trial design.
- Name at least 6 sources of potential bias in clinical studies.
- Suggest design strategies to reduce bias, variability and ‘placebo effects’ in a proposed clinical study.
- Compare and contrast the following study designs with respect to the ability of the investigator to minimize bias: Case report or case series, database analysis, prospective cohort study, case-control study, parallel design clinical trial, crossover clinical trial.
- Identify the experimental unit in a proposed study.
- Differentiate between Phase I - IV trials.
- Recognize features that should be described in a written protocol for a clinical trial.
- Recognize confounding in a clinical study proposal.
- Identify characteristics and purposes of translational studies.
3.1 - Clinical Trial Design
Good trial design and conduct are far more important than selecting the correct statistical analysis. When a trial is well designed and properly conducted, statistical analyses can be performed, modified, and if necessary, corrected. On the other hand, inaccuracy (bias) and imprecision (large variability) in estimating treatment effects, the two major shortcomings of poorly designed and conducted trials, cannot be ameliorated after the trial. Skillful statistical analysis cannot overcome basic design flaws.
Piantadosi (2005) lists the following advantages of proper design:
- Allows investigators to satisfy ethical constraints
- Permits efficient use of scarce resources
- Isolates the treatment effect of interest from confounders
- Controls precision
- Reduces selection bias and observer bias
- Minimizes and quantifies random error or uncertainty
- Simplifies and validates the analysis
- Increases the external validity of the trial
The objective of most clinical trials is to estimate the magnitude of treatment effects or estimate differences in treatment effects. Precise statements about observed treatment effects are dependent on a study design that allows the treatment effect to be sorted out from person-to-person variability in response. An accurate estimate requires a study design that minimizes bias.
Piantadosi (2005) states that clinical trial design should accomplish the following:
- Quantify and reduce errors due to chance
- Reduce or eliminate bias
- Yield clinically relevant estimates of effects and precision
- Be simple in design and analysis
- Provide a high degree of credibility, reproducibility, and external validity
- Influence future clinical practice
3.2 - Controlled Clinical Trials Compared to Observational Studies
Medical research, as a scientific investigation, is based on careful observation and theory. Theory directs the observation and provides a basis for interpreting the results. The strength of the evidence from a clinical study depends on how well bias and variability were controlled when the study was conducted, as well as on the magnitude of the observed effect. Clinical studies can be characterized as uncontrolled observations, observational comparative studies, and controlled clinical trials.
Case reports and case-series are uncontrolled observational studies.
A case report only demonstrates that a clinical event of interest is possible. In a case report, there is no control of treatment assignment, endpoint ascertainment, or confounders. There is no control group for the sake of comparison. The report is descriptive in nature, not a formal statistical analysis.
Case reports are useful in generating hypotheses for future testing. For example, a physician may report that a patient in his practice, who was taking a specific anorexic drug, developed primary pulmonary hypertension (PPH), a rare condition that occurs in 1-2 out of every million Americans. Is this convincing evidence that the anorexic drug causes PPH?
A case series carries more weight than a single case report, but cannot prove the efficacy of a treatment. Case series and case reports are susceptible to large selection biases. Consider the example of laetrile, an apricot pit extract that was reputed to cure cancer. Seven case series were reported; the strength of evidence from these studies has been summarized by the US National Cancer Institute (NCI). While a proportion of patients may have experienced spontaneous remission of cancer, rigorous testing in controlled environments was never performed. After an estimated 70,000 patients had been treated, the NCI undertook a retrospective analysis of laetrile, only to decide that no definite conclusions supporting anti-cancer activity could be made (Special Report on Laetrile: The NCI Laetrile Review). The Cochrane review on laetrile (2015) states, “there is no reliable evidence for the alleged effects of laetrile or amygdalin for curative effects in cancer patients.” On the basis of a series of reported cases, many patients believed laetrile would cure their cancer, perhaps refusing other effective treatments and subjecting themselves to the adverse effects of cyanide. This continued for many years, with the anti-tumor efficacy of laetrile unsupported while the associated adverse effects were coming to light.
A database analysis is similar to a case series but may have a control group, depending on the data source. The source and quality of the data used for this secondary analysis are key. If the analysis attempts to evaluate treatment differences from data in which treatment assignment was based on physician and patient discretion, nonrandomized and open-label, bias is likely.
Databases are best used to study patterns with exploratory statistical analyses. For example, the NIH sponsored a database analysis of interstitial cystitis (IC) during the 1990s. This consisted of data from over 400 individuals with IC who underwent various and numerous therapies for their condition. The objective of the database analysis was to determine if there were patterns of treatments that may be effective in treating the disease (Rovner et al. 2000).
As another example, in the case of genomic research, specific data mining tools have been developed to search for patterns in large databases of genetic data, leading to the discovery of particular candidate genes.
An epidemiologic study is often a case-control or a cohort design, both comparative observational studies. An observational study lacks the key component of an experiment, namely, control over treatment assignment. Commonly these designs are used in assessing the influence of risk factors for a disease. Subjects meeting entrance criteria may have been identified through a database search. The choice of the control group is a crucial design component in observational studies.
In a case-control study, the investigator identifies cases (subjects with the disease) and controls (subjects without the disease) and retrospectively assesses some type of treatment or exposure. Because the investigator has selected the cases and controls, relative risk cannot be calculated directly from a case-control study.
In addition, levels of treatment or exposure may be recorded based on a subject’s recall of events that occurred many years previously; thus recall bias (systematic differences in accuracy or completeness of recall) can affect the study results.
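To see why the investigator's selection of cases and controls prevents a direct calculation of relative risk, consider a minimal sketch in Python. All counts are hypothetical, and the odds ratio is brought in only as a contrast; it is not discussed in the lesson text. The sketch assumes 36 cases, half of them exposed, and controls of whom one quarter are exposed. Only the number of controls sampled per case changes between the two tables.

```python
from fractions import Fraction


def naive_risk_ratio(a, b, c, d):
    """'Relative risk' computed as if the 2x2 table were a cohort.

    a = exposed cases,   b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    return Fraction(a, a + b) / Fraction(c, c + d)


def odds_ratio(a, b, c, d):
    """Cross-product ratio of the same 2x2 table."""
    return Fraction(a * d, b * c)


# Hypothetical counts: 36 cases (18 exposed), 25% of controls exposed.
two_controls_per_case = (18, 18, 18, 54)     # 72 controls in total
four_controls_per_case = (18, 36, 18, 108)   # 144 controls in total

for label, table in [("2:1", two_controls_per_case),
                     ("4:1", four_controls_per_case)]:
    print(label,
          "naive risk ratio =", float(naive_risk_ratio(*table)),
          "odds ratio =", float(odds_ratio(*table)))
```

Under these assumptions the naive risk ratio moves from 2 to 7/3 simply because more controls were sampled, while the odds ratio stays at 3. The "risk" in such a table reflects the sampling ratio chosen by the investigator, not the frequency of disease in the population.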
Prospective Cohort Study
In a prospective cohort study, individuals are followed forward in time with subsequent evaluations to determine which individuals develop into cases. The relationship of specific risk factors that were measured at baseline with the subsequent outcome is assessed. The cohort study may consist of one or more samples with particular risk factors, called cohorts. It is possible to control some sources of bias in a prospective cohort study by following standard procedures in collecting data and ascertaining endpoints. Since the subjects are not assigned risk factors in a randomized manner, however, there may remain covariates that are confounded with a risk factor. Sometimes, a particular treatment group (or groups) from a randomized trial is followed as a cohort, providing a cohort in which the treatment was assigned at random.
Prospective studies tend to have fewer design problems and less bias than retrospective studies, but they are more expensive with respect to time and cost.
An example of a case-control study: A cardiologist identifies 36 patients currently in his practice with a specific form of cardiac valve disease. He identifies another group of relatively healthy patients and matches two of them to each of the patients with cardiac valve disease according to age (± 5 years) and BMI (± 2.5). He plans to interview all 36 + 72 = 108 patients to assess their use of diet drugs during the past ten years.
A classic example of a cohort study: U.S. National Heart Lung and Blood Institute Framingham Heart Study
Piantadosi (2005) lists the following conditions for convincing non-experimental comparative studies:
- The treatment of interest occurs naturally.
- The study subjects provide valid observations for the biological question.
- The natural history of the disease with standard therapy, or in the absence of therapy, is known.
- The effect of the treatment is large enough to overshadow random error and bias.
- Evidence of efficacy is consistent with biological knowledge.
Controlled Clinical Trial
A controlled clinical trial contains all of the key components of a true experimental design. Treatments are assigned by design; administration of treatment and endpoint ascertainment follows a protocol. When properly designed and conducted, especially with the use of randomization and masking, the controlled clinical trial instills confidence that bias has been minimized. Replication of a controlled clinical trial, if congruent with the results of the first clinical trial, provides verification.
3.3 - Experimental Design Terminology
In experimental design terminology, the "experimental unit" is randomized to the treatment regimen and receives the treatment directly. The "observational unit" has measurements taken on it. In most clinical trials, the experimental units and the observational units are one and the same, namely, the individual patient.
One exception to this is a community intervention trial in which communities, e.g., geographic regions, are randomized to treatments. For example, communities (experimental units) might be randomized to receive different formulations of a vaccine, whereas the effects are measured directly on the subjects (observational units) within the communities. The advantages here are strictly logistical - it is simply easier to implement in this fashion. Another example occurs in reproductive toxicology experiments in which female rodents are exposed to a treatment (experimental units) but measurements are taken on the pups (observational units).
In experimental design terminology, factors are variables that are controlled and varied during the course of the experiment. For example, treatment is a factor in a clinical trial with experimental units randomized to treatment. Another example is pressure and temperature as factors in a chemical experiment.
Most clinical trials are structured as one-way designs, i.e., only one factor, treatment, with a few levels.
Temperature and pressure in the chemical experiment are two factors that comprise a two-way design in which it is of interest to examine various combinations of temperature and pressure. Some clinical trials may have a two-way factorial design, such as in oncology where various combinations of doses of two chemotherapeutic agents comprise the treatments. An incomplete factorial design may be useful if it is inappropriate to assign subjects to some of the possible treatment combinations, such as no treatment (double placebo). We will study factorial designs in a later lesson.
A parallel design refers to a study in which patients are randomized to a treatment and remain on that treatment throughout the course of the trial. This is a typical design. In contrast, with a crossover design patients are randomized to a sequence of treatments and they cross over from one treatment to another during the course of the trial. Each treatment occurs in a time period with a washout period in between. Crossover designs are of interest because, with each patient serving as their own control, there is potential for reduced variability. However, there are potential problems with this type of design. Possible carry-over effects, i.e., residual effects of the previous treatment on the subject’s response in the later treatment period, should be investigated. In addition, only conditions that are likely to be similar in both treatment periods are amenable to crossover designs. Acute health problems that do not recur are not well-suited for a crossover study. We will study crossover design in a later lesson.
Randomization is used to remove systematic error (bias) and to justify Type I error probabilities in experiments. Randomization is recognized as an essential feature of clinical trials for removing selection bias.
Selection bias occurs when a physician decides treatment assignment and systematically selects a certain type of patient for a particular treatment. Suppose the trial consists of an experimental therapy and a placebo. If the physician assigns healthier patients to the experimental therapy and the less healthy patients to the placebo, the study could result in an invalid conclusion that the experimental therapy is very effective.
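The scenario above can be imitated with a small simulation. This is only a sketch under stated assumptions: the health scores are simulated, the outcome is set equal to baseline health, and the experimental therapy has no true effect at all. The names `mean_difference`, `physician_choice`, and `randomized` are illustrative.

```python
import random
from statistics import mean


def mean_difference(outcomes, treated_ids):
    """Mean outcome of treated patients minus mean outcome of the rest."""
    treated = [y for i, y in enumerate(outcomes) if i in treated_ids]
    control = [y for i, y in enumerate(outcomes) if i not in treated_ids]
    return mean(treated) - mean(control)


rng = random.Random(2024)
n = 200

# Assumption: the outcome depends only on baseline health,
# so the experimental therapy has NO true effect.
health = [rng.gauss(50, 10) for _ in range(n)]
outcomes = health

# Physician-chosen assignment: the healthiest half get the therapy.
ranked = sorted(range(n), key=lambda i: health[i], reverse=True)
physician_choice = set(ranked[: n // 2])

# Randomized assignment: half the patients chosen by chance.
randomized = set(rng.sample(range(n), n // 2))

biased_diff = mean_difference(outcomes, physician_choice)
random_diff = mean_difference(outcomes, randomized)

print(f"physician-assigned difference: {biased_diff:.2f}")
print(f"randomized difference:         {random_diff:.2f}")
```

Because the physician's split places the healthiest half on the therapy, it produces the largest apparent "treatment effect" that any equal split of these patients could give, even though the therapy does nothing. The randomized split differs from zero only by chance.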
Blocking and stratification are used to control unwanted variation. For example, suppose a clinical trial is structured to compare treatments A and B in patients between the ages of 18 and 65. Suppose that the younger patients tend to be healthier. It would be prudent to account for this in the design by stratifying with respect to age. One way to achieve this is to construct age groups of 18-30, 31-50, and 51-65 and to randomize patients to treatment within each age group.
|Age|Treatment A|Treatment B|
|---|---|---|
|18 - 30|12|13|
|31 - 50|23|23|
It is not necessary to have the same number of patients within each age stratum. We do, however, want to have a balance in the number on each treatment within each age group. This is accomplished by blocking, in this case, within the age strata. Blocking is a restriction of the randomization process that results in a balance of numbers of patients on each treatment after a prescribed number of randomizations. For example, blocks of 4 within these age strata would mean that after 4, 8, 12, etc. patients in a particular age group had entered the study, the numbers assigned to each treatment within that stratum would be equal.
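Blocking within strata can be sketched in a few lines of Python. This is an illustrative sketch, not a production randomization system: the class and function names are hypothetical, the age groups are the ones used in the example, and a real trial would generate and conceal the allocation schedule according to its protocol.

```python
import random


def stratum_for(age):
    """Map an age to one of the age strata from the example."""
    if 18 <= age <= 30:
        return "18-30"
    if 31 <= age <= 50:
        return "31-50"
    if 51 <= age <= 65:
        return "51-65"
    raise ValueError("age outside the eligible range of 18 to 65")


class StratifiedBlockRandomizer:
    """Permuted-block randomization carried out separately in each stratum."""

    def __init__(self, treatments=("A", "B"), block_size=4, seed=None):
        if block_size % len(treatments) != 0:
            raise ValueError("block size must be a multiple of the number of arms")
        self.treatments = treatments
        self.block_size = block_size
        self.rng = random.Random(seed)
        self.pending = {}  # stratum -> assignments left in the current block

    def _new_block(self):
        # Each block holds every treatment equally often, in random order.
        block = list(self.treatments) * (self.block_size // len(self.treatments))
        self.rng.shuffle(block)
        return block

    def assign(self, age):
        stratum = stratum_for(age)
        queue = self.pending.setdefault(stratum, [])
        if not queue:
            queue.extend(self._new_block())
        return stratum, queue.pop()


randomizer = StratifiedBlockRandomizer(seed=7)
for age in [22, 45, 29, 60, 19, 27, 38, 33]:
    print(age, *randomizer.assign(age))
```

Each stratum keeps its own partially used block, so balance is restored within an age group every time a block of 4 is completed there, regardless of how many patients the other age groups have enrolled.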
If the numbers are large enough within a stratum, a planned subgroup analysis may be performed. In the example, the smaller numbers of patients in the upper and lower age groups would require care in the analyses of these sub-groups specifically. However, with the primary question as to the effect of treatment regardless of age, the pooled data in which each sub-group is represented in a balanced fashion would be utilized for the main analysis.
Even ineffective treatments can appear beneficial in some patients. This may be due to random fluctuations, or variability in the disease. If, however, the improvement is due to the patient’s expectation of a positive response, this is called a "placebo effect". This is especially problematic when the outcome is subjective, such as pain or symptom assessment. The placebo effect is widely recognized and must be accounted for in any clinical trial. For example, rather than constructing a nonrandomized trial in which all patients receive an experimental therapy, it is better to randomize patients to receive either the experimental therapy or a placebo. A true placebo is an inert or inactive treatment that mimics the route of administration of the real treatment, e.g., a sugar pill.
Placebos are not acceptable ethically in many situations, e.g., in surgical trials. (Although there have been instances where 'sham' surgical procedures took place as the 'placebo' control.) When an accepted treatment already exists for a serious illness such as cancer, the control must be an active treatment. In other situations, a true placebo is not physically possible to attain. For example, a few trials investigating dimethyl sulfoxide (DMSO) for providing muscle pain relief were conducted in the 1970s and 1980s. DMSO is rubbed onto the area of muscle pain but leaves a garlicky taste in the mouth, so it was difficult to develop a placebo.
Treatment masking or blinding is an effective way to ensure objectivity of the person measuring the outcome variables. Masking is especially important when the measurements are subjective or based on self-assessment. Double-masked trials refer to studies in which both investigators and patients are masked to the treatment. Single-masked trials refer to the situation when only patients are masked. In some studies, statisticians are masked to treatment assignment when performing the initial statistical analyses, i.e., they do not know which group received the treatment and which is the control until analyses have been completed. Even a safety-monitoring committee may be masked to the identity of treatment A or B, until there is an observed trend or difference that should evoke a response from the monitors. In executing a masked trial, great care is taken to keep the treatment allocation schedule securely hidden from all except those with a need to know which medications are active and which are placebo. This could be limited to the producers of the study medications, and possibly the safety monitoring board before study completion. There is always a provision for breaking the blind for a particular patient in an emergency situation.
As with placebos, masking, although highly desirable, is not always possible. For example, one could not mask a surgeon to the procedure he is to perform. Even so, some have gone to great lengths to achieve masking. For example, a few trials with cardiac pacemakers have consisted of every eligible patient undergoing a surgical procedure to be implanted with the device. The device was "turned on" in patients randomized to the treatment group and "turned off" in patients randomized to the control group. The surgeon was not aware of which devices would be activated.
Investigators often underestimate the importance of masking as a design feature. This is because they believe that biases are small in relation to the magnitude of the treatment effects (when the converse usually is true), or that they can compensate for their prejudice and subjectivity.
Confounding is the effect of other relevant factors on the outcome that may be incorrectly attributed to the difference between study groups.
Here is an example: An investigator plans to assign 10 patients to treatment and 10 patients to control. There will be a one-week follow-up on each patient. The first 10 patients will be assigned treatment on March 01 and the next 10 patients will be assigned control on March 15. The investigator may observe a significant difference between treatment and control, but is it due to different environmental conditions between early March and mid-March? The obvious way to correct this would be to randomize 5 patients to treatment and 5 patients to control on March 01, followed by another 5 patients to treatment and 5 patients to control on March 15.
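The March example can be worked through numerically. In the sketch below, the size of the date effect (3 units) is a made-up assumption, the treatment is assumed to have no effect, and the function names are illustrative. The point is only to show how start date and treatment are confounded in the first design and separated in the second.

```python
import random
from statistics import mean

# Assumption: the outcome depends only on the start date (hypothetical values).
DATE_EFFECT = {"March 01": 0.0, "March 15": 3.0}


def observed_difference(plan):
    """Mean outcome on treatment minus mean outcome on control.

    plan is a list of (start_date, arm) pairs, one per patient.
    """
    treated = [DATE_EFFECT[d] for d, arm in plan if arm == "treatment"]
    control = [DATE_EFFECT[d] for d, arm in plan if arm == "control"]
    return mean(treated) - mean(control)


def randomize_within_date(rng, dates=("March 01", "March 15"), per_date=10):
    """Randomly assign half of the patients on each date to each arm."""
    plan = []
    for date in dates:
        arms = ["treatment"] * (per_date // 2) + ["control"] * (per_date // 2)
        rng.shuffle(arms)
        plan.extend((date, arm) for arm in arms)
    return plan


# Original plan: all treatment on March 01, all control on March 15.
confounded = [("March 01", "treatment")] * 10 + [("March 15", "control")] * 10
balanced = randomize_within_date(random.Random(1))

print("confounded design:", observed_difference(confounded))
print("balanced design:  ", observed_difference(balanced))
```

With the original plan the whole date effect shows up as an apparent treatment difference of -3. With 5 patients per arm on each date, the date effect contributes equally to both arms and the difference is 0.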
A trial is said to possess internal validity if the observed difference in outcome between the study groups is real and not due to bias, chance, or confounding. Randomized, placebo-controlled, double-blinded clinical trials have high levels of internal validity.
External validity in a human trial refers to how well study results can be generalized to a broader population. External validity is irrelevant if internal validity is low. External validity in randomized clinical trials is enhanced by using broad eligibility criteria when recruiting patients.
Large simple and pragmatic trials emphasize external validity. A large simple trial attempts to discover small advantages of a treatment that is expected to be used in a large population. Large numbers of subjects are enrolled in a study with simplified design and management. There is an implicit assumption that the treatment effect is similar for all subjects with the simplified data collection. In a similar vein, a pragmatic trial emphasizes the effect of a treatment in practices outside academic medical centers and involves a broad range of clinical practices.
Studies of equivalency and noninferiority have different objectives than the usual trial which is designed to demonstrate superiority of a new treatment to a control. A study to demonstrate non-inferiority aims to show that a new treatment is not worse than an accepted treatment in terms of the primary response variable by more than a pre-specified margin. A study to demonstrate equivalence has the objective of demonstrating the response to the new treatment is within a prespecified margin in both directions. We will learn more about these studies when we explore sample size calculations.
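One common way to express these objectives, offered here as a sketch and not as the method of any particular trial, is to compare a confidence interval for the treatment difference with the prespecified margin. The function below assumes the difference is defined as new minus accepted treatment, that larger values of the response are better, and that the interval has already been computed elsewhere. The function name and return format are hypothetical.

```python
def assess(ci_lower, ci_upper, margin):
    """Compare a confidence interval for (new - accepted) with a margin.

    Assumes larger responses are better and margin > 0.
    """
    if margin <= 0:
        raise ValueError("margin must be positive")
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")
    return {
        # New treatment is better: the whole interval lies above zero.
        "superior": ci_lower > 0,
        # New treatment is not worse by more than the margin.
        "non_inferior": ci_lower > -margin,
        # Difference lies within the margin in both directions.
        "equivalent": ci_lower > -margin and ci_upper < margin,
    }


# Hypothetical intervals, all judged against a margin of 3 units.
print(assess(-1.0, 2.0, margin=3.0))
print(assess(0.5, 4.0, margin=3.0))
print(assess(-4.0, 1.0, margin=3.0))
```

In this sketch the first interval supports non-inferiority and equivalence but not superiority, the second supports superiority and non-inferiority but not equivalence, and the third supports none of the three because its lower limit falls below the margin.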
3.4 - Clinical Trial Phases
When a drug, procedure, or treatment appears safe and effective based on preclinical studies, it can be considered for trials in humans. Clinical studies of experimental drugs, procedures, or treatments in humans have been classified into four phases (Phase I, Phase II, Phase III, and Phase IV) based on the terminology used when pharmaceutical companies interact with the U.S. FDA. Greater numbers of patients are assigned to treatment in each successive phase.
Phase 0 represents pre-clinical testing in animals to obtain pharmacokinetic information.
Phase I trials investigate the effects of various dose levels on humans. The studies are usually done in a small number of volunteers (sometimes persons without the disease of interest or patients with few remaining treatment options) who are closely monitored in a clinical setting. The purpose is to determine a safe dosage range and to identify any common side effects or readily apparent safety concerns. Data may be collected to provide a description of the pharmacokinetics and pharmacodynamics of the compound, estimate the maximum tolerated dose (MTD), or evaluate the effects of multiple dose levels. Many trials in the early stage of therapy development either investigate treatment mechanism (TM) or incorporate dose-finding (DF) strategies.
To a pharmacologist, a TM trial is a pharmacokinetics study in which an attempt is made to investigate the bioavailability of the drug at various sites in the human system. To a surgeon, a TM study investigates the operative procedure. A DF trial usually tries to determine the maximum tolerated dose, or the minimum effective dose, etc. Thus, phase I (drug) trials can be considered TM and DF trials.
A Phase II trial typically investigates preliminary evidence of efficacy and continues to monitor safety. A Phase II trial may be the first time that the agent is administered to patients with the disease of interest to answer questions such as: What is the correct dosage for efficacy and safety in patients of this type? What is the probability a patient treated with the compound will benefit from the therapy or experience an adverse effect? Most trials in the middle stage of therapy development investigate safety and efficacy (SE). The experimental drug or treatment is administered to as many as several hundred patients in Phase II trials.
At the end of Phase II, a decision will be made as to whether or not the drug is promising and development should continue. In the U.S. there will be an ‘End of Phase II’ meeting between the pharmaceutical company and the FDA to discuss safety and plans for Phase III studies. Ineffective or unsafe compounds should not proceed into Phase III trials.
A Phase III trial is a rigorous clinical trial with randomization, one or more control groups, and definitive clinical endpoints. Phase III trials are often multi-center, accumulating the experience of thousands of patients. Phase III trials address questions of comparative treatment efficacy (CTE). A CTE trial involves a placebo and/or active control group so that precise and valid estimates of differences in clinical outcomes attributable to the investigational therapy can be assessed.
If things go well during Phase III, the company with the license for the compound will submit an application for approval to market the drug. U.S. FDA approval hinges on ‘adequate and well-controlled’ pivotal Phase III studies that are convincing of safety and efficacy.
A Phase IV trial or expanded safety trial occurs after the regulatory approval of the new therapy. As the usage of the new drug becomes widespread, there is an opportunity to learn about rare side effects and interactions with other therapies. An expanded safety (ES) study can provide important information that was not apparent during drug development. For example, a few thousand patients might be involved in all of the SE and CTE trials for a particular therapy. An ES study, however, could involve >10,000 patients. Such large sample sizes can detect more subtle safety problems for the therapy if such problems exist. Some Phase IV studies will have a marketing objective for the company as well as collect safety data.
The terminology of Phase I, II, III, and IV trials does not work well for non-pharmacologic treatments and does not account for translational trials.
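A short calculation illustrates why the larger numbers in an expanded safety study matter for rare side effects. The sketch assumes that each patient independently has the same small probability of experiencing the event. The rate of 1 in 10,000 and the sample sizes are hypothetical, and the function name is illustrative.

```python
def prob_at_least_one(n_patients, event_rate):
    """Probability of observing one or more events among n_patients,
    assuming independent patients with a common event_rate."""
    return 1.0 - (1.0 - event_rate) ** n_patients


rate = 1 / 10_000  # hypothetical rare side effect
for n in (300, 3_000, 10_000, 30_000):
    print(f"n = {n:>6}: P(at least one event) = {prob_at_least_one(n, rate):.3f}")
```

Under these assumptions, a development program with 3,000 patients has roughly a 1 in 4 chance of seeing even a single case, while a study of 10,000 patients has a chance of about 63%.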
Some studies performed prior to large scale clinical trials are characterized as translational studies. Translational studies have as their primary outcome a biological measurement or target that has been derived from an accepted model of the disease process. The results of the translational study may provide evidence of a mechanism of action for a compound. Target validation can be an objective of such a study. Large effects on the target are sought. For example, a large change in the level of a protein, or the activity of an enzyme might support the therapeutic activity of a compound. There is an understanding that translational work may cycle from preclinical lab to a clinical setting and back again. Although the translational studies have a written protocol, the treatment may be modified during the study. The protocol should clearly define what would be considered ‘lack of effect’ and the next experimental step for any possible outcome of the trial.
3.5 - Other Considerations
Some therapies are not developed in the same manner as drugs, such as disease prevention therapies, vaccines, biologicals, surgical techniques, medical devices, and diagnostic agents.
Prevention trials are conducted in:
- healthy individuals to determine if the therapy prevents the onset of disease,
- patients with early-stage disease to determine if the therapy prevents progression, or
- patients with the disease to determine if the therapy prevents additional episodes of disease expression.
Vaccine investigations are a type of primary prevention trial. They require large numbers of patients and are very costly because of the numbers and the length of follow-up that is required.
The objective of a diagnostic or screening trial is to determine if an agent can “diagnose” the presence of disease. Usually, the agent is compared to a “gold standard” diagnostic that is assumed to be perfectly accurate in its diagnosis. The advantage of the newer diagnostic agent is less expense or a less invasive procedure.
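A comparison against a gold standard is often summarized with sensitivity and specificity. These two terms are not defined in the lesson text, so the following is a sketch with hypothetical counts: sensitivity is taken as the proportion of gold-standard positives that the new agent also calls positive, and specificity as the proportion of gold-standard negatives that it calls negative.

```python
def accuracy_against_gold_standard(pairs):
    """Summarize a new diagnostic agent against a gold standard.

    pairs is an iterable of (new_agent_positive, gold_standard_positive)
    booleans, one pair per subject.
    """
    tp = fp = fn = tn = 0
    for new_positive, gold_positive in pairs:
        if gold_positive and new_positive:
            tp += 1      # disease present, new agent detects it
        elif gold_positive:
            fn += 1      # disease present, new agent misses it
        elif new_positive:
            fp += 1      # disease absent, new agent raises a false alarm
        else:
            tn += 1      # disease absent, new agent agrees
    return {"sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp)}


# Hypothetical study: 50 subjects with disease and 50 without,
# as classified by the gold standard.
subjects = ([(True, True)] * 40 + [(False, True)] * 10
            + [(True, False)] * 5 + [(False, False)] * 45)
print(accuracy_against_gold_standard(subjects))
```

With these made-up counts the new agent detects 40 of the 50 diseased subjects and correctly clears 45 of the 50 disease-free subjects. The function assumes that the data contain at least one gold-standard positive and one gold-standard negative.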
3.6 - Importance of the Research Protocol
A protocol is a document that specifies the research plan for the clinical trial. It is the single most important quality control tool for all aspects of a clinical trial (Piantadosi 2005). This is especially true in a multi-center clinical trial, which requires collaboration in the research activities of many investigators and their staffs at multiple institutions.
Every clinical trial experiences violations of the protocol. Some violations are due to differences in interpretation, some are due to carelessness, and some are due to unforeseen circumstances. Some protocol deviations are inconsequential but others can affect the validity of the trial. For instance, a patient might be unaware of a condition that is present in its early or latent stage or a patient may mislead a researcher intentionally, thinking they will receive special treatment from participating in a study – both result in violations of the patient exclusion criteria established in the research protocol. Protocol amendments are common as a long-term multi-center study progresses. The most serious violations are those which may affect the conclusions of the study.
The SPIRIT 2013 statement by an international collaboration of persons or groups responsible for funding, conducting, and publishing results of clinical trials, along with ethicists, sets forth minimal elements that should be included in a clinical trial protocol and provides a checklist. The US NIH has its own template for a phase 2 or 3 clinical trial protocol.
If the conduct of a trial is particularly difficult, and especially if it is a multi-center study, the investigators will construct a manual of operations (MOP). The MOP has more detailed explanations than the protocol for how the measurements should be taken, how the data collection forms should be completed, etc.
3.7 - Summary
In this lesson, among other things, we learned:
- the 6 objectives that will be met with proper trial design.
- the sources of potential bias in clinical studies.
- the design strategies to reduce bias, variability, and ‘placebo effects’ in a proposed clinical study.
- to compare and contrast the following study designs with respect to the ability of the investigator to minimize bias: Case report or case series, database analysis, prospective cohort study, case-control study, parallel design clinical trial, crossover clinical trial.
- to identify the experimental unit in a proposed study.
- to differentiate between Phases I-IV trials.
- to recognize features that should be described in a written protocol for a clinical trial.
- to identify the characteristics and purposes of translational studies.
References for Lesson 3
Dansinger et al. Comparison of the Atkins, Ornish, Weight Watchers, and Zone Diets for Weight Loss and Heart Disease Risk Reduction: A Randomized Trial. JAMA. 2005;293:43-53.
Ellison, N.M. et al. Special Report on Laetrile: The NCI Laetrile Review. N Engl J Med. 1978;299:549-552.
Milazzo S, Ernst E, Lejeune S, Schmidt K. Laetrile treatment for cancer. Cochrane Database of Systematic Reviews 2006, Issue 2. Art. No.: CD005476. DOI: 10.1002/14651858.CD005476.pub2. This version first published online: April 19, 2006; accessed 7/25/07.
Piantadosi, Steven. (2005) Clinical Trials as Experimental Designs, Random Error and Bias, Objectives and Outcomes, Translational Clinical Trials, Dose-Finding Designs. In: Piantadosi, Steven. Clinical Trials: A Methodologic Perspective. 2nd ed. Hoboken, NJ: John Wiley and Sons, Inc.