Unit 2: Study Designs and Internal Validity for Health-Related Studies

Designs for Health-Related Studies

A proper study design means that the approach and methods will yield results that are as valid and as precise as possible. It also means that the study design is appropriate for the current scientific thinking on the topic. Each of the study designs listed below can provide useful information when applied in the appropriate situation and with the proper methods. The designs are listed in increasing order of their ability to demonstrate causality, with the stronger designs being those in which the researcher controls the administration of the treatment. However, the study design alone does not ensure that the results are valid, precise, and generalizable. Epidemiologists must develop a keen ability to recognize the strengths and limitations of any study.

Observational Studies

Case Study
A study of one diseased individual, typically one with an uncommon disease or set of symptoms. The study design does not require a comparison group.
Case Series
A study of multiple occurrences of unusual cases that have similar characteristics. Investigators can calculate the frequency of symptoms or characteristics among the cases. Results may generate causal hypotheses. Neither a case study nor a case series would include a comparison group.
Ecological Study
A study in which at least one variable, either exposure or the outcome, is measured at the group (not individual) level. Examples of group-level measures include…
  • the incidence rate of cancer among a specific population,
  • the mean level of blood pressure of patients seen at a clinic,
  • the average sunlight exposure at a specific geographic location on the earth, or
  • a preventive service included in a health insurance plan.

The occurrence of disease is compared between groups that have different levels of exposure, so this study design has at least one comparison group.

Cross-Sectional Study
A study with individual-level variables that measures exposure and disease at one point in time, providing a snapshot of the study population. This study design provides weak evidence of a causal association between exposure and outcome because we may not be certain that the exposure preceded the disease. A patient survey is an example of a cross-sectional study.
Case-Control Study
A study that identifies individuals who develop the disease (cases) and individuals without the disease (controls), and then determines the previous exposure for each case and control. The case group is composed only of individuals known to have the disease or outcome; the control group is drawn from a comparable population who do NOT have the disease or outcome. We then compare the odds of exposure between cases and controls. The measure of association for a case-control study is typically an odds ratio. A case-control study is stronger than a cross-sectional study in establishing individual-level causality because we are more certain that exposure preceded the disease outcome.
Cohort Study
A study that begins with persons who do not have the disease but who have a known level of exposure to the putative risk factor. The known level is often no exposure. Thus, the study sample is drawn only from individuals at risk of developing the disease or outcome. Individuals are followed through time until some of them develop the disease. We then compare the rate of the outcome for the exposed group to the rate of the outcome for the non-exposed group. The measure of association is typically a relative risk or an attributable risk, or the results may be depicted with survival analysis. Incidence density rates can be calculated. A cohort study takes more time, money, and subjects than does a case-control study, but it will also provide stronger evidence of individual-level causation because we are measuring incidence rates of the disease. A longitudinal survey may be considered a cohort study.
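The cohort measures named above can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the function names and all counts are hypothetical, and the relative risk here is computed from cumulative incidence (cases divided by persons at risk).

```python
import math


def cohort_measures(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Relative risk and attributable risk from cumulative incidence in a cohort."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return {
        "risk_exposed": risk_exposed,
        "risk_unexposed": risk_unexposed,
        # Ratio of risks: how many times more likely the outcome is among the exposed
        "relative_risk": risk_exposed / risk_unexposed,
        # Difference of risks: excess risk among the exposed
        "attributable_risk": risk_exposed - risk_unexposed,
    }


def incidence_density(new_cases, person_time):
    """Incidence density rate: new cases per unit of person-time at risk."""
    return new_cases / person_time


# Hypothetical cohort: 30 of 1,000 exposed and 10 of 1,000 unexposed persons develop disease
measures = cohort_measures(30, 1000, 10, 1000)
print(round(measures["relative_risk"], 2), round(measures["attributable_risk"], 3))

# Hypothetical follow-up: 12 new cases over 4,000 person-years
print(incidence_density(12, 4000))
```

With these made-up counts the exposed group has three times the risk of the unexposed group, and the excess risk is 2 cases per 100 persons.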

Experimental Studies

Some epidemiologic studies are interventional, intended to test methods to reduce the incidence or severity of the disease.

Community-based epidemiologic studies
Instead of randomizing individuals, communities may be randomly selected to receive treatment. For instance, in the 1950s, certain communities were randomly selected to have fluoride added to the drinking water; in other communities, no fluoride was added. The incidence of dental caries was then compared between the fluoridated and the non-fluoridated communities.
Clinical (experimental) study
In this type of study, the researcher controls the exposure that individuals receive. A prime example is a clinical trial, in which patients may be randomized to receive a specific treatment. Measurements are made on the individual, but these studies do not typically measure the effect on study participants that might come from a group-level exposure.

To distinguish between observational and experimental designs, ask whether the investigator (or the study) is controlling the primary exposure. For instance, suppose you volunteer for a study in which a specified diet is prescribed and provided. You are randomly selected to receive one specific dietary plan, while another person is randomly selected to receive a different diet. You are to eat the prescribed food and keep a food diary. This would be an experimental study. A limitation of experimental studies is that participants tend to be more homogeneous; for example, they may be highly educated and live in urban areas. This can limit the generalizability of the results if education or residence is related to the outcome.

If the study did not prescribe which diet you ate but let you choose what you ate, it would be an observational study. Observational studies cannot control for all the things that people do or that happen to them, so there is the possibility of uncontrolled confounding. However, these studies tend to enroll people from broader backgrounds, possibly strengthening the generalizability of their results.

Internal Validity for Health-Related Studies

Once a study is conducted, the investigators must assess the internal validity of the results.  A study is considered valid only when the three following alternative explanations have been eliminated.  

  1. Bias
  2. Confounding
  3. Random Error

Unlike bias and confounding, which are problems that the investigator needs to eliminate, effect modification is a natural phenomenon of scientific interest that the investigator aims to describe and understand.  Effect modification will also be covered in this unit.  

Lesson 5 - Case Control Studies

Lesson 5 Objectives

Upon completion of this lesson, you should be able to:

  • Recognize the case-control study design, and settings where it is useful
  • Apply different methods for defining exposures
  • Differentiate between direct and indirect methods for measuring exposure
  • Be familiar with sources to use for defining cases
  • Create the appropriate 2x2 table and calculate an odds ratio for data from a case-control study
  • Distinguish between matching and non-matching methods to select controls

5.1 - Rationale & Designs

Case-control studies are useful when epidemiologists investigate an outbreak of a disease because the study design is powerful enough to identify the cause of the outbreak, especially when the sample size is small. Attributable risks may also be calculated.

Case-control study designs are used to estimate the risk for a disease from a specific risk factor. The estimate is the odds ratio, which is a good estimate of the relative risk, especially when the disease is rare.
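As a minimal sketch of the odds ratio calculation, the code below lays out a 2x2 table and computes the odds ratio as the cross-product ratio. The cell labels a, b, c, d and the counts are hypothetical. The confidence interval uses the common log-based (Woolf) approximation, which is an assumption here since the lesson does not specify an interval method.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table.

    a = exposed cases,   b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    return (a * d) / (b * c)


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Approximate 95% CI on the log scale (Woolf method; an assumed choice)."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical data: 100 cases (40 exposed) and 100 controls (20 exposed)
a, b, c, d = 40, 20, 60, 80
print(round(odds_ratio(a, b, c, d), 2))
print(tuple(round(x, 2) for x in odds_ratio_ci(a, b, c, d)))
```

In this made-up table the odds of exposure among cases are about 2.7 times the odds among controls, and the approximate interval lies above 1.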

While a case-control study design offers less support for a causation hypothesis than the longer and more expensive cohort design, it does provide stronger evidence than a cross-sectional study.

Case-control studies are useful when:

  • Exposure data is difficult or expensive to obtain
  • The disease is rare
  • The disease has a long induction and latent period
  • Little is known about the disease
  • The underlying population is dynamic

Case-Control Study Design

The approach for a case-control study is straightforward. Case-control studies begin by enrolling persons based on their current disease status. Previous exposure status is subsequently determined for each case and control. However, because these studies collect data after the disease has already occurred, they are considered retrospective, which is a limitation.

  Stop and Think!

Why can't we determine the incidence rate from a case-control study?

We have selected cases and controls from a population, often an unknown population. For example, we might enroll patients in a hospital, but we don't really know the size of the general population that would have come to the hospital. Also, we have not followed persons at risk to monitor the development of disease. Furthermore, the investigator selects the number of cases relative to the number of controls.
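The point about the investigator choosing the number of controls can be illustrated numerically. In this sketch (all population counts are hypothetical, and expected counts are used in place of random sampling), every case is enrolled and controls are sampled as a fraction of non-cases without regard to exposure. The share of cases in the study changes with the sampling fraction, so it says nothing about incidence, while the odds ratio stays the same.

```python
import math

# Hypothetical source population (normally unknown to the investigator)
population = {
    "exposed_cases": 50,
    "exposed_noncases": 9950,
    "unexposed_cases": 50,
    "unexposed_noncases": 39950,
}


def case_control_sample(pop, control_fraction):
    """Expected 2x2 counts: all cases plus a fraction of non-cases as controls."""
    return {
        "a": pop["exposed_cases"],
        "b": pop["exposed_noncases"] * control_fraction,
        "c": pop["unexposed_cases"],
        "d": pop["unexposed_noncases"] * control_fraction,
    }


def odds_ratio(table):
    return (table["a"] * table["d"]) / (table["b"] * table["c"])


def case_share(table):
    """Proportion of study subjects who are cases (not an incidence estimate)."""
    return (table["a"] + table["c"]) / sum(table.values())


def true_relative_risk(pop):
    risk_exposed = pop["exposed_cases"] / (pop["exposed_cases"] + pop["exposed_noncases"])
    risk_unexposed = pop["unexposed_cases"] / (pop["unexposed_cases"] + pop["unexposed_noncases"])
    return risk_exposed / risk_unexposed


for fraction in (0.01, 0.05):
    table = case_control_sample(population, fraction)
    print(fraction, round(case_share(table), 3), round(odds_ratio(table), 3))

print(round(true_relative_risk(population), 3))
```

Because the disease in this made-up population is rare, the odds ratio (about 4.02) is close to the relative risk (4.0).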

A critical and often controversial component of a case-control study is the selection of the controls. Controls must be comparable to cases in every way except that they do not have the disease. Preferably, controls are drawn from the same population as the cases. Some studies, though, draw the controls from a different data source. For example, cases may be detected from a disease registry but the controls are selected randomly from another data source. Controls should be selected without regard to their exposure status (e.g., exposed/non-exposed), but may be sampled proportional to their time at risk (which is called density sampling).

There are two basic types of case-control studies, distinguished by the method used to select controls.

  • Non-matched case-control study: We enroll controls without regard to the number or characteristics of the cases. In this study design, the number of controls does not necessarily equal the number of cases.
  • Matched case-control study: In a matched study, we enroll controls based on some characteristic(s) of the case. For example, we might match the sex of the control to the sex of the case. The idea in matching is to match upon a potential confounding variable in order to remove the confounding effect. (We will look at how matching occurs in the example below.)

    There are two basic types of matched designs:

    • one-to-n matching (i.e., one case to one control, or one case to a specific number of controls) and
    • frequency-matching, where matching is based upon the distributions of the characteristics among the cases. For example, 40% of the cases are females so we choose the controls such that 40% of the controls are females.
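Frequency matching can be sketched as a quota calculation: the number of controls to enroll in each stratum is set so that the controls mirror the distribution among the cases. The function name and the data are hypothetical, and the largest-remainder rounding is just one reasonable way to handle quotas that do not divide evenly.

```python
from collections import Counter


def frequency_match_quotas(case_strata, n_controls):
    """Number of controls to enroll per stratum so controls mirror the cases.

    case_strata: one stratum label per case (e.g., sex).
    """
    counts = Counter(case_strata)
    n_cases = len(case_strata)
    # Whole-number share of controls for each stratum
    quotas = {s: (n_controls * k) // n_cases for s, k in counts.items()}
    # Hand out any leftover controls to the strata with the largest remainders
    leftover = n_controls - sum(quotas.values())
    by_remainder = sorted(counts, key=lambda s: (n_controls * counts[s]) % n_cases, reverse=True)
    for stratum in by_remainder[:leftover]:
        quotas[stratum] += 1
    return quotas


# Hypothetical cases: 40% female, 60% male; enroll 200 controls
cases = ["female"] * 40 + ["male"] * 60
print(frequency_match_quotas(cases, 200))
```

With 40% of the cases female, 80 of the 200 controls would be female, matching the example in the text.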

Case-Crossover Study Design

This design is useful when the risk factor/exposure is transient. For example, cell phone use or sleep disturbances are transitory occurrences. Each case serves as its own control, i.e., the study is self-matched. For each person, there is a 'case window', the period of time during which the person was a case, and a 'control window', a period of time associated with not being a case. Exposure during the case window is compared to exposure during the control window.

Advantages of Case-crossover

  • Efficient – self-matching
  • Efficient – select only cases
  • Can use multiple control windows for one case window

Disadvantages of Case-crossover

  • Information bias – the inaccurate recall of exposure during the control window (can be overcome by choosing the control window to occur after the case window)
  • Requires careful selection of the time period during which the control window occurs (circumstances associated with the control window should be similar to circumstances associated with the case window; e.g., traffic volume)
  • Requires careful selection of the length and timing of the windows (e.g., in an investigation of the risk of cell phone usage in auto accidents, cell phone usage that ceases 30 minutes before an accident is unlikely to be relevant to the accident)
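The comparison of case and control windows can be sketched for the simplest situation: one case window and one control window per person, with exposure recorded as yes/no. Under that assumption, the usual matched-pair estimate of the odds ratio is the ratio of discordant pairs; persons exposed in both windows or in neither contribute no information. The function name and the data below are hypothetical, and the lesson itself does not prescribe this estimator.

```python
def case_crossover_odds_ratio(windows):
    """Matched-pair odds ratio from (exposed_in_case_window, exposed_in_control_window) pairs."""
    case_only = sum(1 for in_case, in_control in windows if in_case and not in_control)
    control_only = sum(1 for in_case, in_control in windows if in_control and not in_case)
    if control_only == 0:
        raise ValueError("No persons exposed only in the control window; ratio is undefined.")
    return case_only / control_only


# Hypothetical study of 40 cases, each serving as their own control
windows = (
    [(True, False)] * 12    # exposed only in the case window
    + [(False, True)] * 3   # exposed only in the control window
    + [(True, True)] * 5    # exposed in both windows
    + [(False, False)] * 20 # exposed in neither window
)
print(case_crossover_odds_ratio(windows))
```

Here 12 persons were exposed only in the case window and 3 only in the control window, giving an estimated odds ratio of 4.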

Example & Guidance Material


The first decade of experience with case-crossover studies has shown that the design applies best if the exposure is intermittent, the effect on risk is immediate and transient, and the outcome is abrupt. However, this design has been used to study single changes in exposure level, gradual effects on risk, and outcomes with insidious onsets. To estimate relative risk, the exposure frequency during a window just before outcome onset is compared with exposure frequencies during control times rather than in control persons. One or more control times are supplied by each of the cases themselves, to control for confounding by constant characteristics and self-confounding between the trigger's acute and chronic effects. This review of published case-crossover studies is designed to help the reader prepare a better research proposal by understanding triggers and deterrents, target person times, alternative study bases, crossover cohorts, induction times, effect and hazard periods, exposure windows, the exposure opportunity fallacy, a general likelihood formula, and control crossover analysis.


Because of the belief that the use of cellular telephones while driving may cause collisions, several countries have restricted their use in motor vehicles, and others are considering such regulations. We used an epidemiologic method, the case-crossover design, to study whether using a cellular telephone while driving increases the risk of a motor vehicle collision.


We studied 699 drivers who had cellular telephones and who were involved in motor vehicle collisions resulting in substantial property damage but no personal injury. Each person's cellular-telephone calls on the day of the collision and during the previous week were analyzed through the use of detailed billing records.


A total of 26,798 cellular telephone calls were made during the 14-month study period. The risk of a collision when using a cellular telephone was four times higher than the risk when a cellular telephone was not being used (relative risk, 4.3; 95 percent confidence interval, 3.0 to 6.5). The relative risk was similar for drivers who differed in personal characteristics such as age and driving experience; calls close to the time of the collision were particularly hazardous (relative risk, 4.8 for calls placed within 5 minutes of the collision, as compared with 1.3 for calls placed more than 15 minutes before the collision; P<0.001); and units that allowed the hands to be free (relative risk, 5.9) offered no safety advantage over hand-held units (relative risk, 3.9; P not significant). Thirty-nine percent of the drivers called emergency services after the collision, suggesting that having a cellular telephone may have had advantages in the aftermath of an event.


The use of cellular telephones in motor vehicles is associated with quadrupling the risk of a collision during the brief period of a call. Decisions about the regulation of such telephones, however, need to take into account the benefits of the technology and the role of individual responsibility.

5.2 - Basic Concepts of Exposure

In a general sense, exposure can be defined as any of a subject's attributes (association) or any agent (effect) with which the subject may come into contact. These attributes or agents may be relevant to his or her health (Armstrong et al., 1998).

This definition would include smoking, drinking, exposure through an occupation (farmers, pesticide applicators, etc.), or age (e.g., menopause, which changes endogenous estrogen levels) as exposures.

For an environmental factor, exposure can be more precisely defined as contact with some agent at the boundary between humans and the environment, at a specific concentration, over an interval of time (Wallace, 1995). Exposures can be harmful or beneficial.

Harmful factors - environmental tobacco smoke (ETS), asbestos, ...
Beneficial factors - vitamin D intake, colonoscopy as a preventive measure for colon cancer, …

Exposure Assessment

Exposure Assessment
The science that describes how an individual or population comes in contact with a risk factor, including quantification of the amount of the risk factor across space and time (Lioy, 1990). Key components include:
  1. Exposure intensity: the agent/risk factor concentration in the medium that is in contact with the body
  2. Exposure frequency: how often the exposure occurs
  3. Exposure duration: the length of time that the exposure occurs
  4. Microenvironment: any location or activity in which a distinct exposure occurs

When investigating whether exposure is related to the risk of disease, the epidemiologist must consider many possibilities. Could the effect be related to the concentration, the frequency of exposure, or the duration? Or is there something peculiar to a certain microenvironment?

Example: BaP

Benzo[a]pyrene (BaP), found in a hamburger cooked by flame broiling, is a carcinogen. To assess exposure, you may ask: How burned is the burger? How frequently does the person eat burned hamburgers? How long have they been eating burned hamburgers? Is there an additional micro-environmental change that occurs within BaP, the molecule carrying the carcinogen? All these factors could be related to the ability of the exposure to cause a particular outcome.

Environmental tobacco smoke or second-hand smoke is another example. What questions might you ask?

(Consider: Are children exposed to environmental tobacco smoke in their home environment? How many hours are smokers smoking in the house? How many smokers? How frequently is the child in the house? How long have smokers lived in the house?)

Here's another example.

Example: Tobacco

Look at Tables 1 and 2 in Paskett et al. (2007). What types of tobacco exposure are considered in this evaluation of the relationship of smoking with colorectal cancer? What is the effect of current smoking on the incidence of colorectal cancer?

From Table 1:

Smoking Status (Never, Past, Current)
Age at Smoking Initiation (Never, <20, ≥ 20)
Cigarettes per Day (Never, <25, ≥ 25)
Duration (Never, <20, 20-29, 30-39, ≥ 40)
Passive smoking status (Never, Ever)

From Table 2:

We can see that current smoking was not clearly associated with invasive colon cancer (hazard ratio [HR] 1.03; 95% confidence interval, 0.77 to 1.38, an interval that includes 1), whereas current smoking was associated with a 1.95-fold higher risk of invasive rectal cancer (HR 1.95; 95% confidence interval, 1.10 to 3.47).

Other factors may result in differential effects of exposure. For example, serious adverse health outcomes may be limited to individuals in a specific setting, region, worksite, or community. Individual susceptibilities may differ. There may be differences in exposure due to the type of agent (chemical, biological, or physical), the medium through which the exposure occurs (air, water, food), or the route of exposure (inhaled, ingested, or dermal). In considering a potential association of BaP in grilled burgers with a particular cancer, we might ask: How much carcinogen is in the charred meat? How much is in the smoke arising from the meat drippings?

Potentially important aspects of environmental exposure include:

Agent(s): biological, chemical, physical, single agent, multiple agents, mixtures
Source(s): anthropogenic/non-anthropogenic, area/point, stationary/mobile, indoor/outdoor
Transport/carrier medium: air, water, soil, dust, food, product/item
Exposure pathway(s): eating contaminated food, breathing contaminated workplace air, touching residential surface
Exposure concentration: mg/kg (food), mg/litre (water), μg/m3 (air), μg/cm2 (contaminated surface), % by weight, fibres/m3 (air)
Exposure route(s): inhalation, dermal contact, ingestion, multiple routes
Exposure duration: seconds, minutes, hours, days, weeks, months, years, lifetime
Exposure frequency: continuous, intermittent, cyclic, random, rare
Exposure setting(s): occupational/non-occupational, residential/non-residential, indoors/outdoors
Exposed population: the general population, population subgroups, individuals
Geographic scope: site/source-specific, local, regional, national, international, global
Time frame: past, present, future, trends

As you can see, exposure assessment requires considering many factors that can affect exposure. Let's look further at some of these factors.

Exposure Measurements

Exposure data can be collected directly, by monitoring an individual, or indirectly. Each method has advantages and disadvantages.

Direct Method:

Monitor individuals using some measurement device on their person or by taking biological samples.

  • Personal monitoring: personal exposure monitors for particulate matter (PM), patches worn under clothing for pesticide applicators
  • Biological monitoring: lead concentrations in blood, biomarkers in urine, blood, breath, hair, nails

Advantages:

  • Provides exposure values with minimal assumptions
  • Assesses exposure and collects data at the individual level

Disadvantages:

  • May not be practical in a large epidemiological study because of the expense and effort required
  • Heavily dependent upon the half-life of the compound (how long it lasts in the body); the compound may have an affinity for certain tissues
  • Can be affected by inter-individual differences in metabolism, inter-laboratory variation, and intra-individual variability due to diurnal variation, diet, season, etc.
  • If the chemical does not persist in the body, the measured level will not reflect long-term exposure

Indirect Method:

Various possibilities

  • Questionnaires/diaries
  • Job exposure matrix (JEM)
  • Environmental monitoring/modeling - monitor the environment and assume that people who live in that environment are exposed at the level observed at the monitoring site
  • Calculate the concentration of an agent in all locations/activities and multiply by the duration spent in each location/activity (examples: indoor and outdoor exposure to ambient air particles, exposure to  on the road)

Advantages:

  • Practical and less expensive; the indirect method has been used extensively in epidemiological studies
  • Often involves gathering data with a questionnaire

Disadvantages:

  • Potential problems of a questionnaire: low response rate, non-response to individual questions, recall bias (not being able to remember), social desirability bias (false answers), low question validity and reliability
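The microenvironment calculation listed under the indirect method (the concentration in each location or activity multiplied by the time spent there) can be sketched as follows. The locations, concentrations, and hours are hypothetical values chosen only for illustration, and the function names are not from any standard library.

```python
import math


def integrated_exposure(microenvironments):
    """Sum of concentration x duration over all microenvironments.

    Each entry is (name, concentration in ug/m3, hours spent there).
    The result is in ug*h/m3.
    """
    return sum(concentration * hours for _, concentration, hours in microenvironments)


def time_weighted_average(microenvironments):
    """Average concentration over the whole period, weighted by time spent."""
    total_hours = sum(hours for _, _, hours in microenvironments)
    return integrated_exposure(microenvironments) / total_hours


# Hypothetical 24-hour day for one person exposed to airborne particles
day = [
    ("home, indoors", 10.0, 14.0),
    ("office", 8.0, 7.0),
    ("outdoors", 25.0, 2.0),
    ("commuting by road", 40.0, 1.0),
]
print(integrated_exposure(day))
print(round(time_weighted_average(day), 2))
```

Although commuting has the highest concentration in this made-up day, the long hours at home contribute the most to the total, which is why both concentration and duration need to be measured.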

Environmental Biomarker Research Example

The US EPA is working to connect levels of environmental contamination to the risk of adverse effects in the public's health with the ultimate goal of reducing this risk. Check out the Chemical Safety for Sustainability Strategic Research Action Plan 2016-2019 to see how the EPA is working toward this goal. In particular, see Topic 2: Life Cycle Analytics, beginning on page 19.

5.3 - Case Definitions

Basic Concepts of Case Definition:

Suppose you are asked to estimate the population prevalence of attention-deficit/hyperactivity disorder (ADHD) among U.S. school-age children. How will you identify the children who should be counted as cases having ADHD? What defines a ‘case’?

The definition of a ‘case’ is critical in planning an epidemiologic investigation. The case definition must be carefully formulated to meet the objectives of the investigation, while also permitting valid comparisons with results from other studies. In this example, it may be of interest to consider whether the proportion of school-age children with ADHD has changed over a period of time. If the case definition changes significantly from one time period to the next, comparisons with previous years are problematic.

Suppose that, instead of estimating prevalence, the task is to define cases for a case-control study that is examining the risk from exposure. If the case definition is broad, it will be easier to include potential cases, hastening the enrollment of study participants. However, variability among the cases will be greater than if the case definition were narrower. A narrow case definition can slow the identification of sufficient numbers of cases but has the potential to reduce false positives.

Just as a clinical diagnosis for an individual requires meeting specific clinical and laboratory criteria, measuring disease frequency in populations requires the prior stipulation of which clinical, laboratory, epidemiologic, or quantitative criteria indicate the presence of the disease. Case definitions can include a degree of certainty (e.g. probable or confirmed, etc.) or specify the method to be used in assessing whether or not criteria are met.

In the United States, disease surveillance is not a responsibility directly given to the federal government. This means each state in the US may establish its own requirements for reporting diseases. Can you imagine the confusion if all 50 states used their own definitions for different diseases? To assist the states, the U.S. Centers for Disease Control and Prevention (CDC) and the Council of State and Territorial Epidemiologists have published a set of uniform criteria for reporting cases, the Surveillance Case Definitions for Current and Historical Conditions.

Example of Change in Case Definitions:

CDC cuts US SARS Case Count in Half

During 2003, there was a worldwide epidemic of SARS. The news release below is titled “CDC cuts US SARS case count in half”. How was the CDC able to cut the case count for this disease in half from one reporting period to the next? Read the news release.

July 17, 2003 (CIDRAP News) – Because of a change in the case definition for SARS (severe acute respiratory syndrome), the United States has had only half as many suspected and probable cases of the illness as previously reported, federal health officials said today.
The Centers for Disease Control and Prevention (CDC) said the total case count is now 211 instead of 418, a 49.5% reduction. The official tally now is 175 suspected and 36 probable cases, down from 344 suspected and 74 probable cases.

The change is a result of excluding all cases in which convalescent blood samples—those collected more than 21 days after illness onset — tested negative for the SARS coronavirus, the CDC said in a news release. "Exclusion of these cases with negative convalescent serum provides a more accurate accounting of the epidemic in the U.S.," the agency said.

The Council of State and Territorial Epidemiologists recommended changing the SARS case definition to exclude cases with negative convalescent serum tests. The recommendation is based on evidence that 95% of SARS patients mount a detectable antibody response in the convalescent phase, the CDC said.
The revised case definition and case count are detailed in the Jul 18 issue of Morbidity and Mortality Weekly Report, published online today.

"Serologic testing results suggest that a small proportion of persons who had illness consistent with the clinical and epidemiologic criteria for a U.S. case of suspect or probable SARS actually had SARS," the article states. "The case definition captures an array of respiratory illnesses that cannot be easily distinguished from SARS until laboratory testing results for SARS and other agents are performed." The sensitive case definition allowed for rapid investigation of possible cases and public health steps to prevent spread of the disease, the report adds.

So what happened? The CDC changed the case definition once lab results were available. With the new definition, the convalescent serum (collected more than 21 days after illness onset) had to test positive for the SARS coronavirus, or the illness was no longer counted as a SARS case. The count dropped from 418 cases to 211 cases just by changing the case definition. The case definition for a disease has a substantial impact.
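The arithmetic in the news release can be checked directly. The counts below are taken from the release; the variable names are only for illustration.

```python
# Case counts reported before and after the change in case definition
before = {"suspected": 344, "probable": 74}
after = {"suspected": 175, "probable": 36}

total_before = sum(before.values())
total_after = sum(after.values())

# Proportional reduction in the total case count
reduction = (total_before - total_after) / total_before

print(total_before, total_after, round(reduction * 100, 1))
```

The totals come to 418 and 211, a reduction of about 49.5%, in agreement with the figures reported by the CDC.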

Importance of clear case definitions

As you proceed through this course, it will be important that you provide specific case definitions for the diseases for which you are conducting an epidemiologic investigation. You may use a medical reference or ICD codes that classify diseases. The ICD (International Classification of Diseases) is a classification system, maintained by the World Health Organization, that assigns standardized codes to diseases and causes of death. These codes are very specific.

Physicians or medical investigators may belong to professional organizations or societies that define cases somewhat differently than the CDC. It is important to be explicit about the case definition used in a study and about which other definitions it is comparable to. Below are further sources of case definitions that may be helpful to you in epidemiologic investigations for this course.

Levels of Certainty in Case Assignment

Cases of disease can be categorized as follows:

  1. Clinically compatible case
    A clinical syndrome is generally compatible with the disease, as described in the clinical description. A general clinical impression is that this is a case of disease.
  2. Confirmed case
    A case that is classified as confirmed for reporting purposes. The case meets established criteria.
  3. Epidemiologically linked case

    A case in which…

    1. the patient has had contact with one or more persons who either have/had the disease or have been exposed to a point source of infection (i.e., a single source of infection, such as an event leading to a foodborne disease outbreak, to which all confirmed case-patients were exposed) and
    2. transmission of the agent by the usual modes of transmission is plausible. A case may be considered epidemiologically linked to a laboratory-confirmed case if at least one case in the chain of transmission is laboratory confirmed.
  4. Laboratory-confirmed case
    A case that is confirmed by one or more of the laboratory methods listed in the case definition under laboratory criteria for diagnosis. Although other laboratory methods can be used in clinical diagnosis, only those listed are accepted as laboratory confirmation for national reporting purposes.
  5. Probable case
    A case that is classified as probable for reporting purposes. Supportive or presumptive laboratory results: specified laboratory results that are consistent with the diagnosis, yet do not meet the criteria for laboratory confirmation.
  6. Suspected case
    A case that has a lower certainty and is classified as suspected for reporting purposes.

Each type has utility in different settings. To investigate a highly infectious, transmissible, or serious and deadly disease, casting a broad net will capture all suspected and probable cases. On the other hand, if declaring an individual to be a 'case' is likely to result in imposing severe restrictions, such as closing schools or removing a product from the market, the case definition should be more stringent. For example, a series of suspected cases of disease would not be sufficient to support a product recall. The epidemiologist would prefer to have a confirmed case or a laboratory-confirmed case to justify such action.

Sources of Case Definitions

Check out these nationally recognized sources of case definitions

5.4 - Examples

Case-Control Study Example

Source: Obesity is associated with decreased risk of microscopic colitis in women



Microscopic colitis is a leading cause of diarrhea in older adults. There is limited information about risk factors. We hypothesized that obesity would be associated with microscopic colitis.


To examine the association between obesity and microscopic colitis in men and women undergoing colonoscopy.


We conducted a case-control study at the University of North Carolina Hospitals. We identified and enrolled men and women referred for elective, outpatient colonoscopy for chronic diarrhea. We excluded patients with a past diagnosis of Crohn’s disease or ulcerative colitis. A research pathologist reviewed biopsies on every patient and classified them as microscopic colitis cases or non-microscopic colitis controls. Patients provided information on body weight, height, and exposure to medications via structured interviews or Internet-based forms. The analysis included 110 patients with microscopic colitis (cases) and 252 non-microscopic colitis controls. Multivariable analyses were performed using logistic regression to estimate odds ratios and 95% confidence intervals.


Cases were older and more likely than controls to be white race. Study subjects were well educated, but cases were better educated than controls. Cases with microscopic colitis had lower body mass index than controls and reported more weight loss after the onset of diarrhea. Compared to patients who were normal or under-weight, obese (BMI > 30 kg/m2) patients were substantially less likely to have microscopic colitis after adjusting for age and education (adjusted OR (aOR) 0.35, 95% confidence interval (CI) 0.18-0.66). When stratified by sex, the association was limited to obese women, aOR 0.21, 95%CI: 0.10-0.45. Patients with microscopic colitis were more likely to report weight loss after the onset of diarrhea. After stratifying by weight loss, there remained a strong inverse association between obesity and microscopic colitis, aOR 0.33, 95%CI: 0.10-1.11 among the patients who did not lose weight. Ever use of birth control pills was associated with lower risk of microscopic colitis after adjusting for age, education, and BMI, aOR 0.38, 95%CI: 0.17-0.84.


Compared to controls also seen for diarrhea, microscopic colitis cases were less likely to be obese. Mechanisms are unknown but could involve hormonal effects of obesity or the gut microbiome.

Case-Crossover Study Example

Source: Valent F, Brusaferro S, Barbone F. A case-crossover study of sleep and childhood injury. Pediatrics 2001;107; E23. in Woodward M. Epidemiology: Study Design and Data Analysis. 2nd Ed. London: Chapman and Hall. 2005.

In this Italian case-crossover study of sleep disturbance and injury amongst children (Valent et al., 2001), each child was asked about her or his sleep in the 24 hours before the injury occurred (the case window) and in the 24 hours before that (the control window). Amongst 181 boys, 40 had less than 10 hours sleep on both the days concerned; 111 had less than 10 hours sleep on neither day; 21 had less than 10 hours sleep only on the day before the injury; and 9 had less than 10 hours sleep only on the penultimate day before the injury. The odds ratio (95% confidence interval) for injury, comparing days without and with 10 hours or more sleep, is 2.33 (95% confidence interval; 1.02, 5.79).
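The odds ratio quoted above can be reproduced from the two discordant counts alone, since the 40 and 111 concordant boys carry no information about the within-child comparison. The sketch below is only illustrative: the variable names are ours, and the Wald interval on the log scale is an assumed approximation, not necessarily the method behind the published interval.

```python
import math

# Discordant counts among the 181 boys (Valent et al., 2001, as quoted above)
short_sleep_case_window_only = 21     # < 10 h sleep only on the day before the injury
short_sleep_control_window_only = 9   # < 10 h sleep only on the penultimate day

# Matched (within-person) odds ratio: ratio of the two discordant counts
odds_ratio = short_sleep_case_window_only / short_sleep_control_window_only

# Approximate Wald 95% CI on the log scale (an assumption for illustration)
se_log_or = math.sqrt(1 / short_sleep_case_window_only
                      + 1 / short_sleep_control_window_only)
ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

print(f"OR = {odds_ratio:.2f}, approximate 95% CI: {ci_low:.2f} to {ci_high:.2f}")
```

The point estimate agrees with the reported 2.33. The approximate interval is somewhat narrower than the reported (1.02, 5.79), which is expected if the source used an exact method for so few discordant pairs.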

5.5 - Analysis


Analytic Methods for Non-Matched Case-Control Studies

With case-control studies, we essentially work down the columns of the 2 × 2 table. Cases are identified first, then controls. The investigator then determines whether cases and controls were exposed or not exposed to the risk factor.

2 × 2 Table for Non-Matched Case-Control Data:
Category | Case | Control | Total Exposure
Exposed | A | B | TotalExposed
Not Exposed | C | D | TotalNotExposed
Total | TotalCases | TotalControls | Total

We calculate the odds of exposure among cases (A/C) and the odds of exposure among controls (B/D). The odds ratio is then (A/C)/(B/D), which simplifies after cross-multiplication to (A*D)/(B*C).

For case-control studies, since the ratio of cases to controls is not necessarily representative of the ratio in the population, the odds ratio must be used as the summary measure.  The relative risk is not an accurate measure in this type of study.

Analytic methods for non-matched case-control studies include:

  • Chi-square 2 × 2 analysis;
  • Mantel-Haenszel statistic (This stratified analysis pools the stratum-specific results to adjust for a confounder; comparing the odds ratios across strata can also reveal effect modification)
  • Fisher’s Exact test (This test is used if the expected cell size is <5)
  • Unconditional logistic regression (The method is used to simultaneously adjust for multiple confounders; a multivariable analysis).


For the obesity and microscopic colitis example (Obesity is associated with decreased risk of microscopic colitis in women), the data from table 2 can be used to construct this 2x2 table for the comparison of microscopic colitis between those with low and high BMI.

Comparison of Microscopic Colitis between those with Low and High BMI
Category | Case | Control | Total Exposure
Exposed (BMI >= 30) | 22 | 105 | TotalExposed (127)
Not Exposed (BMI < 25) | 50 | 73 | TotalNotExposed (123)
Total | TotalCases (72) | TotalControls (178) | Total (250)

OR = (22*73)/(50*105) = 0.31
This matches the text of the article: "As shown in Table 2, the risk for microscopic colitis was lower for … BMI > 30 kg/m2 (OR 0.31, 95%CI: 0.17-0.55) compared to under- or healthy weight (BMI < 25 kg/m2) as the reference."
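The same calculation can be sketched in a few lines of Python. The function name is hypothetical, and the confidence interval uses the standard large-sample (Woolf) formula on the log odds ratio, which we assume here for illustration; the article does not state how its unadjusted interval was computed.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Unmatched 2 x 2 table: a, b = exposed cases, exposed controls;
    c, d = unexposed cases, unexposed controls."""
    odds_ratio = (a * d) / (b * c)
    # Large-sample standard error of log(OR) (Woolf's method)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Counts from the BMI and microscopic colitis table
or_bmi, ci_low, ci_high = odds_ratio_with_ci(a=22, b=105, c=50, d=73)
print(f"OR = {or_bmi:.2f}, 95% CI: {ci_low:.2f} to {ci_high:.2f}")
```

With these counts, the sketch gives an odds ratio of 0.31 with an interval of about 0.17 to 0.55, in line with the values quoted from the article.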

To review, for a simple non-matched case-control study, you find a case, then determine whether the person is exposed or not. Find a control; determine their exposure status.

Analytic Methods of Matched Case-Control Studies

In an analysis of a matched study design, only discordant pairs are used. A discordant pair occurs when the exposure status of the case is different from the exposure status of the control. The most commonly used analytic method for matched case-control studies is conditional logistic regression, conditioned upon the matching.

The matched case-control study links each case to a control based on the matching of one or more variables. The summary table therefore differs from that of a non-matched case-control study.

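2 × 2 Table for Matched Case-Control Data:
Controls | Cases: Exposed | Cases: Not Exposed | Total
Exposed | A (Concordant Pair) | B (Discordant Pair) | TotalExposedControls
Not Exposed | C (Discordant Pair) | D (Concordant Pair) | TotalNotExposedControls
Total | TotalExposedCases | TotalNotExposedCases | Total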


Let's look at an example. Suppose we plan to match cases to controls by gender and age (+/- 5 years). We first identify the following case:


Male, 45 years of age (Patient 1);
Exposure status: Exposed
If this was a non-matched study, the case would be counted in cell A in the non matched 2x2 table because he is exposed. However, in the age- and gender-matched case-control study we must also find a male control within five years of age. Searching in the appropriate control population, we locate the following control:


Male 48 years of age (Person 47);
Exposure status: Exposed
If Person 47 were counted in an unmatched study, he would belong in cell B of the preceding table. In a matched case-control study, however, we are interested in results for the matched pair. The data from Patient 1 and Person 47 are linked for the duration of the study. The appropriate table for the matched study is depicted below. Where do Patient 1 and Person 47 belong?

Patient 1 is a case and he is exposed, so he fits into either cell A or cell C. His control's status determines which cell is the correct placement for this pair. Patient 1's control is exposed; therefore, Patient 1 and Person 47 fit into cell A as a pair. This is a concordant pair because both are exposed. Concordance is based upon exposure status. In a matched case-control study, the cell counts represent pairs, not individuals. In the statistical analysis, only the discordant pairs are important. Cells B and C contribute to the odds ratio in a matched design. Cells A and D do not contribute to the odds ratio. Cell C counts pairs in which the case is exposed and the control is not; cell B counts pairs in which the control is exposed and the case is not. If the risk for disease is increased due to exposure, C will be greater than B. The odds ratio is then C/B.
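A minimal sketch of this bookkeeping follows. It assumes the cell labels used in this section (A: both exposed; B: only the control exposed; C: only the case exposed; D: neither exposed), so the matched odds ratio is the number of C pairs divided by the number of B pairs. The function names and every pair other than the first are hypothetical.

```python
from collections import Counter

def classify_pair(case_exposed, control_exposed):
    """Return the cell (A, B, C, or D) of the matched table for one pair."""
    if case_exposed and control_exposed:
        return "A"   # concordant: both exposed
    if control_exposed:
        return "B"   # discordant: only the control is exposed
    if case_exposed:
        return "C"   # discordant: only the case is exposed
    return "D"       # concordant: neither exposed

def matched_odds_ratio(pairs):
    """pairs: iterable of (case_exposed, control_exposed) booleans."""
    cells = Counter(classify_pair(case, control) for case, control in pairs)
    if cells["B"] == 0:
        raise ValueError("No pairs in cell B; the matched odds ratio is undefined.")
    return cells["C"] / cells["B"], cells

# First pair is Patient 1 and Person 47 (both exposed); the rest are made up.
pairs = [
    (True, True),
    (True, False), (True, False), (True, False),
    (False, True),
    (False, False),
]
matched_or, cells = matched_odds_ratio(pairs)
print(dict(cells), "matched OR =", matched_or)
```

In this toy data set, the concordant pairs in cells A and D are counted but play no role in the estimate, which rests entirely on the three C pairs and the single B pair.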

Comparing Matched and Non-Matched Case-Control Studies

  Stop and Think!

Come up with an answer to each question before reading the answer given below it.

  1. Can you think of more than one reason why a matched case-control study could take longer to complete than an unmatched study?

    First, you must identify matched controls, sometimes more than one per case. Second, since only the discordant pairs contribute to the statistical analysis, achieving a desired statistical power depends on obtaining a particular number of discordant pairs.
  2. Why bother with matching if it means a longer case-control study?

    When performing statistical analysis, the matched variables are not included in the statistical model.
    (In a cohort study, confounding is dealt with by including the terms in the model to adjust for their effects. In a matched case-control study, the adjustment for this confounding can be made through the matching.)

5.6 - Lesson 5 Summary


Lesson 5 Summary

Case-control studies are one of the two main types of observational study designs in epidemiology. This design is desirable when exposure data are difficult to obtain, the disease is rare or has a long induction period, or when the underlying population is dynamic. In a case-control study, first patients are selected based on their disease (outcome) status and classified as cases or controls. Then investigators obtain and compare the exposure histories between groups. A case-crossover study is a specific type of case-control study where patients serve as their own controls. This is appropriate for transient exposures, where a patient contributes time as both a case and a control.

Lesson 6 - Cohort Studies


Lesson 6 Objectives

Upon completion of this lesson, you should be able to:

  • Distinguish between prospective and retrospective cohort studies
  • Identify advantages and disadvantages of large continuously running prospective cohort studies
  • Calculate cumulative incidence, person-time, and incidence rates
  • Compare cohort and case-control studies
  • Describe combination studies including nested case-control and case-cohort designs


6.1 - Rationale & Design


Cohort studies are useful for estimating disease risk, incidence rates, and/or relative risks. Non-cases may be enrolled from a well-defined population, their current exposure status (at t0) determined, and their disease onset observed over time. Disease status at t1 can be compared to exposure status at t0.

There are two main types of cohort studies: prospective and retrospective.  In general, the descriptor, 'prospective' or 'retrospective', indicates when the cohort is identified relative to the initiation of the study.
Sometimes investigators enter into an ongoing prospective cohort study before the cases are determined. An investigator may pose a new question during an intermediate time period of an ongoing study. The cohort is already determined.

Exposure and case status information is available from the beginning of the ongoing study; subjects can be followed forward to collect cases. For example, suppose genotyping is performed in a cohort study. Later, an investigator may decide to use these data and subsequent case status to consider the relationship of a genetic factor to a particular disease. The investigator may wish to follow the subjects for several more years to ascertain more cases. This would be a mixture of a prospective and retrospective cohort.

Prospective Cohort Design (concurrent; longitudinal study)

In a prospective cohort study, the investigators identify the study population at the beginning of the study and accompany the subjects through time. When proposing a prospective cohort study, the investigator first identifies the characteristics of the group of people they wish to study. The investigator then determines the present case status of individuals, selecting only non-cases to follow forward in time. Exposure status is determined at the beginning of the study. A member of the cohort reaches the endpoint either by dying, becoming a case, or reaching the end of the study period. A subject can also be lost to follow-up over the course of the study.


Challenges of large, long-running prospective cohort studies include:

  • loss to follow-up;
  • differential nonresponse;
  • loss of funding support;
  • continually improving methods for detecting exposure (so that early measurements carry greater misclassification than would be expected in current practice)

An advantage of a well-run cohort study is the multiple outcomes that can be considered. A group well-characterized and followed over a long period of time provides much useful information. For example, the Framingham study has studied 3 generations and added to our understanding of the roles of obesity, HDL lipids, and hypertension in heart disease and stroke as well as contributing an algorithm for predicting CHD risk and identifying 8 genetic loci associated with hypertension. The use of sub-cohorts for specific purposes can minimize cost and the length of a study.

Retrospective Cohort Study (Historical cohort; Non-concurrent Prospective Cohort)

An investigator accesses a historical roster of all exposed and non-exposed persons and then determines their current case/non-case status. The investigator initiates the study when the disease is already established in the cohort of individuals, long after the original measurement of exposure. Doing a retrospective cohort study requires good data on exposure status for both cases and non-cases at a designated earlier time point.

  Stop and Think!

How does a retrospective cohort study differ from a case-control study?

Both types of studies identify present cases and non-cases.

The case-control study identifies the cases and then selects appropriate controls. An entire cohort is not used. If you were investigating an environmentally-related cancer among university students with a case-control study, you would identify students within certain years who met the case definition for the cancer. You would select controls among students who were not a case of cancer, but matched on characteristics such as age, gender, and graduation year, then determine their exposure status (perhaps the proximity of their campus address to the identified toxin) and compare exposures between cases and non-cases.

A retrospective cohort study uses the entire cohort; all cases and non-cases within the identified group. A retrospective cohort design might designate the cohort to be students enrolled at the university over a 5 year time span. The present case status of all these students is determined and historical data about their exposure status accessed, in order to assess the relationship between being a case of the cancer with the exposure.

Examples of Cohort Studies

Examples of large prospective cohort studies that have been ongoing for an extended period of time are provided in this section.
Research is still being conducted with these established cohorts.

The Framingham study began in 1948 by recruiting an Original Cohort of 5,209 men and women between the ages of 30 and 62 from the town of Framingham, Massachusetts, who had not yet developed overt symptoms of cardiovascular disease or suffered a heart attack or stroke. Since that time the Study has added an Offspring Cohort in 1971, the Omni Cohort in 1994, a Third Generation Cohort in 2002, a New Offspring Spouse Cohort in 2003, and a Second Generation Omni Cohort in 2003.
The Nurses' Health Study (NHS) and the Nurses' Health Study II (NHS II) are among the largest prospective investigations into the risk factors for major chronic diseases in women. The primary motivation for the study was to investigate the potential long-term consequences of oral contraceptives, which were being prescribed to hundreds of millions of women. Married registered nurses, aged 30 to 55 in 1976, who lived in the 11 most populous states, and whose nursing boards agreed to supply NHS with their members' names and addresses, were eligible to be enrolled in the cohort if they responded to the NHS baseline questionnaire. In addition to reading about the history of the studies, watch a series of short videos about the studies on the Nurses’ Health Study 3 YouTube page.
The NHANES I Epidemiologic Follow-up Study (NHEFS) is a national longitudinal study that was jointly initiated by the National Center for Health Statistics and the National Institute on Aging in collaboration with other agencies of the Public Health Service. The NHEFS was designed to investigate the relationships between clinical, nutritional, and behavioral factors assessed in the first National Health and Nutrition Examination Survey NHANES I and subsequent morbidity, mortality, and hospital utilization, as well as changes in risk factors, functional limitation, and institutionalization. The NHEFS cohort includes all persons 25-74 years of age who completed a medical examination at NHANES I in 1971-75 (n = 14,407). It consists of a series of follow-up studies, four of which have been conducted to date. The first wave of data collection was conducted for all members of the NHEFS cohort from 1982 through 1984. It included tracing the cohort; conducting personal interviews with subjects or their proxies; measuring pulse rate, weight, and blood pressure of surviving participants; collecting hospital and nursing home records of overnight stays; and collecting death certificates of decedents.

6.2 - Analysis


Cohort studies often aim to estimate disease occurrence by cumulative incidence or incidence rates.

An important component of calculating the incidence rate is the calculation of person-time. For each person in the study, the time they contribute is the time from study enrollment until becoming a case, or the time until study completion or dropping out of the study.  

The incidence rate is the number of persons who newly experience the outcome during a specified period of time divided by the sum of the time that each member of the population is at risk.

Since all people in the cohort study are non-cases at the start, if there is a specific exposure of interest, the relative risk of becoming a case can be calculated for the exposed versus non-exposed.  More details regarding analysis methods will be seen later in the course. 
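As a rough illustration of these definitions, the sketch below computes person-time, an incidence rate, and a simple cumulative incidence for a small cohort. The six follow-up records are invented for this example, and all variable names are ours.

```python
# Hypothetical follow-up records: (years of follow-up, became a case?)
# Non-cases followed for less than 5 years left the study early.
follow_up = [
    (5.0, False),
    (2.5, True),
    (4.0, False),   # dropped out after 4 years
    (1.5, True),
    (5.0, False),
    (3.0, False),   # dropped out after 3 years
]

new_cases = sum(1 for _, is_case in follow_up if is_case)
person_years = sum(years for years, _ in follow_up)

# Incidence rate: new cases divided by the total time at risk
incidence_rate = new_cases / person_years

# Cumulative incidence as a simple proportion of those enrolled.
# This ignores the dropouts, so it is only trustworthy with complete follow-up.
cumulative_incidence = new_cases / len(follow_up)

print(f"{new_cases} cases over {person_years} person-years")
print(f"Incidence rate: {incidence_rate * 1000:.1f} per 1,000 person-years")
print(f"Cumulative incidence: {cumulative_incidence:.1%}")
```

The incidence rate uses each person's actual time at risk, which is why it copes with dropouts and staggered case onset better than the simple proportion does.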

6.3 - Comparing & Combining Case-Control and Cohort Studies


Comparison of Cohort and Case-control Studies

Comparative Term | Cohort Study | Case-Control Study
Estimates | Can calculate incidence rate, risk, and relative risk | Only estimates odds ratio
Causality | Potentially greater strength for causal investigations | Potentially weaker causal investigation
Cost | Expensive | Less expensive
Time to complete | Long-term study | Short-term study
Sample size | A large sample size often required, especially for rare outcomes | Can be powered with a small sample of cases
Efficient designs | Efficient design for rare exposure & multiple outcomes | Efficient design for rare diseases & multiple exposures
Recall bias | Less potential for recall bias | More potential for recall bias
Loss to follow-up | More potential for loss to follow-up | Less potential for loss to follow-up
Shows the natural course of the disease | Yes | No

Nested Case-Control Study Design

This is a case-control study within a cohort study. At the beginning of the cohort study, (t0), members of the cohort are assessed for risk factors. Cases and controls are identified subsequently at time t1. The control group is selected from the risk set (cohort members who do not meet the case definition at t1.) Typically, the nested case-control study is less than 20% of the parent cohort.

Advantages of nested case-control

  • Efficient – not all members of the parent cohort require diagnostic testing
  • Flexible – allows testing of hypotheses not anticipated when the cohort was drawn (at t0)
  • Reduces selection bias – cases and controls sampled from the same population
  • Reduces information bias – risk factor exposure can be assessed with an investigator blind to case status


Disadvantages of nested case-control

  • Reduces power (from parent cohort) because of reduced sample size by 1/(c+1), where c = number of controls per case

Nested case-control studies can be matched, not matched, or counter-matched. Matching cases to controls according to baseline measurements of one or several confounding variables is done to control for the effect of confounding variables.

A counter-matched study, in contrast, matches cases to controls who have a different baseline risk factor exposure level. The counter-matched study design is used to specifically assess the impact of this risk factor; it is especially good for assessing the potential interaction (effect modification!) of the secondary risk factor and the primary risk factor. Counter-matched controls are randomly selected from different strata of risk factor exposure levels in order to maximize variation in risk exposures among the controls. For example, in a study of the risk for bladder cancer from alcohol consumption, you might match cases to controls who smoke different amounts to see if the effect of smoking is only evident at a minimum level of exposure.


Example of a Nested Case-Control Study: Familial, psychiatric, and socioeconomic risk factors for suicide in young people: a nested case-control study. In this study of risk factors for suicide, Agerbo et al. (2002) included 496 young people who had committed suicide during 1981-97 in Denmark, matched for sex, age, and time to 24,800 controls. Read how they matched each case to a representative random subsample of 50 people born the same year!

Case-Cohort Study Design

A case-cohort study is similar to a nested case-control study in that the cases and non-cases are within a parent cohort; cases and non-cases are identified at time t1, after baseline. In a case-cohort study, the cohort members were assessed for risk factors at any time prior to t1. Non-cases are randomly selected from the parent cohort, forming a subcohort. No matching is performed.

Advantages of Case-Cohort Study:

Similar to nested case-control study design:

  • Efficient– not all members of the parent cohort require diagnostic testing
  • Flexible– allows testing hypotheses not anticipated when the cohort was drawn (t0)
  • Reduces selection bias – cases and non-cases sampled from the same population
  • Reduced information bias – risk factor exposure can be assessed with an investigator blind to case status

Other advantages, as compared to nested case-control study design:

  • The subcohort can be used to study multiple outcomes
  • Risk can be measured at any time up to t1  (e.g. elapsed time from a variable event, such as menopause, or birth)
  • Subcohort can be used to calculate person-time risk

Disadvantages of Case-Cohort Study:

As compared to nested case-control study design: increased potential for information bias, because the subcohort may have been established after t0 and exposure information may have been collected at different times (e.g., potential for sample deterioration).


6.4 - Lesson 6 Summary


Cohort studies are the second main type of observational study design in epidemiology.  This design is desirable when there are many potential outcomes you want to investigate, when the goal is a direct measure of incidence or risk, and for rare exposures.  Cohort studies can be prospective, where the investigators assemble the cohort and then follow them to observe outcomes, or retrospective when the investigators identify the cohort based on past exposures and evaluate outcomes that have already occurred. Prospective cohort studies are less vulnerable to bias and can evaluate the temporal relationship between exposure and outcome.  Retrospective studies are good for diseases with long induction and latent periods.  Some weaknesses of cohort studies are that they are inefficient for rare outcomes, can be very expensive and time-consuming, and retrospective cohort studies can be more prone to bias. There are many examples of large and small cohort studies, some of which have been going on for decades, and there are still many questions that can be answered from these studies! 

Lesson 7 - Other Types of Study Designs: Cross-Sectional, Ecologic, Experimental


Lesson 7 Objectives

Upon completion of this lesson, you should be able to:

  • Compare advantages/ disadvantages of cross-sectional and ecological studies
  • Describe ecological fallacy
  • Describe the main difference between observational and experimental studies
  • Identify design considerations unique to intervention studies including equipoise, randomization, and masking

7.1 - Cross sectional studies


Rationale and Design

A cross-sectional study is a study with individual-level variables that measures exposure and disease at one point in time. In other words, cross-sectional studies take a snapshot of a population. These types of studies are often used for public health planning.

Advantages & Disadvantages of cross-sectional studies


Advantages of cross-sectional studies:

  • Highly generalizable when based on a sample of the general population
  • Low cost and short time period needed to conduct

Disadvantages of cross-sectional studies:

  • Cannot infer temporal sequence between exposure and disease
  • Tend to identify a high proportion of prevalent cases of long duration
  • Can suffer from the “healthy worker effect”, where those who remain employed tend to be healthier than those who leave employment

Examples in Research

One example is the National Center for Health Statistics Data Brief Number 443, August 2022 (cdc.gov).
Key finding: The percentages of both men and women who met the guidelines for both aerobic and muscle-strengthening activities decreased with age.

[Figure: graph showing the percentages of both men and women who met the guidelines for both aerobic and muscle-strengthening activities]

A second example is Talking With Children About Prognosis: The Decisions and Experiences of Mothers With Metastatic Cancer.

Key finding: Nearly 80% (n = 176) of mothers with metastatic cancer reported they had discussed their prognosis with at least one of their children; 79% identified at least one barrier to these discussions.

[Figure: pie chart showing the percentage of women with breast cancer and if they told their children about their diagnosis]

A third example is Occupational health nurses’ personal attitudes toward smoking: A cross‐sectional study.

Key finding: see Table 2 from that study.

Table 2

Relationship between the training experience in smoking cessation and the perceived harmfulness of smoking and social responsibility of healthcare professionals

Opinion | Training Experience a (n = 67): n | % | No Training Experience b (n = 41): n | % | Total (n = 108): n | % | p *
Smoking is harmful to health
Strongly Agree | 62 | 92.5% | 31 | 75.6% | 93 | 86.1% | .024
Agree | 5 | 7.5% | 9 | 22.0% | 14 | 13.0% |
Undecided | 0 | 0.0% | 1 | 2.4% | 1 | 0.9% |
Disagree & Strongly Disagree | 0 | 0.0% | 0 | 0.0% | 0 | 0.0% |
Healthcare professionals have a social responsibility to warn smokers of the harmful effects of smoking.
Strongly Agree | 42 | 62.7% | 15 | 36.6% | 57 | 52.8% | .017
Agree | 22 | 32.8% | 20 | 48.8% | 42 | 38.9% |
Undecided | 3 | 4.5% | 6 | 14.6% | 9 | 8.3% |
Disagree & Strongly Disagree | 0 | 0.0% | 0 | 0.0% | 0 | 0.0% |

a Nurses with training experience in interventions to stop smoking.

b Nurses without training experience in interventions to stop smoking.

* Fisher's exact test.


7.2 - Ecological studies


Rationale and Ecological Variables

An ecological study is an observational study in which at least one variable is measured at the group level. An ecological study is especially appropriate for the initial investigation of causal hypotheses.
So...why conduct an ecological study? Several reasons support using an ecological study design.

  • The hypothesis is relatively new
  • Adequate measurement of individual-level variables is not possible
  • Adequate design of an individual-level study is not possible (e.g., not ethical)
  • We are interested in the effect of ecological variables, for which there is no counterpart at the individual level
  • We have limited funds or limited time to do the study

Three types of ecological variables:

Aggregate Variables
A summary or composite measure derived from values collected from individual members of a population. Aggregate variables can measure exposure (e.g., mean blood pressure) or outcome (e.g., rate of disease) variables. One limitation with aggregate measures is that there is variation within the population - not all the individuals in the population have the average blood pressure.
Environmental Variables
A measure of the physical characteristics of the environment in which people reside, work, recreate or attend school. For example, we might hypothesize that rainfall is a risk for a fungal disease or that the content of minerals in drinking water is protective against a certain disease. Therefore, environmental variables would be the mean rainfall in a geographic area or the mean level of minerals in drinking water. Environmental variables measure exposure, not outcomes. One limitation of an environmental variable is that there is variation in exposure levels for individuals in the population.
Global Variables (Measure Exposure)
A measure of the attributes of groups, organizations, or places for which there is no analog at the individual level. For example, the procedures or treatments that are covered in a health insurance plan might affect the rate of disease or adverse health outcomes. Additionally, population density would be another global variable because crowding might be an important exposure. There is no individual population density! Global variables are used to measure exposures, not outcomes.

Advantages & Disadvantages of Ecological Studies


Advantages of ecological studies:

  • Can be done quickly and inexpensively because they rely on pre-existing data
  • Analysis and presentation are relatively simple
  • Can achieve a wider range of exposure levels than can be expected from an individual-level study
  • Help explain population-level associations

Disadvantages of ecological studies:

  • Ecological fallacy - the possibility of making incorrect conclusions about individual-level associations when only using group-level data
  • Lack of information on important variables

Analysis of Ecologic Studies

Analytic models in ecologic studies are of different forms:

Completely Ecologic
All variables (outcome, exposure, and covariates) are ecological.
Partially Ecologic
Some, but not all, variables are ecological.
Analyses may simultaneously include individual and ecological variables on the same construct (e.g., income). This could be called multilevel modeling, hierarchical regression, or mixed-effects modeling.

7.2.1 - Sample Ecological Data and Analysis


The following data illustrate a problem with the interpretation of ecological studies. The data include the numbers in an exposed and non-exposed group and the disease rate per 100,000 person-years within each of the three different groups.
With the data given, we can calculate the exposure rates per group as:

\(\text{Exposure rate} = \dfrac{PY_{\text{exposed}}}{PY_{\text{total}}}\)


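Rate = cases per 100,000 PY.

Exposure | Group 1 Cases | Group 1 PY | Group 1 Rate | Group 2 Cases | Group 2 PY | Group 2 Rate | Group 3 Cases | Group 3 PY | Group 3 Rate
Exposed (x=1) | 20 | 7000 | | 20 | 10000 | | 20 | 13000 |
Unexposed (x=0) | 13 | 13000 | | 10 | 10000 | | 7 | 7000 |
Total | 33 | 20000 | 165 | 30 | 20000 | 150 | 27 | 20000 | 135

Exposure Rate: Group 1 = 35%; Group 2 = 50%; Group 3 = 65%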

What is the relationship between exposure level and disease rate per 100,000 person-years?
Once we can calculate the exposure rate in each group, we see that as exposure rates increase, disease rates decrease.
The natural conclusion would seem to be that exposure protects individuals from the disease by decreasing the rate of disease.
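The group-level figures behind this apparent trend can be recomputed with a short sketch. The dictionary layout and variable names are ours; the numbers are the exposed person-years, total person-years, and total cases from the table.

```python
# Group-level (ecological) data from the table
groups = {
    "Group 1": {"exposed_py": 7000, "total_py": 20000, "total_cases": 33},
    "Group 2": {"exposed_py": 10000, "total_py": 20000, "total_cases": 30},
    "Group 3": {"exposed_py": 13000, "total_py": 20000, "total_cases": 27},
}

summary = {}
for name, g in groups.items():
    # Share of the group's person-time that is exposed
    exposure_rate = g["exposed_py"] / g["total_py"]
    # Overall disease rate per 100,000 person-years
    disease_rate = g["total_cases"] / g["total_py"] * 100_000
    summary[name] = (exposure_rate, disease_rate)
    print(f"{name}: exposure rate {exposure_rate:.0%}, "
          f"{disease_rate:.0f} cases per 100,000 PY")
```

Seen only at this level, the disease rate falls from 165 to 135 per 100,000 person-years as the exposure rate rises from 35% to 65%.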

So...would you want to be exposed to this factor in order to cut your disease risk? Or would you like to ask further questions?

What about the fact that we have no data measured at the individual level? For example, do we know the exposure level and the disease outcome for each person in the study? NO! In fact, all the cases could have actually occurred among the exposed individuals. This would be a problem if our hypothesis was that a biological process was responsible for the increased risk.
Consider these tables:


[Figure: 2 × 2 tables of exposure (Exposed, Not Exposed) by outcome (Case, Non-case) for the total sample, with cells A, B, C, D, and for Stratum 1 and Stratum 2, with cells A1, B1, C1, D1 and A2, B2, C2, D2 shown as "??". Each table shows only its marginal totals: T Exposed, T Non-exposed, T Cases, T Non-Cases.]


Stratum 1 and Stratum 2 are similar to the groups, of which there were 3, in the previous example. We don't know the numbers for each cell within any stratum, nor do we know A, B, C, or D for the combined data. Only the marginal counts are known - the number exposed and unexposed, and the numbers of cases and non-cases within each stratum. So, if our hypothesis for the risk pathway is biological, then we run the risk of an ecological fallacy. An ecological fallacy is possible when we use group-level data as evidence for risk pathways that operate at the individual level because we are ascribing group observations to the individual! (Note: Group-level data are appropriate if our hypothesis is that the disease pathway is from a group-level exposure. Group-level exposures are recognized as important in disease causation models with both individual and group processes).

Individual-level Data and Analysis

To demonstrate the ecological fallacy, let's look at the individual-level data from the same example. We will fill in the number of cases within each cell for each group. For instance, among the exposed in group 1, there were 20 cases in 7,000 person-years at risk.
Then we can calculate the rates per 100,000PY for each exposure level in each group as:

\(\text{Rate per } 100{,}000 \text{ PY} = \left(\dfrac{\text{number of cases}}{\text{total person-years}}\right) \times 100{,}000\)

                     Group 1                    Group 2                    Group 3
Exposure             Cases   PY      Rate       Cases   PY      Rate       Cases   PY      Rate
Exposed (x=1)        20      7000    286        20      10000   200        20      13000   154
Unexposed (x=0)      13      13000   100        10      10000   100        7       7000    100
Total                33      20000   165        30      20000   150        27      20000   135
Exposure Rate                35%                        50%                        65%
(Rate = cases per 100,000 PY)

Next, we can calculate the Rate Difference and Rate Ratio within each group as

\(\text { Rate Difference }=\text { Rate }_{\text {Exposed }}-\text { Rate }_{\text {Unexposed }}\)

\(\text { Rate Ratio }=\dfrac{\text { Rate }_{\text {Exposed }}}{\text { Rate }_{\text {Unexposed }}}\)

                     Group 1                    Group 2                    Group 3
Exposure             Cases   PY      Rate       Cases   PY      Rate       Cases   PY      Rate
Exposed (x=1)        20      7000    286        20      10000   200        20      13000   154
Unexposed (x=0)      13      13000   100        10      10000   100        7       7000    100
Total                33      20000   165        30      20000   150        27      20000   135
Exposure Rate                35%                        50%                        65%
Rate Difference      186                        100                        54
Rate Ratio           2.86                       2.00                       1.54
(Rate = cases per 100,000 PY)

When we look at each group separately, we see that exposure is related to a higher rate of disease!
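
The within-group comparison can be checked with a short Python sketch. The counts are those in the table; the names `strata`, `rate_per_100k`, and `compare` are illustrative assumptions rather than anything defined in the lesson.

```python
# (cases, person-years) by exposure status within each group, from the table.
# Names and layout are illustrative assumptions.
strata = {
    "Group 1": {"exposed": (20, 7000), "unexposed": (13, 13000)},
    "Group 2": {"exposed": (20, 10000), "unexposed": (10, 10000)},
    "Group 3": {"exposed": (20, 13000), "unexposed": (7, 7000)},
}

def rate_per_100k(cases, person_years):
    """Incidence rate expressed per 100,000 person-years."""
    return cases / person_years * 100_000

def compare(stratum):
    """Rate difference and rate ratio, exposed versus unexposed."""
    rate_exposed = rate_per_100k(*stratum["exposed"])
    rate_unexposed = rate_per_100k(*stratum["unexposed"])
    return {
        "rate_difference": rate_exposed - rate_unexposed,
        "rate_ratio": rate_exposed / rate_unexposed,
    }

within_group = {name: compare(s) for name, s in strata.items()}

for name, result in within_group.items():
    print(f"{name}: RD = {result['rate_difference']:.0f} per 100,000 PY, "
          f"RR = {result['rate_ratio']:.2f}")
```

Every rate ratio is above 1, so within each group the exposed have the higher rate, even though the group-level trend points the other way.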

So, we would conclude that exposure increases the risk of this outcome, which is the opposite of what we concluded previously! We also observe that the rate of disease among the non-exposed was the same for all groups. Across groups, the rate of disease among the exposed was higher than the unexposed, but the rate seems to vary among the exposed groups.

Recall that when we used the group-level (ecological) data, we saw that this exposure appeared to be protective. HOWEVER, given the individual-level data, exposure appears to increase the risk of disease! This is an example of an ecological fallacy (or ecological bias)... using group-level data to support an individual pathway.

Can an ecological study produce results without ecological bias? Yes, under certain conditions...

If the rate difference is the same across the groups, there will be no ecological fallacy.
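
One way to see this condition is to regress the group-level rate on the share of person-years exposed. The sketch below is an illustration under stated assumptions, not the lesson's own method: `least_squares_slope` is a hypothetical helper, and the second data set is invented so that the unexposed rate (100) and the rate difference (100) are the same in every group.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

exposed_share = [0.35, 0.50, 0.65]  # exposure rate in Groups 1-3

# Lesson data: group-level rates per 100,000 PY from the example.
lesson_rates = [165, 150, 135]
lesson_slope = least_squares_slope(exposed_share, lesson_rates)

# Hypothetical data: unexposed rate 100 and rate difference 100 in every
# group, so each group's rate is 100 + 100 * (share exposed).
constant_rd_rates = [100 + 100 * p for p in exposed_share]
constant_rd_slope = least_squares_slope(exposed_share, constant_rd_rates)

print(f"Lesson data slope: {lesson_slope:.0f}")
print(f"Constant rate-difference slope: {constant_rd_slope:.0f}")
```

In the hypothetical data the ecological slope recovers the common rate difference. In the lesson's data the rate difference varies across groups (186, 100, 54), and the slope is negative even though every within-group difference is positive.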

7.3 - Experimental Studies


Rationale and Design

Experimental studies are used to investigate the role of some agent or intervention in the causation, prevention, or treatment of a disease or outcome. The investigators assign participants to receive (or not receive) the intervention, which is the key distinction between an experimental study and an observational study.

More can also be learned about experimental studies in both STAT 503 and STAT 509.

Design considerations


In order to conduct an experimental study, equipoise must exist.  Equipoise exists when there is legitimate uncertainty about the benefit of the intervention.  

Type of intervention (note: this is not an exhaustive list!)

  • Drug
  • Educational
  • Behavioral
  • Environmental
  • Organizational (clinic level, hospital level)

Treatment assignment and randomization

Random assignment is the ideal way to assign treatment conditions because it is less prone to bias. With a large enough sample size, researchers can feel confident that randomization controls for both known and unknown confounders.
  • Methods
    • Simple randomization - if the planned sample size for a study is n=100, a randomization scheme for assignment can be created. But it is possible, by chance, for the computer-generated scheme to assign the first 50 patients to control and the last 50 to the treatment arm. This is not an ideal randomization scheme. To combat this, one can use blocking and/or stratification.
    • Blocking - for the planned sample size of 100, one might choose to use block randomization with blocks of size 20.  So, for patients 1-20, we’d know that there would be 10 patients assigned to the control arm and 10 to the treatment arm.  This would be repeated for the next 4 sets of 20 enrolled patients.  This method keeps randomization balanced over the course of the study enrollment.  
    • Stratification - There may be one variable that we believe is likely to be a strong confounder and we want to make sure it is balanced between arms.  Thus, we perform randomization separately within each stratum.
  • Assignment - the intervention can be assigned at either the individual level or at a cluster level
    • Individual level - Each participant in the study is assigned to their experimental condition separately.   When the treatment is a specific medication or educational/behavioral material given to the participant, this usually works fine.  If, however, the study is being conducted in a medical center, and clinicians see patients in both the treatment and control arms of a study, there might be a spillover of the intervention into all patients.
    • Cluster level - experimental conditions are assigned at a cluster/group level.  In a hospital, this may be when clinicians are assigned to a certain condition, and thus all patients they see receive the same intervention. Or when many hospitals are selected for a trial, and the entire hospital does or does not receive the intervention.  In a school, this could be at the classroom, grade, or even school level.   
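
The simple and block schemes described in the list above can be sketched as follows. This is a minimal illustration using Python's standard `random` module; the function names, the arm labels, and the fixed seed are assumptions made for the example, not part of any trial software.

```python
import random

def simple_randomization(n, rng):
    """Assign each participant independently; arm sizes can drift apart."""
    return [rng.choice(["control", "treatment"]) for _ in range(n)]

def block_randomization(n, block_size, rng):
    """Shuffle one balanced block at a time so arms stay even over enrollment."""
    if block_size % 2 != 0 or n % block_size != 0:
        raise ValueError("this sketch needs an even block size that divides n")
    assignments = []
    for _ in range(n // block_size):
        block = ["control", "treatment"] * (block_size // 2)
        rng.shuffle(block)  # random order within the block
        assignments.extend(block)
    return assignments

rng = random.Random(2024)  # fixed seed so the sketch is reproducible
blocked = block_randomization(100, 20, rng)
simple = simple_randomization(100, rng)

# After the first 20 enrollees, blocking guarantees a 10/10 split.
print("Blocked, first 20:", blocked[:20].count("control"), "control")
print("Simple, first 20:", simple[:20].count("control"), "control")
```

Stratified randomization could reuse the same function by calling `block_randomization` once per stratum, so that each stratum gets its own balanced sequence.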


Masking can be done in various forms to help ensure that experimental conditions are as identical as possible between study arms.  If participants and/or investigators know who is receiving which interventions they can be biased in their assessment of its effect.  
  • Methods
    • Drug trials often use placebo pills which look exactly the same, but do not contain the active ingredient of interest
    • Sham procedures can be used that resemble the treatment, but this can be hard based on the nature of the procedure (e.g., surgery)
    • Trials that are evaluating educational interventions can provide the treatment group with all the official study material, but also provide the control group a basic pamphlet that may already be around the clinic so that both groups receive some reading material.  
  • Outcome Assessment - when an assessment of outcomes is subjective, it is very important to employ masking, and this can be done at different levels
    • Single-masked: participants but not investigators are masked.  If patients are the ones providing the assessment (via self-reports), this is usually sufficient.
    • Double-masked: both participants and investigators are masked.  If investigators are also tasked with providing some subjective report of efficacy it is best to keep investigators masked as well
    • Triple-masked: participants, investigators, and monitoring committee members are all masked.  This situation arises for larger trials when it is important to monitor serious adverse or beneficial events and possibly recommend an early ending for the study. 

Advantages & Disadvantages


Advantages

  • Can control study conditions to isolate effect of interest
  • Random assignment to study conditions can eliminate baseline differences between groups
  • Use of placebo controls can allow masking which prevents biased ascertainment of outcome measures

Disadvantages

  • Can be very expensive
  • Patients may be unwilling to be “guinea pigs” for science
  • Ethical issues - only permissible when there is a state of equipoise in the medical community
  • Must choose between a placebo control and another effective treatment as the control
  • Patients may be non-compliant with treatment regimen
  • Eligibility can be strict and not allow generalization back to all real-world scenarios

Examples in Research

Effect of Clinical Decision Support With Audit and Feedback on Prevention of Acute Kidney Injury in Patients Undergoing Coronary Angiography: A Randomized Clinical Trial (JAMA)

Design elements include:

  • Educational and organizational intervention
  • Cluster randomized
  • No masking

Five-Year Outcomes in Patients With Fully Magnetically Levitated vs Axial-Flow Left Ventricular Assist Devices in the MOMENTUM 3 Randomized Trial (JAMA)

Protocol for this study

Design elements include:

  • Medical device intervention (intervention arm = fully magnetically levitated centrifugal-flow LVAD, control arm = axial-flow LVAD)
  • Stratified randomization within each center, 1:1 randomization
  • No masking

Evidence - PROSPER in PA (psu.edu)

7.4 - Lesson 7 Summary


Additional study designs included in this lesson were cross-sectional, ecological, and experimental. Along with case-control and cohort study designs, these give a good overview of typical study designs used for epidemiologic research. There are also other study designs that are not covered in this class, including specific types of experimental study designs. The most important thing to keep in mind is the research question. Only once the investigators are clear about the research question can the team decide the best study design to use. The study design must fit and be able to answer the research question and not the other way around.

Cross-sectional studies take a snapshot of a population and are often used for public health planning. The National Health Interview Survey is an example of a cross-sectional survey that has been done over the years to monitor the health of the US population. Although it is done yearly, the same individuals are not included, so it is different from a cohort study. Ecological studies include at least one study variable measured at the group level. These can be very useful for the initial investigation of causal hypotheses, but it is important to be aware and cautious of the possibility of an ecological fallacy. Finally, experimental studies are in contrast to observational studies in that the investigators assign participants to certain study conditions. Both observational and experimental studies are important for the advancement of public health, and it is important to consider the pros and cons of each when deciding how best to study the research/policy/health question of interest. 

Lesson 8 - Bias, Confounding, Random Error, & Effect modification


Lesson 8 Objectives:

Upon completion of this lesson, you should be able to:

  • Identify different types of bias
  • Describe 2 main sources of random error
  • List the pros and cons of using p-values and confidence intervals for hypothesis testing
  • Describe methods for assessing the presence of confounding and effect modification
  • Distinguish between confounding and effect modification
  • Describe methods for controlling for confounding
  • Describe methods for presenting results when effect modification is present

8.1 - Bias


Bias is a systematic error in the design, recruitment, data collection, or analysis that results in a mistaken estimation of the true effect of the exposure on the outcome.

Bias occurs when the method used to select subjects or collect data results in an incorrect association.  Bias is something to consider while designing the study - it usually cannot simply be “corrected” in the analysis stage of the study.  

Two main types of bias are selection and information bias.

Selection Bias
Selection bias is the systematic error in the selection or retention of participants.

Examples of selection bias:

  • Suppose you are selecting cases of rotator cuff tears (a shoulder injury). Many older people have experienced this injury to some degree, but have never been treated for it. Persons who are treated by a physician are far more likely to be diagnosed (and identified as cases) than persons who are not treated by a physician. If a study only recruits cases among patients receiving medical care, there will be selection bias.
  • Some investigators may identify cases predicated upon previous exposure. Suppose a new outbreak is related to a particular exposure, for example, a particular pain reliever. If a press release encourages people taking this pain reliever to report to a clinic to be checked to determine if they are a case and these people then become the cases for the study, a bias has been created in sample selection. Only those taking the medication were assessed for the problem. Ascertaining a case based upon previous exposure creates a bias that cannot be removed once the sample is selected.
  • Exposure may affect the selection of controls – e.g., hospitalized patients are more likely to have been smokers than the general population. If controls are selected among hospitalized patients, the relationship between an outcome and smoking may be underestimated because of the increased prevalence of smoking in the control population.
  • In a cohort study, people who share similar characteristics may be lost to follow-up. For example, people who are mobile are more likely to change their residence and be lost to follow-up. If the length of residence is related to the exposure then our sample is biased toward subjects with less exposure.
  • In a cross-sectional study, the sample may not be representative of the general population. This leads to bias. For example, suppose the study population includes multiple racial groups but members of one race participate less frequently in this type of study.
Information Bias
Information bias (also known as misclassification bias) is the systematic error due to inaccurate measurement or classification of disease, exposure, or other variables.

Examples of information bias

  • Instrumentation - an inaccurately calibrated instrument creating a systematic error
  • Misdiagnosis - if a diagnostic test is consistently inaccurate, then information bias would occur
  • Recall bias - if individuals can't remember exposures accurately, then information bias would occur
  • Missing data - if certain individuals consistently have missing data, then information bias would occur
  • Socially desirable response - if study participants consistently give the answer that the investigator wants to hear, then information bias would occur

Misclassification can be differential or non-differential.

Differential misclassification

The probability of misclassification varies for the different study groups, i.e., misclassification is conditional upon exposure or disease status.
Are we more likely to misclassify cases than controls? For example, if you interview cases in person for a long period of time, extracting exact information while the controls are interviewed over the phone for a shorter period of time using standard questions, this can lead to differential misclassification of exposure status between controls and cases.

Nondifferential misclassification

The probability of misclassification does not vary for the different study groups; is not conditional upon exposure or disease status, but appears random. Using the previous example, if half the subjects (cases and controls) were randomly selected to be interviewed by phone and the other half were interviewed in person, the misclassification would be nondifferential.

Either type of misclassification can produce misleading results.
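
To make the distinction concrete, the sketch below applies imperfect exposure classification to an invented case-control table and compares the resulting odds ratios. All counts, sensitivities, and specificities are hypothetical, chosen only for illustration; `misclassify` returns expected counts rather than simulating individual errors.

```python
def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Exposure odds ratio from a case-control 2 x 2 table."""
    return (exp_cases * unexp_controls) / (unexp_cases * exp_controls)

def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed counts after imperfect exposure classification."""
    observed_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    observed_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return observed_exposed, observed_unexposed

# Hypothetical true exposure counts (exposed, unexposed).
true_cases = (60, 40)
true_controls = (30, 70)
true_or = odds_ratio(*true_cases, *true_controls)

# Nondifferential: same error rates for cases and controls.
nondiff_cases = misclassify(*true_cases, sensitivity=0.8, specificity=0.9)
nondiff_controls = misclassify(*true_controls, sensitivity=0.8, specificity=0.9)
nondifferential_or = odds_ratio(*nondiff_cases, *nondiff_controls)

# Differential: cases are interviewed more thoroughly, so their exposure
# is detected more often than that of controls.
diff_cases = misclassify(*true_cases, sensitivity=0.95, specificity=0.9)
diff_controls = misclassify(*true_controls, sensitivity=0.7, specificity=0.9)
differential_or = odds_ratio(*diff_cases, *diff_controls)

print(f"True OR: {true_or:.2f}")
print(f"Nondifferential OR: {nondifferential_or:.2f}")
print(f"Differential OR: {differential_or:.2f}")
```

With these particular numbers the nondifferential error pulls the odds ratio toward 1, while the differential error pushes it away from 1. Other choices of error rates can move a differential estimate in either direction.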

8.2 - Random Error

Random Error
Random error is a false association between exposure and disease that arises from chance. It has two sources: measurement error and sampling variability.

Measurement error

Measurement error occurs when there is a mistake in assessing the exposure or the outcome.  
Consider the figure below. If the true value is the center of the target, the measured responses in the first instance are the goal. There is negligible random error (high precision), and the measurements are accurate (valid). The second target has negligible random error, but the measurements are not accurate. The third has random error but is still accurate. The fourth has random error and is inaccurate.

[Figure: four targets labeled "Accuracy and Precision", "Accuracy and Imprecision", "Inaccuracy and Precision", and "Inaccuracy and Imprecision"]

Methods to increase precision and reduce random error

  1. Increase the sample size of the study
  2. Repeat a measurement within a study

Sampling variability

Sampling variability refers to the fact that there are a huge number of possible samples that can be drawn from any single population, and there will be variation among the different possible samples that could be selected. We take a sample because it is not feasible to measure the entire population and we hope that the sample we select is representative of the population. It is best to choose a random sample rather than a non-random one, but a random sample can still be unrepresentative of the population simply by chance. Selecting a large enough sample size can help minimize the chance of selecting an unrepresentative sample.  

P-values and confidence intervals

Epidemiologists use hypothesis testing to assess the role of random error and to make statistical inferences. P-values can be useful in understanding relationships, but they cannot be the only tools used to make inferences.  It is very important to provide good estimates along with confidence intervals in order to make good scientific conclusions.

P-value:  the probability of obtaining the test statistic you got, or one more extreme, assuming that the null hypothesis is true.
  • Probability between 0-1
  • p<0.05 is a typical cut-off for significance
  • Small – evidence to suggest a difference in groups
  • Large – little or no evidence to suggest a difference in groups

Issues with p-values

  • Dependent on both the magnitude of the association and the sample size
  • Can be viewed as providing black/white conclusions.  If p<0.05 claim significance, but p>0.05 is non-significant.  How different really is a p-value of 0.04 versus 0.06?
  • Statistical significance does not imply clinical significance
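
The first issue, dependence on sample size, can be illustrated with a small sketch. The counts below are hypothetical: both tables compare outcome proportions of 12% and 8%, and only the group size changes. The helper `chi_square_2x2` is an assumption for this example; it computes the Pearson chi-square statistic without continuity correction and uses the identity that the upper tail of a chi-square distribution with 1 degree of freedom equals `erfc(sqrt(x / 2))`.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and p-value, 1 df.

    a, b = outcome yes/no in group 1; c, d = outcome yes/no in group 2.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail, chi-square 1 df
    return chi2, p_value

# Hypothetical data: 12% versus 8% with 100 people per group...
chi2_small, p_small = chi_square_2x2(12, 88, 8, 92)
# ...and the same proportions with 1,000 people per group.
chi2_large, p_large = chi_square_2x2(120, 880, 80, 920)

print(f"n = 100 per group:  chi2 = {chi2_small:.2f}, p = {p_small:.3f}")
print(f"n = 1000 per group: chi2 = {chi2_large:.2f}, p = {p_large:.4f}")
```

The size of the difference is identical in both tables, yet only the larger study crosses the conventional 0.05 threshold.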

Read The ASA Statement on p-Values: Context, Process, and Purpose (tandfonline.com).  The ASA (American Statistical Association) concludes that:

Good statistical practice, as an essential component of good scientific practice, emphasizes principles of good study design and conduct, a variety of numerical and graphical summaries of data, understanding of the phenomenon under study, interpretation of results in context, complete reporting and proper logical and quantitative understanding of what data summaries mean. No single index should substitute for scientific reasoning.

Confidence Intervals

Confidence Intervals provide a way to quantify the amount of random error in an estimate. Once the estimate of interest is calculated (e.g., cumulative incidence, incidence rate, etc.), there are many formulas (depending on the measure) that can be used to calculate the confidence interval.

General Formula for a 95% CI
The general formula for a 95% CI is \(\text{estimate} \pm 1.96 \times \dfrac{s}{\sqrt{n}}\), where s is the standard deviation and n is the sample size. The 1.96 comes from the normal distribution and gives a 95% CI.

You can see that the width of the CI will decrease as the sample size increases, and as the standard deviation decreases. 
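
As a minimal sketch of this calculation for a sample mean, assuming the normal approximation is reasonable: the blood pressure readings below are invented for illustration, and `mean_ci_95` is a hypothetical helper built on Python's standard `statistics` module.

```python
import math
import statistics

def mean_ci_95(values):
    """95% CI for a mean: estimate ± 1.96 * s / sqrt(n)."""
    n = len(values)
    estimate = statistics.mean(values)
    s = statistics.stdev(values)           # sample standard deviation
    half_width = 1.96 * s / math.sqrt(n)   # 1.96 times the standard error
    return estimate - half_width, estimate + half_width

# Hypothetical systolic blood pressure readings (mmHg).
readings = [118, 122, 130, 125, 135, 128, 121, 127]

ci_small = mean_ci_95(readings)       # n = 8
ci_large = mean_ci_95(readings * 4)   # same values repeated, n = 32

print(f"n = 8:  {ci_small[0]:.1f} to {ci_small[1]:.1f}")
print(f"n = 32: {ci_large[0]:.1f} to {ci_large[1]:.1f}")
```

Repeating the same values four times keeps the spread roughly the same while quadrupling n, so the interval narrows to about half its original width.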

The true parameter from the population is unknown (because we can’t measure the entire population), so we calculate our estimate from the sample we selected. Once we put a CI around our estimate, it either does or does not contain the true parameter - we don’t know which. The idea of the confidence interval is that if we repeated the exercise (select a sample, calculate the estimate, calculate the CI) many times, 95% of the CIs we constructed would contain the true parameter. It does NOT mean that there is a 95% probability that this particular CI contains the true mean. As stated above, it either does or does not; we can’t know which.
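
The repeated-sampling interpretation can be checked with a small simulation. This is a sketch under stated assumptions: the population is taken to be normal with a hypothetical mean of 120 and standard deviation of 15, and the function name `coverage` and the seed are choices made for the example.

```python
import math
import random
import statistics

def coverage(true_mean, true_sd, n, repetitions, rng):
    """Share of simulated 95% CIs that contain the true mean."""
    hits = 0
    for _ in range(repetitions):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        estimate = statistics.mean(sample)
        half_width = 1.96 * statistics.stdev(sample) / math.sqrt(n)
        if estimate - half_width <= true_mean <= estimate + half_width:
            hits += 1
    return hits / repetitions

rng = random.Random(1)  # fixed seed so the sketch is reproducible
share = coverage(true_mean=120, true_sd=15, n=100, repetitions=2000, rng=rng)
print(f"Share of intervals containing the true mean: {share:.3f}")
```

The share should land close to 0.95. Each individual interval either contains 120 or it does not; the 95% describes the procedure over many repetitions.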

8.3 - Confounding


Confounding is a situation in which the effect or association between an exposure and outcome is distorted by the presence of another variable. Positive confounding (when the observed association is biased away from the null) and negative confounding (when the observed association is biased toward the null) both occur.

If an observed association is not correct because a different (lurking) variable is associated with both the potential risk factor and the outcome, but it is not a causal factor itself, confounding has occurred. This variable is referred to as a confounder. A confounder is an extraneous variable that wholly or partially accounts for the observed effect of a risk factor on disease status. The presence of a confounder can lead to inaccurate conclusions.


A confounder meets each of the following three criteria:

[Figure: diagram linking the putative risk factor, the disease, and a possible confounder, with two links marked "?"]
  1. It is a risk factor for the disease, independent of the putative risk factor.
  2. It is associated with the putative risk factor.
  3. It is not in the causal pathway between exposure and disease.

The first two of these conditions can be tested with data. The third is more biological and conceptual.

Confounding masks the true effect of a risk factor on a disease or outcome due to the presence of another variable. We identify potential confounders from our:

  1. Knowledge
  2. Prior experience with data
  3. Three criteria for confounders

We will talk more about this later, but briefly here are some methods to control for a confounding variable (if the confounder is suspected a priori):

  • randomize individuals into different groups (use an experimental approach)
  • restrict/filter for certain groups
  • match in case-control studies
  • analysis (stratify, adjust)

Controlling potential confounding starts with a good study design including anticipating potential confounders.

Example: Coronary Heart Disease and Diabetes

Suppose as part of a cross-sectional study we survey patients to find out whether they have coronary heart disease (CHD) and if they are diabetic. We generate a 2 × 2 table (below):

Category CHD No CHD Total
Diabetes 26 (12%) 190 216
No Diabetes 90 (3.9%) 2241 2331
Total 116 2431 2547

The prevalence of coronary heart disease among people without diabetes is 90 divided by 2331, or 3.9%; that is, 3.9% of all people without diabetes have coronary heart disease. By a similar calculation, the prevalence among those with diabetes is 12%. A chi-squared test shows that the p-value for this table is p<0.001. The large sample size results in a significant p-value, and the magnitude of the difference is fairly large (12% vs. 3.9%).

Prevalence Ratio (PR):
The prevalence ratio, considering whether diabetes is a risk factor for coronary heart disease, is 12 / 3.9 = 3.1. Thus, people with diabetes are 3.1 times as likely to have CHD as those without diabetes.
Odds Ratio (OR):
The odds ratio, considering whether the odds of having CHD is higher for those with versus without diabetes is ( 2241 × 26) / ( 90 × 190) = 3.41. The odds of having CHD among those with diabetes is 3.41 times as high as the odds of having CHD among those who do not have diabetes.
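
Both measures can be computed directly from the four cell counts. The sketch below uses the counts in the table above; the function names and the (a, b, c, d) cell ordering are conventions assumed for this example.

```python
def prevalence_ratio(a, b, c, d):
    """a, b = outcome yes/no among exposed; c, d = yes/no among unexposed."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Cross-product ratio for the same 2 x 2 layout."""
    return (a * d) / (b * c)

# Diabetes (exposure) by CHD (outcome), from the table above.
crude_table = (26, 190, 90, 2241)

crude_pr = prevalence_ratio(*crude_table)
crude_or = odds_ratio(*crude_table)

print(f"Prevalence ratio: {crude_pr:.1f}")
print(f"Odds ratio: {crude_or:.2f}")
```

The odds ratio is somewhat further from 1 than the prevalence ratio, which is the usual pattern when the outcome is not rare in one of the groups.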

Which of these do you use? They come up with slightly different estimates.
It depends upon your primary purpose. Is your purpose to compare prevalences? Or, do you wish to address the odds of CHD as related to diabetes?

Now, let's add hypertension as a potential confounder. There are 3 criteria to evaluate to assess if hypertension is a confounder.

  1. "Is hypertension (confounder) associated with CHD (outcome)?" also could be thought of as “Is hypertension a risk factor for CHD, independent of diabetes?”

    First of all, prior knowledge tells us that hypertension is related to many heart related diseases. Prior knowledge is an important first step but let's test this with data.  We look at this relationship just among the non-diabetics, so as to not complicate the relationship between the confounder and the outcome.

    Consider the 2 × 2 table below:

    Category CHD No CHD Total
    Hypertension 39 (5.5%) 669 708
    No Hypertension 51 (3.1%) 1572 1623
    Total 90 2241 2331

    PR = 1.75
    OR = 1.80

    The prevalence of coronary heart disease among people without hypertension is 51 divided by 1623, or 3.1%; that is, 3.1% of all people without hypertension have coronary heart disease. By a similar calculation, the prevalence among those with hypertension is 5.5%. A chi-squared test shows that the p-value for this table is p=0.006. The large sample size results in a significant p-value, even if the magnitude of the difference is not large. But yes, we see that hypertension is associated with a higher prevalence of CHD.

  2. This leads us to our next question, "Is hypertension (confounder) associated with diabetes (exposure)?"
    Category Diabetes No Diabetes Total
    Hypertension 133(15.8%) 708 841
    No Hypertension 83(4.9%) 1623 1706
    Total 216 2331 2547

    PR = 3.25
    OR = 3.67

    The prevalence of diabetes among people without hypertension is 83 divided by 1706, or 4.9%; that is, 4.9% of all people without hypertension have diabetes. By a similar calculation, the prevalence among those with hypertension is 15.8%. A chi-squared test shows that the p-value for this table is p<0.001. The large sample size results in a significant p-value, and the magnitude of the difference is fairly large.

  3. A final question, "Is hypertension an intermediate pathway between diabetes (exposure) and development of CHD?"

    – that is, does diabetes cause hypertension, which then causes coronary heart disease? Based on biology, that is not the case. Diabetes in and of itself can cause coronary heart disease. Using the data and our prior knowledge, we conclude that hypertension is a major confounder in the diabetes-CHD relationship.

    What do we do now that we know that hypertension is a confounder?

    Stratify....let's consider some stratified assessments...
    Among hypertensives:
    Category CHD No CHD Total
    Diabetes 20 (15%) 113 133
    No Diabetes 39 (5.5%) 669 708
    Total 59 782 841

    PR = 2.73
    OR = 3.04

    Among non-hypertensives:
    Category CHD No CHD Total
    Diabetes 6 (7%) 77 83
    No Diabetes 51 (3.1%) 1572 1623
    Total 57 1649 1706

    PR = 2.30
    OR = 2.40

    Both estimates of the odds ratio (hypertensives OR=3.04, non-hypertensives OR= 2.40) are lower than the odds ratio based on the entire sample (OR=3.41). If you stratify a sample, without losing any data, wouldn't you expect to find the crude odds ratio to be a weighted average of the stratified odds ratios?  A similar phenomenon occurs with the prevalence ratios: (hypertensives PR=2.73, non-hypertensives PR= 2.30) when the PR for the entire sample was 3.1.

This is an example of confounding - the stratified results are both on the same side of the crude odds ratio. This is positive confounding because the unstratified estimate is biased away from the null hypothesis. The null is 1.0. The adjusted odds ratio, accounting for the effect of hypertension, is 2.84 from the Mantel-Haenszel method. The crude odds ratio of 3.41 was biased away from the null of 1.0. (In some studies you are looking for a positive association; in others, a negative association, a protective effect; either way, differing from the null of 1.0). The adjusted prevalence ratio is 2.60.

This is one way to demonstrate the presence of confounding. You may have a priori knowledge of confounded effects, or you may examine the data and determine whether confounding exists. Either way, when confounding is present, as in this example, the adjusted odds ratio should be reported. In this example, we report the odds ratio for the association of diabetes with CHD = 2.84, adjusted for hypertension. Accordingly, the prevalence ratio for the association of diabetes with CHD is 2.60, adjusted for hypertension.
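
The adjusted odds ratio can be reproduced with the standard Mantel-Haenszel formula, \(\text{OR}_{MH} = \dfrac{\sum_i a_i d_i / n_i}{\sum_i b_i c_i / n_i}\), applied to the two hypertension strata. The sketch below is one way to carry out that arithmetic; the function names and cell ordering are assumptions for this example, and it does not attempt the adjusted prevalence ratio.

```python
def odds_ratio(a, b, c, d):
    """a, b = CHD yes/no with diabetes; c, d = CHD yes/no without diabetes."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio across 2 x 2 strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Stratum tables from the lesson: hypertensives, then non-hypertensives.
hypertension_strata = [(20, 113, 39, 669), (6, 77, 51, 1572)]

stratum_ors = [odds_ratio(*s) for s in hypertension_strata]

# Summing the strata cell by cell recovers the crude table.
crude_table = tuple(sum(cells) for cells in zip(*hypertension_strata))
crude_or = odds_ratio(*crude_table)
adjusted_or = mantel_haenszel_or(hypertension_strata)

print("Stratum ORs:", [f"{x:.2f}" for x in stratum_ors])
print(f"Crude OR: {crude_or:.2f}")
print(f"Mantel-Haenszel OR: {adjusted_or:.2f}")
```

The pooled estimate sits between the two stratum-specific odds ratios and below the crude value, which is the pattern described above for positive confounding.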

8.4 - Effect Modification


Effect modification is not a problem that investigators need to protect against; instead, it is a natural phenomenon that the investigators wish to describe and understand. Different groups may have different risk estimates when effect modification is present.

Effect modification occurs when the effect of a factor is different for different groups. We see evidence of this when the group-specific estimates of the association (odds ratio, rate ratio, risk ratio) differ from one another and the crude estimate is very close to a weighted average of them. Effect modification is similar to statistical interaction, but in epidemiology, effect modification is related to the biology of disease, not just a data observation.

In the hypertension example, we saw both stratum-specific estimates of the odds ratio went to one side of the crude odds ratio. With effect modification, we expect the crude odds ratio to be between the estimates of the odds ratio for the stratum-specific estimates.
Why study effect modification? Why do we care?

  • to define high-risk subgroups for preventive actions,
  • to increase the precision of effect estimation by taking into account groups that may be affected differently,
  • to increase the ability to compare across studies that have different proportions of effect-modifying groups, and
  • to aid in developing a causal hypothesis for the disease

If you do not identify and properly handle an effect modifier, you will get an incorrect crude estimate. The (incorrect) crude estimator (e.g., RR, OR) is a weighted average of the (correct) stratum-specific estimators. If you do not sort out the stratum-specific results, you miss an opportunity to understand the biological or psychosocial nature of the relationship between risk factors and outcomes.

Planning for effect modification investigation

To consider effect modification in the design and conduct of a study:

  1. Collect information on potential effect modifiers.
  2. Power the study to test potential effect modifiers - if a priori you think that the effect may differ depending on the stratum, power the study to detect a difference.
  3. Don't match on a potentially important effect modifier - if you do, you can't examine its effect.

To consider effect modification in the analysis of data:

  1. Again, consider what potential effect modifiers might be.
  2. Stratify the data by potential effect modifiers and calculate stratum-specific estimates of the effect of the risk factor on the outcome; determine if effect modification is present. If so, present stratum-specific estimates.


Continuing with our confounding example, part of our research hypothesis may be that the relationship between diabetes and CHD is different for males and females. Stratifying results by sex shows:

Females

Category      CHD          No CHD   Total
Diabetes      13 (12.3%)   93       106
No Diabetes   25 (2.1%)    1191     1216
Total         38 (2.9%)    1284     1322

PR = 5.97
OR = 6.66

Males

Category      CHD          No CHD   Total
Diabetes      13 (11.8%)   97       110
No Diabetes   65 (5.8%)    1050     1115
Total         78 (6.3%)    1147     1225

PR = 2.03
OR = 2.16

The prevalence ratio for females is 5.97, while it is only 2.03 for males. The overall estimate lies between the two stratum-specific estimates, close to a weighted average of them, and thus sex does not seem to be a confounder. Instead, sex modifies the effect of diabetes on coronary heart disease.

In both groups, those with diabetes are more likely to have CHD, but among females, those with diabetes are almost 6 times as likely to have CHD. Among males, those with diabetes are only about 2 times as likely to have CHD. Notice that the overall prevalence of CHD differs by sex as well. Overall, males have a higher prevalence of CHD (6.3%), but the difference between those with and without diabetes is not as large as in females. For females, the overall prevalence of CHD is lower, at 2.9%, but the difference between those with and without diabetes is larger.
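
The stratum-specific estimates above can be reproduced directly from the cell counts. The following is a minimal sketch in Python, not part of the original lesson; the function and variable names are illustrative, and the crude table is obtained here simply by adding the female and male tables together.

```python
def prevalence_ratio(a, b, c, d):
    """PR for a 2x2 table.

    a, b = exposed with / without the outcome
    c, d = unexposed with / without the outcome
    """
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """OR for a 2x2 table (cross-product ratio)."""
    return (a * d) / (b * c)


# Cell counts from the tables above:
# (diabetes & CHD, diabetes & no CHD, no diabetes & CHD, no diabetes & no CHD)
strata = {
    "females": (13, 93, 25, 1191),
    "males": (13, 97, 65, 1050),
}

# Collapse over sex by summing each cell across the two strata.
crude = tuple(sum(cells) for cells in zip(*strata.values()))

results = {
    name: (prevalence_ratio(*table), odds_ratio(*table))
    for name, table in strata.items()
}
results["crude"] = (prevalence_ratio(*crude), odds_ratio(*crude))

for name, (pr, or_) in results.items():
    print(f"{name:8s} PR = {pr:.2f}  OR = {or_:.2f}")
```

Collapsing the two tables gives a crude PR of about 3.12 and a crude OR of about 3.41. Both fall between the male and female estimates, which is the pattern expected with effect modification.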

Summary of confounding vs. effect modification

To review, confounders distort (mask or exaggerate) a true effect, while effect modifiers mean that the exposure has a different effect on the outcome in different groups.

In summary, the process is as follows:

  1. Calculate a crude (unadjusted) estimate of the association between exposure and outcome.
  2. Stratify the analysis by any potential major confounders or effect modifiers to produce stratum-specific estimates.
  3. Compare the crude estimator with the stratum-specific estimates and examine the kind of relationship exhibited.

With a Confounder:

  • The crude estimator (e.g., RR, OR) is outside the range of the two stratum-specific estimators (in the hypertension example, the crude odds ratio was higher than both of the stratum-specific odds ratios).
  • If the adjusted estimator differs importantly (not necessarily statistically significantly) from the crude estimator, the adjusted-for variable is a confounder. A common rule of thumb is 10%: if including the potential confounder changes the estimate of the risk by 10% or more, we consider it important and leave it in the model.
  • Do not report the crude overall estimate (RR, OR). Instead, report an adjusted estimator. This can be obtained using the Mantel-Haenszel method or statistical modeling.


With Effect modifiers:

  • The crude estimator (e.g., RR, OR) lies between the stratum-specific estimators, close to a weighted average of them.
  • The two stratum-specific estimators differ from each other.
  • Report separate stratified models or report an interaction term.
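
The checks summarized above can be sketched in a few lines of Python, here applied to the sex-stratified diabetes and CHD tables. This is an illustrative sketch rather than the lesson's own procedure: the names are hypothetical, the Mantel-Haenszel odds ratio is computed with the standard formula (sum of a·d/n over strata divided by sum of b·c/n), and the 10% change is measured relative to the crude estimate, which is one of several reasonable conventions.

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table (cross-product ratio)."""
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel summary OR across strata.

    Each table is (a, b, c, d):
    exposed with/without outcome, unexposed with/without outcome.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Sex-specific tables from the diabetes/CHD example: females, then males.
tables = [(13, 93, 25, 1191), (13, 97, 65, 1050)]

# Crude table: sum each cell over the strata.
crude_table = tuple(sum(cells) for cells in zip(*tables))

crude_or = odds_ratio(*crude_table)
stratum_ors = [odds_ratio(*t) for t in tables]
adjusted_or = mantel_haenszel_or(tables)

# Confounding check: does adjustment change the estimate by 10% or more?
# (Assumption: the change is expressed relative to the crude estimate.)
pct_change = abs(adjusted_or - crude_or) / crude_or * 100
suggests_confounding = pct_change >= 10

# Effect modification pattern: crude lies between the stratum estimates.
crude_within_range = min(stratum_ors) <= crude_or <= max(stratum_ors)

print(f"crude OR = {crude_or:.2f}, adjusted (MH) OR = {adjusted_or:.2f}")
print(f"stratum ORs = {[round(x, 2) for x in stratum_ors]}")
print(f"change after adjustment = {pct_change:.1f}%")
print(f"suggests confounding: {suggests_confounding}")
print(f"crude within stratum range: {crude_within_range}")
```

For these data, adjusting for sex changes the odds ratio by well under 10%, while the stratum-specific odds ratios differ markedly and bracket the crude value. That pattern points to effect modification rather than confounding, so stratum-specific estimates would be reported instead of a single adjusted one.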

8.5 - Lesson 8 Summary

In addition to study design considerations, investigators should also plan to maximize the validity of the study results by minimizing bias and random error. Bias is a systematic error in the design, recruitment, data collection, or analysis that results in a mistaken estimate of the true effect of the exposure on the outcome, and it usually cannot be “corrected” during analysis. Random error is a false association between exposure and disease that arises from chance; it has two sources: measurement error and sampling variability. Ways to minimize random error include using large sample sizes, repeating measurements, and selecting random samples so that the sample is more likely to be representative of the population.

Potential confounding and/or effect modification by variables other than the main exposure and outcome of interest are also very important to consider. A confounder is a third variable that distorts the true relationship between the exposure and outcome, and an effect modifier is a third variable that alters the relationship between the exposure and outcome. It is important to consider these additional variables before data collection starts to make sure that all important variables are being measured. Once the data have been collected, variables can be evaluated as potential confounders and/or effect modifiers.
