Lesson 5 - Case Control Studies

Lesson 5 Objectives

Upon completion of this lesson, you should be able to:

  • Recognize the case-control study design, and settings where it is useful
  • Apply different methods for defining exposures
  • Differentiate between direct and indirect methods for measuring exposure
  • Be familiar with sources to use for defining cases
  • Create the appropriate 2x2 table and calculate an odds ratio for data from a case-control study
  • Distinguish between matching and non-matching methods to select controls

5.1 - Rationale & Designs

Case-control studies are useful when epidemiologists investigate an outbreak of a disease because the study design can identify the likely cause of the outbreak even when the sample size is small. Attributable risks may also be calculated.

Case-control study designs are used to estimate the association between a specific risk factor and a disease. The measure of association is the odds ratio, which approximates the relative risk well when the disease is rare.

While a case-control study design offers less support for a causation hypothesis than the longer and more expensive cohort design, it does provide stronger evidence than a cross-sectional study.

Case-control studies are useful when:

  • Exposure data is difficult or expensive to obtain
  • The disease is rare
  • The disease has a long induction and latent period
  •  Little is known about the disease
  • The underlying population is dynamic

Case-Control Study Design

The approach for a case-control study is straightforward. Case-control studies begin by enrolling persons based on their current disease status. Previous exposure status is subsequently determined for each case and control. However, because these studies collect data after the disease has already occurred, they are considered retrospective, which is a limitation.

  Stop and Think!

Come up with an answer to this question before reading the answer that follows.

Why can't we determine the incidence rate from a case-control study?

We have selected cases and controls from a population, often an unknown population. For example, we might enroll patients in a hospital, but we don't really know the size of the general population that would have come to the hospital. Also, we have not followed persons at risk to monitor the development of disease. Furthermore, the investigator selects the number of cases relative to the number of controls.

A most critical and often controversial component of a case-control study is the selection of the controls. Controls must be comparable to cases in every way except that they do not have the disease. Preferably controls are drawn from the same population as the cases. Some studies, though, draw the controls from a different data source. For example, cases may be detected from a disease registry but the controls are selected randomly from another data source. Controls should be selected without regard to their exposure status (e.g., exposed/non-exposed), but may be sampled proportional to their time at risk (which is called density sampling).

There are two basic types of case-control studies, distinguished by the method used to select controls.

  • Non-matched case-control study: We enroll controls without regard to the number or characteristics of the cases. In this study design, the number of controls does not necessarily equal the number of cases.
  • Matched case-control study: In a matched study, we enroll controls based on some characteristic(s) of the case. For example, we might match the sex of the control to the sex of the case. The idea in matching is to match upon a potential confounding variable in order to remove the confounding effect. (We will look at how matching occurs in the example below.)

    There are two basic types of matched designs:

    • one-to-n matching (i.e., one case to one control, or one case to a specific number of controls) and
    • frequency-matching, where matching is based upon the distributions of the characteristics among the cases. For example, 40% of the cases are females so we choose the controls such that 40% of the controls are females.
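Frequency matching can be sketched in a few lines of Python. This is only a minimal illustration, not a prescribed procedure: the function name frequency_match, the record layout (dictionaries with a "sex" key), and the pool of candidate controls are all hypothetical. The sketch draws controls at random within each stratum, without looking at exposure status, so that each stratum's share among the controls mirrors its share among the cases.

```python
import random
from collections import Counter


def frequency_match(cases, control_pool, key, n_controls, seed=0):
    """Sample controls so the distribution of `key` mirrors the cases.

    Hypothetical helper for illustration; records are plain dicts.
    Per-stratum counts are rounded, so the total can differ slightly
    from n_controls when the shares do not divide evenly.
    """
    rng = random.Random(seed)
    case_counts = Counter(case[key] for case in cases)
    selected = []
    for stratum, count in case_counts.items():
        # Controls needed so this stratum keeps the same share as in the cases.
        needed = round(n_controls * count / len(cases))
        candidates = [c for c in control_pool if c[key] == stratum]
        if len(candidates) < needed:
            raise ValueError(f"not enough candidate controls in stratum {stratum!r}")
        selected.extend(rng.sample(candidates, needed))
    return selected


# Hypothetical data: 40% of the cases are female.
cases = [{"sex": "F"}] * 4 + [{"sex": "M"}] * 6
pool = [{"id": i, "sex": "F" if i % 2 == 0 else "M"} for i in range(200)]

controls = frequency_match(cases, pool, key="sex", n_controls=20)
share_female = sum(c["sex"] == "F" for c in controls) / len(controls)
print(len(controls), share_female)
```

With these made-up inputs, 8 of the 20 sampled controls are female, matching the 40% share among the cases.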

Case-Crossover Study Design

This design is useful when the risk factor/exposure is transient. For example, cell phone use and sleep disturbances are transitory occurrences. Each case serves as its own control, i.e., the study is self-matched. For each person, there is a 'case window', the period of time during which the person was a case, and a 'control window', a period of time associated with not being a case. Risk exposure during the case window is compared to risk exposure during the control window.

Advantages of Case-crossover

  • Efficient – self-matching
  • Efficient – select only cases
  • Can use multiple control windows for one case window

Disadvantages of Case-crossover

  • Information bias – the inaccurate recall of exposure during the control window (can be overcome by choosing the control window to occur after the case window)
  • Requires careful selection of the time period during which the control window occurs (circumstance associated with the control window should be similar to circumstances associated with the case window; e.g., traffic volume)
  • Requires careful selection of the length and timing of the windows (e.g., in an investigation of the risk of cell phone usage in auto accidents, cell phone usage that ceases 30 minutes before an accident is unlikely to be relevant to the accident)

Example & Guidance Material


The first decade of experience with case-crossover studies has shown that the design applies best if the exposure is intermittent, the effect on risk is immediate and transient, and the outcome is abrupt. However, this design has been used to study single changes in exposure level, gradual effects on risk, and outcomes with insidious onsets. To estimate relative risk, the exposure frequency during a window just before outcome onset is compared with exposure frequencies during control times rather than in control persons. One or more control times are supplied by each of the cases themselves, to control for confounding by constant characteristics and self-confounding between the trigger's acute and chronic effects. This review of published case-crossover studies is designed to help the reader prepare a better research proposal by understanding triggers and deterrents, target person times, alternative study bases, crossover cohorts, induction times, effect and hazard periods, exposure windows, the exposure opportunity fallacy, a general likelihood formula, and control crossover analysis.


Because of the belief that the use of cellular telephones while driving may cause collisions, several countries have restricted their use in motor vehicles, and others are considering such regulations. We used an epidemiologic method, the case-crossover design, to study whether using a cellular telephone while driving increases the risk of a motor vehicle collision.


We studied 699 drivers who had cellular telephones and who were involved in motor vehicle collisions resulting in substantial property damage but no personal injury. Each person's cellular-telephone calls on the day of the collision and during the previous week were analyzed through the use of detailed billing records.


A total of 26,798 cellular telephone calls were made during the 14-month study period. The risk of a collision when using a cellular telephone was four times higher than the risk when a cellular telephone was not being used (relative risk, 4.3; 95 percent confidence interval, 3.0 to 6.5). The relative risk was similar for drivers who differed in personal characteristics such as age and driving experience; calls close to the time of the collision were particularly hazardous (relative risk, 4.8 for calls placed within 5 minutes of the collision, as compared with 1.3 for calls placed more than 15 minutes before the collision; P<0.001); and units that allowed the hands to be free (relative risk, 5.9) offered no safety advantage over hand-held units (relative risk, 3.9; P not significant). Thirty-nine percent of the drivers called emergency services after the collision, suggesting that having a cellular telephone may have had advantages in the aftermath of an event.


The use of cellular telephones in motor vehicles is associated with quadrupling the risk of a collision during the brief period of a call. Decisions about the regulation of such telephones, however, need to take into account the benefits of the technology and the role of individual responsibility.

5.2 - Basic Concepts of Exposure

In a general sense, exposure can be defined as any of a subject's attributes (association) or any agent (effect) with which the subject may come into contact. These attributes or agents may be relevant to his or her health (Armstrong et al., 1998).

This definition would include smoking, drinking, exposure through an occupation (farmers, pesticide applicators, etc.), or age (e.g., menopause → endogenous estrogen levels) as exposures.

For an environmental factor, exposure can be more precisely defined as contact with some agent at the boundary between humans and the environment, at a specific concentration, over an interval of time (Wallace, 1995). Exposures can be harmful or beneficial.

Harmful: environmental tobacco smoke (ETS), asbestos, ...
Beneficial: vitamin D intake, colonoscopy as a preventive measure for colon cancer, …

Exposure Assessment

Exposure assessment is the science that describes how an individual or population comes in contact with a risk factor, including quantification of the amount of the risk factor across space and time (Lioy, 1990). Key concepts include:
  1. Exposure intensity: the agent/risk factor concentration in the medium that is in contact with the body
  2. Exposure frequency: designates how often the exposure occurs
  3. Exposure duration: the length of the time that the exposure occurs
  4. Microenvironment: defined as any location or activity in which a distinct exposure occurs.

When investigating whether exposure is related to the risk of disease, the epidemiologist must consider many possibilities. Could the effect be related to the concentration? To the frequency of exposure? To the duration? Or is there something peculiar to a certain microenvironment?

Example: BaP

Benzo[a]pyrene (BaP), found in hamburgers cooked by flame broiling, is a carcinogen. To assess exposure, you may ask: How burned is the burger? How frequently does the person eat burned hamburgers? How long have they been eating burned hamburgers? Is there an additional micro-environmental change that affects BaP, the carcinogenic molecule? All these factors could be related to the ability of the exposure to cause a particular outcome.

Environmental tobacco smoke or second-hand smoke is another example. What questions might you ask?

(Consider: Are children exposed to environmental tobacco smoke in their home environment? How many hours are smokers smoking in the house? How many smokers? How frequently is the child in the house? How long have smokers lived in the house?)

Here's another example.

Example: Tobacco

Look at Tables 1 and 2 in Paskett et al. (2007). What types of tobacco exposure are considered in this evaluation of the relationship of smoking with colorectal cancer? What is the effect of current smoking on the incidence of colorectal cancer?

From Table 1:

Smoking Status (Never, Past, Current)
Age at Smoking Initiation (Never, <20, ≥ 20)
Cigarettes per Day (Never, <25, ≥ 25)
Duration (Never, <20, 20-29, 30-39, ≥ 40)
Passive smoking status (Never, Ever)

From Table 2:

We can see that current smoking was not associated with a meaningful increase in the risk of invasive colon cancer (HR 1.03; 95% confidence interval, 0.77 to 1.38; the interval includes 1), but that current smoking increased the risk of invasive rectal cancer 1.95 times (95% confidence interval, 1.10 to 3.47).

Other factors may result in differential effects of exposure. For example, serious adverse health outcomes may be limited to individuals in a specific setting, region, worksite, or community. Individual susceptibilities may differ. There may be differences in exposure due to the type of agent (chemical, biological, or physical), the medium through which the exposure occurs (air, water, food), or the route of exposure (inhaled, ingested, or dermal). In considering a potential association of BaP in grilled burgers with a particular cancer, we might ask: How much carcinogen is in the charred meat? How much is in the smoke arising from the meat drippings?

Potentially important aspects of environmental exposure include:

Agent(s): biological, chemical, physical, single agent, multiple agents, mixtures
Source(s): anthropogenic/non-anthropogenic, area/point, stationary/mobile, indoor/outdoor
Transport/carrier medium: air, water, soil, dust, food, product/item
Exposure pathway(s): eating contaminated food, breathing contaminated workplace air, touching residential surface
Exposure concentration: mg/kg (food), mg/litre (water), μg/m3 (air), μg/cm2 (contaminated surface), % by weight, fibres/m3 (air)
Exposure route(s): inhalation, dermal contact, ingestion, multiple routes
Exposure duration: seconds, minutes, hours, days, weeks, months, years, lifetime
Exposure frequency: continuous, intermittent, cyclic, random, rare
Exposure setting(s): occupational/non-occupational, residential/non-residential, indoors/outdoors
Exposed population: the general population, population subgroups, individuals
Geographic scope: site/source-specific, local, regional, national, international, global
Time frame: past, present, future, trends

As you can see, exposure assessment requires considering many factors that can affect exposure. Let's look further at some of these factors.

Exposure Measurements

Data can be collected by directly monitoring an individual or indirectly, each method with advantages and disadvantages.

Direct Method:

Monitor individuals using some measurement device on their person or by taking biological samples.

  • Personal monitoring: personal exposure monitors for particulate matter (PM), patches worn under clothing for pesticide applicators
  • Biological monitoring: lead concentrations in blood, biomarkers in urine, blood, breath, hair, nails


Advantages:

  • Provides exposure values with minimal assumptions
  • Assesses exposure and collects data at the individual level


Disadvantages:

  • May not be practical in a large epidemiological study because of the expense and effort required
  • Heavily dependent upon the half-life of the compound (how long it lasts in the body); the compound may have an affinity for certain tissues
  • Can be affected by inter-individual differences in metabolism, inter-laboratory variation, and intra-individual variability due to diurnal variation, diet, season, etc.
  • If the chemical does not persist in the body, the measured level will not reflect long-term exposure

Indirect Method:

Various possibilities

  • Questionnaires/diaries
  • Job exposure matrix (JEM)
  • Environmental monitoring/modeling - monitor the environment and assume that people who live in that environment are exposed at the level observed at the monitoring site
  • Calculate the concentration of an agent in all locations/activities and multiply by the duration spent in each location/activity, (Examples: Indoor and outdoor exposure to ambient air particles, exposure to  on the road)
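The last approach in the list, concentration multiplied by time spent in each location or activity, can be written out as a short calculation. The sketch below is illustrative only: the function name time_weighted_exposure and the concentrations and hours for the example day are hypothetical values, not measurements from any study.

```python
def time_weighted_exposure(microenvironments):
    """Return (cumulative exposure, time-weighted average concentration).

    `microenvironments` is a list of (concentration, hours) pairs,
    one pair per location or activity.
    """
    total_hours = sum(hours for _, hours in microenvironments)
    cumulative = sum(conc * hours for conc, hours in microenvironments)
    return cumulative, cumulative / total_hours


# Hypothetical day: particulate concentration (µg/m3) and hours in each place.
day = [
    (12.0, 14.0),  # home, indoors
    (35.0, 2.0),   # commuting on the road
    (20.0, 8.0),   # workplace
]

cumulative, average = time_weighted_exposure(day)
print(cumulative, round(average, 2))
```

For this made-up day the cumulative exposure is 398 µg/m3·hours, or about 16.58 µg/m3 averaged over 24 hours.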


Advantages:

  • Practical and less expensive, the indirect method has been used extensively in epidemiological studies
  • Often involves gathering data with a questionnaire


Disadvantages:

  • Potential problems of a questionnaire: low response rate, non-response to individual questions, recall bias (not being able to remember), social desirability bias (false answers), low question validity and reliability

Environmental Biomarker Research Example

The US EPA is working to connect levels of environmental contamination to the risk of adverse effects in the public's health with the ultimate goal of reducing this risk. Check out the Chemical Safety for Sustainability Strategic Research Action Plan 2016-2019 to see how the EPA is working toward this goal. In particular, see Topic 2: Life Cycle Analytics, beginning on page 19.

5.3 - Case Definitions

Basic Concepts of Case Definition:

Suppose you are asked to estimate the population prevalence of attention-deficit/hyperactivity disorder (ADHD) among U.S. school-age children. How will you identify the children who should be counted as cases having ADHD? What defines a ‘case’?

The definition of a ‘case’ is critical in planning an epidemiologic investigation. The case definition must be carefully formulated to meet the objectives of the investigation, while also permitting valid comparisons with results from other studies. In this example, it may be of interest to consider whether the proportion of school-age children with ADHD has changed over a period of time. If the case definition changes significantly from one time period to the next, comparisons with previous years are problematic.

Suppose instead of estimating prevalence, the task is to define cases for a case-control study that is examining the risk from exposure. If the case definition is broad, it will be easier to include prospective cases, hastening the enrollment of study participants. However, variability among the cases will be greater than if the case definition were narrower. A narrow case definition can slow the identification of sufficient numbers of cases but has the potential to reduce false positives.

Just as a clinical diagnosis for an individual requires meeting specific clinical and laboratory criteria, measuring disease frequency in populations requires the prior stipulation of which clinical, laboratory, epidemiologic, or quantitative criteria indicate the presence of the disease. Case definitions can include a degree of certainty (e.g., probable or confirmed) or specify the method to be used in assessing whether or not criteria are met.

In the United States, disease surveillance is not a responsibility directly given to the federal government. This means each state in the US may establish its own requirements for reporting diseases. Can you imagine the confusion if all 50 states used their own definitions for different diseases? To assist the states, the U.S. Centers for Disease Control and Prevention (CDC) and the Council of State and Territorial Epidemiologists have published a set of uniform criteria for reporting cases: Surveillance Case Definitions for Current and Historical Conditions.

Example of Change in Case Definitions:

CDC cuts US SARS Case Count in Half

During 2003, there was a worldwide epidemic of SARS. The news release below is titled “CDC cuts US SARS case count in half”. How was CDC able to cut the case count for this disease in half from one reporting period to the next? Read the news release below.

July 17, 2003 (CIDRAP News) – Because of a change in the case definition for SARS (severe acute respiratory syndrome), the United States has had only half as many suspected and probable cases of the illness as previously reported, federal health officials said today.
The Centers for Disease Control and Prevention (CDC) said the total case count is now 211 instead of 418, a 49.5% reduction. The official tally now is 175 suspected and 36 probable cases, down from 344 suspected and 74 probable cases.

The change is a result of excluding all cases in which convalescent blood samples—those collected more than 21 days after illness onset — tested negative for the SARS coronavirus, the CDC said in a news release. "Exclusion of these cases with negative convalescent serum provides a more accurate accounting of the epidemic in the U.S.," the agency said.

The Council of State and Territorial Epidemiologists recommended changing the SARS case definition to exclude cases with negative convalescent serum tests. The recommendation is based on evidence that 95% of SARS patients mount a detectable antibody response in the convalescent phase, the CDC said.
The revised case definition and case count are detailed in the Jul 18 issue of Morbidity and Mortality Weekly Report, published online today.

"Serologic testing results suggest that a small proportion of persons who had illness consistent with the clinical and epidemiologic criteria for a U.S. case of suspect or probable SARS actually had SARS," the article states. "The case definition captures an array of respiratory illnesses that cannot be easily distinguished from SARS until laboratory testing results for SARS and other agents are performed." The sensitive case definition allowed for rapid investigation of possible cases and public health steps to prevent spread of the disease, the report adds.

So what happened? The CDC changed the case definition once lab results were available. Under the new definition, a reported case whose convalescent serum (collected more than 21 days after illness onset) tested negative for the SARS coronavirus was no longer counted as a SARS case. The count fell from 418 cases to 211 just by changing the case definition. The case definition for a disease has a substantial impact.
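One way to see how a case definition drives a case count is to treat the definition as a rule applied to each report. The sketch below is a toy illustration: the records, field names, and the function meets_revised_definition are all hypothetical, and the rule encodes only the exclusion described in the news release (cases with negative convalescent serum are dropped).

```python
def meets_revised_definition(record):
    """Keep a report unless its convalescent serum tested negative.

    Hypothetical rule encoding only the exclusion quoted in the release;
    reports with no convalescent sample are left in the count.
    """
    return record["convalescent_serum"] != "negative"


# Hypothetical reports, invented for illustration.
reports = [
    {"id": 1, "status": "probable", "convalescent_serum": "positive"},
    {"id": 2, "status": "suspect", "convalescent_serum": "negative"},
    {"id": 3, "status": "suspect", "convalescent_serum": None},  # no sample collected
    {"id": 4, "status": "probable", "convalescent_serum": "negative"},
]

retained = [r for r in reports if meets_revised_definition(r)]
print(len(reports), "reports before,", len(retained), "after the revised definition")
```

The same reports yield different counts under different rules, which is why the definition in force must be stated whenever counts are compared over time.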

Importance of clear case definitions

As you proceed through this course it will be important that you provide specific case definitions for diseases for which you are conducting an epidemiologic investigation. You may use a medical reference or ICD codes that classify diseases. The ICD (International Classification of Diseases) is a classification system, maintained by the World Health Organization, that assigns codes to diseases and causes of death. These codes are very specific.

Physicians or medical investigators may belong to professional organizations or societies that define cases somewhat differently than the CDC. It is important to be explicit about the case definition used in a study and about which other definitions it is comparable to. Below are further sources of case definitions that may be helpful to you in epidemiologic investigations for this course.

Levels of Certainty in Case Assignment

Cases of disease can be categorized as follows:

  1. Clinically compatible case
    A clinical syndrome is generally compatible with the disease, as described in the clinical description. A general clinical impression is that this is a case of disease.
  2. Confirmed case
    A case that is classified as confirmed for reporting purposes. The case meets established criteria.
  3. Epidemiologically linked case

    A case in which…

    1. the patient has had contact with one or more persons who either have/had the disease or have been exposed to a point source of infection (i.e., a single source of infection, such as an event leading to a foodborne disease outbreak, to which all confirmed case-patients were exposed) and
    2. transmission of the agent by the usual modes of transmission is plausible. A case may be considered epidemiologically linked to a laboratory-confirmed case if at least one case in the chain of transmission is laboratory confirmed.
  4. Laboratory-confirmed case
    A case that is confirmed by one or more of the laboratory methods listed in the case definition under laboratory criteria for diagnosis. Although other laboratory methods can be used in clinical diagnosis, only those listed are accepted as laboratory confirmation for national reporting purposes.
  5. Probable case
    A case that is classified as probable for reporting purposes. Supportive or presumptive laboratory results are specified laboratory results that are consistent with the diagnosis yet do not meet the criteria for laboratory confirmation.
  6. Suspected case
    A case that has a lower level of certainty and is classified as suspected for reporting purposes.

Each type has utility in different settings. To investigate a highly infectious, transmissible, or serious and deadly disease, casting a broad net will capture all suspected and probable cases. On the other hand, if declaring an individual to be a 'case' is likely to result in imposing severe restrictions, such as closing schools or removing a product from the market, the case definition should be more stringent. For example, a series of suspected cases of disease would not be sufficient to support a product recall. The epidemiologist would prefer to have a confirmed case or a laboratory-confirmed case to justify such action.

Sources of Case Definitions

Check out these nationally recognized sources of case definitions

5.4 - Examples

Case-Control Study Example

Source: Obesity is associated with decreased risk of microscopic colitis in women



Microscopic colitis is a leading cause of diarrhea in older adults. There is limited information about risk factors. We hypothesized that obesity would be associated with microscopic colitis.


To examine the association between obesity and microscopic colitis in men and women undergoing colonoscopy.


We conducted a case-control study at the University of North Carolina Hospitals. We identified and enrolled men and women referred for elective, outpatient colonoscopy for chronic diarrhea. We excluded patients with a past diagnosis of Crohn’s disease or ulcerative colitis. A research pathologist reviewed biopsies on every patient and classified them as microscopic colitis cases or non-microscopic colitis controls. Patients provided information on body weight, height, and exposure to medications via structured interviews or Internet-based forms. The analysis included 110 patients with microscopic colitis (cases) and 252 non-microscopic colitis controls. Multivariable analyses were performed using logistic regression to estimate odds ratios and 95% confidence intervals.


Cases were older and more likely than controls to be white race. Study subjects were well educated, but cases were better educated than controls. Cases with microscopic colitis had lower body mass index than controls and reported more weight loss after the onset of diarrhea. Compared to patients who were normal or under-weight, obese (BMI > 30 kg/m2) patients were substantially less likely to have microscopic colitis after adjusting for age and education (adjusted OR (aOR) 0.35, 95% confidence interval (CI) 0.18-0.66). When stratified by sex, the association was limited to obese women, aOR 0.21, 95%CI: 0.10-0.45. Patients with microscopic colitis were more likely to report weight loss after the onset of diarrhea. After stratifying by weight loss, there remained a strong inverse association between obesity and microscopic colitis, aOR 0.33, 95%CI: 0.10 – 1.11 among the patients who did not lose weight. Ever use of birth control pills was associated with lower risk of microscopic colitis after adjusting for age, education, and BMI, aOR 0.38, 95%CI: 0.17-0.84.


Compared to controls also seen for diarrhea, microscopic colitis cases were less likely to be obese. Mechanisms are unknown but could involve hormonal effects of obesity or the gut microbiome.

Case-Crossover Study Example

Source: Valent F, Brusaferro S, Barbone F. A case-crossover study of sleep and childhood injury. Pediatrics 2001;107:E23. In: Woodward M. Epidemiology: Study Design and Data Analysis. 2nd Ed. London: Chapman and Hall. 2005.

In this Italian case-crossover study of sleep disturbance and injury amongst children (Valent et al., 2001), each child was asked about her or his sleep in the 24 hours before the injury occurred (the case window) and in the 24 hours before that (the control window). Amongst 181 boys, 40 had less than 10 hours sleep on both the days concerned; 111 had less than 10 hours sleep on neither day; 21 had less than 10 hours sleep only on the day before the injury; and 9 had less than 10 hours sleep only on the penultimate day before the injury. Only the 30 discordant boys contribute to the estimate: the odds ratio for injury, comparing days with less than 10 hours of sleep to days with 10 hours or more, is 21/9 = 2.33 (95% confidence interval: 1.02, 5.79).
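The sleep example can be reproduced with a short calculation on the discordant counts alone (21 boys short of sleep only in the case window, 9 only in the control window). The source does not say how the confidence interval was obtained; the sketch below assumes an exact binomial (Clopper-Pearson) interval on the share of discordant pairs, which is one common choice and yields limits close to those reported. The helper names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_increasing(g, target, iterations=60):
    """Bisection on (0, 1): find p with g(p) = target for an increasing g."""
    low, high = 0.0, 1.0
    for _ in range(iterations):
        mid = (low + high) / 2
        if g(mid) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2


def matched_odds_ratio(case_only, control_only, alpha=0.05):
    """Odds ratio from discordant counts with an exact interval (assumed method)."""
    n = case_only + control_only
    estimate = case_only / control_only
    # Exact limits for p = share of discordant pairs exposed in the case window only.
    p_low = solve_increasing(lambda p: 1 - binom_cdf(case_only - 1, n, p), alpha / 2)
    p_high = solve_increasing(lambda p: 1 - binom_cdf(case_only, n, p), 1 - alpha / 2)
    # Convert each limit on p to the odds scale: OR = p / (1 - p).
    return estimate, p_low / (1 - p_low), p_high / (1 - p_high)


estimate, lower, upper = matched_odds_ratio(21, 9)
print(f"OR = {estimate:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

The 40 boys short of sleep on both days and the 111 short on neither day do not enter the calculation at all, which is the defining feature of a self-matched analysis.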

5.5 - Analysis

Analytic Methods for Non-Matched Case-Control Studies

With case-control studies, we essentially work down the columns of the 2 × 2 table. Cases are identified first, then controls. The investigator then determines whether cases and controls were exposed or not exposed to the risk factor.

2 × 2 Table for Non-Matched Case-Control Data:
               Case          Control          Total
Exposed        A             B                TotalExposed
Not Exposed    C             D                TotalNotExposed
Total          TotalCases    TotalControls    Total

We calculate the odds of exposure among cases (A/C) and the odds of exposure among controls (B/D). The odds ratio is then (A/C)/(B/D), which simplifies after cross-multiplication to (A*D)/(B*C).
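As a quick check of the algebra, the two forms of the odds ratio can be computed side by side. This is a minimal sketch: the function name odds_ratio and the cell counts are hypothetical, chosen only to show that the ratio of odds and the cross-product give the same number.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a non-matched 2x2 table.

    a: exposed cases      b: exposed controls
    c: unexposed cases    d: unexposed controls
    """
    odds_cases = a / c       # odds of exposure among cases
    odds_controls = b / d    # odds of exposure among controls
    return odds_cases / odds_controls


# Hypothetical counts, for illustration only.
a, b, c, d = 30, 20, 70, 80

ratio_of_odds = odds_ratio(a, b, c, d)
cross_product = (a * d) / (b * c)
print(round(ratio_of_odds, 3), round(cross_product, 3))
```

Both expressions evaluate to 12/7, about 1.714, for these counts.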

For case-control studies, since the ratio of cases to controls is set by the investigator and is not necessarily representative of the ratio in the population, the odds ratio must be used as the summary measure. The relative risk cannot be calculated directly from this type of study.
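A small numerical experiment makes the point concrete. In the sketch below, the source population and the sampling fractions are hypothetical: all cases are enrolled, and non-cases are sampled as controls at different fractions (expected counts are used, so there is no sampling noise). The odds ratio is unchanged by the sampling fraction, while a risk ratio naively computed from the same table drifts with it.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio: a, c are cases; b, d are controls."""
    return (a * d) / (b * c)


def naive_risk_ratio(a, b, c, d):
    """'Risk' ratio computed as if the table rows were followed cohorts."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical source population.
exposed_cases, unexposed_cases = 40, 20
exposed_noncases, unexposed_noncases = 10_000, 20_000

results = []
for fraction in (1.0, 0.1, 0.01):
    # Expected number of controls when sampling this fraction of non-cases.
    b = exposed_noncases * fraction
    d = unexposed_noncases * fraction
    results.append(
        (
            fraction,
            odds_ratio(exposed_cases, b, unexposed_cases, d),
            naive_risk_ratio(exposed_cases, b, unexposed_cases, d),
        )
    )

for fraction, or_value, rr_value in results:
    print(f"fraction={fraction}: OR={or_value:.2f}, naive RR={rr_value:.2f}")
```

Because the investigator chooses how many controls to enroll, only the measure that is invariant to that choice can be interpreted.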

Analytic methods for non-matched case-control studies include:

  • Chi-square 2 × 2 analysis;
  • Mantel-Haenszel statistic (This method pools the stratum-specific 2 × 2 tables to adjust for a stratification variable such as a confounder; comparing the stratum-specific odds ratios first can reveal effect modification)
  • Fisher’s Exact test (This test is used if the expected cell size is <5)
  • Unconditional logistic regression (The method is used to simultaneously adjust for multiple confounders; a multivariable analysis).
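The first method in the list can be sketched without any statistical library. The example below computes the Pearson chi-square statistic (no continuity correction, an assumption of this sketch) for a 2 × 2 table, using the BMI counts from the microscopic colitis example in this lesson (22, 105, 50, 73). The function name chi_square_2x2 is hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 degree of freedom) for a 2x2 table.

    a: exposed cases      b: exposed controls
    c: unexposed cases    d: unexposed controls
    """
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


statistic, p_value = chi_square_2x2(22, 105, 50, 73)
print(f"chi-square = {statistic:.2f}, p = {p_value:.5f}")
```

The statistic is about 16.58 for these counts, far beyond the 3.84 cutoff for significance at the 0.05 level with one degree of freedom.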


For the obesity and microscopic colitis example (Obesity is associated with decreased risk of microscopic colitis in women), the data from table 2 can be used to construct this 2x2 table for the comparison of microscopic colitis between those with low and high BMI.

Comparison of Microscopic Colitis between those with Low and High BMI
                          Case    Control    Total
Exposed (BMI >= 30)       22      105        127
Not Exposed (BMI < 25)    50      73         123
Total                     72      178        250

OR = (22*73)/(50*105) = 0.31
As we see in the text: "As shown in Table 2, the risk for microscopic colitis was lower for … BMI > 30 kg/m2 (OR 0.31, 95%CI: 0.17-0.55) compared to under- or healthy weight (BMI < 25 kg/m2) as the reference."
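The confidence interval quoted from the article can be approximated from the four cell counts. The article excerpt does not state which method was used; the sketch below assumes the standard large-sample (Woolf) interval on the log odds ratio, and the function name odds_ratio_with_ci is hypothetical.

```python
from math import exp, log, sqrt


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-based) confidence interval.

    a: exposed cases      b: exposed controls
    c: unexposed cases    d: unexposed controls
    z: normal quantile (1.96 for a 95% interval)
    """
    estimate = (a * d) / (b * c)
    # Standard error of log(OR): square root of the summed reciprocal counts.
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(estimate) - z * se_log_or)
    upper = exp(log(estimate) + z * se_log_or)
    return estimate, lower, upper


estimate, lower, upper = odds_ratio_with_ci(22, 105, 50, 73)
print(f"OR = {estimate:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

Under this assumption the result rounds to OR 0.31 with limits 0.17 and 0.55, in line with the unadjusted figures quoted above.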

To review, for a simple non-matched case-control study, you find a case, then determine whether the person is exposed or not. Find a control; determine their exposure status.

Analytic Methods of Matched Case-Control Studies

In an analysis of a matched study design, only discordant pairs are used. A discordant pair occurs when the exposure status of the case is different from the exposure status of the control. The most commonly used analytic method for matched case-control studies is conditional logistic regression, conditioned upon the matching.

The matched case-control study has linked a case to a control based on the matching of one or more variables. The summary table will differ for a matched case-control study.

2 × 2 Table for Matched Case-Control Data:
                        Cases Exposed          Cases Not Exposed        Total
Controls Exposed        A (Concordant Pair)    B (Discordant Pair)      TotalExposedControls
Controls Not Exposed    C (Discordant Pair)    D (Concordant Pair)      TotalNotExposedControls
Total                   TotalExposedCases      TotalNotExposedCases     Total


Let's look at an example. Suppose we plan to match cases to controls by gender and age (+/- 5 years). We first identify the following case:

Male, 45 years of age (Patient 1);
Exposure status: Exposed

If this were a non-matched study, the case would be counted in cell A of the non-matched 2 × 2 table because he is exposed. However, in the age- and gender-matched case-control study we must also find a male control within five years of age. Searching in the appropriate control population, we locate the following control:

Male, 48 years of age (Person 47);
Exposure status: Exposed

If Person 47 were counted in an unmatched study, he would belong in cell B of the non-matched 2 × 2 table. In a matched case-control study, however, we are interested in results for the matched pair. The data from Patient 1 and Person 47 are linked for the duration of the study. The appropriate table for the matched study is the matched 2 × 2 table shown above. Where do Patient 1 and Person 47 belong?

Patient 1 is a case and he is exposed, so he fits into either cell A or cell C. Based upon his control's status we determine which cell is the correct placement for this pair. Patient 1's control is exposed, therefore Patient 1 and Person 47 fit into cell A as a pair. This is a concordant pair because both are exposed. Concordancy is based upon exposure status. In a matched case-control study, the cell counts represent pairs, not individuals. In the statistical analysis, only the discordant pairs are important. Cells B and C contribute to the odds ratio in a matched design. Cells A and D do not contribute to the odds ratio. If the risk for disease is increased due to exposure, C (pairs in which only the case is exposed) will be greater than B (pairs in which only the control is exposed). The odds ratio is then C/B.
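The placement of pairs into cells can be sketched as a small tally. The pair data below are hypothetical apart from the first pair, which stands for Patient 1 and Person 47 (both exposed). The cell letters follow the matched table: A both exposed, B only the control exposed, C only the case exposed, D neither exposed. With that lettering, the ratio of discordant pairs C/B exceeds 1 when exposure is more common among cases than among their matched controls.

```python
from collections import Counter


def classify_pair(case_exposed, control_exposed):
    """Return the matched-table cell for one case-control pair."""
    if case_exposed and control_exposed:
        return "A"  # concordant: both exposed
    if control_exposed:
        return "B"  # discordant: only the control exposed
    if case_exposed:
        return "C"  # discordant: only the case exposed
    return "D"      # concordant: neither exposed


# (case_exposed, control_exposed); the first pair is Patient 1 and Person 47.
# All other pairs are hypothetical.
pairs = [
    (True, True),
    (True, False),
    (True, False),
    (True, False),
    (False, True),
    (False, False),
]

cells = Counter(classify_pair(case, control) for case, control in pairs)
matched_or = cells["C"] / cells["B"]
print(dict(cells), matched_or)
```

Here three pairs fall in cell C and one in cell B, so the matched odds ratio is 3.0; the concordant pairs in cells A and D play no part.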

Comparing Matched and Non-Matched Case-Control Studies

  Stop and Think!

Come up with an answer to each question before reading the answers that follow.

  1. Can you think of more than one reason why a matched case-control study could take longer to complete than an unmatched study?

    First, you must identify matched controls, sometimes more than one per case. Second, since only the discordant pairs contribute to the statistical analysis, achieving a desired statistical power depends on obtaining a particular number of discordant pairs.
  2. Why bother with matching if it means a longer case-control study?

    When performing statistical analysis, the matched variables are not included in the statistical model.
    (In a cohort study, confounding is dealt with by including the terms in the model to adjust for their effects. In a matched case-control study, the adjustment for this confounding can be made through the matching.)

5.6 - Lesson 5 Summary

Case-control studies are one of the two main types of observational study designs in epidemiology. This design is desirable when exposure data are difficult to obtain, the disease is rare or has a long induction period, or when the underlying population is dynamic. In a case-control study, first patients are selected based on their disease (outcome) status and classified as cases or controls. Then investigators obtain and compare the exposure histories between groups. A case-crossover study is a specific type of case-control study where patients serve as their own controls. This is appropriate for transient exposures, where a patient contributes time as both a case and a control.
