Lesson 7: Etiologic Studies (2) Outbreak Investigation; Advanced Case-Control Design


In this course, we have often assumed that investigators know of a potentially harmful exposure before, or at the same time as, they observe the disease or illness. In other situations, the first indication of harmful exposure is a report of a potential outbreak of disease or illness. Increased numbers of cases of disease or illness may necessitate an outbreak investigation. Questions to be answered in an outbreak investigation include the following:

  1. Is there an unusual number of adverse health outcomes in this community?
  2. If so, how many? Is the number increasing, decreasing, or stable?
  3. What type of exposure may have caused the increase?
  4. What is the anticipated future course and spread of this outbreak?

Basic case-control studies are very useful when investigating an outbreak of disease. Last week we studied the basic case-control study. This week we will see some more advanced case-control designs, but these are rarely used in outbreak investigations because they take longer to implement and are more complex.

Note! The material for outbreak investigation has been adapted from CDC source materials.

Let's get started!


Upon completion of this lesson, you should be able to:

  • Use common terms in outbreak investigation appropriately
  • Develop an outbreak investigation plan
  • Describe a potential outbreak with regard to person, place, and time
  • Construct and interpret an epi-curve to describe the course of an outbreak
  • Differentiate between a nested case-control study, a case-cohort design, and a case-crossover design

7.1 - Investigating a Potential Outbreak

When an increase in the number of cases of a disease is reported, a speedy response is critical. At the same time, it is also of utmost importance to end up with an answer that will appropriately protect public health and safety. A systematic approach to outbreak investigation helps assure timely and accurate answers:

  1. Prepare for fieldwork
  2. Establish the existence of an outbreak
  3. Verify the diagnosis
  4. Define and identify cases
  5. Measure the frequency of adverse outcome and describe the data in terms of time, place, and person
  6. Develop hypotheses
  7. Evaluate hypotheses
  8. Refine hypotheses and carry out additional studies
  9. Implement control and prevention measures
  10. Communicate findings

Preparing for a possible outbreak investigation

Upon receiving initial reports of a possible outbreak, investigators should review the epidemiology, risk factors, clinical signs and symptoms, and prevention and control of adverse health outcomes similar to the reported cases. Lines of communication between investigators and health care providers, policymakers, and the press should be reviewed and understood by each involved party. Communication must be clear, open, and productive.

Read this example of an outbreak investigation and consider the steps of the investigation:

Multistate outbreak of Salmonella infections associated with frozen pot pies in the United States 2007. JAMA. 2009;301(3):264-266 (doi:10.1001/jama.301.3.264)

Think about it!

In the Salmonella example, what steps were followed in the initial evaluation of the 4 reported cases of Salmonella?

The PA (state) Department of Health initiated an investigation after 4 cases were reported, utilizing PulseNet laboratory capabilities. PulseNet was created to assist epidemiologists in separating outbreak-associated cases from sporadic cases, to rapidly identify sources of outbreaks and to ensure timely and effective communication among public health laboratories. Over several months, state and local health departments collaborated with the CDC in an unsuccessful attempt to locate the source of the Salmonella. Initially, steps 1-5 were covered.

Confirming the possibility of an outbreak

Once the investigators are prepared, the existence of the outbreak should be confirmed. Iterative examination of the evidence may be required to support or refute the existence of an outbreak.

Determination of an outbreak requires that more cases of a specific adverse health outcome have occurred than would be expected for the surveilled population in a specific geographic area or time period. First, the investigator must determine whether the reported initial cases are worthy of further investigation. A question to be answered is whether the cases could share a common cause. For example, clusters of chronic obstructive lung disease are sometimes suspected to be an outbreak, but upon closer examination, investigators often learn the cases are not a single type of disease. Similarly, reported clusters of cancer may turn out to be different types of cancer or cancers that do not share common causes. Increased surveillance, resulting in increased probability that a disease will be diagnosed, may lead to a suspected outbreak and an investigation.

Next the investigator must determine the number expected in the population. We can determine the number expected from different sources.

  • For a notifiable disease (one that, by law, must be reported), health department surveillance records are available.
  • For other diseases and conditions, look for local sources such as hospital discharge records, death (mortality) records, and cancer or birth defect registries.
  • If local data are not available, estimate using data from neighboring states or national data.
  • An epidemiologist might survey clinicians, asking how many cases of the disease they have seen recently.

There are mathematical and statistical methods that can be used to assess the significance of an increase in the number of cases because an increase does not always indicate that an outbreak has occurred or is occurring. Changes in reporting procedures, changes in the case definition, increased awareness, or changes in diagnostic procedures can also lead to increased detection of cases. In some areas, population fluctuates with the season (e.g. college towns, areas utilizing seasonal workers, etc.). Additional considerations include the severity of the illness, how contagious the disease is, political considerations and certainly, available resources.
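One simple way to assess an apparent increase, sketched below, is to ask how likely the observed count would be if cases arose at the usual background level. The sketch assumes case counts follow a Poisson distribution, which the lesson does not specify, and the counts are hypothetical. A small probability says only that chance is an unlikely explanation; it does not rule out reporting changes, a revised case definition, or a seasonal shift in population.

```python
import math

def poisson_upper_tail(observed: int, expected: float) -> float:
    """P(X >= observed) when X ~ Poisson(expected)."""
    # Sum P(X = k) for k = 0 .. observed - 1, then take the complement.
    below = sum(
        math.exp(-expected) * expected**k / math.factorial(k)
        for k in range(observed)
    )
    return 1.0 - below

# Hypothetical numbers: 4 cases expected in a typical month, 12 observed.
p = poisson_upper_tail(observed=12, expected=4.0)
print(f"P(12 or more cases | 4 expected) = {p:.5f}")
```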

7.1.1 - Verifying the Diagnosis

It is also important early in an investigation to identify as accurately as possible the specific nature of the disease. You want to verify that both clinical and laboratory diagnoses are correct. Review the clinical findings and lab processes and results for the cases. If there is a need for specialized laboratory analysis, collect the required specimens and materials. It may be necessary to sequence the genome of the bacteria (for example, in a hospital-based outbreak of antibiotic-resistant disease: Kupferschmidt, Kai. "Genome Study Helps Contain MRSA Outbreak--And Breeds New Questions." Science, 23 Nov 2012).

It is also often a good idea for a qualified clinician to visit some of the cases to confirm the initial reports. Try to gain a better understanding of the disease and those affected by it. Ask: What were their exposures before becoming ill? What do they think caused their illness? Do they know anyone else with the disease? Do they have anything in common with others who have the disease? Conversations with patients can be quite helpful in generating hypotheses about the cause, source, and spread of disease.

Establishing the Case Definition

Specifying the definition of a case of an adverse health outcome is one of the most important steps for the successful investigation of a potential outbreak. Since the case definition is the standard for determining which individuals are cases and which are not, a case definition should be established early in the investigation.

As you learned in Lesson 2, cases can be defined by laboratory tests, clinical signs and symptoms, and/or a physician's diagnosis. They can also be defined as being epidemiologically-linked, or even just exposed. A case definition usually includes four components:

  1. clinical information about the disease,
  2. characteristics of the people who are affected,
  3. information about the location or place, and
  4. specification of the time during which the outbreak occurred.
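The four components listed above can be written down as an explicit rule, so that every record is judged by the same standard. The sketch below is only an illustration: the symptoms, age range, county name, and dates are all hypothetical and are not taken from any real case definition.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Person:
    symptoms: set
    age: int
    county: str
    onset: date

def meets_case_definition(p: Person) -> bool:
    """Apply a hypothetical four-part case definition."""
    clinical = {"diarrhea", "fever"} <= p.symptoms                 # 1. clinical information
    person = 5 <= p.age <= 18                                      # 2. who is affected
    place = p.county == "Example County"                           # 3. place
    in_time = date(2024, 3, 1) <= p.onset <= date(2024, 3, 31)     # 4. time
    return clinical and person and place and in_time

a = Person({"diarrhea", "fever", "nausea"}, 12, "Example County", date(2024, 3, 9))
b = Person({"diarrhea", "fever"}, 12, "Example County", date(2024, 4, 2))
print(meets_case_definition(a), meets_case_definition(b))
```

Broadening the definition early in an investigation amounts to relaxing one of these conditions, for example accepting diarrhea alone.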

Cases can be classified according to the level of confidence the investigator holds regarding the individual's case status, such as confirmed, probable, or suspect. Under certain special circumstances, individuals who have only been exposed to a contagious or contaminated agent and who remain free of symptoms may be included as a case.

The initial case definition may be quite broad with a risk of mistakenly identifying as cases some individuals without the condition. Ideally, the case definition catches all cases without picking up 'false positives' (when the case definition is met, but the person actually does not have the disease). A broad case definition can be helpful early in the outbreak when there is a goal of reducing the spread of the disease. Using a broad case definition may help prevent the investigator from having to go back to clinics for additional data, illustrating the field epidemiology axiom: "Get it while you can."

As the disease spreads, the case definition may be refined, dropping 'possible' cases. Did you notice the case definitions in the 2009 H1N1 pandemic changing as the numbers of affected communities increased and the goals of health agencies were modified?

Using case definitions already established by prominent scientific or medical organizations or agencies is recommended. If no case definitions are recommended by professional organizations, investigators may use definitions published in the scientific literature. Using an established case definition yields more legitimate comparisons of cases and outbreaks between regions or over different time periods. We have already explored some sources of standard case definitions in Lesson 2.

Measuring the Frequency of Adverse Health Outcomes

In all outbreak situations, we will count the number of cases of the outcome. In many situations, we desire more information. For example, knowing the number of persons in the at-risk population would allow calculation of the proportion of those at risk who developed the condition or disease. We may also wish to incorporate the time period over which the at-risk population developed disease to produce a rate or risk of disease.

Think about it!

What was the case definition used in the Salmonella example? How was the frequency measured?

"An outbreak case was defined as infection with a Salmonella strain with the specific outbreak PFGE pattern and illness onset during January 1–December 31, 2007. During this period, a total of 401 outbreak cases from 41 states were identified." (count)

When identifying cases, use as many sources as possible, starting with health care facilities where the diagnosis is likely to be made. If an outbreak affects a population in a restricted setting, such as a school or worksite, you may decide to survey and/or collect samples from the entire population, particularly if asymptomatic cases are expected. Ask cases if they know anyone else with the disease. (Does this remind you of the survey in San Pablo from Lesson 5?)

Collect the following types of information on each case:

  • Limited identifying information: to allow the investigators to contact patients to ask additional questions and to notify them of laboratory results and the outcome of the investigation. Addresses allow mapping the outbreak.
  • Demographic and risk factor information: used to characterize the population at risk.
  • Clinical information: allows verification of case status. The date of onset allows you to create a graph of the outbreak. Supplementary clinical information may include whether the person was hospitalized or died.

7.1.2 - Orient in Terms of Time, Place, and Person

Describe what has happened to the population under study. Show the trend over time for the potential outbreak, its geographic extent (place), and the populations (people) affected by the disease. Begin to assess the outbreak in light of what is known about the disease (e.g., the usual source, mode of transmission, risk factors, and populations affected) and start developing causal hypotheses. Familiarizing yourself with the data also helps sort out which information is reliable and informative (e.g., the same unusual exposure reported by many of the people affected) vs. what may not be as reliable (e.g., many missing or "don't know" responses to a particular question).

Think about it!

How did the early exploration of the data in the Salmonella example lead to finding the source of the outbreak?

When the MN DOH reviewed data from interviewing four cases, it was noted that all 4 had consumed Banquet pot pies during the week prior to illness. This information was passed on to the outbreak team and specific questions about pot pie consumption were then included in the case-control study.

Evaluate Hypotheses.

Continue to update the descriptive epidemiology, looking for errors and clues, as data are collected. Keep the investigation moving quickly and headed in the right direction.

Characterizing by time: Constructing an Epi-Curve

An epidemic curve, frequently referred to as an 'epi-curve', is used to examine and characterize the occurrence of a possible outbreak. By constructing and examining an accurate epi-curve, an investigator can consider questions such as:

  • Is there an outbreak? If so, when did the outbreak begin?
  • Has the outbreak peaked? If so, when was the peak?
  • What might be the source of the exposure? Is there one source or multiple sources for exposure of cases? Is person-to-person transmission occurring?
  • Have the attempts to control the outbreak coincided with a decrease in the occurrence of the disease?

An epi-curve is a histogram with the number of cases of the adverse health outcome on the y-axis (ordinate) and dates of onset of the outcome on the x-axis (abscissa). Dates of onset may be grouped by days, weeks, or months, depending on the nature of the potential outbreak. A typical interval width is 1/4 to 1/3 of the incubation period for the disease. If the incubation or lag time from exposure to outcome is unknown, it is valuable to experiment with different interval widths.

A typical epi-curve is a simple chart with one series of data, the onset of cases. In other situations, several layers of data are displayed on the curve. For example, the investigator may want to examine the date of onset in more than one location (e.g. 2 or more cities, states or countries) or in different groups of people (e.g. stratified by age or race).

Another variation of the epi-curve is stacking the bars in order to show different characteristics of the cases. For example, you may decide to separate confirmed cases from suspect cases, using stacked bars to assess whether an outbreak is truly occurring.
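As a rough illustration of how an epi-curve is built, the sketch below groups onset dates into equal-width intervals and prints a text histogram. The onset dates, the start date, and the 2-day interval are hypothetical, and the function assumes every onset falls on or after the start date.

```python
from collections import Counter
from datetime import date, timedelta

def epi_curve_counts(onsets, start, bin_days):
    """Count onsets in consecutive bins of bin_days days, beginning at start."""
    counts = Counter((d - start).days // bin_days for d in onsets)
    n_bins = max(counts) + 1  # include empty bins up to the last onset
    return [
        (start + timedelta(days=i * bin_days), counts.get(i, 0))
        for i in range(n_bins)
    ]

# Hypothetical onset dates for 11 cases in March 2024.
onsets = [date(2024, 3, d) for d in (2, 4, 4, 5, 5, 5, 6, 6, 7, 9, 12)]
curve = epi_curve_counts(onsets, start=date(2024, 3, 1), bin_days=2)

for bin_start, n in curve:
    print(f"{bin_start.isoformat()}  {'#' * n}")
```

Rerunning with a different `bin_days` is a quick way to experiment with interval widths when the incubation period is unknown.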

Interpreting an Epi-Curve

The following example depicts the first outbreak of Legionnaires’ disease, in Philadelphia, Pennsylvania, in 1976.

Example of a graph showing an epidemic curve.

The first consideration is the overall shape of the curve which is determined by the pattern of the outbreak (common source or person-to-person transmission). The shape also indicates the period of time over which susceptible people are exposed and the minimum, average and maximum incubation periods for the disease.

Think about it!

Consider the figure above describing the first outbreak of Legionnaires' disease among the conventioneers. The curve has a steep upward slope and a gradual downward slope. Do you think this was a single source (point source) epidemic or person-to-person transmission? What is the incubation period?

Single source. Conventioneers were exposed to the same source over a relatively brief period. Any sudden rise in the number of cases suggests sudden exposure to a common source. In a point source epidemic, all the cases occur within one incubation period. The graph supports an incubation period for Legionnaires' disease that is less than 2 weeks.

If the duration of exposure is prolonged, the epidemic is called a "continuous common source epidemic," and the epidemic curve will have a plateau instead of a peak. Person-to-person spread (a "propagated" epidemic) should have a series of progressively taller peaks one incubation period apart.

Cases that stand apart ("outliers") provide valuable information. An early case can represent a background (unrelated) case, a source of the epidemic, or a person who was exposed earlier than others. Similarly, late cases may be unrelated to the outbreak, may have especially long incubation periods, may indicate exposure later than most of the people affected, or may be secondary cases (the person who becomes ill after being exposed to someone who was part of the initial outbreak). Examine any outliers that are part of the outbreak carefully because they may point directly to the source. For example, a prep chef could be the first case of strep in an epidemic among party-goers eating food prepared by this person.

In a point-source epidemic of a disease with a known incubation period, the epidemic curve can also identify the likely period of exposure.
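One common rule of thumb for doing this, which this lesson does not spell out, is to count back the minimum incubation period from the earliest case and the average incubation period from the median (or peak) case; if the two dates are close, they bracket the probable exposure. The sketch below applies that rule to hypothetical onset dates and hypothetical incubation periods.

```python
from datetime import date, timedelta

def likely_exposure_window(onsets, min_incubation_days, avg_incubation_days):
    """Rough exposure window for a point-source outbreak (rule of thumb)."""
    ordered = sorted(onsets)
    first_onset = ordered[0]
    median_onset = ordered[len(ordered) // 2]  # upper median for even-sized lists
    # Earliest case minus the shortest incubation period.
    from_first = first_onset - timedelta(days=min_incubation_days)
    # Median case minus the average incubation period.
    from_median = median_onset - timedelta(days=avg_incubation_days)
    return min(from_first, from_median), max(from_first, from_median)

# Hypothetical onsets and incubation periods (minimum 2 days, average 5 days).
onsets = [date(2024, 3, d) for d in (10, 11, 11, 12, 12, 12, 13, 14, 16)]
window = likely_exposure_window(onsets, min_incubation_days=2, avg_incubation_days=5)
print(window[0].isoformat(), "to", window[1].isoformat())
```

If the two estimates are far apart, the point-source assumption or the assumed incubation periods deserve a second look.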

Go through this exercise from CDC to learn more about constructing and interpreting epi-curves.

Characterizing by place

A simple technique for looking at geographic patterns is to plot on a 'spot map' the locations where the affected people live, work, or may have been exposed. A map of cases in a community may show clusters or patterns that reflect water supplies, wind currents, or proximity to a restaurant or grocery store. A classic example is John Snow's detection of the Broad St. water pump as the source of a cholera epidemic. On a spot map of a hospital, nursing home, or another residential facility, clustering may indicate either a primary source or person-to-person spread. The scattering of cases throughout a facility is more consistent with a common source such as a dining hall.

If the size of the overall population varies among the areas being compared, the spot map with the number of cases can be misleading. Indicating the proportion affected or the attack rate for each area would be a better approach.
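The short sketch below shows how the two views can disagree. The area names, case counts, and population sizes are hypothetical.

```python
# Hypothetical data: area -> (number of cases, population size).
areas = {
    "North": (30, 10_000),
    "South": (12, 1_500),
    "East": (5, 2_500),
}

def attack_rate_per_1000(cases: int, population: int) -> float:
    """Cases per 1,000 residents."""
    return 1000 * cases / population

rates = {name: attack_rate_per_1000(c, n) for name, (c, n) in areas.items()}

most_cases = max(areas, key=lambda name: areas[name][0])
highest_rate = max(rates, key=rates.get)

for name, rate in rates.items():
    print(f"{name}: {areas[name][0]} cases, {rate:.1f} per 1,000")
print("Most cases:", most_cases, "| Highest attack rate:", highest_rate)
```

In this made-up example the area with the most dots on a spot map is not the area where residents are at greatest risk.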

Characterizing by person

Define the populations at risk for the disease by characterizing an outbreak by personal characteristics such as age, race, sex, medical status, etc, and/or by exposures (e.g., occupation, leisure activities, use of medications, tobacco, drugs, etc.). Age and sex are characteristics often strongly related to exposure and risk; thus these factors are often assessed first. Other factors to be assessed are those possibly related to susceptibility to the disease and to opportunities for exposure for the disease being investigated and in the setting of the outbreak.

After characterizing an outbreak by time, place, and person, summarize and consider whether your initial hypotheses explain the outbreak or whether new hypotheses must be developed.

7.1.3 - Carrying Out Additional Studies

Additional epidemiological studies

When analytic epidemiological studies do not confirm your hypotheses, you need to reconsider your hypotheses and look for new vehicles or modes of transmission. This is the time to meet with case patients to look for common links and to visit their homes to look at the products on their shelves.

An investigation of an outbreak of Salmonella Muenchen in Ohio illustrates this point. A case-control study failed to turn up a food source as a common vehicle. Interestingly, people 15 to 35 years of age lived in all of the households with cases, but in only 41% of control households. This difference caused the investigators to consider vehicles of transmission to which young adults might be exposed. By asking about drug use in a second case-control study, the investigators found that marijuana was the likely vehicle. Laboratory analysts subsequently isolated the outbreak strain of S. Muenchen from several samples of marijuana provided by case patients.

Even when the analytic study identifies an association between exposure and a disease, you often will need to refine your hypotheses. Sometimes you will need to obtain more specific exposure histories or a more specific control group. For example, in a large community outbreak of botulism in Illinois, investigators used three sequential case-control studies to identify the vehicle. In the first study, investigators compared exposures of case patients and controls from the general public and implicated a restaurant. In a second study, they compared the menu items eaten by the case patients with those eaten by healthy restaurant patrons and identified a specific menu item, a meat and cheese sandwich. In a third study, appeals were broadcast over the radio to identify healthy restaurant patrons who had eaten the sandwich. It turned out that controls were less likely than case patients to have eaten the onions that came with the sandwich. Type A Clostridium botulinum was then identified from a pan of leftover sautéed onions used only to make that particular sandwich.

When an outbreak occurs, whether it is routine or unusual, you should consider what questions remain unanswered about the disease and what kind of study you might use in the particular setting to answer some of these questions. The circumstances may allow you to learn more about the disease, its modes of transmission, the characteristics of the agent, and host factors.

Laboratory and environmental studies

While epidemiology can implicate vehicles and guide appropriate public health action, laboratory evidence can clinch the findings. The laboratory was essential in the outbreak of salmonellosis linked to the use of contaminated marijuana. The investigation of the outbreak of Legionnaires' disease in Philadelphia mentioned earlier was not considered complete until the new organism was isolated in the laboratory over 6 months after the outbreak actually had occurred. Environmental studies often help explain why an outbreak occurred and may be very important in some settings. For example, in an investigation of an outbreak of shigellosis among swimmers in the Mississippi River, a local sewage plant was identified as the cause of the outbreak.

Implementing Control and Prevention Measures

In a real investigation, control and prevention measures should be implemented as soon as possible. Control measures should be aimed at specific links in the chain of infection: the agent, the source, or the reservoir. For example, an outbreak might be controlled by destroying contaminated foods, sterilizing contaminated water, destroying mosquito breeding sites, or requiring an infectious food handler to stay away from work until he or she is well.

In other situations, you might direct control measures at interrupting transmission or exposure. For example, to limit the airborne spread of an infectious agent among residents of a nursing home, you could use the method of "cohorting" by putting infected people together in a separate area to prevent exposure to others. You could instruct people wishing to reduce their risk of acquiring Lyme disease to avoid wooded areas or to wear insect repellent and protective clothing. Finally, in some outbreaks, you would direct control measures at reducing susceptibility. Two such examples are immunization against rubella and malaria chemoprophylaxis (prevention by taking antimalarial medications) for travelers.

7.1.4 - Developing and Evaluating Hypotheses

Developing Hypotheses

After interviewing affected individuals, gathering data to characterize the outbreak by time, place, and person, and consulting with other health officials, a disease detective will have more focused hypotheses about the source of the disease, its mode of transmission, and the exposures which cause the disease. Hypotheses should be stated in a manner that can be tested.

Hypotheses are developed in a variety of ways. First, consider the known epidemiology for the disease: What is the agent's usual reservoir? How is it usually transmitted? What are the known risk factors? Consider all the 'usual suspects.'

Open-ended conversations with those who fell ill or even visiting homes to look for clues in refrigerators and shelves can be helpful. If the epidemic curve points to a short period of exposure, ask what events occurred around that time. If people living in a particular area have the highest attack rates, or if some groups with a particular age, sex, or other personal characteristics are at greatest risk, ask "why?". Such questions about the data should lead to hypotheses that can be tested.

Evaluating Hypotheses

There are two approaches to evaluating hypotheses: comparison of the hypotheses with the established facts and analytic epidemiology, which allows testing hypotheses.

A comparison with established facts is useful when the evidence is so strong that the hypothesis does not need to be tested. A 1991 investigation of an outbreak of vitamin D intoxication in Massachusetts is a good example. All of the people affected drank milk delivered to their homes by a local dairy. Investigators hypothesized that the dairy was the source, and the milk was the vehicle of excess vitamin D. When they visited the dairy, they quickly recognized that far more than the recommended dose of vitamin D was inadvertently being added to the milk. No further analysis was necessary.

Analytic epidemiology is used when the cause is less clear. Hypotheses are tested, using a comparison group to quantify relationships between various exposures and disease. Case-control studies, and occasionally cohort studies, are useful for this purpose.

Case-control studies

As you recall from last week's lesson, in a case-control study case-patients and controls are asked about their exposures. An odds ratio is calculated to quantify the relationship between exposure and disease.

In general, the more case patients (and controls) you have, the easier it is to find an association. Often, however, an outbreak is small. For example, 4 or 5 cases may constitute an outbreak. An adequate number of potential controls is more easily located. In an outbreak of 50 or more cases, 1 control per case-patient will usually suffice. In smaller outbreaks, you might use 2, 3, or 4 controls per case-patient. More than 4 controls per case-patient are rarely worth the effort because the power of the study does not increase much when you have more than 4 controls per case-patient (we will talk more on power and sample size in epidemiologic studies later in this course!).
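The diminishing return from extra controls can be illustrated with a commonly cited approximation: with c controls per case, a study achieves roughly c/(c+1) of the statistical efficiency it would have with an unlimited number of controls. The sketch below simply tabulates that ratio; treat it as a rule of thumb, not an exact power calculation.

```python
def relative_efficiency(controls_per_case: int) -> float:
    """Approximate efficiency relative to an unlimited supply of controls."""
    return controls_per_case / (controls_per_case + 1)

for c in range(1, 7):
    print(f"{c} control(s) per case: {relative_efficiency(c):.3f}")
```

Going from 1 to 2 controls per case raises the ratio from 0.5 to about 0.67, while going from 4 to 5 adds only about 0.03.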

Testing statistical significance

The final step in testing a hypothesis is to determine how likely it is that the study results could have occurred by chance alone. In other words, is the exposure that the study points to as the source of the outbreak truly related to the disease? The significance of the odds ratio can be assessed with a chi-square test. We will also discuss statistical tests that control for many possible factors later in the course.
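A minimal sketch of both calculations for a 2×2 table follows. The cell labels (a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls) and the counts are hypothetical. The chi-square statistic is the uncorrected Pearson version with 1 degree of freedom; with the very small counts typical of many outbreaks, an exact test may be more appropriate.

```python
import math

def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a 2x2 table: (a*d) / (b*c)."""
    return (a * d) / (b * c)

def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Uncorrected Pearson chi-square (1 df) and its p-value."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical study: 30 of 40 cases and 10 of 40 controls ate the item.
a, b, c, d = 30, 10, 10, 30
or_estimate = odds_ratio(a, b, c, d)
chi2, p_value = chi_square_2x2(a, b, c, d)
print(f"OR = {or_estimate:.1f}, chi-square = {chi2:.1f}, p = {p_value:.2g}")
```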

Cohort studies

If the outbreak occurs in a small, well-defined population a cohort study may be possible. For example, if an outbreak of gastroenteritis occurs among people who attended a particular social function, such as a banquet, and a complete list of guests is available, it is possible to ask each attendee the same set of questions about potential exposures and whether he or she had become ill with gastroenteritis.

After collecting this information from each guest, an attack rate can be calculated for people who ate a particular item (were exposed) and an attack rate for those who did not eat that item (were not exposed). For the exposed group, the attack rate is found by dividing the number of people who ate the item and became ill by the total number of people who ate that item. For those who were not exposed, the attack rate is found by dividing the number of people who did not eat the item but still became ill by the total number of people who did not eat that item.

To identify the source of the outbreak from this information, you would look for an item with:

  • a high attack rate among those exposed and
  • a low attack rate among those not exposed (so the difference or ratio between attack rates for the two exposure groups is high); in addition
  • most of the people who became ill should have consumed the item, so that the exposure could explain most, if not all, of the cases.
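The three criteria above can be checked item by item. The sketch below uses hypothetical banquet data for 100 guests, 45 of whom became ill; the item names and counts are invented for illustration, and the function assumes at least one unexposed person became ill (otherwise the ratio is undefined).

```python
def attack_rates(ill_exposed, total_exposed, ill_unexposed, total_unexposed):
    """Attack rates, their ratio, and the share of all cases who were exposed."""
    ar_exposed = ill_exposed / total_exposed
    ar_unexposed = ill_unexposed / total_unexposed
    return {
        "ar_exposed": ar_exposed,
        "ar_unexposed": ar_unexposed,
        "risk_ratio": ar_exposed / ar_unexposed,
        "share_of_cases_exposed": ill_exposed / (ill_exposed + ill_unexposed),
    }

# Hypothetical data: item -> (ill and ate, ate, ill and did not eat, did not eat).
items = {
    "potato salad": (40, 50, 5, 50),
    "ham": (25, 55, 20, 45),
    "cake": (30, 70, 15, 30),
}

results = {name: attack_rates(*counts) for name, counts in items.items()}
best = max(results, key=lambda name: results[name]["risk_ratio"])

for name, r in results.items():
    print(
        f"{name}: exposed {r['ar_exposed']:.0%}, unexposed {r['ar_unexposed']:.0%}, "
        f"ratio {r['risk_ratio']:.2f}, cases exposed {r['share_of_cases_exposed']:.0%}"
    )
print("Most suspicious item:", best)
```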

We will learn more about cohort studies in Week 9 of this course.

7.1.5 - Communicating Findings

Finally, the findings of the investigation must be communicated to those who need to know. This communication usually takes two forms:

  1. an oral briefing for local health authorities, and
  2. a written report.

The oral briefing should be attended by the local health authorities and people responsible for implementing control and prevention measures. This presentation allows the investigator to describe what was done, what was found, and what should be done. Findings are presented in a scientifically objective fashion. Be prepared to defend your conclusions and recommendations.

A written report that follows the usual scientific format of introduction, background, methods, results, discussion, and recommendations will also be produced. By formally presenting recommendations, the report provides a blueprint for action. It also serves as a record of performance, a document for potential legal issues, and a reference if the health department encounters a similar situation in the future. Finally, a report that finds its way into the public health literature serves the broader purpose of contributing to the scientific knowledge base of epidemiology and public health.

You can learn about past and ongoing Salmonella outbreaks that involve CDC. Outbreaks are more frequent than one might think!

7.2 - Advanced Case-Control Designs

Nested Case-Control Study:

This is a case-control study within a cohort study. At the beginning of the cohort study \((t_0)\), members of the cohort are assessed for risk factors. Cases and controls are identified subsequently at time \(t_1\). The control group is selected from the risk set (cohort members who do not meet the case definition at \(t_1\)). Typically, the nested case-control study includes less than 20% of the parent cohort.
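The selection step can be sketched as follows. This is a simplified, unmatched illustration under stated assumptions: the cohort of 200 members, the rule that marks 8 of them as cases at \(t_1\), and the choice of 2 controls per case are all hypothetical.

```python
import random

def nested_case_control_sample(cohort_status, controls_per_case, seed=0):
    """Keep all cases; draw controls at random from cohort members who are not cases."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    cases = [pid for pid, is_case in cohort_status.items() if is_case]
    risk_set = [pid for pid, is_case in cohort_status.items() if not is_case]
    controls = rng.sample(risk_set, controls_per_case * len(cases))
    return cases, controls

# Hypothetical parent cohort of 200; every 25th member is a case at t1.
cohort_status = {f"ID{i:03d}": (i % 25 == 0) for i in range(1, 201)}
cases, controls = nested_case_control_sample(cohort_status, controls_per_case=2)

sampled_fraction = (len(cases) + len(controls)) / len(cohort_status)
print(len(cases), "cases,", len(controls), "controls,", f"{sampled_fraction:.0%} of cohort")
```

Only the sampled members would need their stored baseline measurements retrieved or analyzed, which is where the efficiency of the design comes from.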

Advantages of nested case-control

  • Efficient – not all members of the parent cohort require diagnostic testing
  • Flexible – allows testing of hypotheses not anticipated when the cohort was drawn (at \(t_0\))
  • Reduces selection bias – cases and controls sampled from the same population
  • Reduces information bias – risk factor exposure can be assessed with investigator blind to case status


Disadvantages of nested case-control

  • Reduces power (from parent cohort) because of reduced sample size by 1/(c+1), where c = number of controls per case

Nested case-control studies can be matched, unmatched, or counter-matched.

Matching cases to controls according to baseline measurements of one or several confounding variables is done to control for the effect of those confounding variables. In a counter-matched study, in contrast, cases are matched to controls who have a different baseline risk factor exposure level. The counter-matched study design is used to specifically assess the impact of this risk factor; it is especially good for assessing the potential interaction (effect modification!) of the secondary risk factor and the primary risk factor. Counter-matched controls are randomly selected from different strata of risk factor exposure levels in order to maximize variation in risk exposures among the controls. For example, in a study of the risk for bladder cancer from alcohol consumption, you might match cases to controls who smoke different amounts to see if the effect of smoking is only evident at a minimum level of exposure.

Example of a Nested Case-Control Study: Familial, psychiatric, and socioeconomic risk factors for suicide in young people: a nested case-control study. In a cohort study of risk factors for suicide, Agerbo et al. (2002) enrolled 496 young people in Denmark who had committed suicide during 1981-97, matched for sex, age, and time to 24,800 controls. Read how they matched each case to a representative random subsample of 50 people born the same year!

7.2.1 - Case-Cohort Study Design

A case-cohort study is similar to a nested case-control study in that the cases and non-cases are within a parent cohort; cases and non-cases are identified at time \(t_1\), after baseline. In a case-cohort study, the cohort members may be assessed for risk factors at any time prior to \(t_1\). Non-cases are randomly selected from the parent cohort, forming a subcohort. No matching is performed.

Advantages of Case-Cohort Study:

Similar to nested case-control study design:

  • Efficient – not all members of the parent cohort require diagnostic testing
  • Flexible – allows testing of hypotheses not anticipated when the cohort was drawn (at \(t_0\))
  • Reduces selection bias – cases and non-cases sampled from the same population
  • Reduces information bias – risk factor exposure can be assessed with investigator blind to case status

Other advantages, as compared to nested case-control study design:

  • The subcohort can be used to study multiple outcomes
  • Risk can be measured at any time up to \(t_1\) (e.g. elapsed time from a variable event, such as menopause, birth)
  • Subcohort can be used to calculate person-time risk

Disadvantages of Case-Cohort Study:

As compared to nested case-control study design:

  • Increased potential for information bias because
    • subcohort may have been established after \(t_0\)
    • exposure information collected at different times (e.g. potential for sample deterioration)

Statistical Analysis for Case-Cohort Study:

Weighted Cox proportional hazards regression model (we will look at proportional hazards regression later in this course)
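To make the role of the weights concrete, the Python sketch below builds a case-cohort analysis set and attaches a simple inverse-sampling-fraction weight to each record. This is an illustrative assumption, not the weighting scheme prescribed by this lesson: all cases get weight 1, and non-case subcohort members get weight 1/f, where f is the subcohort sampling fraction. The field names, function name, and toy cohort are hypothetical, and fitting the weighted Cox model itself is not shown.

```python
import random

def case_cohort_sample(cohort, sampling_fraction, seed=0):
    """Return the analysis set: the random subcohort plus all cases, with weights."""
    rng = random.Random(seed)
    n_sub = round(len(cohort) * sampling_fraction)
    subcohort_ids = {m["id"] for m in rng.sample(cohort, n_sub)}
    analysis = []
    for m in cohort:
        in_sub = m["id"] in subcohort_ids
        if m["is_case"]:
            weight = 1.0  # every case is included, so no up-weighting
        elif in_sub:
            weight = 1.0 / sampling_fraction  # stands in for unsampled non-cases
        else:
            continue  # non-cases outside the subcohort are not analyzed
        analysis.append(
            {"id": m["id"], "is_case": m["is_case"], "in_subcohort": in_sub, "weight": weight}
        )
    return analysis

# Toy cohort: 100 members, every 20th member is a case (made-up data).
cohort = [{"id": i, "is_case": i % 20 == 0} for i in range(100)]
analysis = case_cohort_sample(cohort, sampling_fraction=0.2)
print(len(analysis), "records in the analysis set")
```

Because the subcohort is a random sample that does not depend on any one outcome, the same subcohort can be reused with a different set of cases, which is the multiple-outcomes advantage listed above.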

7.2.2 - Case-Crossover Study Design

This design is useful when the risk factor/exposure is transient. For example, cell phone use or sleep disturbances are transitory occurrences. Each case serves as its own control, i.e., the study is self-matched. For each person, there is a 'case window', the period of time during which the person was a case, and a 'control window', a period of time associated with not being a case. Risk exposure during the case window is compared to risk exposure during the control window.
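The comparison of the two windows can be summarized by counting, for each person, whether the exposure was present in the case window, the control window, both, or neither. The Python sketch below does this for a handful of made-up records; the field names and function name are hypothetical.

```python
from collections import Counter

def tabulate_windows(records):
    """Count the four exposure patterns across (case window, control window)."""
    counts = Counter(
        (bool(r["case_window"]), bool(r["control_window"])) for r in records
    )
    return {
        "both": counts[(True, True)],
        "case_only": counts[(True, False)],
        "control_only": counts[(False, True)],
        "neither": counts[(False, False)],
    }

# Each record is one person: was the exposure present in each window?
records = [
    {"case_window": True, "control_window": False},
    {"case_window": True, "control_window": True},
    {"case_window": False, "control_window": False},
    {"case_window": True, "control_window": False},
    {"case_window": False, "control_window": True},
]
table = tabulate_windows(records)
print(table)
```

People whose exposure is the same in both windows ("both" or "neither") carry no information about the association; the analysis rests on the two discordant counts.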

Advantages of Case-crossover

  • Efficient – self-matching
  • Efficient – select only cases
  • Can use multiple control windows for one case window

Disadvantages of Case-crossover

  • Information bias – inaccurate recall of exposure during the control window (can be overcome by choosing a control window that occurs after the case window)
  • Requires careful selection of the time period during which the control window occurs (circumstances associated with the control window should be similar to those associated with the case window; e.g., traffic volume)
  • Requires careful selection of the length and timing of the windows (e.g., in an investigation of the risk of cell phone usage on auto accidents, cell phone usage that ceases 30 minutes before an accident is unlikely to be relevant to the accident)

Analysis of Case-crossover

  • Matched case-control analysis

Example of a Case-crossover study

Valent F, Brusaferro S, Barbone F. A case-crossover study of sleep and childhood injury. Pediatrics 2001;107:E23. In: Woodward M. Epidemiology: Study Design and Data Analysis. 2nd Ed. London: Chapman and Hall. 2005.

In this Italian case-crossover study of sleep disturbance and injury amongst children (Valent et al., 2001), each child was asked about her or his sleep in the 24 hours before the injury occurred (the case window) and in the 24 hours before that (the control window). Amongst 181 boys, 40 had less than 10 hours sleep on both the days concerned; 111 had less than 10 hours sleep on neither day; 21 had less than 10 hours sleep only on the day before the injury; and 9 had less than 10 hours sleep only on the penultimate day before the injury. Only the 30 boys whose sleep differed between the two days contribute to the matched analysis, giving 21/9 = 2.33. The odds ratio for injury, comparing days with less than 10 hours sleep to days with 10 hours or more, is therefore 2.33 (95% confidence interval: 1.02, 5.79).
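The matched odds ratio in this example depends only on the discordant counts (21 and 9). The Python sketch below computes it, together with an exact (Clopper-Pearson) confidence interval obtained by bisection on binomial tail probabilities. The choice of the exact method is an assumption, since the text does not say how the quoted interval was calculated, and the function names are hypothetical. Compare the printed result with the interval quoted above.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def _root(func, lo=0.0, hi=1.0, iters=100):
    """Bisection for a function that increases from negative to positive on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if func(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_binomial_ci(x, n, alpha=0.05):
    """Clopper-Pearson interval for a binomial proportion x/n."""
    lower = 0.0 if x == 0 else _root(lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    upper = 1.0 if x == n else _root(lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper

def matched_pair_odds_ratio(case_only, control_only, alpha=0.05):
    """Odds ratio from discordant pairs, with an exact interval on the odds scale."""
    if control_only == 0:
        raise ValueError("odds ratio is undefined when control_only is 0")
    n = case_only + control_only
    p_lo, p_hi = exact_binomial_ci(case_only, n, alpha)

    def to_odds(p):
        return p / (1 - p) if p < 1 else float("inf")

    return case_only / control_only, to_odds(p_lo), to_odds(p_hi)

# Discordant counts from the sleep and injury example (boys only).
or_hat, ci_low, ci_high = matched_pair_odds_ratio(case_only=21, control_only=9)
print(f"OR = {or_hat:.2f}, 95% CI ({ci_low:.2f}, {ci_high:.2f})")
```

The 40 boys exposed on both days and the 111 exposed on neither day do not enter the calculation.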

You have now completed reading the material for Lesson 7. You are ready to complete the homework.
