Lesson 4  Comparing Groups In Terms of Disease Occurrence and Frequency
Lesson 4 Objectives
 Organize disease frequency data into a 2 x 2 epidemiological table.
 Calculate and describe Risk Ratios and Odds Ratios to compare groups.
 Calculate and describe Risk Differences and Population Attributable Risk to compare groups.
 Recognize situations in which direct or indirect standardization should be considered.
 Given the required data, standardize a rate with a direct and indirect method.
4.1  Example Research Hypotheses & Measurement Calculations
Research Hypotheses
Suppose our goal is to compare two populations with regard to disease or exposure frequency. We wish to use precise, valid measures of disease frequency such as point prevalence, period prevalence, cumulative incidence, or incidence density. The two populations must be distinct in location, time, or exposure status. We'd like to apply statistical tests to these measures to see whether any difference is likely to have occurred by chance.
There are many examples of hypotheses that are comparative in nature, such as:
 High salt intake increases the incidence of heart disease.
→Compare the incidence of heart disease among persons with high salt intake with those who have a low salt intake.
 The prevalence of screening mammograms differs between managed care organization A and managed care organization B.
→Compare the prevalence of screening mammograms for members of managed care organization A and managed care organization B.
 The incidence of lower extremity amputations among diabetics can be reduced through quarterly foot examinations.
→Compare the incidence of lower extremity amputations among diabetics who receive quarterly foot examinations to the incidence among diabetics who receive less than quarterly foot examinations.
 A high intake of Vitamin C reduces the prevalence of colds.
→Compare the prevalence of colds among persons with a high intake of Vitamin C with that among persons with a low intake.
 Dizziness is associated more frequently with therapeutic agent A than with agent B.
→Compare the incidence of dizziness for patients receiving A with those receiving B.
 The administration of presurgical antibiotics decreases the rate of wound infections.
→Compare the incidence of wound infections for persons who receive antibiotics with those who do not receive antibiotics.
Data Organization: 2x2 table
Consider the cohort study An Association between Air Pollution and Mortality in Six U.S. Cities, which investigated the relationship between air pollution and mortality. The researchers were interested in air pollution as the exposure and mortality as the outcome.
Exposure refers to the characteristic of interest that the researcher hypothesizes may be associated with or causing a certain outcome. Often in epidemiological studies, the outcome of interest is a certain disease. Those who develop the disease are often referred to as cases, while those that do not are referred to as noncases.
Data from a study that includes a risk factor (exposure) and indicators of the presence or absence of disease is often summarized as shown below:
Category  Cases (Number)  Non-Cases (Number)  Total Exposure (Number) 

Exposed  A  B  Total_{Exposed} 
Not Exposed  C  D  Total_{NotExposed} 
Total  Total_{Cases}  Total_{NonCases}  Total 
For the air pollution cohort study, the following tables can be constructed.
Category  Dead  Alive*  Total 

High Pollution (Ohio)  291  1060  1351 
Low Pollution (Wisconsin)  232  1399  1631 
Total  523  2459  2982 
*This column was calculated by subtracting the number of deaths in Table 1 of the manuscript from the total number of participants. See...
Category  Dead  Alive*  Person-Years 

High Pollution (Ohio)  291  -  17914 
Low Pollution (Wisconsin)  232  -  21618 
Total  523  -  39532 
*For an incidence rate, the number of non-diseased (i.e., alive) participants is not necessary. Instead, we need the person-years of follow-up contributed by all participants at risk.
Measures of Disease Frequency
Disease Prevalence [by Exposure Status]
 For Exposed: A/(A+B)
 In our pollution example, this would be 291/1351 = 0.215. Thus, 21.5% of the participants from the high pollution city (i.e., the exposed group) died.
 For Not Exposed: C/(C+D)
 In our pollution example, this would be 232/1631 = 0.142. Thus, 14.2% of the participants from the low pollution city (i.e., the non-exposed group) died.
Exposure Prevalence [by Disease Status]
 For Cases: A/(A+C)
 In our pollution example, this would be 291/523= 0.556. Thus, 55.6% of the participants who died were from the high pollution city.
 For Noncases: B/(B+D)
 In our pollution example, this would be 1060/2459= 0.431. Thus, 43.1% of the participants who did not die were from the high pollution city.
Odds of Disease [by Exposure Status]
 For Exposed: A/B
 In our pollution example, this would be 291/1060. Thus, the odds of dying in the high pollution city are 291:1060, which can be simplified to 1:3.64. (This value is hardly ever reported, but is needed to calculate the odds ratio, which will be presented later.)
 For Non-Exposed: C/D
 In our pollution example, this would be 232/1399. Thus, the odds of dying in the low pollution city are 232:1399, which can be simplified to 1:6.03.
Odds of Exposure [by Disease Status]
 For Cases: A/C
 In our pollution example, this would be 291/232= 1.25:1. Thus, the odds of being from the high pollution city are 1.25:1 for those who died.
 For Noncases: B/D
 In our pollution example, this would be 1060/1399= 0.76:1. Thus, the odds of being from the high pollution city are 0.76:1 for those who did not die.
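The measures above can be reproduced with a few lines of code. The following is a minimal sketch, not part of the original lesson: the function name `frequency_measures` and the dictionary keys are illustrative assumptions, and the cell counts A, B, C, D are taken from the two-city table.

```python
# Cell counts from the two-city 2 x 2 table (A, B, C, D as in the generic layout).
a, b = 291, 1060   # exposed (high pollution): dead, alive
c, d = 232, 1399   # not exposed (low pollution): dead, alive


def frequency_measures(a, b, c, d):
    """Return the proportions and odds from a 2 x 2 table (hypothetical helper)."""
    return {
        "disease_prev_exposed": a / (a + b),
        "disease_prev_unexposed": c / (c + d),
        "exposure_prev_cases": a / (a + c),
        "exposure_prev_noncases": b / (b + d),
        "disease_odds_exposed": a / b,
        "disease_odds_unexposed": c / d,
        "exposure_odds_cases": a / c,
        "exposure_odds_noncases": b / d,
    }


measures = frequency_measures(a, b, c, d)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Note that the odds are printed as single numbers (for example, 291/1060 is about 0.275), which is the same quantity as the 1:3.64 form used in the text.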
There are two ways to compare measures between groups: ratios and differences. The next few sections will outline both methods and show examples using the air pollution study. Also, note that for these examples, estimates for the cumulative incidences and incidence rates are similar, but that is not always the case. In this study, person-years were similar across groups, resulting in similar estimates.
4.2  Using Ratios to Compare Two Populations
A ratio may be used to convey the strength of an effect or association between two population groups, or the relative 'risk' of the study (e.g., exposed) group compared to a comparison (e.g., unexposed) group. A ratio is not dependent on the prevalence of exposure among the study population.
 Ratio
 \(\dfrac{\text { Disease Frequency }(\text { Population } A)}{\text { Disease Frequency }(\text { Population } B)}\)
A ratio can be reported with upper and lower bounds (a confidence interval). We will learn some formulas for these calculations in a later lesson. A ratio of exactly 1 means the two groups have the same disease frequency; when there is no significant difference between groups, the confidence interval for the ratio will include 1.
Ratio Calculations
Risk Ratios
Cumulative incidence ratio
 More generally can be thought of as:
 Ratio of Disease Incidence= [A/(A+B)] / [C/(C+D)]
 In our pollution example, this would be [291/1351] / [232/1631] = 1.51. Thus, participants from high-pollution cities are 1.51 times as likely as those from low-pollution cities to die. This makes sense since we saw that the cumulative incidence of death was about 21.5% in the high pollution city, and about 14.2% in the low pollution city.
Incidence rate ratio
 Follows the same general formula, but instead of comparing incidences, we are comparing incidence rates. So first, we need to calculate the incidence rate in each city.
 High pollution city incidence rate of death = 291 deaths / 17914 person-years, which is 16.24 deaths per 1000 person-years
 Low pollution city incidence rate of death = 232 deaths / 21618 person-years, which is 10.73 deaths per 1000 person-years
 In our pollution example this would be (16.24/1000 person-years) / (10.73/1000 person-years) = 1.51
Odds Ratio
 Exposure Odds Ratio [A/C] / [B/D] = [A*D] / [B*C]
 Disease Odds Ratio [A/B] / [C/D] = [A*D] / [B*C]
 Both simplify to the same OR
 In our pollution example, this would be (291*1399) / (1060*232) = 1.66. Thus, participants from the high-pollution cities have 1.66 times the odds of dying compared with those from low-pollution cities.
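As a check on the arithmetic, the three ratios can be computed directly from the table counts. This is a minimal sketch under the assumption that the person-years are those listed in the two-city table (17914 and 21618); the variable names are illustrative and not from the original lesson.

```python
# 2 x 2 cell counts and person-years from the two-city tables.
a, b, c, d = 291, 1060, 232, 1399
py_exposed, py_unexposed = 17914, 21618

# Cumulative incidence (risk) ratio: [A/(A+B)] / [C/(C+D)]
risk_ratio = (a / (a + b)) / (c / (c + d))

# Incidence rates per 1000 person-years, then their ratio.
rate_exposed = a / py_exposed * 1000
rate_unexposed = c / py_unexposed * 1000
rate_ratio = rate_exposed / rate_unexposed

# Odds ratio: (A*D) / (B*C); note the parentheses around the denominator.
odds_ratio = (a * d) / (b * c)

print(f"risk ratio  = {risk_ratio:.2f}")
print(f"rate ratio  = {rate_ratio:.2f}")
print(f"odds ratio  = {odds_ratio:.2f}")
```

The odds ratio (1.66) is further from 1 than the risk ratio (1.51) here because death is not a rare outcome in these cohorts.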
4.3  Using Differences to Compare Two Populations
Alternatively, differences can be calculated between the estimates for the two groups. The difference can be reported with a confidence interval that includes upper and lower bounds. If the confidence interval includes 0, this indicates that there is no significant difference between the groups. If the interval does not include 0, there is an increased risk for one population compared to the other (or conversely, a decreased risk). The difference can convey an excess or decreased risk among the exposed group due to exposure, possibly an excess or decreased risk that would be removed if the exposure ends, a potential reduction in risk for exposed individuals, or the absolute risk of the exposure.
Differences: Disease Frequency(Population A) - Disease Frequency(Population B)
Difference Calculations
Risk Difference
Cumulative incidence difference
 More generally can be thought of as:
 Difference of Disease Incidence = [A/(A+B)] - [C/(C+D)]
 In our pollution example, this would be [291/1351] - [232/1631] = 21.5% - 14.2% = 7.3%. Thus, the absolute risk of death is 7.3 percentage points higher for participants from high-pollution cities than for participants from low-pollution cities.
Incidence rate difference
 In our pollution example, this would be (16.24/1000 person-years) - (10.73/1000 person-years) = 5.51/1000 person-years. Thus, there are 5.51 excess deaths per 1000 person-years among those in the high pollution city. Alternatively, the number of deaths could be reduced by 5.51 per 1000 person-years if the pollution level in the high-pollution city were reduced to that of the low-pollution city.
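The two differences can be verified the same way. This is a minimal sketch using the counts and person-years from the two-city tables; the variable names are assumptions introduced for illustration.

```python
# 2 x 2 cell counts and person-years from the two-city tables.
a, b, c, d = 291, 1060, 232, 1399
py_exposed, py_unexposed = 17914, 21618

# Cumulative incidence (risk) difference: [A/(A+B)] - [C/(C+D)]
risk_difference = a / (a + b) - c / (c + d)

# Incidence rate difference, expressed per 1000 person-years.
rate_difference = (a / py_exposed - c / py_unexposed) * 1000

print(f"risk difference = {risk_difference * 100:.1f} percentage points")
print(f"rate difference = {rate_difference:.2f} per 1000 person-years")
```

Computing the rate difference from unrounded rates gives 5.51, which matches subtracting the rounded values 16.24 and 10.73.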
Attributable Proportion among the Total Population (AP_{t})
(Also known as population attributable risk (PAR))
The Attributable Proportion among the Total Population depends upon the prevalence of the exposure in the study population. This value is often used to convey implications for policy or regulations.
General Formula
 \(\mathrm{AP}_{\mathrm{t}}=\dfrac{\text{Risk(study population)} - \text{Risk(unexposed group)}}{\text{Risk(study population)}}\)
Incidence rate AP_{t}
 First, we need to calculate the incidence rate in the entire study population (all six cities, not only the two cities tabulated above). This would be the sum of all the deaths (1430) divided by the sum of all the person-years (111076) = 12.87 deaths per 1000 person-years.
 In our example, this would be [(12.87/1000 person-years - 10.73/1000 person-years)] / (12.87/1000 person-years) = 0.166. Thus, 16.6% of the deaths in the population are attributable to the high pollution levels and, assuming the association is causal, would be eliminated if the pollution levels were reduced.
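The attributable proportion can be sketched in code as follows. The totals (1430 deaths, 111076 person-years) and the low pollution city figures (232 deaths, 21618 person-years) come from the text; the function name `attributable_proportion_total` is a hypothetical helper, not part of the lesson.

```python
def attributable_proportion_total(rate_total, rate_unexposed):
    """AP_t = (rate in total population - rate in unexposed) / rate in total population."""
    return (rate_total - rate_unexposed) / rate_total


# Incidence rates per 1000 person-years.
rate_total = 1430 / 111076 * 1000      # entire study population
rate_unexposed = 232 / 21618 * 1000    # low pollution city

ap_t = attributable_proportion_total(rate_total, rate_unexposed)

print(f"total rate     = {rate_total:.2f} per 1000 person-years")
print(f"unexposed rate = {rate_unexposed:.2f} per 1000 person-years")
print(f"AP_t           = {ap_t:.3f}")
```

Because the numerator and denominator share the same units, the result is a unitless proportion regardless of whether rates are expressed per 1000 or per person-year.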
4.4  Standardization
When comparing groups, it is important to make sure we are making a fair comparison. Thus, it is helpful to standardize the rates, in order to remove the effect of a potential confounder (often age), which might differ between populations and could distort the results. Standardization is also helpful when comparing rates of one population over time, such as monitoring disease in a population over many years.
Disease can be measured in one population or compared between populations. Within one population, it is common to summarize disease burden with the number of cases. Another measure is the crude rate (i.e., x cases / y population at-risk), which you will also recognize as the cumulative incidence. If the distribution of a modifier of disease frequency (such as age) differs between two populations, however, a comparison of the crude rates in the two populations can mask or distort the true difference in rates.
 A standardized rate is a measure of disease frequency that facilitates comparisons of populations with a different distribution of one or more potential confounding variables (e.g., x cases / y population at-risk, adjusted to remove the effect of a potential confounder such as age).
 Age-specific rates (i.e., x cases in a specific age group / population at-risk in the same age group) are also useful in summarizing the health status of a population.
Types of Standardization
There are two different approaches to standardizing a rate:
Direct standardization
Direct standardization, more commonly used, creates a summary disease rate for a population that would be expected if the study population had a population distribution identical to that of an arbitrarily chosen standard population. A reference population is used as the standard population. The standardized rate is the sum of weighted group-specific rates, with weights derived from the standard population. The weights sum to 1.0. A standardized rate is essentially a weighted average of age-specific rates.
\(I_{W}=\dfrac{\sum W_{i} I_{i}}{\sum W_{i}}\)
where I_{i} is a group-specific rate, W_{i} is the weight for group i (its share of the standard population), and \(\sum W_{i}=1\)
The necessary data for direct standardization are the group-specific disease rates for the study population and the population distribution from the standard population.
Stop and Think!
Do you understand how direct adjustment is a weighted average of age-specific rates?
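To make the weighted-average idea concrete, here is a minimal sketch of direct standardization. All numbers in it are hypothetical and chosen only for illustration (they do not come from the lesson or from any real population), and the function name `direct_standardized_rate` is likewise an assumption.

```python
def direct_standardized_rate(group_rates, standard_counts):
    """Weighted average of group-specific rates, weighted by the standard population."""
    total = sum(standard_counts)
    weights = [n / total for n in standard_counts]  # weights sum to 1.0
    return sum(w * rate for w, rate in zip(weights, group_rates))


# HYPOTHETICAL age-specific rates in the study population (per 1000),
# for three illustrative age groups: young, middle, old.
study_rates = [1.0, 5.0, 20.0]

# HYPOTHETICAL standard population counts for the same three age groups.
standard_counts = [50_000, 30_000, 20_000]

adjusted = direct_standardized_rate(study_rates, standard_counts)
print(f"age-adjusted rate = {adjusted:.2f} per 1000")
```

With these made-up inputs the weights are 0.5, 0.3, and 0.2, so the adjusted rate is 0.5(1.0) + 0.3(5.0) + 0.2(20.0) = 6.0 per 1000. Applying the same standard weights to a second population's age-specific rates would give a rate that can be compared fairly with this one.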
Indirect standardization
Indirect standardization also produces a weighted average, through the production of a summary disease rate for the study population which would be expected if the disease experience of the study population were identical to that of a standard population. The standard population is arbitrarily chosen, but should be as similar as possible to the study population. Indirect adjustment is used when accurate group-specific rates for the study population are not available. If these rates are available, direct adjustment is preferred because it uses more information from the study population. Indirect adjustment produces an expected rate. Observed and expected rates are typically compared as a standardized ratio. Indirect adjustment is often used in occupational health to calculate standardized mortality ratios, which are obtained by dividing the observed death rate by the expected death rate.
The data required for indirect standardization are the crude rate for the study population, the population distribution for the study population, and group-specific rates for the standard population.
Example
Consider the data below, for which the researcher could not obtain the gender-specific rates.
From a standard population, it is known that the crude rate is 1.5/1000, the male rate is 2.2/1000, and the female rate is 0.9/1000.
If we also know that:
 Group 1 male 60%, female 40%
 Group 2 male 80%, female 20%
What is the expected crude rate for group 1?
It is (2.2 × 0.6) + (0.9 × 0.4) = 1.68 per 1000. This expected rate is what would then be compared with the observed crude rate for group 1.
Stop and Think!
Come up with an answer to this question by yourself before reading the solution.
What is the expected crude rate for group 2?
Answer: the expected crude rate is (2.2 × 0.8) + (0.9 × 0.2) = 1.94 per 1000.
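The expected rates in this example can be sketched in code. The standard-population rates and the male/female proportions are taken from the example; the function names `expected_crude_rate` and `standardized_ratio` are hypothetical helpers. No observed rate is supplied here, so the ratio function is defined but not applied to the example groups.

```python
def expected_crude_rate(standard_rates, study_proportions):
    """Apply the standard population's group-specific rates to the study group's distribution."""
    return sum(standard_rates[g] * study_proportions[g] for g in standard_rates)


def standardized_ratio(observed_rate, expected_rate):
    """Observed / expected rate (e.g., a standardized mortality ratio)."""
    return observed_rate / expected_rate


# Gender-specific rates per 1000 in the standard population (from the example).
standard_rates = {"male": 2.2, "female": 0.9}

# Gender distribution of each study group (from the example).
group1 = {"male": 0.6, "female": 0.4}
group2 = {"male": 0.8, "female": 0.2}

expected_group1 = expected_crude_rate(standard_rates, group1)
expected_group2 = expected_crude_rate(standard_rates, group2)

print(f"expected rate, group 1 = {expected_group1:.2f} per 1000")
print(f"expected rate, group 2 = {expected_group2:.2f} per 1000")
```

Group 2 has the higher expected rate simply because it contains a larger share of men, the group with the higher rate in the standard population.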
Try it! Comparison of Direct and Indirect Standardization Methods
Try the following true/false questions to test your knowledge.

Age-adjusted rates are measures of mortality risk, conveying the magnitude of a health problem.
False

Age-adjusted rates can be compared regardless of the standard population used.
False
Direct standardized rates are only comparable if the same standard population is used. For example, the US standard population of 1940 was considerably younger than the US standard population based on the 2000 census. This will affect the adjusted rates. Always pay attention to the reference population when comparing standardized rates. 
Direct age adjustment is an average of the observed age-specific rates, weighting each age-specific rate by the proportion of that same age group in a standard population.
True

Indirect adjustment is preferred if there are only a few cases across all age groups.
True

If I want to understand the magnitude of a health problem, the appropriate statistic is the number of events.
True

To explore the underlying risk in a population, the appropriate statistic is a crude rate with its confidence interval.
True

To compare populations on the basis of differences in risk, after controlling for age, the appropriate statistic is the crude relative risk.
False
The appropriate statistics are the age-adjusted rates with their confidence intervals, or the relative risk (rate ratio) adjusted for age.
4.5  Lesson 4 Summary
Describing the state of public health is an important component of epidemiology, which was presented in Lesson 3. The next step of comparing outcomes between groups was introduced in Lesson 4. We saw some examples of situations when the goal is to compare groups, and learned the two main ways to do so: absolute (differences) and relative (ratios). An additional point to consider is that the groups may differ on a characteristic that affects the measure, most often age, so standardization is needed in order to make fair comparisons.