# Lesson 4 - Comparing Groups In Terms of Disease Occurrence and Frequency

## Lesson 4 Objectives

Upon completion of this lesson, you should be able to:

• Organize disease frequency data into a 2 x 2 epidemiological table.
• Calculate and describe Risk Ratios and Odds Ratios to compare groups.
• Calculate and describe Risk Differences and Population Attributable Risk to compare groups.
• Recognize situations in which direct or indirect standardization should be considered.
• Given the required data, standardize a rate with a direct and indirect method.

# 4.1 - Example Research Hypotheses & Measurement Calculations

## Research Hypotheses

Suppose our goal is to compare two populations with regard to disease or exposure-disease frequency. We wish to use precise, valid measures of disease frequency, such as point prevalence, period prevalence, cumulative incidence, or incidence density. The two populations must be distinct in location, time, or exposure status. We would then like to apply statistical tests to these measures to see whether any observed difference is likely to have occurred by chance.

There are many examples of hypotheses that are comparative in nature, such as:

• High salt intake increases the incidence of heart disease.

→Compare the incidence of heart disease among persons with high salt intake with those who have a low salt intake.

• The prevalence of screening mammograms is not the same in managed care organization A and managed care organization B.

→Compare the prevalence of screening mammograms for members of managed care organization A and managed care organization B.

• The incidence of lower extremity amputations among diabetics can be reduced through quarterly foot examinations.

→Compare the incidence of lower extremity amputations among diabetics who receive quarterly foot examinations to the incidence among diabetics who receive less than quarterly foot examinations.

• A high intake of Vitamin C reduces the prevalence of colds.

→Compare the prevalence of colds among persons with a high intake of Vitamin C with the prevalence among those with a low intake.

• Dizziness is associated more frequently with therapeutic agent A than with agent B.

→Compare the incidence of dizziness for patients receiving A with those receiving B.

• The administration of pre-surgical antibiotics decreases the rate of wound infections.

→Compare the incidence of wound infections for persons who receive antibiotics with those who do not receive antibiotics.

## Data Organization: 2x2 table

Consider the cohort study "An Association between Air Pollution and Mortality in Six U.S. Cities," which investigated the relationship between air pollution and mortality in six U.S. cities. The researchers were interested in air pollution as the exposure and mortality as the outcome.

Exposure refers to the characteristic of interest that the researcher hypothesizes may be associated with, or may cause, a certain outcome. In epidemiological studies, the outcome of interest is often a particular disease. Those who develop the disease are referred to as cases, while those who do not are referred to as non-cases.

Data from a study that includes a risk factor (exposure) and indicators of the presence or absence of disease is often summarized as shown below:

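2 × 2 Table for an Epidemiologic Study

| Category | Cases (Number) | Non-Cases (Number) | Total Exposure (Number) |
|---|---|---|---|
| Exposed | A | B | Total Exposed (A + B) |
| Not Exposed | C | D | Total Not Exposed (C + D) |
| Total | Total Cases (A + C) | Total Non-Cases (B + D) | Total (A + B + C + D) |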

For the air pollution cohort study, the following tables can be constructed.

Six cities cumulative incidence of mortality data

| Category | Dead | Alive* | Total |
|---|---|---|---|
| High Pollution (Ohio) | 291 | 1060 | 1351 |
| Low Pollution (Wisconsin) | 232 | 1399 | 1631 |
| Total | 523 | 2459 | 2982 |

*This column was calculated by subtracting the number of deaths in Table 1 of the manuscript from the total number of participants. See...

Six cities incidence rate of mortality data

| Category | Dead | Alive* | Person-Years |
|---|---|---|---|
| High Pollution (Ohio) | 291 | | 17914 |
| Low Pollution (Wisconsin) | 232 | | 21618 |
| Total | 523 | | 39532 |

*For the incidence rate, the number of non-diseased (i.e., alive) participants is not needed. Instead, we need the total person-years of follow-up contributed by all participants in each group, not only by those who experienced the outcome.

## Measures of Disease Frequency

#### Disease Prevalence [by Exposure Status]

• For Exposed: A/(A+B)
• In our pollution example, this would be 291/1351 = 0.215. Thus, 21.5% of the participants from the high-pollution city (i.e., the exposed group) died.
• For Not Exposed: C/(C+D)
• In our pollution example, this would be 232/1631 = 0.142. Thus, 14.2% of the participants from the low-pollution city (i.e., the non-exposed group) died.
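A minimal Python sketch of these two row proportions, using the counts from the six-cities table above; the helper name `row_proportion` is illustrative only.

```python
# Counts from the six-cities cumulative incidence table
A, B = 291, 1060   # exposed (high pollution): dead, alive
C, D = 232, 1399   # not exposed (low pollution): dead, alive


def row_proportion(first: int, second: int) -> float:
    """Share of a table row that falls in its first cell, e.g. A / (A + B)."""
    return first / (first + second)


prev_exposed = row_proportion(A, B)    # A / (A + B)
prev_unexposed = row_proportion(C, D)  # C / (C + D)
print(f"Exposed: {prev_exposed:.3f}, not exposed: {prev_unexposed:.3f}")
# Exposed: 0.215, not exposed: 0.142
```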

#### Exposure Prevalence [by Disease Status]

• For Cases: A/(A+C)
• In our pollution example, this would be 291/523= 0.556.  Thus, 55.6% of the participants who died were from the high pollution city.
• For Non-cases: B/(B+D)
• In our pollution example, this would be 1060/2459= 0.431.  Thus, 43.1% of the participants who did not die were from the high pollution city.
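The same counts can be read down the columns instead of across the rows. A short sketch, assuming the table layout above (variable names are illustrative):

```python
A, B = 291, 1060   # exposed: dead, alive
C, D = 232, 1399   # not exposed: dead, alive

# Column proportions: share of each disease-status group that was exposed
exp_prev_cases = A / (A + C)      # among those who died
exp_prev_noncases = B / (B + D)   # among those who survived
print(f"Cases: {exp_prev_cases:.3f}, non-cases: {exp_prev_noncases:.3f}")
# Cases: 0.556, non-cases: 0.431
```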

#### Odds of Disease [by Exposure Status]

• For Exposed: A/B
• In our pollution example, this would be 291/1060.  Thus, the odds of dying in the high pollution city are 291:1060 - which can be simplified to 1:3.64.  (This value is hardly ever reported, but is needed to calculate the odds ratio, which will be presented later.)
• For Non-Exposed: C/D
• In our pollution example, this would be 232/1399.  Thus, the odds of dying in the low pollution city are 232:1399 - which can be simplified to 1:6.03.
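As a check on the "1:k" form, a small Python sketch: inverting each odds gives the number of non-cases per case (names are illustrative).

```python
A, B = 291, 1060   # exposed: dead, alive
C, D = 232, 1399   # not exposed: dead, alive

odds_exposed = A / B      # odds of death, high-pollution city
odds_unexposed = C / D    # odds of death, low-pollution city

# Rewrite each odds as "1 : k", where k = non-cases per case
print(f"Exposed 1:{B / A:.2f}, not exposed 1:{D / C:.2f}")
# Exposed 1:3.64, not exposed 1:6.03
```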

#### Odds of Exposure [by Disease Status]

• For Cases: A/C
• In our pollution example, this would be 291/232= 1.25:1.  Thus, the odds of being from the high pollution city are 1.25:1 for those who died.
• For Non-cases: B/D
• In our pollution example, this would be 1060/1399= 0.76:1.  Thus, the odds of being from the high pollution city are 0.76:1 for those who did not die.
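A corresponding sketch for the odds of exposure, again taken down the columns of the table (names are illustrative):

```python
A, B = 291, 1060   # exposed: dead, alive
C, D = 232, 1399   # not exposed: dead, alive

odds_exp_cases = A / C      # odds of being exposed among those who died
odds_exp_noncases = B / D   # odds of being exposed among survivors
print(f"Cases {odds_exp_cases:.2f}:1, non-cases {odds_exp_noncases:.2f}:1")
# Cases 1.25:1, non-cases 0.76:1
```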

There are two ways to compare measures between groups: ratios and differences. The next few sections outline both methods and show examples using the air pollution study. Note that in these examples, the ratio estimates based on cumulative incidence and on incidence rates are nearly identical, but that is not always the case. In this study, the average follow-up per participant was similar in both groups (about 13.3 person-years), which is why the two approaches give similar estimates.

# 4.2 - Using Ratios to Compare Two Populations

A ratio may be used to convey the strength of an effect or association between two population groups, or the relative 'risk' of the study (e.g., exposed) group compared with a comparison (e.g., unexposed) group. A ratio does not depend on the prevalence of exposure in the study population.

Ratio
$$\dfrac{\text { Disease Frequency }(\text { Population } A)}{\text { Disease Frequency }(\text { Population } B)}$$

A ratio can be reported with a confidence interval (lower and upper bounds); we will learn some formulas for these calculations in a later lesson. A ratio of 1 indicates no difference between groups, and when the confidence interval includes 1, the difference between the groups is not statistically significant.

## Ratio Calculations

### Risk Ratios

#### Cumulative incidence ratio

• General formula:
• Ratio of Disease Incidence = [A/(A+B)] / [C/(C+D)]
• In our pollution example, this would be [291/1351] / [232/1631] = 1.51. Thus, participants from the high-pollution city are 1.51 times as likely to die as those from the low-pollution city. This makes sense, since the cumulative incidence of death was about 21.5% in the high-pollution city and 14.2% in the low-pollution city.
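A minimal Python sketch of the cumulative incidence ratio, assuming the 2 × 2 layout used in this lesson (cells A, B, C, D); the function name is hypothetical.

```python
def cumulative_incidence_ratio(a: int, b: int, c: int, d: int) -> float:
    """[A/(A+B)] / [C/(C+D)] for a 2x2 table laid out as in this lesson."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


rr = cumulative_incidence_ratio(291, 1060, 232, 1399)
print(f"Cumulative incidence ratio: {rr:.2f}")
# Cumulative incidence ratio: 1.51
```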

#### Incidence rate ratio

• Follows the same general formula, but instead of comparing cumulative incidences, we compare incidence rates. So first, we need to calculate the incidence rate in each city.
• High-pollution city incidence rate of death = 291 deaths / 17914 person-years = 16.24 deaths per 1000 person-years
• Low-pollution city incidence rate of death = 232 deaths / 21618 person-years = 10.73 deaths per 1000 person-years
• In our pollution example, this would be (16.24/1000 person-years) / (10.73/1000 person-years) = 1.51
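The same calculation as a short Python sketch, using the death counts and person-years from the incidence rate table; note that the per-1000 scaling cancels in the ratio.

```python
deaths_high, py_high = 291, 17914   # high-pollution city
deaths_low, py_low = 232, 21618     # low-pollution city

# Incidence rates per 1000 person-years
rate_high = deaths_high / py_high * 1000
rate_low = deaths_low / py_low * 1000

# The per-1000 scaling cancels, so the ratio is unit-free
irr = rate_high / rate_low
print(f"{rate_high:.2f} vs {rate_low:.2f} per 1000 PY; IRR = {irr:.2f}")
# 16.24 vs 10.73 per 1000 PY; IRR = 1.51
```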

### Odds Ratio

• Exposure Odds Ratio [A/C] / [B/D] = [A*D] / [B*C]
• Disease Odds Ratio [A/B] / [C/D] = [A*D] / [B*C]
• Both simplify to the same OR
• In our pollution example, this would be (291 × 1399) / (1060 × 232) = 1.66. Thus, participants from the high-pollution city have 1.66 times the odds of dying compared with those from the low-pollution city.
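A small Python sketch confirming that the exposure odds ratio and the disease odds ratio both reduce to the cross-product (A × D) / (B × C); variable names are illustrative.

```python
import math

A, B = 291, 1060   # exposed: dead, alive
C, D = 232, 1399   # not exposed: dead, alive

exposure_or = (A / C) / (B / D)     # odds of exposure, cases vs non-cases
disease_or = (A / B) / (C / D)      # odds of disease, exposed vs not exposed
cross_product = (A * D) / (B * C)   # the form both simplify to

assert math.isclose(exposure_or, cross_product)
assert math.isclose(disease_or, cross_product)
print(f"Odds ratio: {cross_product:.2f}")
# Odds ratio: 1.66
```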

# 4.3 - Using Differences to Compare Two Populations

Alternatively, differences can be calculated between the estimates for the two groups. A difference can be reported with a confidence interval (lower and upper bounds). If the confidence interval includes 0, there is no statistically significant difference between the groups. If the interval does not include 0, one population has a significantly higher (or, conversely, lower) risk than the other. A difference conveys the absolute excess (or deficit) of risk in the exposed group relative to the unexposed group; if the exposure is causal, this is the excess risk that could be removed if the exposure ended.

Difference
$$\text { Disease Frequency }(\text { Population } A)-\text { Disease Frequency }(\text { Population } B)$$

## Difference Calculations

### Risk Difference

#### Cumulative incidence difference

• General formula:
• Difference of Disease Incidence = [A/(A+B)] - [C/(C+D)]
• In our pollution example, this would be [291/1351] - [232/1631] = 21.5% - 14.2% = 7.3%. Thus, participants from the high-pollution city have an absolute risk of death 7.3 percentage points higher than participants from the low-pollution city (about 7.3 excess deaths per 100 participants).
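A minimal Python sketch of the cumulative incidence difference; unlike the ratio, the result keeps the scale of a proportion, so it can be read as excess deaths per 100 participants.

```python
A, B = 291, 1060   # exposed: dead, alive
C, D = 232, 1399   # not exposed: dead, alive

risk_difference = A / (A + B) - C / (C + D)
print(f"Risk difference: {risk_difference:.1%}")
# Risk difference: 7.3%

# Equivalently, excess deaths per 100 participants
print(f"Excess deaths per 100 participants: {risk_difference * 100:.1f}")
```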

#### Incidence rate difference

• In our pollution example, this would be (16.24/1000 person-years) - (10.73/1000 person-years) = 5.51/1000 person-years. Thus, there are 5.51 excess deaths per 1000 person-years among those in the high-pollution city. Alternatively, if the association is causal, the number of deaths could be reduced by 5.51 per 1000 person-years if the pollution level in the high-pollution city were reduced to that of the low-pollution city.
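A short Python sketch of the incidence rate difference; unlike the rate ratio, the difference keeps the "per 1000 person-years" unit.

```python
deaths_high, py_high = 291, 17914   # high-pollution city
deaths_low, py_low = 232, 21618     # low-pollution city

rate_high = deaths_high / py_high * 1000   # per 1000 person-years
rate_low = deaths_low / py_low * 1000
rate_difference = rate_high - rate_low     # still per 1000 person-years
print(f"Excess deaths: {rate_difference:.2f} per 1000 person-years")
# Excess deaths: 5.51 per 1000 person-years
```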

### Attributable Proportion among the Total Population (APt)

(Also known as population attributable risk (PAR))

The Attributable Proportion among the Total Population depends upon the prevalence of the exposure in the study population. This value is often used to convey implications for policy or regulations.

#### General Formula

• $$\mathrm{AP}_{\mathrm{t}}=\dfrac{\text { Risk(study population)-Risk(unexposed group) }}{\text { Risk(study population) }}$$

#### Incidence rate APt

• First, we need to calculate the incidence rate in the entire study population. This is the sum of all the deaths (1430) divided by the sum of all the person-years (111076) = 12.87 deaths per 1000 person-years. (These totals cover every city in the cohort, not only the two cities shown in the tables above.)
• In our example, this would be (12.87/1000 person-years - 10.73/1000 person-years) / (12.87/1000 person-years) = 0.166. Thus, 16.6% of the deaths in the study population are attributable to pollution levels above those of the low-pollution city and, if the association is causal, would be eliminated if pollution everywhere were reduced to that level.
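A minimal Python sketch of the APt calculation, using the whole-cohort totals quoted above (1430 deaths, 111076 person-years) and the low-pollution city as the unexposed group.

```python
# Whole-cohort totals quoted in the text
total_deaths, total_py = 1430, 111076
# Unexposed group: the low-pollution city
deaths_low, py_low = 232, 21618

rate_total = total_deaths / total_py * 1000   # per 1000 person-years
rate_unexposed = deaths_low / py_low * 1000

# APt = (rate in study population - rate in unexposed) / rate in study population
ap_t = (rate_total - rate_unexposed) / rate_total
print(f"Total rate {rate_total:.2f}, unexposed {rate_unexposed:.2f}; APt = {ap_t:.3f}")
# Total rate 12.87, unexposed 10.73; APt = 0.166
```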

# 4.4 - Standardization

When comparing groups, it is important to make sure we are making a fair comparison. Thus, it is helpful to standardize the rates, in order to remove the effect of a potential confounder (often age), which might differ between populations and could distort the results. Standardization is also helpful when comparing rates of one population over time, such as monitoring disease in a population over many years.

Disease can be measured in one population or compared between populations. Within one population, it is common to summarize disease burden with the number of cases. Another measure is the crude rate (i.e., x cases / y population at risk), which you will recognize as the cumulative incidence. If the distribution of a modifier of disease frequency (such as age) differs between two populations, however, a comparison of their crude rates can mask or distort the true difference in disease rates.

• A standardized rate is a measure of disease frequency that facilitates comparisons of populations with a different distribution of one or more potential confounding variables. (e.g., x cases / y population at-risk, adjusted to remove the effect of potential confounder [e.g., age]).
• Age-specific rates (i.e., x cases in a specific age group / population at risk in the same age group) are also useful in summarizing the health status of a population.

## Types of Standardization

There are two different approaches to standardizing a rate:

### Direct standardization

Direct standardization, the more commonly used approach, produces the summary disease rate that would be expected if the study population had the same population distribution (e.g., age distribution) as an arbitrarily chosen reference population, called the standard population. The standardized rate is the sum of the study population's group-specific rates, each weighted by that group's share of the standard population; the weights sum to 1.0. A directly standardized rate is thus a weighted average of age-specific (or other group-specific) rates.

$$I_{W}=\dfrac{\sum W_{i} I_{i}}{\sum W_{i}}$$
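where $$I_{i}$$ is the rate in group $$i$$ of the study population, $$W_{i}$$ is the weight for group $$i$$ (its proportion of the standard population), and $$\sum W_{i}=1$$.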

The data needed for direct standardization are the group-specific disease rates for the study population and the population distribution of the standard population.
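A minimal Python sketch of the direct method, implementing the weighted-average formula $$I_{W}=\sum W_{i} I_{i} / \sum W_{i}$$. The three age groups and all numbers below are hypothetical, chosen only to illustrate the arithmetic; the function name is also illustrative.

```python
def direct_standardized_rate(group_rates, standard_pop):
    """I_W = sum(W_i * I_i) / sum(W_i), with W_i taken from the standard population.

    Dividing by sum(W_i) means raw standard-population counts can be passed
    in directly; it is the same as first rescaling the weights to sum to 1.
    """
    if len(group_rates) != len(standard_pop):
        raise ValueError("need one standard-population weight per group")
    weighted = sum(w * r for w, r in zip(standard_pop, group_rates))
    return weighted / sum(standard_pop)


# Hypothetical illustration: three age groups (young, middle, old)
study_rates = [2.0, 5.0, 20.0]               # study population rates per 1000
standard_pop = [500_000, 300_000, 200_000]   # standard population counts

adjusted = direct_standardized_rate(study_rates, standard_pop)
print(f"Directly standardized rate: {adjusted:.2f} per 1000")
# Directly standardized rate: 6.50 per 1000
```

Passing counts rather than proportions is a convenience: the division by the total standard population performs the normalization, so the result is the same weighted average.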

#### Stop and Think!

Review the SEER Stat Tutorials: Calculating Age-adjusted Rates and Pennsylvania Dept of Health’s tutorial on Age-Adjusted Rates (pa.gov) for producing an adjusted rate by the direct method.

Do you understand how direct adjustment is a weighted average of age-specific rates?

### Indirect standardization

Indirect standardization also produces a weighted average: it yields the summary disease rate that would be expected in the study population if its disease experience (its group-specific rates) were identical to that of a standard population. The standard population is arbitrarily chosen but should be as similar as possible to the study population. Indirect adjustment is used when accurate group-specific rates for the study population are not available; if these rates are available, direct adjustment is preferred because it uses more information from the study population. Indirect adjustment produces an expected rate, and the observed and expected rates are typically compared as a standardized ratio (observed / expected). In occupational health, for example, indirect adjustment is often used to calculate standardized mortality ratios by dividing the observed death rate by the expected death rate.

The data required for indirect standardization are the crude rate for the study population, the population distribution of the study population, and the group-specific rates for the standard population.
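A minimal Python sketch of the indirect method and the resulting standardized ratio, using exactly the three inputs listed above. The age groups and all numbers are hypothetical, and the function names are illustrative; because observed and expected rates refer to the same population, their ratio equals observed deaths divided by expected deaths.

```python
def expected_crude_rate(study_distribution, standard_rates):
    """Crude rate the study population would show under the standard's group-specific rates."""
    return sum(p * r for p, r in zip(study_distribution, standard_rates))


def standardized_ratio(observed_crude_rate, study_distribution, standard_rates):
    """Observed / expected, e.g. a standardized mortality ratio (SMR)."""
    return observed_crude_rate / expected_crude_rate(study_distribution, standard_rates)


# Hypothetical illustration: three age groups
standard_rates = [1.0, 4.0, 15.0]           # standard population rates per 1000
study_distribution = [0.50, 0.375, 0.125]   # study population proportions (sum to 1)
observed_crude = 5.0                        # observed crude rate per 1000 in the study population

expected = expected_crude_rate(study_distribution, standard_rates)
smr = standardized_ratio(observed_crude, study_distribution, standard_rates)
print(f"Expected {expected:.3f} per 1000; standardized ratio = {smr:.2f}")
# Expected 3.875 per 1000; standardized ratio = 1.29
```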

## Example

Consider the data below, for which the researcher could not obtain the gender-specific rates of the study groups.
From a standard population, it is known that the crude rate is 1.5/1000, the male rate is 2.2/1000, and the female rate is 0.9/1000.
If we also know that:

• Group 1 male 60%, female 40%
• Group 2 male 80%, female 20%

What is the expected crude rate for group 1?
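It is (2.2 × 0.6) + (0.9 × 0.4) = 1.68 / 1000. This is the crude rate group 1 would be expected to have if it experienced the standard population's gender-specific rates; it is then compared with the observed crude rate for group 1.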

#### Stop and Think!

Come up with an answer to this question by yourself before reading the solution below.

What is the expected crude rate for group 2?

Answer: the expected crude rate is (2.2 × 0.8) + (0.9 × 0.2) = 1.94 / 1000.
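Both expected crude rates can be checked with a short Python sketch; the rates and sex distributions are those given in the example, while the variable names are illustrative.

```python
# Standard population's sex-specific rates (per 1000), from the example
standard_rates = {"male": 2.2, "female": 0.9}

# Sex distribution of each study group, from the example
groups = {
    "Group 1": {"male": 0.6, "female": 0.4},
    "Group 2": {"male": 0.8, "female": 0.2},
}

# Expected crude rate: weight the standard's rates by each group's own distribution
expected = {
    name: sum(dist[sex] * standard_rates[sex] for sex in standard_rates)
    for name, dist in groups.items()
}
for name, rate in expected.items():
    print(f"{name}: expected crude rate {rate:.2f} per 1000")
# Group 1: expected crude rate 1.68 per 1000
# Group 2: expected crude rate 1.94 per 1000
```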

## Try it! Comparison of Direct and Indirect Standardization Methods

Try the following true/false questions to test your knowledge.

1. Age-adjusted rates are measures of mortality risk, conveying the magnitude of a health problem.

False

2. Age-adjusted rates can be compared regardless of the standard population used.

False
Directly standardized rates are only comparable if the same standard population is used. For example, the US standard population of 1940 was considerably younger than the US standard population based on the 2000 census. This will affect the adjusted rates. Always pay attention to the reference population when comparing standardized rates.

3. Direct age adjustment is an average of the observed age-specific rates, weighting each age-specific rate by the proportion of that same age group in a standard population.

True

4. Indirect adjustment is preferred if there are only a few cases across all age groups.

True

5. If I want to understand the magnitude of a health problem, the appropriate statistic is the number of events.

True

6. To explore the underlying risk in a population, the appropriate statistic is a crude rate with its confidence interval.

True

7. To compare populations on the basis of differences in risk, after controlling for age, the appropriate statistic is the crude relative risk.

False
The appropriate statistics are the age-adjusted rates with their confidence intervals, or the relative risk (rate ratio) adjusted for age.

# 4.5 - Lesson 4 Summary

## Lesson 4 Summary

Describing the state of public health is an important component of epidemiology, which was presented in Lesson 3.  The next step of comparing outcomes between groups was introduced in Lesson 4.  We saw some examples of situations when the goal is to compare groups, and learned the two main ways to do so:  absolute (differences) and relative (ratios).  An additional point to consider is that the groups may differ on a characteristic that affects the measure, most often age, so standardization is needed in order to make fair comparisons.
