4 Comparing Groups In Terms of Disease Occurrence and Frequency
Objectives
Upon completion of this lesson, you should be able to:
- Organize disease frequency data into a 2 x 2 epidemiological table.
- Calculate and describe Risk Ratios and Odds Ratios to compare groups.
- Calculate and describe Risk Differences and Population Attributable Risk to compare groups.
- Recognize situations in which direct or indirect standardization should be considered.
- Given the required data, standardize a rate with a direct and indirect method.
4.1 Example Research Hypotheses & Measurement Calculations
Research Hypotheses
Suppose our goal is to compare two populations with regard to disease or exposure-disease frequency. We wish to use precise, valid measures of disease frequency such as point prevalence, period prevalence, cumulative incidence, or incidence density. The two populations must be distinct in location, time, or exposure status. We’d like to apply statistical tests to these measures to assess whether any observed difference could plausibly have occurred by chance.
There are many examples of hypotheses that are comparative in nature. Each hypothesis below is followed by the comparison that would test it:
- High salt intake increases the incidence of heart disease.
  - Compare the incidence of heart disease among persons with a high salt intake with that among persons with a low salt intake.
- The prevalence of screening mammograms differs between managed care organization A and managed care organization B.
  - Compare the prevalence of screening mammograms for members of managed care organization A and managed care organization B.
- The incidence of lower extremity amputations among diabetics can be reduced through quarterly foot examinations.
  - Compare the incidence of lower extremity amputations among diabetics who receive quarterly foot examinations to the incidence among diabetics who receive foot examinations less often than quarterly.
- A high intake of Vitamin C reduces the prevalence of colds.
  - Compare the prevalence of colds for persons with a high intake of Vitamin C to that for persons with a low intake.
- Dizziness is associated more frequently with therapeutic agent A than with agent B.
  - Compare the incidence of dizziness for patients receiving A with that for patients receiving B.
- The administration of pre-surgical antibiotics decreases the rate of wound infections.
  - Compare the incidence of wound infections for persons who receive antibiotics with that for persons who do not receive antibiotics.
Data Organization: 2 x 2 table
Consider the cohort study An Association between Air Pollution and Mortality in Six U.S. Cities, which investigated the relationship between air pollution and mortality. The researchers treated air pollution as the exposure and mortality as the outcome.
Exposure refers to the characteristic of interest that the researcher hypothesizes may be associated with or causing a certain outcome. Often in epidemiological studies, the outcome of interest is a certain disease. Those who develop the disease are often referred to as cases, while those who do not are referred to as non-cases.
Data from a study that includes a risk factor (exposure) and indicators of the presence or absence of disease is often summarized as shown below:
Category | Cases (Number) | Non-Cases (Number) | Total (Number) |
---|---|---|---|
Exposed | A | B | Total Exposed |
Not Exposed | C | D | Total Not Exposed |
Total | Total Cases | Total Non-Cases | Total |
For the air pollution cohort study, the following tables can be constructed.
Category | Dead | Alive* | Total |
---|---|---|---|
High Pollution (Ohio) | 291 | 1060 | 1351 |
Low Pollution (Wisconsin) | 232 | 1399 | 1631 |
Total | 523 | 2459 | 2982 |
*This column was calculated by subtracting the number of deaths in Table 1 of the manuscript from the total number of participants. See…
Category | Dead | Alive* | Person-Years |
---|---|---|---|
High Pollution (Ohio) | 291 | - | 17914 |
Low Pollution (Wisconsin) | 232 | - | 21618 |
Total | 523 | - | 39532 |
*For the incidence rate, the number of non-diseased (i.e., alive) participants is not necessary. Instead, we need the person-years of follow-up contributed by all participants in each group.
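As a minimal sketch, the cumulative incidence and incidence rate for each group can be computed directly from the counts in the two tables above. The dictionary layout and function names are illustrative choices, not part of the original study.

```python
# Counts copied from the two air pollution tables above.
groups = {
    "High Pollution (Ohio)": {"deaths": 291, "participants": 1351, "person_years": 17914},
    "Low Pollution (Wisconsin)": {"deaths": 232, "participants": 1631, "person_years": 21618},
}


def cumulative_incidence(deaths, participants):
    """Proportion of participants who died during follow-up."""
    return deaths / participants


def incidence_rate_per_1000(deaths, person_years):
    """Deaths per 1000 person-years of follow-up."""
    return 1000 * deaths / person_years


summary = {}
for name, g in groups.items():
    summary[name] = {
        "cumulative_incidence": cumulative_incidence(g["deaths"], g["participants"]),
        "rate_per_1000_py": incidence_rate_per_1000(g["deaths"], g["person_years"]),
    }
    print(
        f"{name}: cumulative incidence = {summary[name]['cumulative_incidence']:.3f}, "
        f"incidence rate = {summary[name]['rate_per_1000_py']:.2f} per 1000 person-years"
    )
```

Note that the cumulative incidence uses the number of participants as its denominator, while the incidence rate uses person-years.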
Measures of Disease Frequency
There are two ways to compare measures between groups: ratios and differences. The next few sections will outline both methods and show examples using the air pollution study. Also, note that for these examples, estimates for the cumulative incidences and incidence rates are similar, but that is not always the case. In this study, person-years were similar across groups, resulting in similar estimates.
4.2 Using Ratios to Compare Two Populations
A ratio may be used to convey the strength of an effect or association between two population groups, or the relative ‘risk’ of the study (e.g., exposed) group compared to a comparison (e.g., unexposed) group. A ratio is not dependent on the prevalence of exposure among the study population.
A ratio can be reported with a confidence interval, that is, with lower and upper bounds. We will learn some formulas for these calculations in a later lesson. A ratio of 1 indicates no difference between the groups; when the difference is not statistically significant, the confidence interval for the ratio includes 1.
Ratio Calculations
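The sketch below illustrates three common ratio measures using the air pollution counts from Section 4.1: the risk ratio (ratio of cumulative incidences), the odds ratio (AD / BC in the 2 x 2 layout), and the rate ratio (ratio of incidence rates). Variable names are illustrative, and confidence intervals are deliberately left out because their formulas are covered in a later lesson.

```python
# Cell counts in the generic 2 x 2 layout:
# A, B = exposed cases / non-cases; C, D = unexposed cases / non-cases.
a, b = 291, 1060  # High Pollution (Ohio): dead, alive
c, d = 232, 1399  # Low Pollution (Wisconsin): dead, alive
person_years_exposed = 17914
person_years_unexposed = 21618

# Risk ratio: cumulative incidence in the exposed over the unexposed.
risk_exposed = a / (a + b)
risk_unexposed = c / (c + d)
risk_ratio = risk_exposed / risk_unexposed

# Odds ratio: odds of death in the exposed over odds in the unexposed.
odds_ratio = (a * d) / (b * c)

# Rate ratio: incidence rate in the exposed over the unexposed.
rate_ratio = (a / person_years_exposed) / (c / person_years_unexposed)

print(f"Risk ratio: {risk_ratio:.2f}")
print(f"Odds ratio: {odds_ratio:.2f}")
print(f"Rate ratio: {rate_ratio:.2f}")
```

With these counts the risk ratio and rate ratio are both about 1.51, while the odds ratio is about 1.66. The odds ratio lies further from 1 than the risk ratio here because the outcome is not rare in either group.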
4.3 Using Differences to Compare Two Populations
Alternatively, differences can be calculated between the estimates for the two groups. The difference can be reported with a confidence interval that includes upper and lower bounds. If the confidence interval includes 0, there is no significant difference between the groups. If the interval does not include 0, the data indicate a significantly increased (or, conversely, decreased) risk for one population compared to the other. The difference conveys the excess or decreased risk among the exposed group that is due to the exposure: the risk that might be removed if the exposure ended, which is the potential reduction in risk for exposed individuals. It is an absolute measure of the effect of the exposure.
Difference Calculations
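As a rough illustration, the risk difference and rate difference for the air pollution example from Section 4.1 can be computed as below. The variable names are illustrative, and no confidence intervals are calculated.

```python
# Counts from the air pollution tables in Section 4.1.
deaths_exposed, total_exposed, py_exposed = 291, 1351, 17914        # High Pollution (Ohio)
deaths_unexposed, total_unexposed, py_unexposed = 232, 1631, 21618  # Low Pollution (Wisconsin)

# Risk difference: cumulative incidence in the exposed minus the unexposed.
risk_difference = deaths_exposed / total_exposed - deaths_unexposed / total_unexposed

# Rate difference: incidence rate in the exposed minus the unexposed,
# expressed per 1000 person-years.
rate_difference_per_1000 = 1000 * (
    deaths_exposed / py_exposed - deaths_unexposed / py_unexposed
)

print(f"Risk difference: {risk_difference:.3f}")
print(f"Rate difference: {rate_difference_per_1000:.2f} per 1000 person-years")
```

A positive difference indicates excess risk in the high pollution group; a value of 0 would indicate no difference.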
Attributable Proportion among the Total Population (APt)
(Also known as population attributable risk (PAR))
The Attributable Proportion among the Total Population depends upon the prevalence of the exposure in the study population. This value is often used to convey implications for policy or regulations.
General Formula
- \(\mathrm{AP}_{\mathrm{t}}=\dfrac{\text{ Risk(study population)}-\text{Risk(unexposed group) }}{\text{ Risk(study population) }}\)
Incidence rate APt
- First, we need to calculate the incidence rate in the entire population: the sum of all the deaths (1430) divided by the sum of all the person-years (111076), which gives 12.87 deaths per 1000 person-years.
- In our example, the incidence rate in the unexposed (low pollution) group is 232 / 21618 = 10.73 deaths per 1000 person-years, so the APt is (12.87 - 10.73) / 12.87 = 0.166. Thus 16.6% of the deaths in the population are attributable to the high pollution levels and, if the association is causal, would be eliminated if the pollution levels were reduced.
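The arithmetic above can be checked with a short sketch. The function name `attributable_proportion_total` is a hypothetical helper introduced here for illustration; the inputs are the counts quoted in the text and in the Section 4.1 tables.

```python
def attributable_proportion_total(rate_population, rate_unexposed):
    """APt = (rate in study population - rate in unexposed) / rate in study population."""
    return (rate_population - rate_unexposed) / rate_population


# Incidence rate in the entire study population (deaths / person-years).
rate_population = 1430 / 111076
# Incidence rate in the unexposed group (Low Pollution, Wisconsin).
rate_unexposed = 232 / 21618

ap_t = attributable_proportion_total(rate_population, rate_unexposed)

print(f"Population rate: {1000 * rate_population:.2f} per 1000 person-years")
print(f"Unexposed rate:  {1000 * rate_unexposed:.2f} per 1000 person-years")
print(f"APt: {ap_t:.3f}")
```

Because the per-1000 scaling cancels in the ratio, the same result is obtained whether the rates are expressed per person-year or per 1000 person-years.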
4.4 Standardization
When comparing groups, it is important to make sure we are making a fair comparison. Thus, it is helpful to standardize the rates, in order to remove the effect of a potential confounder (often age), which might differ between populations and could distort the results. Standardization is also helpful when comparing rates of one population over time, such as monitoring disease in a population over many years.
Disease can be measured in one population or compared between populations. Within one population, it is common to summarize disease burden with the number of cases. Another measure is the crude rate (i.e., x cases / y population at-risk), which you will also recognize as the cumulative incidence. If the distribution of a modifier of disease frequency (such as age) is different between two populations, however, a comparison of the crude rates can be misleading, because the difference in distribution can mask or exaggerate true differences in the underlying rates.
- A standardized rate is a measure of disease frequency that facilitates comparisons of populations with a different distribution of one or more potential confounding variables (e.g., x cases / y population at-risk, adjusted to remove the effect of a potential confounder such as age).
- Age-specific rates (i.e., x cases in a specific age group/(population at-risk in same age group)) are also useful in summarizing the health status of a population.
Types of Standardization
There are two different approaches to standardizing a rate:
Direct standardization
Direct standardization, more commonly used, creates a summary disease rate for a population that would be expected if the study population had a population distribution identical to that of an arbitrarily chosen standard population. A reference population is used as the standard population. The standardized rate is the sum of weighted group-specific rates, with weights derived from the standard population. The weights sum to 1.0. A standardized rate is essentially a weighted average of age-specific rates.
\[I_{W}=\dfrac{\sum W_{i} I_{i}}{\sum W_{i}}\]
where \(I_i\) is a group-specific rate and \(\sum W_{i}=1\)
The data needed for direct standardization are the group-specific disease rates for the study population and the population distribution of the standard population.
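The weighted-average formula above can be sketched in a few lines of Python. All numbers below are hypothetical and chosen only to make the arithmetic easy to follow; the function name and the age groupings are likewise assumptions made for illustration.

```python
def direct_standardized_rate(group_rates, standard_weights):
    """Weighted average of group-specific rates: sum(W_i * I_i) / sum(W_i)."""
    if group_rates.keys() != standard_weights.keys():
        raise ValueError("Rates and weights must cover the same groups.")
    total_weight = sum(standard_weights.values())
    weighted_sum = sum(standard_weights[g] * group_rates[g] for g in group_rates)
    return weighted_sum / total_weight


# Hypothetical age-specific rates (per 1000) observed in a study population.
study_rates = {"0-39": 1.0, "40-64": 5.0, "65+": 20.0}

# Two hypothetical standard populations (proportion in each age group).
older_standard = {"0-39": 0.5, "40-64": 0.3, "65+": 0.2}
younger_standard = {"0-39": 0.7, "40-64": 0.2, "65+": 0.1}

rate_older = direct_standardized_rate(study_rates, older_standard)
rate_younger = direct_standardized_rate(study_rates, younger_standard)

print(f"Adjusted rate, older standard:   {rate_older:.1f} per 1000")
print(f"Adjusted rate, younger standard: {rate_younger:.1f} per 1000")
```

The same age-specific rates give different adjusted rates under the two standards, which is why directly standardized rates should only be compared when they share a standard population.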
Try It!
Review the SEER Stat Tutorials: Calculating Age-adjusted Rates and Pennsylvania Dept of Health’s tutorial on Age-Adjusted Rates (pa.gov) for producing an adjusted rate by the direct method.
Do you understand how direct adjustment is a weighted average of age-specific rates?
Indirect standardization
Indirect standardization also produces a weighted average: a summary disease rate for the study population that would be expected if the disease experience of the study population were identical to that of a standard population. The standard population is arbitrarily chosen, but should be as similar as possible to the study population. Indirect adjustment is used when accurate group-specific rates for the study population are not available. If these rates are available, direct adjustment is preferred because it uses more information from the study population. Indirect adjustment produces an expected rate. Observed and expected rates are typically compared as a standardized ratio. Indirect adjustment is often used in occupational health to calculate standardized mortality ratios, which divide the observed death rate by the expected death rate.
The data required for indirect standardization are the crude rate for the study population, the population distribution of the study population, and the group-specific rates for the standard population.
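A minimal sketch of a standardized mortality ratio (SMR) follows, written in terms of counts: expected deaths are obtained by applying the standard population's group-specific rates to the study population's group sizes. Dividing observed by expected deaths is equivalent to dividing the observed rate by the expected rate, since both share the same population total. All numbers and names are hypothetical.

```python
def expected_deaths(standard_rates_per_1000, study_population):
    """Deaths expected if the study population experienced the standard rates."""
    return sum(
        standard_rates_per_1000[g] / 1000 * study_population[g]
        for g in study_population
    )


# Hypothetical age-specific death rates (per 1000) in the standard population.
standard_rates_per_1000 = {"0-39": 1.0, "40-64": 4.0, "65+": 15.0}

# Hypothetical number of people in each age group of the study population.
study_population = {"0-39": 2000, "40-64": 5000, "65+": 3000}

# Hypothetical total number of deaths observed in the study population.
observed_deaths = 80

expected = expected_deaths(standard_rates_per_1000, study_population)
smr = observed_deaths / expected

print(f"Expected deaths: {expected:.1f}")
print(f"SMR: {smr:.2f}")
```

An SMR above 1 suggests more deaths in the study population than would be expected under the standard population's rates; an SMR below 1 suggests fewer.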
Example 4.1 (Gender Specific Rates) Consider the data below, for which the researcher could not obtain the gender-specific rates.
From a standard population, it is known that the crude rate is 1.5/1000, the male rate is 2.2/1000, and the female rate is 0.9/1000.
If we also know that:
- Group 1 male 60%, female 40%
- Group 2 male 80%, female 20%
What is the expected crude rate for group 1?
It is \((2.2 \times 0.6) + (0.9 \times 0.4) = 1.68\) per 1000. The observed crude rate is 1.68.
Try It!
What is the expected crude rate for group 2?
The expected crude rate is (2.2 × 0.8) + (0.9 × 0.2) = 1.94 per 1000.
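The expected crude rates in Example 4.1 can be verified with a short sketch that applies the standard population's gender-specific rates to each group's gender distribution. The function and variable names are illustrative.

```python
def expected_crude_rate(standard_rates, distribution):
    """Weight each standard group-specific rate by the study group's proportion."""
    return sum(standard_rates[g] * distribution[g] for g in standard_rates)


# Gender-specific rates (per 1000) from the standard population in Example 4.1.
standard_rates = {"male": 2.2, "female": 0.9}

# Gender distribution of each study group in Example 4.1.
group_1 = {"male": 0.6, "female": 0.4}
group_2 = {"male": 0.8, "female": 0.2}

expected_1 = expected_crude_rate(standard_rates, group_1)
expected_2 = expected_crude_rate(standard_rates, group_2)

print(f"Group 1 expected crude rate: {expected_1:.2f} per 1000")
print(f"Group 2 expected crude rate: {expected_2:.2f} per 1000")
```

Group 2 has the higher expected rate simply because it contains a larger share of males, who have the higher rate in the standard population.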
Try It!
Comparison of Direct and Indirect Standardization Methods
Try the following true/false questions to test your knowledge.
Age-adjusted rates are measures of mortality risk, conveying the magnitude of a health problem.
False
Age-adjusted rates can be compared regardless of the standard population used.
False. Direct standardized rates are only comparable if the same standard population is used. For example, the US standard population of 1940 was considerably younger than the US standard population based on the 2000 census. This will affect the adjusted rates. Always pay attention to the reference population when comparing standardized rates.
Direct age adjustment is an average of the observed age-specific rates, weighting each age-specific rate by the proportion of that same age group in a standard population.
True
Indirect adjustment is preferred if there are only a few cases across all age groups.
True
If I want to understand the magnitude of a health problem, the appropriate statistic is the number of events.
True
To explore the underlying risk in a population, the appropriate statistic is a crude rate with its confidence interval.
True
To compare populations on the basis of differences in risk, after controlling for age, the appropriate statistic is the crude relative risk.
False. The appropriate statistics are the age-adjusted rates with their confidence intervals, or the relative risk (rate ratio) adjusted for age.
4.5 Lesson Summary
Describing the state of public health is an important component of epidemiology, which was presented in Lesson 3. The next step of comparing outcomes between groups was introduced in Lesson 4. We saw some examples of situations when the goal is to compare groups and learned the two main ways to do so: absolute (differences) and relative (ratios). An additional point to consider is that the groups may differ on a characteristic that affects the measure, most often age, so standardization is needed in order to make fair comparisons.