## Research Hypotheses

Suppose our goal is to compare two populations with regard to disease frequency or to the frequency of disease by exposure. We wish to use precise, valid measures of disease frequency, such as point prevalence, period prevalence, cumulative incidence, or incidence density. The two populations must be distinct in location, time, or exposure status. We would like to apply statistical tests to these measures to see whether any observed difference is likely to have occurred by chance.

There are many examples of hypotheses that are comparative in nature, such as:

**High salt intake increases the incidence of heart disease.** → Compare the incidence of heart disease among persons with a high salt intake with the incidence among those with a low salt intake.

**The prevalence of screening mammograms is not equal in managed care organization A and managed care organization B.** → Compare the prevalence of screening mammograms for members of managed care organization A and managed care organization B.

**The incidence of lower extremity amputations among diabetics can be reduced through quarterly foot examinations.** → Compare the incidence of lower extremity amputations among diabetics who receive quarterly foot examinations with the incidence among diabetics who receive foot examinations less than quarterly.

**A high intake of Vitamin C reduces the prevalence of colds.** → Compare the prevalence of colds among persons with a high intake of Vitamin C with the prevalence among those with a low intake.

**Dizziness is associated more frequently with therapeutic agent A than with agent B.** → Compare the incidence of dizziness for patients receiving A with the incidence for those receiving B.

**The administration of pre-surgical antibiotics decreases the rate of wound infections.** → Compare the incidence of wound infections for persons who receive antibiotics with the incidence for those who do not.

## Data Organization: 2x2 table

Consider the cohort study An Association between Air Pollution and Mortality in Six U.S. Cities. The researchers were interested in the __exposure__ of air pollution and the __outcome__ of mortality.

Exposure refers to the characteristic of interest that the researcher hypothesizes may be associated with, or may cause, a certain outcome. In epidemiological studies, the outcome of interest is often a certain disease. Those who develop the disease are referred to as cases, while those who do not are referred to as non-cases.

Data from a study that includes a risk factor (exposure) and indicators of the presence or absence of disease is often summarized as shown below:

Category | Cases (Number) | Non-Cases (Number) | Total Exposure (Number) |
---|---|---|---|
Exposed | A | B | Total_{Exposed} |
Not Exposed | C | D | Total_{NotExposed} |
Total | Total_{Cases} | Total_{Non-Cases} | Total |

For the air pollution cohort study, the following tables can be constructed.

Category | Dead | Alive* | Total |
---|---|---|---|
High Pollution (Ohio) | 291 | 1060 | 1351 |
Low Pollution (Wisconsin) | 232 | 1399 | 1631 |
Total | 523 | 2459 | 2982 |

*This column was calculated by subtracting the number of deaths in Table 1 of the manuscript from the total number of participants. See...
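As a minimal sketch, the 2x2 layout above can be held in a small data structure whose margins are derived from the four cells. The class name `TwoByTwo` and its property names are hypothetical, chosen only for this illustration; the counts are the ones in the air pollution table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts for a 2x2 exposure-by-outcome table (cells A, B, C, D)."""
    a: int  # exposed cases
    b: int  # exposed non-cases
    c: int  # not exposed cases
    d: int  # not exposed non-cases

    @property
    def total_exposed(self) -> int:
        return self.a + self.b

    @property
    def total_not_exposed(self) -> int:
        return self.c + self.d

    @property
    def total_cases(self) -> int:
        return self.a + self.c

    @property
    def total_noncases(self) -> int:
        return self.b + self.d

    @property
    def total(self) -> int:
        return self.a + self.b + self.c + self.d


# Counts from the air pollution table: Ohio = exposed, Wisconsin = not exposed.
pollution = TwoByTwo(a=291, b=1060, c=232, d=1399)
print(pollution.total_exposed, pollution.total_not_exposed)  # 1351 1631
print(pollution.total_cases, pollution.total_noncases)       # 523 2459
print(pollution.total)                                       # 2982
```

Deriving the margins from the cells, rather than storing them, keeps the row and column totals consistent with A, B, C, and D.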

Category | Dead | Alive* | Person-Years |
---|---|---|---|
High Pollution (Ohio) | 291 | | 17914 |
Low Pollution (Wisconsin) | 232 | | 21618 |
Total | 523 | | 39532 |

*For the incidence rate, the number of non-diseased (i.e., alive) participants is not necessary. Instead, we need the person-years of follow-up contributed by all participants at risk, not only by those who experienced the outcome.
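To illustrate how person-years are used, the sketch below divides the number of deaths by the person-years of follow-up for each city. The helper name `incidence_rate` and the choice to express rates per 1,000 person-years are assumptions made for this example, not something taken from the study.

```python
def incidence_rate(events: int, person_years: float, per: int = 1000) -> float:
    """Return the number of events per `per` person-years of follow-up."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return events / person_years * per


# Deaths and person-years from the table above.
rate_high = incidence_rate(events=291, person_years=17914)
rate_low = incidence_rate(events=232, person_years=21618)

print(f"High pollution (Ohio):     {rate_high:.1f} deaths per 1,000 person-years")
print(f"Low pollution (Wisconsin): {rate_low:.1f} deaths per 1,000 person-years")
```

With these inputs the rates come out at roughly 16.2 and 10.7 deaths per 1,000 person-years.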

## Measures of Disease Frequency

**Disease Prevalence [by Exposure Status]**

**For Exposed: A/(A+B)**- In our pollution example, this would be 291/1351 = 0.215. Thus, 21.5% of the participants from the high pollution city (i.e., the exposed group) died.

**For Not Exposed: C/(C+D)**- In our pollution example, this would be 232/1631 = 0.142. Thus, 14.2% of the participants from the low pollution city (i.e., the non-exposed group) died.

**Exposure Prevalence [by Disease Status]**

**For Cases: A/(A+C)**- In our pollution example, this would be 291/523= 0.556. Thus, 55.6% of the participants who died were from the high pollution city.

**For Non-cases: B/(B+D)**- In our pollution example, this would be 1060/2459 = 0.431. Thus, 43.1% of the participants who did not die were from the high pollution city.
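The four proportions above can be reproduced with a few lines of arithmetic. This is only a sketch; the variable names are hypothetical and simply mirror the cell labels A, B, C, and D.

```python
# Cells of the air pollution 2x2 table.
a, b, c, d = 291, 1060, 232, 1399

# Disease prevalence by exposure status (row proportions).
disease_prev_exposed = a / (a + b)
disease_prev_not_exposed = c / (c + d)

# Exposure prevalence by disease status (column proportions).
exposure_prev_cases = a / (a + c)
exposure_prev_noncases = b / (b + d)

print(f"Disease prevalence, exposed:     {disease_prev_exposed:.3f}")      # 0.215
print(f"Disease prevalence, not exposed: {disease_prev_not_exposed:.3f}")  # 0.142
print(f"Exposure prevalence, cases:      {exposure_prev_cases:.3f}")       # 0.556
print(f"Exposure prevalence, non-cases:  {exposure_prev_noncases:.3f}")    # 0.431
```

Note that the first pair divides by row totals and the second pair by column totals, so the two pairs answer different questions about the same table.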

**Odds of Disease [by Exposure Status]**

**For Exposed: A/B**- In our pollution example, this would be 291/1060. Thus, the odds of dying in the high pollution city are 291:1060 - which can be simplified to 1:3.64. (This value is hardly ever reported, but is needed to calculate the odds ratio, which will be presented later.)

**For Non-Exposed: C/D**- In our pollution example, this would be 232/1399. Thus, the odds of dying in the low pollution city are 232:1399 - which can be simplified to 1:6.03.

**Odds of Exposure [by Disease Status]**

**For Cases: A/C**- In our pollution example, this would be 291/232= 1.25:1. Thus, the odds of being from the high pollution city are 1.25:1 for those who died.

**For Non-cases: B/D**- In our pollution example, this would be 1060/1399= 0.76:1. Thus, the odds of being from the high pollution city are 0.76:1 for those who did not die.
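The odds above follow the same pattern as the proportions, except that each count is divided by its complement rather than by a total. The sketch below uses hypothetical variable names that mirror the cell labels.

```python
# Cells of the air pollution 2x2 table.
a, b, c, d = 291, 1060, 232, 1399

# Odds of disease by exposure status.
odds_disease_exposed = a / b
odds_disease_not_exposed = c / d

# Odds of exposure by disease status.
odds_exposure_cases = a / c
odds_exposure_noncases = b / d

# Odds below 1 are easier to read as "1:x", so invert them for display.
print(f"Odds of dying, high pollution: 1:{1 / odds_disease_exposed:.2f}")      # 1:3.64
print(f"Odds of dying, low pollution:  1:{1 / odds_disease_not_exposed:.2f}")  # 1:6.03
print(f"Odds of exposure, cases:       {odds_exposure_cases:.2f}:1")           # 1.25:1
print(f"Odds of exposure, non-cases:   {odds_exposure_noncases:.2f}:1")        # 0.76:1
```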

There are two ways to compare measures between groups: ratios and differences. The next few sections outline both methods and show examples using the air pollution study. Note that in these examples the estimates based on cumulative incidence and those based on incidence rates are similar, but that is not always the case. In this study, the person-years of follow-up per participant were similar across groups, resulting in similar estimates.
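As a quick check that is not part of the original analysis, the average follow-up per participant can be computed from the person-years and group sizes in the tables. The variable names here are hypothetical.

```python
# Group sizes and person-years from the air pollution tables.
n_high, py_high = 1351, 17914
n_low, py_low = 1631, 21618

# Average years of follow-up contributed by each participant.
mean_followup_high = py_high / n_high
mean_followup_low = py_low / n_low

print(f"High pollution: {mean_followup_high:.2f} years per participant")
print(f"Low pollution:  {mean_followup_low:.2f} years per participant")
```

Both groups average about 13.3 years of follow-up per participant, which is consistent with comparisons based on cumulative incidence and on incidence rates giving similar answers here.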