Confounding is a situation in which the effect of an exposure on an outcome, or the association between them, is distorted by the presence of another variable. Positive confounding (when the observed association is biased away from the null) and negative confounding (when the observed association is biased toward the null) both occur.
If an observed association is not correct because a different (lurking) variable is associated with both the potential risk factor and the outcome, but it is not a causal factor itself, confounding has occurred. This variable is referred to as a confounder. A confounder is an extraneous variable that wholly or partially accounts for the observed effect of a risk factor on disease status. The presence of a confounder can lead to inaccurate conclusions.
A confounder meets each of the following three criteria:
- It is a risk factor for the disease, independent of the putative risk factor.
- It is associated with the putative risk factor.
- It is not in the causal pathway between exposure and disease.
The first two of these conditions can be tested with data. The third is more biological and conceptual.
Confounding masks the true effect of a risk factor on a disease or outcome due to the presence of another variable. We identify potential confounders from our:
- Prior experience with data
- Three criteria for confounders
We will talk more about this later, but briefly here are some methods to control for a confounding variable (if the confounder is suspected a priori):
- randomize individuals into different groups (use an experimental approach)
- restrict/filter for certain groups
- match in case-control studies
- analysis (stratify, adjust)
Controlling potential confounding starts with a good study design including anticipating potential confounders.
Example: Coronary Heart Disease and Diabetes
Suppose that, as part of a cross-sectional study, we survey patients to find out whether they have coronary heart disease (CHD) and whether they are diabetic. We generate a 2 × 2 table (below):
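| Category | CHD | No CHD | Total |
|---|---|---|---|
| Diabetes | 26 (12%) | 190 | 216 |
| No Diabetes | 90 (3.9%) | 2241 | 2331 |
| Total | 116 | 2431 | 2547 |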
The prevalence of coronary heart disease among people without diabetes is 90 divided by 2331, or 3.9%. By a similar calculation, the prevalence among those with diabetes is 12%. A chi-squared test shows that the p-value for this table is p<0.001. The large sample size results in a significant p-value, and the magnitude of the difference is fairly large (12% vs. 3.9%).
- Prevalence Ratio (PR):
- The prevalence ratio, considering whether diabetes is a risk factor for coronary heart disease, is 12 / 3.9 = 3.1. Thus, people with diabetes are 3.1 times as likely to have CHD as those without diabetes.
- Odds Ratio (OR):
- The odds ratio, considering whether the odds of having CHD are higher for those with versus without diabetes, is (2241 × 26) / (90 × 190) = 3.41. The odds of having CHD among those with diabetes are 3.41 times as high as the odds of having CHD among those who do not have diabetes.
Which of these do you use? They come up with slightly different estimates.
It depends upon your primary purpose. Is your purpose to compare prevalences? Or, do you wish to address the odds of CHD as related to diabetes?
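The two measures above can be recomputed directly from the cell counts. The following is a minimal sketch, not part of the original lesson; the function names `prevalence_ratio` and `odds_ratio` are illustrative, and the diabetic row (26 with CHD, 190 without) is taken from the odds ratio calculation above.

```python
def prevalence_ratio(a, b, c, d):
    """a, b: exposed with / without outcome; c, d: unexposed with / without outcome."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Cross-product ratio for the same 2 x 2 layout."""
    return (a * d) / (b * c)


# Diabetes row: 26 CHD, 190 no CHD. No-diabetes row: 90 CHD, 2241 no CHD.
crude = (26, 190, 90, 2241)
crude_pr = prevalence_ratio(*crude)
crude_or = odds_ratio(*crude)

print(f"PR = {crude_pr:.2f}, OR = {crude_or:.2f}")
```

Working from the raw counts gives a prevalence ratio slightly above the 3.1 obtained from the rounded percentages (12 / 3.9); the odds ratio matches 3.41.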
Now, let's add hypertension as a potential confounder. There are 3 criteria to evaluate to assess if hypertension is a confounder.
"Is hypertension (confounder) associated with CHD (outcome)?" also could be thought of as “Is hypertension a risk factor for CHD, independent of diabetes?”
First of all, prior knowledge tells us that hypertension is related to many heart-related diseases. Prior knowledge is an important first step, but let's test this with data. We look at this relationship just among the non-diabetics, so as not to complicate the relationship between the confounder and the outcome.
Consider the 2 × 2 table below:
| Category | CHD | No CHD | Total |
|---|---|---|---|
| Hypertension | 39 (5.5%) | 669 | 708 |
| No Hypertension | 51 (3.1%) | 1572 | 1623 |
| Total | 90 | 2241 | 2331 |
PR = 1.75
OR = 1.80
Among the non-diabetics, the prevalence of coronary heart disease among people without hypertension is 51 divided by 1623, or 3.1%. By a similar calculation, the prevalence among those with hypertension is 5.5%. A chi-squared test shows that the p-value for this table is p=0.006. The large sample size results in a significant p-value, even if the magnitude of the difference is not large. But yes, we see that hypertension is associated with higher rates of CHD.
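The chi-squared result for this table can be checked by hand. The sketch below assumes the uncorrected Pearson statistic (no continuity correction); the lesson does not say which variant was used, so treat that choice, and the helper name `chi_squared_2x2`, as assumptions.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic and p-value (1 degree of freedom) for a 2 x 2 table."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Non-diabetics only. Hypertension row: 39 CHD, 669 no CHD.
# No-hypertension row: 51 CHD, 1572 no CHD.
stat, p_value = chi_squared_2x2(39, 669, 51, 1572)

print(f"chi-squared = {stat:.2f}, p = {p_value:.3f}")
```

Under that assumption the p-value rounds to the 0.006 quoted above; a continuity-corrected test would give a somewhat larger value.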
This leads us to our next question, "Is hypertension (confounder) associated with diabetes (exposure)?"
| Category | Diabetes | No Diabetes | Total |
|---|---|---|---|
| Hypertension | 133 (15.8%) | 708 | 841 |
| No Hypertension | 83 (4.9%) | 1623 | 1706 |
| Total | 216 | 2331 | 2547 |
PR = 3.25
OR = 3.67
The prevalence of diabetes among people without hypertension is 83 divided by 1706, or 4.9%. By a similar calculation, the prevalence among those with hypertension is 15.8%. A chi-squared test shows that the p-value for this table is p<0.001. The large sample size results in a significant p-value, and the magnitude of the difference is fairly large.
A final question, "Is hypertension an intermediate pathway between diabetes (exposure) and development of CHD?"
In other words, does diabetes cause hypertension, which then causes coronary heart disease? Based on biology, that is not the case. Diabetes in and of itself can cause coronary heart disease. Using the data and our prior knowledge, we conclude that hypertension is a major confounder in the diabetes-CHD relationship.
What do we do now that we know that hypertension is a confounder? Stratify. Let's consider some stratified assessments.
Among hypertensives:

| Category | CHD | No CHD | Total |
|---|---|---|---|
| Diabetes | 20 (15%) | 113 | 133 |
| No Diabetes | 39 (5.5%) | 669 | 708 |
| Total | 59 | 782 | 841 |
PR = 2.73
OR = 3.04
Among non-hypertensives:

| Category | CHD | No CHD | Total |
|---|---|---|---|
| Diabetes | 6 (7%) | 77 | 83 |
| No Diabetes | 51 (3.1%) | 1572 | 1623 |
| Total | 57 | 1649 | 1706 |
PR = 2.30
OR = 2.40
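The stratum-specific estimates can be recomputed from the two tables above, and adding the strata cell by cell recovers the unstratified table. This is a minimal sketch; the dictionary layout and function names are illustrative choices, not part of the original lesson.

```python
def prevalence_ratio(a, b, c, d):
    """a, b: exposed with / without outcome; c, d: unexposed with / without outcome."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Cross-product ratio for the same 2 x 2 layout."""
    return (a * d) / (b * c)


# Cells: (diabetes & CHD, diabetes & no CHD, no diabetes & CHD, no diabetes & no CHD).
strata = {
    "hypertensive": (20, 113, 39, 669),
    "non-hypertensive": (6, 77, 51, 1572),
}

# Adding the strata cell by cell recovers the crude (unstratified) table.
crude = tuple(sum(cells) for cells in zip(*strata.values()))

estimates = {name: (prevalence_ratio(*t), odds_ratio(*t)) for name, t in strata.items()}
estimates["crude"] = (prevalence_ratio(*crude), odds_ratio(*crude))

for name, (pr, or_) in estimates.items():
    print(f"{name:>17}: PR = {pr:.2f}, OR = {or_:.2f}")
```

Printing the three rows side by side makes it easy to see whether the crude estimate lies between the stratum-specific ones or outside them.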
Both estimates of the odds ratio (hypertensives OR=3.04, non-hypertensives OR=2.40) are lower than the odds ratio based on the entire sample (OR=3.41). If you stratify a sample, without losing any data, wouldn't you expect to find the crude odds ratio to be a weighted average of the stratified odds ratios? A similar phenomenon occurs with the prevalence ratios: the stratified values (hypertensives PR=2.73, non-hypertensives PR=2.30) are both lower than the PR of 3.1 for the entire sample.
This is an example of confounding: the stratified results are both on the same side of the crude odds ratio. This is positive confounding because the unstratified estimate is biased away from the null value of 1.0. The odds ratio accounting for the effect of hypertension is 2.84, from the Mantel-Haenszel method, so the crude odds ratio of 3.41 was biased away from the null of 1.0. (In some studies you are looking for a positive association; in others, a negative association, that is, a protective effect; either way, the estimate differs from the null of 1.0.) The adjusted prevalence ratio is 2.60.
This is one way to demonstrate the presence of confounding. You may have a priori knowledge of confounded effects, or you may examine the data and determine whether confounding exists. Either way, when confounding is present, as in this example, the adjusted odds ratio should be reported. In this example, we report the odds ratio for the association of diabetes with CHD as 2.84, adjusted for hypertension. Likewise, the prevalence ratio for the association of diabetes with CHD is 2.60, adjusted for hypertension.
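The adjusted estimates can be approximated with the standard Mantel-Haenszel pooled estimators. The sketch below is an assumption about how the adjusted values were obtained: the lesson names Mantel-Haenszel for the odds ratio but does not state the method behind the adjusted prevalence ratio, and the function names are illustrative.

```python
def mantel_haenszel_or(strata):
    """Pooled odds ratio: sum(a*d/n) / sum(b*c/n) over strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


def mantel_haenszel_pr(strata):
    """Pooled prevalence (risk) ratio: sum(a*(c+d)/n) / sum(c*(a+b)/n) over strata."""
    num = sum(a * (c + d) / (a + b + c + d) for a, b, c, d in strata)
    den = sum(c * (a + b) / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Cells: (diabetes & CHD, diabetes & no CHD, no diabetes & CHD, no diabetes & no CHD).
strata = [
    (20, 113, 39, 669),   # hypertensives
    (6, 77, 51, 1572),    # non-hypertensives
]

adjusted_or = mantel_haenszel_or(strata)
adjusted_pr = mantel_haenszel_pr(strata)

print(f"Adjusted OR = {adjusted_or:.2f}, adjusted PR = {adjusted_pr:.2f}")
```

The pooled odds ratio agrees with the reported 2.84. The pooled prevalence ratio lands within about 0.01 of the reported 2.60; a small gap of that size could come from rounding or from a different weighting scheme.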