8.1 - Bias

Bias is a systematic error in the design, recruitment, data collection, or analysis that results in a mistaken estimate of the true effect of the exposure on the outcome.

Bias occurs when the method used to select subjects or collect data results in an incorrect association.  Bias is something to consider while designing the study - it usually cannot simply be “corrected” in the analysis stage of the study.  

The two main types of bias are selection bias and information bias.

Selection Bias
Selection bias is the systematic error in the selection or retention of participants.

Examples of selection bias:

  • Suppose you are selecting cases of rotator cuff tears (a shoulder injury). Many older people have experienced this injury to some degree, but have never been treated for it. Persons who are treated by a physician are far more likely to be diagnosed (and identified as cases) than persons who are not treated by a physician. If a study only recruits cases among patients receiving medical care, there will be selection bias.
  • Some investigators may identify cases predicated upon previous exposure. Suppose a new outbreak is related to a particular exposure, for example, a particular pain reliever. If a press release encourages people taking this pain reliever to report to a clinic to be checked to determine if they are a case and these people then become the cases for the study, a bias has been created in sample selection. Only those taking the medication were assessed for the problem. Ascertaining a case based upon previous exposure creates a bias that cannot be removed once the sample is selected.
  • Exposure may affect the selection of controls – e.g., hospitalized patients are more likely to have been smokers than the general population. If controls are selected among hospitalized patients, the relationship between an outcome and smoking may be underestimated because of the increased prevalence of smoking in the control population.
  • In a cohort study, people who share similar characteristics may be lost to follow-up. For example, people who are mobile are more likely to change their residence and be lost to follow-up. If length of residence is related to the exposure, then the subjects who remain in the study no longer reflect the exposure distribution of the original cohort.
  • In a cross-sectional study, the sample may not be representative of the general population, which leads to bias. For example, suppose the study population includes multiple racial groups but members of one race participate less frequently in this type of study.
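The hospitalized-controls example above can be made concrete with a little arithmetic. The sketch below uses invented counts (assumptions chosen only for illustration, not data from any study): the same 100 cases are compared first with controls from the general population, where 20% are assumed to smoke, and then with hospitalized controls, where 40% are assumed to smoke.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio from the four cells of a case-control 2x2 table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


# Hypothetical counts, invented for illustration only.
cases_smokers, cases_nonsmokers = 60, 40

# Controls sampled from the general population (assumed 20% smokers).
pop_smokers, pop_nonsmokers = 20, 80

# Controls sampled from hospitalized patients (assumed 40% smokers).
hosp_smokers, hosp_nonsmokers = 40, 60

or_population = odds_ratio(cases_smokers, cases_nonsmokers, pop_smokers, pop_nonsmokers)
or_hospital = odds_ratio(cases_smokers, cases_nonsmokers, hosp_smokers, hosp_nonsmokers)

print(f"OR with population controls: {or_population:.2f}")  # 6.00
print(f"OR with hospital controls:   {or_hospital:.2f}")    # 2.25
```

Under these assumed numbers, the higher smoking prevalence among hospitalized controls pulls the odds ratio from 6.00 down to 2.25, even though the cases are identical.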

Information Bias
Information bias (also known as misclassification bias) is the systematic error due to inaccurate measurement or classification of disease, exposure, or other variables.

Examples of information bias:

  • Instrumentation - an inaccurately calibrated instrument creating a systematic error
  • Misdiagnosis - if a diagnostic test is consistently inaccurate, then information bias would occur
  • Recall bias - if individuals can't remember exposures accurately, then information bias would occur
  • Missing data - if certain individuals consistently have missing data, then information bias would occur
  • Socially desirable response - if study participants consistently give the answer that the investigator wants to hear, then information bias would occur

Misclassification can be differential or non-differential.

Differential misclassification

The probability of misclassification varies for the different study groups, i.e., misclassification is conditional upon exposure or disease status.
Are we more likely to misclassify cases than controls? For example, suppose cases are interviewed in person at length, eliciting detailed information, while controls are interviewed by phone for a shorter time using standard questions. This can lead to differential misclassification of exposure status between cases and controls.

Nondifferential misclassification

The probability of misclassification does not vary for the different study groups; that is, it is not conditional upon exposure or disease status, but appears random. Using the previous example, if half the subjects (cases and controls) were randomly selected to be interviewed by phone and the other half were interviewed in person, the misclassification would be nondifferential.

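Either type of misclassification can produce misleading results. Nondifferential misclassification of a two-category exposure generally biases the estimated association toward the null, whereas differential misclassification can bias it in either direction.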
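A short calculation can illustrate how the two kinds of misclassification distort an odds ratio. This is a minimal sketch under stated assumptions: the true counts, and the sensitivity and specificity of exposure classification, are invented for illustration, and the calculation uses expected counts rather than a random simulation.

```python
def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed counts when exposure is measured with error.

    sensitivity: probability a truly exposed subject is recorded as exposed.
    specificity: probability a truly unexposed subject is recorded as unexposed.
    """
    obs_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    obs_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return obs_exposed, obs_unexposed


def odds_ratio(cases, controls):
    """Odds ratio from (exposed, unexposed) counts for cases and controls."""
    return (cases[0] * controls[1]) / (cases[1] * controls[0])


# Hypothetical true exposure counts (exposed, unexposed).
true_cases, true_controls = (60, 40), (20, 80)
or_true = odds_ratio(true_cases, true_controls)

# Nondifferential: the same error rates apply to cases and controls.
or_nondiff = odds_ratio(
    misclassify(*true_cases, sensitivity=0.8, specificity=0.9),
    misclassify(*true_controls, sensitivity=0.8, specificity=0.9),
)

# Differential: cases report exposure perfectly, while controls
# under-report it (assumed sensitivity of 0.5 among controls).
or_diff = odds_ratio(
    misclassify(*true_cases, sensitivity=1.0, specificity=1.0),
    misclassify(*true_controls, sensitivity=0.5, specificity=1.0),
)

print(f"true OR:            {or_true:.2f}")     # 6.00
print(f"nondifferential OR: {or_nondiff:.2f}")  # about 3.43
print(f"differential OR:    {or_diff:.2f}")     # 13.50
```

With these assumed error rates, the nondifferential case shrinks the odds ratio toward 1, while the differential case inflates it. Other choices of differential error rates could just as easily push the estimate in the opposite direction.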