Lesson 11: Diagnostic Tests & Disease Screening Studies


Overview

In this lesson, we'll discuss the prevention and early detection of disease. Early detection is accomplished through screening procedures.

Objectives

Upon completion of this lesson, you should be able to:

  • differentiate between primary, secondary, and tertiary prevention
  • calculate and differentiate between sensitivity, specificity, and positive and negative predictive values of a diagnostic test or series of tests
  • recognize evidence supporting or failing to support the effectiveness of a proposed prevention measure
  • identify examples of bias in screening studies.

11.1 - Prevention


Epidemiologists, medical professionals, and public health professionals use the terms 'primary prevention', 'secondary prevention', and even 'tertiary prevention'. What is the difference between primary and secondary prevention of disease? What is an example of tertiary prevention?

Primary prevention
Primary prevention prevents the onset of disease, which may be accomplished through the removal of a risk factor. For example, eating more fruits and vegetables may reduce the risk of the onset of diabetes, insulin resistance, colon cancer, or heart disease. Immunizations against diseases are another example of primary prevention.
Secondary prevention
Secondary prevention is the detection of disease among asymptomatic persons when treatment of early disease can reduce morbidity or mortality.
Tertiary prevention
Tertiary prevention is the prevention of health deterioration once the disease is present. For instance, once diagnosed with diabetes, managing insulin levels and regularly examining feet is tertiary prevention, relieving or preventing complications of the disease.

Let's try to apply these terms (you'll find that it is not as easy as the definitions suggest)...

  Stop and Think!


What level of prevention is:

  1. Control of blood glucose among diabetics?
  2. Increasing physical activity?
  3. Increasing physical activity among diabetics?
  4. Quitting smoking?
  5. Eliminating cigarette vending machines from places frequented by adolescents?
  6. Detecting polymorphisms for a breast-cancer gene such as \(BRCA_1\)?
  1. Control of blood glucose among diabetics? Tertiary - the disease has been diagnosed; control of blood glucose is to prevent further deterioration of health
  2. Increasing physical activity? Primary - if the individual has no obesity-related disease at present
  3. Increasing physical activity among diabetics? Tertiary - increased activity within a population defined by its disease status.
  4. Quitting smoking? Could be Primary or Secondary, depending on the damage already done to the lungs at the time of quitting; even smoking 100 cigarettes has been shown to decrease lung function
  5. Eliminating cigarette vending machines from places frequented by adolescents? Pre-Primary - we are trying to prevent adolescents from smoking by removing the opportunity to even begin smoking.
  6. Detecting polymorphisms for a breast-cancer gene such as \(BRCA_1\)? Secondary - because you are detecting an increased risk of disease, not the disease itself. This is a little bit different than measuring blood glucose for diabetes or women having mammograms to detect breast cancer because both are detecting the presence of disease. \(BRCA_1\) is associated with an increased risk for developing the disease later in life; the vast majority of women with the \(BRCA_1\) mutation don't go on to develop breast cancer.

Is prevention of disease a worthwhile effort? Danaei, Ding et al. (2009) assessed the effects of 12 modifiable risk factors on mortality in the U.S. These authors estimated that tobacco smoking and high blood pressure were responsible for 467,000 and 395,000 deaths, respectively, in 2005, accounting for about one in every five or six deaths among U.S. adults. Overweight/obesity and physical inactivity each accounted for about one in ten deaths. If the U.S. population were filled with active nonsmokers with controlled blood pressure who were not overweight, perhaps as many as four in ten deaths would be averted each year! These substantial numbers support the importance of epidemiological studies applied to the prevention of disease, as opposed to simply identifying causative factors.

If a large proportion of deaths is associated with preventable risk factors, should resources be allocated to eliminating these risk factors? In the face of competing demands and finite resources, which preventive measures have the greatest impact? Which services should be the focus of clinical practice improvement and national policies and programs? Maciosek et al. ranked 25 evidence-based clinical primary and secondary preventive services based on their relative value to the U.S. population (as of 2004 information). The measures used for the rankings were clinically preventable burden (CPB) and cost-effectiveness (CE). CPB was defined as the disease, injury, and premature death that would be prevented if the service were delivered at recommended intervals to a U.S. birth cohort, expressed in quality-adjusted life years (QALYs). CE was defined as the average net cost per QALY gained by offering the preventive service. The Partnership for Prevention lists the rankings and provides supporting evidence on the Annals of Family Medicine website. Clicking on a preventive service in the rankings brings up mortality rates, incidence rates, and risk factor prevalence by sub-population when those data are available.

  Stop and Think!


Suppose you are the health director of the state of Pennsylvania and you have the option of implementing free breast cancer screening or free vaccination of children (say, MMR - measles, mumps, rubella), but only enough resources for one project. Which will you choose? You might consider the cost of mammography versus the cost of the vaccination. Mammography is much more expensive than MMR vaccination. Vaccination is relatively simple: the vaccine is delivered to clinics, children are inoculated, and disease is prevented. However, it is also important to look at effectiveness in reducing preventable mortality. How would these rankings help you reach your decision?



11.2 - Early Detection and Screening


In this course, we are characterizing early detection and screening as secondary prevention. Classic examples include mammography to detect breast cancer, Pap smears to detect cervical cancer, fasting blood glucose to detect diabetes, PSA tests to detect prostate cancer, etc.

Let's look at Gordis' map of the natural history of disease, (from Gordis L. Epidemiology. Philadelphia: Saunders and Company, 1996). The biological onset of disease is followed by clinical symptoms, then diagnosis, and therapy until there is an outcome.

[Figure A: Timeline of the natural history of disease: biologic onset, symptoms, diagnosis, therapy, outcome]

We can label phases in this process. From the onset of disease until clinical symptoms occur is the pre-clinical phase. The individual has the disease but doesn't know it. The clinical phase is the latter part of the process, from the occurrence of clinical symptoms through therapy.

[Figure B: The same timeline, with the pre-clinical phase (onset to symptoms) and the clinical phase (symptoms through therapy) labeled]

Within the preclinical phase, there may be an interval between the onset of the disease and the occurrence of clinical symptoms during which disease can be detected with certain tests. This is called a detectable pre-clinical phase. If treatment is more effective when the disease is in the preclinical stage, screening for disease during the detectable pre-clinical phase offers an advantage.

[Figure C: The same timeline, showing the detectable pre-clinical phase (DPCP), the interval during which disease is detectable by screening]

The gain from screening for disease is the difference between the time a disease would have been diagnosed by clinical symptoms and when it is detected with a screening procedure. This is the lead time.

[Figure D: The same timeline, showing the lead time between detection by screening and the usual time of diagnosis from clinical symptoms]

Diagnostic Tests for Asymptomatic Disease or Disease Risk

What is the Objective of Screening?

  • To improve the quality of life or to reduce the morbidity and mortality for an individual, by applying effective treatment to disease or increased risk at an early stage, when treatment is more effective than if it were applied at a later stage.

To meet this objective, we seek to identify diseased or at-risk individuals at an asymptomatic stage. This helps us to separate individuals into populations such as:

  • diseased vs non-diseased
  • at-risk vs not at-risk

However, screening tests are not 100% accurate at classifying individuals. As a result, the distributions are not completely separated. For example, consider the distributions of blood sugar in diabetics and non-diabetics as depicted below. How would you set the cutpoint for a test of blood sugar to indicate diabetes? If you choose a low value, people with normal blood sugar will be included among the diabetics; if you select a high value, some diabetics will be included with the normals.

[Figure: Two overlapping curves, the distribution of those without diabetes and the distribution of those with diabetes; x-axis: blood sugar (mg/100cc), y-axis: number of cases]

Distribution of blood sugar in diabetics and non-diabetics
(From Blumberg M: Evaluating health screening procedures. Operations Res 5: 351-360, 1957.)
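
To make the cutpoint trade-off concrete, here is a small Python sketch with made-up distribution parameters (they are illustrative only, not taken from the Blumberg figure) showing how sensitivity and specificity move in opposite directions as the cutpoint rises.

    from statistics import NormalDist

    # Hypothetical blood-sugar distributions (mg/100cc); the parameters are invented
    # purely to illustrate the trade-off, not read off the figure above.
    non_diabetics = NormalDist(mu=100, sigma=20)
    diabetics = NormalDist(mu=160, sigma=40)

    for cutpoint in (110, 130, 150, 170):
        sensitivity = 1 - diabetics.cdf(cutpoint)   # diabetics at or above the cutpoint
        specificity = non_diabetics.cdf(cutpoint)   # non-diabetics below the cutpoint
        print(f"cutpoint {cutpoint}: sensitivity {sensitivity:.0%}, specificity {specificity:.0%}")

    # A low cutpoint catches nearly all diabetics but misclassifies many normals;
    # a high cutpoint does the reverse.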

We must also recognize that screening is only useful when we are assured that treatment is more effective at the earliest stages of the disease.


11.3 - Sensitivity, Specificity, Positive Predictive Value, and Negative Predictive Value


In the 2 × 2 table below, the two columns indicate the subjects' actual condition, diseased or non-diseased. The rows indicate the results of the test, positive or negative.

Cell A contains the true positives: subjects who have the disease and test positive. Cell D contains the true negatives: subjects who do not have the disease and test negative.

A good test will have minimal numbers in cells B and C. Cell B contains the false positives: individuals without the disease for whom the test indicates 'disease'. Cell C contains the false negatives: individuals with the disease whom the test misses.

                          Truth
Test Result      Disease               Non-Disease            Total
Positive         A (True Positive)     B (False Positive)     T_Test Positive
Negative         C (False Negative)    D (True Negative)      T_Test Negative
Total            T_Disease             T_Non-Disease          Total

If these results are from a population-based study, prevalence can be calculated as follows:

Prevalence of Disease= \(\dfrac{T_{\text{disease}}}{\text{Total}} \times 100\)

The population used for the study influences the prevalence calculation.

Sensitivity is the probability that a test will indicate 'disease' among those with the disease:

Sensitivity: A/(A+C) × 100

Specificity is the fraction of those without the disease who will have a negative test result:

Specificity: D/(D+B) × 100

Sensitivity and specificity are characteristics of the test. The population does not affect the results.

A clinician and a patient have a different question: what is the chance that a person with a positive test truly has the disease? If the subject is in the first row in the table above, what is the probability of being in cell A as compared to cell B? A clinician calculates across the row as follows:

Positive Predictive Value: A/(A+B) × 100

Negative Predictive Value: D/(D+C) × 100

Positive and negative predictive values are influenced by the prevalence of disease in the population that is being tested. If we test in a high prevalence setting, it is more likely that persons who test positive truly have the disease than if the test is performed in a population with low prevalence.
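
To see how these formulas fit together, here is a minimal Python sketch (not part of the original lesson) that computes prevalence, sensitivity, specificity, and the two predictive values from the cell counts A, B, C, and D of the 2 × 2 table; the example call uses the numbers from Hypothetical Example 1 below.

    def screening_measures(a, b, c, d):
        """Screening-test measures (as percentages) from 2x2 cell counts:
        a = true positives, b = false positives,
        c = false negatives, d = true negatives."""
        total = a + b + c + d
        return {
            "prevalence": 100 * (a + c) / total,   # (A + C) / Total
            "sensitivity": 100 * a / (a + c),      # A / (A + C)
            "specificity": 100 * d / (d + b),      # D / (D + B)
            "PPV": 100 * a / (a + b),              # A / (A + B)
            "NPV": 100 * d / (d + c),              # D / (D + C)
        }

    # Hypothetical Example 1 below: A = 10, B = 40, C = 5, D = 45
    print(screening_measures(10, 40, 5, 45))
    # prevalence 15%, sensitivity ~67%, specificity ~53%, PPV 20%, NPV 90%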

Let's see how this works out with some numbers...

Hypothetical Example 1 - Screening Test A

100 people are tested for the disease. 15 people have the disease; 85 people are not diseased. So, the prevalence is 15%:

  • Prevalence of Disease:
    \(\dfrac{T_{\text{disease}}}{\text{Total}} \times 100\),
    15/100 × 100 = 15%

Sensitivity is two-thirds, so the test is able to detect two-thirds of the people with the disease. The test misses one-third of the people who have the disease.

  • Sensitivity:
    A/(A + C) × 100
    10/15 × 100 = 67%

The test has 53% specificity. In other words, out of 85 persons without the disease, 45 have true negative results while 40 individuals test positive for a disease that they do not have.

  • Specificity:
    D/(D + B) × 100
    45/85 × 100 = 53%

The sensitivity and specificity are characteristics of this test. For a clinician, however, the important fact is that among the people who test positive, only 20% actually have the disease.

  • Positive Predictive Value:
    A/(A + B) × 100
    10/50 × 100 = 20%

Of those that test negative, 90% do not have the disease.

  • Negative Predictive Value:
    D/(D + C) × 100
    45/50 × 100 = 90%
                          Truth
Test Result      Disease                Non-Disease             Total
Positive         10 (True Positive)     40 (False Positive)     50
Negative         5 (False Negative)     45 (True Negative)      50
Total            15                     85                      100

Now, let's change the prevalence.

Hypothetical Example 2 - Increased Prevalence, Same Test

This time we use the same test, but in a different population, with a disease prevalence of 30%.

  • Prevalence of Disease:
  • \(\dfrac{T_{\text{disease}}}{\text{Total}} \times 100\)
    30/100 × 100 = 30%

We maintain the same sensitivity and specificity because these are characteristics of this test.

  • Sensitivity:
    A/(A + C) × 100
    20/30 × 100 = 67%
  • Specificity:
    D/(D + B) × 100
    37/70 × 100 = 53%

Now let's calculate the predictive values:

  • Positive Predictive Value:
    A/(A + B) × 100
    20/53 × 100 = 38%
  • Negative Predictive Value:
    D/(D + C) × 100
    37/47 × 100 = 79%
                          Truth
Test Result      Disease                Non-Disease             Total
Positive         20 (True Positive)     33 (False Positive)     53
Negative         10 (False Negative)    37 (True Negative)      47
Total            30                     70                      100

Using the same test in a population with a higher prevalence increases positive predictive value. Conversely, increased prevalence results in decreased negative predictive value. When considering predictive values of diagnostic or screening tests, recognize the influence of the prevalence of the disease. The figure below depicts the relationship between disease prevalence and predictive value in a test with 95% sensitivity and 95% specificity:

[Figure: Predictive value (percentage) versus prevalence of disease (percentage), with one curve for a positive test and one for a negative test]

Relationship between disease prevalence and predictive value in a test with 95% sensitivity and 85% specificity.
(From Mausner JS, Kramer S: Mausner and Bahn Epidemiology: An Introductory Text. Philadelphia, WB Saunders, 1985, p. 221.)
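
The shape of these curves follows directly from Bayes' rule. The sketch below is an illustration (not from Mausner and Kramer) that computes the positive and negative predictive values of a hypothetical test with 95% sensitivity and 95% specificity at several prevalences.

    def predictive_values(sens, spec, prevalence):
        """PPV and NPV from sensitivity, specificity, and prevalence (proportions)."""
        tp = sens * prevalence                # probability of a true positive
        fp = (1 - spec) * (1 - prevalence)    # probability of a false positive
        tn = spec * (1 - prevalence)          # probability of a true negative
        fn = (1 - sens) * prevalence          # probability of a false negative
        return tp / (tp + fp), tn / (tn + fn)

    # A hypothetical test with 95% sensitivity and 95% specificity:
    for prev in (0.01, 0.05, 0.15, 0.30, 0.50):
        ppv, npv = predictive_values(0.95, 0.95, prev)
        print(f"prevalence {prev:.0%}: PPV {ppv:.0%}, NPV {npv:.0%}")
    # PPV climbs steeply as prevalence rises, while NPV falls only slowly.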

Try it!

Under what circumstances would you really want to minimize the false positives?

Minimizing false positives is important when the costs or risks of follow-up therapy are high and the disease itself is not life-threatening. Prostate cancer in elderly men is one example; as another, obstetricians must weigh the potential harm from a false-positive maternal serum AFP test (which may be followed by amniocentesis, ultrasonography, and increased fetal surveillance, as well as producing anxiety for the parents and labeling of the unborn child) against the potential benefit.

Try it!

When would you want to minimize the false negatives?

We don’t want any false negatives if the disease is often asymptomatic and

  1. is serious, progresses quickly and can be treated more effectively at early stages, OR
  2. easily spreads from one person to another

What is a good test in a population? Actually, all tests have advantages and disadvantages, such that no test is perfect. There is no free lunch in disease screening and early detection.


11.4 - Examples


Example 1: Accuracy of Prostate Cancer Screening Tests

Two methods are commonly used to screen for prostate cancer: PSA (a blood test) and digital rectal exam (DRE). In this example, researchers used an abnormal PSA as an indicator for prostate cancer, with a cut-off of 4.0 nanograms per milliliter. The researchers found that this test had a sensitivity of 0.67, or 67%. In other words, two-thirds of all the cases that truly have prostate cancer were detected; one-third of the cases of prostate cancer go undetected. On the other hand, 97% of the men without prostate cancer had a normal PSA: the specificity was 0.97.

Test Characteristics of PSA and DRE

Test                                Sensitivity   Specificity   Positive Predictive Value
Abnormal PSA (> 4.0 ng/mL)          0.67          0.97          0.43
Abnormal DRE                        0.50          0.94          0.24

Adapted from: Kramer BS, Brown ML, Prorok PC, Potosky AL, Gohagan JK. Prostate cancer screening: what we know and what we need to know. Ann Int Med 1993;119:914-923

A normal PSA, then, was very reassuring, but a positive PSA was not a very good indicator of disease: only 43% of positive results came from men who actually had prostate cancer. For a positive result, the clinician will order a follow-up procedure.

The other common test is the digital rectal exam (DRE). DRE has a lower sensitivity of only 50%, and its specificity is also lower than that of PSA. Its positive predictive value is worse still.

Example 2: Accuracy of One or Two INDEPENDENTLY Administered Prostate Cancer Screening Tests

What if you used two tests? Let's add these data to the table:

Test Characteristics of PSA and DRE

Test                                Sensitivity   Specificity   Positive Predictive Value
Abnormal PSA (> 4.0 ng/mL)          0.67          0.97          0.43
Abnormal DRE                        0.50          0.94          0.24
Abnormal DRE or Abnormal PSA        0.84          0.92          0.28
Abnormal DRE and Abnormal PSA       0.34          0.995         0.49

Adapted from: Kramer BS, Brown ML, Prorok PC, Potosky AL, Gohagan JK. Prostate cancer screening: what we know and what we need to know. Ann Int Med 1993;119:914-923

If a positive result on either test counts as a positive screen, sensitivity increases but specificity is reduced.

What if we require both tests to be positive? Will using a higher standard for declaring disease result in lower sensitivity? Yes: sensitivity goes down while the test becomes more specific. This approach does, however, produce the highest positive predictive value.
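
Because the heading specifies independently administered tests, the combined sensitivity and specificity can be approximated with simple probability rules. The sketch below is an illustration under that independence assumption (it is not the method of Kramer et al.), and it reproduces values close to those in the table.

    def combine_or(sens1, spec1, sens2, spec2):
        """'Either test positive' rule for two independent tests."""
        sens = 1 - (1 - sens1) * (1 - sens2)   # a case is missed only if both tests miss it
        spec = spec1 * spec2                   # a non-case is negative only on both tests
        return sens, spec

    def combine_and(sens1, spec1, sens2, spec2):
        """'Both tests positive' rule for two independent tests."""
        sens = sens1 * sens2                   # a case is detected only if both tests detect it
        spec = 1 - (1 - spec1) * (1 - spec2)   # a non-case is falsely positive only on both tests
        return sens, spec

    # PSA (sens 0.67, spec 0.97) and DRE (sens 0.50, spec 0.94):
    print(combine_or(0.67, 0.97, 0.50, 0.94))   # about (0.84, 0.91)
    print(combine_and(0.67, 0.97, 0.50, 0.94))  # about (0.34, 0.998)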

If the goal is to have the highest positive predictive value, the best choice is to require both tests to be abnormal. However, what is the consequence of letting two-thirds of prostate cancer cases go undiagnosed? PSA is obtained from a simple blood draw. DRE is uncomfortable but temporary, so there is not much long-term consequence from either of the test procedures. Abnormal test results, however, are often followed by biopsies which are costly, uncomfortable, and have significant co-morbidities associated with them. We don't want to put men through the follow-up unnecessarily. Prostate cancer is often slow-growing and is not communicable. If the consequences of watchful waiting are not great, we may be willing to let a sizeable proportion of men who actually have prostate cancer go undiagnosed. The choice of tests and how to use them for screening purposes is heavily influenced by the consequences of making a wrong decision.

The PSA and DRE examples used two independent tests. What if two tests are performed in series?

Example 3: Accuracy of Two Screening Tests Administered in Series

Consider a population in which there are 500 diabetic individuals among a total population of 10,000, i.e. a 5% prevalence. Suppose you administer a non-fasting blood sugar test with a sensitivity of 350/500 (70%) and a specificity of 7600/9500 (80%).

Screening Test A (non-fasting blood sugar)

                    Diabetes
Blood Sugar     +        -        Total
+               350      1900     2250
-               150      7600     7750
Total           500      9500     10000

Sensitivity = 350/500 = 70%
Specificity = 7600/9500 = 80%
Prevalence = 500/10000 = 5%

Screening Test B (GTT, applied only to the 2250 positives from Test A)

                    Diabetes
GTT             +        -        Total
+               315      190      505
-               35       1710     1745
Total           350      1900     2250

Sensitivity = 315/350 = 90%
Specificity = 1710/1900 = 90%
Prevalence = 350/2250 = 16%

Comparison (the two tests in series)
Net Sensitivity = 315/500 = 63%
Net Specificity = (7600 + 1710)/9500 = 98%
Prevalence = 500/10000 = 5%

  Stop and Think!


Using this one test administered to a population of 10,000 people, with the prevalence of disease at 5%, how many people did you miss who had diabetes? How many false positives are there?
We missed 150 people who had diabetes while we also had 1900 false positives! This is a lot!
What should we do about the 2250 persons for whom the test was positive? Should there be a second test?

Take a look at the bottom half of the table: of the 2250 who tested positive, 350 have the disease. The second test has much higher sensitivity and specificity, doesn't it? (90% and 90%) To perform a glucose tolerance test (GTT), the subject fasts overnight, then comes to the clinic and drinks a glucose solution (the amount determined by body weight); blood is then drawn at regular intervals and assessed for evidence of regulation of blood glucose. A GTT requires considerably greater resources than drawing blood for a blood sugar test. It makes sense to put these two tests in a series, GTT after blood glucose.

We started with 10,000 people; of the 500 who have diabetes, 315 are labeled positive by the series. Net sensitivity for the series is therefore 315/500 = 63%. Net specificity includes the 7600 persons correctly identified as negative by the first test plus the 1710 individuals ruled out by the second test, divided by the 9500 without diabetes: (7600 + 1710)/9500 = 98%.
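
The same bookkeeping can be written as a short sketch (an illustration, not part of the source) that follows the 10,000 people through the two tests in series.

    # Serial screening: the GTT is applied only to those positive on the blood sugar test.
    population, prevalence = 10_000, 0.05
    sens_a, spec_a = 0.70, 0.80     # non-fasting blood sugar
    sens_b, spec_b = 0.90, 0.90     # glucose tolerance test (GTT)

    diseased = population * prevalence          # 500
    healthy = population - diseased             # 9500

    # Stage 1: who screens positive on the blood sugar test?
    tp_a = sens_a * diseased                    # 350 true positives
    fp_a = (1 - spec_a) * healthy               # 1900 false positives

    # Stage 2: the GTT is applied only to the 2250 stage-1 positives.
    tp_b = sens_b * tp_a                        # 315 true positives remain positive
    tn_b = spec_b * fp_a                        # 1710 false positives are ruled out

    net_sensitivity = tp_b / diseased                       # 315/500 = 63%
    net_specificity = (spec_a * healthy + tn_b) / healthy   # (7600 + 1710)/9500 = 98%
    print(f"net sensitivity {net_sensitivity:.0%}, net specificity {net_specificity:.0%}")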

The net specificity is much higher by using the two tests in a series than by just using the first test in a population with a prevalence of 5%. A significant advantage is gained from performing the simple test upfront, identifying individuals who are positive, and following up in this group with the more complex and costly test.


11.5 - Risks from Screening


Screening tests produce false positives and false negatives, but there are also risks associated with the screening process itself, and these risks have recently been receiving more attention.

True positive

What would be the risk associated with an accurate diagnosis? In some situations, a person may prefer to not know their diagnosis. What if there is no effective treatment at present? What if having the disease brings certain consequences in health insurance policies or social standing? A risk of being a true positive is the “labeling effect”.

False positive

Among the negative consequences are the following:

  • Monetary loss
  • Harm from confirmatory tests
  • Anxiety
  • Fear of future tests

 

True negative

The negative consequences here are the needless costs and risks of the screening tests.

False negative

If these individuals have some assurance that they don't have the disease, they may no longer seek health care or may disregard early symptoms. Their risks include:

  • Delayed intervention
  • Disregard of early signs or symptoms

Example: Risks Associated with Colorectal Cancer Screening

Let's take a look at an example related to colorectal cancer. Winawer et al simulated the effects of a program of annual fecal occult blood test (FOBT) screening on one thousand persons over 35 years, i.e., from age 50 until age 85. What were the consequences?

The table below provides the simulation results. Over the 35 years, 27,030 instances of FOBT-screening were performed. 2263 colonoscopies were performed as a follow-up to a positive FOBT-screening test. The colonoscopy test was negative for 2158 of the 2263 persons, but colonoscopy is not without potential complications! Complications can include death, perforation of the bowel, major bleeding, or minor complications. Notice the incidence of these harms that came from what seemed like an innocuous screening program.

Clinical Consequences for 1,000 People Entering a Program of Annual FOBT Screening for Colorectal Cancer at Age 50 and Remaining in the Program Until 85 Years of Age

Clinical Consequences                                                  Number
Harms
  Screening tests                                                      27,030.0
  Diagnostic evaluations (by colonoscopy)                              2,263.0
  False-positive screening tests                                       2,158.0
  Deaths due to colonoscopy complications                              0.5
  Bowel perforations from colonoscopy                                  3.0
  Major bleeding episodes from colonoscopy                             7.4
  Minor complications from colonoscopy                                 7.7
Benefits
  Deaths averted                                                       13.3
  Years of life saved                                                  123.3
  Years of life gained per person whose cancer death was prevented     9.3

Adapted from Winawer SJ, Fletcher RH, Millar L, et al. Colorectal cancer screening: clinical guidelines and rationale. Gastroenterology 1997;112:594-642.

There are also benefits to colorectal screening. Death was averted in 13 individuals, which works out to 123 years of life saved, 9.3 years per person.

  Stop and Think!

How would you weigh 123 years of life saved against one person dying needlessly because they underwent this screening program? At the population level? As an individual?


11.6 - Screening Biases


Bias results from a problem with the methods of a study that can't be corrected in an analysis. We can adjust for the effects of confounders in an analysis. For example, we can calculate adjusted rates, but we can't correct for biases.

There are two common types of bias:

Information bias
The data are not accurate, possibly because of faulty instruments or errors in measurement or recording.
Selection bias
The study population is not representative of the larger population, possibly because of a poor sampling process or because many individuals are lost to follow-up.

Common biases in screening include:

  • Lead time (information bias)- the systematic error of apparent increased survival from detecting disease in an early stage
  • Length (information bias) - the systematic error from detecting disease with a long latency or pre-clinical period
  • Referral/Volunteer bias (selection bias) - the systematic error in detecting disease in persons who have the propensity to seek healthcare
  • Detection (information bias) - the detection of insignificant disease
Note! Lead time and length biases reduce the utility of using increased survival time as the measure of success for a screening modality. Instead, screening programs have traditionally been assessed with changes in mortality rates. Disease-specific mortality rates have been the most commonly used measure of disease frequency.

Lead Time Bias

Let's take a look at a chart we saw earlier. Here the disease starts in 1985, is diagnosed in 1992 and the person dies of that disease in 1995. How long is his survival? Three years.

[Figure A: Timeline: biologic onset of disease in 1985, diagnosis and treatment in 1992, death in 1995; survival = 3 years]

Now we institute an effective screening program. The disease starts in 1985 and is detected by the screening program in 1989. The person dies of the disease in 1995. How long was the survival? Six years. Screening seems to have increased their survival time, correct?

[Figure B: Timeline: biologic onset in 1985, detection by screening with diagnosis and treatment in 1989, death in 1995; survival = 6 years]

You will also have noted that in either situation, it is 10 years from the time the disease started until the person dies. If our measure is survival time, we can easily produce a lead-time bias. In this example, there is actually no survival benefit from the screening: the person still died in 1995. They simply knew about the disease for three years longer; that is the effect of the screening. This example demonstrates a lead-time bias of three years.

[Figure C: The same timeline with both the screening diagnosis (1989) and the usual time of diagnosis (1992) marked; the difference between the two survival intervals is the lead-time bias]

An effective screening program for a life-threatening disease should extend life. Screening studies with an outcome of survival time are subject to lead-time bias that can favor the screening process when there is no actual benefit to the program.
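
As a quick check of the arithmetic in the timelines above, this tiny Python sketch (an illustration, not from Gordis) restates the years used in the example.

    onset, screen_dx, usual_dx, death = 1985, 1989, 1992, 1995

    survival_usual = death - usual_dx      # 3 years of survival after a symptomatic diagnosis
    survival_screen = death - screen_dx    # 6 years of survival after a screening diagnosis
    lead_time = usual_dx - screen_dx       # 3 years of lead time

    # Onset-to-death is identical either way, so the extra "survival" is pure lead-time bias.
    print(death - onset, survival_usual, survival_screen, lead_time)   # 10 3 6 3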

Another way to represent the lead-time bias is on a survival curve:

[Figure: Survival curves following diagnosis with and without screening, illustrating lead-time bias]

Gordis, L. Epidemiology. Philadelphia: Saunders and Company, 1996

In this graph, at time zero 100% of the people are alive; after five years, 30% are alive. Now suppose we institute a screening program that detects disease one year earlier. In this case, 50% of individuals are still alive five years after the screening diagnosis. The apparent 20-percentage-point improvement in five-year survival is lead-time bias: all that has been done is to move the diagnosis back one year!

Instead, we need to look at mortality rates from the disease, i.e., the mortality rates in the screened and unscreened groups. Mortality rates are the "gold standard" for measuring the effect of early screening and treatment, not survival time.

Length Bias

Let's use this graph to consider the effect of length bias:

[Figure: Size/stage of tumor versus time for several individuals; the steep lines (rapid growth) reach diagnosis after symptoms (D) before screening occurs, while the shallow lines (slow growth) are still in the detectable pre-clinical phase when screening (S) takes place]

Disease onset is at time zero and each line represents an individual. The bottom line, for instance, has a very slow growth rate of disease. The top line, with the steepest slope, represents someone with an aggressive disease: this person's tumor grows rapidly and is diagnosed after symptoms appear (D), before any screening takes place. The individuals with slower-growing disease are still in the detectable pre-clinical phase when they are screened (S).

A screening initiative is more likely to detect slow-growing diseases. There is not much that a researcher can do about this type of bias other than to realize it is likely to occur.

Referral/Volunteer Bias

Breast Cancer Screening - HIP Experience

Data from the Health Insurance Plan (HIP) of Greater New York are presented as deaths per 10,000 women per year, from all causes and from cardiovascular (CV) causes, in the table below:

                            Deaths/10,000 women per year
                            From all causes    From CV causes
Control women               54                 25
Experimental women:
  Volunteered               42                 17
  Refused                   77                 38

The control group has death rates of 54 and 25 per 10,000. The experimental group, the women offered screening, fell into two groups:

  1.  women who volunteered to be screened after being invited to participate
  2.  women who refused screening after being invited to participate

Death rates were recorded for each of these groups. The women who volunteered for screening had much lower death rates, from all causes and from cardiovascular causes; the women who volunteer appear to be healthier. The women who were offered the program but refused have the highest death rates.

If a screening study does not include a randomized process for selection, volunteers for the study are likely to be in better health than the general population. Thus, in evaluating a screening study, consider how the subjects were recruited. Were they volunteers or were they randomized into screening or no-screening groups?

 

How to Avoid Bias

Can we design a screening study without these biases?

For lead-time bias – use mortality rates rather than survival time.

A randomized clinical trial design can reduce biases:

  • For length bias – count all outcomes regardless of the method of detection
  • For volunteer bias – count all outcomes regardless of group assignment; follow up those who refuse in order to obtain their outcomes

11.7 - Designs for Controlled Trials for Screening


A randomized (experimental) trial prevents many design problems but is rare and expensive for evaluating a screening test. Randomized screening trials do allow estimation of lead-time bias. If accepted practice is to treat a disease early, which is likely for any screening effort, is it ethical to randomize subjects to a non-screening arm?

A non-randomized study is subject to the biases discussed earlier: volunteer bias (volunteers are likely to be healthier than those not screened), length bias (the detected cases have longer pre-clinical phases than the cases not detected by screening), and lead-time bias (additional 'survival' time is a result of earlier detection, not a longer lifespan). A non-randomized design may be a case-control study in which cases (that would have been detected without screening) and controls (cases in all stages) are retrospectively compared with respect to prior screening experience.

A randomized screening trial is depicted on the left in the figure below. Individuals are randomized into either screening or no screening arms of the trial. When the disease is detected, treatment begins. Outcomes are compared between these two arms. This design tests the effect of screening.

Another type of study is represented on the right in the figure below. Everyone is screened and followed. If early disease is found, individuals are randomized to early treatment or to no treatment until symptoms present (usual treatment), and outcomes are then measured. In this study design, what is really tested is the efficacy of treating early disease: does treating early disease improve outcomes?

[Figure: Two trial designs. Left: randomize to screen vs no screen; treat early disease when detected or treat at the usual time of presentation; compare outcomes. Right: screen everyone; among those with early disease or a risk factor detected, randomize to early treatment vs treatment at the usual time of presentation; compare outcomes]

It is a little hard to see how the second design could meet the requirements of an IRB today. If you find individuals with the disease, how do you make an ethical argument for not treating all of them? The argument works only if the benefit of early treatment is genuinely in question. Prostate cancer is a good example because there is controversy as to whether it is more effective to treat men early or to adopt a posture of watchful waiting.


Multiple Observations or Observers

Do different people who look at the same results independently and consistently arrive at the same conclusions?

Reliability
 The consistency of results from multiple screening tests or multiple observers.

Reliability can be assessed in various ways:

  • Intrasubject (multiple screening tests) - comparison of means; paired t-tests
  • Inter-observer or inter-instrument (multiple observers or instruments)
    • Dichotomous outcome with paired samples
    • Percent agreement = a / (a + b + c) (agreement on positive findings; cell d, agreement on negatives, is excluded)
    • Kappa statistic (quantifies agreement beyond that expected by chance)
    • McNemar's test - a non-parametric test for paired dichotomous data
  • Continuous outcome
    • Differences in paired measurements
    • Coefficient of variation

The 2 × 2 table below depicts inter-observer reliability data. Each observer makes a judgment. If inter-observer reliability is high, the greatest proportion of decisions will fall in cells a and d, but some will end up in cells b and c.

                 Observer 1
Observer 2       positive     negative
positive         a            b
negative         c            d

We can use percent agreement, a Kappa statistic, or McNemar's test to assess such data.
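
As a sketch of how these measures can be computed (using hypothetical counts, not data from the lesson), the following Python function returns the overall percent agreement, the positive percent agreement defined above, Cohen's kappa, and McNemar's chi-square statistic from the cells a, b, c, and d.

    def agreement_measures(a, b, c, d):
        """Inter-observer agreement from a paired 2x2 table:
        a = both observers positive, d = both negative,
        b = Observer 2 positive / Observer 1 negative,
        c = Observer 2 negative / Observer 1 positive."""
        n = a + b + c + d
        overall_agreement = (a + d) / n        # agreement on positives and negatives
        positive_agreement = a / (a + b + c)   # the formula given above (cell d excluded)
        # Cohen's kappa: observed agreement corrected for agreement expected by chance
        p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
        kappa = (overall_agreement - p_exp) / (1 - p_exp)
        # McNemar's chi-square: are the discordant cells b and c unbalanced?
        mcnemar = (b - c) ** 2 / (b + c) if (b + c) > 0 else 0.0
        return overall_agreement, positive_agreement, kappa, mcnemar

    # Hypothetical counts: 40 both positive, 5 and 10 discordant, 45 both negative
    print(agreement_measures(40, 5, 10, 45))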


11.8 - Considerations in the Establishment of Screening Recommendations and Programs


To summarize, all of the following will be considered when making decisions about screening programs or guidelines. Are these factors considered in the ongoing discussions about mammography guidelines?

Patient/Community/Population Considerations

  • Acceptability/desirability by patient and community
  • Historical precedent for screening
  • Sufficient burden of disease

Epidemiological/Medical Considerations

  • Different populations have distinct distributions of disease
  • The natural history of disease supports early detection
  • Availability of ‘adequate’ screening tests
  • Availability of and compliance with ‘effective’ treatment and follow up
  • Acceptability by the provider and health care system

Resource Considerations

  • Will of the political and healthcare system
  • Availability of resources - funds, services, transportation,…
  • Cost-effectiveness of screening

11.9 - References



  • Barrett A, Irwig L, Glasziou P et al. User’s guide to the medical literature XVII. How to use guidelines and recommendations about screening. JAMA 1999;281(21):2029-2034.
  • Coffield AB, et al. Priorities among recommended clinical preventive services. Am J Prev Med 2001 Jul;21(1):1-9.
  • Danaei G, Ding EL, Mozaffarian D, Taylor B, Rehm J, et al. The Preventable Causes of Death in the United States: Comparative Risk Assessment of Dietary, Lifestyle, and Metabolic Risk Factors. PLoS Med 2009 6(4): e1000058. doi:10.1371/journal.pmed.1000058.
  • Ghandur-Mnaymneh L, Raub WA, Sridar KS, et al. The accuracy of the histological classification of lung carcinoma and its reproducibility: a study of 75 archival cases of adenosquamous carcinoma. Cancer Invest 11:641, 1993.
  • Gordis, L. Epidemiology. Philadelphia: Saunders and Company, 1996. Chapter 4.
  • Kramer BS, Brown ML, Prorok PC, Potosky AL, Gohagan JK. Prostate cancer screening: what we know and what we need to know. Ann Int Med 1993;119:914-923.
  • McGinnis JM, Foege WH. Actual causes of death in the United States. JAMA 1993;270(18):2207-12.
  • Wilkinson I. Illinois legislature repeals requirement for prenuptial AIDS tests. NY Times. June 25, 1989.
  • Winawer SJ, Fletcher RH, Millar L, et al. Colorectal cancer screening: clinical guidelines and rationale. Gastroenterology 1997;112:594-642.
