A randomized (experimental) trial prevents many design problems, but it is rarely used to evaluate a screening test because it is expensive. A randomized screening trial does allow the lead time to be estimated. However, if accepted practice is to treat a disease early, which is likely for any screening effort, is it ethical to randomize subjects to a non-screening arm?
A non-randomized study is subject to the selection biases discussed earlier: volunteer bias (volunteers are likely to be healthier than those not screened), length bias (the cases detected by screening have longer pre-clinical phases than the cases not detected by screening), and lead-time bias (the additional 'survival' time is a result of earlier detection, not a longer lifespan). One non-randomized design is a case-control study in which cases (disease that would have been detected even without screening) and controls (cases in all stages) are retrospectively compared with respect to their prior screening experience.
A randomized screening trial is depicted on the left in the figure below. Individuals are randomized into either screening or no screening arms of the trial. When the disease is detected, treatment begins. Outcomes are compared between these two arms. This design tests the effect of screening.
Another type of study is represented on the right in the figure below. All subjects are screened and followed. If early disease is found, individuals are randomized to either early treatment or no treatment until symptoms present (usual treatment). Outcomes are then measured. What this design really tests is the efficacy of treating early disease: does treating early disease improve outcomes?
It is hard to see how the second design could meet the requirements of an IRB today. If you find individuals with the disease, how do you make an ethical argument for not treating all of them? Only if the necessity of early treatment is itself in question. Prostate cancer is a good example because there is controversy as to whether it is more effective to treat men early or to adopt a posture of watchful waiting.
Multiple Observations or Observers
Do different people who look at the same results independently and consistently arrive at the same conclusions?
- Reliability: the consistency of results from multiple screening tests or multiple observers.
Reliability can be assessed in various ways:
- Intrasubject (multiple screening tests) - compare means; paired t-tests
- Inter-observer or inter-instrument (multiple observers or instruments)
- Dichotomous outcome with paired samples
- Percent agreement = (a + d) / (a + b + c + d)
- Kappa statistic (quantifies agreement beyond chance; does not test it)
- McNemar's test - non-parametric test of agreement for paired samples
- Continuous outcome
- Differences in paired measurements
- Coefficient of variation
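For the continuous-outcome case, the paired differences, a hand-computed paired t-statistic, and the coefficient of variation can be sketched in a few lines of Python. The measurement values below are hypothetical (the same instrument applied twice to 8 subjects):

```python
# Sketch of intrasubject reliability for a continuous measurement.
# Hypothetical data: the same instrument applied twice to 8 subjects.
import math
import statistics

first = [120, 118, 135, 142, 128, 131, 125, 138]
second = [122, 117, 133, 145, 127, 132, 126, 136]

# Differences in paired measurements: a reliable test gives differences near 0.
diffs = [s - f for f, s in zip(first, second)]
mean_diff = statistics.mean(diffs)

# Paired t-statistic: mean difference divided by its standard error.
n = len(diffs)
sd_diff = statistics.stdev(diffs)
t_stat = mean_diff / (sd_diff / math.sqrt(n))  # compare to t with n - 1 df

# Coefficient of variation: within-subject SD relative to the overall mean.
within_sd = math.sqrt(sum(d * d for d in diffs) / (2 * n))
grand_mean = statistics.mean(first + second)
cv = within_sd / grand_mean
```

A t-statistic near zero and a small coefficient of variation both suggest the repeated measurements agree well.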
The 2 x 2 table below depicts inter-observer reliability data, where each observer judges each subject as positive or negative.

| | Observer 2 + | Observer 2 - |
|---|---|---|
| **Observer 1 +** | a | b |
| **Observer 1 -** | c | d |

If inter-observer reliability is high, the greatest proportion of the decisions falls in the agreement cells a and d, but some will end up in the disagreement cells b and c.
We can use percent agreement, a Kappa statistic, or McNemar's test to assess such data.
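As a sketch, all three statistics can be computed directly from the four cell counts. The counts a, b, c, d below are hypothetical:

```python
# Agreement statistics for a 2 x 2 inter-observer table.
# Hypothetical counts: a, d = agreement cells; b, c = disagreement cells.
a, b, c, d = 40, 5, 10, 45
n = a + b + c + d

# Percent agreement: proportion of paired judgments that match.
percent_agreement = (a + d) / n

# Kappa: observed agreement corrected for agreement expected by chance.
p_obs = (a + d) / n
p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
kappa = (p_obs - p_exp) / (1 - p_exp)

# McNemar's test: are the discordant cells b and c balanced?
mcnemar_chi2 = (b - c) ** 2 / (b + c)  # 1 df; compare to a chi-square table
```

With these counts, 85% of judgments agree, but kappa (0.70) shows that some of this agreement would be expected by chance alone; the McNemar statistic checks whether the two observers disagree in a systematic direction.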