A randomized (experimental) trial prevents many design problems, but such trials are rare and expensive for evaluating a screening test. Randomized screening trials do allow estimation of lead-time bias. Yet if accepted practice is to treat disease early, as is likely for any screening effort, is it ethical to randomize subjects to a non-screening arm?
A non-randomized study is subject to the selection biases discussed earlier: volunteers are likely to be healthier than those not screened; length bias (the detected cases have naturally longer preclinical phases than cases not detected by screening); and lead-time bias (additional 'survival' time reflects earlier detection, not a longer lifespan). A non-randomized design may be a case-control study, in which cases (those that would have been detected even without screening) and controls (cases at all stages) are retrospectively compared with regard to prior screening experience.
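A minimal arithmetic sketch of lead-time bias, using entirely hypothetical ages: the subject's biology is identical whether or not they are screened, yet survival measured from diagnosis looks longer in the screened group.

```python
# Hypothetical ages for one subject; disease course and death are
# unchanged by screening in this sketch -- only detection moves earlier.
AGE_AT_CLINICAL_DX = 65   # diagnosis when symptoms appear (no screening)
AGE_AT_SCREEN_DX = 62     # screen-detected diagnosis (3-year lead time)
AGE_AT_DEATH = 70         # same in both scenarios

survival_unscreened = AGE_AT_DEATH - AGE_AT_CLINICAL_DX  # 5 years
survival_screened = AGE_AT_DEATH - AGE_AT_SCREEN_DX      # 8 years
lead_time = AGE_AT_CLINICAL_DX - AGE_AT_SCREEN_DX        # 3 years

# The apparent 'survival benefit' equals the lead time exactly:
apparent_benefit = survival_screened - survival_unscreened
```

The subject dies at 70 either way; the 3 extra years of 'survival' are simply the 3 years of earlier detection.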
A randomized screening trial is depicted on the left in the figure below. Individuals are randomized to a screening arm or a no-screening arm. When disease is detected, treatment begins. Outcomes are then compared between the two arms. This design tests the effect of screening itself.
Another type of study is represented on the right in the figure below. All individuals are screened and followed. If early disease is found, individuals are randomized to early treatment or to no treatment until symptoms appear (usual treatment), and outcomes are then measured. What this design really tests is the efficacy of treating early disease: does treating early disease improve outcome?
It is hard to see how the second design could meet the requirements of an IRB today. If you find individuals with disease, how do you make an ethical argument for not treating them all? Only if the necessity of early treatment is itself in question. Prostate cancer is a good example: there is controversy as to whether it is more effective to treat men early or to adopt a posture of watchful waiting.
Multiple Observations or Observers
Do different people looking at the same results independently and consistently arrive at the same conclusions? Reliability is the consistency of results across multiple screening tests or multiple observers.

Reliability can be assessed in various ways:
- Intrasubject (multiple screening tests) - compare means; paired t-tests
- Inter-observer or inter-instrument (multiple observers or instruments)
- Dichotomous outcome with paired samples
    - Percent agreement = (a + d) / (a + b + c + d)
    - Kappa statistic (quantifies agreement beyond that expected by chance)
    - McNemar's test - nonparametric test for paired samples of whether disagreements are systematic
- Continuous outcome
- Differences in paired measurements
- Coefficient of variation
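A brief sketch of the two continuous-outcome approaches above, using made-up paired readings from two hypothetical instruments measuring the same five subjects:

```python
from statistics import mean, stdev

# Made-up paired readings (e.g., blood pressure) for illustration only.
instrument_a = [120.0, 118.0, 135.0, 140.0, 128.0]
instrument_b = [122.0, 117.0, 137.0, 139.0, 130.0]

# Differences in paired measurements: for a reliable pair of
# instruments, these should be small and centered near zero.
diffs = [a - b for a, b in zip(instrument_a, instrument_b)]
mean_diff = mean(diffs)

# Coefficient of variation (CV): the standard deviation expressed as a
# fraction of the mean; a lower CV indicates more consistent readings.
cv_a = stdev(instrument_a) / mean(instrument_a)
cv_b = stdev(instrument_b) / mean(instrument_b)
```

A mean difference far from zero would suggest one instrument reads systematically higher than the other.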
The 2 x 2 table below depicts inter-observer reliability data; each observer makes a judgment on each subject. If inter-observer reliability is high, the greatest proportion of decisions falls in cells a and d (the observers agree), but some will end up in cells b and c (the observers disagree).

|                      | Observer 2: positive | Observer 2: negative |
|----------------------|----------------------|----------------------|
| Observer 1: positive | a                    | b                    |
| Observer 1: negative | c                    | d                    |
We can use percent agreement, a Kappa statistic, or McNemar's test to assess such data.
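The three statistics can be computed directly from the four cell counts. The counts below are made up for illustration; the cell labels follow the usual convention (a and d concordant, b and c discordant):

```python
# Hypothetical 2 x 2 inter-observer counts.
a, b, c, d = 40, 5, 10, 45
n = a + b + c + d

# Percent agreement: the fraction of concordant judgments.
percent_agreement = (a + d) / n

# Cohen's kappa: observed agreement corrected for the agreement
# expected by chance, computed from the row and column margins.
p_obs = (a + d) / n
p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
kappa = (p_obs - p_exp) / (1 - p_exp)

# McNemar's chi-square uses only the discordant cells b and c; a large
# value suggests the observers disagree in a systematic direction.
mcnemar_chi2 = (b - c) ** 2 / (b + c)
```

With these counts, agreement is 85% but kappa is lower (0.70), because some of that agreement would be expected by chance alone.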