# 10.1 - Specific Data Imperfections

## Evaluable

Protocols sometimes contain improper plans that create or exacerbate imperfections in the data. One problem in this regard involves determining which patients received the correct treatment in the correct amounts. Patients who meet such criteria are said to be "evaluable".

As an example, consider an SE (safety and efficacy) trial in which $$N_E$$ patients are considered evaluable and $$N_I$$ patients are considered inevaluable. Suppose that the numbers of evaluable and inevaluable patients with favorable outcomes are $$R_E$$ and $$R_I$$, respectively. You may consider using one of the following estimates for the probability of a favorable outcome:

$$P=\frac{R_E+R_I}{N_E+N_I} \text{ and }P_E=\frac{R_E}{N_E}$$

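$$P$$ (the pragmatic, or intention-to-treat, approach) is based on all patients, whereas $$P_E$$ (the explanatory approach) is based on evaluable patients only. Usually $$R_I$$ is close to zero, so that $$P_E > P$$. More generally, $$P_E > P$$ whenever the favorable-outcome rate among inevaluable patients, $$R_I/N_I$$, is lower than $$P_E$$.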
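As a small illustration, the two estimates can be computed side by side. The counts below are hypothetical and chosen only to show the arithmetic; the function name is likewise an assumption made for this sketch.

```python
def favorable_outcome_estimates(r_e, n_e, r_i, n_i):
    """Return (P, P_E) for the pragmatic and explanatory approaches.

    r_e, n_e: favorable outcomes and total among evaluable patients.
    r_i, n_i: favorable outcomes and total among inevaluable patients.
    """
    if n_e <= 0 or n_i < 0:
        raise ValueError("n_e must be positive and n_i non-negative")
    p = (r_e + r_i) / (n_e + n_i)   # pragmatic: all patients
    p_e = r_e / n_e                 # explanatory: evaluable patients only
    return p, p_e


# Hypothetical counts: 80 evaluable (30 favorable), 20 inevaluable (2 favorable).
p, p_e = favorable_outcome_estimates(r_e=30, n_e=80, r_i=2, n_i=20)
print(f"P = {p:.3f}, P_E = {p_e:.3f}")
```

With these made-up counts, the explanatory estimate exceeds the pragmatic one because few inevaluable patients had favorable outcomes.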

$$P_E$$ may appear a more appropriate estimate of the clinical effect since it is obvious that the treatment cannot have an effect if it is not received. Some investigators will write a protocol to indicate that only data from those who received treatment for at least some number of doses or longer than a particular length of time will be used in the analysis. Do you recall a major difficulty with this explanatory approach? What about post-entry exclusion bias?

Since evaluability criteria define inclusion retroactively, based on treatment adherence that is not determined until completion of the study, there is potential for post-entry exclusion bias. Participant data should not be selected for inclusion in data analysis based on an outcome variable.

The pragmatic approach does not encounter such difficulties, although obviously it does not help elicit biological effects in an efficacy trial. It is prudent, then, to select treatments and a protocol design that will result in a high level of treatment adherence, with the hope that the pragmatic/intention-to-treat approach agrees as much as possible with the explanatory approach.

## Missing data

Usually, unrecorded data imply that methodologic errors have occurred. If this happens frequently, there could be a fundamental problem with the design or conduct of the study. Some missing data are due to human error, such as forgetting to record or enter the data.

In longitudinal clinical trials, some patients may be lost to follow-up. If losses to follow-up occur for reasons not associated with outcome, then they have little impact other than reducing precision, and the explanatory and pragmatic approaches would be equivalent. Investigators, however, cannot assume that all losses to follow-up are random events and conduct analyses that ignore such losses. Being lost to follow-up may be associated with a higher chance of disease progression, recurrence, or death. If a patient has not withdrawn consent, then every effort should be made to recover lost information.

There are three generic approaches to handling missing data values:

1. disregard the observations that contain missing values;
2. disregard the outcome variable if it has a high proportion of missing values;
3. replace the missing values by appropriate values (data imputation).

Data imputation is a reasonable approach under certain circumstances:

1. the frequency of missingness is relatively small (say less than 10%);
2. the outcome variable with the missing values is important clinically or biologically;
3. reasonable strategies for the data imputation exist;
4. the sensitivity of the conclusions to different data imputation strategies can be determined.

Simple data imputation involves substituting one data point for each missing value. Some substitution choices include the mean of the non-missing values or a predicted value from a linear regression model.
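A minimal sketch of mean substitution follows, assuming missing values are coded as `None`. The data and the function name `mean_impute` are hypothetical and serve only as an illustration.

```python
from statistics import mean, variance


def mean_impute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical measurements with two missing entries.
data = [4.0, None, 6.0, 8.0, None, 10.0]
observed = [v for v in data if v is not None]
imputed = mean_impute(data)

print(imputed)
print("variance of observed values:", variance(observed))
print("variance after imputation:  ", variance(imputed))
```

In this toy example the sample variance shrinks after imputation, because every filled-in value sits exactly at the observed mean.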

Another simple data imputation method is the last observation carried forward (LOCF) approach in longitudinal studies. With LOCF, the last observed value for a patient is substituted for all of that patient's subsequent missing values.
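LOCF can be sketched in a few lines for a single patient's visit sequence. The example below assumes visits are in chronological order and missing values are coded as `None`; the values and the function name `locf` are hypothetical.

```python
def locf(visits):
    """Carry the last observed value forward over later missing visits.

    Missing values that occur before any observation are left as None,
    since there is nothing to carry forward.
    """
    filled = []
    last = None
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)
    return filled


# Hypothetical measurements for one patient over six visits.
visits = [120, 118, None, 115, None, None]
filled = locf(visits)
print(filled)
```

Note that the final two visits are simply assumed to equal the fourth, which is the kind of assumption that can bias results if the patient's condition was changing.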

The problems with simple data imputation methods are that they can yield a very biased result and they tend to underestimate variability during the data analysis.

Multiple imputation methods are preferred, in which

1. imputations are generated, usually via a regression model, and random errors are added to the predicted values via random number generators,
2. multiple imputed data sets are created in this manner (say 10-20 data sets), and
3. the results are averaged across the multiple data sets.
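The three steps above can be sketched as follows, under simplifying assumptions: a single predictor, a simple linear regression fitted to the complete cases, normally distributed random errors, and the sample mean as the quantity of interest. The data and function names are hypothetical. A full multiple imputation analysis would also reflect uncertainty in the regression coefficients and combine variances across data sets (for example, with Rubin's rules), which this sketch omits.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual_sd)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = (sum(r ** 2 for r in resid) / (len(xs) - 2)) ** 0.5
    return a, b, sd


def multiply_impute_mean(xs, ys, m=10, seed=0):
    """Estimate the mean of y by averaging over m imputed data sets."""
    rng = random.Random(seed)
    complete = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b, sd = fit_line([x for x, _ in complete], [y for _, y in complete])
    estimates = []
    for _ in range(m):
        # Step 1: predicted value plus a random error for each missing y.
        filled = [
            y if y is not None else a + b * x + rng.gauss(0.0, sd)
            for x, y in zip(xs, ys)
        ]
        # Step 2: each pass through the loop yields one imputed data set.
        estimates.append(mean(filled))
    # Step 3: average the results across the imputed data sets.
    return mean(estimates), estimates


# Hypothetical data: y is missing for two of eight patients.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, None, 8.2, 9.8, None, 14.1, 16.0]
pooled, estimates = multiply_impute_mean(xs, ys, m=10, seed=0)
print(f"pooled mean of y: {pooled:.3f}")
```

Fixing the seed makes the run reproducible; the spread among the individual estimates gives a rough sense of how much the conclusion depends on the imputed values.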

## Ineligible patients

In most clinical trials, errors occur that allow ineligible patients to participate. Objective eligibility criteria are less susceptible to error than subjective criteria. Also, patients can fail to comply with nearly every aspect of treatment specification, for example through reduced or missed doses and improper dose scheduling.

Ineligible patients in the study can be:

1. included in the analysis of the cohort of eligible patients (pragmatic approach/intention-to-treat), or
2. excluded from the analysis (explanatory approach).

In a randomized trial, if the eligibility criteria are objective and assessed prior to randomization, then neither approach causes bias. The pragmatic approach, however, increases the external validity.
