9.2 - Likelihood Methods

It may be possible to assess treatment effects after each patient is accrued, treated, and evaluated. Such an approach is impractical in most circumstances, especially for trials that require lengthy follow-up to determine outcomes.

The first classical likelihood method proposed for this situation is called the sequential probability ratio test (SPRT) and it is based on the likelihood function. (This method is very rarely implemented because it is impractical in the clinical setting, but is important for historical reasons.) Let's review this method in general terms here.

A likelihood function is constructed from a probability model for a sequence of random variables that correspond to the outcome measurements on the experimental units. In the likelihood function, however, the observed data points replace the random variables. Suppose we have a binary response (success/failure) from each patient that is determined immediately after a treatment is administered. (Again, not very practical.) In the situation discussed here, we are examining one treatment that is administered to every patient. If there are N patients with K successes, and p represents the probability of success for each patient, then the likelihood function is based on the binomial probability function (the binomial coefficient is omitted because it does not depend on p):

\(L(p, K)=p^K(1-p)^{N-K}\)

This is a very simple likelihood function for a very simple example.
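To make the formula concrete, here is a minimal Python sketch of this likelihood. The function name `binomial_likelihood` and the data (10 patients, 3 successes) are illustrative assumptions, not part of the lesson.

```python
def binomial_likelihood(p, n_patients, n_successes):
    """Likelihood L(p, K) = p^K * (1 - p)^(N - K), without the binomial coefficient."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if not 0 <= n_successes <= n_patients:
        raise ValueError("need 0 <= K <= N")
    return p ** n_successes * (1.0 - p) ** (n_patients - n_successes)


# Hypothetical data: N = 10 patients, K = 3 successes.
for p in (0.2, 0.3, 0.4):
    print(p, binomial_likelihood(p, 10, 3))
```

Among the three candidate values, the likelihood is largest at p = 0.3, the observed success proportion.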

If the investigator is trying to decide whether \(p_0\) or \(p_1\) is the more appropriate value of p, then the likelihood ratio can be constructed to assess the evidence:

\(R=\dfrac{L(p_0, K)}{L(p_1, K)}=\left(\dfrac{p_0}{p_1} \right)^K \left(\dfrac{1-p_0}{1-p_1} \right)^{N-K} \)

This is a ratio of two different likelihood functions. If R is large, then the evidence is going to favor \(p_0\). If R is small, then the evidence is going to favor \(p_1\). Therefore, when analyzing interim data, we can calculate the likelihood ratio and stop the trial only if we have the amount of evidence that is expected for the target sample size.
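The ratio can be sketched in Python as follows. The function name `likelihood_ratio` and the example values (p0 = 0.2, p1 = 0.4, 3 successes in 10 patients) are assumptions chosen for illustration. The sketch works on the log scale, a common way to avoid underflow when N is large.

```python
import math


def likelihood_ratio(p0, p1, n_patients, n_successes):
    """R = (p0/p1)^K * ((1-p0)/(1-p1))^(N-K), computed via logarithms.

    Assumes 0 < p0 < 1 and 0 < p1 < 1 so that every logarithm is defined.
    """
    log_r = (n_successes * math.log(p0 / p1)
             + (n_patients - n_successes) * math.log((1 - p0) / (1 - p1)))
    return math.exp(log_r)


# Hypothetical data: 3 successes in 10 patients, comparing p0 = 0.2 with p1 = 0.4.
r = likelihood_ratio(0.2, 0.4, 10, 3)
print(r)  # R > 1 leans toward p0, R < 1 leans toward p1
```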

Suppose that N is the target sample size and that after n patients there are k successes. After each treatment we will stop and analyze the data to determine whether to continue the trial or not. Under this scenario, we stop the trial if:

\(R=\dfrac{L(p_0, k)}{L(p_1, k)}=\left(\dfrac{p_0}{p_1} \right)^k \left(\dfrac{1-p_0}{1-p_1} \right)^{n-k} \le R_L \text{ or }\ge R_U \)

where \(R_L\) and \(R_U\) are prespecified constants. Let's not worry about the details of the statistical calculation here. The values of \(R_L\) and \(R_U\) that correspond to testing \(H_0\colon p = p_0\) versus \(H_1 \colon p = p_1\) are \(R_L = \dfrac{\alpha}{(1 - \beta)}\) and \(R_U = \dfrac{(1 - \beta)}{\alpha}\).
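The monitoring rule can be sketched as a loop that updates R after each patient and stops at the first boundary crossing. This is a minimal illustration, not a validated implementation: `sprt_monitor` is a hypothetical name, the boundaries are passed in by the caller rather than computed, and the outcome sequence is made up.

```python
import math


def sprt_monitor(outcomes, p0, p1, r_lower, r_upper):
    """Update R after each binary outcome and stop at the first boundary crossing.

    outcomes: iterable of True (success) / False (failure), in accrual order.
    Returns (decision, n_patients_used, R at that point).
    """
    step_success = math.log(p0 / p1)              # each success multiplies R by p0/p1
    step_failure = math.log((1 - p0) / (1 - p1))  # each failure multiplies R by (1-p0)/(1-p1)
    log_r = 0.0
    n = 0
    for n, success in enumerate(outcomes, start=1):
        log_r += step_success if success else step_failure
        r = math.exp(log_r)
        if r <= r_lower:
            return "stop: evidence favors p1", n, r
        if r >= r_upper:
            return "stop: evidence favors p0", n, r
    return "continue", n, math.exp(log_r)


# Made-up outcome sequence with illustrative boundaries 1/19 and 19.
outcomes = [True, False, True, True, False, True, True, True]
print(sprt_monitor(outcomes, p0=0.2, p1=0.4, r_lower=1 / 19, r_upper=19))
```

Passing the boundaries as arguments keeps the sketch independent of how \(R_L\) and \(R_U\) are chosen.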

A sample schematic of the SPRT in practice is shown below. Here R is calculated after each patient is treated, and it moves around as patients accumulate and the trial proceeds. In this illustration, R hits the upper boundary before all of the planned patients have been accrued, so the trial stops and the remaining patients are not recruited.

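[Figure: SPRT schematic plotting R against n, with the boundaries \(R_U\) and \(R_L\) and the target sample size N marked.]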

Here is another example...

The SPRT might be useful in a phase II SE trial in which a treatment is to be monitored closely to determine if it reaches a certain level of success or failure. For example, suppose the investigator considers the treatment successful if \(p = 0.4\) (40% or greater), but considers it a failure if \(p = 0.2\) (20% or less). Thus, the hypothesis testing problem is \(H_0 \colon p = 0.2\) vs. \(H_1 \colon p = 0.4\). Suppose we take \(\alpha = 0.05\) and \(\beta = 0.05\). Then the bounds would be calculated as \(R_L = \dfrac{0.05}{0.95} = \dfrac{1}{19}\) and \(R_U = \dfrac{0.95}{0.05} = 19\). We would reject \(H_0\) in favor of \(H_1\), and claim success, as soon as R gets small enough, \(R = \left(\dfrac{0.2}{0.4}\right)^k \left(\dfrac{0.8}{0.6}\right)^{n-k} \approx (0.5)^k (1.33)^{n-k} \leq \dfrac{1}{19}\). On the other hand, we would stop the trial, accept \(H_0\), and claim failure as soon as \(R \geq 19\).
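As a worked version of these two stopping conditions (a sketch using only the values in this example, with natural logarithms and the exact value 4/3 in place of the rounded 1.33), taking logs of R turns each boundary into a straight line in n:

\(R \le \dfrac{1}{19} \iff k \ln 2 - (n-k)\ln\dfrac{4}{3} \ge \ln 19 \iff k \ge \dfrac{\ln 19 + n \ln(4/3)}{\ln(8/3)} \approx 3.002 + 0.2933\,n\)

\(R \ge 19 \iff (n-k)\ln\dfrac{4}{3} - k \ln 2 \ge \ln 19 \iff k \le \dfrac{n \ln(4/3) - \ln 19}{\ln(8/3)} \approx -3.002 + 0.2933\,n\)

So the earliest possible claim of success comes after 5 successes in the first 5 patients, since \((0.5)^5 \approx 0.031 \le 1/19\) while \((0.5)^4 = 0.0625 > 1/19\). The earliest possible claim of failure comes after 11 failures in the first 11 patients, since \((4/3)^{11} \approx 23.7 \ge 19\) while \((4/3)^{10} \approx 17.8 < 19\).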

The statistical formulation for the SPRT is relatively straightforward, but it is more commonly used in a quality control setting than in clinical trials. The obvious criticism is that each patient’s outcome must be observed quickly, before the next patient is recruited. The SPRT also has the statistical property that, with positive probability, R will not have reached either boundary \(R_L\) or \(R_U\) by the time the target sample size, N, is reached. If this is the case, then the trial is inconclusive.
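The chance of an inconclusive trial can be computed exactly for the binary-outcome setting by tracking the probability of every success count that has not yet crossed a boundary. The sketch below assumes independent patients with a common true success probability. The function name, the true probability 0.3, and the cap of 50 patients are all illustrative assumptions.

```python
import math


def sprt_outcome_probabilities(p_true, p0, p1, r_lower, r_upper, n_max):
    """Exact probabilities of (favor p1, favor p0, inconclusive) within n_max patients."""
    step_success = math.log(p0 / p1)
    step_failure = math.log((1 - p0) / (1 - p1))
    log_lower, log_upper = math.log(r_lower), math.log(r_upper)

    # alive[k] = probability of having k successes so far with no boundary crossed
    alive = {0: 1.0}
    favor_p1 = favor_p0 = 0.0
    for n in range(1, n_max + 1):
        still_alive = {}
        for k, prob in alive.items():
            for k_new, weight in ((k + 1, p_true), (k, 1.0 - p_true)):
                mass = prob * weight
                log_r = k_new * step_success + (n - k_new) * step_failure
                if log_r <= log_lower:
                    favor_p1 += mass      # crossed R_L: stop in favor of p1
                elif log_r >= log_upper:
                    favor_p0 += mass      # crossed R_U: stop in favor of p0
                else:
                    still_alive[k_new] = still_alive.get(k_new, 0.0) + mass
        alive = still_alive
    return favor_p1, favor_p0, sum(alive.values())


# Hypothetical scenario: true p = 0.3, halfway between p0 = 0.2 and p1 = 0.4.
print(sprt_outcome_probabilities(0.3, 0.2, 0.4, 1 / 19, 19, n_max=50))
```

The third value returned is the probability that neither boundary has been reached after `n_max` patients, which is the inconclusive case described above.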