First, let's review the Bayesian approach in general and then apply it to our current topic of likelihood methods.
The Bayesian approach to statistical design and inference is very different from the classical (frequentist) approach.
Before a trial begins, a Bayesian statistician summarizes the current knowledge or belief about the treatment effect, call it \(\theta\), in the form of a probability distribution. This is known as the prior distribution for \(\theta\), since these assumptions are made prior to conducting the study and collecting any data.
Next, the data from the trial, call them \(X\), are observed, and the likelihood function of \(X\) given \(\theta\) is constructed. Finally, the posterior distribution for \(\theta\) given \(X\) is constructed. In essence, the prior distribution for \(\theta\) is revised into the posterior distribution in light of the data \(X\): the data collected in the study inform and revise the earlier assumptions.
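As a concrete sketch of this prior-to-posterior revision, consider the Beta-binomial model, where the update has a closed form: a Beta prior on \(\theta\) combined with binomial data yields a Beta posterior. The specific numbers below (a Beta(2, 2) prior, 8 successes in 10 trials) are illustrative assumptions, not values from the text:

```python
# Illustrative Beta-binomial update (all numbers are hypothetical).
# Prior:      theta ~ Beta(a, b)
# Data:       s successes observed in n trials
# Posterior:  theta | X ~ Beta(a + s, b + n - s)   (by conjugacy)

def beta_binomial_update(a, b, s, n):
    """Return the posterior Beta parameters after observing s successes in n trials."""
    return a + s, b + (n - s)

# Prior belief Beta(2, 2), centered at 0.5; then observe 8 successes in 10 trials.
a_post, b_post = beta_binomial_update(2, 2, s=8, n=10)
print(a_post, b_post)              # posterior is Beta(10, 4)
print(a_post / (a_post + b_post))  # posterior mean, approximately 0.714
```

Note how the posterior mean (about 0.714) sits between the prior mean (0.5) and the observed success fraction (0.8), which is exactly the "revision of belief by data" described above.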
The following schematic describes this Bayesian approach:

prior distribution for \(\theta\) + data \(X\) (via the likelihood) \(\rightarrow\) posterior distribution for \(\theta\) given \(X\)
The development of the posterior distribution may be very difficult mathematically, and it may be necessary to approximate it through computer algorithms, such as Markov chain Monte Carlo (MCMC) methods.
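To make the idea of numerical approximation concrete, here is a minimal grid-approximation sketch (MCMC is the standard heavier tool, but a grid suffices in one dimension). The prior and data below are illustrative assumptions: a normal-shaped prior on \(\theta\) centered at 0.5 (which is not conjugate to the binomial likelihood, so there is no convenient closed form) and 8 successes in 10 trials:

```python
import math

# Grid approximation of a posterior that lacks a convenient closed form.
# Illustrative assumptions: a normal-shaped prior on theta (center 0.5,
# scale 0.1, restricted to (0, 1)) and binomial data: s successes in n trials.
def posterior_grid(s, n, grid_size=2000):
    grid = [(i + 0.5) / grid_size for i in range(grid_size)]
    prior = [math.exp(-0.5 * ((t - 0.5) / 0.1) ** 2) for t in grid]   # prior shape
    like = [t ** s * (1 - t) ** (n - s) for t in grid]                # binomial likelihood
    unnorm = [p * l for p, l in zip(prior, like)]                     # prior x likelihood
    total = sum(unnorm)
    post = [u / total for u in unnorm]                                # normalize: sums to 1
    return grid, post

grid, post = posterior_grid(s=8, n=10)
post_mean = sum(t * p for t, p in zip(grid, post))
print(round(post_mean, 3))  # pulled between the prior center (0.5) and the data (0.8)
```

The same normalize-over-a-grid idea is what MCMC replaces when \(\theta\) has too many dimensions for a grid to be feasible.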
The Bayesian statistician performs all inference for the treatment effect by formulating probability statements based on the posterior distribution. This is a very different approach and is not always accepted by more traditional, frequentist-oriented statisticians.
In the Bayesian approach, \(\theta\) is regarded as a random variable, about which probability statements can be made. This is the appealing aspect of the Bayesian approach. In contrast, the frequentist approach regards \(\theta\) as a fixed but unknown quantity (called a parameter) that can be estimated from the data.
As an example of the contrasting philosophies, consider the frequentist description and the Bayesian description of a 95% confidence interval for \(\theta\).
Frequentist: "If a very large number of samples, each with the same sample size as the original sample, were taken from the same population as the original sample, and a 95% confidence interval constructed for each sample, then 95% of those confidence intervals would contain the true value of \(\theta\)." This is an extremely awkward and unsatisfying definition, but it technically represents the frequentist approach.
Bayesian: "The 95% confidence interval defines a region that contains \(\theta\) with 95% probability." (Bayesians call such an interval a credible interval.) This is much simpler and more straightforward. (In fact, most people, when they first take a statistics course, believe that this is the definition of a confidence interval.)
In a Bayesian analysis, if \(\theta\) is a parameter of interest, the analysis results in a probability distribution for \(\theta\), from which many statements can be made. For example, if \(\theta\) represents the probability of success for a treatment, a statement can be made about the probability that \(\theta > 0.90\) (or any other value).
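Both kinds of posterior statement, a tail probability such as \(P(\theta > 0.90)\) and a 95% interval for \(\theta\), can be read directly off the posterior distribution. The sketch below assumes a hypothetical Beta(10, 4) posterior (e.g., from a Beta(2, 2) prior and 8 successes in 10 trials) and evaluates it on a fine grid rather than relying on a statistics library:

```python
# Posterior probability statements from an assumed Beta(10, 4) posterior.
def beta_pdf_unnorm(t, a, b):
    """Beta density up to its normalizing constant."""
    return t ** (a - 1) * (1 - t) ** (b - 1)

def posterior_summaries(a, b, grid_size=100_000):
    grid = [(i + 0.5) / grid_size for i in range(grid_size)]
    w = [beta_pdf_unnorm(t, a, b) for t in grid]
    total = sum(w)
    p = [wi / total for wi in w]                       # normalized posterior weights
    # P(theta > 0.90): posterior mass above 0.90
    tail = sum(pi for t, pi in zip(grid, p) if t > 0.90)
    # 95% equal-tailed interval: invert the grid CDF at 2.5% and 97.5%
    cum, lo, hi = 0.0, None, None
    for t, pi in zip(grid, p):
        cum += pi
        if lo is None and cum >= 0.025:
            lo = t
        if hi is None and cum >= 0.975:
            hi = t
    return tail, (lo, hi)

tail, (lo, hi) = posterior_summaries(10, 4)
print(round(tail, 3))               # P(theta > 0.90), roughly 0.03
print(round(lo, 2), round(hi, 2))   # 95% equal-tailed interval for theta
```

Note that both outputs are direct probability statements about \(\theta\) itself, which is precisely what the frequentist framework does not permit.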