Lesson 16: Overviews and Meta-analysis
Overview
An overview (also called a systematic review) attempts to summarize the scientific evidence related to treatment, causation, diagnosis, or prognosis of a specific disease. An overview does not generate any new data - it reviews and summarizes already-existing studies.
Overviews, which are relied upon by many physicians, are important because there usually exist multiple studies that have addressed a specific research question. Yet these types of studies may differ with respect to:
- Design
- Patient population
- Quality
- Results
Although it appears that conducting an overview is easy, it requires a good deal of effort and care to do it well. For example, determining inclusion and exclusion criteria for studies is a major challenge for researchers when putting together a useful overview.
What does this process involve? There are six basic steps to an overview:
- Define a focused clinical question
- Conduct a thorough literature search
- Apply inclusion/exclusion criteria to the identified studies
- Abstract/summarize the data from the eligible studies
- Perform statistical analysis (meta-analysis), if appropriate
- Disseminate the results
Objectives
- Describe the processes for conducting a systematic overview.
- Describe how publication bias can affect the results of a systematic review.
- Recognize patterns in a ‘funnel plot’ that would indicate potential publication bias.
- Evaluate the quality of a clinical trial report with the Jadad scale.
- Recognize the appropriate use of a fixed effects model vs. a random effects model for a meta-analysis. State how the weights differ between the fixed and random approaches.
- Describe the rationale for a test of heterogeneity among the studies used in a meta-analysis.
- Describe methods for performing a sensitivity analysis of the meta-analysis.
References:
Piantadosi, S. (2005). Reporting and Authorship; Meta-Analyses. In: Clinical Trials: A Methodologic Perspective. 2nd ed. Hoboken, NJ: John Wiley and Sons, Inc.
16.1 - 1. Define a Focused Question
Example
- Does garlic reduce serum cholesterol levels?
- Is montelukast (Singulair®) as effective as an inhaled steroid in treating asthma?
- Is induced abortion a risk factor for breast cancer?
If the question is too broad, it may not be useful when applied to a particular patient. For example, whether chemotherapy is effective in cancer is too broad a question (the number of studies addressing this question could exceed 10,000).
If the question is too narrow, there may not be enough evidence to answer the question. For example, the following question is too narrow: Is a particular asthma therapy effective in Caucasian females over the age of 65 years in Central Pennsylvania?
16.2 - 2. Conduct a Thorough Literature Search
Many sources for studies (throughout the world) should be explored:
- Bibliographic databases (Medline, Embase, etc.)
- Publicly available clinical trials databases such as clinicaltrials.gov, etc.
- Databanks of pharmaceutical firms (e.g. clinical trial results)
- Conference proceedings
- Theses/dissertations
- Personal contacts
- Unpublished reports
As discussed earlier in this course, beware of publication bias. Studies in which the intervention is not found to be effective, or as effective as other treatments, may not be submitted for publication. (This is referred to as the "file-drawer problem".) Studies with 'significant results' are more likely to make it into a journal. Recent initiatives in online journals, such as PLoS Medicine, and databases of trial results may encourage increased publication of results from scientifically valid studies, regardless of outcome. Even so, in an imperfect world, realize it is possible for an overview based only on published studies to be biased towards an overall positive effect.
Construction of a "funnel plot" is one method for evaluating whether or not publication bias has occurred.
Suppose there are some relevant studies with small sample sizes. If nearly all of them have a positive finding (p < 0.05), then this may provide evidence of publication bias: it is more difficult to show positive results with small sample sizes, so there should be some negative results (p > 0.05) among the small studies.
A "funnel plot" can be constructed to investigate this issue: plot sample size (vertical axis) versus p-value or magnitude of effect (horizontal axis).
In the absence of publication bias, the p-values for some of the small studies are relatively large, yielding a "funnel" shape for the scatterplot.
If none of the p-values for the small studies are large, the scatterplot instead forms a "band" shape, which raises the suspicion that a degree of publication bias exists.
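As a rough sketch of how such a funnel plot can be constructed, the following Python snippet (using numpy and matplotlib) simulates studies of varying sizes around a hypothetical true effect of 0.3 and plots effect size against sample size. All numbers here are invented for illustration; with no publication bias, the points spread out at the bottom of the plot, forming the funnel.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)

# Hypothetical true treatment effect and a range of study sizes.
true_effect = 0.3
sample_sizes = rng.integers(20, 500, size=60)

# Each study's estimate is noisier when its sample size is small.
std_errors = 1.0 / np.sqrt(sample_sizes)
effects = rng.normal(true_effect, std_errors)

# Funnel plot: magnitude of effect (horizontal) vs. sample size (vertical).
plt.scatter(effects, sample_sizes)
plt.axvline(true_effect, linestyle="--")
plt.xlabel("Estimated treatment effect")
plt.ylabel("Sample size")
plt.title("Funnel plot (no publication bias)")
plt.savefig("funnel.png")
```

Publication bias would show up as an asymmetric plot: small studies with unimpressive effects would be missing from one side of the funnel.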
16.3 - 3. Apply Inclusion/Exclusion Criteria
Eligibility criteria for studies need to be established prior to the analysis.
The researcher should base the inclusion/exclusion criteria on the design aspects of the trials, the patient populations, treatment modalities, etc., that are congruent with the objectives of the overview. Looking across a variety of studies, this process can become quite complicated.
Although subjective, some researchers grade the selected studies according to quality and may weight the studies accordingly in the analysis. One such example of a quality rating of randomized trials is the Jadad scale (Jadad et al. Assessing the quality of reports of randomized clinical trials: Is blinding necessary? Controlled Clinical Trials, 1996; 17: 1-12). Here are the questions that are asked as a part of this scale, along with the scores associated with the answers:
Is the study described as randomized?
- No, or yes but inappropriate method (0 points)
- Yes but no discussion of method (1 point)
- Yes and appropriate method (2 points)
Is the study described as double blind?
- No, or yes but inappropriate method (0 points)
- Yes but no discussion of method (1 point)
- Yes and appropriate method (2 points)
Is there a description of withdrawals/dropouts?
- No (0 points)
- Yes (1 point)
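The scoring above can be expressed as a small function. This is an illustrative sketch of the 0-5 scale as described here, not an official implementation; the argument names are our own.

```python
def jadad_score(randomized, randomization_method,
                double_blind, blinding_method,
                withdrawals_described):
    """Score a trial report on the 0-5 Jadad scale described above.

    `randomization_method` / `blinding_method` take one of three values:
    "appropriate", "inappropriate", or None (method not discussed).
    """
    points = {"appropriate": 2, None: 1, "inappropriate": 0}
    score = 0
    if randomized:
        score += points[randomization_method]
    if double_blind:
        score += points[blinding_method]
    if withdrawals_described:
        score += 1
    return score

# A randomized, double-blind trial with appropriate randomization,
# undescribed blinding method, and withdrawals reported: 2 + 1 + 1 = 4.
print(jadad_score(True, "appropriate", True, None, True))
```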
16.4 - 4. Abstract / Summarize the Data
In most circumstances, the researcher can easily gather the relevant descriptive statistics (e.g., means, standard errors, sample sizes) from the reports on the eligible studies.
Sometimes, older reports (say, prior to 1980) do not include variability estimates (e.g., standard errors). If possible, the researcher should attempt to contact the authors directly in such situations. This may not be successful, however, because the authors may no longer have the data.
Ideally, the statistical analysis for a systematic review will be based on the raw data from each eligible study. Historically, this has rarely occurred, either because the raw data were no longer available or because the authors were unwilling to share them. However, the success of shared data in the Human Genome Project has given impetus to increased data sharing to promote rapid scientific progress. Since the US NIH now requires investigators receiving large new NIH grants to have a data-sharing plan (NIH Data Sharing Policy Guide) and has provided more guidance on how federal data are to be shared, we may anticipate more meta-analyses based on raw data.
As we have discussed earlier, problems to be solved before private entities embrace data sharing include proprietary rights, authorship, patient consent and confidentiality, common technology, proper use, enforcement of policy, etc. As these challenges are overcome, the path to a systematic review and meta-analysis based on raw data will be smoother.
16.5 - 5. Meta-analysis
The obvious advantage of performing a meta-analysis is that a large amount of data, pooled across multiple studies, can provide increased precision in addressing the research question. The disadvantage of a meta-analysis is that the studies can be very heterogeneous in their designs, quality, and patient populations and, therefore, it may not be valid to pool them. This issue is something that needs to be evaluated very critically.
Researchers invoke two basic statistical models for meta-analysis, namely, fixed-effects models and random-effects models.
A fixed-effects model is more straightforward to apply, but its underlying assumptions are somewhat restrictive. It assumes that if all the involved studies had tremendously large sample sizes, then they all would yield the same result. In essence, a fixed-effects model assumes that there is no inter-study variability (study heterogeneity). This statistical model accounts only for intra-study variability.
A random-effects model, however, assumes that the eligible studies actually represent a random sample from a population of studies that address the research question. It accounts for intra-study and inter-study variability. Thus, a random-effects model tends to yield a more conservative result, i.e., wider confidence intervals and less statistical significance than a fixed-effects model.
A random-effects model is more appealing from a theoretical perspective, but it may not be necessary if there is very low study heterogeneity. A formal test of study heterogeneity is available. Its results, however, should not determine whether to apply a fixed-effects model or random-effects model. You need to use your own judgment as to which model should be applied.
The test for study heterogeneity is very powerful and sensitive when the number of studies is large. It is very weak and insensitive if the number of studies is small. Graphical displays provide much better information as to the nature of study heterogeneity. Some medical journals require that the authors provide the test of heterogeneity, along with a fixed-effects analysis and a random-effects analysis.
16.6 - The Fixed-Effects Model Approach
The basic step for a fixed-effects model involves the calculation of a weighted average of the treatment effect across all of the eligible studies.
For a continuous outcome variable, the measured effect is expressed as the difference between the sample treatment and control means. The weight is the inverse of the variance of that difference. Therefore, a study with a large variance receives a lower weight, and a study with a small variance receives a larger weight.
For a binary outcome variable, the measured effect usually is expressed as the logarithm of the estimated odds ratio. The weight is the inverse of the variance of the logarithm of the estimated odds ratio. The weighting then proceeds in the same manner as for a continuous outcome.
Suppose that there are K studies.
The estimated treatment effect (e.g., difference between the sample treatment and control means) in the \(k^{th}\) study, \(k = 1, 2, \dots , K\), is \(Y_{k}\) .
The estimated variance of \(Y_k\) in the \(k^{th}\) study is \(S_k^2\).
The weight for the estimated treatment effect in the \(k^{th}\) study is \(w_k= \dfrac{1}{S_k^2}\).
The overall weighted treatment effect is:
\(Y=\dfrac{\left(\sum_{k=1}^{K}w_kY_k \right)}{ \left(\sum_{k=1}^{K}w_k \right)} \)
The estimated variance of Y, the weighted treatment effect, is:
\(S^2 = \dfrac{1}{\left(\sum_{k=1}^{K}w_k \right) }\)
Testing the null hypothesis of no treatment effect (e.g., \(H_0 \colon \mu_1 - \mu_0 = 0\) for a continuous outcome) is performed by assuming that \(\dfrac{Y}{S}\) asymptotically follows a standard normal distribution under the null hypothesis.
The \(100(1 - \alpha )\%\) confidence interval for the overall weighted treatment effect is:
\(\left(Y - z_{1-\alpha/2} \times S, \; Y + z_{1-\alpha/2} \times S\right)\)
The statistic for testing \(H_0 \colon\) {study homogeneity} is
\(Q=\sum_{k=1}^{K}w_k(Y_k-Y)^2 \)
Q has an asymptotic \(\chi^{2}\) distribution with K - 1 degrees of freedom.
Alternative: Mantel-Haenszel Test
An alternative, fixed-effects approach for a binary outcome is to apply the Mantel-Haenszel test for the pooled odds ratio. The Mantel-Haenszel test for the pooled odds ratio assumes that the odds ratio is equal across all studies. For the \(k^{th}\) study, \(k = 1, 2, \dots , K\), a 2 × 2 table is constructed:
|  | Control | Treatment |
| --- | --- | --- |
| Failure | \(n_{0k} - r_{0k}\) | \(n_{1k} - r_{1k}\) |
| Success | \(r_{0k}\) | \(r_{1k}\) |
The disadvantage of the Mantel-Haenszel approach, however, is that it cannot adjust for covariates/regressors. Many researchers now use logistic regression analysis to estimate the odds ratio from a study while adjusting for covariates/regressors, so the weighted approach described previously is more applicable.
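As an illustration, the Mantel-Haenszel pooled odds ratio can be computed directly from the 2 × 2 tables: it is the ratio of \(\sum_k a_k d_k / N_k\) to \(\sum_k b_k c_k / N_k\), where \(a_k, b_k, c_k, d_k\) are the four cell counts and \(N_k\) is the total for study \(k\). Here is a minimal Python sketch; the trial counts are hypothetical numbers invented for illustration.

```python
def mantel_haenszel_or(studies):
    """Pooled Mantel-Haenszel odds ratio.

    `studies` is a list of (r1, n1, r0, n0) tuples: successes and totals
    in the treatment and control arms of each study.
    """
    num = den = 0.0
    for r1, n1, r0, n0 in studies:
        N = n1 + n0
        num += r1 * (n0 - r0) / N    # a_k * d_k / N_k
        den += (n1 - r1) * r0 / N    # b_k * c_k / N_k
    return num / den

# Hypothetical counts from three small trials (for illustration only).
trials = [(15, 40, 10, 40), (22, 60, 18, 60), (30, 80, 20, 80)]
print(mantel_haenszel_or(trials))
```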
16.7 - Example
Example
Consider the following example for the difference in sample means between an inhaled steroid and montelukast in asthmatic children. The outcome variable is FEV\(_1\) (L) from four clinical trials. Note that only the first study yields a statistically significant result (p-value < 0.05).
| k | \(Y_k\) | \(S_k\) | \(w_k\) | p-value |
| --- | --- | --- | --- | --- |
| 1 | 0.070 | 0.032 | 977 | 0.028 |
| 2 | 0.043 | 0.049 | 416 | 0.375 |
| 3 | 0.058 | 0.052 | 370 | 0.260 |
| 4 | 0.075 | 0.041 | 595 | 0.067 |
The overall treatment effect is Y =
\( \dfrac{(0.070\times977)+(0.043\times416)+(0.058\times370)+(0.075\times595)}{977+416+370+595} \)
Overall effect \(Y = \dfrac{152.363}{2358} = 0.065\)
Overall estimated variance \(S^2 = \dfrac{1}{2358} = 0.0004 \left(S = 0.0206\right)\)
\(\dfrac{|Y|}{S} = \dfrac{|0.065|}{0.0206} = 3.155\), and the p-value = 0.002
The 95% confidence interval for the overall treatment effect is [0.025, 0.105].
It appears that there is a degree of homogeneity among these studies...
The statistic for testing homogeneity is Q = 0.303, which does not exceed 7.81, the 95th percentile of the \(\chi_3^2\) distribution. Therefore, we have further evidence that the studies are homogeneous, although the small number of studies in this overview gives this test very little power.
Based on the evidence presented above, we can conclude that the inhaled steroid is significantly better than montelukast in improving lung function in children with asthma.
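The fixed-effects calculations in this example can be reproduced with a short Python sketch (the course materials use SAS; this is an illustrative alternative). Small differences from the figures above, e.g. z ≈ 3.14 rather than 3.155, arise because the text rounds Y to 0.065 before dividing by S.

```python
import math

# (Y_k, S_k) from the four asthma trials in the table above.
studies = [(0.070, 0.032), (0.043, 0.049), (0.058, 0.052), (0.075, 0.041)]

weights = [1 / s**2 for _, s in studies]           # w_k = 1 / S_k^2
W = sum(weights)

Y = sum(wk * yk for wk, (yk, _) in zip(weights, studies)) / W
S = math.sqrt(1 / W)                                # overall std. error
z = abs(Y) / S
Q = sum(wk * (yk - Y)**2 for wk, (yk, _) in zip(weights, studies))

ci = (Y - 1.96 * S, Y + 1.96 * S)
print(f"Y = {Y:.3f}, S = {S:.4f}, z = {z:.2f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), Q = {Q:.3f}")
```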
The weighted analysis for the fixed-effects approach described previously corresponds to the following linear model for the \(k^{th}\) study, \(k = 1, 2, \dots , K\):
\(Y_k = \theta + e_k\)
where \(Y_k\) is the observed effect in the \(k^{th}\) study, \(\theta\) is the pooled population parameter of interest (difference in population treatment means, natural logarithm of the population odds ratio, etc.) and \(e_k\) is the random error term for the \(k^{th}\) study.
It is assumed that \(e_1 , e_2 , \dots , e_K\) are independent random variables with \(e_k\) having a \(N(0 , \sigma_k^2)\) distribution. The variance term \(\sigma_k^2\) then reflects intra-study variability and its estimate is \(S_k^2\). Usually, \(Y_k\) and \(S_k\) are provided as descriptive statistics in the \(k^{th}\) study report.
16.8 - Random Effects / Sensitivity Analysis
A corresponding linear model for the random-effects approach is as follows:
\(Y_k = \theta + t_k + e_k\)
where \(Y_k , \theta\), and \(e_k\) are the same as described above and \(t_k\) is a random effect for the \(k^{th}\) study.
It is assumed that \(t_1 , t_2 , \dots , t_K\) are independent and identically distributed as \(N \left(0 , \omega^{2} \right)\) random variables. The variance term \(\omega^2\) reflects inter-study variability.
Whereas \(\text{Var}\left(Y_k\right) = \sigma_k^2\) is the variance associated with the fixed-effects linear model,
\(\text{Var}\left(Y_k\right) = \sigma_k^2+\omega^2\) is the variance associated with the random-effects linear model.
A weighted analysis will be applied, analogous to the weighted analysis for the fixed-effects linear model, but the weights are different. The overall weighted treatment effect is:
\(Y=\left( \sum_{k=1}^{K}w_{k}^{*}Y_k \right) / \left( \sum_{k=1}^{K}w_{k}^{*} \right) \)
where
\(w_{k}^{*}=1/ (S_{k}^{2}+\hat{\omega}^2)\)
\(\hat{\omega}^2 = \max(0, \; Q-K+1) \times \dfrac{\sum_{k=1}^{K}w_{k}}{\left( \sum_{k=1}^{K}w_{k} \right)^2 - \sum_{k=1}^{K}w_{k}^2} \)
and where Q is the heterogeneity statistic and \(w_k\) is the weight for the \(k^{th}\) study, which were defined previously for the weighted analysis in the fixed-effects linear model.
Analogous to the fixed-effects linear model, the variance of Y in the random-effects linear model is:
\(S^2= \dfrac{1}{\left(\sum_{k=1}^{K}w_k^* \right)}\)
and statistical inference is performed in a similar manner.
If there exists a large amount of study heterogeneity, then \(\hat{\omega}^2\) will be very large and will dominate in the expression for the weight in the \(k^{th}\) study, i.e.,
\(w_k^*= \dfrac{1}{\left(S_k^2 + \hat{\omega}^2\right)} \approx \dfrac{1}{\hat{\omega}^2} \)
Therefore, in such a situation with a relatively large amount of heterogeneity, the weight for each study will approximately be the same and the weighted analysis for the random-effects linear model will approximate an unweighted analysis.
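A minimal Python sketch of these random-effects weights (the DerSimonian-Laird estimate of \(\hat{\omega}^2\)), applied to the four asthma studies from the earlier example, is given below. Because Q < K - 1 for those data, \(\hat{\omega}^2 = 0\) and the random-effects result coincides with the fixed-effects result.

```python
import math

# Same four asthma studies: (Y_k, S_k).
studies = [(0.070, 0.032), (0.043, 0.049), (0.058, 0.052), (0.075, 0.041)]
K = len(studies)

w = [1 / s**2 for _, s in studies]
W = sum(w)
Y_fixed = sum(wk * yk for wk, (yk, _) in zip(w, studies)) / W
Q = sum(wk * (yk - Y_fixed)**2 for wk, (yk, _) in zip(w, studies))

# Estimated inter-study variance, truncated at zero.
omega2 = max(0.0, (Q - (K - 1)) * W / (W**2 - sum(wk**2 for wk in w)))

# Random-effects weights and overall estimate.
w_star = [1 / (s**2 + omega2) for _, s in studies]
Y_random = sum(wk * yk for wk, (yk, _) in zip(w_star, studies)) / sum(w_star)
S_random = math.sqrt(1 / sum(w_star))

print(f"omega^2 = {omega2:.4f}, Y = {Y_random:.3f}, S = {S_random:.4f}")
```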
SAS program P21.1 in this module illustrates the fixed and random effects approaches to the asthma meta-analysis. Try it!
Graphs of the Treatment Differences
Graphical displays showing the estimated treatment difference and its confidence interval for every study are very useful for evaluating treatment effects over time or with respect to other factors.
Sensitivity Analyses
Statistical diagnostics (sensitivity analyses) should be performed to investigate the validity and robustness of the meta-analysis via applying the meta-analytic approach to subsets of the K studies, and/or applying the leave-one-out method.
The steps for the leave-one-out method are as follows:
- Remove the first of the K studies and conduct the meta-analysis on the remaining K - 1 studies
- Remove the second of the K studies and conduct the meta-analysis on the remaining K - 1 studies
- Continue this process until there are K distinct meta-analyses (each with K - 1 studies)
If the results of the K meta-analyses in the leave-one-out method are consistent, then there is confidence that the overall meta-analysis is robust. The likelihood of consistency increases as K increases. The idea is that removing any one study from the meta-analysis should not affect the overall results; if it does, this suggests a lack of homogeneity among the studies involved.
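The leave-one-out method can be sketched in a few lines of Python, here using the fixed-effects estimate and the four asthma studies from the earlier example:

```python
def fixed_effects(studies):
    """Weighted fixed-effects estimate from (Y_k, S_k) pairs."""
    w = [1 / s**2 for _, s in studies]
    return sum(wk * yk for wk, (yk, _) in zip(w, studies)) / sum(w)

studies = [(0.070, 0.032), (0.043, 0.049), (0.058, 0.052), (0.075, 0.041)]

# Re-run the meta-analysis K times, omitting one study each time.
loo = [fixed_effects(studies[:i] + studies[i + 1:])
       for i in range(len(studies))]
print([round(y, 3) for y in loo])
```

For these data, the four leave-one-out estimates stay within about 0.01 of one another, consistent with the homogeneity found earlier.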
Rather than invoke the leave-one-out method, researchers often prefer to perform a sensitivity analysis by applying the meta-analysis to subsets of studies based on high-quality versus low-quality studies, randomized versus non-randomized studies, early studies versus late studies, etc.
16.9 - 6. Disseminate the Results
Many medical journals have guidelines on the process for publishing a systematic review/meta-analysis. Similar to the CONSORT guidelines we have seen earlier as related to reporting on clinical trials, there is a PRISMA checklist for standardized reporting of meta-analyses. Check it out!
The Cochrane Collaboration has been influential in improving the methodology for systematic reviews. Cochrane Reviews are based on the best available information about healthcare interventions. They explore the evidence for and against the effectiveness and appropriateness of treatments (medications, surgery, education, etc) in specific circumstances. This is an excellent resource to explore.
How to Evaluate an Overview/Meta-analysis
Below are questions that we may ask as we evaluate the value of a particular meta-analysis. You will have the opportunity to evaluate a meta-analysis in the homework exercise.
- Did the overview explicitly address a sensible clinical question?
- Was the search for relevant studies detailed and exhaustive? Were the inclusion/exclusion criteria for studies developed and applied appropriately?
- Were the studies of high methodologic quality?
- Were the results consistent across studies?
- What are the results and how precise were they?
- How can the results be applied to patient care?
16.10 - Summary
In this lesson, among other things, we learned how to:
- describe the processes for conducting a systematic overview,
- describe how publication bias can affect the results of a systematic review,
- recognize patterns in a ‘funnel plot’ that would indicate potential publication bias,
- evaluate the quality of a clinical report with the Jadad scale,
- recognize the appropriate use of a fixed effects model vs. a random effects model for a meta-analysis, stating how the weights differ between the fixed and random approaches,
- describe the rationale for a test of heterogeneity among the studies used in a meta-analysis, and
- describe methods for performing a sensitivity analysis of the meta-analysis.