ISCTM 2011 Autumn Meeting – Meta-analysis: Methods and Applications to Policy Session Summary

Frequentist Methods of Meta-analysis
Dr. Michael Borenstein, Biostat, Inc.

Dr. Borenstein gave the foundational presentation of the session. He began by introducing the concept of meta-analysis, a statistical approach for summarizing and synthesizing multiple studies into a single figure and numerical estimate. In meta-analysis, the effect size, often a risk ratio associated with one hypothesis of interest, is presented for each study and a pooled estimate over all studies is computed. Each effect size is presented as a point estimate with its 95% confidence interval. The relative weight of each study in the pooled estimate depends on the study's precision and on how similar the studies are to one another in design.
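
As a rough illustration of this pooling step (using hypothetical effect sizes, not data from any study discussed in the session), the following Python sketch combines study-level log risk ratios with inverse-variance weights and reports the pooled risk ratio with its 95% confidence interval, along with each study's relative weight.

    import numpy as np

    # Hypothetical per-study log risk ratios and standard errors (illustration only).
    log_rr = np.log(np.array([0.80, 0.95, 0.70, 0.88]))
    se = np.array([0.20, 0.15, 0.25, 0.18])

    # Inverse-variance weights: more precise studies carry more weight.
    weights = 1.0 / se**2
    pooled = np.sum(weights * log_rr) / np.sum(weights)
    pooled_se = np.sqrt(1.0 / np.sum(weights))

    lo, hi = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
    print(f"Pooled RR = {np.exp(pooled):.2f}, 95% CI ({np.exp(lo):.2f}, {np.exp(hi):.2f})")
    print("Relative study weights (%):", np.round(100 * weights / weights.sum(), 1))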

The benefits of meta-analysis are that it provides context, statistical power, and the ability to examine subgroups. Meta-analysis generates a visual summary of the literature with regard to a particular outcome, allowing the reader to easily identify the consistency or inconsistency of a scientific finding. Because meta-analysis incorporates data from multiple studies, there is greater power to examine the question of interest than in any single study. This visual presentation and the related modeling may help identify important subgroups in which the primary hypothesis holds or does not hold. These benefits were illustrated using examples from the literature on the impact of naltrexone on opioid use, the impact of statin dose on death and myocardial infarction, the impact of rosiglitazone on death and myocardial infarction, the efficacy and tolerability of first- versus second-generation antipsychotics, and psychological approaches to treating specific phobias.

The next portion of the talk illustrated some of the mechanics of meta-analysis, including heterogeneous versus homogeneous studies, weighting schemes, and data types. An important decision prior to beginning a meta-analysis is whether the effect of interest can be considered fixed or random across studies, as this determines how each study will be weighted in the overall estimate. If one can reasonably assume that the included studies are functionally identical, such as when a drug company has conducted each study with the same drug under similar protocols in a single patient population, then the fixed effect model is appropriate. If this assumption is not tenable, then a random effects model should be used.
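
A minimal sketch of that fixed- versus random-effects choice, again on hypothetical data: under a random effects model, a between-study variance (tau^2, here estimated with the DerSimonian-Laird method) is added to each study's own variance, which pulls the relative weights closer together and widens the uncertainty around the pooled estimate.

    import numpy as np

    log_rr = np.log(np.array([0.80, 0.95, 0.70, 0.88]))   # hypothetical log risk ratios
    se = np.array([0.20, 0.15, 0.25, 0.18])               # hypothetical standard errors

    w_fixed = 1.0 / se**2
    pooled_fixed = np.sum(w_fixed * log_rr) / np.sum(w_fixed)

    # Cochran's Q and the DerSimonian-Laird estimate of between-study variance.
    Q = np.sum(w_fixed * (log_rr - pooled_fixed) ** 2)
    df = len(log_rr) - 1
    c = np.sum(w_fixed) - np.sum(w_fixed**2) / np.sum(w_fixed)
    tau2 = max(0.0, (Q - df) / c)

    # Random-effects weights include tau^2, flattening the differences in weight.
    w_random = 1.0 / (se**2 + tau2)
    pooled_random = np.sum(w_random * log_rr) / np.sum(w_random)

    print(f"tau^2 = {tau2:.3f}")
    print(f"Fixed-effect pooled RR   = {np.exp(pooled_fixed):.2f}")
    print(f"Random-effects pooled RR = {np.exp(pooled_random):.2f}")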

In concluding, Dr. Borenstein presented and addressed some of the published criticisms of meta-analysis – that it is not appropriate to summarize results in a single number, that meta-analysis compares apples and oranges, and that meta-analysis is often performed poorly. 

Bayesian Methods and Meta-Analysis
Dr. John Seaman, Baylor University

Dr. Seaman provided an overview of the foundations of Bayesian statistics and highlighted how the Bayesian approach to inference has advantages in meta-analysis. The Bayesian principles were illustrated using non-restorative sleep (NRS) examples derived from published biomedical literature. Bayes’ Theorem is the cornerstone of Bayesian inference; it can be expressed simply as the idea that the best model for estimating a quantity is proportional to the prior knowledge about that quantity (the prior model) multiplied by what is observed in a particular study (the data, or likelihood, model). This best model, incorporating both sets of information, is referred to as the posterior model and allows one to make probabilistic statements about the quantity being estimated. An example of such a statement would be, “There is a 0.954 probability that the true rate of male patients experiencing NRS on benzodiazepine-like (BL) drugs exceeds 50%.” To arrive at the posterior model, assumptions must be made about the prior and likelihood models.

The prior model represents all the information that is known about a quantity (for example, the rate of NRS in male patients administered BL drugs) before a new trial is executed. This model may be determined on the basis of previous studies, the opinions of subject matter experts, or both. The prior distribution will be narrower if there is greater certainty about the prior information, or wider (flatter) if there is less certainty. The likelihood model is determined by the characteristics of the observed data in the current trial. Multiplying the prior by the likelihood, together with a little algebra and integration, leads to the posterior model for the rate of males experiencing NRS on BL drugs. This model allows one to make statements about the true rate of NRS in male patients administered BL drugs. For example, “The probability is 0.45 that the true rate of NRS in male patients administered BL drugs is between 50% and 75%.”
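
A small conjugate (beta-binomial) sketch of this prior-likelihood-posterior step: the prior parameters and trial counts below are hypothetical, chosen only to show how probability statements of the kind quoted above are computed from the posterior model.

    from scipy.stats import beta

    a_prior, b_prior = 4, 4   # hypothetical Beta prior for the NRS rate, centered at 50%
    y, n = 30, 50             # hypothetical current trial: 30 of 50 male patients report NRS

    # With a Beta prior and binomial data, the posterior is again a Beta distribution.
    posterior = beta(a_prior + y, b_prior + (n - y))

    print(f"P(true NRS rate > 50%)       = {1 - posterior.cdf(0.50):.3f}")
    print(f"P(50% < true NRS rate < 75%) = {posterior.cdf(0.75) - posterior.cdf(0.50):.3f}")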

A further step, averaging the data model over the posterior model, leads to the posterior predictive model. The posterior predictive model allows one to make statements about the rate of NRS that will be observed in the next sample of male patients administered BL drugs. For example, “There is a 0.189 probability that the rate of NRS in the next sample of male patients on BL drugs will be 75%.”
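
Continuing the hypothetical example above, the posterior predictive distribution for the NRS count in a next sample (here of an assumed size of 20 patients) is a beta-binomial distribution built from the Beta posterior, and can be queried directly.

    from scipy.stats import betabinom

    a_post, b_post = 4 + 30, 4 + 20   # posterior parameters from the sketch above
    m = 20                            # hypothetical size of the next sample of male patients

    predictive = betabinom(m, a_post, b_post)

    # Probability that exactly 75% (15 of 20) of the next sample report NRS, and at least 75%.
    print(f"P(next observed rate = 75%)  = {predictive.pmf(15):.3f}")
    print(f"P(next observed rate >= 75%) = {1 - predictive.cdf(14):.3f}")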

Bayesian inference is designed to adapt easily to the evolution of knowledge that is central to scientific research. Today’s knowledge, captured as the posterior model, can serve as the prior model for the next clinical trial. This use of accumulated prior knowledge to update the results of a current trial achieves the goals of meta-analysis. As Stangl and Berry said, “The Bayesian paradigm is synonymous with meta-analysis. For both, the goal is to incorporate all information to predict with as much accuracy as possible some future event, and the uncertainty associated with it, and to present this prediction in a manner that leads to coherent decisions.”

Dr. Seaman then presented an example of Bayesian meta-analysis, extending the example of estimating rates of NRS with benzodiazepine-like drugs. In this example from Parmigiani (2002), multiple studies estimated the relative risk (RR) of NRS with BL drugs versus placebo, and 13 met the inclusion criteria for meta-analysis. All studies were published, so sample sizes, observed rates, and the RR for each study were available for analysis. For each study, a posterior model was built for the RR; from this model the posterior predictive model for the RR of NRS associated with BL drugs and placebo was derived. The posterior model was used to build 98% posterior probability interval estimates of the RR of NRS for each study. These are interpreted for a single study as follows: “the probability is 98% that the RR of NRS with BL drugs versus placebo in Study 1 is between 0.6 and 1.1.” Hierarchical Bayesian models combined these individual posterior models, incorporating individual study variability, into a common overall posterior model of the true RR of NRS with BL drugs versus placebo. This common overall posterior model was queried to determine the probability that the true RR of BL drugs versus placebo was >1, which would indicate a treatment effect. In this example, the probability that the true RR of BL drugs versus placebo was >1 was 0.01, so the meta-analysis concluded that it was unlikely that BL drugs caused NRS. The overall posterior predictive model of the RR for a new trial showed that the probability of the observed RR in that trial exceeding 1 was 0.14.
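
As a rough sketch of this kind of hierarchical Bayesian meta-analysis, the following code fits a normal hierarchical model on the log RR scale with a small Gibbs sampler. The study-level log RRs, standard errors, and priors are hypothetical stand-ins, not the 13 studies from Parmigiani (2002); the final lines show how the overall posterior and the posterior predictive distribution for a new trial are queried for P(RR > 1).

    import numpy as np

    rng = np.random.default_rng(42)

    # Hypothetical study-level data: observed log RR and within-study standard errors.
    y = np.log(np.array([0.85, 0.95, 0.70, 1.10, 0.90, 0.80]))
    s = np.array([0.25, 0.20, 0.30, 0.22, 0.18, 0.27])
    k = len(y)

    # Priors: flat on the overall mean mu; Inv-Gamma(1, 0.1) on tau^2 (between-study variance).
    a0, b0 = 1.0, 0.1
    mu, tau2 = 0.0, 0.1
    mu_draws, tau2_draws = [], []

    for it in range(6000):
        # 1. Study-specific true effects theta_i | mu, tau^2, y (normal conjugate update).
        prec = 1.0 / s**2 + 1.0 / tau2
        theta = rng.normal((y / s**2 + mu / tau2) / prec, np.sqrt(1.0 / prec))

        # 2. Overall mean mu | theta, tau^2 (flat prior).
        mu = rng.normal(theta.mean(), np.sqrt(tau2 / k))

        # 3. Between-study variance tau^2 | theta, mu (inverse-gamma update).
        tau2 = 1.0 / rng.gamma(a0 + k / 2, 1.0 / (b0 + 0.5 * np.sum((theta - mu) ** 2)))

        if it >= 1000:                   # drop burn-in draws
            mu_draws.append(mu)
            tau2_draws.append(tau2)

    mu_draws, tau2_draws = np.array(mu_draws), np.array(tau2_draws)
    print(f"P(overall true RR > 1)   = {np.mean(np.exp(mu_draws) > 1):.3f}")

    # Posterior predictive distribution for the true log RR in a hypothetical new trial.
    theta_new = rng.normal(mu_draws, np.sqrt(tau2_draws))
    print(f"P(RR > 1 in a new trial) = {np.mean(np.exp(theta_new) > 1):.3f}")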

Finally, the advantages and challenges of the Bayesian approach were summarized. In Bayesian analyses (including meta-analyses), the conclusions are probability statements about the quantity of interest, which are easy to understand. The modeling approach removes the need to choose between fixed and random effects assumptions and allows one to account for unobserved sources of variability across published studies. Bayesian meta-analysis does not require the assumption of specific data models (e.g., normality) and allows one to predict outcomes of future trials, facilitating sample size estimation and prior specification for future analyses. However, prior model selection and elicitation can be challenging; sensitivity analyses are essential because conclusions may vary with the prior model assumptions; there is no significance cut-off of 0.05; and the methods are less familiar to the consumer. A number of technical references, including recommended reading on Bayesian statistics for non-statisticians, were also presented.

Proof in Medicine: The Role of Research Synthesis
Dr. Joel Greenhouse, Carnegie Mellon University

Dr. Greenhouse presented the use of cross-design synthesis methods, via work funded by NIMH and AHRQ, to assess the association between suicidality and antidepressant use in children and adolescents. The goal of medical research is to determine which interventions are most effective (or harmful) for which patients under what circumstances. To reach this goal, it is essential to generate evidence about interventions, both benefits and harms, and to assess and weigh that evidence. The complete evidence about a treatment or intervention typically comes from multiple studies of varied design, from explanatory (randomized placebo-controlled trials) to observational (epidemiological or claims database studies). Ultimately, judgments about the evidence must be made with respect to the reliability and generalizability of each component as well as the overall conclusions regarding a potential association.

Cross-design synthesis refers to the use of evidence from multiple data sources, experimental and observational, to address similar questions of interest. The FDA meta-analysis of 24 pediatric trials (23 industry-sponsored and 1 NIMH-sponsored) was reviewed and its limitations noted. In this application of cross-design synthesis, a new meta-analysis was performed using a subset of the FDA trial database, the sixteen randomized placebo-controlled clinical trials of patients with major depressive disorder, using a different outcome measure, definitive suicidal behavior. Both frequentist and Bayesian methods of meta-analysis were applied to these studies. The 95% CI for the RR of definitive suicidal behavior using frequentist meta-analysis was (1.01, 3.96), while the 95% credible interval from the Bayesian meta-analysis was (0.96, 4.12). The posterior probability of the RR exceeding 1 from the Bayesian meta-analysis was 0.96. To increase the generalizability of the evidence set, an analysis of the LifeLink Health Plans Claims database was performed. This database represents over 95 managed care plans in the United States and over 42 million covered lives. The cohort for analysis was restricted to members aged 5–17 between January 1, 1999 and June 30, 2008 with a depression claim based on ICD-9 codes. The index episode was identified as the first claim coded for depression in the time interval for which there was no antidepressant claim in the prior 3 months and no depression diagnosis in the prior 6 months. Data for the 12 months prior to the index episode and the 6 months following it were examined. There were 52,293 members who met these criteria (the LifeLink cohort) and were eligible for analysis. Members in the cohort were classified as having been treated with SSRI without Psychotherapy (N = 6,872), Other Antidepressant (N = 17,608), No Treatment (N = 6,677), SSRI + Psychotherapy (N = 10,949), or Psychotherapy without Antidepressant (N = 10,187) according to the presence of relevant claims in the pre-index period. The outcome of definitive suicidal behavior was defined as having a CDC E-code for self-inflicted injury during the 6-month period following the index episode.

The LifeLink cohort was split into two groups: one that was more reflective of the RCTs examined in the FDA database (the Restricted Cohort) and one that was not. To be included in the Restricted Cohort, members could not meet any of the following criteria – high risk for suicidal behavior, current schizophrenia diagnosis, current or lifetime history of drug or alcohol dependence, current bipolar I or II diagnosis, currently pregnant or sexually active without acceptable contraceptive use, or attempted suicide within 7 days of the index episode. The Restricted Cohort comprised 39,396 members. The RR of definitive suicidal behavior was computed for the Restricted Cohort, incorporating propensity scoring, age, gender, geographical region, and Medicaid status to adjust for potential confounding differences across treatment groups. Using no treatment as the reference group, the adjusted RR (95% CI) for SSRIs without psychotherapy was 2.3 (1.1–5.1) and the adjusted RR (95% CI) for any antidepressant in the Restricted Cohort was 4.5 (2.3–8.9). The raw RR was also computed for the full LifeLink cohort. The observational RRs from the Restricted Cohort were graphed together with the meta-analytic results from the RCTs, with the RR of the whole LifeLink cohort shown as a reference line, allowing a grouped snapshot of the clinical trial and observational study findings relevant to the research question. Dr. Greenhouse summarized the talk by pointing out the benefits of this method for synthesizing various sources of medical research and generating new hypotheses. He highlighted that cross-design synthesis is inherently interdisciplinary, as medical expertise is essential to proper design and conclusions, and that sensitivity analyses are needed to understand the dependence of conclusions on underlying definitions, judgments, and model assumptions.

Impact of Meta-Analysis for Policy Decisions about Reimbursement – Payer Perspective
Dr. Rhonda Robinson-Beale, Optum Health

Dr. Robinson-Beale presented the current procedures Optum Behavioral Health uses to synthesize the multiple types of published evidence in their regular review of non-pharmacologic interventions (technologies) for behavioral health to determine plan coverage. They have an established hierarchy of evidence that is used in scoring relevant publications according to a scientific merit rating scale, which grades the study’s research design, measurement of the dependent and independent variables, participant ascertainment, and generalizability. Each technology is classified as Unproven, Emerging, or Proven based on the magnitude, quality, and consistency of the published evidence related to the technology. Inherent in this classification is an assessment of the ability of the technology to be implemented with fidelity, which is especially important for psychotherapeutic interventions. Fidelity is assessed according to the rigor of the training, licensing, and re-training applied. Dr. Robinson-Beale concluded with recommendations to researchers preparing non-pharmacologic behavioral interventions for technology assessment – clearly delineate the population who will use the technology, include naturalistic population-based studies, analyze ethnic and racial subpopulations, perform comparative effectiveness analyses against standard established treatments, include the cost of implementation, and provide detail on how fidelity was established where appropriate. She encouraged those who fund research on these treatments to establish requirements for the integration and standardization of research protocols, including a central database for all studies and standard research parameters, procedures, and methods of assessment. She also recommended that the central database be structured to allow meaningful integration of datasets and the establishment of registries, and that it be accessible to payers to promote low-cost, informative descriptive analyses and matched controlled comparisons.