ISCTM | The International Society for CNS Clinical Trials and Methodology

Author Archive

ISCTM 2012 Scientific Meeting – Metric-Based Measurement to Assess Cognitive Outcome in AD Trials Summary

Wednesday, June 27th, 2012

22 Feb 2012

The Working Group titled “Metric-Based Measurement to Assess Cognitive Outcome in AD Trials” (Working Group 7) convened recently at the 8th Annual ISCTM meeting in Washington, DC to continue work towards defining a new cognitive endpoint for Alzheimer’s disease clinical trials as an alternative to the ADAS-Cog.

The overall goals of the Working Group are to:

1. Develop a cognitive algorithm for use in MCI trials that is based on the ADAS-Cog but expands beyond it by adding more difficult and relevant cognitive questions from pre-existing assessments

2. Expand the difficulty range of cognitive measures for prodromal and mild AD clinical trials

3. Develop tests that are relevant to the condition of interest and that can adequately discriminate between different cognitive abilities in people enrolled in MCI clinical trials

4. Apply the approach and methods for cognitive assessment to failed trials to determine whether a signal can be detected or proof of concept confirmed

The ADAS-Cog is the most commonly used cognitive scale in Alzheimer’s disease (AD) trials; it was developed in the early 1980s as a quantitative conceptualization of the cognitive domains affected in AD, specifically for use in clinical trials in people with this disease. The scale was validated, according to the standards of the time, in a single small study that included only 27 patients with AD and 28 age-matched elderly controls, and its questions were designed for a more severely affected population than the one in which it is now most often used. Despite these limitations on construct validity, the ADAS-Cog has been the primary cognitive measure for all approved drugs for mild to moderate Alzheimer’s disease and is generally included in trials of new Alzheimer’s drugs. The most commonly used version of the ADAS-Cog includes 11 items assessing four cognitive domains (memory – 3; language – 5; praxis – 2; orientation – 1). Scores range from 0 to 70, with higher scores indicating greater cognitive impairment. The ADAS-Cog items were thought to be ordered from less difficult to more difficult, but item analysis has shown that this is not true for all of the subscales. Nonetheless, the ADAS-Cog has been, and continues to be, used in clinical trials across the cognitive continuum of people affected by Alzheimer’s disease. Given this experience and the issues outlined above, among others, a major question remains about the sensitivity of the ADAS-Cog to detect improvement or decline in cognition in people with mild cognitive impairment (MCI) or early Alzheimer’s disease, since the scale does not reflect the cognitive profile of these populations. Can this problem be remedied by including additional questions or new subscales that better reflect the cognition of the subjects in current trials?

Answering these questions depends in part on analyzing ADAS-Cog scores collected in people with mild to moderate AD (M2M AD) and mild cognitive impairment (MCI). Data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI), an observational cohort, were reviewed. In people with M2M AD, ADAS-Cog scores ranged from 6 to 57, whereas MCI scores ranged from 0 to 37. Thus, in MCI clinical trials only about one-half of the scale is being used: easier questions are not informative, and the ceiling effects seen on the harder items reflect this observation.
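The “one-half” figure is simple arithmetic on the ranges quoted above. The short sketch below (function and constant names are illustrative, not the Working Group’s) computes the share of the 0 to 70 ADAS-Cog range spanned by each group. A span is only a crude indicator: it says nothing about how scores are distributed within it.

```python
# Illustrative arithmetic only: the observed ranges are those quoted above for ADNI.
ADAS_COG_MIN, ADAS_COG_MAX = 0, 70  # 11-item ADAS-Cog total score range

def fraction_of_scale_used(observed_min: int, observed_max: int,
                           scale_min: int = ADAS_COG_MIN,
                           scale_max: int = ADAS_COG_MAX) -> float:
    """Share of the possible score range spanned by the observed scores."""
    return (observed_max - observed_min) / (scale_max - scale_min)

mci = fraction_of_scale_used(0, 37)
m2m_ad = fraction_of_scale_used(6, 57)
print(f"MCI: {mci:.2f}, mild-to-moderate AD: {m2m_ad:.2f}")  # MCI: 0.53, mild-to-moderate AD: 0.73
```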

Given these preliminary data and knowledge of the other cognitive assessment scales tested concurrently within ADNI, the Working Group has concluded that it may be possible to eliminate items with ceiling effects from the ADAS-Cog and add items from other scales to expand the cognitive range covered, generating a tool more sensitive to milder cognitive decline. A Guttman analysis, which ranks each item within a difficulty hierarchy, provides a rationale for eliminating items that are too easy: if an item cannot be ranked within the hierarchy, it is not useful. The Guttman analysis also indicates that the performance-based items in the ADAS-Cog are more relevant than the impression- or interpretation-based items. Further, a Rasch analysis of the ADAS-Cog, which generates category probabilities for each item as a measure of its scoring function, can identify potential gaps in the domains covered by the ADAS-Cog and provide explicit guidance on how to fill them. For example, Rasch analyses indicate that adding delayed word recall and number cancellation to the ADAS-Cog to boost sensitivity to cognitive change in early populations (i.e., the ADAS-Cog 13) does not fill the item gaps that limit use of this scale in MCI clinical trials.
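Neither analysis is specified in detail in this summary. As a rough illustration using textbook definitions, the sketch below computes a Guttman coefficient of reproducibility: 1 minus the share of responses that deviate from the ideal pattern in which each person passes exactly the easiest items up to their total score. Values of about 0.90 or higher are conventionally read as a scalable hierarchy. The sketch also computes the category probabilities of a Rasch partial credit model for a single polytomous item. The response matrix, thresholds, and function names are hypothetical.

```python
from math import exp
from typing import Sequence

def guttman_reproducibility(responses: Sequence[Sequence[int]]) -> float:
    """Coefficient of reproducibility for a persons-by-items matrix of 0/1 scores."""
    n_persons, n_items = len(responses), len(responses[0])
    # Order items from easiest (most often passed) to hardest.
    pass_counts = [sum(row[j] for row in responses) for j in range(n_items)]
    order = sorted(range(n_items), key=lambda j: -pass_counts[j])
    errors = 0
    for row in responses:
        score = sum(row)
        # Ideal Guttman pattern: pass the `score` easiest items, fail the rest.
        for rank, j in enumerate(order):
            expected = 1 if rank < score else 0
            errors += int(row[j] != expected)
    return 1 - errors / (n_persons * n_items)

def pcm_category_probs(theta: float, thresholds: Sequence[float]) -> list[float]:
    """Rasch partial credit model: P(score = k | theta) for k = 0..len(thresholds)."""
    numerators, cumulative = [1.0], 0.0
    for delta in thresholds:
        cumulative += theta - delta          # running sum of (theta - delta_j)
        numerators.append(exp(cumulative))
    total = sum(numerators)
    return [num / total for num in numerators]

# Hypothetical 0/1 responses (rows = people, columns = items).
responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 1, 0], [1, 0, 0]]
print(round(guttman_reproducibility(responses), 3))                  # 0.867
print([round(p, 3) for p in pcm_category_probs(0.0, [-1.0, 1.0])])   # [0.212, 0.576, 0.212]
```

In this toy matrix, only the fourth person (passing the middle item but failing the easiest one) departs from the ideal pattern, which accounts for both counted errors.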

Although the ADAS-Cog is the current ‘gold standard’ for measuring cognition in Alzheimer’s disease, it is not required for registration, which leaves room for other endpoints for the primary analysis of cognition. Regulatory bodies (FDA and EMA) do not mandate which outcome measures must be used in registration trials, but optimization and use of a new cognitive endpoint will require FDA and EMA agreement. To smooth the path towards health-authority acceptance of a novel cognitive endpoint, it will be essential to involve these regulatory agencies closely. Importantly, cognitive endpoints are not PROs (Patient Reported Outcomes), so a cognitive endpoint based on the ADAS-Cog, which is validated in Alzheimer’s disease, should be viewed differently from a regulatory perspective. A substantial body of neuropsychological data supports the use of the individual ADAS-Cog items and of items from the other cognitive assessments in ADNI and elsewhere. These data indicate that the sensitivity of an ADAS-Cog item depends on the cognitive ability of the patient, so developing a single measure that spans the diagnostic spectrum from MCI to moderate AD is unlikely to succeed.

The Working Group’s approach is to generate an item bank of stage-specific items that can be selected according to the disease severity of the population being tested, providing a reasonable means of avoiding the ceiling and floor effects commonly observed with the ADAS-Cog. Key to this approach is the recognition that item selection from the bank depends heavily on baseline disease severity and the rate of deterioration over time; thus, each item may require additional sub-tests that can measure and detect change as the patient progresses.
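One common way to operationalize severity-based selection from an item bank, borrowed from computerized adaptive testing, is to pick the items that carry the most Fisher information at the ability level typical of the target population. The sketch below assumes a dichotomous Rasch model and an invented item bank (item names and difficulties are hypothetical). This summary does not describe the Working Group’s own selection procedure.

```python
from math import exp

def rasch_p(theta: float, b: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + exp(-(theta - b)))

def item_information(theta: float, b: float) -> float:
    """Fisher information of a Rasch item at ability theta: p * (1 - p)."""
    p = rasch_p(theta, b)
    return p * (1.0 - p)

def select_items(bank: dict[str, float], target_theta: float, k: int) -> list[str]:
    """Pick the k items that are most informative at the target ability."""
    ranked = sorted(bank, key=lambda name: item_information(target_theta, bank[name]),
                    reverse=True)
    return ranked[:k]

# Hypothetical item bank: item name -> difficulty in logits (higher = harder).
bank = {"orientation": -2.5, "naming": -2.0, "word_recall": -0.5,
        "delayed_recall": 0.5, "cancellation": 1.0, "paired_assoc": 2.2}

# theta is ability (higher = milder impairment), the reverse of ADAS-Cog scoring.
print(select_items(bank, target_theta=1.5, k=3))   # ['cancellation', 'paired_assoc', 'delayed_recall']
print(select_items(bank, target_theta=-2.0, k=3))  # ['naming', 'orientation', 'word_recall']
```

Because Rasch item information peaks where item difficulty equals ability, a milder (higher-ability) population pulls harder items into the selected set. A fuller version would use polytomous items and re-estimate ability as patients decline.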

Many questions remain unanswered, and the Working Group has agreed to continue its review work. Item analysis is progressing through work funded by the Foundation for the NIH, which funds ADNI. Ultimately, the Working Group’s main goal is to review, and provide advice and feedback on, a cognitive item bank that allows both selection of items relative to baseline severity and adequate sensitivity to detect change over time. One potential outcome of this work is that it may lead to re-analysis of promising new compounds that have been shelved for lack of efficacy as measured with the standard ADAS-Cog.

2012 Autumn: Options and Methods to Improve Cognitive Assessment in Clinical Trials of AD and Its Precursors Workshop

Tuesday, June 5th, 2012

Co-chairs: Holly Posner, MD; Phil Harvey, PhD

Objectives:

Understand the challenges in assessing cognitive enhancement in studies of AD and its precursors. Critically evaluate whether sophisticated statistical techniques are adequate to overcome psychometric challenges in existing instruments. Propose alternatives in the event that current data sets appear unsalvageable. Evaluate new approaches to assessing outcome using new performance-based strategies. Participants will include Terry Goldberg, PhD (LIJ Hofstra School of Medicine), Dan Mungas, PhD (UC Davis), Yaakov Stern, PhD (Columbia University), Veronica Logovinsky (Eisai), and Allitia DiBernardo (JnJ).

Background: The Alzheimer’s Disease Assessment Scale, cognitive subscale (ADAS-cog) is a composite of several neuropsychological tests and is the de facto cognitive outcome standard in AD trials. This scale was rationally derived from knowledge and measurement techniques that are now close to 30 years old, and many of its items lack sensitivity across the broad range of impairment in the AD spectrum. As a result, the instrument typically generates data that are insensitive to impairment in milder cases, and it is therefore handicapped in its ability to detect change.

Multiple pharmaceutical companies have re-analyzed ADAS-cog and other AD, MCI, and prodromal AD endpoints in an effort to improve the sensitivity of the outcome measure to detect improvement from a pharmaceutical or biologic intervention. Some companies have used purely statistical methods, while others have also used psychometric analyses such as item response theory (IRT). Most have also attempted to combine multiple outcome measures, such as the MMSE, CDR, and other scales, with ADAS-cog data. A sampling of these efforts will be presented.
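The summary does not say how the companies combined measures. One simple, commonly used option is an equally weighted composite of standardized scores, with each measure’s sign aligned so that a higher value always means more impairment (higher is worse on the ADAS-cog, better on the MMSE). For brevity, the sketch standardizes against the sample itself. A real analysis would more likely standardize against a baseline or reference group. The data are hypothetical.

```python
from statistics import mean, stdev

def composite_z(scores: dict[str, list[float]],
                higher_is_worse: dict[str, bool]) -> list[float]:
    """Equally weighted mean of per-measure z-scores, oriented so higher = more impaired."""
    n_patients = len(next(iter(scores.values())))
    oriented = []
    for name, values in scores.items():
        mu, sd = mean(values), stdev(values)
        sign = 1.0 if higher_is_worse[name] else -1.0
        oriented.append([sign * (v - mu) / sd for v in values])
    return [mean(z[i] for z in oriented) for i in range(n_patients)]

# Hypothetical scores for four patients.
scores = {"ADAS-cog": [10, 15, 20, 25], "MMSE": [28, 26, 24, 22]}
higher_is_worse = {"ADAS-cog": True, "MMSE": False}
composite = composite_z(scores, higher_is_worse)
print([round(c, 3) for c in composite])   # [-1.162, -0.387, 0.387, 1.162]
```

IRT-based approaches differ from this sketch in that they weight items by how informative they are, rather than weighting whole scales equally.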

There is an understandable sense of urgency regarding solutions to the low-sensitivity problem and an understandable reluctance to wait for alternative instruments to be developed. Because very large and expensive drug development decisions can rest on the quality of outcome assessments, key stakeholders require a thorough understanding of whether statistical techniques can overcome the intrinsic limitations of existing assessment methods. At the same time, if statistics cannot resurrect data collected with older methods, then new strategies must be developed or alternative regulatory approaches will be needed. Critical evaluation of the competing strategies of “charge forward now” vs. “develop something that works” is required. Furthermore, there have been recent developments in co-primary assessments in other conditions, such as schizophrenia, that have been applied with considerable success in mildly impaired cases.

Potential deliverable: we plan to produce a white paper that begins by laying out the issues:

· Questionable sensitivity

· Possible inadequate coverage

· Urgent need to assess less impaired patients

· The tension between the need to run trials and the need for an outcome measure that works.

This would include a review of the publicly available information on salvaging ADAS-cog data with statistics. It would also outline previous efforts at consensus-based outcome development, including an evaluation of the timeline for such efforts. Finally, the white paper could comment on the regulatory perspective and on whether performance-based measures of everyday functioning, along the lines of recent thinking in schizophrenia, could substitute for current cognitive measures supplemented by clinical judgments and informant reports.

2012 Autumn: Neurocognition and Social Cognition as Endpoints for Clinical Trials

Monday, June 4th, 2012

Co-chairs: Michael Green, PhD; Stephen Marder, MD

This panel will discuss the implications of having separate endpoints in clinical trials for non-social neurocognition and social cognition. In the past 5 years there has been increasing evidence that neurocognition and social cognition form separable factors. Though there is overlap in some of the underlying processes (e.g., perception, attention, working memory), a variety of statistical methods indicate that models fit better when the two domains are separated compared to when they are combined. The conclusion of partial overlap between the domains in schizophrenia is consistent with studies from nonclinical social neuroscience. The separability of factors suggests that certain interventions may influence neurocognition more than social cognition. Similarly, there are now several studies, both pharmacological and non-pharmacological, targeting social cognitive endpoints. The regulatory implications of this distinction between neurocognition and social cognition endpoints are not understood.

During this panel, we will: 1) provide an overview of social cognition, how it is distinguished from neurocognition, and how it relates to daily functioning, 2) present data on an ongoing study that is psychometrically evaluating social cognitive measures for use in clinical trials, 3) present data on how findings from clinical trials would be influenced if assessments were limited to neurocognitive measures, 4) discuss the challenges associated with international assessment of social cognition, and 5) hear impressions from regulatory representatives.

2012 Autumn Conference – Concurrent Workshop Sessions

Friday, June 1st, 2012

WS1: Adaptive Design Workshop

Chair: Ginger Haynes, PhD

The adaptive design workshop will present a case study illustrating how statisticians and clinicians jointly prepared an argument for using an adaptive design in Phase IIB rather than a traditional design. The argument was to be made to senior management, who may be very skeptical of accepting or implementing new strategies. Evidence presented in the case will include relevant upfront costs, overall trial duration, sample size differences, and overall cost, as well as potential challenges from regulatory agencies. This case study is designed to demonstrate the practical value of adaptive design to a non-statistical audience and to generate group discussion about additional evidence that would be convincing in a business environment.

WS2: Integrated Imaging and Genetic Biomarkers in Schizophrenia Workshop

Co-chairs: Stephen Potkin, MD; Henry Riordan, PhD

This session of the Biomarkers Working Group (BWG) will expand upon the 9th Annual Scientific Meeting, which summarized the BWG’s activities regarding the qualitative and quantitative evidence for grading imaging biomarkers in schizophrenia, with a focus on structural MRI. The upcoming session will investigate the linkage between these imaging biomarkers and genetics. Both genes and psychiatric disorders such as schizophrenia are clearly related to brain development and function. Brain imaging as a biomarker provides quantitative measurement of brain structure, function, and receptor occupancy. Genetic and brain imaging studies in isolation have had limited power to explain the causes of schizophrenia or the response to pharmacological treatment. However, considering brain imaging and genetic factors simultaneously may yield clear advantages in the quest for useful quantitative biomarkers in clinical trial settings. These quantitative biomarkers may be closer to the underlying genetic etiological influences than clinical diagnostic categories are, and they have the benefit of less heterogeneity and variability than clinical symptoms. Integrated biomarkers such as these can be used in clinical drug development to improve our understanding of the pathophysiology of schizophrenia, to provide new targets for drug development, to enrich study populations for Proof of Mechanism or Target Engagement studies, and to develop predictors of clinical response and of the development of side effects. Tools currently exist to accomplish many of these goals, but many barriers remain to be overcome. A portion of this session will be dedicated to the practical application of various analytic tools designed not only to overcome many of these issues but also to help establish the evidence base for these biomarkers and to provide the data needed for biomarker confirmation and validation.

WS3: Options and Methods to Improve Cognitive Assessment in Clinical Trials of AD and Its Precursors

Co-chairs: Holly Posner, MD; Phil Harvey, PhD

Objectives: Understand the challenges in assessing cognitive enhancement in studies of AD and its precursors. Critically evaluate whether sophisticated statistical techniques are adequate to overcome psychometric challenges in existing instruments. Propose alternatives in the event that current data sets appear unsalvageable. Evaluate new approaches to assessing outcome using new performance-based strategies.

Potential deliverable: we plan to produce a white paper that begins by laying out the issues:

· Questionable sensitivity
· Possible inadequate coverage
· Urgent need to assess less impaired patients
· The tension between the need to run trials and the need for an outcome measure that works.

WS4: Medication Development for Stimulant Dependence Workshop

Co-Chairs: Thomas Kosten, MD; Joseph Palumbo, MD

Stimulant dependence (cocaine and methamphetamine) has no FDA-approved pharmacotherapy, despite more than 25 years of efforts to develop such medications. The behavioral pharmacology of this disorder provides excellent animal models, but potential medications that emerged from these models have been clinically disappointing, as reviewed by Dr. Koob. Many of these medications have been taken into human laboratory studies to evaluate potential medical toxicity from interactions with abused stimulants and to assess surrogate measures such as craving, euphoria, adverse subjective effects, and behavioral responses, as reviewed by Dr. Newton. These behavioral responses include choosing money or other reinforcers rather than the stimulant while under the influence of potential treatment agents. Several agents have shown good safety and potential efficacy, including disulfiram, bupropion, modafinil, and a cocaine vaccine. Outpatient clinical trials have confirmed some of these medications as effective in phase 2 single-site and multisite randomized, placebo-controlled trials, but none has progressed to a phase 3 NDA study, as reviewed by Dr. Kosten. Pharmacogenetics involving the adrenergic and dopaminergic systems shows some promise for increasing the efficacy of these medications by selecting appropriate candidates. The lack of industry support for moving on to NDA studies involves many factors, including an unclear FDA pathway to approval in the absence of clearly defined outcome guidelines. The planned workshop and ongoing study group will work with NIDA and the FDA to define such guidelines for NDA approval studies in order to facilitate industry participation. Dr. Palumbo will review opportunities for industry to collaborate with academia, NIH, and FDA in obtaining approved medications for stimulant addiction.

WS5: Negative Symptoms Workshop

Co-Chairs: Stephen Marder, MD; David Daniel, MD

This workshop on clinical trials methodology for studies targeting negative symptoms continues a process of re-evaluating recommendations in light of emerging data. The group will review new data from a number of sources, including recently reported trials, studies evaluating the properties of newer instruments for measuring negative symptoms, and recent analyses from the European Union NEWMEDS database. The group will also review the recommendations from the NEWMEDS negative symptom meeting of April 2012.

WS6: Precision Medicine in CNS Clinical Trials Workshop
Co-Chairs: Douglas E. Feltner, MD; Aidan Power, MD

Precision medicines target treatment within a diseased population using a specific genetic or biologic marker that can be assessed with a diagnostic test. Precision medicine (also known as personalized or stratified medicine) has become a standard approach to the development of new anti-cancer therapeutics. The adoption of precision medicine approaches in neuroscience therapeutics has lagged behind oncology and other therapy areas.

In response to the increasing number of precision medicines being developed, regulatory agencies have recently drafted several guidances related to the qualification and use of biomarkers in clinical trials. These include the recent FDA guidance on drug development tools, ICH guidance on the format of genomic biomarker qualification submissions, FDA standards for clinical trial imaging endpoints, FDA guidance on clinical pharmacogenomics, and FDA guidance on in vitro companion diagnostic devices.

This workshop will seek to identify the opportunities and barriers for developing precision medicines in neuroscience indications. Alzheimer’s disease, schizophrenia, and major depressive disorder will be used as examples to highlight the issues. Both scientific and practical opportunities and challenges will be addressed, including the state of the science, sponsor organizational challenges and opportunities, regulatory interactions, and payor and implementation opportunities and challenges. A second goal of this workshop will be to identify individuals interested in advancing precision medicines for neuroscience indications and to assess whether a precision medicine workgroup should be formed to build precision medicine expertise and knowledge within ISCTM and to advance methods for evaluating precision medicines in neuroscience clinical trials.

WS7: Psycho-Social Treatment Platforms in Psychopharmacology Workshop

Co-chairs: Nina R. Schooler, PhD; Dawn Velligan PhD

This workshop will examine possible models and examples of psychosocial treatment platforms for RCTs. It will consider the advantages and disadvantages of specifying a uniform psychosocial platform in multi-center psychopharmacology clinical trials. Advantages include minimizing error variance arising from differences among sites in the kind of support provided. Possible disadvantages include increased cost and the possibility that the platform will differentially benefit patients receiving placebo.

WS8: Approaches to Improve Signal Detection in Studies Comparing Drug and Placebo Workshop
Chair: Amir Kalali, MD

Previous work by participants in this workshop has identified four areas of focus: fit-for-purpose design, sponsor engagement, data sharing, and patient selection. This session will focus on developing strategies to address these issues and on continuing development of a manuscript.

2012 Autumn: Statistical, Clinical, Payor and Regulatory Perspectives on Dimensions That Define the Spectrum of Efficacy and Effectiveness

Wednesday, May 30th, 2012

Co-chairs: Larry Alphs, MD, PhD; Nina R. Schooler, PhD; Ginger Haynes, PhD

Brief Description

Demands for personalized medicine, comparative effectiveness, and cost control are increasingly driving sponsors to include both efficacy (explanatory) and effectiveness (pragmatic) trials in a complete drug development program. Yet there is much confusion in the field about the differences between these trials, including their definitions, requirements, design, and interpretation. Further, within each approach, a wide range of clinical trial designs (each with its own strengths and weaknesses) can be used to address these considerations. Effectiveness trials are particularly poorly understood, and few designs completely address the various types of effectiveness questions that can be raised. Indeed, essential elements of trials claiming to be ‘effectiveness’ trials are often not clearly reported, and few broadly established standards exist for classifying trials as either ‘effectiveness’ or ‘efficacy’. Not infrequently, meta-analyses use results from explanatory trials to draw conclusions about questions that require a pragmatic design. Commonly agreed-upon regulatory, provider, and publication quality standards for these two approaches are particularly needed. The lack of clear standards within the field has led to a plethora of generally inadequate local standards that vary by country and by publication. Thus, even when studies are available, a clear, coherent body of work is not being developed, and the overall quality of clinical trial programs addressing effectiveness questions is diminished. This session will attempt to clarify issues inherent in conducting effectiveness trials and provide initial solutions to address them.

Goals

The goals of this session are to:

Increase understanding of the difference between efficacy (explanatory) and effectiveness (pragmatic) trials, clarifying which questions of clinical, regulatory, and provider stakeholders require pragmatic designs
Clarify the criteria that should be addressed when designing effectiveness trials and identify standards to assess whether such trials have been well-designed and well-conducted
Provide initial solutions to address these needs

Stratified Medicine and Targeted Clinical Trials for Alzheimer’s Disease Drug Development in Light of Recent Phase III Trial Results: Example ApoE Genotype

Wednesday, May 30th, 2012

Co-Chairs: Lon S. Schneider, MD; Terry Goldberg, PhD; Larry Ereshefsky, PharmD, BCPP

The purpose of this panel is to provide audience members with a strong scientific and economic rationale for considering stratified approaches to clinical trials, and in particular for using APOE as a stratification variable in Alzheimer’s disease trials. In a presentation entitled, “Stratified medicine for AD drug development,” Dr. Mark R. Trusheim will provide an overview of, and an argument for, the use of stratified medicine techniques in clinical trial design. He will discuss a broad array of trials but will focus on bapineuzumab (a monoclonal antibody to beta-amyloid plaques). Using models that jointly address power, biology, and economics and that can be manipulated to test a variety of strategies, he will demonstrate why it is scientifically sound and economically imperative to consider this approach.
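Dr. Trusheim’s models are not reproduced here. As a much simpler illustration of the power argument, the sketch below assumes the drug has a standardized effect d in a marker-defined subgroup (for instance, one APOE genotype group) and no effect in anyone else. Under that assumption, an all-comers trial sees only the diluted effect p·d, where p is the subgroup’s prevalence. The sketch uses the standard normal-approximation sample size for a two-arm comparison, n per arm = 2(z_{1-α/2} + z_{1-β})² / d². The prevalence and effect size are hypothetical, and costs are reduced to a count of patients screened.

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-arm sample size for a two-sided, two-sample z-test on a
    standardized mean difference (normal approximation)."""
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) / effect_size) ** 2)

def compare_designs(d_subgroup: float, prevalence: float) -> dict[str, int]:
    """All-comers vs enriched trial, assuming only the marker-positive subgroup responds."""
    all_comers = n_per_arm(prevalence * d_subgroup)  # effect diluted by non-responders
    enriched = n_per_arm(d_subgroup)                 # only marker-positive patients enrolled
    screened = ceil(2 * enriched / prevalence)       # patients genotyped to fill both arms
    return {"all_comers_total": 2 * all_comers,
            "enriched_total": 2 * enriched,
            "enriched_screened": screened}

print(compare_designs(d_subgroup=0.3, prevalence=0.6))
```

Because required sample size scales with 1/d², diluting the effect by a factor p inflates the all-comers trial by roughly 1/p², whereas enrichment costs roughly 1/p in screening. This is one way the power and economics arguments interact.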

In a presentation entitled, “The impact of ApoE4 carriage on clinical trials,” Dr. Terry E. Goldberg will review basic APOE neurobiology, including its roles in lipid transport and amyloid clearance. He will present novel work on isoform-specific effects at the message, protein, and biomarker levels. He will also review the impact of treatment in APOE4 carriers (individuals who carry the risk allele) in the context of clinical trials. Perhaps most strikingly, he will review new data on the large protective effects of APOE2 on biomarkers and cognition and the implications of these effects for clinical trial design and drug development.

Dr. Eric Reiman will present his experiences in “Using registries and genotyping for trials: What are the implications and pragmatics.” He will address the use of registries and genotyping in the Alzheimer’s Prevention Initiative. He will review his own experiences in accessing such cohorts and will discuss the scientific motivations for stratification by APOE and for early intervention in PS1 mutation carriers. More specifically, he will discuss proposed or possible trials using APOE genotype as an inclusion requirement for a range of drugs and prevention approaches. This will largely involve the use of a registry of APOE e4 carriers established around Phoenix, aspects of planning or undertaking such a trial, its theoretical rationale, and outcomes, including imaging as a potential surrogate or co-primary endpoint.

In a talk entitled, “Simulating stratified medicine trials in AD,” Dr. Lon S. Schneider will note that subanalyses of AD trials based on post hoc stratification, such as by APOE 4 carriage, have provided interesting and sometimes contradictory results. Although some results might be due to the play of chance in underpowered subanalyses or to the metrics of the outcome scales, others may be due to a genuine interaction between the drug and the subgroup. Because APOE 4 is the strongest genetic risk factor for AD and is, to a large degree, associated with age of onset of late-life AD, it has received particular attention for stratified medicine. After reviewing the few trials that have published outcomes by APOE 4 carriage, Schneider and colleagues will present simulations derived from a large ADCS and ADNI database that empirically test the efficiency of developing a drug under several trial scenarios of APOE 4 carriage. Questions around differential dropout, heterogeneity of outcomes, and disease severity will be examined.
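The “play of chance” concern can be made concrete with a small Monte Carlo sketch. The sketch assumes that the treatment effect is identical in carriers and non-carriers (no true interaction), that the outcome is normally distributed with a known SD of 1, and that subgroup sizes are fixed. It runs a z-test within each subgroup and counts how often the drug looks effective in one subgroup but not the other. All parameter values are hypothetical. This is not the ADCS/ADNI-based simulation described in the talk.

```python
import random
from statistics import NormalDist, mean

def subgroup_discordance_rate(n_per_arm: int, effect: float, carrier_share: float,
                              n_sims: int = 2000, alpha: float = 0.05,
                              seed: int = 1) -> float:
    """Fraction of simulated trials in which the effect is significant in exactly one
    of two subgroups, although the true effect is identical in both."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    discordant = 0
    for _ in range(n_sims):
        significant = []
        for share in (carrier_share, 1 - carrier_share):
            m = max(2, round(n_per_arm * share))          # patients per arm in this subgroup
            drug = [rng.gauss(effect, 1.0) for _ in range(m)]
            placebo = [rng.gauss(0.0, 1.0) for _ in range(m)]
            z_stat = (mean(drug) - mean(placebo)) / (2.0 / m) ** 0.5   # known SD = 1
            significant.append(abs(z_stat) > crit)
        discordant += significant[0] != significant[1]
    return discordant / n_sims

rate = subgroup_discordance_rate(n_per_arm=200, effect=0.25, carrier_share=0.6)
print(f"Discordant subgroup results in {rate:.0%} of trials with no true interaction")
```

With these settings, each subgroup has well under 80% power, so the two subgroups frequently disagree even though the drug works equally well in both. This is the pattern that can masquerade as a drug-by-genotype interaction.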

Discussion: Next steps – what can be done now?

2012 Autumn: Clinical Trials in Autism Spectrum Disorders: Methods, Measurements, Designs and Outcomes

Wednesday, May 30th, 2012

Co-chairs: Douglas Feltner, MD; Joseph Horrigan, MD

This session will provide an overview of progress toward new treatments for autism spectrum disorders (ASD). The characteristics of the disorders will be reviewed, along with impending changes in DSM-V and their impact on subject selection and outcome measurement. Results from recent clinical trials will be presented, along with insights into the challenges of designing and running clinical trials for these disorders. Methods of clinical measurement for use in early and late phase development will be discussed. Established regulatory precedents will be reviewed along with areas of regulatory uncertainty.

ISCTM 2011 Autumn Meeting – Meta-analysis: Methods and Applications to Policy Session Summary

Wednesday, May 30th, 2012

Meta-analysis: Methods and Applications to Policy

Frequentist Methods of Meta-analysis
Dr. Michael Borenstein, Biostat, Inc.

Dr. Borenstein gave the foundational presentation of the session. He began by introducing the concept of meta-analysis, a statistical approach for summarizing and synthesizing multiple studies in a single figure and numerical estimate. In a meta-analysis, the effect size for the hypothesis of interest (often a risk ratio) is presented for each study, and a pooled estimate across all studies is computed. Each effect size is presented as a point estimate with its 95% confidence interval. The weight each study receives in the pooled estimate depends on its precision and on the statistical model chosen, which in turn depends on how similar the studies are to one another by design.
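As a concrete illustration of pooling, the sketch below computes an inverse-variance fixed-effect estimate on the log risk-ratio scale, where each study’s weight is the reciprocal of its variance. The 2x2 counts are hypothetical. The sketch assumes every arm has at least one event, since it applies no continuity correction.

```python
from math import exp, log, sqrt

def log_rr_and_var(events_t: int, n_t: int, events_c: int, n_c: int) -> tuple[float, float]:
    """Log risk ratio (treated vs control) and its large-sample variance."""
    log_rr = log((events_t / n_t) / (events_c / n_c))
    var = 1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c
    return log_rr, var

def fixed_effect_pool(studies: list[tuple[int, int, int, int]]) -> tuple[float, float, float]:
    """Inverse-variance fixed-effect pooled risk ratio with a 95% confidence interval."""
    estimates = [log_rr_and_var(*s) for s in studies]
    weights = [1 / v for _, v in estimates]               # precise studies count more
    pooled = sum(w * y for w, (y, _) in zip(weights, estimates)) / sum(weights)
    se = sqrt(1 / sum(weights))
    return exp(pooled), exp(pooled - 1.96 * se), exp(pooled + 1.96 * se)

# Hypothetical studies: (events on drug, n on drug, events on control, n on control).
studies = [(20, 100, 30, 100), (15, 80, 25, 85), (40, 200, 50, 210)]
rr, lo, hi = fixed_effect_pool(studies)
print(f"Pooled RR {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```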

The benefits of meta-analysis are that it provides context, statistical power, and the ability to examine subgroups. Meta-analysis generates a visual summary of the literature with regard to a particular outcome, allowing the reader to see easily whether a scientific finding is consistent across studies. Because meta-analysis incorporates data from multiple studies, it has greater power to examine the question of interest than any single study. The visual presentation and related modeling may also help identify important subgroups in which the primary hypothesis does or does not hold. These benefits were illustrated with examples from the literature on the impact of naltrexone on opioid use, the impact of statin dose on death and myocardial infarction, the impact of rosiglitazone on death and myocardial infarction, the efficacy and tolerability of first- versus second-generation antipsychotics, and psychological approaches to treating specific phobias.

The next portion of the talk illustrated some of the mechanics of meta-analysis, including heterogeneous versus homogeneous studies, weighting schemes, and data types. An important decision before beginning a meta-analysis is whether to use a fixed-effect or a random-effects model, as this determines how each study will be weighted in the overall estimate. If one can reasonably assume that the included studies are functionally identical, as when a drug company has conducted each study using the same drug under similar protocols in a single patient population, then the fixed-effect model is appropriate. If this assumption is not tenable, a random-effects model should be used.
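For the random-effects case, one widely used estimator of between-study variance is the DerSimonian–Laird moment estimator. The talk summary does not name an estimator, so using it here is an assumption. The sketch shows how adding the between-study variance τ² to each study’s own variance makes the weights more equal and widens the pooled standard error. The inputs are hypothetical log risk ratios and their variances.

```python
from math import sqrt

def dersimonian_laird(effects: list[float], variances: list[float]) -> tuple[float, float, float]:
    """Random-effects pooled estimate, its standard error, and tau^2 (DerSimonian-Laird)."""
    w = [1 / v for v in variances]                                  # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))   # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                                    # between-study variance
    w_star = [1 / (v + tau2) for v in variances]                     # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = sqrt(1 / sum(w_star))
    return pooled, se, tau2

# Hypothetical log-RRs from three studies with visibly different results.
effects = [-0.5, 0.0, 0.6]
variances = [0.02, 0.02, 0.02]
pooled, se, tau2 = dersimonian_laird(effects, variances)
print(f"pooled log-RR = {pooled:.3f} (SE {se:.3f}), tau^2 = {tau2:.3f}")
```

When the studies agree closely, Q falls at or below its degrees of freedom, τ² is truncated to 0, and the result coincides with the fixed-effect estimate.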

In concluding, Dr. Borenstein presented and addressed some of the published criticisms of meta-analysis – that it is not appropriate to summarize results in a single number, that meta-analysis compares apples and oranges, and that meta-analysis is often performed poorly.

Bayesian Methods and Meta-Analysis
Dr. John Seaman, Baylor University

Dr. Seaman provided an overview of the foundations of Bayesian statistics and highlighted the advantages of the Bayesian approach to inference in meta-analysis. The Bayesian principles were illustrated with non-restorative sleep (NRS) examples derived from the published biomedical literature. Bayes’ Theorem is the cornerstone of Bayesian inference. It can be expressed simply as the idea that the best model for estimating something is proportional to your prior knowledge about that attribute (prior model) multiplied by what you observe in a particular study (data, or likelihood, model). This best model, which incorporates both sets of information, is referred to as the posterior model and allows one to make probabilistic statements about the quantity you wish to estimate. An example of such a statement would be, “There is a 0.954 probability that the true rate of male patients experiencing NRS on benzodiazepine-like (BL) drugs exceeds 50%.” To obtain the posterior model, assumptions have to be made about the prior and likelihood models.
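In symbols, writing θ for the quantity of interest (here the true NRS rate) and y for the observed study data, Bayes’ Theorem reads:

```latex
p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{\int p(y \mid \theta')\, p(\theta')\, d\theta'} \;\propto\; p(y \mid \theta)\, p(\theta)
```

Here p(θ) is the prior model, p(y | θ) is the likelihood (data) model, and p(θ | y) is the posterior model. The denominator only rescales the product so that it integrates to 1, which is why the posterior is described as proportional to prior times likelihood.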

The prior model represents all the information known about an attribute (for example, the rate of NRS in male patients administered BL drugs) before a new trial is run. This model may be based on previous studies, the opinions of subject matter experts, or both. Its curve will be narrower if there is greater certainty about the prior information, or wider (flatter) if there is less certainty. The likelihood model is determined by the data observed in the current trial. Multiplying the prior and likelihood models, followed by a little algebra and integration to normalize the result, yields the posterior model for the rate of males experiencing NRS on BL drugs. This model allows one to make statements about the true rate of NRS in male patients administered BL drugs. For example, “The probability is 0.45 that the true rate of NRS in male patients administered BL drugs is between 50% and 75%.”
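The prior-times-likelihood step can be sketched for a single event rate, assuming a beta prior and binomial data (a conjugate pair, so the posterior is again a beta model). The counts below are hypothetical, not the data from the talk. To stay within the standard library, the beta CDF uses the identity P(Beta(a, b) ≤ x) = P(Binomial(a + b − 1, x) ≥ a), which holds for positive integer a and b.

```python
from math import comb

def beta_cdf(x, a, b):
    """P(theta <= x) for theta ~ Beta(a, b), valid for positive integer a and b.

    Uses the identity P(Beta(a, b) <= x) = P(Binomial(a + b - 1, x) >= a).
    """
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))

def posterior(prior_a, prior_b, events, n):
    """Conjugate update: Beta prior + binomial data -> Beta posterior parameters."""
    return prior_a + events, prior_b + (n - events)

# Hypothetical inputs: a flat Beta(1, 1) prior and 14 of 20 male patients with NRS.
a, b = posterior(1, 1, events=14, n=20)  # posterior is Beta(15, 7)
p_above_half = 1 - beta_cdf(0.5, a, b)
p_between = beta_cdf(0.75, a, b) - beta_cdf(0.5, a, b)
print(f"posterior Beta({a}, {b})")
print(f"P(rate > 50%)        = {p_above_half:.3f}")
print(f"P(50% < rate < 75%)  = {p_between:.3f}")
```

A more informative prior (larger a + b) produces a narrower posterior, matching the description of narrower and flatter prior curves above.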

Averaging the data model for a future sample over the posterior model yields the posterior predictive model. The posterior predictive model allows one to make statements about the rate of NRS that will be observed in the next sample of male patients administered BL drugs. For example, “There is a 0.189 probability that the rate of NRS in the next sample of male patients on BL drugs will be 75%.”
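Under the same beta-binomial assumption, the posterior predictive model for the number of events in a new sample of size m is the beta-binomial distribution. The sketch below is illustrative only: the posterior Beta(15, 7) and the next-sample size of 8 are hypothetical, chosen so that "75%" corresponds to exactly 6 of 8 patients.

```python
from math import comb, exp, lgamma

def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def posterior_predictive_pmf(k, m, a, b):
    """P(k events in a new sample of size m) when the rate ~ Beta(a, b).

    This is the beta-binomial distribution: the binomial probability for the
    new sample, averaged over the posterior model of the rate.
    """
    return comb(m, k) * exp(log_beta(a + k, b + m - k) - log_beta(a, b))

# Hypothetical posterior Beta(15, 7) and a next sample of 8 male patients.
a, b, m = 15, 7, 8
pmf = [posterior_predictive_pmf(k, m, a, b) for k in range(m + 1)]
print(f"P(exactly 6 of 8, i.e. 75%, have NRS) = {pmf[6]:.3f}")
```

Because the predictive model carries both sampling variability and uncertainty about the rate itself, it is wider than a binomial distribution evaluated at the posterior mean.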

Bayesian inference is designed to easily adapt to the evolution of knowledge that is central to scientific research. Today’s knowledge, captured as the posterior model, can serve as the prior model for the next clinical trial. This accumulation of prior knowledge to update the results in a current trial achieves the goals of meta-analysis. As Stangl and Berry said, “The Bayesian paradigm is synonymous with meta-analysis. For both, the goal is to incorporate all information to predict with as much accuracy as possible some future event, and the uncertainty associated with it, and to present this prediction in a manner that leads to coherent decisions.”
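The idea that today's posterior becomes tomorrow's prior can be checked directly in the beta-binomial setting assumed in the earlier sketches. Updating one trial at a time gives the same posterior as pooling all the data and updating once. The trial counts are hypothetical.

```python
def update(prior, events, n):
    """Conjugate beta-binomial update: returns the posterior (a, b)."""
    a, b = prior
    return a + events, b + (n - events)

# Hypothetical trials: (events, sample size).
trials = [(9, 15), (14, 20), (6, 12)]

# Sequential: each trial's posterior becomes the prior for the next trial.
state = (1, 1)
for events, n in trials:
    state = update(state, events, n)

# All at once: pool the data and update the original prior a single time.
pooled = update((1, 1), sum(e for e, _ in trials), sum(n for _, n in trials))

print(state, pooled)  # (30, 19) (30, 19)
```

This exact equivalence assumes all trials share a single underlying rate. When trials differ systematically, a hierarchical model that allows between-study variability is the more appropriate way to accumulate the evidence.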

Dr. Seaman then presented an example of Bayesian meta-analysis, extending the example of estimating rates of NRS with benzodiazepine-like drugs. In this example (Parmigiani, 2002), multiple studies had estimated the relative risk (RR) of NRS with BL drugs versus placebo, and 13 of them met the inclusion criteria for meta-analysis. All studies were published, so the sample sizes, observed rates, and RR for each study were available for analysis. For each study, a posterior model was built for the RR, and from it the posterior predictive model for the RR of NRS with BL drugs versus placebo was derived. The posterior model was used to build 98% posterior probability intervals for the RR of NRS in each study. For a single study, such an interval is interpreted as follows: “the probability is 98% that the RR of NRS with BL drugs versus placebo in Study 1 is between 0.6 and 1.1.” Hierarchical Bayesian models combined these individual posterior models, incorporating between-study variability, into a common overall posterior model of the true RR of NRS with BL drugs versus placebo. This overall posterior model was queried to determine the probability that the true RR exceeded 1, which would indicate that BL drugs increase the risk of NRS. In this example, that probability was 0.01, so the meta-analysis concluded that it was unlikely that BL drugs caused NRS. The overall posterior predictive model of RR for a new trial showed that the probability of the observed RR in that trial exceeding 1 was 0.14.
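One compact way to see how a hierarchical model combines study-level results is the normal-normal model on the log-RR scale, with the between-study standard deviation τ handled on a grid. This is a standard textbook formulation assumed here for illustration; the summary does not state which likelihood or priors were used in the talk. The function name, the flat prior on the overall mean μ, the uniform prior on τ, the new-trial standard error, and all data values are hypothetical.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def hierarchical_meta(y, se, tau_grid, se_new):
    """Normal hierarchical meta-analysis of log RRs with tau on a grid.

    Model: y_i ~ N(theta_i, se_i^2), theta_i ~ N(mu, tau^2), flat prior on mu,
    uniform prior on tau over tau_grid. Returns P(mu > 0 | y), i.e. P(true RR > 1),
    and P(observed log RR > 0) for a new trial with standard error se_new.
    """
    log_post, cond = [], []
    for tau in tau_grid:
        w = [1.0 / (s * s + tau * tau) for s in se]
        v_mu = 1.0 / sum(w)  # posterior variance of mu given tau
        mu_hat = v_mu * sum(wi * yi for wi, yi in zip(w, y))  # posterior mean of mu given tau
        # log p(tau | y) up to a constant, with mu integrated out analytically
        lp = 0.5 * math.log(v_mu) + sum(
            0.5 * math.log(wi) - 0.5 * wi * (yi - mu_hat) ** 2 for wi, yi in zip(w, y)
        )
        log_post.append(lp)
        cond.append((mu_hat, v_mu, tau))
    top = max(log_post)
    probs = [math.exp(lp - top) for lp in log_post]
    total = sum(probs)
    probs = [p / total for p in probs]  # normalized posterior weights over the tau grid
    p_rr_above_1 = sum(
        p * normal_cdf(mu_hat / math.sqrt(v_mu)) for p, (mu_hat, v_mu, _) in zip(probs, cond)
    )
    # New trial: true effect ~ N(mu, tau^2), observed effect adds se_new^2 sampling variance.
    p_new_above_1 = sum(
        p * normal_cdf(mu_hat / math.sqrt(v_mu + tau * tau + se_new * se_new))
        for p, (mu_hat, v_mu, tau) in zip(probs, cond)
    )
    return p_rr_above_1, p_new_above_1

# Hypothetical log RRs and standard errors for six studies.
log_rr = [-0.22, -0.51, 0.05, -0.36, -0.10, -0.45]
se = [0.15, 0.25, 0.20, 0.30, 0.18, 0.22]
tau_grid = [i * 0.01 for i in range(101)]  # tau from 0 to 1 on the log-RR scale
p_rr, p_new = hierarchical_meta(log_rr, se, tau_grid, se_new=0.2)
print(f"P(true RR > 1) = {p_rr:.3f}; P(new trial's observed RR > 1) = {p_new:.3f}")
```

The predictive probability for a new trial is closer to 0.5 than the probability for the overall RR because it adds both between-study variability and the new trial's sampling error, which mirrors the pattern in the example above (0.01 versus 0.14).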

Finally, the advantages and challenges of the Bayesian approach were summarized. In Bayesian analyses (including meta-analyses), the conclusions are probability statements about the attribute of interest, which are easy to understand. The modeling approach removes the need to choose between fixed and random effects and allows one to account for unobserved sources of variability across published studies. Bayesian meta-analysis does not require the assumption of specific data models (e.g., normality), and it allows one to predict the outcomes of future trials, facilitating sample size estimation and prior specification for future analyses. However, prior model selection and elicitation can be challenging; sensitivity analyses are essential because conclusions may vary with the prior model assumptions; there is no conventional 0.05 significance cut-off; and the methods are less familiar to consumers of research. A number of technical references, including recommended reading on Bayesian statistics for non-statisticians, were also presented.
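A prior sensitivity analysis can be as simple as rerunning the same update under several priors and checking whether the conclusion changes. The sketch below assumes the beta-binomial setting used earlier; the data and the three priors (flat, skeptical, enthusiastic) are hypothetical.

```python
from math import comb

def beta_cdf(x, a, b):
    """P(theta <= x) for theta ~ Beta(a, b), integer a and b (binomial-sum identity)."""
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))

# Hypothetical data: 14 of 20 patients with the event.
events, n = 14, 20

# Hypothetical priors for the event rate, all with integer parameters.
priors = {
    "flat Beta(1, 1)": (1, 1),
    "skeptical Beta(5, 15)": (5, 15),  # prior mean 0.25
    "enthusiastic Beta(15, 5)": (15, 5),  # prior mean 0.75
}

results = {}
for label, (a0, b0) in priors.items():
    a, b = a0 + events, b0 + (n - events)  # conjugate posterior
    results[label] = (a / (a + b), 1 - beta_cdf(0.5, a, b))
    mean, p_above = results[label]
    print(f"{label:26s} posterior mean {mean:.3f}  P(rate > 0.5) {p_above:.3f}")
```

Here the skeptical prior pulls the posterior probability that the rate exceeds 0.5 below one half, while the flat prior leaves it well above one half; a result that flips like this with the prior is exactly what a sensitivity analysis is meant to expose.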

Proof in Medicine: The Role of Research Synthesis
Dr. Joel Greenhouse, Carnegie Mellon University

Dr. Greenhouse presented the use of cross-design synthesis methods, drawing on work funded by NIMH and AHRQ to assess the association between suicidality and antidepressant use in children and adolescents. The goal of medical research is to determine which interventions are most effective (or harmful) for which patients under what circumstances. Reaching this goal requires generating evidence about the benefits and harms of interventions, and then assessing and weighing that evidence. The complete evidence about a treatment or intervention typically comes from multiple studies of varied design, ranging from explanatory (randomized placebo-controlled trials) to observational (epidemiological or claims database studies). Ultimately, judgments must be made about the reliability and generalizability of each component of the evidence, as well as about the overall conclusions regarding a potential association.

Cross-design synthesis refers to the use of evidence from multiple data sources, experimental and observational, to address similar questions of interest. The FDA meta-analysis of 24 pediatric trials (23 industry-sponsored and 1 NIMH-sponsored) was reviewed and its limitations noted. In this application of cross-design synthesis, a new meta-analysis was performed on a subset of the FDA trial database, the sixteen randomized placebo-controlled clinical trials in patients with major depressive disorder, using a different outcome measure: definitive suicidal behavior. Both frequentist and Bayesian methods of meta-analysis were applied to these studies. The 95% CI for the RR of definitive suicidal behavior from the frequentist meta-analysis was (1.01, 3.96), while the 95% credible interval from the Bayesian meta-analysis was (0.96, 4.12). The posterior probability that the RR exceeded 1 in the Bayesian meta-analysis was 0.96. To increase the generalizability of the evidence, an analysis of the LifeLink Health Plans Claims database was performed. This database represents over 95 managed care plans in the United States and over 42 million covered lives. The analysis cohort was restricted to members aged 5–17 with a depression claim (based on ICD-9 codes) between January 1, 1999 and June 30, 2008. The index episode was defined as the first depression claim in this interval with no antidepressant claim in the previous 3 months and no depression diagnosis in the previous 6 months. Data from the 12 months before and the 6 months after the index episode were examined. A total of 52,293 members met these criteria (the LifeLink cohort) and were eligible for analysis.
Members of the cohort were classified by treatment, according to the presence of relevant claims in the pre-index period, as: SSRI, No Psychotherapy (N = 6,872); Other Antidepressant (N = 17,608); No Treatment (N = 6,677); SSRI + Psychotherapy (N = 10,949); or Psychotherapy, No Antidepressant (N = 10,187). The outcome, definitive suicidal behavior, was defined as a CDC E-code for self-inflicted injury during the 6-month period after the index episode.
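The index-episode rule described above can be sketched as a small filter over one member's claims. This is a hypothetical illustration, not the study's code: the `Claim` record, its `kind` labels, and the approximation of "3 months" and "6 months" as 90 and 180 days are all assumptions, and real ICD-9 and drug-code mapping is omitted.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Claim:
    member_id: str
    service_date: date
    kind: str  # hypothetical labels: "depression_dx" or "antidepressant"

STUDY_START, STUDY_END = date(1999, 1, 1), date(2008, 6, 30)
AD_LOOKBACK = timedelta(days=90)   # "past 3 months", approximated as 90 days
DX_LOOKBACK = timedelta(days=180)  # "past 6 months", approximated as 180 days

def index_episode(claims):
    """Return the index depression date for one member's claims, or None.

    The index episode is the first depression claim in the study window with no
    antidepressant claim in the prior 3 months and no depression diagnosis in the
    prior 6 months. Claims on the same date are not treated as prior claims.
    """
    claims = sorted(claims, key=lambda c: c.service_date)
    for c in claims:
        if c.kind != "depression_dx" or not (STUDY_START <= c.service_date <= STUDY_END):
            continue
        prior = [p for p in claims if p.service_date < c.service_date]
        recent_ad = any(
            p.kind == "antidepressant" and c.service_date - p.service_date <= AD_LOOKBACK
            for p in prior
        )
        recent_dx = any(
            p.kind == "depression_dx" and c.service_date - p.service_date <= DX_LOOKBACK
            for p in prior
        )
        if not recent_ad and not recent_dx:
            return c.service_date
    return None

# Hypothetical member: the June diagnosis follows an antidepressant claim by 45 days,
# so the first qualifying index episode is the February 2005 diagnosis.
m2 = [
    Claim("m2", date(2004, 5, 1), "antidepressant"),
    Claim("m2", date(2004, 6, 15), "depression_dx"),
    Claim("m2", date(2005, 2, 1), "depression_dx"),
]
print(index_episode(m2))  # 2005-02-01
```

Pre-window claims still count toward the lookback, which is consistent with examining 12 months of data before the index episode.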

The LifeLink cohort was split into two groups: members more representative of the RCT populations in the FDA database (the Restricted Cohort) and those who were not. To be included in the Restricted Cohort, members could meet none of the following criteria: high risk for suicidal behavior, current schizophrenia diagnosis, current or lifetime history of drug or alcohol dependence, current bipolar I or II diagnosis, current pregnancy or sexual activity without acceptable contraceptive use, or a suicide attempt within 7 days of the index episode. The Restricted Cohort comprised 39,396 members. The RR of definitive suicidal behavior was computed for the Restricted Cohort using propensity scores, age, gender, geographical region, and Medicaid status to adjust for potential confounding differences across treatment groups. With no treatment as the reference group, the adjusted RR (95% CI) was 2.3 (1.1–5.1) for SSRIs without psychotherapy and 4.5 (2.3–8.9) for any antidepressant in the Restricted Cohort. A raw (unadjusted) RR was also computed for the full LifeLink Cohort. The observational RR from the Restricted Cohort was plotted alongside the meta-analytic results from the RCTs, with the RR for the whole LifeLink Cohort shown as a reference line, giving a single view of the clinical trial and observational findings relevant to the research question. Dr. Greenhouse summarized the talk by pointing out the benefits of this method for synthesizing various sources of medical research and generating new hypotheses. He highlighted that cross-design synthesis is inherently interdisciplinary, because medical expertise is essential to proper design and conclusions, and that sensitivity analyses are needed to understand how the conclusions depend on underlying definitions, judgments, and model assumptions.
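The summary does not specify how the propensity scores entered the adjustment. One common approach, shown here only as a hypothetical sketch, is to stratify members by propensity score (for example, into quintiles) and pool the stratum-specific risk ratios with the Mantel-Haenszel estimator. The counts below are invented to show how a crude RR can be inflated when treated members are concentrated in a higher-risk stratum.

```python
def mantel_haenszel_rr(strata):
    """Mantel-Haenszel risk ratio pooled across strata.

    Each stratum is (a, n1, c, n0): events and total among the treated,
    then events and total in the reference (no-treatment) group.
    """
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return num / den

def crude_rr(strata):
    """Unadjusted risk ratio after collapsing all strata together."""
    a = sum(s[0] for s in strata)
    n1 = sum(s[1] for s in strata)
    c = sum(s[2] for s in strata)
    n0 = sum(s[3] for s in strata)
    return (a / n1) / (c / n0)

# Hypothetical propensity-score strata (treated events, treated N, reference events, reference N).
strata = [
    (2, 200, 8, 800),   # low-propensity stratum: 1% risk in both groups
    (24, 800, 6, 200),  # high-propensity stratum: 3% risk in both groups
]
print(f"crude RR = {crude_rr(strata):.2f}")  # crude RR = 1.86
print(f"Mantel-Haenszel RR = {mantel_haenszel_rr(strata):.2f}")  # Mantel-Haenszel RR = 1.00
```

Within each stratum the RR is 1.0, but most treated members fall in the high-risk stratum, so the crude RR is inflated; stratifying on the propensity score recovers the within-stratum value. A real analysis would also report a confidence interval for the pooled estimate.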

Impact of Meta-Analysis for Policy Decisions about Reimbursement-Payer Perspective
Dr. Rhonda Robinson-Beale, Optum Health

Dr. Robinson-Beale presented the procedures Optum Behavioral Health currently uses to synthesize the multiple types of published evidence in its regular review of non-pharmacologic interventions (technologies) for behavioral health, to determine plan coverage. Optum has an established hierarchy of evidence that is used to score relevant publications on a scientific merit rating scale, which grades each study’s research design, measurement of the dependent and independent variables, participant ascertainment, and generalizability. Each technology is classified as Unproven, Emerging, or Proven based on the magnitude, quality, and consistency of the published evidence related to it. Inherent in this classification is an assessment of whether the technology can be implemented with fidelity, which is especially important for psychotherapeutic interventions. Fidelity is assessed according to the rigor of the training, licensing, and re-training applied. Dr. Robinson-Beale concluded with recommendations for researchers preparing non-pharmacologic behavioral interventions for technology assessment: clearly delineate the population who will use the technology, include naturalistic population-based studies, analyze ethnic and racial subpopulations, perform comparative effectiveness analyses against established standard treatments, include the cost of implementation, and, where appropriate, describe how fidelity was established. She encouraged those who fund research on these treatments to require integration and standardization of research protocols, including a central database for all studies and standard research parameters, procedures, and methods of assessment. She also recommended that the central database be structured to allow meaningful integration of datasets and the establishment of registries, and that it be accessible to payers to promote low-cost, informative descriptive analyses and matched controlled comparisons.

2012 Autumn: Precision Medicine in CNS Clinical Trials Workshop

Tuesday, May 29th, 2012

Co-Chairs: Douglas E. Feltner, MD; Aidan Power, MD

2012 Autumn: Medication Development for Stimulant Dependence Workshop

Friday, May 25th, 2012

Co-Chairs: Thomas Kosten, MD; Joseph Palumbo, MD