
Comments on AHRQ Draft Workgroup Report: Antipsychotics in Adults: Comparative Effectiveness of First‐Generation versus Second‐Generation Medications

Chair: Larry Alphs, MD, PhD, President of ISCTM, Janssen Pharmaceuticals, Inc
Co‐chair: Navid Samad, MD, MPH, Samad Pharma & Biotech Consulting, LLC
Ross J. Baldessarini, MD, Harvard Medical School
Jesse Berlin, ScD, Janssen Pharmaceuticals, Inc
Doug Feltner, MD, Douglas E Feltner, LLC
Reuven Ferziger, MD, Janssen Pharmaceuticals, Inc
Jean‐Pierre Lindenmayer, MD, New York University
Rajiv Radhakrishnan, MD, Yale University School of Medicine
Tanya Ramey, MD, PhD, Pfizer, Inc
Gary Sachs, MD, MA, United BioSource Corporation
Cynthia Siu, PhD, MBA, Data Power, Inc
Paul Stang, PhD, Janssen Pharmaceuticals, Inc
Douglas Vanderburg, MD, Pfizer, Inc

GENERAL COMMENTS
The International Society for CNS Clinical Trials and Methodology (ISCTM) appreciates having an opportunity to comment on a draft report (2011‐0606) prepared by the Evidence‐Based Practice Center of the Agency for Healthcare Research and Quality (AHRQ) of the US Department of Health and Human Services, entitled Antipsychotics in Adults: Comparative Effectiveness of First‐Generation versus Second‐Generation Medications. Recognizing the potential importance of this document to clinicians, clinical investigators and other stakeholders, ISCTM convened the working group listed above to review and comment on this report. As a society interested primarily in methods employed in clinical research on CNS‐active agents, our comments focus largely on methodological considerations and the validity of the conclusions that can be drawn from the analyses reported.

ISCTM recognizes that a great deal of effort has gone into the draft report, and understands that the approaches utilized represent several widely accepted methods to analyze disparate sets of data. We support AHRQ’s intentions to provide information aimed at helping clinicians, employers, insurers, manufacturers, policymakers, and other stakeholders to make informed decisions about providing health care services involving older and newer antipsychotic drugs. We further note that AHRQ states that the conclusions of this report do not necessarily represent its views or those of the U.S. Department of Health and Human Services.
 
This report is being issued by an influential agency of the federal government and its conclusions are likely to be read with great interest and acted upon by a variety of interested persons. Consequently, the work should meet rigorous scientific standards. ISCTM agrees with the writers of the AHRQ draft report that most studies considered for analysis are not of sufficient quality to resolve questions of comparative effectiveness or to support the report’s conclusions. On this basis and due to concerns about the methods employed (see below), ISCTM proposes that the quality of the comparative effectiveness data currently available to support meta‐analyses of efficacy data pertaining to older (“first‐generation” [FGA]) and more recently introduced (“second‐generation” [SGA]) antipsychotic drugs is so low that very few claims regarding the comparative therapeutic efficacy and side effects of these highly pharmacologically and clinically heterogeneous classes of agents can be supported. We are further concerned that the sheer volume of material presented in the AHRQ report may suggest that more or firmer conclusions can be made than are possible. Consequently, we suggest that many of the tables presented in the main report be provided in an appendix, with the main report focusing on the few valid conclusions, with critical discussion of limitations, and suggestions for improving the state of the research data on this topic. We are particularly concerned that limited and potentially unreliable and misleading findings of an apparent lack of statistically significant differences between types of drugs might be misconstrued as providing firm support for similarity of efficacy or safety. We suggest that concerns expressed in our commentary be highlighted in the Structured Abstract of the AHRQ report so that even casual and non‐expert readers will be aware of the major limitations of the available evidence and their valid interpretation.
 
STRUCTURED ABSTRACT AND EXECUTIVE SUMMARY
Primary Conclusion

Our most important recommendation is that the AHRQ report emphasize as a primary conclusion that the studies available for analysis were not adequately designed to support sound comparative effectiveness analyses, and that biases inherent in them make it difficult, if not impossible, to draw conclusions about the relative efficacy of FGAs versus SGAs, particularly as the comparison groups are frankly arbitrary classes of highly pharmacologically and clinically dissimilar agents. Overall, ISCTM reviewers find the draft AHRQ report to be inconclusive, and are concerned about incomplete, inconsistent, or erroneous reporting of clinical trial results in several sections of the report, some of which are highlighted below.  
 
Overall rationale for the AHRQ analysis
As a general observation, a clear, compelling rationale for this study is not provided. The AHRQ report indicates that “both FGA and SGA are associated with a range of side effects.” The differences identified represent commonly held perceptions regarding dissimilar adverse‐event profiles between SGAs and FGAs. However, the range of adverse effects varies extensively among specific drugs in both classes, as well as among doses, so as to yield largely inconclusive comparisons of agents considered as FGA vs. SGA drugs. With respect to SGAs, it is stated that “SGAs are generally thought to have a lower risk of motor side effects, but most are associated with a higher risk of weight gain, elevated lipid and prolactin levels, and development of type 2 diabetes mellitus.” This statement is too broad and over‐inclusive, especially regarding risk of developing type 2 diabetes or hyperprolactinemia as well as risk of adverse neurological effects (notably including akathisia), which vary markedly among the class of SGAs and with their doses. Further, the draft AHRQ report asserts that “SGAs have shown greater benefits in many outcome domains compared with FGAs” (page 2). This statement is both confusing and inappropriate as it presents as a prior conclusion the question being asked by the study, and is not supported by the evidence reviewed in the report itself, and indeed may not be possible to answer given the limitations of the data available.  
 
Based on the preceding considerations, we do not agree that there are fundamental defining pharmacological or clinical differences between FGA and SGA other than an arbitrary time of their discovery or licensing, and some differences in their receptor‐binding profiles, so that focusing on putative clinical differences between FGA and SGA with respect to either antipsychotic efficacy or adverse effects risks being arbitrary as well as being poorly supported by the data analyzed. Indeed, the highly heterogeneous pharmacologic and adverse‐event characteristics among so‐called FGA and SGA or ‘typical versus atypical’ antipsychotics surely limit group differences between them. Further, it is a matter of considerable concern that the AHRQ report lacks comment on effects of drug dose, route of administration, or patient characteristics (notably, age and current clinical status) on risks or severity of many adverse effects. With respect to FGA, the database is limited by the number of studies and the predominance of haloperidol as the most commonly employed FGA comparator. The over‐representation of haloperidol as a prototypical FGA is likely to be misleading and not to support generalizations, especially as haloperidol is a high‐potency FGA and has often been administered at relatively high doses. Further, existing data suggest that FGAs of high and low potency may have meaningfully dissimilar adverse‐effect profiles. It would be preferable for the AHRQ report to focus on individual drug comparisons with large numbers of trials, and to provide additional, critical assessments of the generally low quality of available data.
 
Examples of inconsistencies
The Structured Abstract states (Page v): “Risperidone was favored over haloperidol for positive symptoms and total psychosis score.” At the same time, the Executive Summary states (Page ES 7, Table ES 2 – under “Positive Symptoms”): “Significant difference favoring risperidone for PPI. No difference for PANSS or SAPS.”

It may appear to a reader that the conclusion “Significant difference favoring risperidone for PPI” (with a “Low” strength of evidence) was drawn from 24 RCTs. However, Table 20 (Evidence summary table: haloperidol versus risperidone) clearly shows that PPI data came from only one of 24 (4.2%) trials, involving only 30 (0.7%) of all 4317 subjects included in the positive symptom analysis comparing risperidone vs. haloperidol. The authors found a non‐significant difference in PANSS (pooling 20 studies, n=4064) between risperidone and haloperidol [effect estimate 0.51 (‐0.15 to 1.17)].
 
Since additional inconsistent, incomplete, or erroneous reporting was found throughout the AHRQ report, we urge that the findings and conclusions summarized in the Structured Abstract and Executive Summary provide additional essential information for each reported outcome: [a] a general assessment of the quality of studies included, [b] specific assessment of the comparability of studies and their suitability for pooling, [c] the numbers of trials and of subjects contributing to specific findings (e.g., only one trial and 30 subjects for positive symptoms, out of 24 trials analyzed), [d] an indication of the quality of evidence specific to each major outcome, whether pre‐specified or not.
 
Discrepant Findings
The conclusion in the AHRQ report that there was “No difference for PANSS or SAPS” (Page ES 7, Table ES 2 – under “Positive Symptoms”) for haloperidol vs. risperidone contradicts the meta‐analytic findings pooled from RCTs as reported by Leucht et al. (2009). That review reported significant differences favoring risperidone over haloperidol with respect to positive symptoms (21 studies, n=2739, Hedges’ g = –0.12, p=0.002; web appendix cs4, page 8), overall symptoms (27 studies, n=3258, g = –0.15, p=0.001; web appendix cs4, page 8), and negative symptoms (23 studies, n=2908, g = –0.13, p=0.001; web appendix cs4, page 8). Similar results also were found in comparisons of various doses of risperidone versus haloperidol given at daily doses of ≤12 mg (web appendix cs4, page 8), or all FGAs (p 34, Table 2).
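
For readers less familiar with the effect-size metric cited above, Hedges' g is a standardized mean difference with a small-sample correction. The following is a minimal sketch using the common approximate correction factor; the function name and the sample figures are hypothetical and are not taken from Leucht et al. (2009) or from the AHRQ report.

```python
import math

def hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference (group 1 minus group 2) with
    the usual approximate small-sample correction."""
    df = n1 + n2 - 2
    # Pooled within-group standard deviation.
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (mean1 - mean2) / pooled_sd
    # Approximate correction factor J = 1 - 3 / (4 * df - 1).
    correction = 1.0 - 3.0 / (4.0 * df - 1.0)
    return correction * d

# Hypothetical symptom-change scores (more negative = more improvement).
g = hedges_g(mean1=-18.0, sd1=20.0, n1=150, mean2=-15.5, sd2=20.0, n2=150)
print(round(g, 3))
```

With group sizes of this order the correction factor is close to 1, so an absolute g of 0.12 to 0.15 corresponds to a difference of roughly one-eighth of a pooled standard deviation.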

Furthermore, we found that the AHRQ analysis results (Table ES‐2) with respect to effectiveness of clozapine and olanzapine (vs. FGAs) stand in contradiction to findings from a recent, well conducted meta‐analysis by Leucht et al. (2009, Lancet). Such inconsistencies of the AHRQ analyses with another important and relevant meta‐analysis should be addressed in the report.
 
High or Uncertain Risk of Bias
The AHRQ report recognizes that reasons for an “uncertain risk of bias” include unclear reporting regarding sequencing of treatments, concealment of allocation to specific treatment‐arms, and methods of blinding. Common reasons for trials to be assessed as having a “high risk of bias” include lack of convincing blinding procedures or inadequate analysis or reporting of outcome data. We recommend that these limitations be stated more strongly in the report when they are identified, and specifically highlighted in the Structured Abstract before any results are stated or conclusions drawn.
 
Selective reporting of adverse events
Justification should be provided for why only four adverse event categories were selected for emphasis (diabetes mellitus, metabolic syndrome, tardive dyskinesia, and mortality). No reason is given why other forms of adverse neurological effects (including acute dystonia or dyskinesia, akathisia, and bradykinesia, as well as excess sedation), cardiovascular effects (including hypotension and cardiac conduction defects or arrhythmias), or risks associated with pregnancy are not considered. These significant omissions suggest under‐appreciation of the occurrence of a range of adverse effects, particularly with respect to FGAs. In addition, the ability of the included trials to detect meaningful differences in adverse‐event risks should be discussed, especially in trials relying on typically passive and incidental reporting of clinically observed adverse events or patient‐complaints, rather than specific assessments. It is problematic that studies with radically different approaches to adverse‐event reporting are combined for this meta‐analysis. Since many comparisons are made among adverse‐event rates associated with individual drugs, it would be useful to evaluate the validity of the comparisons based on critical assessments of the quality of the methods of ascertainment, the number of trials, analyses, and subjects involved, and to limit comparisons to instances where the available data are abundant and of plausible quality.
 
Adverse events often exert a highly variable impact on the ‘effectiveness’ of drug‐treatment. Many experts consider it important to weight adverse events (as is done for other measures of effectiveness). For example, tardive dyskinesia or akathisia are far more distressing to patients than minor elevations in serum concentrations of transaminases or glucose, and far more likely to impact treatment‐adherence and long‐term effectiveness. Weighting of the likely clinical importance of each adverse effect (by subjective distress and by clinical significance) should improve estimates of effectiveness and better support assessment of trade‐off decisions that must be made clinically. We recognize that this approach is still emerging but it would be valuable to highlight its potential value in the AHRQ report. 

Failure to address clinical heterogeneity of schizophrenia
Another significant limitation of analyses of trials in subjects diagnosed with schizophrenia is that their clinical status at the time of randomization is often heterogeneous and poorly defined: past history, illness‐severity, and previous or even current treatments are rarely documented. For instance, patients considered to be in first‐lifetime episodes of schizophrenia, those in acute phases of exacerbation of chronic psychotic illnesses, or in relatively stable chronic phases, as well as those with prominent negative versus positive symptoms, ill‐defined ‘schizoaffective’ disorders, or substantial evidence of dementia, tend to be lumped together within or among trials, despite their obvious clinical heterogeneity. Further, trials typically vary greatly in patient descriptions, locations and cultures, assessment methods, drug doses and durations, drop‐out rates, and adverse‐event reporting techniques. Many other sources of variance are not often defined or measured, but may severely limit comparability of trials and generalizability of findings. Even more troubling is that studies of acutely exacerbated patients (that examine treatment of acute symptoms and their maintenance) and stable patients (that examine only maintenance of therapeutic effect) have been lumped together in the same analysis. As outlined below, this represents a major methodological and logical shortcoming, with a high likelihood for heterogeneous, and therefore, inconclusive outcomes. Signal detection will be very different in these different study populations. Overly broad and heterogeneous inclusion criteria foster noisy outcomes, especially when the number of included studies is small. Significant heterogeneity and bias in the data (e.g., reviewer bias, publication bias, etc) lead to discordant results on the main outcome variables studied. Overall, these limitations will increase heterogeneity and reduce the ability to detect signals when studies are pooled.
 
Limitations in assessments of data quantity and quality
We agree with the AHRQ report’s conclusion that the quality and abundance of evidence are low for almost all of the questions addressed in the report. Even when the available evidence is of moderate quality, the numbers of comparisons and of subjects are often low. Overall, these circumstances make it difficult or risky to draw firm conclusions and to make clinical or treatment‐policy recommendations. They also cast serious doubts on the overall quality and strength of conclusions generated by this report, and severely limit their potential value to guide clinical care or policies. We certainly agree with the report’s conclusion that more studies are required to support recommendations to clinicians, patients, payers, and policy‐makers.
 
Outcome measures
Statements to the effect that clinically‐relevant outcome measures are rarely assessed appear in several places in the draft report, including in the sub‐section Conclusions.  (“Outcomes potentially important to patients were rarely assessed.”) Typically reported are scores from standardized symptom‐rating scales, which may or may not be clinically important, and usually change very little in treatment trials involving chronically psychotic participants. Measures pertaining to functional status and quality‐of‐life have historically been infrequently included in trials of antipsychotic drugs or in their meta‐analytic reviews. Yet, many symptomatic outcomes (sedation, restlessness, awkwardness, depression, anxiety, and others) that typically are not assessed, are prevalent and perceived as important by patients and their families, and as needing relief and perhaps tending toward discontinuation of treatment. These limitations call for more explicit discussion in the report.  
 
Efficacy versus effectiveness
Most of the treatment trials evaluated in the AHRQ report are studies of efficacy and not effectiveness. As mentioned in the draft report, there are very few effectiveness studies available for analysis. An early report by AHRQ itself (Criteria for Distinguishing Effectiveness from Efficacy Trials in Systematic Reviews) discusses the sensitivity and specificity of seven criteria to distinguish effectiveness from efficacy. If these criteria were applied to the AHRQ’s own report on antipsychotic drug‐comparisons, some of the trials included surely would be assessed as “poor” with respect to criteria for effectiveness, and it would be appropriate to include this information in an AHRQ‐sponsored document.
 
Responses in subgroups
Many clinicians will want to know which drugs (or combinations) are likely to be safer and better tolerated, as well as most effective, for particular types of patients. However, most of the analyses provided in the AHRQ report deal with effects observed in heterogeneous populations and poorly controlled drug doses, without regard to potential subgroups defined by especially high or low efficacy or tolerability. The little information that does consider subgroups usually arises from comparisons of rare trials and few subjects. The resulting data are likely to provide only weak support for conclusions about sub‐populations. We understand that this outcome reflects the limitations of the available data, but it would be valuable to discuss this limitation and to include it as an unmet need.  
 
Limitations of data presented
The amount of detail provided in the AHRQ report across comparisons varies markedly, and may give the impression that some conclusions rest on more abundant or robust data than is the case. Rather than attempting to draw conclusions from sparse and heterogeneous data, it would be more valuable to identify the limitations of the evidence underlying each comparison and conclusion presented, and to restrict those reported to relatively adequate data, with recommendations of additional information that is needed from future studies. In addition, numerous clinical outcome measures and rating‐scale scores are included, which may or may not address the same or comparable endpoints. It would be helpful to base comparisons on similar outcome measures across trials, or to address variance in outcome measures when it arises.  
 
Types of trials included and excluded
The AHRQ report focuses on direct, head‐to‐head, comparison trials, which are not common in studies of most types of psychotropic drugs. This strategy should improve internal validity of the comparisons since the conditions of observation are presumably well‐matched. However, this decision led to the exclusion of a very large body of trial‐results based on comparisons to placebo, or other designs. Moreover, head‐to‐head trials without a placebo‐control condition carry high risks of finding “no apparent difference” leading to potentially false conclusions of equal efficacy. An alternative is to carry out indirect comparisons using such statistical approaches as mixed‐treatment meta‐analysis. Such analytic methods are most credibly applied when patient samples are truly comparable across trials, and other potentially relevant factors can fairly be assumed to be comparable or irrelevant to measured treatment‐effects (e.g., calendar time, severity and duration of illness, prior treatment resistance, sex, age). Given increasingly recognized secular trends in findings from clinical trials that include rising placebo‐associated effects and lower effect‐sizes, if indirect comparisons are made, their limitations should be clearly identified.
 
METHODS
 
Definition of key terms
An important limitation of the draft AHRQ report is that several key terms are not defined. We urge inclusion of a list of abbreviations used, and definitions of key terms and constructs. Further, studies selected for analysis should use terms consistent with the definitions provided. It is impossible to list all terms that should be defined, but they should include, at least: ‘effectiveness,’ ‘efficacy,’ ‘comparative effectiveness,’ ‘comparative efficacy,’ ‘first generation antipsychotics (FGA),’ ‘second generation antipsychotics (SGA),’ and ‘total psychosis score.’ Failure to provide clear definitions leads to both confusion and potential misinterpretations.
 
Comparison of AHRQ and Leucht reviews
The AHRQ report provides comprehensive analyses of apparent comparative efficacy of first‐generation (FGA) versus second‐generation agents (SGA) arising from direct, head‐to‐head comparisons in individual, randomized, controlled trials. However, the results for three of the SGAs (clozapine, olanzapine, risperidone) do not accord with a similar analysis of their efficacy compared to FGAs in a recent, well conducted meta‐analysis by Leucht et al. (Lancet, 2009). Given the high quality of the methods used in the Leucht meta‐analysis, the AHRQ report should note their differences and attempt to explain them, including consideration of approaches to identifying, retrieving, analyzing and interpreting data. Some differences in approach between the AHRQ report and that of Leucht et al. that may have led to differing conclusions are provided below. In particular, it may be instructive to consider the comparison of responses to haloperidol vs. risperidone based on BPRS or PANSS ratings.
 
Among the 20 comparisons of FGAs versus individual SGAs in the AHRQ report (Table 6, page 20), 12 (60%) involved only one study. For the key efficacy measures (changes in BPRS or PANSS total scores), there were 14 studies of olanzapine‐vs.‐haloperidol and 20 studies of risperidone‐vs.‐haloperidol. The modest numbers of trials involved in these comparisons severely limit statistical power to identify differences or variance in outcomes (page 15), and limit the value of the comparisons. In contrast, Leucht et al. (2009) pooled outcomes for all FGAs, and compared individual SGAs with pooled FGAs or with haloperidol (in a sensitivity analysis). Haloperidol was the most‐often used FGA in both the AHRQ report (Table 6) and in Leucht et al. (2009).
 
Sampling
The search cut‐off date for the AHRQ report was July 2010, or nearly four years later than October 2006 in Leucht et al. (2009). Nevertheless, by combining key efficacy measures (PANSS, BPRS or BPRS derived from PANSS scores), Leucht et al. (2009) included a greater number of double‐blind, randomized, controlled trials (RCTs) in comparisons of individual SGAs vs. FGAs for efficacy assessments (aripiprazole [n=5], clozapine [23], olanzapine [28], quetiapine [11], risperidone [34], and ziprasidone [5]). This greater number of trials allowed separate analyses for: [a] comparison of individual SGAs with haloperidol, [b] separate analysis of short‐term studies (<12 weeks), and [c] exclusion of studies conducted in Asia as potentially incomparable clinically to Western trials. Results were consistent with those obtained in pooling all double‐blind studies (see Leucht et al. [2009], web Appendix cs4.pdf, pages 5 to 10).
 
Dosing variance
The AHRQ draft report states “we combined data across the available dosing arms before conducting the meta‐analysis” (AHRQ report, page 16). In contrast, the report by Leucht et al. (2009) states that “For fixed‐dose studies, we selected only those with optimum doses of second‐generation antipsychotic drugs as reported in dose‐finding studies” (Leucht et al. [2009], p 32). These included the following total daily mg‐doses: amisulpride 50–300 for predominantly negative symptoms and 400–800 for positive symptoms; aripiprazole 10–30; olanzapine 10–20; quetiapine >250; risperidone 4–6; sertindole 16–24; and ziprasidone 120–160 (Leucht 2009, page 32). Because most fixed‐dose trials included multiple arms and clinically suboptimal dose groups for SGAs, estimates of treatment‐effects extracted from pooled dosing arms can be seriously biased. For example, comparison of a fixed daily dose of haloperidol 10 mg (n=226) with a range of doses of risperidone 1–16 mg (n=1136) as reported by Peuskens et al. (1995) is unlikely to be clinically meaningful.
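
The dilution that can result from pooling dosing arms can be illustrated with a minimal sketch. It assumes that arms are combined by the standard formula for merging group summary statistics (sample-size-weighted mean, and a standard deviation built from within-arm and between-arm sums of squares); the draft report does not state which formula was used, and the arm sizes, means, and standard deviations below are hypothetical.

```python
import math

def combine_arms(arms):
    """Merge arms given as (n, mean, sd) tuples into one (n, mean, sd).

    Reproduces the sample mean and sample SD that would be obtained
    from the pooled individual-level data.
    """
    total_n = sum(n for n, _, _ in arms)
    grand_mean = sum(n * m for n, m, _ in arms) / total_n
    within_ss = sum((n - 1) * sd ** 2 for n, _, sd in arms)
    between_ss = sum(n * (m - grand_mean) ** 2 for n, m, _ in arms)
    combined_sd = math.sqrt((within_ss + between_ss) / (total_n - 1))
    return total_n, grand_mean, combined_sd

# Hypothetical fixed-dose trial: change in total symptom score by arm.
suboptimal_arm = (100, -5.0, 20.0)   # clinically suboptimal dose
optimal_arms = [(100, -20.0, 20.0), (100, -22.0, 20.0)]

n_all, mean_all, sd_all = combine_arms([suboptimal_arm] + optimal_arms)
n_opt, mean_opt, sd_opt = combine_arms(optimal_arms)
print(mean_all, mean_opt)
```

In this hypothetical case the mean improvement across all pooled arms is smaller than that across the optimally dosed arms alone, which is the direction of bias described above.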
 
Estimates of treatment effects
For individual continuous outcome measures, the AHRQ report “extracted the mean with the accompanying measure of variance for each treatment group or the mean difference (MD) between treatments” and “analyzed continuous data as post‐treatment score or absolute difference (or change score) from baseline” (page 14). In contrast, Leucht et al. (2009) “assessed the mean overall change in symptoms, with the following order: change in PANSS total score from baseline, if not available then the change in the Brief Psychiatric Rating Scale (BPRS), and then values of these scales at study endpoint” (page 32).
 
Study selection‐bias
In the AHRQ draft report, comparisons of haloperidol (total N=676) and risperidone (N=1875) based on BPRS total‐scores involved 12 studies in patients with acutely exacerbated (7 studies) or chronic schizophrenia (5 studies) (page K‐16). An overall estimate of efficacy was calculated as a weighted average of the difference in group means. Small variances were observed with haloperidol (least squares (LS) mean –0.14 ± 0.22 [SD], N=30) and with risperidone (LS mean 0.14 ± 0.23, N=33) in a 2‐year maintenance trial involving initially clinically stable schizophrenia patients (Marder et al. 2003), and 44.4% of total weighting for meta‐analysis was assigned to this single study. In contrast, Leucht et al. (2009) assigned a relatively small weight to this trial among a total of 34 studies (see Figure 1f in Leucht et al. [2009], web Appendix). The second largest weight (34.9%) was assigned to a practical clinical trial for comparing haloperidol (mean improvement: –23.1 ± 1.40, n=56) and risperidone (–23.9 ± 1.4, n=61) involving short‐term treatment of a variety of first‐episode nonaffective psychotic illnesses (Crespo‐Facorro et al. 2006). It is particularly troubling that the heavily‐weighted Marder et al. (2003) study followed patients who were already stably‐treated at baseline, with little expected improvement and low variance, whereas many other trials involved acutely exacerbated psychotic patients with greater room for improvement, but larger variance. It is questionable whether such dissimilar trials can fairly be pooled.
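
The way a low-variance trial comes to dominate a pooled estimate can be sketched with simple inverse-variance weighting. This is an illustration under stated assumptions, not a reproduction of the AHRQ weights: the stable-patient figures are the standard deviations and group sizes quoted above for Marder et al. (2003), the acute trial is hypothetical, and fixed-effect weights are shown (random-effects weights add a between-study variance to each term, which moderates but does not remove the imbalance).

```python
def mean_difference_variance(sd1, n1, sd2, n2):
    """Sampling variance of a difference between two group means."""
    return sd1 ** 2 / n1 + sd2 ** 2 / n2

def inverse_variance_weights(variances):
    """Normalized fixed-effect weights, proportional to 1 / variance."""
    raw = [1.0 / v for v in variances]
    total = sum(raw)
    return [w / total for w in raw]

# Stable-patient trial: SDs and Ns as quoted above (Marder et al. 2003).
stable_var = mean_difference_variance(0.22, 30, 0.23, 33)

# Hypothetical acute-exacerbation trial: larger sample, much larger SD.
acute_var = mean_difference_variance(15.0, 100, 15.0, 100)

stable_weight, acute_weight = inverse_variance_weights([stable_var, acute_var])
print(round(stable_weight, 4), round(acute_weight, 4))
```

Despite having fewer than one-third as many subjects, the stable-patient trial receives nearly all of the weight in this two-trial illustration, solely because its outcome variance is so small.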
 
The AHRQ draft report states that 8,798 records were screened, 1,162 (13%) were considered further, and only 122 (1.4%) were included for analysis. This highly selective sampling surely required a great deal of reviewer‐judgment and carried a risk of including many reports that most likely provided insufficient information to exclude major potential biases.
 
It is also likely that trials comparing an index antipsychotic agent with placebo and not an active comparator were excluded from analysis or that those that may have been included lacked a placebo‐control group, or were biased toward unmatched subject‐counts and variance across treatment‐arms, as is typical of trials that include both placebo and active comparators. This possibility should be addressed and clarified in the report.  
 
Analytical models
The reported AHRQ analysis employed a DerSimonian‐Laird method of meta‐analysis to estimate the average true effect, and its variability, for comparisons of FGAs (most often haloperidol) and various SGAs. The small number of studies involved in most of the comparisons can introduce statistical problems with this method, and suggests that limited statistical power and sub‐optimal pooling of the available research findings are likely.
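
For reference, the DerSimonian-Laird calculation can be sketched as follows. This is a minimal textbook version of the method-of-moments estimator, not the AHRQ authors' code; the function name and the example effect sizes are hypothetical. The between-study variance (tau squared) is estimated from the heterogeneity statistic Q with only k - 1 degrees of freedom, which is why the estimate becomes unstable when k is small.

```python
import math

def dersimonian_laird(effects, variances):
    """Random-effects pooling; returns (pooled_effect, std_error, tau_squared)."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Method-of-moments estimate, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_w_star
    std_error = math.sqrt(1.0 / sum_w_star)
    return pooled, std_error, tau2

# Hypothetical mean differences and their sampling variances (three trials).
pooled, std_error, tau2 = dersimonian_laird([0.2, 1.5, -0.4], [0.30, 0.25, 0.40])
print(pooled, std_error, tau2)
```

With two or three trials, a single additional or omitted study can move the estimated tau squared between zero and a large value, and the conventional standard error does not reflect that uncertainty.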
 
Quality and bias assessment
The AHRQ report states that “The overall assessment was based on the responses to individual domains. If one or more of the individual domains had a high risk of bias, we rated the overall score as having high risk of bias. We rated the overall risk of bias as low only if all components were assessed as having a low risk of bias. The overall risk of bias was unclear for all other studies” (page 13). Currently, the use of quality scores in meta‐analysis is discouraged in favor of considering individual aspects of study design, such as dropout rates.
 
Assessments of depressive and anxiety symptoms
Page 4 of the AHRQ report states, “The MADRS is a generic depression diagnostic questionnaire used by psychiatrists to measure the severity of depressive episodes.” The MADRS is neither “generic” nor “diagnostic”; rather, it is a depression symptom severity measure intended for use in subjects already diagnosed with major depressive disorder. It has also been used empirically in samples involving other diagnoses, but the levels of psychometric support for such applications are limited. These statements should be corrected, and a rationale for considering depressive symptoms in schizophrenia patients provided.

There are additional basic errors in the AHRQ report’s understanding of how other scales should be used. Examples include: [i] the Covi Anxiety Scale is referenced as a measure to evaluate positive symptoms for answering key question #1 (page 53); however, this scale is not designed to assess positive symptoms, and is therefore not an appropriate measurement tool to address this aspect of key question #1; [ii] the Calgary Depression Scale for Schizophrenia (CDSS) is used to compare negative symptoms; however, this rating scale is designed to assess depressive symptoms and not negative symptoms of schizophrenia; [iii] YMRS, MADRS, and CGI‐BP are listed as “Measures of neurocognition for schizophrenia” although they are, respectively, measures for mania, major depression symptoms, and overall clinical impressions regarding bipolar disorder. These inconsistencies and errors are likely to affect evaluations of antipsychotic drug treatments among subgroups (key question #5, page 54). They require correction and further discussion and justification for their use in the AHRQ report.
 
Bipolar Disorder
It is highly questionable to lump comparisons involving such dissimilar illnesses as bipolar disorder and schizophrenia (p 60), especially as bipolar disorder presents in a range of episodic clinical syndromes (mania, mixed‐states, depression, sometimes psychosis) and levels of severity. For key question 1, comparison of treatment with olanzapine and haloperidol focuses on sleep, which may not respond similarly in bipolar disorder and in schizophrenia. This comparison also involves a “total score,” which is not adequately defined, and the descriptions involved will be hard to follow by non‐experts.  
 
Moreover, comparisons of antipsychotic treatments among bipolar disorder patients (Table 51) are confounded by differences in doses of haloperidol used for this disorder and for schizophrenia. A key trial comparing olanzapine with relatively low doses of haloperidol for bipolar disorder patients found “no difference” between the treatments. On the other hand, a major comparison of haloperidol with ziprasidone in bipolar disorder used relatively high doses of haloperidol, and found haloperidol to be significantly superior to the SGA. These dissimilar conclusions may simply reflect differences in dosing, and call for comment.
 
REFERENCE

Leucht S, Corves C, Arbter D, Engel RR, Li C, Davis JM. Second‐generation versus first‐generation antipsychotic drugs for schizophrenia: a meta‐analysis. Lancet. 2009 Jan 3;373(9657):31‐41. Epub 2008 Dec 6.
 
FINAL STATEMENT

Again, ISCTM thanks AHRQ for the opportunity to review this draft document. It is our hope that our comments will be helpful for revising the report. Should there be questions about any of our comments, we would be happy to discuss them with the AHRQ authors.
 
Contact Information:
Larry Alphs, President ISCTM
lalphs@its.jnj.com
 
Carlotta McKee, Executive Director ISCTM
cmckee@isctm.org
 
ISCTM
PO Box 128061
Nashville, TN 37212
www.isctm.org