Assessing the quality of reports of systematic reviews in pediatric complementary and alternative medicine

Objective: To examine the quality of reports of complementary and alternative medicine (CAM) systematic reviews in the pediatric population. We also examined whether the quality of reports of a subset of CAM reviews differed from that of reviews of conventional interventions.
Methods: We assessed the quality of reports of 47 CAM systematic reviews and 19 reviews evaluating a conventional intervention. The quality of each report was assessed using a validated 10-item scale.
Results: Authors were particularly good at reporting eligibility criteria for including primary studies, combining the primary studies appropriately for quantitative analysis, and basing their conclusions on the data included in the review. Reviewers were weak in reporting how they avoided bias in the selection of primary studies and how they evaluated the validity of the primary studies. Overall, the reports achieved 43% (median = 3) of the maximum possible score. The overall quality of reporting was similar for CAM reviews and conventional therapy reviews.
Conclusions: Evidence-based health care continues to make important contributions to the well-being of children. To ensure that the pediatric community can maximize the potential use of these interventions, it is important that systematic reviews are conducted and reported to the highest possible quality. Such reviews will benefit a broad spectrum of interested stakeholders.


Introduction
Healthcare providers, consumers, and others cannot keep up to date with the healthcare literature. For example, healthcare professionals attempting to keep abreast of their field would need to read, on average, 19 original articles each day [1]. Systematic reviews offer a way to reach that elusive goal of keeping up to date without sacrificing quality and thoroughness. There has been a striking increase in the number of published systematic reviews, particularly of randomized controlled trials (RCTs). One of the first 'medical' systematic reviews was published in 1955 [2]. The Cochrane Library alone now contains more than 1000 published systematic reviews and several hundred protocols, even though it has existed for only seven years.
Systematic reviewers have little control over random error but can exert some influence over systematic error (bias) through the way they search for, select, appraise, and combine primary studies. Evaluating the quality of reports of systematic reviews is therefore likely to indicate how well bias has been guarded against in any given report. Several publications have examined the quality of reports of systematic reviews of conventional interventions, such as pharmaceuticals, and overall they indicate that systematic reviews are not reported optimally [3][4][5].
More recently, attention has turned to a similar examination of systematic reviews in complementary and alternative medicine (CAM) [6][7][8]. The results of these investigations are similar to those found in the conventional medicine literature. Much of this evidence, however, pertains to adults; far less attention has been devoted to evaluating the pediatric literature.
We set out to examine the quality of reports of CAM systematic reviews in the pediatric population (PedCAM). We examined whether there were differences in the quality of reports of PedCAM reviews compared to reviews in conventional medicine. We also evaluated how specific issues in the systematic review process, such as searching to identify potentially relevant primary studies, are reported by reviewers.

Methods
Two main sources were used to identify systematic reviews. First, a registry was searched (see Figure 1); second, bibliographic databases were searched to identify additional reviews. Several incidental sources were also found and reviewed, for instance, a list of systematic reviews of herbal medicinal products published as a British Medical Journal 'web extra' [9]. The databases searched are listed in Table 1, and the search strategies are presented in Additional file 1 (appendix.rtf).
All bibliographic records were imported into a reference database, where duplicate records were removed. Bibliographic records or, in many cases, full copies of the documents were screened for inclusion according to the following criteria: 1) one or more studies in the review included children; 2) a CAM therapy was investigated; and 3) the article was a systematic review. Screening was not blinded. Eligibility was determined by a single reviewer (MS) because the criteria were relatively objective and many reviews had already been pre-qualified through the University of Maryland Complementary Medicine Program decision process; these reviews needed to be checked only for whether they included studies involving children. When there was any doubt about eligibility, a second reviewer (DM) examined the record and a final decision was reached by consensus. We attempted to match each PedCAM report with conventional pediatric comparator reviews of the same disease drawn from our existing collection [10][11][12][13], and were able to match 17 PedCAM systematic reviews with 19 conventional therapy reviews.
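The duplicate-removal step can be sketched as follows. This is a minimal illustration under stated assumptions, not the matching logic of the reference database that was actually used: each record is assumed to be a dict with hypothetical `title` and `year` keys, and two records are treated as duplicates when their normalized titles and publication years agree. The sample records are placeholders, not reviews from this study.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Strip accents and punctuation, lower-case, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def deduplicate(records):
    """Keep the first record seen for each (normalized title, year) key.

    Assumption: records are dicts with hypothetical 'title' and 'year' keys.
    """
    seen = set()
    unique = []
    for record in records:
        key = (normalize_title(record["title"]), record.get("year"))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Placeholder records: the second is a reformatted copy of the first,
# as might arise when the same review is retrieved from two sources.
records = [
    {"title": "Therapy X for condition Y: a systematic review", "year": 1998, "source": "registry"},
    {"title": "THERAPY X FOR CONDITION Y - A SYSTEMATIC REVIEW.", "year": 1998, "source": "database"},
    {"title": "Therapy Z for condition W", "year": 1997, "source": "database"},
]
print(len(deduplicate(records)))  # prints 2
```

Matching on title and year alone can miss duplicates whose titles differ substantially between sources, so in practice such a rule would usually be combined with manual checking.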
One member of the group (KS) extracted descriptive information and aspects related to the conduct of each systematic review report using a 37-item structured data collection form. The questions pertained to the type of CAM used, the International Classification of Disease (ICD-9) category under investigation, how the studies in the review were identified, the number, gender, and ages of the included children, the number and type of outcomes used, the data synthesis conducted, information about the handling of heterogeneity and publication bias, and the reporting of adverse events and cost information (the complete questionnaire can be obtained from the authors).

Figure 1
Flow of citations and articles through the phases of screening and eligibility evaluation.
*42 additional systematic reviews were retained for screening after removing duplicates between the registry and all databases searched. 47 systematic reviews were included.
Two members of the research team (DM, MS) assessed the quality of each report using the validated Oxman and Guyatt scale [14,15]. This instrument includes nine items pertaining to individual aspects of the reporting of a systematic review (e.g., were the search methods used to find evidence on the primary question stated?). Each item is rated on a three-point scale (no, partially/can't tell, or yes). A final question asks for an overall rating of the scientific quality of the systematic review based on the previous items; it is scored from one to seven, with higher scores indicating better quality. We did not undertake formal training before evaluating the systematic reviews because we have extensive experience using this instrument and have previously conducted training with results indicating substantial agreement between raters [13,16]. Discrepancies were resolved by consensus between both raters.
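Agreement between two raters on a three-level item (no, partially/can't tell, yes) is often summarized with Cohen's kappa. The text does not state which agreement statistic the earlier training exercises used, so the sketch below assumes unweighted Cohen's kappa; the function name and the ratings in the example are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' categorical ratings of the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    # Observed agreement: share of items given the same category by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement expected from each rater's marginal category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    if p_expected == 1:
        return 1.0  # both raters used one identical category for every item
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical ratings of a single Oxman and Guyatt item across ten reviews.
rater_1 = ["yes", "yes", "no", "partial", "yes", "no", "no", "yes", "partial", "yes"]
rater_2 = ["yes", "yes", "no", "yes", "yes", "no", "partial", "yes", "partial", "yes"]
print(round(cohens_kappa(rater_1, rater_2), 2))  # prints 0.67
```

Here the raters agree on 8 of 10 items (0.80) against a chance agreement of 0.40, giving a kappa of about 0.67. On the widely used Landis and Koch benchmarks, values between 0.61 and 0.80 are described as substantial agreement. A weighted kappa, which gives partial credit for adjacent categories, is an alternative for ordered ratings like these.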
Fisher's exact test was used to compare the two types of systematic review on each of the first nine items of the Oxman and Guyatt scale. The Kruskal-Wallis test was used to assess the difference in overall scientific quality between the two types of systematic review.
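Both comparisons can be sketched in plain Python under two labeled assumptions. First, because each of the nine items has three response levels, a 2 × 3 table (review type by no / partially-can't tell / yes) is compared here with the Freeman-Halton extension of Fisher's exact test, which reduces to the usual Fisher test for a 2 × 2 table. Second, the Kruskal-Wallis p-value uses the chi-square approximation with one degree of freedom, which is valid only for the two-group comparison made here. The function names are illustrative; a statistics package would normally be used instead.

```python
from itertools import product
from math import comb, erfc, sqrt


def fisher_exact_2xc(row_a, row_b):
    """Two-sided Fisher (Freeman-Halton) exact test for a 2 x c table of counts."""
    if len(row_a) != len(row_b):
        raise ValueError("both rows need the same number of columns")
    col_totals = [a + b for a, b in zip(row_a, row_b)]
    n_a, n = sum(row_a), sum(col_totals)
    denom = comb(n, n_a)

    def table_prob(first_row):
        # Hypergeometric probability of this first row given the fixed margins.
        num = 1
        for x, c in zip(first_row, col_totals):
            num *= comb(c, x)
        return num / denom

    p_obs = table_prob(row_a)
    p_value = 0.0
    # Sum the probabilities of all tables with the same margins that are
    # no more likely than the observed table.
    for first_row in product(*(range(c + 1) for c in col_totals)):
        if sum(first_row) == n_a:
            p = table_prob(first_row)
            if p <= p_obs * (1 + 1e-7):
                p_value += p
    return min(p_value, 1.0)


def midranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_two_groups(group_a, group_b):
    """Tie-corrected Kruskal-Wallis H and its chi-square (1 df) p-value.

    Assumption: exactly two non-empty groups of ordinal scores.
    """
    pooled = list(group_a) + list(group_b)
    n, n_a = len(pooled), len(group_a)
    ranks = midranks(pooled)
    r_a, r_b = sum(ranks[:n_a]), sum(ranks[n_a:])
    h = 12 / (n * (n + 1)) * (r_a**2 / n_a + r_b**2 / (n - n_a)) - 3 * (n + 1)
    # Correct for ties, which are common on a 1-to-7 ordinal score.
    ties = sum(t**3 - t for t in (pooled.count(v) for v in set(pooled)))
    correction = 1 - ties / (n**3 - n)
    if correction == 0:
        return 0.0, 1.0  # every score identical: no evidence of a difference
    h = max(h / correction, 0.0)  # guard against tiny negative rounding error
    return h, erfc(sqrt(h / 2))  # chi-square survival function with 1 df
```

For example, `fisher_exact_2xc([6, 2], [1, 4])` returns 132/1287 (about 0.103), the standard two-sided Fisher p-value for that 2 × 2 table, and `kruskal_wallis_two_groups([1, 2, 3], [4, 5, 6])` gives H = 27/7 (about 3.86) with p just under 0.05. With two groups, the Kruskal-Wallis test is equivalent to a two-sided Mann-Whitney (Wilcoxon rank-sum) test using the normal approximation without continuity correction.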

Results
We screened 479 potentially relevant articles. Of these, 432 failed to meet our inclusion criteria or were otherwise unusable, leaving 47 systematic reviews. Most reports (84%) were excluded because they did not include any randomized trial involving children (Figure 1).
The reviews were recent, with a median publication year of 1998. Diseases of the nervous system and sense organs, mental disorders, and diseases of the respiratory system were the most commonly investigated ICD categories (Table 2). Psychotherapy and vitamins were the most commonly examined interventions (Table 3).
The median number of primary studies included in the reviews was 12, of which a median of 9 included children. The median number of participants was 604, of which a median of 362 were children. Although 40% of the reviews were limited to children, 32% of the reviews did not report results separately for the children they included. Only 2 of the 47 reviews reported the sex distribution of the included children, and none reported their age distribution.

Table 2 (partial)
ICD-9 category [no.] | Reviews, n | %
(category name missing) [4] | 0 | 0.0%
Diseases of the genitourinary system [10] | 0 | 0.0%
Congenital anomalies [14] | 0 | 0.0%
Supplementary classification of external causes of injury and poisoning [18] | 0 | 0.0%
Supplementary classification of factors influencing health status and contact with health services [19] | 0 | 0.0%

Medline was the database most commonly searched to identify primary studies for inclusion in the reviews, followed by the Cochrane Library (Table 4). However, only about half of the reviews (51.1%) reported the years covered by the search. The same percentage reported the search terms used, while only a minority (8.5%) reproduced the entire search. Reviewing reference lists was the most commonly reported additional method for identifying potentially relevant primary studies (Table 4). About two thirds of the reviews (68.1%) reported including unpublished material when it existed, and most reviews (78.7%) did not report any language restrictions on primary studies.
The quality of reporting of all 47 PedCAM reviews is presented in Table 5. Authors were particularly good at reporting eligibility criteria for including primary studies, combining the primary studies appropriately for quantitative analysis, and basing their conclusions on the data included in the review. They were weaker in reporting how they avoided bias in the selection of primary studies and how they evaluated the validity of the primary studies. Overall, the scientific quality of the reports achieved 43% (median = 3) of the maximum possible score (Table 5).
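Assuming the reported percentage is the median overall score divided by the maximum score of seven on the final Oxman and Guyatt question, the 43% figure follows directly:

```latex
\frac{\text{median overall score}}{\text{maximum possible score}} = \frac{3}{7} \approx 0.43 = 43\%
```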
We compared the quality of 17 PedCAM systematic reviews with that of 19 matched conventional therapy reviews (available from the authors upon request) (Table 5). The PedCAM reports were rated as higher quality on all nine items of the Oxman and Guyatt scale, although the difference reached statistical significance for only one item (Table 5). There was no difference in overall scientific quality, with both types of review achieving 43% of the maximum possible score.
Just over a third of the reviews (38%) reported evaluating statistical heterogeneity, and fewer than a fifth (17%) reported assessing the presence of publication bias. Information on adverse events was reported in 14.9% of the reviews, and information on costs (e.g., cost-effectiveness) in only one review. About half of the reviews (55.3%) reported the funding source for the systematic review.

Discussion
We found relatively few systematic reviews that evaluated CAM interventions in a pediatric population. This contrasts sharply with the 1468 PedCAM randomized trials known to exist [17,18] and suggests that the pediatric CAM community has not synthesized all of the available evidence. We found information to support this contention: in our efforts to identify PedCAM randomized trials, we identified 36 studies that evaluated the efficacy and/or effectiveness of CAM interventions, such as hypnosis, for the management of pain in a variety of settings, yet we were unable to identify any systematic review that had pooled this evidence.
In 1989, Crowley and colleagues [19] published a systematic review evaluating the benefits of corticosteroids (versus placebo), given to women expected to deliver prematurely, in reducing mortality and morbidity (e.g., respiratory distress syndrome) in their premature infants. Their review showed that corticosteroids significantly reduced mortality and morbidity. The researchers also noted that this evidence would have been available 10 years earlier (i.e., by 1979) had it been synthesized. Because of the delay in pooling this evidence, it is likely that some children suffered unnecessarily. A plot of this review's results was subsequently adopted as the logo of the Cochrane Collaboration.
Our results indicate that the quality of reporting of the PedCAM systematic reviews and their conventional medicine comparators is similar and less than optimal, with considerable room for improvement. The quality of reports of systematic reviews can influence conclusions about the effectiveness of an intervention: in an examination of 51 systematic reviews on the effectiveness of spinal manipulation, high-quality reports were more likely to judge the intervention positively [20]. Resources exist to help ensure that appropriate methods are used to conduct and report systematic reviews [21][22][23][24]. Another way to improve the quality of reporting of PedCAM systematic reviews is for more pediatric journals to endorse the QUOROM statement [25]. A multi-journal evaluation of QUOROM has recently been completed and the results are being written up. Examining the quality of reporting happens 'after the fact', once the review is already completed. QUOROM can also be used by granting agencies to encourage prospective systematic reviewers to improve the conduct of their reviews, as is already starting to happen in the conduct and reporting of randomized controlled trials [26].
Conducting and reporting systematic reviews to the highest possible quality is likely to minimize the possibility that their results are influenced by bias, enabling clinicians to use them in their practice with more confidence. Reports of PedCAM systematic reviews seem particularly weak in the comprehensiveness of their searches to identify primary studies. For example, excluding trial reports identified only in Embase, compared with including them, can exaggerate estimates of an intervention's effectiveness by 6%, on average [13].
Systematic reviewers of the CAM literature appear to be more conscious of the consequences of excluding primary studies published in languages other than English. Excluding CAM trial reports in languages other than English, compared with including them, is likely to exaggerate estimates of an intervention's effectiveness by 37%, on average [27]. This result is interesting from two perspectives: it contrasts with the findings for conventional medicine interventions (i.e., no effect when excluding reports of trials in languages other than English), and most of the methodological research to date has focused on the impact of bias within conventional interventions. There is an important need for a research agenda that focuses specifically on the impact of bias when pooling trials of CAM interventions.
This study had several limitations. Our focus was on the quality of reporting of PedCAM systematic reviews, and it is possible that some reviews were appropriately conducted but deficiently reported. Although data addressing this important question are scarce, the available evidence points to a reasonably good correlation between how investigators conduct their research and how it is subsequently reported [28,29].
We selected only a sample of PedCAM reviews to compare with reviews of conventional interventions. It is possible that this sample is not representative and that our results cannot be generalized to all PedCAM systematic reviews. We selected the reports so that each could be compared with reviews investigating the same disease. Because the quality of reporting in the sample was very similar to that of all 47 reviews, we believe our sampling approach is representative and allows the observed results to be generalized.
Evidence-based health care continues to make important contributions to the well-being of children. To ensure that the pediatric community can maximize the potential use of interventions, it is important that systematic reviews are conducted and reported to the highest possible quality. Such reviews will benefit a broad spectrum of interested stakeholders.