
Network meta-analysis: users’ guide for pediatricians



Network meta-analysis (NMA) is a powerful analytic tool that allows simultaneous comparison of several management or treatment alternatives even when direct comparisons of the alternatives are unavailable (as when treatments have been compared against placebo but not against each other). Though the number of published pediatric NMAs is still limited, the rapid increase in NMAs in other areas suggests pediatricians will soon be frequently facing this new form of evidence summary.


Evaluating NMA evidence requires serial judgments on the credibility of the NMA process and on the quality of the evidence. First, clinicians need to evaluate the basic standards applicable to any meta-analysis (e.g. comprehensive search, duplicate assessment of eligibility, risk of bias, and data abstraction). Then they should evaluate issues specific to NMA, including precision, transitivity, coherence, and rankings.


In this article we discuss how clinicians can evaluate the credibility of NMA methods, and how they can make judgments regarding the quality (certainty) of the evidence. We illustrate the concepts using recent pediatric NMA publications.



Randomized controlled trials (RCTs) constitute the optimal methodology to determine the effectiveness of medical interventions. When results against placebo or standard care suggest that benefits outweigh harms for more than one intervention, clinicians, patients and families must choose among several options. Making this choice optimally requires access to systematic summaries of the best available evidence.

For decades, investigators have provided these evidence summaries using systematic reviews and meta-analyses. By combining results across studies, meta-analyses increase the precision of the effect estimate [1]. Conventional meta-analyses, however, address only single paired comparisons and are therefore of limited use when multiple reasonable options exist. One could envision a series of conventional meta-analyses addressing each possible paired comparison, but this approach has two major limitations. First, for the clinician or patient consumer, making sense of multiple meta-analyses would be challenging. Second, it is extremely likely that many of the possible paired comparisons will not have direct comparisons available; in such instances, there will be no conventional meta-analysis to consider.

Network meta-analysis (NMA), also known as multiple-treatment comparisons or multiple-treatment meta-analysis, provides a methodology to address this dilemma, taking advantage of two statistical innovations: the first is use of indirect comparisons—we can estimate the effect of A-B indirectly if both A and B have been compared against C (see next section). The second is that NMA statistical methods combining direct and indirect comparisons allow estimates of the relative effect of every alternative versus every other alternative.

Although the majority of published NMAs summarize evidence from RCTs, NMAs of cohort studies - most often addressing the evidence regarding adverse events - are increasing [2, 3]. Moreover, given the recent development of the required methods, diagnostic test accuracy NMAs may soon be available [4].

The first NMA addressing a pediatric issue evaluated the effects of indomethacin, ibuprofen, and placebo on patent ductus arteriosus closure in preterm infants [5]. Since then, the number of pediatric NMAs has increased [6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23] and, given development in other fields, one can anticipate a substantial further increase. This increase might, however, occur at a slower rate in the pediatric field because of the smaller number of RCTs relative to the adult literature.

The goal of this paper is to provide a users’ guide for pediatricians considering the application of the results of an NMA addressing a therapeutic issue to their practice. A minimal knowledge of conventional meta-analysis is needed to understand most of the important concepts of NMA [24]. First, we introduce the reader to NMAs and provide criteria for evaluating the credibility of NMA methods. We then discuss the quality of the evidence (synonyms: certainty or confidence in evidence) obtained from an NMA (the NMA may have used optimal methods, but limitations of the underlying studies may still result in low quality evidence). To illustrate the processes of interpretation and implementation in the context of the pediatric literature, we present an example of the effects of 16 different mechanical ventilation modes on mortality among preterm infants with respiratory distress syndrome (RDS) [9], supplemented by other examples from the pediatric literature where the mechanical ventilation NMA could not illustrate the concepts presented.


Indirect evidence

Let us suppose that we are interested in the relative merits of two treatments, A and B. It may be that no study has directly compared the two treatments. If, however, investigators have compared both A and B against the same third alternative C, we can infer the relative effect of A-B. We do so by comparing the effect of A-C and B-C (the indirect comparison, Fig. 1.1).

Fig. 1
figure 1

The concept of network meta-analysis. Each node (circle) represents an intervention (A, B or C), solid lines represent pairwise comparisons (direct evidence), and dotted lines represent indirect comparisons (indirect evidence). Indirect comparisons can be made via deduction from the common comparator. 1.1. Indirect evidence of A versus B inferred from direct estimates of A versus C and B versus C. Four studies formed the effect estimate for A-C, and 3 studies formed the effect estimate for C-B; the effect estimate of A-B was obtained from indirect evidence. 1.2. Closed network: a hypothetical example in which all interventions were compared in RCTs, so that direct and indirect evidence is available for all comparisons

For instance, if the relative risk (RR) of death for A-C is 0.5 (A reduces deaths relative to C by 50%) and the RR of death for B-C is 1.0 (B has no effect on deaths relative to C), then it would be reasonable to infer that A will reduce death relative to B by 50%. Furthermore, if investigators have conducted both direct and indirect comparisons, we can combine the two and produce a mixed or network estimate (Fig. 1.2).
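The arithmetic behind such an indirect comparison can be sketched in a few lines. The RRs below are those from the text; the standard errors are hypothetical, added only to show how the uncertainty of an indirect estimate is built from its two direct components:

```python
import math

# Bucher-style indirect comparison on the log relative-risk scale.
# The RRs come from the text; the standard errors are hypothetical,
# chosen only to illustrate the arithmetic.
rr_ac, se_ac = 0.5, 0.15   # A vs C: RR 0.5
rr_bc, se_bc = 1.0, 0.20   # B vs C: RR 1.0

# Indirect log-RR of A vs B = log(RR A-C) - log(RR B-C)
log_rr_ab = math.log(rr_ac) - math.log(rr_bc)
# Variances add, so the indirect estimate is always less precise
# than either of the direct estimates that feed it.
se_ab = math.sqrt(se_ac**2 + se_bc**2)

rr_ab = math.exp(log_rr_ab)
ci_low = math.exp(log_rr_ab - 1.96 * se_ab)
ci_high = math.exp(log_rr_ab + 1.96 * se_ab)
print(f"Indirect RR A vs B: {rr_ab:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

Note that the standard error of the indirect estimate (0.25 here) exceeds that of either direct comparison, which is why indirect evidence is generally less precise than direct evidence.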

Network meta-analysis

Ideally, an NMA will depict the available direct evidence in a figure we refer to as a network graph. The circles (nodes) represent each intervention, and the lines between the nodes (called edges) represent head-to-head comparisons (Fig. 2) [25]. Some network graphs use the size of the nodes and the width of the edges to convey the amount of information available: node size reflects the sample size of studies of a particular intervention, and edge width the number, sample size, or variance associated with the related direct comparisons (i.e. a large node means a larger sample size, and a thick edge means a greater number of included studies).
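As a sketch of what the network graph encodes, the toy network below (hypothetical edges and trial counts, not taken from any of the cited NMAs) shows how the structure determines which comparisons can only be estimated indirectly:

```python
from itertools import combinations

# A hypothetical network: each key is a direct (head-to-head) comparison
# and each value an invented trial count, mirroring what a network
# graph's edges convey. None of this is taken from the cited NMAs.
direct_edges = {
    ("A", "C"): 4,
    ("B", "C"): 3,
    ("C", "D"): 2,
}

nodes = sorted({n for edge in direct_edges for n in edge})
compared = {frozenset(e) for e in direct_edges}

def neighbours(node):
    """Interventions directly compared against `node`."""
    return {m for e in compared if node in e for m in e} - {node}

# Comparisons the NMA can estimate only indirectly, via a common comparator.
indirect_only = {
    (x, y): sorted(neighbours(x) & neighbours(y))
    for x, y in combinations(nodes, 2)
    if frozenset((x, y)) not in compared and neighbours(x) & neighbours(y)
}
print(indirect_only)  # A-B, A-D, and B-D all go through the common comparator C
```

In this toy geometry every pair lacking a direct trial still shares the common comparator C, so the network can estimate all comparisons, just as in the star-shaped networks common in practice.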

Fig. 2
figure 2

The geometry of the mechanical ventilation for premature infants NMA. A/C, assist-control ventilation; VG, volume guarantee ventilation; RM, recruitment maneuver; CMV, continuous mandatory ventilation; HFFIV, high-frequency flow interrupted ventilation; HFJV, high-frequency-jet ventilation; HFOV, high-frequency oscillatory ventilation; IMV, intermittent mandatory ventilation; PSV, pressure support ventilation; PTV, patient-triggered ventilation; SIMV, synchronized intermittent mechanical ventilation; SIPPV, synchronized intermittent positive pressure ventilation; V-C, volume-controlled. Wang C et al. Mechanical ventilation modes for respiratory distress syndrome in infants: a systematic review and network meta-analysis. Critical care (London, England). 2015, reprinted by permission of the publisher [9]

In comparison to conventional meta-analysis, which relies exclusively on direct evidence, NMA provides estimates of relative effectiveness among all interventions being compared, increases precision around effect estimates, ranks treatments, and enhances generalizability [26,27,28].

Credibility of NMA methods

The conduct of NMA should adhere to standards of a traditional systematic review. Like a conventional meta-analysis, a credible NMA requires explicit eligibility criteria, comprehensive search, and assessment of evidence quality (Table 1).

Table 1 Guide for appraising NMA evidence

Did the review explicitly address a sensible question?

A well-formulated clinical question will typically follow the PICO format (P: population, I: intervention, C: comparator, O: outcomes) [29]. NMA uses the same format except that “I” and “C” (intervention and comparator) include all the interventions compared against each other. Clear definitions of each element of the PICO are required to determine the studies eligible for the review and to develop a priori hypotheses to address possible heterogeneity.

Although the scope of the research question can vary from narrow to broad, it is essential that, for any paired comparison within the NMA, it is plausible that we will, for each outcome of interest, observe similar effects across all patient populations being addressed [30, 31]. Eligibility criteria can be wide enough to permit the possibility of differences in effect across the included patients, interventions, and outcomes. For instance, effects may differ among eligible studies in more or less severely affected patients, across high and low doses, and across shorter and longer follow-up.

An NMA that assessed the efficacy of asthma treatment strategies included all children with chronic asthma [12]. The definition of chronic asthma was not based on the Global Initiative for Asthma (GINA) guidelines staging [32], nor did the authors present data on disease severity or attempt subgroup analyses. The broad inclusion criteria and lack of subgroup hypotheses fail to address the differences in disease severity that might lead to differences in treatment response [33].

Another example relates to differences in the measurement of outcome [27, 34]. Two systematic reviews in asthma began with the goal of conducting an NMA; only one was successful. The first study evaluated the effectiveness of the various inhalation regimens on FEV1 improvement [18]. The systematic review revealed large variations in the way the 23 trials measured and reported FEV1. This heterogeneity prevented the review team from performing an NMA. The second NMA assessed the efficacy of treatments on reducing exacerbation [12]. Severe exacerbation was defined as patients needing hospital admission, a visit to the emergency department or a standard course of systemic corticosteroids. In this case, outcomes were reported similarly across trials and the authors presented pooled estimates.

Was the search for studies and selection comprehensive?

A comprehensive systematic search that identifies all pertinent available studies minimizes the risk of spurious findings from an unrepresentative selection of studies. Since many review articles have demonstrated the inadequacy of searching only one database [35,36,37], an optimal search includes all relevant electronic databases (e.g., Medline, Embase, PsycINFO, CENTRAL, CINAHL) [38]. Ideally, a search of the grey literature will minimize the risk of publication bias.

Subsequently, the team selects eligible studies [38]. The report should provide evidence of the reproducibility of assessment of study eligibility through review by at least 2 independent assessors, and present a figure summarizing each selection step in the eligibility determination process (identification of titles and abstracts; culling of titles and abstracts; review of full texts; final determination of eligibility) [39].

Did the review assess evidence certainty?

Certainty in effect estimates represents how trustworthy the results and their conclusions are [31]. Within any network, it is likely that the quality of the evidence differs across paired comparisons: high quality evidence may reveal that one treatment is superior to another, whereas we may have only low quality evidence regarding the relative merit of other treatments.

Making that rating requires a sequence of judgments relying on assessments of the quality of the direct and indirect evidence. Three articles published in 2014 by the Grades of Recommendation, Assessment, Development and Evaluation (GRADE) working group, the Cochrane Collaboration, and the ISPOR-AMCP-NPC good practice task force [30, 31, 40] extend quality of evidence assessment from meta-analysis to NMA. Following the GRADE approach, the overall confidence starts as high for direct, indirect, and network estimates derived from RCTs [31]. The evidence can be rated down from high to moderate, low, or very low quality based on the presence and magnitude of any of 5 domains: risk of bias (RoB), indirectness, imprecision, inconsistency, and publication bias [31].

Many prior published NMAs have not explicitly addressed all the recommended elements. Fortunately, however, some present the information required for a reader to make the necessary judgments. If the information is not available, then the credibility of the NMA is compromised [31].

Consider, for instance, the GRADE profile for the direct evidence of an NMA of antidepressant medications for improving depression symptoms in children (Table 2) [41]. The evidence certainty for Fluoxetine versus placebo was rated as very low as a result of high RoB, imprecision, and inconsistency. The Imipramine versus placebo comparison was rated as moderate, the only concern being imprecision. With this variation in evidence certainty, making sense of the results requires ratings of evidence quality for each pairwise comparison.

Table 2 GRADE evidence profile showing differences in the evidence certainty among two direct evidence comparisons in the depression treatment NMA for depression symptoms

How do NMAs conduct analyses and present results?

There are two statistical approaches to performing NMA: frequentist and Bayesian [41, 42]. The frequentist approach is what clinicians will generally see in individual RCTs and conventional meta-analyses. The major additional aspect of Bayesian approaches is the specification of prior probabilities of treatment effects before beginning the data analysis; these priors and their precision are combined with the estimate from the data to produce a posterior probability and its credible interval. Results in NMA are presented as effect estimates, typically odds ratios (ORs), RRs, hazard ratios (HRs), or mean differences (MDs), with their 95% confidence interval (CI) (frequentist approach) or credible interval (CrI) (Bayesian approach), both of which describe the range of plausible truth around the point estimate.

Ideally, NMAs will present direct, indirect, and network estimates for each paired comparison. When, however, there are large numbers of comparisons, this becomes a challenging task. For example, the NMA of mechanical ventilation modes for RDS in preterm infants included 16 different ventilation modes, yielding 120 comparisons - this probably requires an online appendix [9]. Ways to deal with this profusion of comparisons are to present effect estimates in a league table (all possible pairwise comparisons, obtained by cross-matching the interventions in the rows with those in the columns), forest plots (all interventions compared to one reference intervention, or to the least efficacious intervention such as placebo), or evidence comparisons (direct, indirect, and NMA) for each intervention against one reference [13, 41, 43, 44].

Certainty of NMA evidence

What is the risk of bias of included studies?

RoB conveys the likelihood that limitations in design or conduct of studies will result in estimates of treatment effect that vary systematically from the truth. The greater the RoB, the more appropriate it becomes to rate down the quality of the evidence [45, 46].

For assessing the RoB, authors may use an instrument such as the Cochrane RoB tool for RCTs [38]. This instrument assesses six elements: randomization sequence generation, concealment of allocation, blinding of participants, personnel and outcome assessors, completeness of follow-up, selective outcome reporting, and presence of other biases.

In the NMA of strategies for preventing asthma exacerbations, the authors used the Cochrane instrument to assess RoB [12] and judged all trials to be at low RoB. Although the authors did not provide an overall RoB judgment per comparison, it is possible -although tedious- for the pediatrician to make this rating if the NMA authors have presented ratings of RoB for each study in a table or figure. In this case, it is not a problem: since all studies were at low RoB, there is no need to rate down for RoB for any comparison.

Were the results precise?

The lack of adequate power to inform a particular outcome leads to imprecision [47]. One standard for assessing precision is to consider whether differences between intervention and control exclude chance (i.e. are statistically significant). This has two limitations: first, results may exclude no effect but may not exclude an effect too small to be important; second, using this criterion, one would always rate down for imprecision if results were not statistically significant, no matter how narrow the CI or CrI.

Therefore, we suggest an alternative standard. To assess imprecision, one can consider whether decisions regarding choice of therapy would differ if the upper or the lower boundary of the CI or CrI represented the truth. Another way of thinking about this approach is to consider whether the CI or CrI excludes a minimally important difference (MID). The MID is the smallest change in the value of a patient-reported outcome that patients consider important, typically applied to outcomes such as quality of life measures [48].

For example, in the NMA of ventilation modes for infants, consider the comparison of synchronized intermittent mechanical ventilation with volume guarantee (SIMV+VG) versus high-frequency jet ventilation (HFJV): the point estimate suggests that SIMV+VG reduced mortality (HR = 0.23) [9]. However, the 95%CrI ranged from an extremely large reduction in mortality (HR = 0.03, a 97% reduction in hazard) to an almost doubling of hazard (HR = 1.46). Since the treatment choice would differ at each end of the CrI, the evidence is rated down for imprecision.

On the other hand, for the comparison of SIMV+VG versus SIMV with pressure support ventilation (SIMV+PSV), mortality was lower with SIMV+VG (HR = 0.12; 95%CrI 0.01, 0.86). Here, even the upper boundary suggests a 14% reduction in hazard with SIMV+VG. Therefore, in this instance, there is no need to rate down the quality of the evidence for imprecision. Although the width of the CrI may still be considered large - and thus imprecise for outcomes such as hospital length of stay - any but the smallest reduction in mortality is critical. The judgment of importance depends critically on the absolute difference, in this case the absolute mortality risk difference: for instance, for infants born at 27 weeks with a baseline mortality risk of 10%, the absolute mortality risk reduction with SIMV+VG versus SIMV+PSV would be approximately 9% if the point estimate of the HR (0.12) were accurate, and approximately 1.4% if the upper boundary of the CrI (0.86) represented the truth. The magnitude of the absolute difference is greater for even younger infants with higher mortality, and smaller for older infants with lower mortality (Table 3) [49,50,51].

Table 3 Anticipated absolute mortality among premature infants using SIMV+VG versus SIMV+PSV
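The hazard-ratio-to-absolute-risk translation above can be reproduced with the standard survival-based formula (risk with treatment = 1 − (1 − baseline risk)^HR), assuming proportional hazards over the follow-up period; the results approximate the ~9% and ~1.4% figures quoted in the text:

```python
def absolute_risk_reduction(baseline_risk, hazard_ratio):
    """Translate a hazard ratio into an absolute risk reduction,
    assuming proportional hazards over the follow-up period."""
    treated_risk = 1 - (1 - baseline_risk) ** hazard_ratio
    return baseline_risk - treated_risk

# Baseline mortality of 10% (the 27-week example from the text),
# evaluated at the point estimate and at the upper CrI boundary.
for hr in (0.12, 0.86):
    arr = absolute_risk_reduction(0.10, hr)
    print(f"HR {hr}: absolute mortality risk reduction ~{arr:.1%}")
```

Repeating the calculation with higher or lower baseline risks reproduces the pattern described in the text: larger absolute differences for younger infants with higher baseline mortality, smaller for older infants.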

In a complementary approach, authors can, for each direct comparison, assess imprecision by calculating the optimal information size (OIS): the number of patients or events needed for an adequately powered individual study to avoid spurious findings [47]. This, however, ignores the contribution of the indirect comparisons to the network estimate. Methods to incorporate indirect estimates into the OIS for NMA are under development [26].
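As a rough sketch of the OIS idea, the conventional two-proportion sample-size formula can be applied to a direct comparison; the control risk and target effect below are hypothetical, chosen only to show the calculation:

```python
import math

# Approximate optimal information size for a direct comparison, using
# the standard two-proportion sample-size formula. The control risk and
# target risk below are hypothetical.
def ois_per_arm(p_control, p_treatment, z_alpha=1.96, z_beta=0.84):
    """Patients needed per arm for alpha = 0.05 (two-sided), power = 0.80."""
    p_bar = (p_control + p_treatment) / 2
    numerator = 2 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar)
    return math.ceil(numerator / (p_control - p_treatment) ** 2)

# e.g. baseline mortality 10%, hoping to detect a reduction to 7%
print(ois_per_arm(0.10, 0.07))
```

If the pooled direct evidence for a comparison falls well short of this number of patients, rating down for imprecision deserves consideration even before inspecting the interval.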

Were results consistent across studies?

One can expect variation between treatment effects across studies - we call such variation “heterogeneity”. Heterogeneity can result from chance, or from differences in patients, interventions, comparisons, outcomes and methodology between studies (Table 4).

Table 4 Possible effect modifiers that may contribute to between study variability

Assessing the degree of inconsistency in direct comparisons involves inspecting the point estimates and the degree of overlap of the confidence or credible intervals of each study in a forest plot. Two methods of formal statistical testing can complement visual inspection of forest plots - the test for heterogeneity (Cochran’s Q-test), and I2 (which quantifies the proportion of the total variability that is attributable to differences between the studies and ranges from 0 to 100%) [38].
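Both statistics are simple to compute from per-study estimates. In the sketch below the log odds ratios and standard errors are invented for illustration (the formulas, not the numbers, are the point):

```python
import math

# Hypothetical per-study log odds ratios and standard errors, invented
# to illustrate Cochran's Q and I-squared.
log_ors = [-0.92, -0.80, -1.05, -0.70]
ses     = [0.30, 0.25, 0.40, 0.35]

weights = [1 / se**2 for se in ses]   # inverse-variance weights
pooled  = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)

# Cochran's Q: weighted squared deviations from the pooled estimate.
q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
df = len(log_ors) - 1
# I-squared: share of total variability beyond what chance alone produces.
i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

print(f"pooled OR = {math.exp(pooled):.2f}, Q = {q:.2f}, I2 = {i_squared:.0f}%")
```

With these invented inputs Q falls below its degrees of freedom, so I2 is 0%, the pattern of high consistency described for the asthma example below.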

For example, in the chronic asthma NMA, the authors presented direct comparison between low-dose inhaled corticosteroids (ICS-L) and placebo for moderate or severe exacerbation. Six trials contributed to the pooled estimate OR = 0.41 (95%CrI 0.29, 0.56). The forest plot shows similar point estimates, and CIs overlapped across all trials. The P-value for heterogeneity assessment was 0.54 (not significant), and I2 = 0% (Fig. 3), indicating a high level of consistency between results. Conversely, if there is substantial heterogeneity that is unexplained by subgroup analysis or meta-regression, we lose confidence in treatment effects and, in the GRADE approach, rate down the quality of evidence for inconsistency [31, 34, 52].

Fig. 3
figure 3

Forest plot comparing ICS-L vs. placebo for moderate or severe asthma exacerbations. Visual assessment indicates low heterogeneity: similar point estimates, overlapping CIs, and I2 = 0% [12]. Zhao Y, et al. Effectiveness of drug treatment strategies to prevent asthma exacerbations and increase symptom-free days in asthmatic children: a network meta-analysis. The Journal of Asthma: official journal of the Association for the Care of Asthma. 2015, reprinted by permission of the publisher (Taylor & Francis Ltd.) [12]

How trustworthy are the indirect comparisons?

Trustworthiness of indirect comparisons - for instance, inferring the relative effect of A-B from A-C and B-C comparisons - requires similarity of patient populations, comparators, outcomes, and RoB, and optimal administration of the interventions under consideration (Fig. 4). In other words, A and B must both be optimally administered; the A-C and B-C comparisons must include similar patients; C must be similar; outcomes must be measured similarly; and studies would ideally be at low RoB. We refer to situations in which this is not the case as “intransitivity”. Intransitivity reduces confidence in the results of indirect comparisons.

Fig. 4
figure 4

The diagram shows the concept of intransitivity. The dotted line AC shows the indirect evidence where inferences are being made. B is not shown as a unique intervention, but rather as two different versions of B (blue and red). Intransitivity can occur when the distribution of a possible effect modifier differs between two groups

To illustrate the concept of intransitivity, consider an NMA of the comparative efficacy of psychotherapies for depression in children [10]. The comparison of interest is cognitive behavioral therapy (CBT) versus problem-solving therapy (PST). We wish to make inferences regarding the effects of CBT versus PST from an indirect comparison: studies have compared both CBT and PST to wait list (WL) controls. The 14 RCTs comparing CBT versus WL used 8 different instruments to define depression; the 3 RCTs comparing PST versus WL (Table 5) used 2 of the 8, and a ninth that was not used at all in the CBT versus WL studies. Use of the different instruments could create differences in depression severity in the populations that in turn could influence the magnitude of the treatment effect, suggesting possible intransitivity and consideration of consequent rating down of quality.

Table 5 Depression definition used in the psychotherapies NMA in the wait list (the common comparator) to illustrate the concept of intransitivity in the indirect evidence

Were results consistent between direct and indirect comparisons?

Whenever a closed loop is present (Fig. 1.2, and Table 6) there is a possibility that the available direct and indirect comparisons will yield very different effect estimates, a condition we refer to as incoherence (other authorities use the term “inconsistency”) [26, 27, 43, 53, 54]. Incoherence can arise for reasons similar to those that explain heterogeneity and intransitivity (Table 4).

Table 6 Glossary of terms

One can assess incoherence by inspecting the point estimates and the degree of CI or CrI overlap between direct and indirect evidence. In addition, investigators may conduct statistical tests that address whether chance can explain the difference between direct and indirect comparisons [55, 56]. Unexplained incoherence requires rating down evidence quality.
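A minimal node-splitting-style check can be sketched as follows; the direct and indirect ORs and their standard errors are hypothetical, chosen only to show the test:

```python
import math

# Does chance plausibly explain the gap between the direct and indirect
# estimates of the same comparison? All numbers here are hypothetical.
log_or_direct, se_direct = math.log(0.38), 0.30
log_or_indirect, se_indirect = math.log(0.72), 0.25

diff = log_or_direct - log_or_indirect
se_diff = math.sqrt(se_direct**2 + se_indirect**2)   # variances add
z = diff / se_diff

# Two-sided p-value from the standard normal distribution.
p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
print(f"ratio of ORs = {math.exp(diff):.2f}, z = {z:.2f}, p = {p:.2f}")
```

A small p-value would signal incoherence beyond chance; but, as with any significance test, a non-significant result in a sparse network may simply reflect low power rather than true coherence.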

In the asthma NMA, the direct evidence comparing ICS-L versus leukotriene receptor antagonists (LTRA) suggested a large reduction in exacerbations favoring ICS-L (OR = 0.38; 95%CrI 0.21, 0.68), and the network estimate showed a significant but smaller reduction (OR = 0.56, 95%CrI 0.39, 0.76) [12] - from which one might infer that the indirect estimate showed a substantially smaller effect or, depending on the amount of indirect evidence, none at all. Had the authors provided the indirect estimate and its CrI, one could judge the degree of incoherence directly. Their statement that they found no incoherence in the network on the basis of statistical tests is somewhat reassuring.

As in conventional meta-analysis, when heterogeneity is high, NMA can use subgroup analysis and meta-regression to try to explain heterogeneity by identifying modifiers of treatment effect [57, 58]. For example, in the NMA addressing adverse events associated with antidepressant medications in children and adolescents [41], the OR for adverse events with sertraline compared to placebo was 2.94 (95%CrI 0.94, 17.19; I2 = 79.3%). The authors performed a subgroup analysis based on age and found increased adverse events with sertraline compared to placebo in children aged < 13 years (OR = 12.64, 95%CrI 2.72, 678.43), but not in children aged > 13 years (OR = 0.59, 95%CrI 0.15, 6.03).

A somewhat less satisfactory way of exploring heterogeneity is to omit studies and determine if the omission influences results. For example, in the mechanical ventilation NMA, the authors examined the robustness of the analysis by excluding 2 studies that included only newborns with gestational age 25–26 weeks [9]. The results showed no changes in the effect estimates.

When direct and indirect evidence vary, and the network estimate is between the two and rated down for incoherence, what estimate is the clinician going to believe? The GRADE approach suggests using the effect estimates from the highest quality evidence, which most commonly will be the direct estimate [31]. Other authorities would argue that, having committed oneself to an NMA, one should always use the network estimates.

For example, the pediatric antidepressant medications NMA included a comparison of fluoxetine versus placebo (Table 7) [41]. In this comparison, one can infer from the information presented a rating of the quality of the direct evidence as very low, the indirect evidence as moderate, and the network estimate as very low quality. In this case, following the GRADE approach, the clinician is better off using the effect estimate from the indirect evidence.

Table 7 Differences in the evidence certainty across evidence sources in the depression treatment NMA

Is there evidence for publication bias?

Publication bias results from missing studies [59]: some studies, particularly those with negative results, may never be published. An NMA at low risk of publication bias will demonstrate a comprehensive search for studies, present a symmetrical funnel plot, and demonstrate a non-significant statistical test for publication bias [38]. The funnel plot assessment requires, however, at least 10 studies. If publication bias is very likely, rating down the evidence is warranted.

Were treatment ranks presented and were they trustworthy?

Methods are available that allow NMA authors to rank treatments from best to worst [26, 60]. Rankings are often expressed as probabilities that treatments are 1st, 2nd, 3rd, etc. best, either in tables (Table 8) or graphically (rankograms). The surface under the cumulative ranking curve (SUCRA) summarizes the information from the rankograms as a single number. Rankings need to be made for each outcome - a treatment that is best for one outcome (e.g. benefit) may be worst for another (e.g. harm) [60].
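For readers curious how SUCRA condenses a rankogram into one number, here is a minimal sketch using an invented rank-probability matrix for three treatments (the treatment names echo the asthma example but the probabilities are hypothetical):

```python
# Hypothetical rank-probability matrix for three treatments
# (values: P(rank 1), P(rank 2), P(rank 3); each row sums to 1).
rank_probs = {
    "ICS-L":   [0.60, 0.30, 0.10],
    "LTRA":    [0.30, 0.50, 0.20],
    "placebo": [0.10, 0.20, 0.70],
}

def sucra(probs):
    """Surface under the cumulative ranking curve: 1.0 means a
    treatment is certainly best, 0.0 means certainly worst."""
    k = len(probs)
    cumulative = 0.0
    running = 0.0
    for p in probs[:-1]:          # cumulative probability of rank <= j
        running += p
        cumulative += running
    return cumulative / (k - 1)

for treatment, probs in rank_probs.items():
    print(f"{treatment}: SUCRA = {sucra(probs):.2f}")
```

Note that SUCRA compresses the whole rank distribution into a single score, which is precisely why it can conceal how uncertain, or how close, the underlying rank probabilities are.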

Table 8 Asthma treatments strategies effectiveness NMA in improving symptom free days

Although intuitively appealing, there are a number of reasons why clinicians should not routinely choose the treatment with the highest ranking [61]. First, a treatment that is best for one outcome (e.g., a benefit outcome) may be the worst for another (e.g., a harm outcome). Second, issues such as cost and a clinician’s familiarity with a particular treatment may also bear consideration. Third, rankings do not take into account the magnitude of differences in effects between treatments (a first-ranked treatment may be only slightly, or a great deal, better than the second-ranked treatment). Fourth, chance may explain apparent differences between treatments; a measure of uncertainty such as credible intervals for the SUCRA, or a p-value, helps convey the precision of these probabilities [62]. Finally, and most important, the evidence on which rankings are based may be of very low quality, and therefore untrustworthy [61].

Although the first ranking may be secure, others are not: the asthma NMA ranked ICS-L, ICS-H, and ICS + LTRA 2nd, 3rd, and 4th for improving symptom-free days (Table 8) [12]. However, the probabilities for these treatments were close (0.38, 0.33, and 0.24 respectively), the NMA estimates were imprecise, and the evidence was of low quality. Therefore, the 2nd, 3rd, and 4th treatment ranks are untrustworthy.


Just as in conventional systematic reviews and pairwise meta-analysis, applicability may be limited by differences between the clinical setting and the setting in which the trials were conducted. These limitations may include differences in the patients (e.g. the patient may be younger than those included in the trials); the intervention (e.g. the clinician is considering use of doses differing from those tested in the trials); comparators (e.g. trials used standard care as a comparator, and standard care delivered in the trials differs from standard care in the clinician’s setting); and outcomes (e.g. the clinician is concerned about long-term effects of treatment and trials examined only shorter term outcomes). In any of these situations, the clinician must consider the extent to which trial results apply to their patients and, if such differences exist, potentially refer to other evidence or their own experience in deciding on optimal management.


Returning to the NMA of ventilation modes in preterm infants with RDS [9] (P: infants with RDS; I and C: all mechanical ventilation modes; O: mortality), the search strategy included 5 databases and a grey literature search. Two independent reviewers performed title and abstract screening, full-text eligibility assessment, data extraction, and quality assessment, resulting in 20 eligible RCTs comparing 16 ventilation modes in 2832 infants of gestational age 25–32 weeks (Fig. 2). The authors reported baseline characteristics and assessed RoB using the Jadad instrument [63]. The authors did not present an evidence quality assessment but, as we note in the next paragraph, they present enough information to make this judgment.

All included studies were at low RoB. The only NMA estimates for mortality in the entire network in which the CrI excluded HR = 1.0 suggested benefit for time-cycled pressure-limited ventilation (TCPL) (HR 0.29, 95% CrI 0.07 to 0.97), HFOV (HR 0.29, 95% CrI 0.08 to 0.85), SIMV+VG (HR 0.12, 95% CrI 0.01 to 0.86), and V-C (HR 0.14, 95% CrI 0.02 to 0.68) modes compared to SIMV+PSV. Although the upper boundaries of those CrIs are close to no difference, you decide not to rate down for imprecision (refer to the earlier discussion on imprecision). The contributing direct comparisons enrolled similar, appropriate patients; the interventions appeared to be administered optimally; and the authors reported no heterogeneity or incoherence. You see little reason why, depending on the direction of results, authors would choose not to submit, or editors not to publish, these studies, and therefore rate publication bias as undetected. All these comparisons constitute high quality evidence. For every other paired comparison in the network, precision is a major concern.
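To make the imprecision judgment concrete, the minimal sketch below (plain Python; the hazard ratios and 95% CrIs are copied from the estimates quoted above) checks which comparisons formally exclude the null, while flagging how close each upper boundary sits to HR = 1.0:

```python
# Point estimates and 95% CrIs versus SIMV+PSV, as reported in the
# ventilation-mode NMA discussed in the text: (HR, lower CrI, upper CrI).
estimates = {
    "TCPL":    (0.29, 0.07, 0.97),
    "HFOV":    (0.29, 0.08, 0.85),
    "SIMV+VG": (0.12, 0.01, 0.86),
    "V-C":     (0.14, 0.02, 0.68),
}

NULL_HR = 1.0

for mode, (hr, lo, hi) in estimates.items():
    excludes_null = hi < NULL_HR or lo > NULL_HR
    # An upper CrI boundary near 1.0 (e.g. 0.97 for TCPL) borders on
    # imprecision even when the interval formally excludes the null,
    # which is why the rating-down decision is a judgment call.
    margin = round(NULL_HR - hi, 2)
    print(f"{mode}: HR {hr} (95% CrI {lo}-{hi}); "
          f"excludes HR=1.0: {excludes_null}; margin to null: {margin}")
```

All four intervals exclude the null, but the margins range from 0.03 (TCPL) to 0.32 (V-C), illustrating why one might reasonably debate rating down TCPL for imprecision.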

In the ranking, SIMV+VG mode had the highest probability of being ranked first, though that probability was only 29.7%. The V-C mode had the second highest probability of being ranked first, at 22.8%. Given that the only clear differences were between these modes and SIMV+PSV (all other CrIs were imprecise), the only convincing result is that it is wise to avoid SIMV+PSV. You therefore conclude that use of TCPL, HFOV, SIMV+VG, or V-C (all of which the pediatrician uses regularly) is reasonable and appropriate.


NMA is a powerful analytic tool that offers many advantages over conventional meta-analysis. NMA may, however, be misleading for a number of reasons. First, authors may not have followed the basic standards applicable to any meta-analysis (e.g. comprehensive search, duplicate assessment of eligibility, risk of bias, and data abstraction). Second, trials may suffer limitations in risk of bias, precision, consistency, and indirectness. Third, there may be limitations specific to NMA, including intransitivity, incoherence, or uncritical reliance on rankings. Therefore, evaluating NMA evidence requires serial judgments on the credibility of the process of NMA conduct and on evidence quality. This introductory guide will assist clinicians in their understanding of NMA.



Assist-control ventilation


Confidence or credible interval


Credible interval


Forced expiratory volume in 1 s


Global initiative for asthma


Grading recommendations assessment development and evaluation


High-frequency oscillatory ventilation


Hazard ratio


Inhaled corticosteroids


Medium or high-dose inhaled corticosteroids


Low-dose inhaled corticosteroids


Intermittent mandatory ventilation


Long-acting β-agonist strategies


Leukotriene receptor antagonists




Mean difference


Network meta-analysis


Odds ratio


Randomized controlled trial


Risk difference


Respiratory distress syndrome


Risk of bias


Risk ratio


Synchronized intermittent mechanical ventilation


Synchronized intermittent positive pressure ventilation


Volume-guarantee ventilation


Wait list


1. Guyatt G, Rennie D, Meade MO, Cook DJ. Chapter 22: the process of a systematic review and meta-analysis. In: Users’ guides to the medical literature: a manual for evidence-based clinical practice. 3rd ed. New York: McGraw-Hill Education; 2015. p. 459–69.

2. Stegeman BH, de Bastos M, Rosendaal FR, van Hylckama Vlieg A, Helmerhorst FM, Stijnen T, Dekkers OM. Different combined oral contraceptives and the risk of venous thrombosis: systematic review and network meta-analysis. BMJ. 2013;347:f5298.

3. Sundaresh V, Brito JP, Wang Z, Prokop LJ, Stan MN, Murad MH, Bahn RS. Comparative effectiveness of therapies for Graves’ hyperthyroidism: a systematic review and network meta-analysis. J Clin Endocrinol Metab. 2013;98(9):3671–7.

4. Menten J, Lesaffre E. A general framework for comparative Bayesian meta-analysis of diagnostic studies. BMC Med Res Methodol. 2015;15:70.

5. Jones LJ, Craven PD, Attia J, Thakkinstian A, Wright I. Network meta-analysis of indomethacin versus ibuprofen versus placebo for PDA in preterm infants. Arch Dis Child Fetal Neonatal Ed. 2011;96(1):F45–52.

6. Chinnadurai S, Fonnesbeck C, Snyder KM, Sathe NA, Morad A, Likis FE, McPheeters ML. Pharmacologic interventions for infantile hemangioma: a meta-analysis. Pediatrics. 2016;137(2):1–10.

7. Huang J, Wen D, Wang Q, McAlinden C, Flitcroft I, Chen H, Saw SM, Chen H, Bao F, Zhao Y, et al. Efficacy comparison of 16 interventions for myopia control in children: a network meta-analysis. Ophthalmology. 2016;123(4):697–708.

8. Littlewood KJ, Higashi K, Jansen JP, Capkun-Niggli G, Balp MM, Doering G, Tiddens HA, Angyalosi G. A network meta-analysis of the efficacy of inhaled antibiotics for chronic Pseudomonas infections in cystic fibrosis. J Cyst Fibros. 2012;11(5):419–26.

9. Wang C, Guo L, Chi C, Wang X, Guo L, Wang W, Zhao N, Wang Y, Zhang Z, Li E. Mechanical ventilation modes for respiratory distress syndrome in infants: a systematic review and network meta-analysis. Crit Care. 2015;19:108.

10. Zhou X, Hetrick SE, Cuijpers P, Qin B, Barth J, Whittington CJ, Cohen D, Del Giovane C, Liu Y, Michael KD, et al. Comparative efficacy and acceptability of psychotherapies for depression in children and adolescents: a systematic review and network meta-analysis. World Psychiatry. 2015;14(2):207–22.

11. Knottnerus BJ, Grigoryan L, Geerlings SE, Moll van Charante EP, Verheij TJ, Kessels AG, ter Riet G. Comparative effectiveness of antibiotics for uncomplicated urinary tract infections: network meta-analysis of randomized trials. Fam Pract. 2012;29(6):659–70.

12. Zhao Y, Han S, Shang J, Zhao X, Pu R, Shi L. Effectiveness of drug treatment strategies to prevent asthma exacerbations and increase symptom-free days in asthmatic children: a network meta-analysis. J Asthma. 2015;52(8):846–57.

13. Caldwell DM, Welton NJ, Dias S, Ades AE. Selecting the best scale for measuring treatment effect in a network meta-analysis: a case study in childhood nocturnal enuresis. Res Synth Methods. 2012;3(2):126–41.

14. Fang XZ, Gao J, Ge YL, Zhou LJ, Zhang Y. Network meta-analysis on the efficacy of dexmedetomidine, midazolam, ketamine, propofol, and fentanyl for the prevention of sevoflurane-related emergence agitation in children. Am J Ther. 2015;23:e1032–42.

15. Huang X, Xu B. Efficacy and safety of tacrolimus versus pimecrolimus for the treatment of atopic dermatitis in children: a network meta-analysis. Dermatology. 2015;231(1):41–9.

16. Achana FA, Sutton AJ, Kendrick D, Wynn P, Young B, Jones DR, Hubbard SJ, Cooper NJ. The effectiveness of different interventions to promote poison prevention behaviours in households with children: a network meta-analysis. PLoS One. 2015;10(3):e0121122.

17. Hubbard S, Cooper N, Kendrick D, Young B, Wynn PM, He Z, Miller P, Achana F, Sutton A. Network meta-analysis to evaluate the effectiveness of interventions to prevent falls in children under age 5 years. Inj Prev. 2015;21(2):98–108.

18. van der Mark LB, Lyklema PH, Geskus RB, Mohrs J, Bindels PJ, van Aalderen WM, Ter Riet G. A systematic review with attempted network meta-analysis of asthma therapy recommended for five to eighteen year olds in GINA steps three and four. BMC Pulm Med. 2012;12:63.

19. Guo C, Sun X, Wang X, Guo Q, Chen D. Network meta-analysis comparing the efficacy of therapeutic treatments for bronchiolitis in children. JPEN J Parenter Enteral Nutr. 2018;42(1):186–95.

20. Padilha S, Virtuoso S, Tonin FS, Borba HHL, Pontarolo R. Efficacy and safety of drugs for attention deficit hyperactivity disorder in children and adolescents: a network meta-analysis. Eur Child Adolesc Psychiatry. 2018;

21. Zeng L, Tian J, Song F, Li W, Jiang L, Gui G, Zhang Y, Ge L, Shi J, Sun X, et al. Corticosteroids for the prevention of bronchopulmonary dysplasia in preterm infants: a network meta-analysis. Arch Dis Child Fetal Neonatal Ed. 2018;

22. Fu H-D, Qian G-L, Jiang Z-Y. Comparison of second-line immunosuppressants for childhood refractory nephrotic syndrome: a systematic review and network meta-analysis. J Investig Med. 2017;65(1):65–71.

23. Gutierrez-Castrellon P, Indrio F, Bolio-Galvis A, Jimenez-Gutierrez C, Jimenez-Escobar I, Lopez-Velazquez G. Efficacy of Lactobacillus reuteri DSM 17938 for infantile colic: systematic review with network meta-analysis. Medicine. 2017;96(51):e9375.

24. Murad MH, Montori VM, Ioannidis JP, Jaeschke R, Devereaux PJ, Prasad K, Neumann I, Carrasco-Labra A, Agoritsas T, Hatala R, et al. How to read a systematic review and meta-analysis and apply the results to patient care: users’ guides to the medical literature. JAMA. 2014;312(2):171–9.

25. Jansen JP, Fleurence R, Devine B, Itzler R, Barrett A, Hawkins N, Lee K, Boersma C, Annemans L, Cappelleri JC. Interpreting indirect treatment comparisons and network meta-analysis for health-care decision making: report of the ISPOR task force on indirect treatment comparisons good research practices: part 1. Value Health. 2011;14(4):417–28.

26. Cipriani A, Higgins JP, Geddes JR, Salanti G. Conceptual and technical challenges in network meta-analysis. Ann Intern Med. 2013;159(2):130–7.

27. Salanti G. Indirect and mixed-treatment comparison, network, or multiple-treatments meta-analysis: many names, many benefits, many concerns for the next generation evidence synthesis tool. Res Synth Methods. 2012;3(2):80–97.

28. Hoaglin DC, Hawkins N, Jansen JP, Scott DA, Itzler R, Cappelleri JC, Boersma C, Thompson D, Larholt KM, Diaz M, et al. Conducting indirect-treatment-comparison and network-meta-analysis studies: report of the ISPOR task force on indirect treatment comparisons good research practices: part 2. Value Health. 2011;14(4):429–37.

29. Thabane L, Thomas T, Ye C, Paul J. Posing the research question: not so simple. Can J Anaesth. 2009;56(1):71–9.

30. Salanti G, Del Giovane C, Chaimani A, Caldwell DM, Higgins JP. Evaluating the quality of evidence from a network meta-analysis. PLoS One. 2014;9(7):e99682.

31. Puhan MA, Schunemann HJ, Murad MH, Li T, Brignardello-Petersen R, Singh JA, Kessels AG, Guyatt GH. A GRADE working group approach for rating the quality of treatment effect estimates from network meta-analysis. BMJ. 2014;349:g5630.

32. Reddel HK, Levy ML. The GINA asthma strategy report: what's new for primary care? NPJ Prim Care Respir Med. 2015;25:15050.

33. Expert Panel Report 3 (EPR-3). Guidelines for the diagnosis and management of asthma – summary report 2007. J Allergy Clin Immunol. 2007;120(5 Suppl):S94–138.

34. Jansen JP, Naci H. Is network meta-analysis as valid as standard pairwise meta-analysis? It all depends on the distribution of effect modifiers. BMC Med. 2013;11:159.

35. Betran AP, Say L, Gulmezoglu AM, Allen T, Hampson L. Effectiveness of different databases in identifying studies for systematic reviews: experience from the WHO systematic review of maternal morbidity and mortality. BMC Med Res Methodol. 2005;5(1):6.

36. Kwon Y, Powelson SE, Wong H, Ghali WA, Conly JM. An assessment of the efficacy of searching in biomedical databases beyond MEDLINE in identifying studies for a systematic review on ward closures as an infection control intervention to control outbreaks. Syst Rev. 2014;3:135.

37. Dickersin K, Scherer R, Lefebvre C. Identifying relevant studies for systematic reviews. BMJ. 1994;309(6964):1286–91.

38. Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions, version 5.1.0 [updated March 2011]. The Cochrane Collaboration; 2011.

39. Hutton B, Salanti G, Caldwell DM, Chaimani A, Schmid CH, Cameron C, Ioannidis JP, Straus S, Thorlund K, Jansen JP, et al. The PRISMA extension statement for reporting of systematic reviews incorporating network meta-analyses of health care interventions: checklist and explanations. Ann Intern Med. 2015;162(11):777–84.

40. Jansen JP, Trikalinos T, Cappelleri JC, Daw J, Andes S, Eldessouki R, Salanti G. Indirect treatment comparison/network meta-analysis study questionnaire to assess relevance and credibility to inform health care decision making: an ISPOR-AMCP-NPC good practice task force report. Value Health. 2014;17(2):157–73.

41. Cipriani A, Zhou X, Del Giovane C, Hetrick SE, Qin B, Whittington C, Coghill D, Zhang Y, Hazell P, Leucht S, et al. Comparative efficacy and tolerability of antidepressants for major depressive disorder in children and adolescents: a network meta-analysis. Lancet. 2016;388:881–90.

42. Windecker S, Stortecky S, Stefanini GG, da Costa BR, Rutjes AW, Di Nisio M, Silletta MG, Maione A, Alfonso F, Clemmensen PM, et al. Revascularisation versus medical treatment in patients with stable coronary artery disease: network meta-analysis. BMJ. 2014;348:g3859.

43. Lumley T. Network meta-analysis for indirect treatment comparisons. Stat Med. 2002;21(16):2313–24.

44. Friedrich JO, Adhikari NK, Beyene J. Ratio of means for analyzing continuous outcomes in meta-analysis performed as well as mean difference methods. J Clin Epidemiol. 2011;64(5):556–64.

45. Moher D, Pham B, Jones A, Cook DJ, Jadad AR, Moher M, Tugwell P, Klassen TP. Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet. 1998;352(9128):609–13.

46. Chaimani A, Vasiliadis HS, Pandis N, Schmid CH, Welton NJ, Salanti G. Effects of study precision and risk of bias in networks of interventions: a network meta-epidemiological study. Int J Epidemiol. 2013;42(4):1120–31.

47. Guyatt GH, Oxman AD, Kunz R, Brozek J, Alonso-Coello P, Rind D, Devereaux PJ, Montori VM, Freyschuss B, Vist G, et al. GRADE guidelines 6. Rating the quality of evidence – imprecision. J Clin Epidemiol. 2011;64(12):1283–93.

48. Schunemann HJ, Guyatt GH. Commentary – goodbye M(C)ID! Hello MID, where do you come from? Health Serv Res. 2005;40(2):593–7.

49. Counseling parents before high-risk delivery. In: Lacy GT, Eyal FG, Zenk KE, editors. Neonatology: management, procedures, on-call problems, diseases, and drugs. Stamford: Appleton & Lange; 1999. p. 223.

50. Ancel PY, Goffinet F, Kuhn P, Langer B, Matis J, Hernandorena X, Chabanier P, Joly-Pedespan L, Lecomte B, Vendittelli F, et al. Survival and morbidity of preterm children born at 22 through 34 weeks’ gestation in France in 2011: results of the EPIPAGE-2 cohort study. JAMA Pediatr. 2015;169(3):230–8.

51. Sun H, Cheng R, Kang W, Xiong H, Zhou C, Zhang Y, Wang X, Zhu C. High-frequency oscillatory ventilation versus synchronized intermittent mandatory ventilation plus pressure support in preterm infants with severe respiratory distress syndrome. Respir Care. 2014;59(2):159–69.

52. Guyatt GH, Oxman AD, Kunz R, Woodcock J, Brozek J, Helfand M, Alonso-Coello P, Falck-Ytter Y, Jaeschke R, Vist G, et al. GRADE guidelines: 8. Rating the quality of evidence – indirectness. J Clin Epidemiol. 2011;64(12):1303–10.

53. Higgins J. Identifying and addressing inconsistency in network meta-analysis. In: Cochrane comparing multiple interventions methods group Oxford training event. Cochrane Collaboration; 2013.

54. Song F, Xiong T, Parekh-Bhurke S, Loke YK, Sutton AJ, Eastwood AJ, Holland R, Chen Y-F, Glenny A-M, Deeks JJ, et al. Inconsistency between direct and indirect comparisons of competing interventions: meta-epidemiological study. BMJ. 2011;343:d4909.

55. Dias S, Welton NJ, Caldwell DM, Ades AE. Checking consistency in mixed treatment comparison meta-analysis. Stat Med. 2010;29(7–8):932–44.

56. Dias S, Welton NJ, Sutton AJ, Caldwell DM, Lu G, Ades AE. Evidence synthesis for decision making 4: inconsistency in networks of evidence based on randomized controlled trials. Med Decis Making. 2013;33(5):641–56.

57. Oxman AD, Guyatt GH. A consumer's guide to subgroup analyses. Ann Intern Med. 1992;116(1):78–84.

58. Sun X, Ioannidis JP, Agoritsas T, Alba AC, Guyatt G. How to use a subgroup analysis: users’ guide to the medical literature. JAMA. 2014;311(4):405–11.

59. Guyatt GH, Oxman AD, Montori V, Vist G, Kunz R, Brozek J, Alonso-Coello P, Djulbegovic B, Atkins D, Falck-Ytter Y, et al. GRADE guidelines: 5. Rating the quality of evidence – publication bias. J Clin Epidemiol. 2011;64(12):1277–82.

60. Salanti G, Ades AE, Ioannidis JP. Graphical methods and numerical summaries for presenting results from multiple-treatment meta-analysis: an overview and tutorial. J Clin Epidemiol. 2011;64(2):163–71.

61. Mbuagbaw L, Rochwerg B, Jaeschke R, Heels-Andsell D, Alhazzani W, Thabane L, Guyatt GH. Approaches to interpreting and choosing the best treatments in network meta-analyses. Syst Rev. 2017;6(1):79.

62. Veroniki AA, Straus SE, Rücker G, Tricco AC. Is providing uncertainty intervals in treatment ranking helpful in a network meta-analysis? J Clin Epidemiol. 2018;

63. Jadad AR, Moore RA, Carroll D, Jenkinson C, Reynolds DJ, Gavaghan DJ, McQuay HJ. Assessing the quality of reports of randomized clinical trials: is blinding necessary? Control Clin Trials. 1996;17(1):1–12.


Availability of data and materials

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.

Author information




RA: conceptualized and designed the study, drafted the initial manuscript, and approved the final manuscript as submitted; IF: conceptualized the study, reviewed the manuscript, and approved the final manuscript as submitted; GG: conceptualized the study, reviewed the manuscript, and approved the final manuscript as submitted; LT: conceptualized the study, reviewed the manuscript, and approved the final manuscript as submitted.

Corresponding author

Correspondence to Reem Al Khalifah.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.


Cite this article

Al Khalifah, R., Florez, I.D., Guyatt, G. et al. Network meta-analysis: users’ guide for pediatricians. BMC Pediatr 18, 180 (2018).



  • Network meta-analysis
  • Multiple treatment comparisons
  • Multiple-treatment meta-analysis evidence synthesis
  • Evidence credibility
  • Evidence certainty
  • Pediatric