
Machine learning for prediction of bronchopulmonary dysplasia-free survival among very preterm infants



Bronchopulmonary dysplasia (BPD) is one of the most common and serious sequelae of prematurity. Prompt diagnosis using prediction tools is crucial for early intervention and prevention of further adverse effects. This study aims to develop a BPD-free survival prediction tool based on the concept of the developmental origin of BPD with machine learning.


Datasets comprising perinatal factors and early postnatal respiratory support were used for initial model development, followed by combining the two models into a final ensemble model using logistic regression. Simulation of clinical scenarios was performed.


Data from 689 infants were included in the study. We randomly selected data from 80% of infants for model development and used the remaining 20% for validation. The performance of the final model was assessed by the area under the receiver operating characteristic curve, which was 0.921 (95% CI: 0.899–0.943) and 0.899 (95% CI: 0.848–0.949) for the training and the validation datasets, respectively. Simulation data suggest that extubation to CPAP is associated with a higher probability of BPD-free survival than extubation to NIPPV. Additionally, successful extubation may be defined as no reintubation for 9 days following initial extubation.


Machine learning-based BPD prediction using perinatal features and respiratory data may have clinical applicability to promote early targeted intervention in high-risk infants.



Bronchopulmonary dysplasia (BPD) was first described by Northway et al. in 1967 as a new lung disease of preterm infants following respiratory distress syndrome [1]. Over the past 50 years since its first characterization, medical technology and clinical management have evolved. Accordingly, the pathology of BPD evolved from primarily necrotic bronchiolitis and fibrotic changes in the lung tissues to alveolar simplification in the post-surfactant era [1, 2]. BPD is associated with long-term cardiopulmonary complications as well as neurodevelopmental disadvantages, including cerebral palsy, vision and hearing deficits, mental and psychomotor impairments [3,4,5,6,7,8,9].

Because of how it is defined clinically, the diagnosis of BPD is rather subjective. Most contemporary practices follow the 2001 NICHD Workshop definition to diagnose BPD at 36 weeks postmenstrual age (PMA) [10]. Severity stratification was based on the duration of supplemental oxygen and respiratory support use. A revised operational definition was proposed after the 2016 Workshop [11]. In 2019, a systematic approach correlating 18 operational definitions of BPD with toddler-age respiratory and neurodevelopmental outcomes suggested that the best way to diagnose and grade BPD was by respiratory support mode at 36 weeks PMA. Notably, the extent of supplemental oxygen use was not needed [12].

To our knowledge, the concept of a developmental origin of BPD was first explicitly mentioned recently by Thébaud et al. in their extensive review article on BPD [13]. Indeed, gestational age (GA) is the best predictor of BPD [14]. Additionally, genetic factors such as race/ethnicity and sex also play a role [15,16,17,18]. Moreover, modifiable perinatal factors such as maternal smoking, chorioamnionitis, and placental insufficiency, as well as volutrauma, barotrauma, oxygen toxicity, and inflammation from prolonged mechanical ventilation, have been associated with BPD [19,20,21,22,23]. The modifiable factors may be reflected in the trajectory of respiratory support in the early postnatal period. As a result, distinct patterns of lung disease based on supplemental oxygen use have been characterized as early as the first two weeks of life [24, 25]. These early predictors led to the development of the BPD outcome estimator, which uses 4 demographic and 2 respiratory factors to allow risk stratification at six different days of life in the first 4 weeks of life [26, 27]. On external validation, its predictive power was found to be lacking, with areas under the receiver operating characteristic curve ranging only from 0.73 to 0.76 [26, 28, 29]. We and others also did not find it particularly useful clinically [30, 31]. Center effect likely plays an important role, and a cutoff probability level needs to be individualized for each unit [29, 30, 32].

In recent years, machine learning algorithms have become more accessible to clinical researchers. Machine learning algorithms can detect patterns in data that are invisible to the human eye, and they may offer a more appropriate data analysis strategy for multifactorial pathologies such as BPD. In this project, we sought to investigate whether perinatal features and the trajectory of early-life respiratory support can be used to predict BPD-free survival using a machine learning algorithm.


Study participants

Machine learning-based predictive modeling analyzed a retrospective dataset encompassing infants born between 2013 and 2020 with a birth gestation of 30 weeks 3 days or less who were admitted to the neonatal intensive care unit (NICU) at the Loma Linda University Children’s Hospital (LLUCH) or at Riverside University Health System (RUHS) [33]. The study was approved by the Institutional Review Boards (LLUCH IRB#: 520338; RUHS IRB#: 1689889) of both institutes with waiver of informed consent due to the retrospective nature of the study. Infants without complete data for perinatal and respiratory features, those who died in the first 28 days of life (DOL), and those who were transferred out of the NICU before an assessment for a BPD diagnosis could be made were excluded. Infants with gestation longer than 30 weeks 3 days were not included because their risk for BPD was low.

Feature engineering and data extraction

We followed the study by Morrow et al. on the risk analysis of perinatal factors in association with BPD development to design perinatal features [18], which included assigned sex (female or male), self-reported race/ethnicity (White, Black, Hispanic, Asian, or other), birth gestation, birth weight category, and maternal smoking during pregnancy. For birth gestation, instead of categorizing based on completed weeks, we assigned each infant to the nearest whole week. For example, an infant born at gestational age 29 weeks 3 days would be assigned to the 29-week category, whereas an infant born at gestational age 29 weeks 4 days would be assigned to the 30-week category. For birth weight category assignment, we obtained sex- and gestational age-specific birth weight percentiles using the 2013 Fenton growth charts [34]. Infants were then categorized as small (< 10th percentile), appropriate (10th–90th percentile), or large (> 90th percentile) for gestational age.
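The two assignment rules described above can be sketched in a few lines. This is an illustrative Python sketch only (the study itself used R); the function names and the SGA/AGA/LGA labels are assumptions, and the Fenton percentile is taken as an input rather than looked up.

```python
def gestation_category(weeks: int, days: int) -> int:
    """Assign the nearest whole gestational week.

    0-3 extra days round down and 4-6 extra days round up,
    matching the 29w3d -> 29 and 29w4d -> 30 examples in the text.
    """
    if not 0 <= days <= 6:
        raise ValueError("days must be between 0 and 6")
    return weeks + 1 if days >= 4 else weeks


def birth_weight_category(percentile: float) -> str:
    """Map a sex- and GA-specific birth weight percentile to a category.

    The percentile is assumed to come from a Fenton 2013 lookup that is
    not implemented here.
    """
    if percentile < 10:
        return "SGA"  # small for gestational age
    if percentile > 90:
        return "LGA"  # large for gestational age
    return "AGA"      # appropriate for gestational age
```

For instance, `gestation_category(29, 4)` returns 30, and a percentile of exactly 10 or 90 falls in the appropriate-for-gestational-age band.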

For respiratory features, we only included the respiratory support modes during model development. Our reasoning was that the respiratory support mode should reflect the severity and reversibility of illness and the level of respiratory maturity, noting that it is the only variable strictly based on the current physician’s orders. The respiratory support modes were classified into five categories: (1) high-frequency ventilation (HFV, including high-frequency jet ventilator or high-frequency oscillator), (2) conventional mechanical ventilation (CMV, including pressure-control mode, volume-control mode, or invasive neurally-adjusted ventilatory assist (NAVA)), (3) non-invasive mechanical ventilation (NIMV, defined as a non-invasive ventilatory mode which provides a peak inspiratory pressure, including non-invasive positive pressure ventilation (NIPPV) or non-invasive NAVA), (4) continuous positive airway pressure (CPAP) or high-flow nasal cannula (HFNC), and (5) low-flow nasal cannula (LFNC) or no support. Notably, we grouped CPAP and HFNC together because some providers in our group preferred to use HFNC at 8–10 L per minute to “mimic” CPAP.
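The five-category grouping can be expressed as a simple lookup. The Python sketch below is illustrative only: the flowsheet labels used as dictionary keys are hypothetical, since the source does not state how modes were recorded in the database.

```python
from enum import IntEnum


class SupportCategory(IntEnum):
    """The five respiratory support categories, numbered as in the text."""
    HFV = 1
    CMV = 2
    NIMV = 3
    CPAP_HFNC = 4
    LFNC_NONE = 5


# Hypothetical flowsheet labels mapped to the study's categories.
MODE_TO_CATEGORY = {
    "high-frequency jet ventilator": SupportCategory.HFV,
    "high-frequency oscillator": SupportCategory.HFV,
    "pressure control": SupportCategory.CMV,
    "volume control": SupportCategory.CMV,
    "invasive nava": SupportCategory.CMV,
    "nippv": SupportCategory.NIMV,
    "non-invasive nava": SupportCategory.NIMV,
    "cpap": SupportCategory.CPAP_HFNC,
    "hfnc": SupportCategory.CPAP_HFNC,
    "lfnc": SupportCategory.LFNC_NONE,
    "room air": SupportCategory.LFNC_NONE,
}


def classify_mode(label: str) -> SupportCategory:
    """Return the category for a recorded mode label (case-insensitive)."""
    key = label.strip().lower()
    if key not in MODE_TO_CATEGORY:
        raise ValueError(f"unrecognized respiratory mode: {label!r}")
    return MODE_TO_CATEGORY[key]
```

Raising on an unrecognized label, rather than defaulting to a category, keeps undocumented modes visible for manual curation.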

Perinatal data were extracted from the backend database of the electronic medical records (A.N.). Respiratory mode data were available in the daily flowsheets documented by the respiratory therapists. These data were first extracted from the backend database (A.N.), followed by manual curation using a custom web app designed specifically for the purpose. Respiratory mode data were collected based on the respiratory mode the infants were receiving at the end of each 24-h interval after birth for 14 consecutive intervals (instead of by calendar dates). Each DOL indicates a 24-h interval, or a complete day, throughout the manuscript.
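Sampling the mode at the end of each 24-h interval after birth, rather than by calendar date, can be sketched as follows. This Python sketch is illustrative only; the input format (a list of timestamped mode changes) and the choice to count a change recorded exactly at an interval boundary as belonging to that interval are assumptions.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def daily_modes(birth: datetime, changes, n_days: int = 14):
    """Return the mode in effect at the end of each 24-h interval after birth.

    changes: iterable of (timestamp, mode) pairs, one per documented mode change.
    Returns a list of length n_days; an entry is None if no mode had been
    documented by the end of that interval.
    """
    ordered = sorted(changes)
    times = [t for t, _ in ordered]
    modes = []
    for dol in range(1, n_days + 1):
        interval_end = birth + timedelta(hours=24 * dol)
        # Index of the last change at or before the end of this interval.
        idx = bisect_right(times, interval_end) - 1
        modes.append(ordered[idx][1] if idx >= 0 else None)
    return modes
```

With a birth time of 15:00, DOL 1 ends at 15:00 the next calendar day, so a mode change at 10:00 on the second calendar day still falls within DOL 1.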

Outcome definition

BPD-free survival outcome was defined as survival until at least 36 weeks PMA and no respiratory support needs at 36 weeks PMA. Notably, BPD was defined based on the 2019 Jensen criteria, meaning that respiratory support in the first 28 days of life was not taken into consideration for BPD diagnosis [12].

Model training and validation

Supervised machine learning with a random forest algorithm was performed against the outcome defined above on a randomly selected 80% of the complete dataset. The remaining 20% of the data was used for internal validation. A random forest algorithm, a decision tree-based algorithm, was the algorithm of choice for this study partly because all features were categorical in nature. Four random forest models, each with 500 trees and tenfold cross-validation repeated 10 times, were trained using the caret (6.0-89) and ranger (0.13.1) packages for R [35,36,37].

  • Model 1: Perinatal features only

  • Model 2: Respiratory data from DOL1

  • Model 3: Respiratory data from DOL1-7

  • Model 4: Respiratory data from DOL1-14
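The feature sets behind the four models and the 80/20 partition can be laid out as below. This is an illustrative Python sketch, not the caret/ranger code used in the study; the column names, the seed, and the plain shuffled split are assumptions, and the source does not state whether the partition was stratified.

```python
import random

# Hypothetical column names for the five perinatal features.
PERINATAL = [
    "sex",
    "race_ethnicity",
    "gestation_week",
    "birth_weight_category",
    "maternal_smoking",
]


def respiratory_features(last_dol: int):
    """One categorical column per day of life, DOL 1 through last_dol."""
    return [f"mode_dol{d}" for d in range(1, last_dol + 1)]


MODEL_FEATURES = {
    "Model 1": PERINATAL,
    "Model 2": respiratory_features(1),
    "Model 3": respiratory_features(7),
    "Model 4": respiratory_features(14),
}


def split_ids(ids, train_fraction: float = 0.8, seed: int = 0):
    """Shuffle infant identifiers and split them into training and testing sets."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]
```

Splitting by infant identifier, rather than by row, keeps all 14 daily observations for one infant on the same side of the partition.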

The probabilities of outcome prediction from the above four random-forest models were subsequently used to develop additional ensemble models by using a generalized logistic regression algorithm:

  • Model 5: Model 1 and Model 2

  • Model 6: Model 1 and Model 3

  • Model 7: Model 1 and Model 4

An interaction term to assess the interaction of probabilities from the two random-forest models was introduced in generalized logistic regression modeling.
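A logistic regression on two random forest probabilities with an interaction term has the form logit(p) = b0 + b1·p1 + b2·p2 + b12·p1·p2. The Python sketch below shows how such a model would turn the two inputs into one ensemble probability; it is illustrative only, and the coefficients are placeholders because the fitted values are not reported in the text.

```python
import math


def ensemble_probability(p_perinatal: float, p_respiratory: float,
                         b0: float, b1: float, b2: float, b12: float) -> float:
    """Combine two model probabilities through a logistic link.

    p_perinatal and p_respiratory are the BPD-free survival probabilities
    from the perinatal and respiratory random forest models. b12 weights
    the interaction between them. All coefficients are hypothetical inputs.
    """
    z = (b0
         + b1 * p_perinatal
         + b2 * p_respiratory
         + b12 * p_perinatal * p_respiratory)
    return 1.0 / (1.0 + math.exp(-z))
```

With a positive interaction coefficient, agreement between the two models pushes the ensemble probability further than either term would alone.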

Model performance was assessed by the receiver operating characteristic area under the curve (ROC AUC) as well as by overall accuracy, positive predictive value, and negative predictive value. Youden’s J statistic was used to calculate the optimal cutoff threshold for binary outcome prediction (Yes or No for BPD-free survival), which was needed to obtain the latter three performance parameters. The pROC (1.18.0) package for R was used for these analyses [38]. During validation with the testing dataset, each model used the cutoff threshold derived from its training dataset.
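Youden’s J statistic is sensitivity + specificity − 1, and the cutoff is the threshold that maximizes it. The Python sketch below is an illustrative re-implementation, not the pROC code used in the study; it scans the observed probabilities as candidate thresholds, which may differ slightly from how pROC places its thresholds.

```python
def youden_threshold(y_true, y_prob):
    """Return (threshold, J) for the cutoff maximizing Youden's J.

    y_true: 1 for BPD-free survival, 0 otherwise.
    A case is predicted positive when its probability is >= threshold.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best_thr, best_j = None, -1.0
    for thr in sorted(set(y_prob)):
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= thr)
        tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < thr)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_thr, best_j = thr, j
    return best_thr, best_j


def predictive_values(y_true, y_prob, thr):
    """Accuracy, PPV, and NPV at a fixed threshold (None if undefined)."""
    pairs = [(y, int(p >= thr)) for y, p in zip(y_true, y_prob)]
    tp = sum(1 for y, yhat in pairs if y == 1 and yhat == 1)
    fp = sum(1 for y, yhat in pairs if y == 0 and yhat == 1)
    tn = sum(1 for y, yhat in pairs if y == 0 and yhat == 0)
    fn = sum(1 for y, yhat in pairs if y == 1 and yhat == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "ppv": tp / (tp + fp) if tp + fp else None,
        "npv": tn / (tn + fn) if tn + fn else None,
    }
```

To mirror the validation procedure, the threshold would be computed once on the training probabilities and then passed unchanged to `predictive_values` for the testing dataset.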


Three scenarios were designed for simulation. The goal of the first scenario was to validate the model. The second and third scenarios assessed whether the model could be used to answer common clinical questions about BPD.

  1.

    Scenario 1

    In this scenario, we explored whether the predicted probability of BPD-free survival increased with increasing birth gestation and decreased with longer intubation time following birth. To test this, we created three sets of simulated patients, each set born at a different gestational age (23, 26, and 29 weeks). All infants were intubated within DOL1 and placed on HFV. Each set contained 5 patients, each extubated at a different time point (after 1, 3, 6, or 10 complete days, or remaining intubated for all 14 days). All the simulated infants were non-Hispanic White, female, appropriate for gestational age, and without in utero exposure to smoking.

  2.

    Scenario 2

    In this scenario, there were two sets of appropriate-for-gestational age, non-Hispanic White, female infants born at 26 weeks’ gestation without antenatal smoking exposure. The infants were all intubated within the first 24 h of life and placed on HFV. In each set, infants were extubated at different time points (after DOL 1, 3, 6, or 10) to either NIMV or CPAP/HFNC. This scenario was used to assess the differences in predicted BPD-free survival between the two non-invasive modes. Of note, the scenario was not designed to compare superiority between NIMV and CPAP/HFNC, and should not be confused with a clinical trial.

  3.

    Scenario 3

    In this scenario, a group of appropriate-for-gestational age, non-Hispanic White, female infants without antenatal smoking exposure were born at 26 weeks’ gestation. The infants were intubated by the end of DOL1, subsequently extubated to CPAP by the end of DOL2, and then reintubated after periods of extubation ranging from 1 to 12 complete days. After reintubation, the infant was placed on CMV. The control infant did not require reintubation. This scenario assessed how long an infant needed to remain extubated in order to have BPD-free survival comparable to that of the control infant.
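Simulated patients of this kind are simply hand-built feature rows. As an illustration, the Python sketch below constructs the eight Scenario 2 infants; it is not the authors' code. The dictionary keys are hypothetical, and it assumes that "extubated after DOL k" means HFV is recorded at the end of DOL 1 through k and the non-invasive mode is recorded from DOL k+1 through 14.

```python
N_DAYS = 14

# Shared perinatal profile for Scenario 2 (keys are hypothetical column names).
BASE_INFANT = {
    "sex": "female",
    "race_ethnicity": "White",
    "gestation_week": 26,
    "birth_weight_category": "AGA",
    "maternal_smoking": False,
}


def scenario2_trajectory(days_intubated: int, extubation_mode: str):
    """Return the 14 daily modes: HFV while intubated, then one non-invasive mode."""
    if extubation_mode not in ("NIMV", "CPAP/HFNC"):
        raise ValueError("extubation_mode must be 'NIMV' or 'CPAP/HFNC'")
    if not 1 <= days_intubated <= N_DAYS:
        raise ValueError("days_intubated must be between 1 and 14")
    return ["HFV"] * days_intubated + [extubation_mode] * (N_DAYS - days_intubated)


# Two sets (one per non-invasive mode) of four extubation time points each.
simulated = [
    dict(BASE_INFANT, days_intubated=d, modes=scenario2_trajectory(d, mode))
    for mode in ("NIMV", "CPAP/HFNC")
    for d in (1, 3, 6, 10)
]
```

Each row would then be passed to the trained model to obtain a predicted probability of BPD-free survival.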

Statistical analysis

Descriptive statistics were performed for demographic comparison. χ² tests were used for categorical variables. Student’s t-tests or Mann–Whitney U tests were employed for continuous variables. A p-value < 0.05 was considered statistically significant.

BPD-free survival prediction probabilities were compared using a Welch-modified two-sample t-test assuming unequal variance. For each simulated infant, the predicted probability was supplied as the mean, and the standard error multiplied by the square root of the number of trees (n = 500) was supplied as the standard deviation. A p-value < 0.05 was considered statistically significant.
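The comparison described above can be written out from summary statistics. The Python sketch below is illustrative only: it computes the Welch t statistic and the Welch–Satterthwaite degrees of freedom, and it approximates the two-sided p-value with a normal distribution (an assumption made to stay within the standard library; an exact test would use Student's t distribution with the computed degrees of freedom).

```python
import math
from statistics import NormalDist


def welch_from_forest(p1: float, se1: float, p2: float, se2: float,
                      n_trees: int = 500):
    """Welch two-sample t-test from two predicted probabilities and their SEs.

    Each standard deviation is reconstructed as SE * sqrt(n_trees).
    Returns (t, degrees_of_freedom, approximate_two_sided_p).
    """
    sd1 = se1 * math.sqrt(n_trees)
    sd2 = se2 * math.sqrt(n_trees)
    v1 = sd1 ** 2 / n_trees  # variance of the first mean
    v2 = sd2 ** 2 / n_trees  # variance of the second mean
    if v1 + v2 == 0:
        raise ValueError("both standard errors are zero")
    t = (p1 - p2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n_trees - 1) + v2 ** 2 / (n_trees - 1))
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p_approx
```

Because the standard deviation is rebuilt from the standard error and then divided by the same number of trees, the variance of each mean reduces to the squared standard error; the number of trees mainly affects the degrees of freedom.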

All analyses were performed at Loma Linda University. Only IRB approved study personnel had access to private health information.


Out of a total of 1,191 infants who met the gestational age criteria, 128 died before 28 days of life and were excluded. Among the remaining 1,063 infants, outcome data were available for 935 infants, complete data for perinatal features were available for 847 infants, and respiratory data were available for 689 infants. Perinatal and respiratory data from these 689 infants were used in the study (Fig. 1). Data from 552 randomly selected infants (80%) were used for model training, and data from the remaining 137 infants (20%) were used for validation (Fig. 1). Demographic information and the number of infants with each respiratory support mode at various DOL are summarized in Tables 1 and 2, respectively.

Fig. 1 A flow chart depicting the selection of study participants

Table 1 Demographic characteristics of the study participants
Table 2 A table depicting the number of infants on each indicated respiratory support mode on Day of life 1, 7, and 14

The performance of the random forest and ensemble models is listed in Table 3. Model 1 (perinatal features only) had an ROC AUC of 0.861 with the training dataset and 0.786 with the testing dataset. Using the probability cutoff threshold calculated for binary outcome prediction based on the training dataset (55%), we found positive and negative predictive values to be 0.773 and 0.802, respectively, upon model validation with the testing dataset. The ROC AUC increased with more respiratory data available for training, from 0.724 with only 1 day of data to 0.900 with all 14 days of data. The ensemble model combining perinatal features and all 14 days of respiratory data (Model 7) provided the best prediction, with an ROC AUC of 0.921 in the training dataset and 0.899 in the testing dataset. Using a cutoff threshold of 48.8% for binary outcome prediction, the overall prediction accuracy was 85.1% and 81.0% in the training and the testing datasets, respectively. In the testing dataset, the positive predictive value was 0.855 and the negative predictive value was 0.773.

Table 3 A table detailing performance measures for various random forest models predicting bronchopulmonary dysplasia-free survival. The training dataset was used for model development. The testing dataset was used for model validation

Using permutation to assess the relative importance of each feature, we found that gestational age was the most influential among all perinatal features, followed by birth weight z-score, male sex, maternal smoking, and race/ethnicity, in order of importance (Fig. 2A). Among respiratory features, CPAP/HFNC use was the most predictive of the outcome (Fig. 2B-D).
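The idea behind permutation importance can be illustrated with a short Python sketch. This is a simplified stand-in for caret's varImp(), not a reproduction of it: it shuffles one feature at a time, re-scores a fixed predictor rather than refitting the model, and then applies min-max scaling to a 0–100 range. The `predict` callable and the row format are assumptions.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which the predictor matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(predict, rows, labels, features, seed: int = 0):
    """Drop in accuracy after shuffling each feature's values across rows."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = {}
    for feature in features:
        values = [r[feature] for r in rows]
        rng.shuffle(values)
        # Copy each row with only this feature replaced by a shuffled value.
        permuted = [dict(r, **{feature: v}) for r, v in zip(rows, values)]
        drops[feature] = baseline - accuracy(predict, permuted, labels)
    return drops


def scale_0_100(scores):
    """Min-max scale scores so the least important is 0 and the most is 100."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {name: 0.0 for name in scores}
    return {name: 100 * (v - lo) / (hi - lo) for name, v in scores.items()}
```

A feature the predictor never uses shows no drop in accuracy when shuffled, so it lands at the bottom of the scaled ranking.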

Fig. 2 Feature importance scores for A Model 1: five perinatal features, B Model 2: respiratory model data for day of life 1, C Model 3: respiratory model data for day of life 1–7, and D Model 4: respiratory model data for day of life 1–14. Feature importance scores were calculated based on permuting the values of the indicated feature followed by re-building the model and calculating the decrease in prediction accuracy. The scores were normalized between 0 and 100, with 0 being least important, and 100 being most important. The scores were obtained by running the varImp() function from the caret package

It is well established that lower gestational age at birth and longer intubation time are associated with higher risk of BPD. To confirm these facts in our model, Model 7 (perinatal features and all 14 days of respiratory data) was used to predict the probability of BPD-free survival (Scenario 1). As shown in Fig. 3, lower birth gestation and longer intubation time were individually associated with a reduced probability of BPD-free survival, providing assurance for the validity of the model.

Fig. 3 BPD-free survival probabilities of female, appropriate for gestational age, white, antenatally non-smoking exposed infants born at 23, 26, and 29 weeks of gestation intubated at birth for indicated periods. The error bars indicate standard errors of the probabilities. This plot depicts the simulated results from Scenario 1 (see text)

The model was then used for further clinical simulations. In Scenario 2, we found that extubating to NIMV did not lead to a better respiratory prognosis: the probability of BPD-free survival remained less than 20% even after only one day of intubation following birth, and no statistically significant difference was shown between extubation and no extubation. On the other hand, if an infant was able to maintain adequate respiration on CPAP after extubation, the probability of BPD-free survival was significantly higher, although the probability decreased with more time spent intubated (Fig. 4).

Fig. 4 BPD-free survival probabilities of female, appropriate for gestational age, white, antenatally non-smoking exposed infants born at 26 weeks of gestation intubated at birth for the indicated periods followed by extubating to either continuous positive airway pressure (CPAP)/high-flow nasal cannula (HFNC) or to non-invasive positive pressure ventilation (NIPPV)/non-invasive neurally adjusted ventilatory assist (nNAVA). The error bars indicate standard errors of the probabilities. This plot depicts the simulated results from Scenario 2 (see text)

In Scenario 3, we observed that the longer an infant remained extubated before reintubation, the higher the probability of BPD-free survival (Fig. 5). After staying extubated for 9 complete days, there was no statistically significant difference in the probabilities of BPD-free survival between the reintubated infant and the non-reintubated (control) infant.

Fig. 5 BPD-free survival probabilities of female, appropriate for gestational age, white, antenatally non-smoking exposed infants born at 26 weeks of gestation intubated at birth for one full day, followed by extubation between day of life 1 and 2, and reintubation following the indicated periods of time. In the control infant, there was no reintubation. Statistical comparison of the probability of BPD-free survival was made between the control infant and each of the infants who were reintubated individually using Student’s t-test. The asterisk sign (*) indicates p-value < 0.05. The error bars indicate standard errors of the probabilities. This plot depicts the simulated results from Scenario 3 (see text)


The literature suggests a developmental origin of BPD [13, 17, 18, 24, 26, 39,40,41]. In this study, we expanded on this concept to develop a machine learning BPD-free survival prediction model. Five non-modifiable perinatal factors combined with respiratory support mode data from the first 14 days of life were used in this model. The five perinatal factors selected for inclusion are based on the work of Morrow et al. [18], in which biological, social, and antenatal exposure determinants were assessed for their association with BPD. While sex and race/ethnicity are strictly non-modifiable, maternal smoking, fetal weight gain, and birth gestation are factors that may be modifiable from the obstetric standpoint. Together, these factors indicate that low lung tissue volume at birth plays a determinant role in the development of BPD. We did not include maternal conditions (chorioamnionitis, preeclampsia, prolonged preterm rupture of membranes, etc.) or maternal medications (antenatal steroid, magnesium sulfate, antibiotics, etc.) due to inconsistent reliability of the information. Moreover, responses to these exposures, including the reversibility of their influences, were expected to be reflected in the trajectory of respiratory support mode use.

For respiratory data, we chose to include only respiratory support mode data instead of other respiratory parameters such as the fraction of inspired oxygen, oxygen saturation, mean airway pressure, blood gas results, etc., because the respiratory support mode is the most reliable variable guided by physician orders and is less likely to be influenced by clinical subjectivity. Additionally, documentation of respiratory mode change is less likely to be associated with errors. Moreover, respiratory parameters with normal ranges or goal ranges are not useful by themselves because there would not be a clear distinction between the two outcome groups to guide prediction. On the other hand, the decision to choose one respiratory support mode over another is heavily influenced by local respiratory protocols. Consequently, a model developed with local data may not be easily generalizable. It is important to note that the study was not designed to assess the superiority of one specific respiratory mode over another within the same group.

The most well-developed and widely used prediction tool for BPD is the NICHD Neonatal Bronchopulmonary Dysplasia Outcome Estimator [27]. The model was developed with data from 2,415 infants and underwent internal and external validation with an additional 1,214 and 722 infants, respectively, during development [26]. The model was further validated in two additional study cohorts (PreVILIG and PREMILOC), both showing fair discrimination [28, 29]. We and others showed that the Estimator performed well in predicting death or severe BPD [30, 31]. Unfortunately, the Estimator has limitations: it has neither an option for infants of Asian descent nor an option for NIMV. One model was developed for each DOL, which does not take time into consideration as a continuous factor. In their assessment, demographic factors play a more crucial role during the early days of life, which they justified based on a sequential addition of variables and the related ROC AUC values. Notably, this method has trouble assessing the contribution of respiratory trajectories to BPD risks across days of life because different models were built for each DOL.

In our design, we separated perinatal factors and respiratory data into two models during initial training. We combined the two models into a final ensemble model to ensure both perinatal and respiratory features were taken into consideration with equal weight. By taking this approach, the final model does not add more weight on the respiratory data for prediction. From the clinical standpoint, this approach not only provided us with an opportunity to assess the developmental origin of bronchopulmonary dysplasia, but will also allow us to assess BPD risks at four time points: at birth, at the end of the first 24 h, as well as at 7 and 14 complete days of life. The most appropriate model for risk stratification can be picked for use in future studies comparing efficacy of interventional therapies to prevent BPD development depending on the timing of the intervention. Although there is not an easily accessible and suitable algorithm in classic machine learning which considers repeated measurement in a way that is reminiscent of mixed-modeling, we felt that using a decision tree-based algorithm on a dataset with each DOL as one feature allows the model to assess the interrelationship of respiratory support modes across all days of life for each infant. This serves as an alternative approach to assessing changes in respiratory support mode over time.

The lack of improvement in BPD-free survival in infants extubated to an NIMV mode was unexpected (Scenario 2). Notably, more than 80% of the infants receiving NIMV received unsynchronized NIPPV instead of non-invasive NAVA. A multi-center prospective trial comparing NIPPV and CPAP use in infants born at < 1,000 g (about 2.2 lbs) and < 30 weeks’ gestation showed no difference in BPD-free survival [42]. A subgroup analysis showed no difference in BPD risks among infants with prior intubation. The use of NIPPV did not appear to provide benefit for preventing subsequent reintubation, which occurred in about 60% of the infants in both groups. The severity of illness from morbidities or comorbidities of prematurity did not differ between the two groups. These findings were different from other trials which assessed the early use of NIPPV compared to CPAP in infants born at a later gestational age, which showed reduced intubation needs and a lower risk of BPD [43,44,45]. A Cochrane review also found no difference in BPD risks between NIPPV and CPAP use [46]. In the input data for model training, we observed a higher percentage of infants without BPD who received CPAP/HFNC rather than NIMV (the majority received NIPPV) or an invasive ventilatory mode. Clinically, sicker infants were more likely to require longer NIMV support due to unreliable respiratory drives in order to avoid intubation/reintubation. Our model reflected that, which was different from the above-mentioned randomized controlled trial. Interestingly, a recent meta-analysis on mixed treatment comparisons found that surfactant administration followed by CPAP use provided superior respiratory outcomes to NIPPV use [47]. In clinical practice, it would be reasonable to trial CPAP with early use of pharmacological intervention to prevent apnea of prematurity in infants who are able to sustain a reliable respiratory drive, and to reserve NIPPV for those infants who require artificial breaths.

Prolonged intubation may be associated with barotrauma and volutrauma, while invasive ventilation may be necessary in infants with poor pulmonary compliance or intolerance of non-invasive support. In the literature, extubation failure in preterm infants was inconsistently defined as reintubation within a window ranging from 48 h to 7 days [48,49,50,51]. Reintubation may be associated with respiratory setback, leaving us with the question of whether the interval between first extubation and reintubation has any significant influence on long-term respiratory outcomes. Our prediction model provided an objective way of assessing what constitutes the best definition for “successful extubation” (Scenario 3). We explored this by comparing the probability of BPD-free survival to that of a control simulated infant who did not require reintubation after initial extubation. Based on our simulation using BPD-free survival probability as outcome correlation, the duration an infant needs to remain extubated is at least 48 h longer than the definitions used in most studies.

One major drawback of our model is that the data came from infants cared for by the same group of neonatologists, so the study was considered a single-center study, making external application of the model difficult. Nonetheless, studies have already suggested a strong influence of center effect on BPD rate and predictability [29, 32]. Moreover, the goal of a prediction model is different from that of a statistical model. A prediction tool may be considered useful if it provides a reasonable balance between bias and variance, and is generalizable to a target population (e.g., preterm infants receiving care in one NICU or by one group of neonatal providers). An additional limitation of our prediction model is its binary prediction of BPD-free survival, rather than prediction of BPD grade or severity. Also, the definition of BPD remains operational and subjective, affecting the accuracy of outcome labeling.


In conclusion, we developed a prediction tool for BPD-free survival based on the developmental origin of BPD. The predictability of BPD using early-life predictors provides an opportunity for personalized early intervention targeting only the at-risk subpopulation to minimize potential harm to the low-risk group. Moreover, the ability to simulate respiratory support use gives clinicians an objective way to assess which respiratory support mode is most likely to benefit the infant. A web app created to demonstrate the use of the prediction tool developed in this study is accessible at

Availability of data and materials

The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.



BPD: Bronchopulmonary dysplasia

PMA: Postmenstrual age

IUGR: Intrauterine growth restriction

NICU: Neonatal intensive care unit

HFV: High-frequency ventilation

CMV: Conventional mechanical ventilation

NIMV: Non-invasive mechanical ventilation

NIPPV: Non-invasive positive pressure ventilation

NAVA: Neurally adjusted ventilatory assist

CPAP: Continuous positive airway pressure

HFNC: High-flow nasal cannula

LFNC: Low-flow nasal cannula

DOL: Day of life

ROC AUC: Receiver operating characteristic area under the curve


  1. Northway WH, Rosan RC, Porter DY. Pulmonary disease following respirator therapy of hyaline-membrane disease. Bronchopulmonary dysplasia. N Engl J Med. 1967;276:357–68.


  2. Husain AN, Siddiqui NH, Stocker JT. Pathology of arrested acinar development in postsurfactant bronchopulmonary dysplasia. Hum Pathol. 1998;29:710–7.


  3. Jensen EA, Schmidt B. Epidemiology of bronchopulmonary dysplasia. Birth Defects Res A Clin Mol Teratol. 2014;100:145–57.


  4. Bolton CE, et al. The EPICure study: association between hemodynamics and lung function at 11 years after extremely preterm birth. J Pediatr. 2012;161:595-601.e2.


  5. Berkelhamer SK, Mestan KK, Steinhorn RH. Pulmonary hypertension in bronchopulmonary dysplasia. Semin Perinatol. 2013;37:124–31.


  6. Bhat R, Salas AA, Foster C, Carlo WA, Ambalavanan N. Prospective analysis of pulmonary hypertension in extremely low birth weight infants. Pediatrics. 2012;129:e682–9.


  7. Short EJ, et al. Cognitive and academic consequences of bronchopulmonary dysplasia and very low birth weight: 8-year-old outcomes. Pediatrics. 2003;112:e359.


  8. Majnemer A, et al. Severe bronchopulmonary dysplasia increases risk for later neurological and motor sequelae in preterm survivors. Dev Med Child Neurol. 2000;42:53–60.


  9. Ehrenkranz RA, et al. Validation of the National Institutes of Health consensus definition of bronchopulmonary dysplasia. Pediatrics. 2005;116:1353–60.

  10. Jobe AH, Bancalari E. Bronchopulmonary dysplasia. Am J Respir Crit Care Med. 2001;163:1723–9.

  11. Higgins RD, et al. Bronchopulmonary dysplasia: executive summary of a workshop. J Pediatr. 2018;197:300–8.

  12. Jensen EA, et al. The diagnosis of bronchopulmonary dysplasia in very preterm infants. An evidence-based approach. Am J Respir Crit Care Med. 2019;200:751–9.

  13. Thébaud B, et al. Bronchopulmonary dysplasia. Nat Rev Dis Primers. 2019;5:78.

  14. Stoll BJ, et al. Neonatal outcomes of extremely preterm infants from the NICHD Neonatal Research Network. Pediatrics. 2010;126:443–56.

  15. Rojas MA, et al. Changing trends in the epidemiology and pathogenesis of neonatal chronic lung disease. J Pediatr. 1995;126:605–10.

  16. Lemons JA, et al. Very low birth weight outcomes of the National Institute of Child Health and Human Development Neonatal Research Network, January 1995 through December 1996. NICHD Neonatal Research Network. Pediatrics. 2001;107:E1.

  17. Bose C, et al. Fetal growth restriction and chronic lung disease among infants born before the 28th week of gestation. Pediatrics. 2009;124:e450–8.

  18. Morrow LA, et al. Antenatal determinants of bronchopulmonary dysplasia and late respiratory disease in preterm infants. Am J Respir Crit Care Med. 2017;196:364–74.

  19. Van Marter LJ, et al. Chorioamnionitis, mechanical ventilation, and postnatal sepsis as modulators of chronic lung disease in preterm infants. J Pediatr. 2002;140:171–6.

  20. Goldenberg RL, et al. The Alabama preterm birth study: umbilical cord blood Ureaplasma urealyticum and Mycoplasma hominis cultures in very preterm newborn infants. Am J Obstet Gynecol. 2008;198:43.e1–5.

  21. Kramer BW, Kallapur S, Newnham J, Jobe AH. Prenatal inflammation and lung development. Semin Fetal Neonatal Med. 2009;14:2–7.

  22. Hartling L, Liang Y, Lacaze-Masmonteil T. Chorioamnionitis as a risk factor for bronchopulmonary dysplasia: a systematic review and meta-analysis. Arch Dis Child Fetal Neonatal Ed. 2012;97:F8–17.

  23. Plakkal N, Soraisham AS, Trevenen C, Freiheit EA, Sauve R. Histological chorioamnionitis and bronchopulmonary dysplasia: a retrospective cohort study. J Perinatol. 2013;33:441–5.

  24. Laughon M, et al. Patterns of respiratory disease during the first 2 postnatal weeks in extremely premature infants. Pediatrics. 2009;123:1124–31.

  25. Nobile S, et al. New insights on early patterns of respiratory disease among extremely low gestational age newborns. Neonatology. 2017;112:53–9.

  26. Laughon MM, et al. Prediction of bronchopulmonary dysplasia by postnatal age in extremely premature infants. Am J Respir Crit Care Med. 2011;183:1715–22.

  27. Laughon MM, et al. NICHD neonatal research network neonatal BPD outcome estimator. 2011.

  28. Onland W, et al. Clinical prediction models for bronchopulmonary dysplasia: a systematic review and external validation study. BMC Pediatr. 2013;13:207.

  29. Baud O, Laughon M, Lehert P. Survival without bronchopulmonary dysplasia of extremely preterm infants: a predictive model at birth. Neonatology. 2021;118(4):385–93.

  30. Leigh R, et al. Combining probability scores to optimize clinical use of the NICHD Neonatal BPD outcome estimator. Neonatology Today. 2021;16:3–13.

  31. Baker EK, Davis PG. Bronchopulmonary dysplasia outcome estimator in current neonatal practice. Acta Paediatr. 2021;110:166–7.

  32. Lapcharoensap W, et al. Hospital variation and risk factors for bronchopulmonary dysplasia in a population-based cohort. JAMA Pediatr. 2015;169:e143676.

  33. Banerji AI, Hopper A, Kadri M, Harding B, Phillips R. Creating a small baby program: a single center’s experience. J Perinatol. 2022.

  34. Fenton TR, Kim JH. A systematic review and meta-analysis to revise the Fenton growth chart for preterm infants. BMC Pediatr. 2013;13:59.

  35. R Core Team. A language and environment for statistical computing. 2018.

  36. Kuhn M. Classification and regression training [R package caret version 6.0-86]. 2020.

  37. ranger: A Fast Implementation of Random Forests. Comprehensive R Archive Network (CRAN)

  38. Display and analyze ROC curves [R package pROC version 1.18.0]. 2021 [cited 11 Sep 2022]. Available:

  39. McGowan S. Understanding the developmental pathways pulmonary fibroblasts may follow during alveolar regeneration. Cell Tissue Res. 2017;367:707–19.

  40. Carmichael SL, et al. Maternal prepregnancy body mass index and risk of bronchopulmonary dysplasia. Pediatr Res. 2017;82:8–13.

  41. Kuiper-Makris C, Selle J, Nüsken E, Dötsch J, Alejandre Alcazar MA. Perinatal nutritional and metabolic pathways: early origins of chronic lung diseases. Front Med. 2021;8:667315.

  42. Kirpalani H, et al. A trial comparing noninvasive ventilation strategies in preterm infants. N Engl J Med. 2013;369:611–20.

  43. Kugelman A, et al. Nasal intermittent mandatory ventilation versus nasal continuous positive airway pressure for respiratory distress syndrome: a randomized, controlled, prospective study. J Pediatr. 2007;150:521–6, 526.e1.

  44. Sai Sunil Kishore M, Dutta S, Kumar P. Early nasal intermittent positive pressure ventilation versus continuous positive airway pressure for respiratory distress syndrome. Acta Paediatr. 2009;98:1412–5.

  45. Armanian A-M, Badiee Z, Heidari G, Feizi A, Salehimehr N. Initial treatment of respiratory distress syndrome with nasal intermittent mandatory ventilation versus nasal continuous positive airway pressure: A randomized controlled trial. Int J Prev Med. 2014;5:1543–51.

  46. Lemyre B, Davis PG, De Paoli AG, Kirpalani H. Nasal intermittent positive pressure ventilation (NIPPV) versus nasal continuous positive airway pressure (NCPAP) for preterm neonates after extubation. Cochrane Database Syst Rev. 2017;2:CD003212.

  47. Isayama T, Iwami H, McDonald S, Beyene J. Association of noninvasive ventilation strategies with mortality and bronchopulmonary dysplasia among preterm infants: a systematic review and meta-analysis. JAMA. 2016;316:611–24.

  48. Giaccone A, Jensen E, Davis P, Schmidt B. Definitions of extubation success in very premature infants: a systematic review. Arch Dis Child Fetal Neonatal Ed. 2014;99:F124–7.

  49. Manley BJ, Doyle LW, Owen LS, Davis PG. Extubating extremely preterm infants: predictors of success and outcomes following failure. J Pediatr. 2016;173:45–9.

  50. Chawla S, et al. Markers of successful extubation in extremely preterm infants, and morbidity after failed extubation. J Pediatr. 2017;189:113-119.e2.

  51. Gupta D, et al. A predictive model for extubation readiness in extremely preterm infants. J Perinatol. 2019;39:1663–9.





There was no funding support for this project.

Author information

Authors and Affiliations



Rebekah M. Leigh: Analyzed and interpreted data and wrote the initial manuscript. Andrew Pham: Curated respiratory support data and provided intellectual input to the manuscript. Srinandini S. Rao: Collected perinatal data and contributed to manuscript writing. Farha M. Vora: Participated in simulation scenario development, provided intellectual input to project development, and critically reviewed the initial manuscript. Gina Hou: Collected perinatal data and critically reviewed the initial manuscript. Chelsea Kent: Collected perinatal data and critically reviewed the initial manuscript. Abigail Rodriguez: Collected perinatal data and critically reviewed the initial manuscript. Arvind Narang: Extracted data from the clinical database and critically reviewed the initial manuscript. John B. C. Tan: Supervised machine learning model development, wrote and critically reviewed the initial and the revised manuscripts. Fu-Sheng Chou: Conceptualized the project, supervised data collection, collected outcome data, built machine learning models, created simulation scenarios, wrote and critically reviewed the initial and the revised manuscripts. The author(s) read and approved the final manuscript.

Authors’ information

Dr. Fu-Sheng Chou is a practicing neonatologist with additional training in data science and classic machine learning. His academic interest is in the developmental origins of health and disease concerning preterm infants. Dr. Chou is dedicated to applying emerging principles in data science and big-data research to his scholarly endeavor to better inform neonatal outcomes and improve neonatal care.

Corresponding author

Correspondence to Fu-Sheng Chou.

Ethics declarations

Ethics approval and consent to participate

The study was approved by the Loma Linda University Research Affairs Human Research Protection Program Institutional Review Board (IRB#: 520338) and the Riverside University Health System Institutional Review Board (IRB#: 1689889). Both boards waived the requirement for informed consent because of the retrospective nature of the study. All methods were carried out in accordance with relevant guidelines and regulations.

Consent for publication

Not applicable.

Competing interest

The authors declared no conflict of interest associated with this study.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Leigh, R.M., Pham, A., Rao, S.S. et al. Machine learning for prediction of bronchopulmonary dysplasia-free survival among very preterm infants. BMC Pediatr 22, 542 (2022).


  • Bronchopulmonary dysplasia
  • Machine learning
  • Predictive modeling
  • Preterm infants