Study design
This was a prospective, multi-center, observational study conducted in the outpatient clinics and inpatient newborn units at three U.S. hospitals (Children’s Mercy, Kansas City, MO; Hershey Medical Center, Hershey, PA; Wesley Medical Center, Wichita, KS).
Device
Details of the method on which the babyTAPE device (i.e., the study device) is based have been previously published [10]. The babyTAPE device is a flexible, paper-based strip printed on both sides, with one side (yellow) designated for head circumference measurements and the other (blue) designated for chest circumference measurements (Fig. 1). The “start” end on each side of the device is marked with a large contrasting triangle. Along the length of the babyTAPE are 1-centimeter (cm) “bins” of alternating color that have additional markings which correspond to fractional weight values. The estimated weight of the infant (in hundredths of a kilogram) is obtained by summing the fractional weights derived from the two measurements. Since raters had knowledge of the infant’s weight, a masked version of the device replaced fractional weight values with arbitrary alphanumeric characters to minimize the potential for bias. The characters were organized so that we could discern whether the correct side and the correct starting end of the investigational device were used to perform the measurements. The code was broken only after enrollment closed at all participating sites. Study devices were printed on paper and checked against a National Institute of Standards and Technology (NIST)-certified ruler in compliance with International Organization for Standardization (ISO) 9000 standards.
Participant infants
Infants of any gestational age who presented to the participating institutions at 0–90 days of life were eligible for enrollment. Infants were stratified into 9 postmenstrual age blocks to ensure balanced enrollment and an even distribution of participants across weight and length. Participants were excluded if they had known or apparent anatomical deformities, if they had external medical equipment that would impair the determination of actual weight, if they were incapable of having the measurements performed, or if the investigator or treating physician perceived contraindications to their inclusion. All infants were enrolled with informed parental permission under a protocol that was reviewed and approved by the Institutional Review Boards of the respective study sites.
Participant raters
Study raters were required to qualify for participation by demonstrating accuracy and reproducibility in measuring head and chest circumference. Prospective raters made three sets of non-sequential measurements on three infant-sized mannequins using a standard tape measure and the babyTAPE. Pre-study training was provided to ensure that the raters could identify the correct anatomic landmarks and read a standard tape measure. However, specific training on application of the babyTAPE was not provided. Rather, raters were provided with the babyTAPE “Instructions for Use” (Additional file 1) and evaluated according to their ability to execute the measurements based on these instructions. Intra-rater variance was calculated for each rater and could not exceed 5% for any measurement. Raters who failed qualification were remediated and given the opportunity to repeat the qualification. Raters who failed the second qualification were not permitted to participate in the study.
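The qualification rule can be illustrated with a short Python sketch. The protocol does not state how intra-rater variance was expressed as a percentage; the sketch assumes the coefficient of variation (standard deviation divided by the mean) of a rater's repeated measurements, and the function names and example values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation_pct(repeats):
    """Sample SD of repeated measurements as a percentage of their mean.

    Assumption: the source does not define "intra-rater variance";
    the coefficient of variation is used here as one plausible reading.
    """
    return stdev(repeats) / mean(repeats) * 100


def rater_qualifies(measurement_sets, max_cv_pct=5.0):
    """True if no set of repeated measurements exceeds the 5% limit."""
    return all(
        coefficient_of_variation_pct(repeats) <= max_cv_pct
        for repeats in measurement_sets
    )


# Hypothetical example: three repeat readings (mm) on two mannequins.
consistent = [[350, 352, 348], [301, 300, 299]]
inconsistent = [[300, 350, 400]]
print(rater_qualifies(consistent), rater_qualifies(inconsistent))
```

Under this reading, a rater passes only if every repeated measurement set stays at or below the limit.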
Measurements
At enrollment we recorded participants’ gestational age, postnatal age, sex, race, and ethnicity. Weight was determined using a calibrated infant scale after removing clothing and diapers. Length was obtained using standard medical equipment available in patient care areas of the participating institution. Circumferential measures (in millimeters) were performed with a standard vinyl tape measure that was checked against a NIST-certified ruler. Chest circumference was determined with the infant’s arms extended outward to shoulder level and the tape measure placed under the axilla and around the chest, passing by the xiphoid process at the level of the nipple. Every effort was made to record chest circumference at the end of exhalation. To obtain head circumference, the tape measure was placed around the infant’s head so that it lay across the frontal bones, slightly above the eyebrows and ears, over the occipital prominence at the back of the head, perpendicular to the long axis of the face. The same technique for measuring head and chest circumference was applied for both the standard tape measure and the babyTAPE. All measurements were performed at the same time by a single rater. Approximately 10% of participants at each site were selected for multi-rater assessment to examine inter-rater reliability. After completion of measurements, infants were observed for an additional 10 minutes to assess study-related adverse device effects (ADEs).
After performing all study-related measures, subjective assessments of the babyTAPE were provided by each rater for each enrolled child via the following questions: Did you have any trouble identifying the proper landmarks on the infant? Could you correctly identify the proper starting ends of the babyTAPE for the measurements? Did you experience any difficulty performing the measurements on this infant with the babyTAPE? Were the circumference markings on the babyTAPE easy for you to read? Using the same babyTAPE you just used, how confident are you that you would obtain the same readings if you repeated them right now on the same infant?
Data analysis
Three critical tasks were defined a priori: 1) identification of the correct anatomic landmarks, 2) proper use and orientation of the device, and 3) accurate observation and recording of the device outputs. Critical task #1 was examined by determining whether recorded chest and head circumferences measured using the babyTAPE were within the expected range for each participant’s age and weight. Chest circumference measurements were compared to reference data collected in a published anthropometric study [13]. Head circumference measurements were compared to the Centers for Disease Control and Prevention National Health and Nutrition Examination Survey reference data [14]. Absolute z-scores of >3 were classified as extreme outliers and examined, in conjunction with the infant’s z-scores for length and weight, for possible misidentification of anatomical landmarks.
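The outlier screen for critical task #1 amounts to a z-score threshold, which can be sketched in Python as follows. The reference mean and standard deviation used in the example are hypothetical placeholders, not values taken from the cited reference data [13, 14].

```python
def z_score(value, ref_mean, ref_sd):
    """Standard score of a measurement against age-matched reference data."""
    return (value - ref_mean) / ref_sd


def is_extreme_outlier(value, ref_mean, ref_sd, threshold=3.0):
    """Flag measurements whose absolute z-score is strictly greater than 3."""
    return abs(z_score(value, ref_mean, ref_sd)) > threshold


# Hypothetical reference values (cm), for illustration only.
REF_MEAN_CM, REF_SD_CM = 35.0, 1.5
for head_circumference_cm in (36.0, 39.5, 41.0):
    flagged = is_extreme_outlier(head_circumference_cm, REF_MEAN_CM, REF_SD_CM)
    print(head_circumference_cm, flagged)
```

A flagged measurement is only a candidate for landmark misidentification; as described above, it would still be reviewed alongside the infant's z-scores for length and weight.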
Critical task #2 was examined by determining whether measurements recorded for the babyTAPE and reference tape were concordant. Reference tape measurements were binned to the nearest centimeter before evaluation. Discrepancies of ≥3 bins were taken to indicate errors in the measurement and/or recording of reference tape or babyTAPE values. In these cases, the estimated weight assigned by each device was evaluated in an attempt to identify the clinical significance of the erroneous measurement.
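The concordance check for critical task #2 can be sketched in Python as below. The sketch assumes that reference measurements are recorded in millimeters, that "nearest centimeter" means rounding half up, and that the babyTAPE reading is available as an integer bin number in centimeters; the function names are hypothetical.

```python
import math


def bin_to_nearest_cm(measurement_mm):
    """Round a millimeter measurement to the nearest whole centimeter.

    Assumption: ties round up (345 mm -> 35 cm). math.floor is used
    instead of round() to avoid Python's round-half-to-even behaviour.
    """
    return math.floor(measurement_mm / 10 + 0.5)


def is_discordant(reference_mm, babytape_bin_cm, min_bins=3):
    """True when the two devices differ by at least three 1-cm bins."""
    return abs(bin_to_nearest_cm(reference_mm) - babytape_bin_cm) >= min_bins


# Hypothetical readings: reference tape 347 mm versus babyTAPE bins 37 and 38.
print(is_discordant(347, 37), is_discordant(347, 38))
```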
Critical task #3 was examined by determining whether recorded values for babyTAPE circumference measurements appeared on the device as printed and appeared on the side of the device indicated by the user. The number and percentage of measurements with observation or recording errors were summarized.
The predictive performance of the babyTAPE was established by comparing the babyTAPE-predicted weights with weights measured on a calibrated medical scale. The differences between these measurements were summarized using statistics that included the mean error, mean squared error, and proportion within 10% and 15% of actual weight. Parameter estimates and 95% confidence intervals were compared with results from the validation study of the method that the babyTAPE embodies.
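A minimal Python sketch of these summary statistics is given below. It assumes that error is defined as predicted minus actual weight in kilograms and that "within 10%" includes the boundary; neither convention is stated explicitly in the protocol, and the example weights are hypothetical.

```python
def predictive_performance(predicted_kg, actual_kg):
    """Summarize agreement between predicted and scale-measured weights."""
    if len(predicted_kg) != len(actual_kg) or not actual_kg:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual_kg)
    # Assumption: error is predicted minus actual, in kilograms.
    errors = [p - a for p, a in zip(predicted_kg, actual_kg)]
    pct_errors = [abs(e) / a * 100 for e, a in zip(errors, actual_kg)]
    return {
        "mean_error": sum(errors) / n,
        "mean_squared_error": sum(e * e for e in errors) / n,
        "within_10pct": sum(x <= 10 for x in pct_errors) / n,
        "within_15pct": sum(x <= 15 for x in pct_errors) / n,
    }


# Hypothetical weights (kg), for illustration only.
summary = predictive_performance([3.0, 4.2, 5.0, 2.2], [3.0, 4.0, 4.0, 2.5])
print(summary)
```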
Between-user variability was examined by comparing the babyTAPE and reference tape circumference measures from the multi-rater assessment. Between-user variability was characterized through estimation of the intraclass correlation coefficient (ICC), the accompanying 95% confidence intervals, and the proportion of observations ≥10% apart. Agreement between estimated weight and actual weight was determined using Bland-Altman plots with log-transformation. IBM SPSS version 24 (IBM Corp., Armonk, NY) and SAS (version 9.3, SAS Institute, Inc., Cary, NC) were used for all analyses.
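For readers unfamiliar with Bland-Altman analysis on the log scale, the following Python sketch shows one common formulation: differences of natural logarithms are summarized by their mean and 95% limits of agreement (mean ± 1.96 SD), which are then back-transformed to ratios of estimated to actual weight. This is an illustrative assumption about the calculation, not the SPSS or SAS procedure used in the study, and the example weights are hypothetical.

```python
import math
from statistics import mean, stdev


def bland_altman_log(estimated, actual):
    """Bias and 95% limits of agreement on the log scale, as ratios."""
    diffs = [math.log(e) - math.log(a) for e, a in zip(estimated, actual)]
    bias = mean(diffs)
    sd = stdev(diffs)
    lower, upper = bias - 1.96 * sd, bias + 1.96 * sd
    # exp() converts log differences back to estimated/actual ratios.
    return {
        "ratio_bias": math.exp(bias),
        "ratio_lower": math.exp(lower),
        "ratio_upper": math.exp(upper),
    }


# Hypothetical weights (kg), for illustration only.
print(bland_altman_log([2.2, 3.6, 5.1], [2.0, 4.0, 5.0]))
```

A ratio of 1.0 indicates no systematic difference; limits of agreement expressed as ratios can be read as the proportional over- or underestimation expected for most infants.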
Sample size calculation
Sample size was estimated based on the ability to discriminate device-estimated weight from actual weight. Assuming that the observed proportion of estimated weights differing from actual weight by ≤10% is 0.8, the two-sided 95% Wilson score confidence interval is (0.76, 0.83) with 460 participants after accounting for drop-outs. Additionally, selection of at least 50 participants for multi-rater assessment provides >80% power to conclude that the ICC is >0.8 when the true ICC is 0.9.
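The stated interval can be reproduced with the standard Wilson score formula, sketched below in Python. The sketch assumes that 460 is the number of participants entering the calculation; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def wilson_interval(p_hat, n, confidence=0.95):
    """Two-sided Wilson score confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    )
    return center - half_width, center + half_width


# Assumption: observed proportion 0.8 among 460 analyzed participants.
lower, upper = wilson_interval(0.8, 460)
print(round(lower, 2), round(upper, 2))
```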