Quality of EHR data extractions for studies of preterm birth in a tertiary care center: guidelines for obtaining reliable data

Table 2 Demographic parameters compared

	1. Manually abstracted database, # of subjects	2. EHR extraction, # of subjects	3. Discrepancy (% and # of subjects) between the databases ^a	4. Manually abstracted database errors	5. EHR-extracted data errors	6. Median discrepancy	7. Discrepancy range
Gestational age	1772	700	2.6 % (18)	1.0 % (7)	1.3 % (9)	1 week	1–10 weeks
Birthweight	1772	735	9.7 % (71)	1.5 % (11)	8.0 % (59) ^{c ****}	13 g	2–548 g
Neonate race ^b	1758	1384	3.2 % (44)	!-	!-	NA	NA
Neonate ethnicity	1757	596	1.5 % (9)	!-	!-	NA	NA
Mother race ^b	1749	1378	3.2 % (45)	!-	!-	NA	NA
Mother ethnicity	1739	595	5.0 % (30)	!-	!-	NA	NA

Demographic parameters compared in the paper. The denominator for the percentage is the smaller of the corresponding values in the first two columns
!- indicates that EHR manual review data could not be used as a gold standard: values were often recorded as unknown or null, whereas the manually collected data were based on patient interviews and were more detailed. *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001
^a - In general, the sum of the error counts in columns 4 and 5 does not equal the number in column 3, because some errors occurred in both the manually abstracted and the electronically extracted data, or their cause was ambiguous
^b - Re-calculated discrepancies after adjusting for the inappropriate Hispanic category in the race column
^c - Difference statistically significant, p = 4.3 × 10⁻⁹ by Chi-square test
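
The arithmetic behind the table notes can be sketched as follows. This is an illustrative reconstruction, not the authors' analysis code: the function names are hypothetical, and the 2 × 2 layout for the birthweight comparison (11 versus 59 errors, each out of the 735 subjects in the smaller database, Pearson chi-square without continuity correction) is an assumption, since the table does not state the exact contingency table used.

```python
import math


def discrepancy_percent(n_manual, n_ehr, n_discrepant):
    """Percentage of discrepant subjects, using the smaller of the two
    database counts as the denominator (rule stated in the table caption)."""
    denominator = min(n_manual, n_ehr)
    return 100.0 * n_discrepant / denominator


def chi_square_2x2(errors_a, errors_b, n):
    """Pearson chi-square test (1 degree of freedom, no continuity
    correction) comparing two error counts, each out of n subjects.
    ASSUMPTION: both groups share the same denominator n."""
    a, b = errors_a, n - errors_a
    c, d = errors_b, n - errors_b
    total = a + b + c + d
    chi2 = total * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )
    # For 1 degree of freedom, the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Column 3 for the first two rows of the table.
gestational_age = discrepancy_percent(1772, 700, 18)
birthweight = discrepancy_percent(1772, 735, 71)
print(f"Gestational age discrepancy: {gestational_age:.1f} %")
print(f"Birthweight discrepancy: {birthweight:.1f} %")

# Birthweight errors: manually abstracted (11) versus EHR-extracted (59).
chi2, p_value = chi_square_2x2(11, 59, 735)
print(f"chi2 = {chi2:.2f}, p = {p_value:.1e}")
```

Under these assumptions the percentages round to the values printed in column 3 for these two rows, and the p-value falls in the same order of magnitude as the one reported in footnote c. An exact match is not expected unless the authors used the same contingency table and test variant.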

ISSN: 1471-2431