Diagnostic test accuracy of new generation tympanic thermometry in children under different cutoffs: a systematic review and meta-analysis.

BACKGROUND
The infrared tympanic thermometer (IRTT) is a popular method for temperature screening in children, but its accuracy and reproducibility relative to other measurements have been debated. This study aimed to identify and quantify studies reporting the diagnostic accuracy of the new generation IRTT in children, to compare the sensitivity and specificity of the IRTT under different cutoffs, and to identify the optimal cutoff.


METHODS
Articles were retrieved through a systematic search of PubMed, Web of Science Core Collection, and Embase, and were assessed for internal validity with the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool. The risk-of-bias figure was created with Review Manager 5.3 and data were synthesized with MetaDiSc 1.4.


RESULTS
Twelve diagnostic studies, involving 4639 pediatric patients, were included. The cut-offs varied from 37.0 °C to 38.0 °C among these studies. The cut-off of 37.8 °C had the highest sROC AUC (0.97) and Youden Index (0.83) and was deemed the optimal cutoff.


CONCLUSION
The optimal cutoff for infrared tympanic thermometers is 37.8 °C. New generation tympanic thermometry has high diagnostic accuracy in pediatric patients and can be an alternative for fever screening in children.

may cause emotional distress, and, although very rare, brings possible complications such as perforation or transmission of micro-organisms [5,10]. Therefore, infants, health workers and parents tend to be reluctant about the procedure [3].
The forehead skin thermometer (FST) and the infrared tympanic thermometer (IRTT) are popular alternatives to the traditional measures. The FST uses a sensor probe to measure the amount of infrared heat produced by the temporal arteries [8]. The IRTT detects the radiation of the tympanic membrane and the ear canal, which share their blood supply with the hypothalamus, the thermoregulatory center of the human body [11,12]. Both methods are safe, easy to use, comfortable and quick. Compared with the FST, however, the IRTT is more consistent with rectal temperature and is more convincing [3,8,13]. Using the aural temperature is less traumatic and allows faster triage [14], but its accuracy and reproducibility relative to other measurements have been debated [1,14-18].
Over the past years, however, the IRTT has been developed and updated, and some older versions have become obsolete. The new generation IRTT uses various brand-specific ways to enhance accuracy, for example, improved geometry and algorithms, a wider measurement angle, displaying a temperature based on multiple samples, and a heated probe [11,19]. Synthesizing studies of obsolete IRTT together with studies of the new devices is unreasonable and may underestimate IRTT test accuracy. Furthermore, the cutoffs of the IRTT used in fever detection are diverse, and there is no consensus on the optimal cut-off. The cutoff is the temperature threshold that divides pediatric patients into fever and non-fever groups, and the diagnostic accuracy of the IRTT varies under different cutoffs [3,13,20,21]. Synthesizing studies that apply different cutoffs is therefore inappropriate, and the results of such a synthesis are unreliable.
The aims of this systematic review were (1) to identify and quantify studies reporting the diagnostic accuracy of the new generation of IRTT in children (by new generation, we mean IRTT models that were still in production and on sale according to the official websites of the manufacturers when we started our study) and (2) to compare the sensitivity and specificity of the IRTT under different cutoffs and identify the optimal cutoff.

Search strategies
This systematic review and meta-analysis was conducted in accordance with the guidance of the Test Accuracy Working Group of the Cochrane Collaboration and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses of Diagnostic Test Accuracy Studies (PRISMA-DTA) statement [22,23]. Two trained reviewers (D.S. and LY.Z.) independently conducted a systematic literature search of multiple electronic databases (PubMed, Web of Science Core Collection, EMBASE) from inception to February 2, 2019. The following search terms were used in All Fields (PubMed, EMBASE) or Topic (Web of Science Core Collection): ((tympanic thermometer OR ear thermometer OR infrared thermometry OR ear thermometry OR tympanic scan OR tympanic temperature OR ear temperature OR infrared thermometer OR ear thermometer)) AND (pediatric OR child OR kid OR newborn OR baby OR infant OR toddler). The search was restricted to English-language publications and to human subjects. The bibliographies of included studies were also searched to identify additional studies.

Study selection
Observational studies detecting fever with aural and rectal thermometers were deemed acceptable. The inclusion criteria were (1) studies recruiting pediatric subjects (age < 18 years), (2) diagnostic test accuracy studies, (3) studies detecting fever with a new generation IRTT, and (4) studies using rectal thermometers as the reference standard. The exclusion criteria were (1) studies unrelated to the accuracy of the IRTT, (2) reviews, proceedings papers, meeting abstracts, letters, notes and editorial materials, and (3) studies lacking essential data.
Two reviewers (D.S. and LY.Z.) independently reviewed the titles and abstracts of these studies. Papers deemed to match the predefined inclusion criteria or without consensus were reviewed in full text. Disagreements were resolved through discussions and scientific consultations.

Quality assessment and data extraction
We adopted the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool [24] for quality assessment and used Review Manager 5.3 to create the figures of risk of bias and applicability concerns [25]. Two reviewers (D.S. and LY.Z.) independently assessed the methodological quality of the included studies, and disagreements were again resolved through discussions and scientific consultations.
The following data were extracted from the included studies by two independent reviewers (D.S. and LY.Z.): (1) descriptive aspects: primary author, year of publication, country, setting, age, types of tympanic thermometer and reference standard; (2) statistical aspects: the sample size, number of observations, the cut-off of the tympanic thermometer, the True Positive (TP), False Negative (FN), False Positive (FP) and True Negative (TN) counts, sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV).

Statistical analysis
Meta-analyses of TP, FN, FP and TN were performed with MetaDiSc 1.4 to compare the test accuracy of tympanic temperature against the gold standard (rectal temperature) [26]. Threshold analysis was conducted to evaluate the threshold effect [27]. The inconsistency index (I²) was used to estimate heterogeneity between studies, and I² > 75% was considered to indicate high heterogeneity [28]. Data were synthesized using the random-effects model, which is recommended for pooled estimates in diagnostic meta-analyses [29]. The area under the curve (AUC), the Youden index and the Q* index were used to measure test accuracy [30-32].
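As an illustration of the quantities named above, the following minimal Python sketch derives sensitivity, specificity, PPV, NPV and the Youden index from a single 2×2 table, and I² from Cochran's Q. It is not the MetaDiSc implementation and does not reproduce the pooling used in this review. The class `TwoByTwo`, the function `i_squared`, and all counts and Q values are hypothetical and chosen only for demonstration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """2x2 counts for one study: index test (IRTT) versus reference (rectal)."""
    tp: int  # fever on both tests
    fn: int  # fever on reference only
    fp: int  # fever on index test only
    tn: int  # no fever on either test

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    @property
    def youden(self) -> float:
        # Youden index J = sensitivity + specificity - 1
        return self.sensitivity + self.specificity - 1.0


def i_squared(cochran_q: float, n_studies: int) -> float:
    """I^2 in percent: share of variability beyond chance, floored at 0."""
    df = n_studies - 1
    if cochran_q <= 0:
        return 0.0
    return max(0.0, (cochran_q - df) / cochran_q) * 100.0


# Hypothetical counts, not taken from any included study.
example = TwoByTwo(tp=90, fn=10, fp=20, tn=80)
print(f"sensitivity={example.sensitivity:.2f}, specificity={example.specificity:.2f}")
print(f"PPV={example.ppv:.2f}, NPV={example.npv:.2f}, Youden={example.youden:.2f}")

# Hypothetical Cochran's Q of 20 across 3 studies.
print(f"I^2={i_squared(20.0, 3):.1f}%")
```

With the 75% threshold stated above, the hypothetical I² in this sketch would be classed as high heterogeneity.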

Selection process
Initially, 611, 468 and 276 articles were retrieved from PubMed, Web of Science Core Collection and EMBASE, respectively. Secondly, 332 duplicates were removed. Thirdly, the titles and abstracts of the remaining 1023 articles were examined and 975 articles were excluded for diverse reasons. Finally, 11 articles were selected after the full-text review and 1 article [33] was added by reviewing references. The process and outcome of the literature selection are presented in detail in Fig. 1.

Methodological quality of included studies
Fig. 3 shows the risk of bias and applicability concerns in different domains. Among the 12 included articles, 4 had a high risk of bias on "flow and timing", "patient selection", "index test", and "reference standard", indicating that the quality was moderate. Eight of the twelve studies had low applicability concerns in all domains, so the overall applicability concerns were low.

Characteristics of selected studies
The twelve included studies were published from 2010 to 2018. All of them applied the tympanic thermometer and used the rectal thermometer as the reference standard. The descriptive and statistical characteristics of the 12 studies are presented in Table 1 and Table 2, respectively.

Accuracy of tympanic thermometry in children under different cut-offs
The 12 studies involved 4639 children. The cut-off points varied. Among the included articles, 7 studies [5,8,18,33-36] set an optimal cut-off and the other 5 studies [3,13,14,20,21] analyzed the diagnostic test accuracy of tympanic thermometry under different cut-offs. The cut-off points ranged from 37.0°C to 38.0°C. Data from studies reporting the same cut-off were synthesized.

Accuracy under the cut-off of 37.0°C
Only one study [3] reported diagnostic test accuracy under the cut-off of 37.0°C. In this study, sensitivity, specificity, PPV, and NPV for ear temperature (37.0°C) were 0.89, 0.84, 0.91, and 0.81, respectively.

Accuracy under the cut-off of 37.25°C
Only one study [34] gave an optimal cut-off of 37.25°C; sensitivity, specificity, PPV, and NPV were 0.83, 0.86, 0.88, and 0.80, respectively.

Accuracy under the cut-off of 37.4°C
Only one study [20] reported diagnostic test accuracy under the cut-off of 37.4°C. In this study, sensitivity, specificity, PPV, and NPV for ear temperature (37.4°C) were 0.96, 0.36, 0.82, and 0.73, respectively.

Accuracy under the cut-off of 37.7°C
Only one study [20] reported diagnostic test accuracy under the cut-off of 37.7°C. In this study, sensitivity, specificity, PPV, and NPV for ear temperature (37.7°C) were 0.91, 0.60, 0.87, and 0.68, respectively.
The diagnostic test accuracy of tympanic thermometry under different cut-offs in the detection of pediatric fever is summarized in Table 3. The cut-off of 37.8°C had the highest sROC AUC and Youden Index and was deemed the optimal cutoff.
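The Youden index comparison can be illustrated with a short Python sketch that uses only the sensitivity and specificity values quoted in the text of this article: the single-study values at 37.0°C, 37.25°C, 37.4°C and 37.7°C, and the pooled values at 37.8°C reported in the Discussion. Cut-offs whose values appear only in Table 3 are not included, so this is a partial check and not a reproduction of the full analysis. The variable names are illustrative.

```python
# (sensitivity, specificity) per cut-off in °C, as quoted in the text.
reported = {
    "37.0": (0.89, 0.84),   # single study [3]
    "37.25": (0.83, 0.86),  # single study [34]
    "37.4": (0.96, 0.36),   # single study [20]
    "37.7": (0.91, 0.60),   # single study [20]
    "37.8": (0.92, 0.91),   # pooled estimate from three studies
}

# Youden index J = sensitivity + specificity - 1, rounded to two decimals.
youden = {cutoff: round(se + sp - 1.0, 2) for cutoff, (se, sp) in reported.items()}

for cutoff, j in youden.items():
    print(f"cut-off {cutoff} °C: Youden index = {j:.2f}")

best = max(youden, key=youden.get)
print(f"highest Youden index among these cut-offs: {best} °C")
```

Among the values listed here, the pooled estimate at 37.8°C gives a Youden index of 0.83, which agrees with the value reported in the abstract.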

Discussion
We conducted this study to assess the discriminant validity of the new generation IRTT for detecting pediatric fever determined by rectal thermometry and to find the optimal cutoff. Twelve studies, including 4639 children, were included. The results indicated that the IRTT was a good alternative to rectal thermometry in pediatric patients, and that the optimal cut-off of ear temperature for screening fever in children was 37.8°C. Under this cut-off, pooled sensitivity was 0.92 (95% CI 0.90-0.94), pooled specificity was 0.91 (95% CI 0.89-0.92), sROC AUC was 0.97 (SE = 0.02) and the Q* value was 0.91 (SE = 0.03).
One major strength of this study was that it estimated the test accuracy of the new generation IRTT. Although the IRTT may provide a good alternative to traditional measurements, its low reproducibility has been debated. However, since the ear thermometer first came out, it has been constantly updated and upgraded. Some techniques have been used to improve test accuracy; for example, in the Braun Welch Allyn Pro 4000 Thermoscan, a heating element in the sensor heats the probe tip to just below normal body temperature to avoid cooling the ear canal [19]. Improvements in geometry and algorithms have also been developed to ensure that the displayed result reflects the tympanic temperature accurately [11]. Hence, the newer versions of tympanic thermometers might meet clinicians' demands for improved repeatability in noninvasive temperature assessment. By new generation, we mean IRTT models that were still in production and on sale according to the official websites of the manufacturers when we started our study. We included the tympanic thermometers currently in use and excluded the outdated ones so that the results could provide a reference for current clinical practice.
Another strength of this study was that it estimated the test accuracy of the new generation IRTT under different cutoffs. Synthesizing data across different cutoffs may underestimate the test accuracy of the IRTT, because its diagnostic accuracy varies with the cutoff [3,13,20,21]. The cutoffs of the IRTT ranged from 37.0°C to 38.0°C among the 12 included studies. After synthesizing three studies, including 1795 children, we found that the optimal cut-off of tympanic thermometry was 37.8°C. Under this cutoff, the pooled sensitivity was 0.92 (95% CI 0.90-0.94), pooled specificity was 0.91 (95% CI 0.89-0.92), sROC AUC was 0.97 (SE = 0.02) and the Q* value was 0.91 (SE = 0.03).
The diagnostic accuracy in this study under the optimal cutoff was considerably higher than that in a former systematic review [27], in which pooled sensitivity was 0.70 (95% CI 0.68-0.72), pooled specificity was 0.86 (95% CI 0.85-0.88), sROC AUC was 0.94, and the Q* value was 0.87. Excluding articles that applied obsolete tympanic thermometers and analyzing diagnostic test accuracy separately under different cut-offs may be the major reasons for this gap.
The 12 included studies are similar in design: they share the same study type, study population, reference standard, and so on. Data were synthesized using the random-effects model. It should be underlined, however, that the statistical heterogeneity between the articles is very high, from 81.6% to 94.5%. The study populations are all children, aged from 0 to 18 years, but the age groups vary. For example, Duru et al. [35] enrolled neonates with a mean age of 6.63 ± 6.98 days, while Allegaert et al. [5] enrolled children with a median age of 3.2 years (range 0.02 to 17 years). The variation in age groups may be the major contributor to the high heterogeneity, and further studies focusing on different age groups are needed.
Although the results of our study can provide an important reference for subsequent research and clinical applications, there are two limitations in the present study. We performed different sub-group meta-analyses based on the different cut-offs used. Unfortunately, many of these analyses included only a limited number of studies. We concluded that 37.8°C was the optimal cut-off on the basis of just three studies, which may seem unconvincing. However, considering that 1795 subjects were included in the analysis under the cut-off of 37.8°C, the conclusion is more convincing than the number of studies alone suggests.
According to the findings, ear canal temperature can be confidently implemented as a screening measure in pediatric fever detection. This application of the IRTT would effectively decrease the number of children who require the rectal temperature method for fever detection [7]. However, there are some situations, such as uncertain diagnosis [7], exercise [37,38], and changes in environmental temperature [39], in which tympanic temperature should not be used as a surrogate for rectal temperature.

Conclusion
Tympanic thermometry has high diagnostic accuracy and is a good alternative for temperature screening in pediatric patients. The optimal cut-off of ear temperature for screening fever in children is 37.8°C. Tympanic thermometry may not be an alternative to rectal temperature after intense exercise or in exertional heat stroke.