The value of urinary gonadotropins in the diagnosis of central precocious puberty: a meta-analysis
BMC Pediatrics volume 22, Article number: 453 (2022)
The gonadotropin-releasing hormone (GnRH) stimulation test is time-consuming, invasive, and costly. However, it is the diagnostic gold standard for central precocious puberty (CPP), which in girls is defined as the onset of secondary sexual characteristics before the age of 8 years accompanied by breast buds, accelerated growth, and advanced bone age. This meta-analysis was performed to compare the diagnostic value of urinary gonadotropins and the GnRH stimulation test for CPP.
We searched six databases for relevant literature. In accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines, we estimated the sensitivity, specificity, area under the summary receiver operating characteristic curve (AUC), and publication bias.
Six eligible trials fulfilled the inclusion criteria. In the meta-analysis of urinary luteinizing hormone (ULH), after excluding the data of one study, we obtained an AUC of 0.90 (sensitivity = 0.81, specificity = 0.85). The meta-analysis of the ULH to urinary follicle-stimulating hormone (UFSH) ratio revealed an AUC of 0.8661 (sensitivity = 0.79, specificity = 0.84).
Both the ULH level and ULH:UFSH ratio are effective and available approaches for CPP diagnosis.
Puberty is a complex progression of hormonal changes that leads to mature reproductive capacity. Its onset is triggered by the pulsatile release of gonadotropin-releasing hormone (GnRH). With socioeconomic development and improved living conditions, the age at pubertal onset has advanced worldwide, and the incidence and morbidity of precocious puberty are increasing annually [1,2,3]. Precocious puberty is more common in girls than in boys. In girls, the onset of secondary sexual characteristics before the age of 8 years is considered precocious puberty. It is divided into two types: central precocious puberty (CPP) and peripheral precocious puberty. CPP may result in accelerated growth and early menarche, which can lead to a decreased final adult height and to psychological and health problems in adulthood, such as diabetes and cardiovascular disease [5, 6]. Early and accurate diagnosis of CPP is therefore especially important.
In addition to examination of secondary sexual characteristics and bone age, the GnRH stimulation test has been indispensable in the diagnosis of CPP. However, although this test requires only three blood collections (at 0, 30, and 60 min), it is often performed with five samples, adding the 90- and 120-min time points. The test is invasive, time-consuming, and expensive, and patient cooperation is sometimes difficult to obtain. To explore convenient and accurate diagnostic procedures for CPP, many researchers have assessed the value of urinary gonadotropins measured in first-voided urine samples and random urine samples. Both types of samples have been used to evaluate the levels of urinary luteinizing hormone (ULH) and urinary follicle-stimulating hormone (UFSH). First-voided urine is collected from patients who have been instructed to empty their bladder before going to bed and to refrain from voiding until the next morning. Random urine samples are collected at any time during the period of the GnRH stimulation test.
To date, many authors have found that urinary gonadotropin measurement is a potential alternative approach for the diagnosis of CPP. However, this remains controversial because of the absence of unified standards and evidence-based support for this approach.
Nocturnal ULH and UFSH can represent gonadotropin excretion in children with normal and early puberty. Therefore, this meta-analysis was performed to assess the value of first-voided ULH and the ratio of ULH to UFSH in the diagnosis of female CPP and to compare the accuracy between urinary gonadotropins and serum GnRH-stimulated gonadotropins.
This meta-analysis is reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. The protocol for this systematic review was registered on INPLASY (registration number: INPLASY 2021120076) and is available in full on inplasy.com (https://doi.org/10.37766/inplasy2021.12.0076).
We searched the databases of PubMed, Embase, Cochrane Library, Web of Science, China National Knowledge Infrastructure (CNKI), and Wanfang for relevant literature published until 7 December 2020 in all languages. The search was performed using a combination of keywords and free words. The keywords were “puberty, precocious,” “urinary luteinizing hormone,” and “urofollitropin.” According to the PRISMA diagnostic test accuracy guidelines, the keywords regarding research methods were “sensitivity” and “accuracy.” Each keyword and its free words were combined with “or.” The different keywords were combined with “and.”
The inclusion criteria were as follows.
The study was designed as a diagnostic test study, and the results had been published.
All patients in the study were female with Tanner stage ≥ 2 breast development, advanced bone age by ≥ 1 year, and accelerated growth.
Urine and serum samples were collected for gonadotropin measurement on the same day for each patient.
First-voided urine was used for all urinary samples. For reliable evaluation of urinary gonadotropins, all patients had been instructed to empty their bladder before going to bed and to refrain from voiding until the next morning.
The gold standard was a serum LH level of ≥ 5 mIU/L or a serum LH:FSH ratio of > 0.6.
Studies that did not meet the inclusion criteria were excluded. Additional exclusion criteria were fundamental experimental studies, such as animal studies; reviews, conference reports, repeatedly published studies, summary articles, and case reports; and studies without sufficient data.
Literature quality assessment
Literature quality assessments were performed by two independent investigators, who also extracted and collated the data independently. When disagreements arose, the investigators discussed the study until a consensus was reached. The assessment was based on the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool.
Microsoft Excel 2010 (Microsoft Corp., Redmond, WA, USA) was used for data extraction. The following information required for this meta-analysis was extracted: first author, publication year, nationality, sex, age, ethnicity, number of participants in the CPP group and control group, gold standard test results, and sensitivity, specificity, true-positive, false-positive, true-negative, false-negative, and accuracy measures.
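The extracted true-positive, false-positive, false-negative, and true-negative counts determine the other accuracy measures listed above. A minimal Python sketch of that relationship follows; the `TwoByTwo` class, the function name, and the counts are hypothetical and are not data from any included study.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts of one diagnostic 2x2 table (index test vs. gold standard)."""
    tp: int  # index test positive, gold standard positive
    fp: int  # index test positive, gold standard negative
    fn: int  # index test negative, gold standard positive
    tn: int  # index test negative, gold standard negative


def accuracy_measures(t: TwoByTwo) -> dict:
    """Derive the standard accuracy measures from the four cell counts."""
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    plr = sensitivity / (1 - specificity)      # positive likelihood ratio
    nlr = (1 - sensitivity) / specificity      # negative likelihood ratio
    dor = plr / nlr                            # diagnostic odds ratio
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "plr": plr,
        "nlr": nlr,
        "dor": dor,
    }


# Hypothetical counts, for illustration only.
example = TwoByTwo(tp=40, fp=8, fn=10, tn=42)
measures = accuracy_measures(example)
print({name: round(value, 3) for name, value in measures.items()})
```

The diagnostic odds ratio computed this way equals (TP × TN) / (FP × FN), which gives a quick consistency check on extracted tables.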
Meta-DiSc version 1.4 (http://www.hrc.es/investigacion/metadisc_en.htm) was used to evaluate threshold effects and heterogeneity. We performed Cochran’s Q test to detect heterogeneity and used the inconsistency index (I2) to quantify its magnitude among studies. If the P-value of Cochran’s Q test was > 0.1 and the I2 value was < 50%, heterogeneity was considered not significant, and a fixed-effects model was used. If the P-value was < 0.1 or I2 was > 50%, heterogeneity was considered substantial, and a random-effects model was used. The pooled sensitivity, pooled specificity, pooled diagnostic odds ratio, pooled positive likelihood ratio (PLR), and pooled negative likelihood ratio (NLR) were calculated with Meta-DiSc, together with their P-values and 95% confidence intervals (CIs). The area under the summary receiver operating characteristic curve (AUC) was also calculated. Sensitivity analysis and publication bias assessment were performed with STATA version 15.0 (StataCorp, College Station, TX, USA). If > 10 articles were included in this meta-analysis, we applied Deeks’ funnel plot to assess the extent of potential publication bias. Otherwise, we used the Begg rank correlation test and Egger test to analyze publication bias.
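The model-selection rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Meta-DiSc implementation: the effect size is taken to be the log diagnostic odds ratio with inverse-variance weights, 0.5 is added to every cell, the study counts are hypothetical, and the P-value of Cochran's Q is passed in rather than computed because the standard library has no chi-square distribution.

```python
import math


def log_dor(tp, fp, fn, tn):
    """Return (ln DOR, variance); 0.5 is added to every cell (an assumption)."""
    a, b, c, d = tp + 0.5, fp + 0.5, fn + 0.5, tn + 0.5
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def cochran_q_and_i2(effects, variances):
    """Cochran's Q, its degrees of freedom, and I2 (in percent)."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


def choose_model(p_value, i2_percent):
    """Fixed effects only if P > 0.1 and I2 < 50%; otherwise random effects."""
    if p_value > 0.1 and i2_percent < 50.0:
        return "fixed"
    return "random"


# Hypothetical (tp, fp, fn, tn) counts for three studies.
studies = [(40, 8, 10, 42), (35, 6, 15, 44), (45, 12, 5, 38)]
effects, variances = zip(*(log_dor(*s) for s in studies))
q, df, i2 = cochran_q_and_i2(effects, variances)
print(f"Q = {q:.2f} on {df} df, I2 = {i2:.1f}%")
```

The text does not say which model applies when P or I2 falls exactly on a cut-off; the sketch assigns those boundary cases to the random-effects model.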
Baseline characteristics of included studies
We identified 310 candidate studies published before 7 December 2020. Considering that the prevalence of true precocious puberty has changed over the years, 70 studies published before 2000 were excluded. After reviewing the titles and abstracts of the remaining 240 studies, we deleted 76 duplicate studies; 15 reviews, editorials, or systematic analyses; 3 animal experiments; and 133 studies with irrelevant content. Thus, 11 full papers were reviewed. Finally, the eligibility criteria were fulfilled by 6 studies [11,12,13,14,15,16] involving 491 participants (Fig. 1). Potential bias was identified for all of the included studies. The main sources of bias were index tests, which might have introduced systematic error. The six trials were all published from 2012 to 2019 (three were published in 2019). The baseline characteristics of the included studies are shown in Table 1. Among the six included studies, diagnostic testing by ULH was assessed in five studies [11,12,13,14, 16]. Diagnostic testing by the ULH/UFSH ratio was performed in five studies, but one did not use first-voided urine; therefore, four studies [11, 13, 15, 16] were ultimately assessed. We did not assess the diagnostic value of the UFSH level for CPP because many studies have indicated that the serum FSH level alone does not have diagnostic significance.
The first included study evaluated both first-voided and random urinary gonadotropin levels; however, we collected only the first-voided urine results. Only the value of ULH was examined in the second included study. In the fourth study, first-voided urine samples were obtained on the day of the GnRH stimulation test and on the day before it, so that the urinary LH level was examined twice. In that study, the ULH:UFSH ratio was calculated using 4-h urine samples collected during the GnRH stimulation test period. Therefore, in accordance with the inclusion criteria, this meta-analysis collected only the sensitivity and specificity of ULH on the day that the stimulation test was conducted. The fifth study included only the ULH:UFSH ratio; this was a 6-month follow-up study of girls with premature thelarche, and GnRH stimulation tests were performed at the beginning and end of the study period. Ultimately, we included the data after the 6-month follow-up, which were used to differentiate between CPP and premature thelarche. Half of the control groups in these studies included girls without precocious puberty, and half included girls with premature thelarche. The meta-analysis of the diagnostic value of ULH for CPP was performed using five records (nos. 1, 2, 3, 4, and 6), and the meta-analysis of the diagnostic value of the ULH:UFSH ratio for CPP was performed using four records (nos. 1, 3, 5, and 6).
Meta-analysis of urinary LH for diagnosis of CPP
All relevant information from the five included studies is shown in Table 2.
In Meta-DiSc version 1.40, the Spearman correlation coefficient between the sensitivity logarithm and the (1 − specificity) logarithm was 0.30 (P = 0.62 > 0.05), indicating that there was no threshold effect. Furthermore, the symmetrical summary receiver operating characteristic curve showed no “shoulder-arm shape,” which further indicated that there was no threshold effect.
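The threshold-effect check can be sketched with a hand-written Spearman coefficient in Python. Because Spearman's coefficient depends only on ranks, it is the same whether sensitivity and (1 − specificity) are used raw or after a logarithm or logit transformation. The per-study values below are hypothetical, and the helper names are illustrative only.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def logit(p):
    return math.log(p / (1 - p))


# Hypothetical per-study sensitivity and false-positive rate (1 - specificity).
sens = [0.70, 0.80, 0.85, 0.90, 0.75]
fpr = [0.10, 0.15, 0.20, 0.30, 0.25]

rho_raw = spearman(sens, fpr)
rho_logit = spearman([logit(s) for s in sens], [logit(f) for f in fpr])
print(rho_raw, rho_logit)
```

A strong positive coefficient would suggest that studies trade sensitivity against specificity by using different cut-offs; the P-value reported in the text would additionally require a significance test, which this sketch omits.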
Heterogeneity and inconsistency assessments
Based on a P-value of Cochran’s Q test of > 0.1 and I2 value of < 50%, a fixed-effects model was used (Fig. 2). Forest plots of the pooled sensitivity, pooled specificity, and pooled PLR and NLR are presented in Fig. 2: pooled sensitivity = 0.79 (95% CI = 0.73–0.84), pooled specificity = 0.84 (95% CI = 0.78–0.88), pooled PLR = 4.34 (95% CI = 3.24–5.81), and pooled NLR = 0.26 (95% CI = 0.20–0.35). The AUC was 0.8812, and the Q index was 0.8117 (Fig. 3).
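The reported AUC and Q index can be cross-checked against each other. For a symmetric summary ROC curve with a constant diagnostic odds ratio (DOR), the Q index (the point on the curve where sensitivity equals specificity) equals √DOR / (1 + √DOR), and the AUC equals DOR / (DOR − 1)² × [(DOR − 1) − ln DOR]. The Python sketch below assumes that symmetric model; whether Meta-DiSc derived the figures in exactly this way is an assumption, and the function names are illustrative.

```python
import math


def dor_from_q_index(q_star):
    """Invert Q* = sqrt(DOR) / (1 + sqrt(DOR)) for a symmetric SROC curve."""
    root = q_star / (1.0 - q_star)
    return root * root


def auc_symmetric_sroc(dor):
    """AUC of a symmetric SROC curve with a constant diagnostic odds ratio."""
    if dor == 1.0:
        return 0.5  # chance-level curve (limit of the formula as DOR -> 1)
    return dor / (dor - 1.0) ** 2 * ((dor - 1.0) - math.log(dor))


# Q index reported for ULH in this section.
implied_dor = dor_from_q_index(0.8117)
implied_auc = auc_symmetric_sroc(implied_dor)
print(round(implied_dor, 2), round(implied_auc, 4))
```

Under this assumption, the reported Q index of 0.8117 implies an AUC of about 0.881, which agrees with the reported AUC of 0.8812.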
The sensitivity analysis indicated that three studies in this meta-analysis might have introduced bias, namely those of Shim et al., Yang et al., and Chen et al. Subsequent sensitivity analyses were therefore conducted, and the results showed that the fourth record had the largest effect (Table 3).
Only five articles were included in the meta-analysis of ULH; therefore, we applied the Begg rank correlation test (P = 0.81) and Egger linear regression test (P = 0.96), which indicated that no publication bias existed.
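As a rough illustration of the Egger linear regression test, the Python sketch below regresses each study's standardized effect (effect divided by its standard error) on its precision (1 divided by the standard error); an intercept far from zero suggests small-study asymmetry. The choice of the log diagnostic odds ratio as the effect size and the numbers are assumptions for illustration, and the P-value is omitted because it requires a t distribution that the standard library does not provide.

```python
import math


def egger_regression(effects, std_errors):
    """Return (intercept, SE of intercept, degrees of freedom).

    Ordinary least squares of effect/SE on 1/SE; needs at least 3 studies.
    """
    y = [e / s for e, s in zip(effects, std_errors)]  # standardized effects
    x = [1.0 / s for s in std_errors]                 # precisions
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_intercept = math.sqrt(residual_ss / (n - 2) * (1.0 / n + mean_x ** 2 / sxx))
    return intercept, se_intercept, n - 2


# Hypothetical ln(DOR) values and standard errors for five studies.
ln_dor = [2.9, 3.1, 2.4, 2.2, 3.3]
se = [0.45, 0.50, 0.38, 0.42, 0.60]
intercept, se_intercept, df = egger_regression(ln_dor, se)
print(round(intercept, 3), round(se_intercept, 3), df)
```

The test statistic is the intercept divided by its standard error, compared against a t distribution with n − 2 degrees of freedom. With only four or five studies, as here, such a test has limited power.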
Meta-analysis of ULH:UFSH ratio for diagnosis of CPP
All relevant information from the four included studies is shown in Table 4.
Meta-DiSc version 1.40 showed that the Spearman correlation coefficient between the sensitivity logarithm and the (1 − specificity) logarithm was − 0.40 (P = 0.60 > 0.05), indicating that no threshold effect existed. The symmetrical summary receiver operating characteristic curve showed no “shoulder-arm shape,” which further indicated that there was no threshold effect.
Heterogeneity and inconsistency assessments
Heterogeneity was estimated by the Q value and I2 test. As shown in Fig. 4, P = 0.00 and I2 = 89.70% > 50%, indicating that heterogeneity existed. We performed a random-effects model. All forest plots of the meta-analysis are shown in Fig. 4: pooled sensitivity = 0.79 (95% CI = 0.72–0.84), pooled specificity = 0.84 (95% CI = 0.78–0.89), pooled PLR = 3.80 (95% CI = 1.22–11.83), and pooled NLR = 0.29 (95% CI = 0.14–0.61). The AUC was 0.8661 and the Q index was 0.7966 (Fig. 5).
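When heterogeneity is present, a random-effects model widens the confidence interval by adding a between-study variance to each study's own variance. A common estimator is that of DerSimonian and Laird; the Python sketch below assumes that estimator applied to log-transformed effect sizes such as ln(PLR). The inputs are hypothetical and the code is not taken from Meta-DiSc.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooling; returns (pooled effect, standard error, tau^2)."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)              # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


# Hypothetical ln(PLR) values with equal within-study variances.
pooled, se, tau2 = dersimonian_laird([0.2, 1.5, 2.5], [0.05, 0.05, 0.05])
low, high = pooled - 1.96 * se, pooled + 1.96 * se
print(f"pooled PLR = {math.exp(pooled):.2f} "
      f"(95% CI {math.exp(low):.2f} to {math.exp(high):.2f}), tau^2 = {tau2:.2f}")
```

When the studies agree, the estimated between-study variance is zero and the result coincides with the fixed-effects estimate. The wide intervals reported for the pooled PLR and NLR of the ULH:UFSH ratio are what one would expect when this variance is large.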
The sensitivity analysis indicated that two of the included studies might have introduced bias, namely those of Shim et al. and Yang et al. We therefore reanalyzed the data while excluding the relevant records one at a time. After the data from the first record were deleted, a threshold effect was found; therefore, we did not pursue this analysis further. After the data from the second influential record were eliminated, we obtained a P-value of > 0.5 and I2 of < 50%; therefore, we used a fixed-effects model. The statistical results were as follows: pooled sensitivity = 0.73 (95% CI = 0.64–0.80), pooled specificity = 0.66 (95% CI = 0.55–0.76), pooled PLR = 2.19 (95% CI = 1.59–3.02), and pooled NLR = 0.40 (95% CI = 0.29–0.56). The area under the summary receiver operating characteristic curve (AUC) was 0.7820, and the Q index was 0.7203. The results were not altered when this study was removed.
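The one-at-a-time exclusion described above can be sketched as a leave-one-out loop in Python. For simplicity, the sketch pools sensitivity and specificity by summing cell counts across the remaining studies, which is an assumption about the pooling method; the study labels and counts are hypothetical.

```python
def pooled_sens_spec(tables):
    """Pool sensitivity and specificity by summing counts across studies."""
    tp = sum(t["tp"] for t in tables)
    fn = sum(t["fn"] for t in tables)
    tn = sum(t["tn"] for t in tables)
    fp = sum(t["fp"] for t in tables)
    return tp / (tp + fn), tn / (tn + fp)


def leave_one_out(tables):
    """Re-pool after omitting each study in turn, keyed by the omitted study."""
    results = {}
    for i, omitted in enumerate(tables):
        remaining = tables[:i] + tables[i + 1:]
        results[omitted["study"]] = pooled_sens_spec(remaining)
    return results


# Hypothetical 2x2 counts, for illustration only.
tables = [
    {"study": "Study A", "tp": 40, "fn": 10, "tn": 45, "fp": 5},
    {"study": "Study B", "tp": 30, "fn": 20, "tn": 30, "fp": 20},
    {"study": "Study C", "tp": 45, "fn": 5, "tn": 40, "fp": 10},
]

overall = pooled_sens_spec(tables)
loo = leave_one_out(tables)
for study, (sens, spec) in loo.items():
    print(f"without {study}: sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

A study whose removal shifts the pooled estimates markedly, as Study B does in this made-up example, is a candidate source of heterogeneity or bias.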
Only four articles were included in the meta-analysis of the ULH:UFSH ratio; therefore, we again applied the Begg rank correlation test (P = 0.09) and Egger linear regression test (P = 0.47), which indicated that no publication bias existed.
Although the number of included studies was limited, we still found that some results were statistically significant. All urinary samples in the six included studies were the first-voided urine. Urinary sample collection and the GnRH stimulation test were performed on the same day. Additionally, there were few differences in the strength of the evidence among the included studies. Although the number of included articles was limited, no publication bias was found in meta-analysis of ULH and the ULH:UFSH ratio. From the six studies included in this meta-analysis, we found that compared with the GnRH stimulation test, the first-voided urinary gonadotropin test can also effectively diagnose CPP without repeated venipuncture, venous blood collection, and excessive time consumption. Furthermore, the sample collection and evaluation are more convenient, more acceptable, less expensive, and noninvasive. The test can be performed after the sample is brought to the hospital without requiring the presence of the patient. Thus, the data from this meta-analysis suggest that the first-voided urinary gonadotropin test can be used to accurately diagnose CPP.
Previous studies have indicated that first-voided urinary gonadotropins increase because of their physiological secretion during the nighttime in female patients with early puberty, and the present meta-analysis supports this finding. However, there are no accepted diagnostic criteria for first-voided urinary gonadotropins in pediatric endocrinology. First, all the samples were first-voided urinary samples, but the sample collection period was inconsistent across studies. The subjects in Shim et al.’s study emptied their bladder before going to bed the previous night, and first-voided urine samples were collected as soon as they woke up. There was no control for time in their study. Kolby et al.’s study used the same method to collect urine samples, but corrected all collection periods to 8 h. In Yang et al.’s, Chen et al.’s, and Ma et al.’s studies, sample collection had a clear starting time of 20:00 h the previous night. However, collection did not end at a fixed time but whenever the subjects woke up in the morning. Zhang et al.’s study used a clearly defined collection period, from 19:00 h to 07:00 h the next day. Second, to reduce variation between patients, Kolby et al. corrected the urinary gonadotropin levels for osmolality, whereas Ma et al. and Chen et al. corrected them for urinary creatinine. Moreover, the test methods used across the studies in this meta-analysis were not identical. Serum gonadotropins were mainly examined by an electrochemiluminescence immunoassay, immunofluorometric assay, or chemiluminescence immunoassay. Urinary gonadotropins were measured by a dissociation-enhanced lanthanide fluorescence immunoassay, an immunofluorometric assay, a chemiluminescence immunoassay, or an immunochromatography assay. Every diagnostic approach has its own diagnostic sensitivity and specificity. Differences existed not only between the assessment of serum and urine gonadotropins but also among the included studies.
Finally and most importantly, as mentioned above, gonadotropins in nocturnal urinary samples have been employed for CPP diagnosis for decades. In 1995, Demir et al. found that urinary LH and FSH were age-related and significantly increased during puberty. However, this approach has not been widely adopted in clinical practice and remains an unresolved problem. In summary, the widespread use of first-voided urinary gonadotropins in the diagnosis of true precocious puberty will require a standardized examination procedure, a consistent cut-off value, and consistent urine sample collection instructions.
Although there was no fixed threshold of urinary gonadotropins in the included studies, no threshold effects existed in this meta-analysis. The meta-analysis of ULH for the diagnosis of CPP showed that the AUC was 0.8812 (sensitivity = 0.79, specificity = 0.84). In the sensitivity analysis of the outcomes in which we excluded trials with a high risk of bias one by one, we found no obvious difference in the AUC, sensitivity, or specificity. However, after excluding the data from Chen et al., the diagnostic accuracy was slightly higher (AUC = 0.90, sensitivity = 0.81, specificity = 0.85), possibly because of the lower sensitivity and specificity reported in that study. Regardless, ULH is a reliable indicator for the diagnosis of CPP. Similarly, the meta-analysis of the ULH:UFSH ratio for the diagnosis of CPP showed that the AUC was 0.8661 (sensitivity = 0.79, specificity = 0.84). There was no change in the sensitivity analysis of outcomes by excluding trials with a high risk of bias one by one. From this analysis, it seems apparent that the diagnostic accuracy of ULH is slightly higher than that of the ULH:UFSH ratio. No publication bias was found in this meta-analysis. In general, urinary gonadotropins can be used to effectively diagnose CPP.
First-voided urinary gonadotropin levels reflect the nighttime physiological secretion of gonadotropins in early puberty. Previous studies have shown that measurement of the nocturnal ULH and the ULH:UFSH ratio could be a proper substitute for the GnRH stimulation test. In terms of the quantitative results, we found that ULH had higher diagnostic value than the ULH:UFSH ratio. However, this study had a few limitations: no large-scale study was included in the analysis, and fewer than 10 studies were included, with one from Europe, one from Korea, and four from China. Therefore, our results could have been skewed by the lack of studies from other parts of the world. Additionally, all the included studies were single-center studies. Thus, to better apply first-voided urinary gonadotropins in pediatric clinics, larger-scale, multicenter, and prospective studies should be conducted in the future.
Although further studies are needed to establish the diagnostic value of urinary gonadotropins in CPP, our findings indicate that both the ULH level and ULH:UFSH ratio are effective and available approaches for the diagnosis of true precocious puberty from an evidence-based view.
Availability of data and materials
All data analyzed during this study are included in this published article.
Abbreviations
CPP: Central precocious puberty
ULH: Urinary luteinizing hormone
UFSH: Urinary follicle-stimulating hormone
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
PLR: Positive likelihood ratio
NLR: Negative likelihood ratio
AUC: Area under the curve
References
1. Lee CT, Tsai MC, Lin CY, Strong C. Longitudinal effects of self-report pubertal timing and menarcheal age on adolescent psychological and behavioral outcomes in female youths from northern Taiwan. Pediatr Neonatol. 2017;58(4):313–20.
2. Bräuner EV, Busch AS, Eckert-Lind C, Koch T, Hickey M, Juul A. Trends in the incidence of central precocious puberty and normal variant puberty among children in Denmark, 1998 to 2017. JAMA Netw Open. 2020;3(10):e2015665.
3. Jaruratanasirikul S, Chanpong A, Tassanakijpanich N, Sriplung H. Declining age of puberty of school girls in southern Thailand. World J Pediatr. 2014;10(3):256–61.
4. Bradley SH, Lawrence N, Steele C, Mohamed Z. Precocious puberty. BMJ (Clinical research ed.). 2020;368:l6597.
5. Kelly Y, Zilanawala A, Sacker A, Hiatt R, Viner R. Early puberty in 11-year-old girls: Millennium Cohort Study findings. Arch Dis Child. 2017;102(3):232–7.
6. Day FR, Elks CE, Murray A, Ong KK, Perry JR. Puberty timing associated with diabetes, cardiovascular disease and also diverse health outcomes in men and women: the UK Biobank study. Sci Rep. 2015;5:11208.
7. Demir A, Voutilainen R, Stenman UH, Dunkel L, Albertsson-Wikland K, Norjavaara E. First morning voided urinary gonadotropin measurements as an alternative to the GnRH test. Horm Res Paediatr. 2016;85(5):301–8.
8. Whiting PF, Rutjes AW, Westwood ME, Mallett S, Deeks JJ, Reitsma JB, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(8):529–36.
9. Ge L, Pan B, Song FJ, Ma JC, Zeraatkar D, Zhou JG, et al. Comparing the diagnostic accuracy of five common tumour biomarkers and CA19-9 for pancreatic cancer: a protocol for a network meta-analysis of diagnostic test accuracy. BMJ Open. 2017;7(12):e018175.
10. Datta S, Shah L, Gilman RH, Evans CA. Comparison of sputum collection methods for tuberculosis diagnosis: a systematic review and pairwise and network meta-analysis. Lancet Glob Health. 2017;5(8):760–71.
11. Shim YS, An SH, Lee HJ, Kang MJ, Yang S, Hwang IT. Random urinary gonadotropins as a useful initial test for girls with central precocious puberty. Endocr J. 2019;66(10):891–903.
12. Kolby N, Busch AS, Aksglaede L, Sørensen K, Petersen JH, Andersson AM, et al. Nocturnal urinary excretion of FSH and LH in children and adolescents with normal and early puberty. J Clin Endocrinol Metab. 2017;102(10):3830–8.
13. Yang QH, Liu LH, Jiang BL, Chen JX. Value of quantitative determination of gonadotropin in urine in the prediction of female sexual development. Chin J Human Sexuality. 2019;28(7):123–6.
14. Chen Y, Wang JQ, Ni JH, Lu WL, Sun WX. Value of quantitative assay of urinary gonadotropins in assessing sexual development in girls. Chin J Practical Pediatrics. 2016;31(3):219–23.
15. Ma XY, Lu WL, Ni JH, Wang JQ, Chen Y, Qin XY, et al. Value of quantitative assay of urinary LH and FSH in differentiating classic from transitional premature thelarche. J Diagnostics Concepts Pract. 2019;18(3):291–5.
16. Zhang TT, Xu JZ, Ma YP, Wang Q, Zhao JL, Lu JY. Nocturnal urinary gonadotropin in diagnosis of precocious puberty in girls. J Clin Med Pract. 2012;16(3):47–9, 56.
17. McNeilly JD, Mason A, Khanna S, Galloway PJ, Ahmed SF. Urinary gonadotrophins: a useful non-invasive marker of activation of the hypothalamic pituitary-gonadal axis. Int J Pediatr Endocrinol. 2012;2012(1):10.
18. Demir A, Dunkel L, Stenman UH, Voutilainen R. Age-related course of urinary gonadotropins in children. J Clin Endocrinol Metab. 1995;80(4):1457–60.
19. Lucaccioni L, McNeilly J, Mason A, Giacomozzi C, Kyriakou A, Shaikh MG, et al. The measurement of urinary gonadotropins for assessment and management of pubertal disorder. Hormones (Athens). 2016;15(3):377–84.
We express our appreciation to reviewers for their helpful comments on this manuscript.
There was no funding source for this study.
The authors declare that they have no competing interests.
Xu, D., Zhou, X., Wang, J. et al. The value of urinary gonadotropins in the diagnosis of central precocious puberty: a meta-analysis. BMC Pediatr 22, 453 (2022). https://doi.org/10.1186/s12887-022-03481-1