Is early detection of abused children possible?: a systematic review of the diagnostic accuracy of the identification of abused children

Background Early detection of abused children could help decrease mortality and morbidity related to this major public health problem. Several authors have proposed tools to screen for child maltreatment. The aim of this systematic review was to examine the evidence on accuracy of tools proposed to identify abused children before their death and assess if any were adapted to screening. Methods We searched in PUBMED, PsycINFO, SCOPUS, FRANCIS and PASCAL for studies estimating diagnostic accuracy of tools identifying neglect, or physical, psychological or sexual abuse of children, published in English or French from 1961 to April 2012. We extracted selected information about study design, patient populations, assessment methods, and the accuracy parameters. Study quality was assessed using QUADAS criteria. Results A total of 2 280 articles were identified. Thirteen studies were selected, of which seven dealt with physical abuse, four with sexual abuse, one with emotional abuse, and one with any abuse and physical neglect. Study quality was low, even when not considering the lack of gold standard for detection of abused children. In 11 studies, instruments identified abused children only when they had clinical symptoms. Sensitivity of tests varied between 0.26 (95% confidence interval [0.17-0.36]) and 0.97 [0.84-1], and specificity between 0.51 [0.39-0.63] and 1 [0.95-1]. The sensitivity was greater than 90% only for three tests: the absence of scalp swelling to identify children victims of inflicted head injury; a decision tool to identify physically-abused children among those hospitalized in a Pediatric Intensive Care Unit; and a parental interview integrating twelve child symptoms to identify sexually-abused children. When the sensitivity was high, the specificity was always smaller than 90%. Conclusions In 2012, there is low-quality evidence on the accuracy of instruments for identifying abused children. 
Identified tools were not adapted to screening because of low sensitivity and because they identified abused children late, when they already had serious consequences of maltreatment. Development of valid screening instruments is a prerequisite for considering screening programs.


Background
The World Health Organization (WHO) defines child maltreatment as "all forms of physical and/or emotional ill-treatment, sexual abuse, neglect or negligent treatment or commercial or other exploitation, resulting in actual or potential harm to the child's health, survival, development or dignity" [1]. It is a major public health issue worldwide. Gilbert et al. estimated that every year in high-income countries about 4 to 16% of children were physically abused, one in ten was neglected or psychologically abused, and between 5 and 10% of girls and up to 5% of boys were exposed to penetrative sexual abuse during childhood [2]. Child maltreatment can cause death of the child or major consequences on mental and physical health, such as post-traumatic stress disorder and depression, in childhood or adulthood [2]. WHO estimated that 155 000 deaths in children younger than 15 years occurred worldwide in 2000 as a result of abuse or neglect [3].
In France, a retrospective study carried out in three regions from 1996 to 2000 showed that many children who died from abuse were not identified as abused before their deaths. After excluding clear neonaticides, 25 of 53 (47%) infants who died a suspicious or violent death had signs of prior abuse, such as fractures of different ages, discovered during post-mortem investigations. Only eight of these children were already known to be victims of abuse [4]. Similarly, only 33% of children who were born in California between 1999 and 2006 and died from intentional injury during the first five years of life had been previously reported to Child Protection Services [5]. Consequently, children who die from maltreatment may have been victims of chronic abuse that was not diagnosed before their death. Systematic early detection of abused children could help prevent these deaths and lessen child maltreatment-related morbidity. However, as with other screening programs, it is important to balance potential positive and negative effects and to determine the conditions under which a screening program for child maltreatment would be effective. A first necessary condition is the availability of a test that correctly identifies abused children before they have serious or irreversible consequences of maltreatment.
The diagnostic accuracy of ocular signs in abusive head trauma and the clinical and neuroradiological features associated with abusive head trauma have already been synthesized [6][7][8][9]. In the reviewed studies, however, markers identified children when they already had serious consequences of child maltreatment. Sometimes the diagnosis had been made after the child had died. Furthermore, the diagnostic accuracy of markers was not always estimated, the analysis being limited to estimating the association between a marker and maltreatment. Similarly, the diagnostic accuracy of genital examination for identifying sexually abused prepubertal girls was reviewed [10], but tools only identified children who were victims of a severe form of sexual abuse (genital contact with penetration). Furthermore, the sensitivity of several potential markers, such as hymeneal transections, deep notches or perforations, was never reported.
Several authors have already considered screening in emergency departments [11][12][13]. A large study in the United Kingdom evaluated the accuracy of potential markers (child age, type of injury, and incidence of repeat attendance) and of clinical screening assessments for detecting physical abuse in injured children attending Accident and Emergency departments [13]. The authors found no relevant comparative studies for incidence of repeat attendance, only one study reporting a direct comparison of type of injury in abused and non-abused children, and three studies for child age. However, two of these three studies were limited to a subset of children admitted with severe injuries. In addition, assessments by the medical team were rarely based on standardized criteria, and were therefore neither reproducible nor usable in practice [13]. The same team published another study of the same markers (age, repeated attendance, and type of injury) to identify victims of physical abuse or neglect among injured children attending Emergency departments [14]. They found no evidence that any of the markers was sufficiently accurate. Thus, these two large studies only reviewed the accuracy of tests for two types of child abuse among children who attended Emergency departments and already had injuries. A final study initially aimed to evaluate the accuracy of tools for the early identification of abused children, but only reported an accuracy assessment of tools identifying high-risk parents before the occurrence of child maltreatment [15].
The aim of our study was to review the evidence on the accuracy of instruments for identifying abused children during any stage of child maltreatment evolution before their death, and to assess whether any might be adapted to screening, that is, whether accurate screening instruments were available. We define as instruments any reproducible assessment used in any type of setting.

Search strategy
Information sources and search terms
Electronic searches were carried out in the PUBMED database from 1966 to April 2012, the PsycINFO database from 1970 to April 2012, the SCOPUS database from 1978 to April 2012, and the PASCAL and FRANCIS databases from 1961 to April 2012, to identify articles published in French or English. The search terms were child abuse, child maltreatment, battered child syndrome, child neglect, Munchausen syndrome, shaken baby syndrome, and child sexual abuse, combined with sensitivity, specificity, diagnostic accuracy, likelihood ratio, predictive value, false positive, false negative, validity, and test validation, and with diagnosis, measurement, psychodiagnosis, medical diagnosis, screening, diagnosis imaging, physical examination, diagnostic procedure, scoring system, diagnostic, score, and assessment (Table 1).

Eligibility criteria
To be included in this analysis, articles had to 1) state as an objective the estimation of at least one accuracy parameter (sensitivity, specificity, predictive value or likelihood ratio) of a test identifying abused children (persons under age 18); 2) include a reference standard to determine whether a child had actually been abused; and 3) describe the assessed test, e.g. present the information and method used to carry out the assessment, and not only its result. As there is no gold standard for detecting child maltreatment, we defined acceptable reference standards as: expert assessments, such as the child's court disposition; substantiation by child protection services or other social services; or diagnosis by a medical, social or judicial team using one or several information sources (caregiver or child interview, child symptoms, child physical examination, and other medical record review). An assessment made only by the caregiver was not accepted because 80% or more of maltreatment, other than sexual abuse, has been estimated to be perpetrated by parents or parental guardians [2]. Thus, the caregiver would likely not want to reveal that his child is maltreated. Comparative studies of any design examining the results of tools identifying abused children in two population groups (abused and non-abused children) were accepted (case-control, cohort, and cross-sectional studies). Descriptive studies with only one group of abused or non-abused children, whose aim was to estimate one accuracy parameter, were also accepted. To avoid missing any potentially relevant tool, neither a particular setting nor a particular category of patients was used as an inclusion or exclusion criterion.
We did not consider tests to identify abusive caregivers, abused children after their death or children victims of intimate-partner violence. Articles were also excluded when they did not provide original data. Tests that identified abused children after their death were excluded as they are by definition not relevant for early detection. Intimate-partner violence, regarded as a separate form of child maltreatment by several authors, was excluded because the main victim is not the child [2].
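As an illustration of the accuracy parameters named in the first eligibility criterion, the following minimal Python sketch computes them from a 2x2 cross-classification of an index test against a reference standard. The class name, the field names and the counts in the usage lines are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts of an index test cross-classified against a reference standard."""
    tp: int  # test positive, reference standard: abused
    fp: int  # test positive, reference standard: not abused
    fn: int  # test negative, reference standard: abused
    tn: int  # test negative, reference standard: not abused

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Positive predictive value: depends on the prevalence in the sample.
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Negative predictive value: depends on the prevalence in the sample.
        return self.tn / (self.tn + self.fn)

    def lr_positive(self) -> float:
        # Undefined (division by zero) when specificity equals 1.
        return self.sensitivity() / (1 - self.specificity())

    def lr_negative(self) -> float:
        # Undefined (division by zero) when specificity equals 0.
        return (1 - self.sensitivity()) / self.specificity()


# Hypothetical counts, for illustration only.
table = TwoByTwo(tp=45, fp=20, fn=5, tn=80)
print(f"sensitivity={table.sensitivity():.2f}, specificity={table.specificity():.2f}")
print(f"PPV={table.ppv():.2f}, NPV={table.npv():.2f}")
print(f"LR+={table.lr_positive():.2f}, LR-={table.lr_negative():.3f}")
```

Sensitivity and specificity are computed within the abused and non-abused groups respectively, whereas the predictive values mix the two groups and therefore change with the proportion of abused children in the sample.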

Study selection
Eligibility of studies was checked by a junior epidemiologist and pediatrician (MB), from April, 2012 to May, 2012, and the resulting selection checked by a senior medical epidemiologist (LRS). Articles were first screened by titles. They were excluded when the title showed that the article did not address accuracy of tools identifying abused children. If the title did not clearly indicate the article's subject, the summary was read. Abstracts were retained for full review when they met the inclusion criteria or when more information was required from the full text to ascertain eligibility.

Data collection process, data items and analysis
The first assessment of selected papers was done by MB, and results were discussed in regular meetings by both epidemiologists MB and LRS. To reduce the likelihood that potentially relevant articles were missed, reference lists from relevant articles were checked. From each included study, we abstracted information about study design, population characteristics, number of participants, screening instrument or procedure, abuse or neglect outcome, and estimates of diagnostic accuracy. Results were not mathematically pooled due to varying methods and types of child abuse identified.

Quality assessment
The selected studies were assessed by MB and reviewed by LRS, using the QUADAS-1 criteria to assess quality of studies of diagnostic accuracy [16]. The standardized checklist included 15 criteria, grouped according to the domains defined by QUADAS-2 [17].
Table 1 Search terms used to identify potentially eligible articles
FRANCIS/PASCAL: ("child abuse" OR "child maltreatment" OR "child neglect" OR "child sexual abuse" OR "battered child syndrome" OR "munchausen syndrome" OR "shaken baby syndrome") AND ("diagnosis" OR "measurement" OR "screening" OR "physical examination" OR "diagnostic" OR "scoring system" OR "score" OR "assessment") AND ("test validation" OR "validity" OR "sensitivity" OR "specificity" OR "predictive value" OR "diagnostic accuracy" OR "likelihood ratio")

Two criteria related to patient selection: 1) patients were representative of a spectrum of population including all stages of maltreatment before the death of the child; 2) selection criteria were well described.
Three criteria related to the index test: 3) the index test was described in sufficient details to permit replication; 4) when the index test was a score, the cutoff was determined before results were available; 5) the index test was interpreted without knowledge of the results of the reference standard.
Three criteria related to the reference standard: 6) the reference standard correctly classified patients; 7) the reference standard was described in sufficient details to permit replication; 8) the reference standard was interpreted without knowledge of the results of the index test.
One criterion related to both the index test and reference standard: 9) the reference standard and the index test were independent.
Five criteria related to flow and timing: 10) the whole population or a random selection received the reference standard; 11) the study population received the same reference standard; 12) the time period between the reference standard and the index test was short enough so the situation of the child did not change; 13) uninterpretable test results were reported; 14) uninterpretable test results were well-balanced between the reference standard and the index test.
One criterion related to applicability: 15) same clinical data available when test results were interpreted as would be available when the test is used in practice.
Quality of studies was summarized by counting the number of criteria that were respected. Results of the final selection and analysis were reviewed by another senior medical epidemiologist (VL) and a senior pediatrician (PP).

Assessment of tools adaptation to screening
Tools were considered adapted to screening, according to the WHO criteria on the adequacy of tests used in screening programs [18], if they fulfilled the following criteria: 1) identify abused children before they have serious consequences of child maltreatment; 2) identify abused children with a high sensitivity; 3) identify abused children with a high enough specificity to avoid stigmatization of caretakers who were not abusers.

Study selection
Of 2 280 references identified in the databases, 524 were selected from their title, of which 137 abstracts were read; after exclusion of duplicates, 92 full articles were assessed (Figure 1). Studies excluded for lack of a reference standard were case-control studies with control groups recruited in the general population without verifying whether children were abused or not. Studies were excluded when the reference standard was only the opinion of caregivers who had been asked whether their children were abused or not. One study was excluded because the method of the index test, an assessment by primary care clinicians, was not described [19]. Finally, one study was excluded because an unknown number of children less than fifteen years old examined in a medical center, who should have been tested during the study period, had not received the index test and were not registered [20]. This limitation was noticed because several abused children identified by the reference standard, who met the inclusion criteria, had not received the index test from the medical team and were not reported. Thirteen articles met the inclusion criteria. The outcome of interest was sexual abuse in four studies [21][22][23][24], physical abuse in seven [25][26][27][28][29][30][31], psychological abuse in one [32], and several forms of child maltreatment (physical abuse, psychological abuse, sexual abuse, and physical neglect) in one [33]. Eight studies were prospective [21-26,32,33], and five were retrospective assessments of diagnostic accuracy [27][28][29][30][31].

Quality of studies
The maximum number of quality criteria met was eight of fourteen, and five studies met four or fewer criteria (Table 2). The accuracy of the reference standard was never determined because no gold standard to identify abused children is available. We could not judge patient representativeness, for lack of sufficient information about methods of patient recruitment [21,24,26,28,30-33], or because many families refused to participate for undocumented reasons [22,23]. In three studies, the imaging technique or the assessment of impact trauma was not described in sufficient detail to replicate the index test [25,27,28]. The reference standard was different in the three case-control studies [21,22,31]. In one study, the result of the index test was used to establish the final diagnosis [23]. The time period between the two tests was rarely available; in one study, it was on average 36.4 weeks, so that the child's situation with respect to abuse could have changed [33]. We could not judge whether the circumstances of test evaluation were the same as in routine practice, for lack of information about the kind of practice considered [22,25-29,31,33].

Diagnostic accuracy
Identification of physical abuse
Four studies were about children with inflicted head injury (Table 3) [25][26][27][28]. One test identified abused children among those admitted to a tertiary care pediatric hospital for acute traumatic intracranial injury, when caregivers reported no history of trauma or a history of low-impact trauma, i.e. a fall from ≤ 3 feet or another low-impact non-fall mechanism [27]. The other tests identified abused children by using findings of physical examination or computed tomography among children hospitalized for head trauma in Pediatric Intensive Care Units [25,26], Neurosurgical [25,26] or Emergency departments [25,26], or a regional pediatric medical center [28]. A prediction rule combining four variables (hygroma; convexity subdural hematoma without hygroma; no fracture; and interhemispheric subdural hematoma on computed tomography images at clinical presentation) could identify 84% of abused children [28].
Three studies estimated the accuracy of tests identifying physical abuse and were not limited to intentional head trauma [29][30][31]. A decision tool based on three questions (age of child; localization of bruises during the initial 72 hours of the patient's admission; and confirmation of an accident in a public setting) identified abused children among children aged 0 to 4 y admitted to a Pediatric Intensive Care Unit, with a sensitivity of 97% (95% CI: 84-100) [31]. In another study, the presence of bruises in the same body site as a fracture identified 26% of abused children among children with acute fractures referred for possible child abuse to a specialized team [30]. Finally, a score was developed to identify physically abused children 14 years old or younger, with at least one diagnosis of injury as defined by the International Classification of Diseases, 9th Revision (ICD-9; codes 800 to 959), in 1961 hospitals in 17 states of the United States. The 26-point score, based on the presence of fracture of the base or vault of the skull (1 point), eye contusion (3 points), rib fracture (3 points), intracranial bleeding (4 points), multiple burns (3 points), and age of the child (3 points for age group 1-3 y, 12 points for age group 0-1 y), identified 87% of physically abused children when the score was ≥ 3 [29].
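The arithmetic of the 26-point score described above can be sketched as follows. This is only an illustration of the weights and cutoff as reported in this review, not the original authors' implementation: the finding names are hypothetical labels, and the handling of the age boundaries (under 1 year for 12 points, 1 to 3 years inclusive for 3 points) is an assumption, since the text does not say how a child aged exactly 1 year is classified.

```python
from typing import Iterable

# Weights as reported in the text; the dictionary keys are hypothetical labels.
FINDING_POINTS = {
    "skull_base_or_vault_fracture": 1,
    "eye_contusion": 3,
    "rib_fracture": 3,
    "intracranial_bleeding": 4,
    "multiple_burns": 3,
}
CUTOFF = 3  # a score >= 3 counts as a positive test


def age_points(age_years: float) -> int:
    """Age component. The boundary handling is an assumption."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    if age_years < 1:
        return 12
    if age_years <= 3:
        return 3
    return 0


def abuse_score(age_years: float, findings: Iterable[str]) -> int:
    """Sum the age component and the points of each distinct finding."""
    present = set(findings)
    unknown = present - set(FINDING_POINTS)
    if unknown:
        raise ValueError(f"unknown findings: {sorted(unknown)}")
    return age_points(age_years) + sum(FINDING_POINTS[f] for f in present)


def screens_positive(age_years: float, findings: Iterable[str]) -> bool:
    return abuse_score(age_years, findings) >= CUTOFF


# Hypothetical children, for illustration only.
print(abuse_score(0.5, FINDING_POINTS))                     # every item present
print(screens_positive(7, ["skull_base_or_vault_fracture"]))
print(screens_positive(2, []))
```

Under these weights and this reading of the age groups, the age component alone reaches the cutoff for the youngest children, so the score is mainly discriminating among older children.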

Identification of sexual abuse
The sensitivity of tests using the results of children's anal and genital examination was estimated at best at 56% (95% CI: 33-77), and the specificity at 98% (95% CI: 91-100) [22,23] (Table 4). The frequency of a variety of sexual behaviors of the child over the six months prior to assessment was not associated with sexual abuse [24]. A list of 12 symptoms expressed by the child, such as difficulty getting to sleep, a change to poor school performance, or unusual interest in sexual matters, identified sexually abused children when caretakers reported at least three symptoms, with a sensitivity of 91% and a specificity of 88% [21]. The studies took place in consultations with teams specialized in child abuse or, when a control group was chosen, in consultations at pediatric clinics for well-child examinations or other complaints.

Identification of psychological abuse
In a self-administered questionnaire, children were expected to indicate how often they experienced a given parental/caregiver behavior (Table 4). The scale was administered to children aged 13-15 years without specific complaints attending a school within the city of Colombo. At a cutoff of 95 or greater, 20 of 26 abused children were identified [32].
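To show how a count such as 20 of 26 translates into a sensitivity estimate with its uncertainty, the sketch below computes the proportion and a 95% Wilson score interval. The choice of the Wilson method is an assumption made for illustration; this review does not state which interval method the included studies or the reviewers used, so the result need not match the published intervals.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959963984540054):
    """Wilson score interval for a binomial proportion (default: 95% level)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Counts reported in the text: 20 of 26 abused children identified.
identified, abused = 20, 26
low, high = wilson_interval(identified, abused)
print(f"sensitivity = {identified / abused:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

With only 26 abused children, the interval is wide, which is one reason why point estimates from small studies should be read with caution.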

Identification of several forms of child maltreatment
The Childhood Trauma Questionnaire is a 70-item screening inventory that assesses self-reported experiences of abuse and neglect in childhood and adolescence (Table 4).
Accuracy was estimated for each form of child maltreatment in an adolescent psychiatric population. Physical neglect was defined as the failure of caretakers to provide for a child's basic physical needs, such as food or clothing. The estimated sensitivity and specificity were best for sexual abuse: the sensitivity was estimated at 86% (95% CI: 71-94), and the specificity at 76% (95% CI: 67-83) [33].

Adaptation to screening
Identified tools were not adapted to screening because of low sensitivity and because they identified abused children late, when they already had serious consequences of maltreatment.

Discussion
Assessment of the accuracy of instruments is difficult, because there is no gold standard for identifying abused children. To optimize the reference standard, the opinions of experts or of medical, social or judicial teams are usually used [21,24-28,30-33], but the accuracy of these assessments is not known. Furthermore, the information used for this assessment was rarely specified, so that it was difficult to verify the independence between the index test and the reference standard. Incorporating index test results in the reference standard would overestimate the accuracy of the test [21,25,26,28,29,31,33]. Chang et al. used the International Classification of Diseases, 9th Revision (ICD-9), and E-codes (external cause codes), which categorize the intent and mechanism of an injury, as the reference standard [29].
In a recent study in the Yale-New Haven Children's Hospital from 2007 to 2010, the specificity of coding injuries as physical abuse was 100% (95% CI: 96-100). But the sensitivity was low: among the 43 cases determined to be abuse by the Child Abuse Pediatrician, four were miscoded as accidents, two as injuries of undetermined cause, and four did not receive any injury code [34]. In 1991-1992 in California, the sensitivity of hospital E-coded data in identifying child victims of intentional injuries had been estimated at 75% (95% CI: 64-84) [35]. This classification underestimates the number of abused children and therefore does not seem to be a good reference test. Some cases of child physical abuse are coded as accidents, and cases classified as physical abuse are not representative of all cases of physical abuse, because some cases did not receive any injury code. In this systematic review, the quality of the selected studies was low, even when not considering the criterion related to the reference standard. Available information was often insufficient to make a judgment for many criteria. Some of the limitations, for instance the use of the index test to establish the final diagnosis, are particularly worrisome as they reflect an important misconception of what good diagnostic research is. This overall poor quality likely limits the validity of the selection of studies, as many could have been excluded on the basis of quality alone. Clearly, the quality of reporting of studies of diagnostic accuracy on child maltreatment needs to improve. Furthermore, in five studies, the retrospective evaluation based on a review of records could have introduced bias [27][28][29][30][31]. Finally, in the three case-control studies, the performance of the index test could have been overestimated because excluding children for whom maltreatment is difficult to diagnose increases the differences between the two groups [21,22,31].
We were interested in tools identifying abused children as early as possible in the evolution of child maltreatment. Existing instruments reported to diagnose child maltreatment were not designed for screening. Many tools identify abused children when they already have clinical consequences of child maltreatment, such as head injury, fracture, or behavior problems [21,24-31]. Identification of abused children at the clinical stage comes too late. The performance of the tests was also not adapted to screening. Screening instruments require high sensitivity so that very few abused children are missed. In our synthesis, most sensitivity estimates were low [22-27,30,32,33]. Furthermore, the specificity of tests is also important because of the negative effects of misidentification, in particular the psychological impact and the effect of potential stigmatization on the child and his parents [36]. As is often the case, when the sensitivity of the test was high, the specificity was often low [25]. The sensitivity was greater than 90% and the specificity greater than 80% for only two tests [21,31]. However, one was a decision tool to identify physically abused children among those hospitalized in a Pediatric Intensive Care Unit, so that children had severe injuries [31]. The other test was based on twelve child symptoms to identify sexually abused children [21]. These symptoms could reflect severe psychological consequences such as depression: sudden emotional and behavioral changes, a change to poor school performance, frequent stomachaches, difficulty getting to sleep or sleeping more than usual.
Child maltreatment is the "disease" of both the child and his caregiver. Obviously, an abusive caregiver is defined by his abusive behavior and child maltreatment begins by abusive behavior of caregiver. This abusive behavior is responsible for poor health and development of the child. Thus, identification of child maltreatment could consider the identification of both the abused child and his abusive caregiver. Two self-report questionnaires were directed to children who had to indicate if they had experienced given behaviors of parents or caregivers [32,33]. As only children old enough for reading could answer, these questionnaires cannot help reduce deaths in the most vulnerable groups. Indeed, fatal child maltreatment occurs most frequently when children are younger [2,[37][38][39]. Over a half of the 600 victims of child maltreatment under five years reported to the National Violent Death Reporting System of the United States of America from 2003 to 2006 were under one-year-old [40].
The WHO definition of child maltreatment is problematic as it is defined by consequences of neglectful or abusive behaviors that, themselves, are not defined [1,3]. Similarly, Article 19 of the United Nations Convention on the Rights of the Child, which refers to "all forms of physical or mental violence, injury and abuse, neglect or negligent treatment, maltreatment or exploitation, including sexual abuse", does not define these behaviors. Moreover, proposed definitions based only on abusive behaviors can vary widely. For example, some authors, but not others, require physical contact or penetration before defining reported experiences as sexual abuse [41][42][43][44]. Instruments designed to diagnose abusive caregivers, such as the Child Abuse Potential Inventory [45] and the International Society for the Prevention of Child Abuse and Neglect (ISPCAN) Child Abuse Screening Tool-Parent [46], measure these potentially abusive behaviors of caregivers. Consequently, what they measure is not well known or defined. Furthermore, they can identify only child maltreatment that is directly due to the questioned parent. These problems might explain why child maltreatment is usually recognized only when the child has consequences of abusive behaviors.
Due to the lack of knowledge of the evolution of child maltreatment, studying the accuracy of diagnostic instruments identifying abused children early remains challenging. Research is required to define what subclinical and clinical abusive behaviors are and when the child maltreatment begins. A multidisciplinary approach might be necessary to correctly identify child maltreatment because of its multiple targets, the child and the caregiver. Input from adult psychiatry is necessary to be able to assess the potential abusive behaviors of caregivers. One might reasonably hypothesize that tools based on simultaneous assessment of potential abusive behaviors and health and development of the child could allow earlier identification of abused child or abusive caregiver than tools based only on separate assessments of the child or caregiver. However, if a combined approach is likely to be more sensitive, it might also be less specific. Furthermore, because of the several types of child maltreatment and the varied consequences to children, several tests might be necessary to screen all types of child maltreatment. The final value of features used for screening will also depend on the prevalence of these features.
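The dependence of a test's practical value on prevalence can be made concrete with Bayes' rule. The sketch below uses the sensitivity (91%) and specificity (88%) reported for the twelve-symptom parental interview [21]; the prevalence values are assumptions chosen only to show the trend and are not estimates for any screened population.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Sensitivity and specificity from the text [21]; prevalences are illustrative.
for prevalence in (0.01, 0.05, 0.10, 0.50):
    ppv, npv = predictive_values(0.91, 0.88, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.2f}  NPV={npv:.3f}")
```

At low prevalence most positive results are false positives even with a specificity close to 90%, which is why specificity matters so much for avoiding stigmatization in a general-population screening program.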
We reviewed only studies in French and English and only published studies indexed in databases, and might have excluded interesting research. Also, one of our inclusion criteria was that the aim of the study was clearly to estimate the diagnostic accuracy of a test identifying abused children. This might have disqualified some studies in which some parameters of diagnostic accuracy could be estimated. Finally, we were interested in all forms of child maltreatment and all types of tools, and we did not specify a particular setting such as emergency departments. Depending on the context, some tools could not be applied: for example, a test requiring a specific laboratory result if the laboratory exam cannot be performed routinely. Besides, we reviewed the evidence on the accuracy of instruments for identifying abused children during any stage of child maltreatment evolution before their death. Thus both diagnostic and screening studies could be included in our review. We evaluated among the selected studies whether accurate screening instruments were available. However, the fact that a screening test is sensitive and specific is not enough. The side effects, the reliability and the cost of the test should also be considered. Indeed, before considering a screening program for child maltreatment, several other criteria need to be respected [18]. A screening program should also be acceptable to families and professionals. Negative effects for the family are consequences of false negatives (children wrongly identified as not abused) and of false positives (children wrongly identified as abused and parents wrongly identified as abusers). The stigmatization of families is an important ethical issue.
Furthermore, confirming the relevance of screening for child maltreatment is not enough, as the modalities of the program should also be specified, including the site, the relevant target population group if screening is not mass screening, the child's age at the time of screening, and the frequency if screening is repeated. Finally, a screening program could become unnecessary if an effective primary prevention program for child abuse were in place. Several primary prevention programs, such as the Nurse Family Partnership [47] and Early Start [48], have been proposed, but the evidence is currently insufficient to assess the balance between benefits and harms of primary care interventions [49].

Conclusions
There is very scarce and low-quality evidence on the accuracy of instruments for identifying abused children. Child maltreatment is mostly identified when children already have serious consequences, and the sensitivities and specificities of tools are inadequate. Before considering a screening program for child maltreatment, better knowledge of the onset of child maltreatment and the development of valid screening instruments for subclinical stages remain necessary.