- Research article
- Open Access
Reliability and validity of the Caregiver Reported Early Development Instruments (CREDI) in impoverished regions of China
BMC Pediatrics volume 20, Article number: 475 (2020)
There is a great need in low- and middle-income countries for sound qualitative and monitoring tools assessing early childhood development outcomes. Although there are many instruments to measure the developmental status of infants and toddlers, their use in large-scale studies is still limited because of high costs in both time and money. The Caregiver Reported Early Development Instruments (CREDI), however, were designed to serve as a population-level measure of early childhood development for children from birth to age three, and have been used in 17 low- and middle-income countries. This study aimed to examine the reliability and validity of the CREDI in China, which have not yet been established.
The CREDI and the ASQ-3 were administered to a sample of 946 children aged 5–36 months from urban and rural communities, of whom 248 were also administered the Bayley-III.
The internal consistency of the CREDI was high, indicating good reliability. The results also indicated that the concurrent validity of the CREDI with the Bayley-III scale was high in general. Ordinary least squares regression showed that the CREDI is highly consistent with previous widely used instruments in some key predictors (such as home stimulation) of early childhood development level.
All the results in the current study indicate that the CREDI may be considered an appropriate instrument to measure early childhood development status on a large scale in impoverished regions of China.
It is widely known that the emotional, social, and cognitive skills that emerge in early childhood are important prerequisites for success in school, employment, and potential income in the later stages of an individual’s life [11, 12, 29]. The period from birth to 3 years is also the stage when development is most rapid and children at this stage begin reaching basic development milestones. Therefore, early childhood is very sensitive to environmental effects and is also a period suitable for interventions that alleviate effects of external risk factors [6, 17, 34,35,36].
Early childhood development (henceforth, “ECD”) has been recognized by governments and Non-Governmental Organizations as a window of opportunity to improve the level of individual development and the social and economic well-being of society as a whole. In this context, continuous monitoring of ECD outcomes using culturally and developmentally appropriate instruments can provide useful information for developing more effective intervention strategies. Moreover, population-level measurement of ECD is necessary to improve ECD outcomes and reduce developmental inequality through national, regional, and global policies.
Although there has been significant progress in supporting and monitoring ECD and developing instruments assessing ECD, there are few effective and reliable instruments available to assess children’s early development status at a large scale in different cultural environments. As Kelly et al. pointed out, children have different ways and times to acquire motor, cognitive, and language skills in different settings. Differences in preschool children’s cognitive and social emotional skills at the national level were found to be related to the country’s socioeconomic and nutritional status. The assessment of ECD in different regions of the world helps us understand commonalities and differences in, and factors contributing to, ECD, thus providing useful information for developing more effective intervention strategies.
In summary, population-level measurement and evaluation of ECD is a key issue that needs to be solved in the current era. Unlike individual assessment instruments, population-level measurement tools need to be simple and inexpensive to implement and must support cross-cultural comparisons. In this context, a new instrument, the Caregiver Reported Early Development Instruments (CREDI), was developed. The CREDI was designed as a caregiver-reported, cross-culturally comparable, population-level measure of ECD for children under 3 years [25, 26]. The goal of the CREDI is to provide low-cost, large-scale data to facilitate policy interventions and resource allocation, while tracking global progress in alleviating ECD-related disparities around the world. The reliability and validity of the CREDI have been studied and its applicability to the evaluation of ECD levels in low- and middle-income countries has been confirmed [1, 25, 27], but there is still a lack of research about its application in China, and there are still no studies on the application of the CREDI in a longer format (henceforth, “long form”). Based on this, this paper introduces and analyzes the application of the CREDI long form in China, with special focus on its reliability and validity, based on survey data from poverty-stricken areas in China.
In low- and middle-income countries, about 249 million children under the age of five are at risk of poor development, of whom 17.43 million (about 8%) are in China, ranking second in the world. Studies show that concern for ECD in poverty-stricken areas of China is particularly acute [23, 45, 46, 50]. About half of children in poor rural China are at risk for cognitive delays; 52% of children are at risk for language delays, and the risk of delay increases over time.
The Chinese government has made many efforts to promote appropriate ECD. At the Central Economic Work Conference in December 2018, “increasing investment in preschool education, early childhood development and vocational education in rural poverty areas” were listed as key tasks. In May 2019, the General Office of the State Council officially issued the “Guiding Opinions on Promoting the Development of Infant and Child Care Services under 3 Years of Age,” and clearly established policies and regulations, standards and norms, and service supply systems to promote the development of childcare services. Childcare services will be implemented in various forms to gradually meet the people’s needs. The Chinese government’s policies reflect its determination to promote ECD and related public service systems. In this context, data collection and evaluation of population-level ECD is particularly urgent. The application of a population-level assessment instrument for ECD has also become an important element in guiding how the Chinese government can effectively implement childcare services policies.
Existing measures of ECD
Several international instruments have been developed to comprehensively measure ECD, such as the Griffiths Mental Development Scales, the Denver Developmental Screening Test, and the Bayley Scales of Infant and Toddler Development. These are direct assessment tools, usually administered by clinically trained personnel for screening and diagnosis of children with developmental disabilities or delay. Among these measures, the Bayley Scales of Infant and Toddler Development is more widely used in China. Although these individual screening tools are detailed, standardized, and practical for obtaining information on a child’s developmental status, they are limited in providing population-level measurement of ECD because the costs of copyright purchases, adaptation, administration time, and training of the administrators are often relatively high, making them unsuitable for large-scale use. Moreover, these assessment tools directly engage with the child, which may result in measurement errors caused by external factors, such as the temperament of the child (e.g., some children might be too shy because of unfamiliarity with the testing environment and the tester) or the child’s ability to understand the verbal instructions given.
Therefore, some indirect assessments, which are reported by the caregiver, such as the Early Development Index, PRIDI, IDELA, and the Early Childhood Index, may be more suitable for capturing the population’s ECD at this age. Moreover, accessing the data in this way is scalable. However, these instruments are still limited because of the small number of measurement items or their age limits (they are mostly concerned with older children rather than 0–3-year-old children). The Ages and Stages Questionnaire, third edition (ASQ-3) and the Caregiver Reported Early Development Instruments (CREDI) were subsequently developed for assessing the ECD status of infants and toddlers. Compared to direct assessment tools, an instrument reported by caregivers requires less training and testing time, and such an instrument is less likely to be biased by children being unfamiliar with clinical assessments, potential behavior changes with strangers, or not understanding verbal instructions [9, 37]. Previous studies about the reliability and validity of the ASQ-3 and the CREDI are introduced in detail below.
The ASQ-3 is a caregiver-reported measurement that asks caregivers to report different aspects of their children’s behavior to assess development. Although its items have good to acceptable internal consistency, previous findings on the validity of the ASQ-3 have varied significantly. For sensitivity (the rate at which a screening instrument correctly identifies a developmental delay) and specificity (the rate at which a screening instrument correctly identifies children who perform within the normal range), respectively, the following values have been reported: 75 and 86%; 66 and 84%; 82 and 78%; 87.50 and 84.48% [45, 46]. Previous studies have also assessed the effectiveness of the ASQ-3 as a screening tool by comparing it with the Bayley-III and have found weak or moderate consistency between the two instruments [42, 49]. Some studies show that the ASQ-3 is better able to assess children’s development when children are older rather than when they are younger [31, 32, 40, 49]. The inconsistent results may be related to differences in reference measurement, age of the sample, cultural environments, and item explanations. Furthermore, measurement errors caused by the reference instrument itself may also be an explanation. In spite of the discrepancies, a study by Wei et al. [45, 46] also pointed out that the ASQ-3 has good reliability and validity in mainland China, and can be used in the developmental screening and monitoring of eligible children in mainland China. Therefore, we also chose the ASQ-3 as a comparison scale.
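The definitions of sensitivity and specificity given above can be made concrete with a minimal Python sketch. The function name and the toy data are hypothetical and serve only to illustrate the arithmetic; they are not taken from any of the cited studies.

```python
def sensitivity_specificity(screen_positive, reference_positive):
    """Return (sensitivity, specificity) for paired boolean classifications.

    screen_positive[i]    -- True if the screening tool flags child i as delayed
    reference_positive[i] -- True if the reference instrument classifies child i as delayed
    """
    if len(screen_positive) != len(reference_positive):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(screen_positive, reference_positive))
    true_pos = sum(1 for s, r in pairs if s and r)
    false_neg = sum(1 for s, r in pairs if not s and r)
    true_neg = sum(1 for s, r in pairs if not s and not r)
    false_pos = sum(1 for s, r in pairs if s and not r)
    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("need at least one delayed and one non-delayed child")
    sensitivity = true_pos / (true_pos + false_neg)  # delayed children correctly flagged
    specificity = true_neg / (true_neg + false_pos)  # typical children correctly passed
    return sensitivity, specificity


# Hypothetical toy data for eight children (True = classified as delayed).
screen = [True, True, True, False, False, False, False, True]
reference = [True, True, True, True, False, False, False, False]
sens, spec = sensitivity_specificity(screen, reference)
print(sens, spec)  # 0.75 0.75
```

Because both rates depend on which reference instrument defines "delay", the same screening tool can yield different values across studies, which is consistent with the variation reported above.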
The CREDI differs from the ASQ-3 as a population-level assessment tool because it is not used to screen and diagnose an individual’s specific developmental problems or developmental delays, but provides caregivers with feedback on the child’s developmental status or tracks subtle changes in individual levels through intervention. This instrument is simpler and less burdensome to administer, and parental involvement in testing may also help parents gain important knowledge about their child’s development and understand their child’s performance at that age. It is an open resource that can be downloaded freely from the website https://sites.sph.harvard.edu/credi/. It can serve to provide conceptually rich, developmentally informed, population-level data on global progress in alleviating ECD-related inequities and meeting target 4.2 of the UN Sustainable Development Goals.
The CREDI has been piloted and applied in many low- and middle-income countries around the world, and its reliability and validity have also been studied and analyzed. By analyzing 2481 caregivers of children aged 18–36 months in Tanzania, McCoy et al. evaluated the acceptability, test-retest reliability, internal consistency, and discriminant validity of the newly developed CREDI items, subscales, and total scale. The results showed that the CREDI and its motor, cognitive, and social-emotional subscales had sufficient acceptability and internal consistency. The study also found positive evidence for the validity of the CREDI by showing adequate criterion validity with the Bayley-III motor, cognitive, and communication subscales. It further found that the CREDI can accurately distinguish differences in children’s ages, nutritional status, disability status, and home stimulation activities. In addition to providing positive evidence of validity, the CREDI has been found to be a more acceptable tool in low-income environments because it is easily understood and quickly implemented: trained field staff with the equivalent of a secondary education spent only about 20 min on average completing the test. Moreover, it was found that this kind of caregiver-reported instrument is beneficial for reducing errors resulting from the non-compliance issues caused by factors such as unfamiliarity with the test environment, fear of unfamiliar adults, and children’s illnesses. However, coverage of the study in Tanzania was insufficient because it only included 18- to 36-month-old children, resulting in a lack of data regarding children younger than 18 months. The authors also suggest that before the CREDI is fully disseminated, more research in multilingual and multicultural environments and in lower age groups needs to be conducted.
Another study about the application of the CREDI was also conducted by McCoy et al. with 8022 participants from 17 low- and middle-income countries. The results showed that the CREDI short form is an effective, reliable, and acceptable population-level measure of ECD. Feedback from qualitative interviews with caregivers and field team members shows that participants have a good understanding of the CREDI, and it is easy to implement. Internal consistency was also sufficient. The results also show that the CREDI score differs among different social demographic subgroups. The criterion validity was also tested to be sufficient through the correlations between the CREDI and alternative ECD “gold standard” instruments. This study fully demonstrated that the CREDI short form is effective, reliable, and acceptable in measuring population-level ECD status in different cultural environments. Based on this, the authors suggest that the CREDI can be used as a useful tool to monitor ECD status in low-resource, low-cultural settings and in large-scale household surveys, while recommending the CREDI and other indicators be used together.
Using data from 1265 caregivers of infants and toddlers aged 0–35 months in Brazil, Altafim et al. conducted a study to assess the acceptability, test-retest reliability, internal consistency, and discriminant and concurrent validity of the CREDI short form. The results of qualitative interviews showed that overall acceptance of the scale was high. Internal consistency was very high in the six age groups, with a coefficient greater than .8, but there were fewer participants in the 0–5 and 18–23 month age groups, and therefore further research is required. Multivariate analysis of structural validity showed that some of the significant variations in CREDI scores could be explained by the child’s gender and family characteristics, such as women’s education levels, socioeconomic status, and stimulating activities of the family. Regarding concurrent validity, the CREDI score was significantly correlated with the PRIDI score, with a correlation coefficient of .46. In summary, the results of the study in Brazil show that the CREDI short form has high validity, reliability, and acceptability, which suggests that it can be used for assessment of ECD status on a large scale in Brazil.
These studies conducted in-depth research on the application of the CREDI in low- and middle-income environments and mainly focus on the CREDI short form. Nevertheless, the reliability and validity of the CREDI long form in China have not yet been studied. The Chinese government’s recent commitment to childcare services policy urgently requires appropriate instruments to assess ECD status at the population level. Based on the previous review, the Bayley-III, although the gold standard, is not suitable for large-scale use because of its high administration costs and requirements. Alternatively, although the ASQ-3 is suitable for large-scale use as a caregiver-reported tool, there are still discrepancies in the data it provides. As an alternative measurement tool to the ASQ-3, the recently developed CREDI tool for the assessment of population-level ECD status in 17 low- and middle-income countries has been widely studied and recommended. However, before it is used in China, analysis of its reliability and validity is especially necessary. Therefore, the goal of this study was to evaluate the reliability and validity of the CREDI long form as a measure of ECD status at a population level in rural China. To do so, we administered the Bayley-III to a subsample of the total sample and administered the CREDI long form and ASQ-3 to the caregivers of all children in the total sample. We then compared the outcomes of the CREDI test to those of the Bayley-III and the ASQ-3.
The data used in this study were collected from a sample of 995 children aged 5–36 months from urban and rural communities in July 2018. The sample was drawn from a randomized controlled trial that delivered an intervention to children and their caregivers. However, in China, the traditional custom is that children from 0 to 5 months of age rarely go outside. Therefore, children in this age group were not the targeted group of the intervention and thus not in our study sample. The sample area is representative of one nationally-designated poor county in the Qinba mountain region of China. Each toddler’s primary caregiver was administered a detailed survey on parental and household characteristics, including each toddler’s age, gender, gestational age, presence of any siblings, whether the mother is the primary caregiver or not, maternal education, and household economic status.
All infants/toddlers were administered two different scales to measure developmental outcomes: the CREDI long form and the ASQ-3. Considering this was the first application of the CREDI in China, the authors’ research team engaged a professional translation company to conduct accurate translation and back-translation of the items and related materials of the CREDI. The translation was welcomed by the CREDI team. The translated items were also sent to specialists in the field of child development for consultation, and we conducted a pilot study in rural China before this survey to check whether the translation of the questions was clear and suitable for rural caregivers. Additionally, out of the 995 sample infants/toddlers, 258 were tested with the Bayley-III scales for their levels of cognitive, language, or motor development by enumeration teams.
It should be noted that the final sample for analysis is smaller than the full sample because of missing data, which come from two sources. First, there is a very small proportion of missing values in key measures. We imposed strict quality control during data collection to avoid missing data; there were no missing data at the item level of the CREDI and Bayley-III, and less than 2% missing values at the item level of the ASQ-3. Second, less than 0.05% of child and family characteristics data are missing in our sample. Because the amount of missing data was small and its effect on the analysis negligible, we excluded observations with missing data from all analyses.
All study protocols were approved by institutional review boards (IRBs), both at Stanford University (No. 46564) and the West China School of Medicine, Sichuan University, China (No. K2018074). Caregivers provided written consent for their own participation and the participation of their children after a field worker read the consent form out loud and answered any questions. All study staff were trained and monitored in IRB-approved procedures for identifying participant needs.
The Bayley scales of infant and toddler development, third edition (Bayley-III)
As one of the most widely used scales for measuring the developmental status of infants and toddlers aged between 0 and 42 months, the Bayley-III is considered the gold standard in the field. The Bayley-III includes 326 items divided into five domains: cognition, receptive communication, expressive communication, fine motor, and gross motor. As each item is administered, the examiner records the child’s response and stops after five consecutive items are failed. The child receives 1 point if he or she meets the scoring criteria and 0 otherwise. The sum of scaled scores for a given composite is then calculated with reference to the normative sample from the United States. Converting a scaled score (mean of 10, standard deviation of 3) to its composite score equivalent (mean of 100, standard deviation of 15) is a linear conversion. The Chinese version used in the current study has been widely used in research on early childhood development in China [23, 48,49,50]. It was translated and back-translated by a professional team.
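The linear relationship between the scaled-score metric (mean 10, SD 3) and the composite-score metric (mean 100, SD 15) can be illustrated with a short Python sketch. It shows only the equivalence of the two metrics through a common z-score; it does not reproduce the published Bayley-III norm tables, and the function name is hypothetical.

```python
def scaled_to_composite_equivalent(scaled, scaled_mean=10.0, scaled_sd=3.0,
                                   composite_mean=100.0, composite_sd=15.0):
    """Map a score on the scaled metric to the composite metric via its z-score."""
    z = (scaled - scaled_mean) / scaled_sd      # position relative to the norm mean
    return composite_mean + composite_sd * z    # same position on the composite metric


for scaled in (7, 10, 13):
    print(scaled, scaled_to_composite_equivalent(scaled))
# 7 85.0
# 10 100.0
# 13 115.0
```

One scaled-score point therefore corresponds to five composite-score points, so a child one standard deviation below the norm mean sits at 7 on one metric and 85 on the other.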
The ages and stages questionnaire, third edition (ASQ-3)
The ASQ-3 is another widely used instrument to measure the developmental status of infants and toddlers aged between 1 and 66 months. The ASQ-3 includes 21 questionnaires, and each questionnaire consists of 30 simple, straightforward questions about five domains of childhood development: problem-solving, communication, gross motor, fine motor, and personal-social. The answer to each question is selected from three possible responses: “yes,” “sometimes,” or “not yet.” Caregivers should select “yes” if the child shows a specific behavior, “sometimes” if the specific behavior is occasional or new, or “not yet” if the question refers to a behavior the child has not yet shown. The total score of each domain is determined by summing the scores of the six questions in that domain. By comparing the total score of each of the five domains with the threshold value of the corresponding domain obtained from empirical research (the domain mean minus 2 standard deviations), the development status of the child can be determined. The Chinese version used in the study was validated by Bian et al. and Wei et al. [45, 46].
Caregiver reported early development instruments (CREDI)
The CREDI are newly developed scales used to measure ECD status at the population level. They aim to provide an accurate and easy-to-administer assessment of ECD for children between 0 and 35 months that functions across a wide variety of cultural, linguistic, and socioeconomic contexts [1, 25]. The CREDI is answered directly by the child’s primary caregiver, who responds “yes” or “no” to each item (caregivers who are unsure of their response may also respond “don’t know”). The CREDI team also provides the credi package for the software program R to guide users in scoring the CREDI long form.
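The domain scoring and cutoff rule described above can be sketched in Python as follows. The point values (10 for “yes,” 5 for “sometimes,” 0 for “not yet”) follow the usual ASQ-3 scoring convention, which is not stated in this text and is therefore an assumption here; the normative mean and standard deviation in the example are hypothetical placeholders, not published norms.

```python
# Assumed ASQ-3 point values per response (not stated in the text above).
RESPONSE_POINTS = {"yes": 10, "sometimes": 5, "not yet": 0}


def domain_score(responses):
    """Sum the points of the six items that make up one ASQ-3 domain."""
    if len(responses) != 6:
        raise ValueError("each domain has exactly six items")
    return sum(RESPONSE_POINTS[r] for r in responses)


def cutoff(norm_mean, norm_sd):
    """Threshold described in the text: domain mean minus 2 standard deviations."""
    return norm_mean - 2 * norm_sd


def below_cutoff(score, norm_mean, norm_sd):
    """True if the domain score falls below the threshold."""
    return score < cutoff(norm_mean, norm_sd)


# Hypothetical example: one child's communication domain.
answers = ["yes", "yes", "sometimes", "not yet", "not yet", "not yet"]
score = domain_score(answers)                               # 10 + 10 + 5 = 25
flagged = below_cutoff(score, norm_mean=45.0, norm_sd=8.0)  # cutoff = 29.0
print(score, flagged)  # 25 True
```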
As part of a larger project, both a short and a long form of the CREDI were developed from the same broad item set. The long form produces a score for each of the domains, namely cognitive, language, motor, and social-emotional developmental status. The goal of the long form is to provide detailed information to researchers interested in measuring specific developmental domains. The long form consists of a total of 108 items, and the starting point is determined according to the child’s age, and the ending point is determined according to a five-link error/uncertainty factor. In contrast, the short form can produce a total score for the child’s overall developmental status, which contains 20 items selected to characterize children’s development within predefined six-month age bands. For research and evaluation projects, the long form will provide domain-specific details about the child development, which can capture differences in the specific skills that help the design of the intervention targeted to improve child development. In the current study, therefore, the long form was used and evaluated.
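The start and stop rules of the long form can be sketched in Python. This sketch assumes that the "five-link error/uncertainty factor" means administration ends after five consecutive “no” or “don’t know” responses; that reading, the function name, and the toy responses are assumptions for illustration, and the official CREDI manual should be consulted for the exact rule.

```python
def administer(responses, start_index, stop_run=5):
    """Walk through items from an age-based start point.

    responses   -- caregiver answers for all items, in item order
    start_index -- index of the first item to ask (assumed to depend on child age)
    stop_run    -- number of consecutive "no"/"don't know" answers that ends the test
    Returns the list of (item_index, response) pairs actually administered.
    """
    administered = []
    run = 0
    for index in range(start_index, len(responses)):
        answer = responses[index]
        administered.append((index, answer))
        if answer in ("no", "don't know"):
            run += 1
            if run == stop_run:
                break          # stopping rule reached
        else:
            run = 0            # a "yes" resets the run
    return administered


# Hypothetical answers for twelve items; the child starts at item index 2.
answers = ["yes", "yes", "yes", "no", "yes", "no",
           "don't know", "no", "no", "no", "yes", "yes"]
asked = administer(answers, start_index=2)
print(len(asked), asked[-1])  # 8 (9, 'no')
```

Under this rule a caregiver answers only the items near the child's current level, which is one reason a 108-item form can still be quick to administer.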
In sum, all of the tests can capture the children’s developmental status for each domain. The age range covered in each test is different. The age period of the CREDI is the shortest, from 0 to 35 months; the age period of the Bayley-III is moderate, from 0 to 42 months; the age period of the ASQ-3 is the longest, from 1 to 66 months. The CREDI can cover younger children, while the ASQ-3 can cover older children compared to the CREDI and Bayley. In terms of administration, compared to the Bayley-III, both the CREDI and the ASQ-3 are shorter in duration and easier to administer. Actually, during our survey, the CREDI was very simple and clear enough to be answered by a caregiver with minimal formal education. The cost of administration was also lower than the Bayley-III.
Statistical analysis strategy
First, the descriptive characteristics of the sample were displayed to show the basic information about the corresponding sample and the ECD status measured by different instruments. Second, reliability was assessed through item internal consistency, with Cronbach’s α coefficients used to interpret the internal consistency of the three instruments. Next, correlations among the internally standardized scores of these measures were calculated to test concurrent validity. At the same time, to obtain the heterogeneous analysis results, the samples were divided into age cohorts of 5–11 months, 12–17 months, 18–23 months, 24–29 months, and 30–35 months when calculating the correlations. The samples were also divided according to the type of caregiver and the household wealth status, and the correlations calculated. Finally, an ordinary least squares (OLS) regression was conducted to check the relationship between a set of variables shown to be related to child development and the scores on these three instruments. This analysis tested whether the three measures have consistent predictive factors or not. It is one way to determine the similarity of the three measures in terms of how they identify developmental status of the children. All statistical analyses were performed using Stata 14.2 statistical software.
First, the descriptive characteristics of the sample are displayed in Table 1. As shown in Table 1, 946 toddlers were included in our final analysis. Among these toddlers, only 248 were administered the Bayley-III. The distribution of child and family characteristics between the total sample and Bayley sample was mostly consistent. Generally, there was a slightly higher proportion of female toddlers; slightly over half had siblings; around 5% of the sample were born prematurely; the mother was identified as the primary caregiver for about 70% of the toddlers; the educational attainment of mothers was low overall – around half had junior high school education or below. The household wealth status was moderate among all samples, and the wealth status of the Bayley sample was relatively better than that of the total sample.
Second, the ECD results are shown in Table 2. The mean scores (SD) from the CREDI indicated that the overall developmental status of our sample was only moderate. In terms of the Bayley-III scores, the Bayley-III has not yet been administered to a healthy reference population in China. As such, we rely on reference populations from other widely accepted research, which reveals that, for a healthy population, the mean score (SD) is expected to be 105 (9.6) for the cognitive scale [20, 33], 109 (12.3) for the language score, and 107 (14) for the motor score [7, 20]. According to the above standards, the developmental status of our sample was slightly below average. With respect to the ASQ-3, the mean scores of each domain were a little lower than the referenced mean scores shown in the ASQ-3 user guide. In sum, the results obtained from the three tests were generally consistent, with slightly better results from the CREDI.
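As an illustration of the reliability step, Cronbach’s α for a set of items can be computed from the item variances and the variance of the total score. The Python sketch below uses the standard formula on hypothetical binary item data; the study itself used Stata, so this is not the authors’ code.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items score matrix.

    rows -- list of respondents, each a list of item scores of equal length
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    item_columns = list(zip(*rows))                      # one tuple per item
    item_variance_sum = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical data: four caregivers answering three yes(1)/no(0) items.
consistent = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
noisy = [[1, 1, 0], [0, 0, 0], [1, 1, 1], [0, 1, 0]]
print(round(cronbach_alpha(consistent), 3))  # 1.0
print(round(cronbach_alpha(noisy), 3))       # 0.75
```

When every item orders respondents identically, α reaches 1; disagreement between items lowers it, which is why α is read as a measure of internal consistency.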
Third, the internal consistency of the CREDI, Bayley-III, and ASQ-3 are shown in Table 3. Both the CREDI and Bayley-III have large Cronbach’s α coefficients, which means the internal consistency of the two scales was high. For the CREDI, the Cronbach’s α coefficients of each subscale ranged from .92 to .97. When the internal consistency was examined by age group, it was found that the Cronbach’s α coefficients of each subscale decreased accordingly, but remained relatively high. For age 6–11 months, the Cronbach’s α coefficients of each subscale ranged from .81 to .87; for age 12–17 months, it ranged from .83 to .91; for age 18–23 months, it ranged from .74 to .93; for age 24–29 months, it ranged from .66 to .91; for age 30–35 months, it ranged from .60 to .89. Overall, the Cronbach’s α coefficients of the cognitive, motor, and social-emotional subscales decreased with age, but increased before 12 months. For the language subscale, the Cronbach’s α coefficients decreased with age, but increased before 24 months. Besides, it should be noted that the CREDI has unacceptably low internal consistency reliability (Cronbach’s α coefficients are below .7) in some places, such as the motor and social-emotional subscale within age 24–35 months.
For the Bayley-III, the Cronbach’s α coefficients of each subscale ranged from .97 to .98. The Cronbach’s α coefficients of each subscale decreased after the sample was divided by age group. Despite this, the internal consistency reliability indicated by the Cronbach coefficients was still very high. For the subscales of cognitive and fine motor skills, the internal consistency reliability increased with age between 6 and 23 months, while the internal consistency reliability decreased with age after 23 months. For the subscales of receptive communication and expressive communication, the internal consistency reliability increased with age between 6 and 17 months, while it decreased with age after 17 months. For the subscale of gross motor, the internal consistency reliability decreased with age across the five age groups.
In contrast, the ASQ-3 had a relatively lower scale internal consistency reliability. The Cronbach’s α coefficients of each subscale ranged from .41 to .70. Among the five subscales, the internal consistency reliability of the gross motor subscale was the highest and its Cronbach coefficient was .70; the internal consistency reliability of the personal-social subscale was the lowest and its Cronbach coefficient was .41. When the sample was divided by age group, the Cronbach’s α coefficients varied irregularly within different age groups.
Subsequently, the correlations between the CREDI and Bayley-III scores, the ASQ-3 and Bayley-III scores, and the CREDI and the ASQ-3 scores for each of their domains by age group were calculated respectively. P-values of the correlations were calculated by bootstrapping methods, with 1000 replications. As shown in Table 4, the results indicated that the concurrent validity of the CREDI with the Bayley-III scale was high in general. That is, CREDI cognitive, language, and motor subscales had strong correlations with the corresponding Bayley-III subscales. The correlation coefficients ranged from .84 to .90, among which the correlation between the CREDI motor subscale and the Bayley-III gross motor subscale was the largest. In contrast, although the correlation coefficients between the ASQ-3 communication subscale and the Bayley-III expressive communication and receptive communication subscales, and the ASQ-3 gross motor subscale and the Bayley-III gross motor subscale were significant at moderate levels, the concurrent validity of the ASQ-3 with the Bayley-III scale was relatively lower in general. With respect to the concurrent validity of the ASQ-3 with the CREDI, the results showed that only the correlation coefficients between the ASQ-3 communication subscale and the CREDI language subscale, as well as the ASQ-3 gross motor subscale and the CREDI motor subscale were significant at moderate levels, and the correlations in other domains were extremely weak.
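A correlation with a bootstrap-based p-value can be sketched in Python as follows. The sketch resamples child-level pairs 1000 times, uses the standard deviation of the resampled correlations as the standard error, and applies a normal approximation. This is one common approach and is assumed here for illustration; the text does not specify which bootstrap variant was used, and the data are hypothetical.

```python
import math
import random
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_correlation(x, y, replications=1000, seed=0):
    """Return (r, bootstrap standard error, two-sided normal-approximation p-value)."""
    rng = random.Random(seed)          # fixed seed keeps the sketch reproducible
    n = len(x)
    r_observed = pearson_r(x, y)
    resampled = []
    for _ in range(replications):
        idx = [rng.randrange(n) for _ in range(n)]   # resample pairs with replacement
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue                                  # skip zero-variance resamples
        resampled.append(pearson_r(xs, ys))
    se = stdev(resampled)
    z = r_observed / se
    p_value = math.erfc(abs(z) / math.sqrt(2))        # two-sided normal tail area
    return r_observed, se, p_value


# Hypothetical scores from two instruments for 30 children.
score_a = list(range(1, 31))
score_b = [2 * v + (v * 7) % 5 for v in score_a]
r_obs, se, p = bootstrap_correlation(score_a, score_b)
print(round(r_obs, 2), p < 0.05)
```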
A heterogeneity analysis of the CREDI was also conducted, as shown in Tables 5, 6 and 7. The correlations among the Bayley-III, the CREDI, and the ASQ-3 were calculated by age group, primary caregiver, and wealth status. Table 5 shows that the correlations between the CREDI and the Bayley-III varied across age groups. In general, the correlation between the CREDI and the Bayley-III was strong before 18 months, relatively weak at 18 to 23 months, and moderate after 24 months. For the ASQ-3 and the Bayley-III, the correlation in the communication domain was generally stronger within 12–29 months than in other age periods. For the CREDI and the ASQ-3, within 5–11 months only the correlation between the CREDI language subscale and the ASQ-3 communication subscale was significant, and it was moderate. After 12 months, the correlations between each domain of the CREDI and the ASQ-3 were significant and moderate.
When the correlations among the three tests were examined by caregiver type and household wealth status (Tables 6 and 7), the correlations between the CREDI and the Bayley-III were large and statistically significant regardless of whether the primary caregiver was the mother or the grandmother, and regardless of whether the household was poor or rich. The correlations between the ASQ-3 and the Bayley-III were significant and moderate only in the communication and gross motor domains, and the correlations between the CREDI and the ASQ-3 were significant but relatively small.
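The subgroup correlations described above amount to splitting the sample by a grouping variable and computing the correlation within each subgroup. A minimal Python sketch is given here; the record layout and the field names (`age_group`, `credi_language`, `bayley_receptive`) are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def pearson_r(x, y):
    """Pearson correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_by_group(records, group_key, x_key, y_key):
    """Map each subgroup label to the correlation of x and y within it."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append((rec[x_key], rec[y_key]))
    return {
        label: pearson_r([p[0] for p in pairs], [p[1] for p in pairs])
        for label, pairs in groups.items()
    }


# Hypothetical records; the values are made up for the example
records = [
    {"age_group": "5-11", "credi_language": 1, "bayley_receptive": 2},
    {"age_group": "5-11", "credi_language": 2, "bayley_receptive": 4},
    {"age_group": "5-11", "credi_language": 3, "bayley_receptive": 6},
    {"age_group": "18-23", "credi_language": 1, "bayley_receptive": 3},
    {"age_group": "18-23", "credi_language": 2, "bayley_receptive": 2},
    {"age_group": "18-23", "credi_language": 3, "bayley_receptive": 1},
]
by_age = correlation_by_group(
    records, "age_group", "credi_language", "bayley_receptive"
)
```

The same function can be reused with caregiver type or wealth status as the grouping key.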
To complete the analysis, the OLS regression results were reviewed to check whether the three instruments have consistent predictors. All scores obtained from the Bayley-III, CREDI, and ASQ-3 were internally standardized before the OLS regression.
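As an illustration of this step, the sketch below assumes that "internally standardized" means converting each score to a z-score using the sample’s own mean and standard deviation, and then fits an OLS line with a single predictor. The study’s models included several predictors at once (home stimulation, age, gender, wealth status, caregiver type, prematurity); the one-predictor closed form, the function names, and the toy data are simplifying assumptions.

```python
from statistics import mean, stdev


def zscore(values):
    """Standardize scores using the sample's own mean and SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def ols_simple(x, y):
    """Closed-form OLS for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    intercept = my - slope * mx
    return intercept, slope


# Hypothetical data: a home stimulation index and raw cognitive scores
home_stimulation = [0, 1, 2, 3, 4]
raw_scores = [10, 12, 14, 16, 18]

z_scores = zscore(raw_scores)
intercept, slope = ols_simple(home_stimulation, z_scores)
# slope: change in the outcome, in SD units, per one-unit change in the index
```

Because the outcomes are in standard deviation units, coefficients for the same predictor can be compared across the three instruments.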
As shown in the Appendix Table, children from homes with higher stimulation obtained higher Bayley-III cognitive scores, and older children obtained higher Bayley-III cognitive scores. When the same factors were used to predict children’s CREDI cognitive scores, some consistencies with the Bayley-III results were evident: the higher the home stimulation, the higher the CREDI cognitive scores, and the CREDI cognitive scores increased with the child’s age. However, unlike the Bayley-III cognitive scores, the CREDI cognitive scores were positively related to household wealth status, with a very small effect (indicated by the small coefficient). The child’s gender was negatively related to the CREDI cognitive scores; that is, girls had higher CREDI cognitive scores than boys. Additionally, when the same factors were used to predict children’s ASQ-3 scores, home stimulation was positively related to ASQ-3 “Problem Solving”, as with the CREDI and the Bayley-III. Consistent with the CREDI but not with the Bayley-III, household wealth status was positively related to the ASQ-3 scores. Unlike the other two tests, the ASQ-3 was significantly related to the type of primary caregiver: when the primary caregiver was the mother, the ASQ-3 “Problem Solving” score was higher.
When the same factors were used to predict children’s language scores, children with higher home stimulation obtained higher Bayley-III “Receptive Communication” and “Expressive Communication” scores; older children obtained higher Bayley-III scores; and girls’ Bayley-III scores were higher than boys’. When the same factors were used to predict children’s CREDI language scores, the results were consistent with the Bayley-III. However, unlike the Bayley-III, household wealth status was positively related to CREDI language scores, whereas the association between household wealth status and Bayley-III “Receptive Communication” and “Expressive Communication” was not significant. When the same factors were used to predict children’s ASQ-3 communication scores, the results were slightly different. As with the CREDI and the Bayley-III, home stimulation was positively related to ASQ-3 “Communication”, and girls obtained higher scores than boys. Consistent with the CREDI but not with the Bayley-III, household wealth status was positively related to the ASQ-3 scores. Unlike the other two tests, the relationship between age and the ASQ-3 communication score varied with age group: compared with children aged 5–11 months, only children aged 18 months and above had higher ASQ-3 “Communication” scores.
When the same factors were used to predict children’s motor scores, the results were consistent among the three instruments in some respects and inconsistent in others. Specifically, motor development measured by all three instruments was positively related to children’s age. With respect to home stimulation, its association with the Bayley-III motor scores was not significant, but it was positively related to the CREDI motor and ASQ-3 motor scores. The child’s gender was significantly related to the CREDI motor scores, but not to the Bayley-III motor or ASQ-3 motor scores. Prematurity was significantly related to the Bayley-III fine motor scores, but not to the CREDI motor or ASQ-3 motor scores.
In terms of predicting children’s social-emotional development, only the ASQ-3 “Personal-Social” and CREDI “Social-Emotional” scores were assessed, because our study did not include a Bayley-III “Social-Emotional” category. The results showed that both the ASQ-3 and CREDI scores were positively related to home stimulation, and girls obtained higher social-emotional scores than boys.
Overall, the three tests were highly consistent with respect to some key predictors of ECD status, such as home stimulation. That is, Bayley-III scores (except the Bayley-III motor scores), CREDI scores, and ASQ-3 scores were all positively related to home stimulation. There were also some consistent results for individual and family characteristics. For example, scores indicating language and social-emotional development were higher for girls than for boys across the tests. Nevertheless, the results were inconsistent for other individual and family characteristics. For example, children’s gender was not significantly related to the Bayley-III cognitive and motor scores, but was closely related to the CREDI scores. Household wealth status was not significantly related to the Bayley-III scores, but was positively related to the CREDI and ASQ-3 scores indicating cognitive and language development. With respect to caregiver type, neither the Bayley-III nor the CREDI scores showed a significant association. However, caregiver type was related to the ASQ-3 “Problem Solving” scores.
In general, the results showed relatively consistent predictors of ECD level across the different measurements. It can be concluded that, as a caregiver-reported, population-level measure of children’s development, the CREDI is highly consistent with previous widely used instruments in some key predictors of ECD level (such as home stimulation). Moreover, the CREDI is highly consistent with the other indirect assessment, namely the ASQ-3, in some individual and family characteristics (such as the children’s gender and household wealth status).
From the above, it can be concluded that the CREDI compares favorably with the Bayley-III in administration time, difficulty, and cost, and compares favorably with another indirect measurement, the ASQ-3, in internal consistency reliability and validity.
First, according to the results shown above, the Bayley-III scales have very good internal consistency, whereas the ASQ-3 has unacceptably poor internal consistency reliability. In contrast, the Cronbach’s α coefficients of each CREDI subscale were large and, despite declining when the sample was divided by age group, indicate that the internal consistency reliability of the CREDI was good in general. However, it should be noted that the CREDI has unacceptably low internal consistency reliability in the motor and social-emotional subscales at ages 24–35 months. Second, concurrent validity analysis using the Bayley-III as the criterion indicated generally high concurrent validity of the CREDI. In contrast, the concurrent validity of the ASQ-3 with the Bayley-III scale was low, and the concurrent validity between the two indirect assessments, the CREDI and the ASQ-3, was also low. Third, the heterogeneity analysis generally showed that the correlation between the CREDI and the Bayley-III was strong before 18 months, relatively weak at 18–23 months, and moderate after 24 months. In contrast, the correlation in the communication domain between the ASQ-3 and the Bayley-III was stronger within 12–29 months than in other age periods. In terms of the correlation between the CREDI and the ASQ-3 by age group, within 5–11 months only the correlation between the CREDI language subscale and the ASQ-3 communication subscale was significant, and it was moderate. After 12 months, the correlations between each domain of the CREDI and the ASQ-3 were significant and moderate. In addition, the heterogeneity analysis showed no large differences in the correlations between the CREDI and the Bayley-III by caregiver type or household wealth status. Finally, OLS analysis showed that the CREDI was highly consistent with previous widely used instruments in some key predictors of ECD level (such as home stimulation). Furthermore, the CREDI was also highly consistent with the other indirect assessment, namely the ASQ-3, in some individual and family characteristics (such as the children’s gender and household wealth status).
Compared with previous studies, the current study examined the reliability and validity of the CREDI long form in China, which had not been assessed before. Moreover, the coverage of children under 3 years of age is extensive, and every age group is included except 0–5 months. Consistent with previous research, the results of the current study suggest that the CREDI can be used as a tool to monitor ECD status at large scale in impoverished regions of China. The multivariate regression results are consistent with previous studies that emphasize the importance of home stimulation activities and family economic status. At present, the development of childcare services for children under 3 years old in China is lagging behind. Systematic and effective childcare policies and services have not yet been formed, and the absence of supportive systems and the shortage of social services are prominent. Especially since the implementation of the universal two-child policy, the establishment of a childcare policy system has attracted unprecedented attention. Information about ECD status at the population level is the basis on which the Chinese government can implement childcare policies and services and develop more effective intervention strategies. The current study has implications for the use of the CREDI long form to monitor ECD outcomes in impoverished regions of China. In addition, this study indicates that the public services and support provided by society cannot completely replace the function of the family, and that improving family members’ parenting practices (reflected by home stimulation activities) is conducive to improving the early development of children in poor rural areas.
Despite its merits, the current study has several limitations. First, there are some limitations on using the motor and social-emotional subscales with children aged 24–35 months. The possible reasons need to be explored further. Second, there was an issue with the collection of concurrent gold-standard measures of child development against which to determine the CREDI’s concurrent validity. Concurrent validity with direct observation, that is, using the Bayley-III, was tested for only just over two hundred children. Because the sample was small and lacked representativeness, the conclusions of the current study cannot be generalized to the whole of Shaanxi Province or China. The focus on a single geographic context also limits the generalizability of these results. In addition, measurement invariance was not assessed at this stage. Future studies should include samples from geographically, linguistically, developmentally, and culturally diverse contexts in China. Third, inter-rater reliability was not assessed for the study coordinators who administered the CREDI. Although the two local study coordinators who administered the items of the screening tool by verbal interview were fluent in Mandarin, possible communication issues or varying levels of comprehension should be considered, particularly given the wide range in the educational backgrounds of the caregivers. The lack of a test-retest reliability assessment was also a limitation of the study and should be addressed in future studies. Finally, our lack of a “gold standard” metric against which to compare our social-emotional items limits our understanding of their concurrent validity in the current study. In addition, children aged 0–5 months were not included in the study, which makes it impossible to verify the reliability and validity of the scale for that age group.
Providing high-quality ECD services in low- and middle-income countries would require joint efforts from all sectors, effective management, adequate funding, an ample workforce, community and parental collaboration, reliable data systems, continuous monitoring, and evaluation and improvement cycles. It can be concluded from the current study that the CREDI is a feasible low-cost instrument for large-scale data collection for early developmental intervention. In China, because of the long-standing urban-rural dual economic system and the economic development gap, there are significant urban-rural differences in early childhood development. As a feasible population-level measure of ECD, the CREDI long form can help to improve ECD outcomes and reduce developmental inequality in China through national and regional policies and resource allocation. However, it should be noted that the CREDI long form still lacks the ability to provide information about individual children. It may also not be sensitive enough to detect smaller changes attributable to intervention. The CREDI team has also pointed out that, in spite of the value of the CREDI long form in intervention evaluation, it should be paired with a more detailed and domain-focused measure whenever possible (please refer to the CREDI User Guide). In addition, given that the CREDI is a caregiver-reported scale, using a direct assessment (such as the Bayley-III) to triangulate measurement is useful for addressing the potential weaknesses of one approach relative to another. Therefore, much more work needs to be done so that the instrument can be used effectively for population-level monitoring and research purposes. The use of the CREDI long form to assess intervention effects in China should also be evaluated in future studies.
Availability of data and materials
The data used in the current study are not publicly available for reasons of confidentiality. However, they are available from the corresponding author on reasonable request.
Trial registration: ISRCTN, ISRCTN16736104.
Registered 25 May 2018, https://doi.org/10.1186/ISRCTN16736104
According to the CREDI’s user guide, cultural adaptation of the CREDI is not required. Nevertheless, appropriate translation of the CREDI is critical to its utility and comparability as a population-level measure.
A thorough discussion of the instrument’s construction (i.e., item construction, IRT testing, etc.) as well as the psychometrics of the CREDI short form is provided by McCoy et al. Details regarding the items developed and the psychometrics of the CREDI long form are introduced in the forthcoming paper by Waldman et al., which was conducted by the CREDI Team in the School of Public Health, Harvard University, and reveals that the CREDI’s motor, cognitive, language, and socio-emotional subscales, developed using a multidimensional framework, exhibited adequate internal consistency and test-retest reliability, as well as sufficient concurrent validity evidence.
ECD: Early Childhood Development
CREDI: Caregiver Reported Early Development Instruments
ASQ-3: The Ages and Stages Questionnaire, third edition
Bayley-III: The Bayley Scales of Infant and Toddler Development, third edition
Altafim ER, McCoy DC, Brentani A, et al. Measuring early childhood development in Brazil: validation of the Caregiver Reported Early Development Instruments (CREDI). Jornal de Pediatria (Versão em Português). 2020;96(1):66–75.
Attanasio O, Cattan S, Fitzsimons E, et al. Estimating the production function for human capital: results from a randomized control trial in Colombia. Am Econ Rev. 2020;110(1):48–85.
Bayley N. Bayley scales of infant and toddler development, third edition. Technical manual. San Antonio: PsychCorp, Harcourt Assessment, Inc; 2006a.
Bayley N. Bayley scales of infant and toddler development, third edition. Administration manual. San Antonio: PsychCorp, Harcourt Assessment, Inc; 2006b.
Bian X, Chen J, Chai Z, et al. ASQ-3 User’s guide. Shanghai: Shanghai Science and Technology Press; 2013.
Black MM, Walker SP, Fernald L, et al. Early childhood development coming of age: science through the life course. Lancet. 2017;389(10064):77–90.
Bos AF. Bayley-II or Bayley-III: what do the scores tell us? Dev Med Child Neurol. 2013;55(11):978–9.
Denboba AD, Sayre RK, Wodon QT, et al. Stepping up early childhood development: investing in young children for high returns. Washington: The World Bank; 2014.
Fernald L, Kariger P, Engle P, et al. Examining early child development in low-income countries: a toolkit for the assessment of children in the first five years of life. 2009. https://elibrary.worldbank.org/doi/pdf/10.1596/28107.
Gottlieb CA, Maenner MJ, Cappa C, et al. Child disability screening, nutrition, and early learning in 18 countries with low and middle incomes: data from the third round of UNICEF's multiple Indicator cluster survey (2005–06). Lancet. 2009;374(9704):1831–9.
Heckman J. Skill formation and the economics of investing in disadvantaged children. Science. 2006;312(5782):1900–2.
Heckman J, Moon SH, Pinto R, et al. Analyzing social experiments as implemented: a reexamination of the evidence from the high scope Perry preschool program. Quant Econ. 2010;1(1):1–46.
Hill T, Lewicki P. Statistics: methods and applications: a comprehensive reference for science, industry, and data mining. Tulsa: StatSoft, Inc.; 2006.
Hix-Small H, Marks K, Squires J, et al. Impact of implementing developmental screening at 12 and 24 months in a pediatric practice. Pediatrics. 2007;120(2):381–9.
Janus M, Offord D. Development and psychometric properties of the early development instrument (EDI): a measure of children’s school readiness. Can J Behav Sci. 2007;39(1):1–22.
Kelly Y, Sacker A, Schoon I, et al. Ethnic differences in achievement of developmental milestones by 9 months of age: the millennium cohort study. Dev Med Child Neurol. 2006;48(10):825–30.
Kim P, Evans GW, Angstadt M, et al. Effects of childhood poverty and chronic stress on emotion regulatory brain function in adulthood. Proc Nat Acad Sci USA. 2013;110(46):18442–7.
Li Y, Jia M, Zheng W, et al. Status and determinants of early childhood development in poor rural China. J East China Normal Univ (Educational Sciences). 2019;37(3):17–32 (In Chinese).
Limbos MM, Joyce DP. Comparison of the ASQ and PEDS in screening for developmental delay in children presenting for primary care. J Dev Behav Pediatr. 2011;32(7):499–511.
Lowe JR, Erickson SJ, Schrader R, et al. Comparison of the Bayley II mental developmental index and the Bayley III cognitive scale: are we measuring the same thing? Acta Paediatr. 2012;101(2):e55–8.
Lu C, Black M, Richter L. Risk of poor development in young children in low-income and middle-income countries: an estimation and analysis at the global, regional, and country level. Lancet Global Health. 2016;4(12):e916–22.
Luo R, Emmers D, Warrinnier N, et al. Using community health workers to deliver a scalable integrated parenting program in rural China: a cluster-randomized controlled trial. Soc Sci Med. 2019;239:112545.
Luo R, Jia F, Yue A, et al. Passive parenting and its association with early child development. Early Child Dev Care. 2017:1–15. https://doi.org/10.1080/03004430.2017.1407318.
McCoy DC, Peet ED, Ezzati M, et al. Early childhood developmental status in low-and middle-income countries: national, regional, and global prevalence estimates using predictive modeling. PLoS Med. 2016;13(6):e1002034.
McCoy DC, Waldman M, Field Team CREDI, et al. Measuring early childhood development at a global scale: evidence from the caregiver-reported early development instruments. Early Child Res Q. 2018a;45:58–68.
McCoy D, Fink G, Waldman M. CREDI data management & scoring manual. 2018b. https://sites.sph.harvard.edu/credi/.
McCoy DC, Christopher R, Sudfeld DC, et al. Development and validation of an early childhood development scale for use in low-resourced settings. Popul Health Metrics. 2017;15(1):3.
Nores M, Fernandez C. Building capacity in health and education systems to deliver interventions that strengthen early child development. Ann N Y Acad Sci. 2018;1419(1):57–73.
Nores M, Barnett WS. Benefits of early childhood interventions across the world: (under) investing in the very young. Econ Educ Rev. 2010;29(2):271–82.
Pisani L, Borisova I, Dowd AJ. International development and early learning assessment technical working paper. Save the Children; 2015.
Rubio-Codina M, Araujo MC, Attanasio O, et al. Concurrent validity and feasibility of short tests currently used to measure early childhood development in large scale studies. PLoS One. 2016;11(8):e0160962.
Schonhaut L, Armijo I, Schonstedt M, et al. Validity of the ages and stages questionnaires in term and preterm infants. Pediatrics. 2013;131(5):e1468–74.
Serenius F, Källén K, Blennow M, et al. Neurodevelopmental outcome in extremely preterm infants at 2.5 years after active perinatal care in Sweden. JAMA. 2013;309(17):1810–20.
Shonkoff J, Phillips D. From neurons to neighborhoods: the science of early childhood development. Washington: The National Academy Press; 2000.
Shonkoff JP. The neurobiology of early childhood development and the foundation of a sustainable society. In: Marope PTM, Kaga Y, editors. Investing against evidence: the global state of early childhood care and education. New York: UNESCO Publishing; 2015.
Shonkoff JP, Garner AS, Siegel BS, et al. The lifelong effects of early childhood adversity and toxic stress. Pediatrics. 2012;129(1):e232–46.
Snow CE, Van Hemel SB. Early childhood assessment: why, what, and how. Washington: The National Academies Press; 2008.
Squires J, Bricker D, Potter L. Revision of a parent-completed development screening tool: ages and stages questionnaires. J Pediatr Psychol. 1997;22(3):313–28.
Squires J, Twombly E, Bricker D, et al. ASQ-3 User’s Guide. Baltimore: Brookes Publishing; 2009.
Steenis LJ, Verhoeven M, Hessen DJ, et al. Parental and professional assessment of early child development: the ASQ-3 and the Bayley-III-NL. Early Hum Dev. 2015;91(3):217–25.
UNICEF. The formative years: UNICEF’s work on measuring early childhood development. New York: UNICEF; 2014.
Veldhuizen S, Clinton J, Rodriguez C, et al. Concurrent validity of the ages and stages questionnaires and Bayley developmental scales in a general population sample. Child Dev. 2014;15(2):231–7.
Verdisco A, Cueto S, Thompson J, et al. Urgency and possibility: results of PRIDI a first initiative to create regionally comparative data on child development in four Latin American countries. In: Technical annex. Washington: Banco Interamericano de Desenvolvimento; 2014.
Waldman M, McCoy DC, Seiden J, et al. Validation of motor, cognitive, language, and socio-emotional subscales using the caregiver reported early development instruments: an application of multidimensional item factor analysis. 2020. Forthcoming.
Wei M, Bian X, Squires J, et al. Studies of the norm and psychometrical properties of the ages and stages questionnaires, third edition, with a Chinese national sample. Chinese J Pediatr. 2015a;53(12):913–8 (In Chinese).
Wei QW, Zhang JX, Scherpbier RW, et al. High prevalence of developmental delay among children under three years of age in poverty-stricken areas of China. Public Health. 2015b;129(12):1610–7.
Wu F, Wang L. Family care arrangements and policy needs of preschool children in China: an analysis based on multiple data sources. Popul Res. 2017;41(6):71–83 (In Chinese).
Yue A, Cai J, Bai Y, et al. Challenges and possible solutions for children 0-3 years old in poor rural China. J East China Normal Univ (Educational Sciences). 2019a;37(3):1–16 (In Chinese).
Yue A, Jiang Q, Wang B, et al. Concurrent validity of the ages and stages questionnaire and the Bayley scales of infant development III in China. PLoS One. 2019b;14(9):e0221675.
Yue A, Shi Y, Luo R, et al. China’s invisible crisis: cognitive delays among rural toddlers and the absence of modern parenting. China J. 2017;78(July):50–80.
We would like to thank the management of the Hupan Modou Foundation, the Parenting the Future Action and Research Center (PARC), and the partners of the Ningshan County Management Center for ECD Services. Furthermore, we thank all of the study staff and the data collection team at the Center for Experimental Economics in Education (CEEE) of Shaanxi Normal University. We thank all of the primary caregivers and their families for participating in our surveys and the intervention programme.
The data collection and analysis of this study were supported by grants from the China Postdoctoral Science Foundation (2019M663619), the National Natural Science Foundation of China (No.71803108 and No.71703084), the Higher Education Discipline Innovation Project (111 Project, No. B16031), the Fundamental Research Funds for the Central Universities (20SZYB12) and the Zhejiang Hupan Modou Foundation. The funders had no role in study design, decision to publish, or preparation of the manuscript.
Ethics approval and consent to participate
All study protocols were approved by institutional review boards (IRBs), both at Stanford University (No. 46564) and the West China School of Medicine, Sichuan University, China (No. K2018074). Caregivers provided written consent for their own participation and the participation of their children after a field worker read the consent form out loud and answered any questions. All study staff were trained and monitored in IRB-approved procedures for identifying participant needs.
Consent for publication
Not applicable. All data used in the study has been anonymized.
We declare no competing interests.
Li, Y., Tang, L., Bai, Y. et al. Reliability and validity of the Caregiver Reported Early Development Instruments (CREDI) in impoverished regions of China. BMC Pediatr 20, 475 (2020). https://doi.org/10.1186/s12887-020-02367-4
- Early childhood development
- Impoverished regions of China