“Childhood Anemia in India: an application of a Bayesian geo-additive model”

Background The geographical differences in the factors that cause anaemia can be partially explained by variability in environmental factors, particularly nutrition and infections. Previous studies have failed to explain the non-linear effects of continuous covariates on childhood anaemia. The present paper investigates the risk factors of childhood anaemia in India, with a focus on geographical spatial effects.
Methods Geo-additive logistic regression models were fitted to the data to understand the fixed as well as spatial effects on childhood anaemia. The outcome was binary: anaemia (Hb < 11 g/dL) versus no anaemia (Hb ≥ 11 g/dL). Continuous covariates were modelled by penalized splines and spatial effects were smoothed by a two-dimensional spline.
Results Based on the 95% posterior credible interval, the influence of unobserved factors on childhood anaemia is very strong in the northern and central parts of India. However, most of the states in the north-eastern part of India showed negative spatial effects. A U-shaped non-linear relationship was observed between childhood anaemia and mother's age. This indicates that young and old mothers are more likely to have anaemic children, in particular mothers aged 15 years to about 25 years. The risk of childhood anaemia starts declining after the age of 25 years, continues to decline until the age of around 37 years, and increases again thereafter. Further, the non-linear effect of duration of breastfeeding shows that the risk of childhood anaemia decreases until 29 months and increases thereafter.
Conclusion Strong evidence of a residual spatial effect on childhood anaemia in India is observed. Government child health programmes should gear up in treating childhood anaemia by focusing on known measurable factors such as mother's education, mother's anaemia status, family wealth status, child health (fever), stunting, underweight, and wasting, which were found to be significant in this study. Attention should also be given to the effects of unknown or unmeasured factors on childhood anaemia at the community level. Special attention to unmeasured factors is needed in the states of central and northern India, which have shown significant positive spatial effects.


Background
Anaemia among children is still a major public health concern in both developed and developing countries. Anaemia is a condition in which the number and size of red blood cells, or the haemoglobin concentration, is lower than the established cut-off value [1]. Haemoglobin is essential for carrying oxygen; if the body has abnormal or too few red blood cells, or not enough haemoglobin, the capacity of the blood to carry oxygen to the body tissues is reduced. Globally, anaemia affects 1.6 billion people, with the highest prevalence, 47.4%, among preschool-age children [2]. According to the World Health Organization (2008), anaemia is considered a severe public health problem if the prevalence is 40% or more [2]. In India, 58.5% of children aged 6 months to 5 years were anaemic during 2015-2016 [3]. Moreover, studies have acknowledged the high prevalence of anaemia in low- and middle-income countries [4], with 67.6% and 65.6% of preschool-age children in Africa and South-East Asia, respectively, suffering from anaemia [2].
Iron is an essential element of haemoglobin, and iron deficiency is the most common cause of anaemia. However, a lack of micronutrient-rich foods, vitamin A, and vitamin B12 could be the reason for iron deficiency [5]. Also, diseases like diarrhoea [6], malaria [7], helminth infection, and hookworm [5] increase the risk of anaemia. In India, dietary habits vary across the population because of socio-economic, cultural, and religious differences. Dietary pattern is an essential factor associated with iron intake and absorption. For example, a vegetarian diet may increase the risk of anaemia due to lack of iron fortification [8]. The existing literature has also shown that socio-economic factors such as lower maternal education and low economic status [9], and demographic factors such as the age and sex of a child [10], affect anaemia. Maternal health status during pregnancy has a significant impact on the health and nutritional status of the child. Previous studies reported that maternal anaemia and child nutritional status, such as wasting, stunting and underweight, increase the risk of anaemia [11,12]. During the first 5 years of life, children are most vulnerable to iron-deficiency anaemia because of the increased iron requirements of rapid growth [13]. Iron deficiency in children is a serious concern because it may increase childhood morbidity, impair growth and development, and have long-term effects on cognitive development and school performance [13].
Accounting for the geographical heterogeneity of anaemia and its possible causes is vital for allocating health resources to prevent and control anaemia. Geographical heterogeneity can be an effect of unobserved independent variables, which may include contextual factors. According to Koissi & Högnäs (2013), ignoring geographical heterogeneity due to unobserved characteristics could lead to biased parameter estimates [14]. Geographical heterogeneity could be the effect of unmeasured factors, which means that the geographical differences in the factors that cause anaemia can be partially explained by variability in environmental factors [15]. Environmental factors such as the availability of toilet facilities, type of house, source of drinking water, and seasonality influence the risk of anaemia among children. Studies have found lower odds of anaemia among children living in households with better toilet facilities, improved drinking water and better housing conditions [16]. Malaria, which causes anaemia, is known to be associated with altitude and weather conditions such as temperature and rainfall [17]. Similarly, soil-transmitted helminth infection, which causes anaemia, is influenced by distance to water bodies, surface temperature, vegetation index and rainfall [18]. A number of studies have used statistical models such as multilevel and spatial mixed models to determine the effect of geographical heterogeneity on childhood anaemia in India [9,10]; however, all these studies have overlooked the advantage of using bivariate splines in modelling geographical heterogeneity. In particular, these models failed to explain the non-linear effects of continuous covariates on childhood anaemia. Thus, the pioneering contribution of this study is to explore the correlated spatial effect of anaemia among children aged 6 to 59 months using a spatial mixed model with the flexible approach of bivariate splines.
This study would probably be the first in India to map childhood anaemia in terms of residual spatial effects due to unmeasured factors. So, the map would have important implications for targeted policy for allocation of resources and to search for unmeasured variables that are responsible for residual spatial effects.

Study area
The study used the fourth round of the Indian National Family Health Survey (2015-2016), which adopted a multi-stage stratified cluster sampling design [19]. From all over India, a total of 699,686 eligible women aged 15 to 49 years completed the interview. The present study uses the child, rather than the mother, as the unit of analysis. Information was available on 259,627 children born in the 5 years preceding the survey. The present study excluded two union territories, Andaman & Nicobar Islands and Lakshadweep, because their borders are not connected to other parts of India, which would create problems in the estimation of spatial effects. Children with missing haemoglobin levels were also dropped from the analysis. With these criteria, the final analytical sample consists of 208,707 children.

Outcome variable and covariates
The outcome variable used in the analysis was based on the categorization of the haemoglobin level of children, adjusted for altitude. Children whose haemoglobin level was less than 11 g/dL were categorised as anaemic, and all others as not anaemic. The covariates in the present study were selected based on a previous study [15] and a theoretical understanding of the issue under investigation. Accordingly, mother's educational level, mother's anaemia status, mother's age and duration of breastfeeding were considered as covariates. The child-related characteristics considered were whether the child had cough, had fever, received vitamin A, and was stunted, wasted or underweight, together with birth weight, birth order, and age of the child. Household wealth index and family size were also included in the study. Duration of breastfeeding, age of the child, and mother's age were treated as continuous variables. The standard -2 SD cut-off on the z-scores of height for age, weight for height, and weight for age was used to define stunting, wasting and underweight, respectively.
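The categorisation described above can be sketched in a few lines of Python. This is only an illustration: the field names (`hb_adjusted`, `haz`, `whz`, `waz`) are hypothetical and are not the variable names used in the survey files, and strict inequalities are assumed at both cut-offs.

```python
from dataclasses import dataclass

HB_CUTOFF = 11.0   # altitude-adjusted haemoglobin threshold in g/dL
Z_CUTOFF = -2.0    # standard -2 SD cut-off for anthropometric z-scores


@dataclass
class Child:
    # All field names are hypothetical, chosen for this sketch only.
    hb_adjusted: float  # altitude-adjusted haemoglobin (g/dL)
    haz: float          # height-for-age z-score
    whz: float          # weight-for-height z-score
    waz: float          # weight-for-age z-score


def classify(child: Child) -> dict:
    """Return the binary outcome and the three nutritional indicators."""
    return {
        "anaemic": child.hb_adjusted < HB_CUTOFF,
        "stunted": child.haz < Z_CUTOFF,
        "wasted": child.whz < Z_CUTOFF,
        "underweight": child.waz < Z_CUTOFF,
    }


example = classify(Child(hb_adjusted=10.4, haz=-2.3, whz=-0.8, waz=-1.9))
print(example)
```

A child sitting exactly on a cut-off (11.0 g/dL, or a z-score of exactly -2) is treated as not anaemic and not malnourished under this assumption.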

Statistical analysis
A multiple logistic regression model was employed to select potential covariates for childhood anaemia prior to the spatial analysis. A significance level of 20% was set for this selection so that more variables would be carried forward into the spatial modelling. Geo-additive logistic regression models were fitted to the data to understand the fixed as well as spatial effects on childhood anaemia. Basically, the model takes the form of a multivariable hierarchical model, $g(\mu) = U\beta + e_i$, where $g$ is the logit link function, which gives the log odds of being anaemic and links the mean of the response $\mu$ to the predictor $U\beta + e_i$, and $e_i$ is the area-level random effect representing unmeasured contextual factors. More formally, if $p_{ij}$ is the probability that child $j$ from location $i$ is anaemic, then the binary anaemia status of the child is distributed as $\mathrm{Bernoulli}(p_{ij})$. The following models were then fitted to estimate fixed and spatial effects.
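Read from the verbal descriptions of M0 to M3 in this section, the four predictors can be sketched as follows. The notation is an assumption made for this sketch: $\eta$ is the predictor for a child living in state $S_i$, $x$ collects the categorical covariates, $w$ collects the continuous covariates entered linearly, and $\beta$, $\delta$ are the corresponding coefficient vectors. Whether M2 also retains linear terms for the continuous covariates is not stated, so they are omitted here.

```latex
\begin{aligned}
y_{ij} &\sim \mathrm{Bernoulli}(p_{ij}), \qquad
g(p_{ij}) = \log\frac{p_{ij}}{1 - p_{ij}} = \eta, \\
\text{M0:}\quad \eta &= x^{\top}\beta + w^{\top}\delta, \\
\text{M1:}\quad \eta &= x^{\top}\beta + \sum_{j} f_j(u_j), \\
\text{M2:}\quad \eta &= x^{\top}\beta + f_{\mathrm{spatial}}(S_i), \\
\text{M3:}\quad \eta &= x^{\top}\beta + \sum_{j} f_j(u_j) + f_{\mathrm{spatial}}(S_i).
\end{aligned}
```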
All categorical and continuous variables were treated as fixed effects in M0. In M1, categorical variables were entered as fixed effects and continuous variables were modelled by non-parametric smooth functions $f_j$. Model M2 included a spatial effect of the state to which a child belongs in addition to the fixed effects of the categorical variables. Finally, M3 was a combination of M1 and M2. The smooth functions $f_j$ were specified as Bayesian splines, which can be approximated by polynomial splines of degree $l$ with equally spaced knots $u_j^{\min} = \gamma_{j0} < \gamma_{j1} < \dots < \gamma_{js} = u_j^{\max}$ within the domain of covariate $u_j$. The spatial component $f_{\mathrm{spatial}}(S_i)$ was given a Markov random field prior [20,21], which captures the random effect of the area in which the child lives. The Bayesian spline can be expressed as a linear combination of $d = s + l$ basis functions $B_m$, having the form $f_j(u_j) = \sum_{m=1}^{d} \varepsilon_{jm} B_m(u_j)$. The Bayesian estimation of the spline then reduces to estimating the parameters $\varepsilon_j = (\varepsilon_{j1}, \dots, \varepsilon_{jd})$, with first or second order random walk priors assigned to these regression coefficients. A tensor product of two-dimensional splines was used to model the spatial effect as $f_{\mathrm{spatial}}(u_1, u_2) = \sum_{m_1=1}^{d} \sum_{m_2=1}^{d} \varepsilon_{m_1 m_2} B_{m_1}(u_1) B_{m_2}(u_2)$, where the combination $(u_1, u_2)$ corresponds to the coordinates of the location of the data point (latitude and longitude) or to the location centroids based on the map. The spatial smoothness priors commonly used in spatial statistics [22], based on the four nearest neighbours, were adopted.
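To make the spline construction concrete, the following sketch evaluates a B-spline basis of degree l on s equal-width intervals, which yields d = s + l basis functions, and computes a second order random walk penalty on a coefficient vector. It is a plain-Python illustration of the general technique (Cox-de Boor recursion), not the internal implementation of any package, and the function names are made up for this example.

```python
def bspline_basis(u, u_min, u_max, s, l):
    """Evaluate the d = s + l B-spline basis functions of degree l at u.

    Uses s equal-width intervals on [u_min, u_max] and l extra knots on
    each side. Assumes u_min <= u <= u_max.
    """
    h = (u_max - u_min) / s
    knots = [u_min + (m - l) * h for m in range(s + 2 * l + 1)]
    n0 = len(knots) - 1
    # Degree 0: indicator of the interval containing u (right end included).
    interval = min(int((u - u_min) / h), s - 1) + l
    basis = [1.0 if m == interval else 0.0 for m in range(n0)]
    # Cox-de Boor recursion, raising the degree one step at a time.
    for deg in range(1, l + 1):
        basis = [
            (u - knots[m]) / (knots[m + deg] - knots[m]) * basis[m]
            + (knots[m + deg + 1] - u)
            / (knots[m + deg + 1] - knots[m + 1]) * basis[m + 1]
            for m in range(n0 - deg)
        ]
    return basis


def rw2_penalty(coefs):
    """Sum of squared second differences, the roughness term that a
    second order random walk prior penalises."""
    return sum(
        (coefs[m] - 2 * coefs[m - 1] + coefs[m - 2]) ** 2
        for m in range(2, len(coefs))
    )


# Illustrative values: mother's age 20 on a 15-49 range, 10 intervals, cubic.
values = bspline_basis(20.0, 15.0, 49.0, s=10, l=3)
print(len(values), sum(values))
```

At any point inside the range, only l + 1 basis functions are non-zero and they sum to one, which is why neighbouring coefficients control the local shape of the fitted curve. Coefficients that lie on a straight line incur no second order random walk penalty.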
A fully Bayesian approach was adopted to estimate the parameters, and the estimated posterior odds ratio (OR) can be interpreted in the same way as the odds ratio from a logistic regression model. The models were fitted using the freely available package bamlss [23] in R (R Core Team, 2020). A total of 40,000 MCMC iterations, with 10,000 burn-in samples, were used in the analysis. Convergence of the models was checked through autocorrelations and sampling paths. Finally, models were compared by their Deviance Information Criterion (DIC) values [24], where the model with the smallest value is preferred. The DIC is calculated as $\mathrm{DIC} = \bar{D} + p_D$, where $\bar{D}$ is the posterior mean of the model deviance, which measures goodness of fit, and $p_D$ is the effective number of parameters, which describes the complexity of the model and penalises overfitting.
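As a rough illustration of how the DIC is assembled from MCMC output, the sketch below uses a toy logistic model with an intercept and one covariate. It assumes the usual definition of the effective number of parameters, namely the posterior mean deviance minus the deviance evaluated at the posterior mean of the coefficients; the data and the four "posterior samples" are invented for the example, and the computation inside bamlss may differ in detail.

```python
import math


def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def deviance(y, x, b0, b1):
    """-2 * Bernoulli log-likelihood of a logistic model b0 + b1 * x."""
    total = 0.0
    for yi, xi in zip(y, x):
        p = inv_logit(b0 + b1 * xi)
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return -2.0 * total


def dic(y, x, samples):
    """DIC = D_bar + p_D, with p_D = D_bar - D(posterior mean)."""
    devs = [deviance(y, x, b0, b1) for b0, b1 in samples]
    d_bar = sum(devs) / len(devs)
    b0_mean = sum(b0 for b0, _ in samples) / len(samples)
    b1_mean = sum(b1 for _, b1 in samples) / len(samples)
    p_d = d_bar - deviance(y, x, b0_mean, b1_mean)
    return {"DIC": d_bar + p_d, "D_bar": d_bar, "p_D": p_d}


# Invented toy data and posterior draws, for illustration only.
y = [1, 0, 1, 1, 0, 0]
x = [0.0, 1.0, 0.5, 0.2, 1.5, 0.8]
samples = [(0.4, -0.9), (0.6, -1.1), (0.5, -1.0), (0.3, -0.8)]
result = dic(y, x, samples)
print(result)
```

When comparing models fitted to the same data, the one with the smaller DIC is preferred, as in the comparison of M0 to M3.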

Descriptive results
states in the north-eastern region show comparatively low prevalence of anaemia, ranging from 24 to 57%. The states of Karnataka and Telangana show relatively high prevalence of anaemia, above 60%. The overall prevalence of anaemia in India is about 58%. Table 2 compares childhood anaemia across categorical covariates and tests for significant differences between the categories of each covariate using the chi-square test. Children from rural areas, children of mothers with low education, and children from economically poor households show higher prevalence of anaemia than their respective counterparts. There is a clear significant difference in childhood anaemia by place of residence, mother's education and household wealth, but no significant difference by sex of the child. Children with fever tend to have a higher prevalence of anaemia. Children who received vitamin A supplements show lower prevalence of anaemia. Undernourished children also show higher prevalence of anaemia. At the 5% level of significance, the categorical variables place of residence, mother's education, mother's anaemic status, household economic status, children's fever, vitamin A, stunting, wasting, and underweight are associated with childhood anaemia without controlling for other covariates. The categorical variables children's birth order, children's birth weight and household size show no significant effect on childhood anaemia at the 20% level of significance in the preliminary analysis. Therefore, only the categorical variables listed in Table 2 are included in the spatial logistic regression model in Table 4.
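For readers unfamiliar with the bivariate test behind such a comparison, here is a minimal sketch of Pearson's chi-square test for a 2 x 2 table. The counts are invented and do not come from Table 2, and the sketch ignores the survey design (weights, clustering), which a full analysis of these data would normally account for.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 degree of freedom)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: rows = rural/urban, columns = anaemic/not anaemic.
stat, p_value = chi_square_2x2([[20, 10], [10, 20]])
print(round(stat, 3), round(p_value, 4))
```

A p-value below the chosen threshold (5% for the descriptive comparison, 20% for the preliminary screening) would flag the covariate as associated with the outcome.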

Model selection
The selection of the preferred model is based on the deviance information criterion (DIC) and deviance values. The model with the smallest DIC and deviance is preferred. By these criteria, model M3 is the preferred model (Table 3). Therefore, the interpretation of results (Table 4) and the discussion are based on model M3. Table 4 shows the fixed effects on childhood anaemia. The fixed-effect variables significantly associated with childhood anaemia are place of residence, mother's education, the poorest, rich and richest categories of household wealth, fever, cough, child undernutrition and mother's anaemic status. The fixed-effect coefficient for fever is positive, which indicates that children with fever have a higher risk of anaemia. Children who take vitamin A supplements are less likely to be anaemic. Children from the rich or richest quintile of household wealth also have a lower risk of anaemia than those from the poorest quintile. Malnourished children have a higher risk of anaemia. Mother's anaemic status has a positive effect on childhood anaemia, which means that children whose mothers are anaemic have a higher risk of being anaemic than those whose mothers are not.
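The step from a fixed-effect coefficient on the log-odds scale to a posterior odds ratio can be sketched as follows. The helper names are hypothetical and the samples are synthetic; the point is only that each MCMC draw is exponentiated first and then summarised, so the credible interval refers to the odds ratio itself.

```python
import math


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lower = int(math.floor(pos))
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])


def summarize_odds_ratio(log_odds_samples, level=0.95):
    """Posterior mean and equal-tailed credible interval of exp(beta)."""
    odds_ratios = sorted(math.exp(b) for b in log_odds_samples)
    tail = (1.0 - level) / 2.0
    return {
        "mean": sum(odds_ratios) / len(odds_ratios),
        "lower": quantile(odds_ratios, tail),
        "upper": quantile(odds_ratios, 1.0 - tail),
    }


# Synthetic draws of a positive coefficient (e.g. a covariate raising risk).
draws = [0.1 + 0.01 * k for k in range(101)]
summary = summarize_odds_ratio(draws)
print(summary)
```

An interval lying entirely above 1 corresponds to a positive, significant coefficient such as the one reported for fever; an interval entirely below 1 corresponds to a protective association.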

Non-linear effects
Another reason for geo-additive modelling is its ability to incorporate non-linear effects of continuous variables. In the present study, we incorporated non-linear effects of age of the child, mother's age, and duration of breastfeeding. The age of the child has a non-linear effect on childhood anaemia (Fig. 1). Fig. 1 shows that the effect on childhood anaemia decreases as the age of the child increases, which indicates that older children are at lower risk of anaemia. The risk of anaemia is much higher among younger children, aged about 6 months to about 15 months, and decreases thereafter.
Mother's age also has a non-linear effect on childhood anaemia (Fig. 2). The functional relationship between childhood anaemia and mother's age follows an almost U-shaped pattern. This indicates that young mothers (in particular those aged 15 years to about 25 years) and old mothers are more likely to have anaemic children. The risk of childhood anaemia starts declining after the age of 25 years, continues to decline until the age of around 37 years, and increases again thereafter. Figure 3 shows the non-linear effect of duration of breastfeeding on childhood anaemia. The risk of childhood anaemia decreases until 29 months and increases thereafter. This indicates an improvement in childhood anaemia with increasing duration of breastfeeding. The credible intervals are wider at the extremes of each covariate because of the small number of observations there.
Figure 4 displays the estimates of the spatial effects on childhood anaemia, with the colour range from blue to red representing low to high risk of childhood anaemia. Spatial effects represent unobserved influences, such as environmental and climatic factors, availability of good transport facilities, and access to good child health services. The figure clearly shows evidence of residual spatial effects on childhood anaemia in India, with most states showing significant positive or negative effects in the 95% posterior credible interval map (Fig. 5). With the 80% posterior credible interval, more states show significant spatial effects (Fig. 6). Most of the states in the northern and central regions show significant positive spatial effects with respect to the 95% credible interval. However, almost all states in the north-eastern region of India show significant negative spatial effects with regard to the 80% credible interval (Fig. 6).
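The credible interval maps can be read as a three-way classification of each state: significantly positive if the whole interval lies above zero, significantly negative if it lies below zero, and not significant otherwise. The sketch below assumes that convention and uses synthetic posterior draws; it also shows why a narrower 80% interval flags more states than a 95% interval.

```python
import math


def credible_interval(samples, level):
    """Equal-tailed credible interval using linear-interpolation quantiles."""
    ordered = sorted(samples)

    def quantile(q):
        pos = q * (len(ordered) - 1)
        lower = int(math.floor(pos))
        upper = min(lower + 1, len(ordered) - 1)
        return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])

    tail = (1.0 - level) / 2.0
    return quantile(tail), quantile(1.0 - tail)


def classify_spatial_effect(samples, level=0.95):
    """Label a state's spatial effect from its posterior draws."""
    lower, upper = credible_interval(samples, level)
    if lower > 0:
        return "positive"
    if upper < 0:
        return "negative"
    return "not significant"


# Synthetic draws for one hypothetical state, from -0.05 to 0.95.
draws = [-0.05 + 0.01 * k for k in range(101)]
print(classify_spatial_effect(draws, 0.95), classify_spatial_effect(draws, 0.80))
```

For these draws the 95% interval still includes zero while the 80% interval does not, mirroring the pattern in which additional states become significant on the 80% map.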

Discussion
In India, childhood anaemia cuts across all sections of society with varying intensity. Its prevalence, as per the WHO classification, is a severe public health problem for India. Anaemia is a matter of concern for all states and union territories (UTs) except Mizoram, Manipur, Nagaland, Assam, and Kerala, and it is of extremely serious concern for states like Haryana, Jharkhand, and Madhya Pradesh. These three states need to revisit existing programmes that address child health in general and anaemia in particular. Anaemia has a close link with food habits, which in turn are closely associated with culture and the natural environment. Geographical settings determine the nature of the food supply and the micronutrients available. Within the same geographical setting, culture may encourage or discourage some population groups from consuming certain nutritious foods. For example, the tribal culture of northeast India approves the consumption of a variety of insects, whereas for non-tribals the consumption of such insects is taboo. Probably for this reason, tribal-dominated states like Mizoram, Manipur, and Nagaland have a very low prevalence of anaemic children. However, this finding contradicts other studies in India [9,25] and Nepal [26], which reported that children from the lowest socioeconomic strata are more likely to suffer from anaemia.
The prevalence of anaemia among children in rural areas is comparatively higher than among their urban counterparts in India. The rural population in India might be less aware of a balanced diet, which has the potential to improve the haemoglobin count, because as many as one third of the rural population are illiterate. Lack of knowledge about iron-rich foodstuffs may also add to the problem of anaemia in rural areas. This indicates that mass media campaigns to address anaemia should emphasize pictorial depiction and/or audiovisual means rather than written leaflets. A distinct negative relationship between wealth quintile and child anaemia is quite evident. This indicates that economically poorer households may not be able to afford food regularly, especially nutritious food items. This calls for better Public Distribution System (PDS) which provides subsidized
Uneducated mothers are less equipped with knowledge of hygiene and proper child care. Unhealthy feeding habits can lead to various types of food-related health problems. Feeding practice is closely associated with diarrhoeal disease, and studies show a positive relationship between diarrhoea and anaemia. Unlike earlier studies [8,10], the present study found no significant association between the sex of the child and the prevalence of anaemia. Children who take vitamin A supplements are less likely to be anaemic, whereas an earlier study [8] did not find a statistically significant association between vitamin A intake and childhood anaemia. In India, babies in poor and illiterate families are often left on mud floors. A crawling baby, in the absence of a caretaker, may put in the mouth anything that comes to hand. Such activities may lead to various infections and morbidities, which is why younger children are more likely to suffer from anaemia. Other studies also indicate that younger children are more likely to have anaemia [15,26].
Very young mothers are generally less educated, and relatively old mothers might take child rearing for granted, as they may already have older children and experience of child rearing. Another study also indicates a U-shaped relationship between mother's age and childhood anaemia [15], and others [10,27] found that children born to young mothers are more likely to be anaemic. In India, educated and rich women usually, for various reasons, do not practise exclusive breastfeeding. Exclusive breastfeeding is usually practised among less educated and poor women; as a result, a positive association between exclusive breastfeeding and childhood anaemia is observed. However, this finding contradicts studies conducted elsewhere [28].

Limitations
The present study is not without limitations, despite using an innovative statistical technique. First, our study is based on a cross-sectional design. Therefore, major confounders cannot be fully controlled for and no causal inferences can be made, in spite of the robustness of the analysis. Second, the study uses only relevant variables in our data set leading to

Conclusions
There is strong evidence of a residual spatial effect on childhood anaemia in India. Government child health programmes should gear up in treating childhood anaemia by focusing on known measurable factors such as mother's education, mother's anaemia status, family wealth status, child fever, stunting, underweight, and wasting, which were found to be significant in this study. Attention should also be given to the effects of unknown or unmeasured factors on childhood anaemia at the community level. Special attention to these unmeasured factors is needed in the states of central and northern India, which have shown significant positive spatial effects. As the problem of anaemia is multi-faceted, the Anemia Mukt Bharat strategy adopted under Poshan Abhiyaan offers great hope of bringing down the prevalence of anaemia in India through its 6×6×6 strategy [29]. The strategy of targeting six population groups through six interventions and six institutional mechanisms is promising, but only time will tell whether it succeeds.