Comparing apples and pears: misleading conclusions about the population mental health impact of a parenting programme, a commentary on Marryat, Thompson and Wilson (2017)

Background: The article by Marryat, Thompson and Wilson (2017) in BMC Pediatrics presents an evaluation of the implementation of the Triple P system as a public health intervention conducted by the Glasgow City Council and NHS Greater Glasgow and Clyde.
Discussion: Unfortunately, the conclusions drawn are questionable for multiple reasons. The lack of a controlled design precludes defensible conclusions about intervention effects free from routine threats to internal validity. There was a substantial mismatch between the intervention sample and the population sample assessed. The article's title and abstract leave readers with the mistaken impression that the children assessed for outcome were suitably representative of intervention families, when in fact many of the children in the intervention families were missing from the teacher-report outcome assessment (a single questionnaire), and many or most of the children in the teacher-report outcome assessment belonged to families who had never received the intervention. Although Triple P targets parent-child relations and child behavioural and emotional problems at home, Marryat et al. narrowly defined mental health impact as child difficulties in nursery or preschool, while not reporting data from practitioners and parents in the same evaluation that did not support the authors' conclusion. The paper was further diminished by a number of misleading statements and factual errors related, for example, to other research on Triple P.
Summary: Studying the extent to which child mental health functioning at home can generalise to school settings is an important topic of inquiry in relation to parenting support interventions, but unfortunately the Marryat et al. article did not move this area forward.


Background
The Triple P - Positive Parenting Program (Triple P) [1] is a multilevel system of parenting support designed to prevent and treat child social, emotional and behavioural problems. The Triple P system involves a population health approach to parenting support, and in recent years has been evaluated at a population level in various locations including the United States [2], Australia [3], and Ireland [4]. The population approach involves universal access to evidence-based parenting support through a mix of prevention, early intervention and targeted intervention options, with the aim of providing parents with the minimal amount of support they need.
A recent paper published in BMC Pediatrics [5] reports on a city-wide implementation of Triple P in Glasgow, Scotland. We applaud the authors for conducting an independent evaluation of the Triple P system, and endorse the value of independent evaluations of parenting support interventions. A healthy mix of independent and developer-led evaluations is vital for the ongoing refinement and dissemination of rigorous, evidence-based practice in the field of parenting support interventions. We also welcome the focus on generalisation of the effects of parenting programs to other contexts, as this has important implications for the range of services that are offered to communities. However, following a careful review of the methodology and findings reflected in this paper, we have concerns relating to the validity of inferences drawn by the authors, namely that "no convincing evidence of benefit for preschool aged children's mental health problems" was found from the initiative. We believe this conclusion is untenable due to the methodological, conceptual, and measurement limitations of the uncontrolled study, which we detail below. There are also numerous misleading claims and factual errors throughout, which further undermine the paper's validity.

Uncontrolled design
The Marryat et al. [5] design neglected to include a viable control condition. The lack of a control or comparison group precludes any conclusions about program effects that are free from routine threats to internal validity. Although briefly mentioning this as a weakness, the authors maintain that prior survey data make a control condition unnecessary, and support this notion by pointing out sampling and other design challenges encountered in prior experimental studies. Challenges encountered in prior controlled studies do not mitigate the absence of a control group in the Marryat et al. (2017) study. This methodological limitation is further compounded by the fact that delivery of Triple P had already started in the target city prior to the launch of this study, further suggesting the importance of including a no-intervention control condition. The authors claimed that it is highly unlikely the prior delivery of Triple P affected baseline data, but they provided no evidence for this assertion.

Mismatch between intervention and outcome measurement age
One of the most serious methodological problems with the article has to do with the age range of the children assessed. The article fails to make it clear to the reader that there is a substantial mismatch between the intervention and the outcome measurement with respect to child age range and sampling. Parental participation in the intervention (i.e., the independent variable) targeted children 2-16 years of age, while the teacher-reported outcome variable (i.e., the dependent variable) assessed 4 and 5 year-olds. The article provided detail about the number of families participating in the intervention, and briefly mentioned that a substantial proportion (i.e., 40% or more) of the families that received parenting services had children that were too old to be picked up by the outcome measure. The 40% figure refers to the percentage of children who were older than age five at the time their parents participated in the intervention. However, the implications of attempting to detect population-level impact by assessing a diluted, marginal sample of those who actually received the intervention have not been discussed.
Related to the age and sample mismatch issue, many of the families receiving the parenting intervention did not have a child within an eligible age-range for inclusion in the teacher-report outcome assessment, and thus were not represented in the evaluation of the intervention. Likewise, many of the children assessed via teacher report were from families who had never received the intervention. No data were provided regarding the proportion of children assessed who had a parent who had participated in Triple P. This omission, along with the aforementioned lack of control condition, makes it impossible to calculate common indices representative of population-level impact, such as risk ratios or number-needed-to-treat.
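To make concrete what these indices require, the following minimal sketch computes a risk ratio and a number-needed-to-treat from two proportions. It assumes that the proportion of children scoring above a difficulties cut-off is known separately for children whose parents did and did not receive the programme, which is exactly the information the evaluation did not provide. The function names and the two proportions are hypothetical and are not figures from the Glasgow evaluation.

```python
def risk_ratio(risk_exposed: float, risk_unexposed: float) -> float:
    """Outcome risk among exposed children divided by risk among unexposed."""
    if risk_unexposed <= 0:
        raise ValueError("risk_unexposed must be positive")
    return risk_exposed / risk_unexposed


def number_needed_to_treat(risk_exposed: float, risk_unexposed: float) -> float:
    """Reciprocal of the absolute risk reduction (unexposed minus exposed)."""
    absolute_risk_reduction = risk_unexposed - risk_exposed
    if absolute_risk_reduction <= 0:
        raise ValueError("no risk reduction, so NNT is undefined")
    return 1.0 / absolute_risk_reduction


# Hypothetical proportions of children above a difficulties cut-off.
# These are illustrative values only, NOT data from the evaluation.
risk_with_programme = 0.10
risk_without_programme = 0.15

rr = risk_ratio(risk_with_programme, risk_without_programme)
nnt = number_needed_to_treat(risk_with_programme, risk_without_programme)
print(f"risk ratio = {rr:.2f}, NNT = {nnt:.0f}")
```

Both functions need an outcome risk for an exposed group and for an unexposed group. Without knowing which assessed children had a participating parent, and without a comparison condition, neither input can be estimated.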

Narrow focus for assessing mental health impact
Triple P aims to reduce child behavioural and emotional difficulties through the mechanism of promoting change in parenting practices, and thus the primary focus is on producing change in the family context. Although Marryat et al. [5] claimed to evaluate the mental health impact of Triple P, they presented data related only to child difficulties at school (in this case, the nursery or pre-school context) via a routinely collected teacher-report questionnaire, the Strengths and Difficulties Questionnaire (SDQ) [6]. This narrow focus is important because (a) changes within the school setting are not the primary target of the Triple P intervention, and (b) the aims and conclusions outlined by the authors do not align with the actual data reported.
The impact of family-based interventions like Triple P on school adjustment is an important research question, and one that would be reasonable to explore. We might anticipate that significant improvements in child mental health or behavioural difficulties seen within the home context might also be seen at school, particularly if the child has significant difficulties at school in the first place. However, reliance on teacher-report data as the sole indicator of population-level impact on child mental health is seriously flawed. First, there are generally low levels of concordance between teacher and parent reports regarding child difficulties, with often only modest correlation (e.g., < .30) between parents and teachers as informants, and teachers typically reporting fewer problems overall (e.g. [7,8]).
Teacher report cannot be used as a proxy for parents' experiences with their children at home or for parental reports on children's mental health status. To support the decision to present only teacher data, the authors claimed that reliance on parental report can be problematic due to its potential to introduce a measurement confound with the parent's mental state. However, no evidence was presented that teacher-reported data provide a more realistic or reliable indication of children's mental health or difficult behaviours than parent-reported data. Teacher-report data can make a valuable contribution within a multi-informant approach to understanding the broader impact of a parenting intervention such as Triple P, yet, as with any single-informant approach to data collection, findings should be framed within the limits of their generalisability: in this case, teachers' views of child behaviour within the preschool setting were posited to generalise to the home and to children's general mental health. We acknowledge that issues of pragmatism can preclude the collection of data from multiple sources, but the authors failed to acknowledge this limitation. The result, unfortunately, took the form of over-reaching and inappropriately generalised conclusions regarding the population-level impact on mental health.

Selective reporting
Original data from the final report of the Glasgow Parenting Support Framework Evaluation [9] included parent-reported outcomes for pre-school children using the SDQ. Although not aggregate-level data, these data showed positive outcomes for Triple P when parents completed the program. However, these findings and other qualitative data from practitioners and parents have been ignored, which is problematic because reporting the full pattern of findings would have given the reader a more complete understanding that perhaps would have contradicted the authors' stated conclusions. Furthermore, the authors claimed there were no changes in social outcomes for children, but they only examined the Conduct Problems subscale when analysing the SDQ data, and not other SDQ subscales. The authors directed the reader to Additional File 4: Table S1 for further information regarding the pattern of differences in subscales other than the Total Difficulties score and Conduct Problems subscale, yet this table includes mean differences on only SDQ Total score and no individual subscale information. Subscales are plotted individually in Additional File 3: Fig. S1, however the lack of accompanying statistical information hinders any substantive interpretation of the data.

Factual errors and misleading statements
The authors' claim that independent observers do not generally report positive findings is incorrect. Sanders, Kirby, Tellegen, and Day [10], who conducted the most comprehensive meta-analysis of 101 studies on Triple P, found significant intervention effects across 21 studies reporting observational data on child behavior, including both prevention and treatment studies, with an average effect size (Cohen's d) of .50. Similar positive effects on observational measures of child behavior were found in a more recent meta-analysis of Stepping Stones Triple P by Ruane and Carr [11], who reported an average effect size of .51. The reported effect sizes for independent observational measures do not align with Marryat, Thompson and Wilson's claim of no impact.
The authors claimed that Triple P has little effect in deprived communities. This claim ignores studies showing that socioeconomic status does not moderate effect sizes for child outcomes in Triple P studies [10] and the mounting evidence that Triple P works well in low resource communities (e.g. [2,4,12]). There have since been a number of high quality studies showing the value of Triple P in a range of disadvantaged communities. Examples include: a place-based randomised trial of the Triple P system in the US showing population level effects on child maltreatment in communities with substantial representation of disadvantaged families [2]; an RCT of low intensity Triple P Discussion Groups in Panama showing positive effects on child and parent outcomes with parents in deprived communities [12]; an RCT of Triple P Discussion Groups with a Maori indigenous population in New Zealand [13]; evaluations of Group Triple P with Aboriginal and Torres Strait Islander samples in Australia [14]; and a trial of Triple P Online with vulnerable disadvantaged urban mainly African American and Latino families in Los Angeles [15]. Qualitative studies showing high levels of consumer acceptance of Triple P principles and techniques have been conducted with homeless parents [16], vulnerable low income families involved with child protective services [15], and women in shelters who have histories of domestic violence [17]. Fives et al. [4] reported that many participants in the Ireland population roll out of Triple P were low SES (39% of Group Triple P participants, 33% of workshop participants, and 26% of seminar participants had a medical card, a key indicator of low SES). Contrary to Marryat et al.'s conclusion [5], Triple P has been found to be a promising intervention with many vulnerable, socially disadvantaged parents.
The paper also raised concerns about the costs of Triple P without defining the costs or placing the costs in perspective relative to not intervening, or the costs of other intervention strategies. Serving 10,000 families ostensibly costs more than serving 100 families, but the key metric would be the per-family cost, which the article ignored in making a general pronouncement (i.e., "consumes substantial resources"). It did not discuss the potential cost savings of brief, early, minimal intervention, or the mix of varied delivery formats, for example, the cost saving in offering group programs serving several families in the same amount of staff time as conducting individual sessions.
The paper failed to take into account that during the intervention period in the same catchment area other parenting interventions were also being supported and implemented concurrently, albeit on a smaller scale. This again highlights the need for control data to allow suitable comparisons to support conclusions around population-level impact of any universal prevention or public health initiative.
Finally, there are some major errors in the article. Firstly, it reports null results of "a recent cluster randomized control trial exploring the impact of Triple P levels 2 and 3 on pre-schoolers' externalizing behaviours and parental mental health". The references cited relate to Hiscock et al. [18], a study that was not a Triple P intervention, and Malti et al. [19], a study that tested one level of Triple P (Level 4 Group). Similarly, the article refers to Prinz and Sanders [20] in relation to "previous work in which no significant improvement in child-based outcomes resulted from a public health parenting programme", which is an incorrect citation: the article cited is a theoretical piece about population-level interventions and does not include an evaluation nor any discussion of child outcome results. The authors also cite a study reporting a subgroup analysis focusing on lone parent families that showed no benefit from the Triple P intervention [21]. It is true the study reported no group difference between intervention and control parents around parenting and child behaviour based on self-report data. However, independent clinical observations reported within the same paper showed significant improvements in positive parenting behaviour and decreases in negative child behaviour for the intervention group. We find it curious that this finding was omitted from the authors' discussion, particularly considering it reports data from an independent source which would seem of relevance given the prior arguments made by the authors.
The paper claimed to have registered the study protocol, yet the reference list cites only a University of Glasgow webpage describing the protocol and gives no trial registration number. Furthermore, the protocol as described is significantly different from the primary findings reported in the paper or the final evaluation report.

Measurement problems
The study had a number of measurement problems. First, one of the primary outcome measures was a modified version of the Conduct Problems subscale of the SDQ. Using only three of the original five items for this scale resulted in a modified version that had low internal consistency (α = 0.66), which is below the commonly accepted threshold (0.7) for scientific acceptability, and which relied on a questionably small number of items (three). Additionally, the authors used a weighting procedure to compute an average score for this modified subscale, and then applied the standard cut-off levels intended for the full subscale. Given these measurement issues, the validity of this scale as a primary outcome variable is uncertain and highly questionable.
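The rescaling at issue can be illustrated with a short sketch. It assumes the standard SDQ item scoring of 0, 1 or 2 per item and a rescaling of the form "mean of the answered items multiplied by the full subscale length"; the function name is hypothetical and the code is an illustration, not the evaluation's actual scoring script.

```python
from typing import Optional, Sequence


def prorated_subscale(items: Sequence[Optional[int]], full_length: int = 5) -> float:
    """Rescale the mean of the answered items to the full subscale length."""
    answered = [score for score in items if score is not None]
    if not answered:
        raise ValueError("at least one item must be answered")
    return sum(answered) / len(answered) * full_length


# Example: three items scored 1, 2 and 0 give a mean of 1, rescaled to 5.
example_score = prorated_subscale([1, 2, 0])

# Every score a three-item version can produce (each item assumed to be 0, 1 or 2).
possible_scores = sorted(
    {
        round(prorated_subscale([a, b, c]), 2)
        for a in range(3)
        for b in range(3)
        for c in range(3)
    }
)
print(example_score, possible_scores)
```

Under these assumptions a three-item version can take only seven distinct values on the 0-10 range, whereas the full five-item subscale takes eleven integer values. Cut-offs set on the integer scale therefore fall between the attainable rescaled scores, which is one way to see why applying them unchanged needs justification.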

Conclusions
Overall, while an independent evaluation of a complex community-wide intervention such as that undertaken in Glasgow is welcome, the capacity to learn from the present evaluation is diminished by methodological, interpretational and factual issues and errors. Given the absence of a proper control or comparison group, and in light of the substantial mismatch between the intervention sample and the outcome measurement age group, rather than sweeping claims of "No evidence of whole population mental health impact," the scientifically justifiable conclusion is one of uncertainty. It is not possible to assert with any confidence that the observed data reflect a true test of intervention impact (i.e. behavior assessed in the preschool setting, with assumptions unable to be drawn about the home setting) or an inadequate or limited test of intervention impact (i.e. questionable measurement validity and a lack of suitable control condition). This article provides further support for the pressing need for the field to develop accurate and scalable measurement procedures to test population effects for public health interventions.
The ongoing delivery of Triple P in Glasgow is viewed by the NHS as part of a long-term strategy and it was expected to take several years for any new programme to be properly established in practice. In a city with Glasgow's levels of poverty and deprivation, health visitors implementing the programme have spent time engaging parents and helping them understand the need and benefit of parenting support. As expected, there have been many learnings over the years since Triple P was first introduced, including the need for dedicated practitioners within health visiting teams to run parenting groups and to establish strong links and partnerships with the voluntary sector to further improve engagement. Although positive outcomes have been achieved with many individual families who completed the programme, sustained implementation of Triple P requires a quality improvement framework that has been adopted by the implementation team in Glasgow. This involves applying learnings from implementation science, large-scale rollouts of the Triple P system (e.g. [4,22]), consumer and end user feedback from parents and practitioners, and outcome data collected as part of routine implementation. The ultimate aim is to continuously improve fidelity of delivery, minimise drop out and increase the reach and impact of the intervention.

Availability of data and materials
Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Competing interests
The

Louise Marryat, Lucy Thompson and Philip Wilson
In the above correspondence, Sanders et al. have commented on our paper [5], which reported a lack of impact of whole-population implementation of Triple P in Glasgow City. Sanders et al. consider the findings not proven due to 'methodological, interpretational and factual issues and errors'.
The main criticisms are:
1) The design neglected to include a control condition
2) The mismatch between the age of the target child for the intervention and the age of population assessment
3) The focus on the outcome as childhood mental health difficulties being too narrow
4) Selective reporting of outcomes
5) Factual errors and misleading statements
6) A number of measurement problems
We strongly reject the contention that our conclusions were not justified by the evidence. Whilst our research has weaknesses, as with any evaluation carried out in the real world, these were clearly set out in the original paper, and do not alter our overall conclusions. We now discuss each of the arguments in turn.

Assertion 1 the design neglected to include a viable control condition
It is true that the study design did not have a control group. The authors acknowledged this as a potential weakness in the original manuscript. The resources available to us, administered through NHS Greater Glasgow and Clyde (NHS GGC), were insufficient to meet the cost of a control group, and data collection had not begun at the start of the intervention, so no pre-intervention comparison group was available.
The study design was subject to a review process. The evaluation steering group, which included NHS members, sent the protocol out to external peer review by Warwick Medical School, who provided very strong support for our design.
The conclusions reached in the Glasgow evaluation are supported by the only other published independent UK evaluation of Triple P, a randomised controlled trial showing no effect from Triple P interventions [23].
Assertion 2 the mismatch between the age of the target child for the intervention and the age of population assessment
This criticism refers to the fact that the six years of population outcome measures of child mental health difficulties were assessed at age 4-5 years, whereas the group Triple P interventions were delivered to parents of children of a range of ages, many of whom were over that age. This argument has little relevance to our conclusions. First, group Triple P was only one part of a population-level programme. Triple P International, in its tender documents shared with NHS GGC in 2010, considered the level of reach of the programme was sufficient to produce a whole-population effect and that effect should have been seen at any and all ages of children. Second, the Glasgow Triple P programme included a city-wide media campaign involving television, newspaper and billboard posters aimed at all parents as well as a universal seminar programme. It is difficult to believe that any Glasgow family was not exposed to at least some of these Triple P materials. Third, parents nominated only one of their children as the index child when attending groups, and it was this child for whom the age was recorded in our process evaluation. Many attending parents would have had younger children in the family who would have been affected by the Triple P programme if it had been effective. Finally, we reiterate that the overall intensity of intervention was at least as high as that reported in previous non-independent studies claiming positive results [2,3].

Assertion 3 the focus on the outcome as childhood mental health difficulties being too narrow
The Triple P programme claims to 'prevent, as well as treat, behavioral and emotional problems in children' [Triple P website, accessed 5th March 2018]. This is considered by the developers to occur through changes in parenting practices. Sanders et al. point out that our paper only reports behavioural and emotional problems within the school context, as reported by teachers, rather than the home context, as reported by parents. As the original paper explains, a multi-informant approach would have been desirable but no resources were available for this.
Parental reports of children's mental health are strongly influenced by the parent's own state of mind [24,25], and parental reports of depression, anxiety and stress over the course of a group Triple P intervention generally show improvements in mental state [10].
Given the overwhelming balance of evidence that independent observers fail to report any impact of Triple P on child behaviour [26], and that parental (usually maternal) mood is improved by group attendance, the most parsimonious explanation is that Group Triple P does not have an impact on child behaviour; it simply enables attending parents to think that their child is behaving better. This is clearly a desirable outcome but it is difficult to see why any service commissioner would consider investing in an expensive programme with such limited impact. Further independent research would be valuable in this area.

Assertion 4 selective reporting of outcomes
Sanders et al. criticise our paper for not presenting the parent-reported child mental health outcomes collected during the course of the intervention. These data were only available for a relatively small number of families who completed interventions. Aside from the problems expressed in the previous section about parent-reported child mental health outcomes, the purpose of our paper was to assess whether there had been any effect on population-level child mental health difficulties, which the population-level Triple P programme purports to produce. These impacts were not found over six years of collected data.
Sanders et al. are correct in that, of the 44.1% of parents who completed a group Triple P intervention once they had enrolled in the programme, mean overall child mental health difficulties were reported by this relatively small number of parents to fall to a modest extent from 15.8 to 12 (on a scale of 0-40, n = 366). Parents who completed a group Triple P intervention, however, had children with lower levels of difficulties at the start of the intervention, were more affluent and better educated than those who failed to complete it, suggesting that the parents with children most in need of the intervention would not be receiving it in full [9]. Given the low numbers and completion levels of the more intensive strands of Triple P which do show parental-reported positive impacts in Glasgow city, we would not expect to see an overall impact on population level child mental health difficulties, even if we had surveyed the population of parents in Glasgow, as opposed to just teachers. We would welcome the publication of the qualitative element of the evaluation which gives some insight into parental perceptions of the programme.

Assertion 5 factual errors and misleading statements
Sanders et al. disagree with our statement that 'Some doubt has been expressed about the effectiveness of Triple P in deprived communities' [5]. This statement referred to the meta-analysis of Triple P by Thomas and Zimmer-Gembeck (2007), which concluded that 'Due to the high number of Triple P studies in the meta-analysis with middle or higher SES, it is not certain that findings can be generalized to low income or high risk groups at this time.' [27]. This complements our own findings that families from lower SES groups were less likely to complete Triple P interventions [9].
Sanders et al. raise concerns about the lack of information about the cost of the Triple P programme in our paper, despite this being raised as a concern. We formally requested these cost data from NHS GGC in 2015, and were told that this information was not available. An estimate that we consider very conservative, owing to the lack of inclusion of staff costs in programme delivery, was given in The Times at £4 million [28]. We would welcome the publication of comprehensive cost data for Triple P in Glasgow city.
Sanders et al. assert that our paper 'failed to take into account that during the intervention period in the same catchment area other parenting interventions were also being supported and implemented concurrently'. From our own knowledge, the only other major parenting programme with significant levels of delivery, the Family Nurse Partnership, began a pilot implementation in Glasgow City in April 2012 [29] with the first cohort of mothers and children completing the programme in late 2014, after the evaluation of Triple P concluded and with first-time parents of children much younger than preschool age so none of these families would have been included in our analysis. There were a few other parenting programmes operating in Glasgow (e.g. Mellow Parenting, the NCH programme and Incredible Years), however these were operating on a very small scale indeed, not at whole population level, and highly unlikely to have affected overall population level mental health.
The response correctly points out that three cross-references have become misaligned in the final version of the bibliography. We apologise for the typographical errors, but our conclusions are not in any way altered by them and we are happy to offer a corrigendum.
Sanders et al. are mistaken in their claim that our paper says that the protocol was registered: we stated that 'The protocol for this study was published in 2010' and a link is provided to the published version. As the evaluation was not a randomised trial, there was no mechanism to formally register it at the time.

Assertion 6 a number of measurement problems
Despite Sanders et al.'s claim that there are 'a number of measurement problems', they list only two linked difficulties, which were discussed in the original paper. The change in use of the Conduct Problems version was less than ideal but was necessary. It was carried out in response to nursery staff who were finding the 4-16 year old version inappropriate for some preschool children [30]. The alpha measuring internal consistency dropped from .71 to .66, falling just short of the level normally seen as acceptable. However, alpha depends in part on the number of items in the scale, so this fall may simply reflect the change from 5 items to 3 items in the scale, all of which was discussed in the original paper. We believe that the 'weighting' which Sanders et al. refer to is the averaging of the 3 items and then multiplying by 5 in order to create a comparable score. In the original documentation and code produced by Youth in Mind, the organisation hosting the SDQ, the averaging and multiplication of 3 items in a normal 5 item measure (where two items within the scale were not completed or incomprehensible, for example) is permitted, so this is no different from usual accepted practice [31]. Sanders et al. state inaccurately that this calculation makes the overall results 'highly questionable'; however, the conduct problems scale is only one of four scales used to form the Total Difficulties score, and none of the other scales showed improvements over time either.
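The dependence of reliability on scale length can be sketched with the Spearman-Brown prophecy formula. This is an illustrative calculation only: it assumes the items are parallel (equally reliable and equally intercorrelated), which may not hold for the SDQ items, and it was not part of the original analysis. The function name is hypothetical; the .71 and the item counts are the figures quoted above.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability after changing test length by length_factor.

    length_factor is new length / old length (e.g. 3 / 5 when shortening).
    Assumes parallel items, which is an idealisation.
    """
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


alpha_five_items = 0.71  # reported for the five-item subscale
predicted_alpha_three_items = spearman_brown(alpha_five_items, 3 / 5)
print(f"predicted three-item reliability: {predicted_alpha_three_items:.2f}")
```

Under the parallel-items assumption the predicted three-item value is about .59, so a fall from .71 to .66 is no larger than the reduction in item count alone would be expected to produce. The calculation says nothing about whether .66 is adequate for a primary outcome, nor about the use of full-scale cut-offs.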

Conclusions
We strongly reject the contention that our original conclusions were not justified by the evidence. The original paper set out the weaknesses in the study design; however, our methods were robust and would have shown a positive impact of Triple P on the mental health of the population of children in Glasgow city, had there been one.
We did not agree to perform retrospective subgroup analyses that would show the Triple P interventions in a more favourable light. We consider that independent evaluations should be carried out without the influence of programme developers in order that replicability of results can be truly established.
There is some continuing small scale activity including an independent trial comparing antenatal Triple P with another parenting programme and a highly targeted and structured programme offering level 4 Triple P to a small number of families, but the whole population approach has undoubtedly been abandoned.
Ethics approval and consent to participate
Not applicable.