The present study reanalyzed the strongest causal evidence against customary spanking[1] in order (a) to investigate which alternative forms of discipline would reduce antisocial behavior more than spanking, and (b) to determine whether these apparent causal effects could be attributed to residual confounding due to a selection bias. The results varied somewhat by whether the analyses considered one disciplinary tactic at a time in the full sample, whether the analyses were limited to a misbehaving subsample, or whether the tactics were all included as simultaneous predictors in the analyses. When controlling for the most valid and reliable measure of initial differences in externalizing behavior problems, however, all three types of analyses yielded similar findings. The results for the first type of analysis (one disciplinary tactic at a time on the full sample) will be discussed first because that type of analysis was used in the original study[1]. This will be followed by a brief consideration of the similarities and differences in the alternative types of analyses, which are summarized in Tables 6 and 7.
Independent Analyses of Disciplinary Tactics in the Full Sample
The first purpose of this study was to compare the child outcomes of spanking with outcomes for alternative disciplinary actions that parents could use instead of spanking. When analyzed one at a time, more frequent use of all three types of nonphysical punishment was associated with higher subsequent antisocial behavior, with effect sizes similar to spanking, as shown in Table 5. No alternative disciplinary tactic was associated with significantly lower antisocial behavior, even after improving the covariate measures. Grounding and psychotherapy were associated with significantly higher antisocial behavior as often as spanking was. Removing privileges and sending children to their room did not have as many significant associations with antisocial behavior, but their effect sizes were similar to those of spanking. The effect size (β) for spanking was within .03 of the mean effect size for the three disciplinary alternatives when controlling for initial antisocial behavior in various ways. Psychotherapy had consistently more adverse effect sizes than spanking and the other disciplinary tactics, except for unadjusted correlations and the first structural equation model (see the next-to-last row of Table 5).
The second purpose of the study was to see whether the apparent causal effects would remain significant after improving the covariate measure of pre-existing antisocial behavior. Consistent with a residual confounding explanation, the apparently adverse effects of all disciplinary tactics and psychotherapy became non-significant as the covariate measure of antisocial behavior became more comprehensive and reliable. The non-significant coefficients changed signs when predicting simple gain scores in a latent externalizing variable in the final analysis, consistent with residual confounding due to selection biases [48, 56].
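The residual confounding argument can be illustrated with a small simulation. The sketch below is not the analysis reported in this study; the variable names, effect sizes, and reliability value are all hypothetical. It assumes that a corrective action has no effect whatsoever on later behavior, that the action is used more often with more troubled children, and that the covariate for initial behavior is either measured with error or measured perfectly.

```python
import random


def partial_slope(y, d, x):
    """OLS coefficient on d when y is regressed on d and x (with an intercept)."""
    n = len(y)
    my, md, mx = sum(y) / n, sum(d) / n, sum(x) / n
    s_dd = sum((a - md) ** 2 for a in d)
    s_xx = sum((a - mx) ** 2 for a in x)
    s_dx = sum((a - md) * (b - mx) for a, b in zip(d, x))
    s_dy = sum((a - md) * (b - my) for a, b in zip(d, y))
    s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return (s_xx * s_dy - s_dx * s_xy) / (s_dd * s_xx - s_dx ** 2)


def simulate(n=20000, seed=1):
    """Return the discipline coefficient under a fallible and an error-free covariate."""
    rng = random.Random(seed)
    # Hypothetical true level of initial behavior problems.
    trouble = [rng.gauss(0, 1) for _ in range(n)]
    # Selection: more troubled children receive the corrective action more often.
    discipline = [0.6 * t + rng.gauss(0, 0.8) for t in trouble]
    # Later behavior depends on initial trouble only; the true effect of discipline is zero.
    later = [0.7 * t + rng.gauss(0, 0.7) for t in trouble]
    # Fallible covariate: true score plus equal-variance error (reliability about .50).
    noisy_cov = [t + rng.gauss(0, 1) for t in trouble]
    b_noisy = partial_slope(later, discipline, noisy_cov)
    b_clean = partial_slope(later, discipline, trouble)
    return b_noisy, b_clean


b_noisy, b_clean = simulate()
print(f"discipline coefficient, fallible covariate:   {b_noisy:.3f}")
print(f"discipline coefficient, error-free covariate: {b_clean:.3f}")
```

Under these assumed parameters, the coefficient with the fallible covariate is clearly positive (about .26 in the population), even though the corrective action has no causal effect, whereas the coefficient with the error-free covariate is close to zero. The pattern parallels the shrinking coefficients observed as the covariate became more comprehensive and reliable, but it does not by itself establish that the same mechanism operated in these data.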
Analyses of the Subsample Receiving Some Disciplinary Correction
Part of the selection bias occurred because the zero-use group for each disciplinary tactic included the best-behaved children whose behavior never led to any of these disciplinary corrective actions in the referent week. To evaluate the role of this part of the bias, all the analyses were repeated for the subsample that required at least one disciplinary tactic during the referent week (Table 6). After removing the 27% of the children who received no disciplinary tactics, spanking never predicted significantly greater subsequent antisocial behavior after controlling for initial antisocial behavior. Grounding and psychotherapy showed generally more adverse effects than spanking, whereas privilege removal and sending children to their room had smaller non-significant effect sizes than spanking.
Analyzing All Corrective Actions Simultaneously in the Full Sample
A second alternative to the original type of analysis was to include all disciplinary tactics and psychotherapy as simultaneous predictors in analyses of the full sample. These analyses accentuated the small differences in effect sizes, resulting in near-zero effect sizes for privilege removal and sending children to their room in all of the analyses (see Table 7). In contrast, spanking, grounding, and psychotherapy were associated with significantly higher subsequent antisocial behavior in most of the analyses, except when controlling for the most comprehensive measure of externalizing behavior problems, especially in its more reliable latent form.
Regardless of the type of analysis, all disciplinary tactics and psychotherapy showed small non-significant associations with antisocial behavior when the measure of pre-existing differences maximized comprehensiveness and minimized measurement error in the latent variable analyses in the last two rows of Tables 5, 6, and 7. Moreover, the non-significant associations generally reversed signs when predicting simple gain scores in the latent variable of externalizing behavior problems, consistent with small residual selection biases[44, 48, 56]. These results suggest that the findings of all three types of analyses are due to residual selection biases that are minimized by controlling for a latent variable for externalizing behavior problems to reduce measurement error and to maximize the comprehensiveness of the proxy for the selection process.
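The sign reversal between residualized and simple gain scores can also be sketched in a toy simulation. The model below is an assumption made for illustration, not the latent variable model used in this study: a corrective action with zero true effect is selected mostly on a stable trait and partly on the transient component of the pretest score. All names and coefficients are hypothetical.

```python
import random


def cov(a, b):
    """Population-style covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n


def simulate_gain_scores(n=20000, seed=2):
    """Return (residualized-gain coefficient, simple-gain coefficient) for the action."""
    rng = random.Random(seed)
    trait = [rng.gauss(0, 1) for _ in range(n)]   # stable behavior problems
    e1 = [rng.gauss(0, 1) for _ in range(n)]      # transient component at pretest
    e2 = [rng.gauss(0, 1) for _ in range(n)]      # transient component at posttest
    pre = [t + e for t, e in zip(trait, e1)]
    post = [t + e for t, e in zip(trait, e2)]     # the action has NO effect on post
    # Assumed selection: mostly on the stable trait, partly on the transient state.
    action = [0.6 * t + 0.3 * e + rng.gauss(0, 0.7) for t, e in zip(trait, e1)]

    # Residualized gain: coefficient on action when post is regressed on action and pre.
    numerator = cov(pre, pre) * cov(action, post) - cov(action, pre) * cov(pre, post)
    denominator = cov(action, action) * cov(pre, pre) - cov(action, pre) ** 2
    b_resid = numerator / denominator

    # Simple gain: slope of (post - pre) regressed on action alone.
    gain = [b - a for a, b in zip(pre, post)]
    b_gain = cov(action, gain) / cov(action, action)
    return b_resid, b_gain


b_resid, b_gain = simulate_gain_scores()
print(f"residualized gain coefficient: {b_resid:.3f}")
print(f"simple gain coefficient:       {b_gain:.3f}")
```

With these assumed values, the residualized gain coefficient is positive (about .28 in the population), making the action look harmful, and the simple gain coefficient is negative (about -.32), making it look beneficial, although the true effect is zero in both cases. Opposite-signed estimates of this kind are what selection bias alone would be expected to produce.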
Differential Selection Biases: Mild vs. Other Corrective Actions
Why do sending children to their room and privilege removal appear to have less adverse effects than spanking, grounding, and psychotherapy when using covariates of intermediate adequacy in the latter two types of analyses (Tables 6 and 7)? The simplest explanation is that the selection bias is smaller for the two mildest disciplinary tactics, creating a differential selection bias in the latter two types of analyses. When each disciplinary tactic is investigated by itself in the full sample, the selection bias is enhanced by well-behaved children requiring no corrective actions in the reporting week. After removing those best-behaved children, the remaining selection bias is minimal for mild disciplinary tactics. Grounding and psychotherapy retained some significantly adverse effects after dropping the best-behaved children, probably because they tend to be selected for more problematic misbehavior than are the milder tactics.
Allowing for each disciplinary tactic to be a statistical control for the other disciplinary tactics (and psychotherapy) proved to be similar to controlling for initial antisocial behavior, so that improving the adequacy of the covariate for initial antisocial behavior did not change the effect sizes or significance as much in Table 7 as in the original study's type of analyses in Table 5. In fact, privilege removal and sending children to their room never significantly predicted subsequent antisocial behavior even when controlling only for the other three corrective actions. In one sense, these results illustrate the main point of this study, namely that frequencies of disciplinary tactics are confounded with the behavioral difficulty causing mothers to select those tactics more often. Controlling for the frequency of disciplinary responses to more oppositional misbehavior eliminated the smaller selection bias for the two mildest disciplinary tactics. In contrast, the generally significant effects for spanking, grounding, and psychotherapy indicate that, at any level of using the other tactics, more frequent use of each of those three corrective actions is associated with greater subsequent antisocial behavior than is less use of that corrective action. In other words, given the same degree of behavioral difficulty as indexed by the other disciplinary tactics, greater use of spanking, grounding, and psychotherapy is associated with higher levels of antisocial behavior two years later. But these adverse effects also seem to be due to residual confounding due to selection biases.
There are other indications of a differential selection bias for the two milder tactics compared to the other corrective actions. Sending children to their room, for example, was probably selected for milder behavior problems than was the typical case for grounding. Sending children to their room was the most frequently used tactic, and grounding was the least used. Sending children to their room had smaller correlations with antisocial behavior in 1990 and 1988 than did grounding, although the difference was smaller for 1988 antisocial behavior. The most important evidence of a differential selection bias is that all significant effects disappeared after controlling for a latent variable for externalizing behavior problems, which maximized the comprehensiveness, validity, and reliability of the covariate.
In sum, the apparently adverse effects of all these disciplinary tactics and psychotherapy seem to be due to selection biases that are stronger for spanking, grounding, and psychotherapy than they are for the two milder disciplinary tactics. Accordingly, the originally adverse effects of spanking replicate for grounding and psychotherapy and are marginally adverse for sending children to their room (Table 5). Across all analyses, grounding and psychotherapy showed as many significantly adverse effects as spanking. The adverse effects of all of these corrective actions became smaller and non-significant when the adequacy of the covariate for pre-existing antisocial behavior was improved.
Implications
The general failure of spanking to show more adverse effects than grounding and psychotherapy in our closest re-analyses in Table 5 is remarkable because the original study produced the largest estimate of an adverse causal effect for customary spanking to date[1]. First, the unadjusted longitudinal correlation in this cohort was larger than Gershoff's [5] average for longitudinal studies of corporal punishment and antisocial outcomes (d = .56 [r = .27] compared to a mean of d = .37 [r = .18])[63]. Second, this cohort had the largest longitudinal correlation between spanking and subsequent antisocial behavior out of the five NLSY cohorts considered by Straus et al.[1] Third, the Straus et al. study had stronger and more consistent causal evidence against spanking than any of the other six prospective studies that have predicted antisocial behavior from customary spanking of children under the age of 13. Therefore, the generally similar outcomes for grounding, psychotherapy, and spanking are not due to selecting a sample with a weak effect for spanking.
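The paired d and r values reported above are consistent with the standard conversion d = 2r / sqrt(1 - r^2). The text does not state which conversion was used, so the formula in this short check is an assumption, and the function name is hypothetical.

```python
import math


def r_to_d(r):
    """Convert a correlation r to Cohen's d, assuming d = 2r / sqrt(1 - r^2)."""
    return 2 * r / math.sqrt(1 - r ** 2)


for label, r in [("this cohort", 0.27), ("meta-analytic mean", 0.18)]:
    print(f"{label}: r = {r:.2f} -> d = {r_to_d(r):.2f}")
# prints:
# this cohort: r = 0.27 -> d = 0.56
# meta-analytic mean: r = 0.18 -> d = 0.37
```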
These results are all consistent with the conclusion that the apparent effects of all of these corrective actions are due to residual confounding from the tendency of more oppositional children to be selected more often for disciplinary corrective actions and for professional corrective actions. Statistically controlling for pre-existing differences reduces this selection bias confound, but fallible covariate measures do not eliminate it. When initial differences in levels of externalizing behavior problems were measured more comprehensively and reliably with latent variables, no corrective action predicted significantly higher antisocial behavior two years later. This result and the similar pattern of results for all corrective actions by parents and professionals are what would be expected if these results were due to residual confounding with a selection bias. When we changed the direction of the bias by predicting latent change scores, the apparent effects of all of these disciplinary tactics were not only non-significant, but their coefficients also reversed signs. This is because the usual analysis of residualized gain scores in net-effects regression is biased against corrective actions, whereas analyses of simple gain scores are biased in favor of them[48, 51, 64].
Overall, this is the same pattern of evidence found in a major early evaluation of Head Start,[43] which concluded that the summer version of Head Start was detrimental. Like Straus et al.,[1] the Head Start evaluation controlled statistically for the major confound, which was socioeconomic status. Fortunately, Campbell and others recognized what has been illustrated in the present study: matching and statistical controls are only partially adequate in correcting for this confound, leaving residual confounding, which Campbell called the under-adjustment bias[44, 65]. As in the present study, subsequent re-analyses showed that the apparently detrimental effects of Head Start disappeared with improved covariate measures, but the re-analyses never reversed the effect of summer Head Start into a significantly beneficial effect [66–68].
Other research has shown that most of the corrective actions in this study can be used skillfully to reduce behavior problems in children. Beneficial effects have been demonstrated from more causally conclusive designs for some psychotherapies,[27, 28] time out,[69] privilege removal,[70] and spanking when used to enforce time out in clinically defiant 2- to 6-year-olds[23].
In addition to residual confounding, three other methodological aspects of this type of longitudinal analysis may suppress the detection of effective corrective actions. The first aspect that might suppress evidence of effectiveness is that overly frequent use of any disciplinary tactic reflects less effective ways of implementing it. The more effectively any disciplinary tactic is used, the less the child will misbehave and the less often a parent will need to resort to that disciplinary tactic again. Therefore more effective use of any disciplinary tactic will be associated with a lower frequency of using it, other things being equal. This would be particularly true of last-resort tactics, such as spanking or psychotherapy. Frequent use of a last-resort tactic is a symptom of dysfunction in the entire disciplinary system as well as a symptom of the challenge to that system by the child's oppositional behavior.
A second factor that might suppress evidence of effectiveness is that two years is too long an interval to detect a causal effect of the frequency of any disciplinary tactic during one week. Many events may occur in the course of two years that influence antisocial behavior in the life of a child, including genetic effects, peer effects, and other parenting effects. These other causal influences may account for almost all of the development of antisocial behavior over the next two years, leaving little more to be explained by how often disciplinary punishments were used during a single week two years earlier.
A third suppressor of effectiveness might be exclusive reliance on maternal report, which has limited reliability and some likely biases. It is well known that parental reports of child behavior problems have low positive correlations with reports from teachers, children, and observers (e.g., rs from .25 to .27), although mothers' and fathers' reports correlate more highly with each other (r = .59)[71]. By asking how often each disciplinary tactic was used in the past week, the reports about disciplinary tactics minimize problems of recall and of subjective generalizations. Limiting parental reports to specific behaviors in a very recent time period has been shown to increase the validity of parental reports in other measures[72]. However, the frequency of use in one week may not be typical of other weeks. In addition, there is some evidence that reliance on maternal reports for all data tends to inflate the evidence of adverse effects of all disciplinary punishments[73].
Finally, the near-zero effects may represent the average of effective and ineffective use of these disciplinary tactics in reducing antisocial behavior. The failure to find between-tactic differences in effectiveness raises the possibility that within-tactic differences in how and when a disciplinary tactic is used may be more important than which tactic is used. From this perspective, different ways that parents use these forms of punishment may counterbalance each other, yielding the overall non-significant coefficient. This view is consistent with anecdotes from behavioral parent trainers, who train parents how to use time out consistently, even though many referred parents say they have tried time out previously without success.
Consistent with this last possibility, the meta-analysis by Larzelere and Kuhn[9] found that the outcomes of physical punishment compared differently with outcomes of alternative disciplinary tactics depending upon how physical punishment was used. Child outcomes of physical punishment compared unfavorably with alternatives only when it was used too severely or as the primary discipline method. The outcomes of customary physical punishment (e.g., spanking frequency) were equivalent to those of alternative disciplinary tactics, consistent with the closest replication of Straus et al.[1] in this study. The meta-analysis also found that spanking could be more effective than alternatives when it was used nonabusively to back up milder disciplinary tactics when 2- to 6-year-olds defiantly refused to cooperate with them. Such back-up spanking led to greater reductions in defiance or antisocial behavior than 10 of 13 alternatives it had been compared with directly. One advantage of back-up spanking is that it enhances the subsequent effectiveness of milder disciplinary tactics, such as time out, so that spanking can be phased out in a matter of weeks[23].
Future research needs to discriminate between effective and counterproductive ways of implementing all disciplinary tactics, so that advice to parents can recommend the mildest effective disciplinary tactics for each situation. Statistically controlled studies of the outcomes of frequency of usage fail to provide those discriminations because frequency measures include no information about how disciplinary tactics were implemented or the disciplinary situations for which they were used.
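The possibility that a near-zero coefficient averages over effective and counterproductive use of the same tactic can be made concrete with a toy simulation. Everything in the sketch below is hypothetical: it assumes that half of the parents use a tactic in a way that reduces later antisocial behavior and half in a way that increases it, with effects of equal size, and it ignores selection bias entirely.

```python
import random


def slope(x, y):
    """Simple OLS slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    s_xx = sum((a - mx) ** 2 for a in x)
    return s_xy / s_xx


def simulate_mixed_use(n=20000, seed=3):
    """Return (pooled, skilled-only, unskilled-only) slopes of outcome on frequency."""
    rng = random.Random(seed)
    freq, outcome, skilled = [], [], []
    for i in range(n):
        f = rng.gauss(0, 1)                       # standardized frequency of use
        is_skilled = (i % 2 == 0)                 # half of the parents in each group
        effect = -0.3 if is_skilled else 0.3      # assumed opposite true effects
        freq.append(f)
        outcome.append(effect * f + rng.gauss(0, 1))
        skilled.append(is_skilled)
    pooled = slope(freq, outcome)
    b_skilled = slope([f for f, s in zip(freq, skilled) if s],
                      [y for y, s in zip(outcome, skilled) if s])
    b_unskilled = slope([f for f, s in zip(freq, skilled) if not s],
                        [y for y, s in zip(outcome, skilled) if not s])
    return pooled, b_skilled, b_unskilled


pooled, b_skilled, b_unskilled = simulate_mixed_use()
print(f"pooled slope:    {pooled:.3f}")
print(f"skilled use:     {b_skilled:.3f}")
print(f"unskilled use:   {b_unskilled:.3f}")
```

In this sketch, the pooled slope is near zero while the within-group slopes are near -.3 and +.3. A frequency measure alone cannot separate the two groups, which is why information about how and when a tactic is implemented would be needed to detect such differences.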
Limitations
Some limitations of this study should be noted. First, all the data were based on maternal report, as in the original study by Straus et al.[1] and all other statistically controlled longitudinal studies with consistent evidence against customary spanking. It has been shown that evidence based on a single source of information is biased against disciplinary tactics[73]. Second, this study had no data on disciplinary tactics used by fathers, which was also a limitation in the original study.
A third limitation is that we have not duplicated the original study exactly, although we re-analyzed it as closely as possible. In contrast to the original study, we used a log transformation for the antisocial behavior measure to reduce the influence of extreme outliers in its skewed distribution. We also dropped 22 cases (2.7%) that had missing data on one or more of the nonphysical consequences, in order to ensure that the comparisons among disciplinary tactics were based on exactly the same sample and types of analyses.