Translation and validation of the German version of the Young Spine Questionnaire

Background Back pain in childhood and adolescence increases the risk for back pain in adulthood, but validated assessment tools are scarce. The aim of this study was to validate the Young Spine Questionnaire (YSQ) in a German version (G-YSQ) in children and adolescents.
Methods Children and adolescents between 10 and 16 years (N = 240, 166 females, mean age = 13.05 ± 1.70 years), recruited in chiropractic practices and schools, completed the G-YSQ (translated according to scientific guidelines) and the KIDSCREEN-10 (assessing health-related quality of life) at three time points. Test-retest reliability was determined by calculating intraclass correlation coefficients [ICC(3,1)] using the start and two-week data. Construct validity was investigated by testing a priori hypotheses. To assess responsiveness, the patients additionally filled in the Patient Global Impression of Change (PGIC) after three months, and the area under the curve (AUC) of the receiver operating characteristic curve was calculated.
Results The ICC(3,1) was 0.88 for pain intensity and pain frequency, indicating good reliability, and 0.68 for week prevalence and 0.60 for point prevalence, indicating moderate reliability. Pain intensity, frequency and prevalence differed between patients and controls (p < 0.001) and, except for point prevalence, between older (> 12 years) and younger control participants (p < 0.01). Health-related quality of life of participants with severe pain (in one or several spinal regions) was lower (KIDSCREEN-10, total score: F(4,230) = 7.26, p < 0.001; KIDSCREEN-10, self-rated general health: H(4) = 51.94, p < 0.001) than that of participants without pain or with moderate pain in one spinal region. Taken together, these findings indicate construct validity of the G-YSQ. The AUC was 0.69 (95 % CI = 0.57–0.82) and 0.67 (95 % CI = 0.54–0.80) for week and point prevalence, respectively, indicating insufficient responsiveness of the G-YSQ.
Conclusions Apart from the question on point prevalence, construct validity and sufficient test-retest reliability were shown for the G-YSQ. However, its responsiveness needs to be improved, possibly by asking for pain frequency during the last week instead of (dichotomous) week prevalence.
Trial registration ClinicalTrials.gov, NCT02955342, registered 07/09/2016, https://clinicaltrials.gov/ct2/results?cond=&term=NCT02955342&cntry=CH&state=&city=Zurich&dist=.
Supplementary Information The online version contains supplementary material available at 10.1186/s12887-021-02804-y.

Keywords: Adolescence, Back pain, Childhood, Neck pain, Reliability, Responsiveness, Validity, Young Spine Questionnaire

Background
Back pain and neck pain are leading causes of years lived with disability [1] and have a large impact on individuals, their families, employers and healthcare systems [2]. Spinal pain starts early in life and its prevalence increases with age, in particular around the age of 12 to 15 years [3,4]. Back pain in childhood and adolescence is a significant risk factor for developing back pain in adulthood: the number of days in a given year with low back pain (LBP) in adolescence was shown to be associated with the risk of developing LBP in adulthood [5]. However, research on spinal pain in childhood and adolescence shows large heterogeneity in assessing pain prevalence, pain intensity and associated disability [3], and validated assessment tools are scarce [6].
In 2013, the Young Spine Questionnaire (YSQ) was developed in Denmark as an instrument to measure spinal pain in the young population [7], though it has so far only been tested for content validity in preliminary versions during the developmental process in a population of Danish schoolchildren in the age range of 9-11 years [7]. To allow for the practical implementation of the YSQ, it must be shown to provide accurate, valid and interpretable data. Therefore, the assessment of instrument validity and reliability is essential [8,9].
Because the YSQ was originally not designed for capturing change, it is not known whether it is suitable to measure change in a longitudinal study design [7].
Thus, the first aim of this study was to translate the YSQ into German (G-YSQ) to obtain a standardized assessment tool of child and adolescent back and neck pain which can be used in Switzerland (and other German-speaking countries). The second aim was to test the G-YSQ for validity, reliability and responsiveness in children and adolescents between 10 and 16 years.

Translation
After authorization by the original authors of the YSQ, the questionnaire was translated into German in five steps according to the guidelines by Beaton et al. [10] (Fig. 1).
Two native German speakers (two researchers) independently translated the questionnaire forward from English to German (stage I) and produced one German version (stage II). Two bilingual (English and German) individuals (two researchers) independently translated this German version back to English (stage III). From these two versions, one common prefinal version was produced by the four forward and back translators (stage IV), which was then pilot tested (stage V) with 17 children and adolescents between 9 and 15 years (mean age = 12.25 ± 1.70 years). They were asked to report any difficulties in completing the questionnaire on their own and to make suggestions for improvement if considered necessary. During the translation process, two cultural adaptations were made to the G-YSQ compared to the YSQ: (i) the terms 'chiropractor' and 'physical therapist' were omitted, because it was questioned whether Swiss children are familiar with the chiropractic profession and because there is no direct access to physiotherapy in Switzerland; (ii) the terms 'stepfather' and 'stepmother' were omitted, because these terms are uncommon in contemporary Swiss German. During the pilot testing of the prefinal version of the G-YSQ, three adolescents (aged 13 and 15) reported that they would prefer text or scales instead of faces indicating pain intensity, but none of the children and adolescents reported any difficulties in completing the questionnaire. Nevertheless, the statements of several participants during the main study indicated the necessity for an answer option 'I don't know' in the questions about parental back problems.
Fig. 1 The five steps in translating the English Young Spine Questionnaire into the German version according to procedures recommended by Beaton et al. [10]
Nyirö et al. BMC Pediatrics (2021) 21:359

Study participants
Patients between 10 and 16 years of age with neck or back pain were recruited in seven private outpatient chiropractic clinics. The majority (76/100) came from one clinic specialized in the treatment of children and adolescents. Age- and gender-matched pupils (frequency matching) were recruited in four primary and secondary schools: after agreement of the teacher, two researchers from the Balgrist University Hospital, Zurich, Switzerland (FR, BW) personally informed the pupils about the study during a school lesson and distributed the information sheets. The study was approved by the ethics commission of the Canton of Zürich, Switzerland (BASEC-Nr 2016_00568) and was registered at ClinicalTrials.gov (NCT02955342). According to Swiss law, written informed consent of adolescents older than 14 years and their parents/legal guardians was required for participation. For participants between 10 and 14 years, oral informed consent of the children and adolescents and written informed consent of their parents/legal guardians was required.

Outcomes
There were three measurement time points: at enrolment (start), at two weeks, and at three months after start. The patients filled in the questionnaires at the start in the chiropractic clinics and received the two-week and three-month questionnaires by mail from the coordinating researchers at the Balgrist (FR, BW). To acquire follow-up data of the control participants, the same researchers physically distributed the questionnaires at start, after two weeks and three months at the schools, mostly during a lesson in physical education. Both groups answered the questionnaires at all time points in paper form. The questionnaires consisted of the G-YSQ and the KIDSCREEN-10: the YSQ/G-YSQ assesses pain frequency, week pain prevalence, point pain prevalence and pain intensity (revised Faces Pain Scale (rFPS) [11]) in each of the three spinal regions (each visualized in a drawing) and asks about pain-related consequences (school absenteeism, activity restrictions in sports and care seeking behavior) as well as parental back problems and related work absenteeism [7]. The KIDSCREEN-10 [12,13], including ten questions on physical and emotional well-being and one question on self-rating of general health, was used to estimate the impact of spinal pain on health-related quality of life (HRQoL). After two weeks, the control participants additionally answered a question on whether the state of their back problem remained stable during the past two weeks, and the patients completed the Patients' Global Impression of Change (PGIC), which evaluates the patients' rating of overall improvement on a seven-point Likert scale [14]. For the present study, PGIC was dichotomized (improved = 'very much better' or 'much better') [15,16]. The assessment after three months included the G-YSQ, the KIDSCREEN-10 and the PGIC (patients only).

Data analysis
Several participants reported spinal pain in the question on pain frequency, but reported a pain intensity of zero. This was observed for all spinal regions (neck: N = 8; midback: N = 7; low back: N = 8). Incompletely filled in questionnaires were included in all analyses for which they provided values. Pupils who previously sought medical care for back or neck pain were analyzed in the patient group because their care-seeking due to back or neck pain was considered an indicator for a back or neck problem. The G-YSQ data was analyzed by calculating sum scores across the three spinal regions for pain frequency (0 = no pain; 1 = yes, once in a while; 2 = yes, once or twice; 3 = yes, often), pain prevalence (0 = no, 1 = yes) and pain intensity (first face/no pain = 0, sixth face/very much pain = 5), resulting in a pain frequency sum score (range 0 to 9), a week prevalence and point prevalence sum score (each ranging from 0 to 3) and a pain intensity sum score (range 0 to 15). The sum score of the KIDSCREEN-10 was calculated according to the KIDSCREEN handbook [12].
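The scoring rule above can be illustrated with a minimal Python sketch. The region keys, the helper name `sum_score` and the example participant are hypothetical, and returning `None` when a region is unanswered is an assumption of this sketch, since the handling of missing regions within a sum score is not specified here.

```python
REGIONS = ("neck", "mid_back", "low_back")

# Answer codes as listed in the scoring description above.
FREQUENCY_CODES = {
    "no pain": 0,
    "once in a while": 1,
    "once or twice": 2,
    "often": 3,
}


def sum_score(score_by_region):
    """Add the three region scores; return None if any region is missing.

    Returning None for incomplete answers is an assumption of this sketch.
    """
    scores = [score_by_region.get(region) for region in REGIONS]
    if any(score is None for score in scores):
        return None
    return sum(scores)


# Hypothetical participant (not study data).
frequency_answers = {
    "neck": "often",
    "mid_back": "no pain",
    "low_back": "once or twice",
}
frequency = {
    region: FREQUENCY_CODES[answer]
    for region, answer in frequency_answers.items()
}
week_prevalence = {"neck": 1, "mid_back": 0, "low_back": 1}  # 0 = no, 1 = yes
intensity = {"neck": 4, "mid_back": 0, "low_back": 2}        # face index 0 to 5

frequency_sum = sum_score(frequency)              # possible range 0 to 9
week_prevalence_sum = sum_score(week_prevalence)  # possible range 0 to 3
intensity_sum = sum_score(intensity)              # possible range 0 to 15
```

For this invented participant the three sum scores are 5, 2 and 6, respectively.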

Validity
Construct validity was investigated using the start data by testing a priori hypotheses [17,18]: (i) The sum scores of pain intensity, frequency and prevalence (week, point) are higher in patients than in controls; (ii) The sum scores of pain intensity, frequency and prevalence (week, point) are higher in older (> 12 years [19-21]) than in younger controls; (iii) Self-rating of general health via the KIDSCREEN-10 significantly differs between patients and controls and strongly correlates (r_s > 0.6) with the sum scores of pain intensity, frequency and prevalence; (iv) The KIDSCREEN-10 total score moderately correlates (r_s > 0.4) with the sum scores of week and point prevalence (all measures refer to the last week); (v) Participants with severe pain report low general health and low total scores on the KIDSCREEN-10. To test hypothesis (v), the participants were sub-grouped based on pain severity, defined by pain frequency and intensity, across spinal regions [22]. The resulting three severity levels 'no pain' (frequency: never, once or twice or once in a while; intensity: lowest two intensities on the rFPS), 'severe pain' (frequency: once in a while or often; intensity: highest three intensities on the rFPS) and 'moderate pain' (in between) were combined into an 'overall spinal pain composite variable' [22] with five levels: 'multiple severe pain' ('severe pain' in two or three spinal regions), 'one-sited severe pain' ('severe pain' in one spinal region), 'multiple moderate pain' ('moderate pain' in two or three spinal regions), 'one-sited moderate pain' ('moderate pain' in one spinal region), and 'no pain' ('no pain' in all spinal regions). If pain severity differed between the three spinal regions, the region with the most severe pain was used for the overall spinal composite variable.
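The combination of the per-region severity levels into the five-level composite variable can be sketched as follows. This is an illustrative Python sketch, not the authors' code: the function name and the labels 'none', 'moderate' and 'severe' are hypothetical, and the per-region severity is taken as given rather than derived from frequency and intensity.

```python
def overall_spinal_pain(severity_by_region):
    """Map per-region severity labels to the five-level composite variable.

    severity_by_region: dict such as {"neck": "severe", "mid_back": "none",
    "low_back": "moderate"}; the labels are hypothetical names for the three
    severity levels described in the text.
    """
    levels = list(severity_by_region.values())
    n_severe = levels.count("severe")
    n_moderate = levels.count("moderate")
    # The most severe level present decides the category.
    if n_severe >= 2:
        return "multiple severe pain"
    if n_severe == 1:
        return "one-sited severe pain"
    if n_moderate >= 2:
        return "multiple moderate pain"
    if n_moderate == 1:
        return "one-sited moderate pain"
    return "no pain"


# Hypothetical participant: one severe region outranks two moderate regions.
example = overall_spinal_pain(
    {"neck": "severe", "mid_back": "moderate", "low_back": "moderate"}
)
```

Checking the severe regions first implements the rule that the region with the most severe pain determines the composite level.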
The hypotheses (i) and (ii) were tested using Mann-Whitney U-tests, hypothesis (iii) using Mann-Whitney U-test and Spearman correlation, and hypothesis (iv) was tested using Spearman correlation. Spearman's coefficient values (r_s) were interpreted as excellent (> 0.9), good (0.7-0.9), moderate (0.5-0.69), fair (0.2-0.49), or minimal to absent (0.0-0.19) [23]. Hypothesis (v) was tested using a one-way ANOVA (KIDSCREEN-10 total score) and Kruskal-Wallis test (self-rated general health) and post-hoc tests in case of significance, between the five levels of the 'overall spinal pain composite variable'.
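As an illustration of the correlation step, the following pure-Python sketch computes Spearman's coefficient as the Pearson correlation of average ranks and maps it to the interpretation bands listed above. The function names are hypothetical, and applying the bands to the absolute value, as well as the handling of the exact boundary values, are assumptions of this sketch.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's r_s: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def interpret_spearman(r_s):
    """Label |r_s| using the bands from the text (boundaries assumed)."""
    size = abs(r_s)
    if size > 0.9:
        return "excellent"
    if size >= 0.7:
        return "good"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "fair"
    return "minimal to absent"
```

A call such as `interpret_spearman(spearman(pain_sum_scores, health_ratings))`, with two hypothetical lists of equal length, would return one of the five labels.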

Reliability
Test-retest reliability was assessed based on the agreement between start data and data after two weeks. For the reliability of week and point prevalence, the data of those children and adolescents who reported no or stable spinal pain were used. Unweighted Cohen's kappa and the intraclass correlation coefficient ICC(3,1) were used to analyze categorical and ordinal data, respectively [17]. Kappa values were interpreted as 0.01-0.20 none to slight, 0.21-0.40 fair, 0.41-0.60 moderate, 0.61-0.80 substantial, and 0.81-1.00 almost perfect [24,25]. ICC values > 0.90 were considered excellent, 0.75-0.90 good, 0.50-0.74 moderate and < 0.50 poor [26].
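The two agreement statistics can be sketched in plain Python as follows. This is a minimal illustration, not the analysis code used in the study: the function names and example answers are hypothetical, and ICC(3,1) is computed with the standard two-way mixed-effects, consistency, single-measurement formula (MSR − MSE) / (MSR + (k − 1) · MSE).

```python
def cohens_kappa(first, second):
    """Unweighted Cohen's kappa for two equally long lists of categories."""
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    categories = set(first) | set(second)
    expected = sum(
        (first.count(c) / n) * (second.count(c) / n) for c in categories
    )
    if expected == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (observed - expected) / (1 - expected)


def icc_3_1(rows):
    """ICC(3,1) from a table with one row per subject, one column per occasion."""
    n, k = len(rows), len(rows[0])
    grand_mean = sum(sum(row) for row in rows) / (n * k)
    row_means = [sum(row) / k for row in rows]
    col_means = [sum(row[j] for row in rows) / n for j in range(k)]
    ss_total = sum((x - grand_mean) ** 2 for row in rows for x in row)
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ms_rows = ss_rows / (n - 1)                       # between-subject mean square
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


def interpret_icc(value):
    """Label an ICC using the bands from the text (boundaries assumed)."""
    if value > 0.90:
        return "excellent"
    if value >= 0.75:
        return "good"
    if value >= 0.50:
        return "moderate"
    return "poor"


# Hypothetical yes/no answers (1 = pain) at start and after two weeks.
start = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
retest = [1, 1, 0, 0, 0, 0, 1, 1, 0, 0]
kappa = cohens_kappa(start, retest)
```

In this invented example, eight of ten answers agree and the expected chance agreement is 0.52, so kappa is 0.28 / 0.48, which falls in the 'moderate' band.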

Responsiveness
To test responsiveness, the area under the curve (AUC) of the receiver operating characteristic (ROC) curve was calculated as a measure to discriminate between two groups according to an external gold standard [17]. The ROC curve plots sensitivity against 1 − specificity, with each point on the curve corresponding to one possible cut-off point. The AUC represents the integral under the ROC curve fitted through these points and serves as a measure of discrimination. A value of 0.5, represented by the diagonal, indicates that the measurement instrument has no capacity to distinguish between the two groups, and an AUC greater than 0.70 is recommended for sufficient responsiveness [17]. In the present study, perceived recovery after three months (PGIC dichotomized) was used as external measure, and data of patients at start and after three months were analyzed.
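The AUC can be illustrated with a short Python sketch that uses its equivalent pairwise definition: the proportion of (improved, not improved) patient pairs in which the improved patient shows the larger change score, with ties counted as one half. The function name, the example data and the definition of the change score as start value minus three-month value are assumptions of this sketch; 'unchanged' is a placeholder for any PGIC answer other than the two counted as improved.

```python
# PGIC answers counted as 'improved' in the dichotomization described earlier.
IMPROVED_ANSWERS = {"very much better", "much better"}


def auc(change_scores, improved):
    """Area under the ROC curve via pairwise comparison of the two groups."""
    positives = [c for c, flag in zip(change_scores, improved) if flag]
    negatives = [c for c, flag in zip(change_scores, improved) if not flag]
    if not positives or not negatives:
        raise ValueError("both groups must contain at least one patient")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5   # ties count as half a concordant pair
    return wins / (len(positives) * len(negatives))


# Hypothetical patients (not study data): week prevalence sum scores (0 to 3).
start = [3, 2, 2, 1, 3, 1]
three_months = [0, 1, 2, 1, 1, 1]
pgic = ["very much better", "much better", "unchanged",
        "much better", "unchanged", "unchanged"]

change = [s - f for s, f in zip(start, three_months)]  # assumed change score
improved = [answer in IMPROVED_ANSWERS for answer in pgic]
responsiveness_auc = auc(change, improved)
sufficient = responsiveness_auc > 0.70   # threshold recommended in the text
```

Here six of the nine improved/not-improved pairs are concordant after counting ties as halves, so the AUC is about 0.67, below the 0.70 threshold.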
The statistical analyses were conducted using R (version 3.5.0) for the validation measures and IBM SPSS (version 25) for the remainder. The significance level alpha was set at 0.05.

Results
Data was collected between January 2017 and February 2019. In total, 240 participants (166 females and 74

Validity
Patients reported significantly higher sum scores of pain frequency (p < 0.001), week prevalence (p < 0.001), point prevalence (p < 0.001) and pain intensity (p < 0.001) compared to the control participants (Table 2). Control participants older than 12 years had significantly higher sum scores of pain frequency, week prevalence and pain intensity compared to the younger controls, indicating that the G-YSQ captures the natural age-related increase in spinal pain prevalence [3,4]. In contrast, older and younger control participants did not significantly differ in point prevalence (Table 3).
Significantly fewer patients than controls reported excellent general health (p < 0.001) (Table 4).
Lower self-ratings of general health were associated with higher sum scores in the G-YSQ (Fig. 2): the correlations between self-rated general health and the pain frequency sum score (r_s = 0.46), the pain prevalence sum scores (week prevalence: r_s = 0.42; point prevalence: r_s = 0.28) and the pain intensity sum score (r_s = 0.39) were moderate to fair.
The pain severity subgroups significantly differed in the KIDSCREEN-10 sum score [F(4,230) = 7.26, p < 0.001] and in the self-rating of general health [H(4) = 51.94, p < 0.001]. Post-hoc tests indicated that the 'no pain' and the 'one-sited moderate pain' groups reported significantly higher KIDSCREEN-10 sum scores and higher ratings of general health than the 'one-sited severe pain' and the 'multiple severe pain' groups (p-values for sum score = 0.001 and 0.005; p-values for general health < 0.001). The 'multiple moderate pain' subgroup did not significantly differ from any of the other pain groups in terms of the KIDSCREEN-10 sum score, but rated general health significantly better than the 'multiple severe pain' subgroup (p = 0.021): e.g. 28 % of the participants in the 'multiple moderate pain' subgroup, compared to 5 % in the 'multiple severe pain' subgroup, rated their health as excellent (Table 5).
Thus, although the correlations with the KIDSCREEN-10 sum score and the self-rated general health were lower than expected, all hypotheses could be confirmed, which indicates good construct validity of the G-YSQ.

Reliability
The G-YSQ showed good reliability in terms of pain intensity and pain frequency (Table 6). No pain or stable pain during the past two weeks was reported by 83 control participants, and the reliability of week prevalence in those data was moderate for all spinal regions (the neck, the middle and the lower back) as well as for the sum score. Reliability of point prevalence was fair to moderate for the neck, the middle and the lower back, and moderate for the sum score. The reliability of the questions on the consequences of spinal pain was good for school absence and doctor visits and moderate for restriction in sports. All questions on parental spinal pain showed moderate reliability (Table 6).

Responsiveness
Of the 68 patients who answered the PGIC after three months (27 patients answered the G-YSQ, but did not fill in the PGIC), 61.8 % reported clinically significant improvement. The AUC was 0.69 (95 % CI = 0.57-0.82) for the sum of week prevalence and 0.67 (95 % CI = 0.54-0.80) for the sum of point prevalence (Fig. 4), indicating insufficient responsiveness.

Discussion
The aims of this study were to translate the YSQ into German according to scientific guidelines and to determine construct validity, test-retest reliability, and responsiveness of the G-YSQ. The translation process of the original YSQ into German was straightforward apart from two minor issues in terms of cross-cultural adaptation. Except for the question on point prevalence, the G-YSQ was shown to possess construct validity and sufficient test-retest reliability, but its responsiveness needs to be improved.
Although the study population of the present study was older than the population for which the original questionnaire was designed, the G-YSQ discriminated well between patients and controls and between older and younger children and adolescents, depicting the natural age-related increase in spinal pain prevalence [3,4]. The G-YSQ (week and point prevalence) correlated fairly with the KIDSCREEN-10 sum score. This finding might reflect a weaker association between back pain in childhood and adolescence and psychosocial issues than commonly believed [27-32], because eight of the ten questions of the KIDSCREEN-10 are on psychosocial topics and only two on physical health. Self-rated general health was better in the control group. However, the correlation to pain frequency, pain intensity and week prevalence was fair and weaker than hypothesized, which might reflect that the majority of children and adolescents, approximately 9 out of 10 [33], are not seriously impacted by their back pain [33,34]. Nevertheless, a minority is, and identifying this minority, i.e. differentiating between trivial and consequential back pain, is one of the biggest challenges with respect to back pain in childhood and adolescence [35]. Although most spinal pain in childhood and adolescence is self-limiting [36] or 'trivial' [35], pain problems in children and adolescents in general [37], and back pain [38] and LBP [33] in particular, are associated with lower health-related quality of life. A decline in quality of life might be an early indicator of potential vulnerability during development [39] and might be a symptom of a multidimensional process [33], possibly leading to consequences in adulthood ('consequential' pain [35]). To avoid medicalization of the problem and to target treatment, consequential back pain should be differentiated from trivial back pain in childhood and adolescence [35].
Table 4 Self-rated general health in the two cohorts
In the present study, the cut-off for potentially consequential spinal pain as defined by a significant impact on health-related quality of life was having severe pain (one-sited or multiple), and 32 % of the controls belonged to these subgroups. The present study shows that both pain frequency and intensity need to be assessed in each spinal region to capture all children and adolescents whose spinal pain impacts their quality of life and might therefore be consequential. Whether these individuals will indeed develop a back problem in adulthood needs to be verified in longitudinal studies. The G-YSQ also showed test-retest reliability, particularly for pain intensity and pain frequency. The lower values for validity and reliability of the question on point prevalence might reflect the natural clinical course of spinal pain in childhood and adolescence, which appears to fluctuate [40], rather than a deficiency of the assessment tool. Nevertheless, given its moderate reliability, deficiencies in construct validity and insufficient responsiveness, the benefit of asking for point prevalence is questionable, and omitting this question is recommended. In its present form, the YSQ/G-YSQ cannot be recommended for measuring change. This is not surprising given that the answers to the only questions possibly reflecting change (presence or absence of low back pain, mid back pain or neck pain during the last week) are binary (yes/no). Refining this dichotomous structure similarly to the question on pain frequency ('often', 'once in a while', 'once or twice', 'never') might enhance the responsiveness of the YSQ/G-YSQ. Similarly, it might be worth attempting to address the problem of inconsistent answers to the question on pain frequency and corresponding pain intensity by rearranging these items (rFPS for pain intensity directly following the question on pain frequency).
The age range of 10 to 16 years was selected as it appears to be the crucial period for developing spinal pain [3,4]. Nevertheless, this implies that the findings cannot be generalized to younger children. Pupils who indicated that they have seen a doctor because of back or neck pain (N = 31) were analyzed in the group of patients because we considered them as having a back or neck problem. However, these participants did not necessarily have back pain at the time of the survey. We therefore repeated the analysis without these pupils. Results were similar in terms of group differences, ICC and AUC values. Group differences became even more pronounced without these pupils. Three adolescents (aged 13 and 15) of the pilot phase reported that they preferred indicating pain intensity on a scale rather than on the rFPS. There is no optimal pain assessment to be used throughout development from childhood to adolescence [41,42]: for school-aged children (from 8 to 12 years) the rFPS is recommended [42]. Adolescents prefer visual analogue scales (VAS) or numeric rating scales (NRS) [42], although faces scales have been reported to also be well accepted [41,42]. A supplementation of the rFPS with a VAS or NRS might be useful to make the YSQ/G-YSQ applicable throughout childhood and adolescence. Not all participants could be reached for the follow-up after two weeks and three months. Because data of pupils were collected at schools, these values were most likely missing at random. However, a closer look at differences between responding and non-responding patients revealed that they were comparable in terms of gender and age. There was a tendency for the responders to have more frequent and more intense back or neck pain at start compared to the non-responders.
Because only the start data were used for validity assessment and the assessment of reliability and responsiveness involved only intra-subject comparisons, it seems unlikely that data from the non-responders would have changed the findings. Another study limitation might have been that the first reassessment took place after two weeks, although the question on week prevalence in the YSQ and the questions in the KIDSCREEN-10 refer to the last week. However, there is no standard for an ideal time period for testing reliability, and a time interval of two weeks is commonly used [17].

B). Confidence interval of AUC is shown in brackets
Authors' contributions
LN performed the statistical analysis, was involved in data interpretation and drafted the manuscript. TP was involved in the statistical analysis and data interpretation. MHS performed most of the patient data acquisition and contributed to data interpretation. FR was involved in data acquisition and in data interpretation. PSch contributed to the conception of the study, was involved in data interpretation and in writing the manuscript. BW conceived the study, was involved in data acquisition in schools, in statistical analysis and data interpretation and finalized the manuscript. All authors contributed to and approved the final manuscript.

Funding
BW received salary support from the Swiss Chiropractors' Association ChiroSuisse, which is not bound to any specific project. The funder did not have any specific role in this project.

Availability of data and materials
The datasets used are available from the corresponding author upon request.

Declarations
Ethical approval and consent to participate
The study was conducted in accordance with the Declaration of Helsinki. It was approved by the ethics commission of the Canton of Zürich, Switzerland (BASEC-Nr 2016_00568) and was registered at ClinicalTrials.gov (NCT02955342). According to Swiss law, the parents or legal guardians of all participants had to give written informed consent. Children and adolescents between 10 and 14 years received written information adapted to their age and gave oral informed consent for study participation. Adolescents older than 14 years (from the 14th birthday onwards) received the same information as their parents or legal guardians and gave written informed consent.

Consent for publication
Not applicable.