Participants and Methods
Healthy children attending the second year of six mainstream primary schools were invited to take part in the reliability study, the practical utility study, and the exploratory RCT. Parents gave informed written consent to participation in the study, and children provided verbal assent and an initialled consent form. The study was approved by the UK Central Office for Research Ethics Committees (COREC). Participants were recruited from entire year 2 classes of the six volunteering schools in the City of Glasgow, Scotland. Children were eligible for inclusion in the study (n = 185 eligible) if they had no known diagnosed cognitive disorder and no physical condition affecting their ability to participate in a school PE program (assessed by parent questionnaire).
The present study had two phases: an initial three-week study of the practical utility and reliability of the cognitive outcome measures, followed by a 10-week exploratory RCT.
A literature search and consultation with experts in the field prior to the present study suggested three measures of cognition that might be suitable as candidate outcome measures for an RCT in young children: the Cognitive Assessment System (CAS); the Cambridge Neuropsychological Test Battery (CANTAB, http://www.cantab.com); and the Attention Network Test (ANT). High reliability of measurements in children under age 7 has been reported in one study for the CAS, but no such data are available for the ANT or the CANTAB, so reliability data for these two tests were collected in the present study before the exploratory RCT.
All cognitive tests were administered to children individually in a quiet room in school using a laptop and touch-screen. The ANT was run using E-Prime software (http://www.psnet.com). All children were tested by the same trained researcher (AF), seated comfortably approximately 53 cm from the laptop screen with the dominant hand resting on the computer mouse. For a detailed description of the ANT see http://www.sacklerinstitute.org/users/jin.fan and Rueda et al. In brief, the ANT was administered in four blocks, each lasting approximately 5 minutes. A practice block was used first to train the children in what was expected and to identify any problems they had in performing the test (e.g. understanding what to do or how to do it). In the three subsequent blocks, which formed the basis of the ANT outcomes in the present study, children were asked to perform 48 short trials, each involving a 'flanker' display (a fish) presented in one of 12 possible states: three flanker conditions (congruent, incongruent, neutral) crossed with four cue conditions (no cue, a central cue, a double cue, or a down/up cue). After the fish appeared on the laptop screen, children were asked to press the left or right mouse button corresponding to the direction in which the fish was pointing. The ANT outcomes were reaction time to the fish stimulus (ms) and accuracy (the number of times the correct mouse button was selected).
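The trial structure and the two ANT outcomes described above can be illustrated with a short Python sketch. It enumerates the 12 trial states (3 flanker x 4 cue conditions) and summarises a list of trials into a mean reaction time and a count of correct responses. The trial record format (`rt_ms`, `response`, `fish_direction`) and the function name `summarise_ant` are hypothetical, and averaging reaction time over correct trials only is an assumption (a common convention), not necessarily the procedure used in this study.

```python
from itertools import product
from statistics import mean

# The 12 trial states from the text: 3 flanker conditions crossed with 4 cue conditions.
FLANKERS = ("congruent", "incongruent", "neutral")
CUES = ("no cue", "central cue", "double cue", "down/up cue")
CONDITIONS = list(product(FLANKERS, CUES))


def summarise_ant(trials):
    """Summarise ANT trials into reaction time and accuracy.

    Each trial is a hypothetical dict with keys:
      'rt_ms'          - reaction time to the fish stimulus in milliseconds
      'response'       - mouse button pressed ('left' or 'right')
      'fish_direction' - direction the fish was pointing ('left' or 'right')
    """
    correct = [t for t in trials if t["response"] == t["fish_direction"]]
    return {
        # Assumption: mean reaction time is computed over correct trials only.
        "mean_rt_ms": mean(t["rt_ms"] for t in correct) if correct else None,
        # Accuracy as defined in the text: number of times the correct button was selected.
        "n_correct": len(correct),
    }


if __name__ == "__main__":
    example = [
        {"rt_ms": 600, "response": "left", "fish_direction": "left"},
        {"rt_ms": 800, "response": "right", "fish_direction": "left"},
        {"rt_ms": 700, "response": "right", "fish_direction": "right"},
    ]
    print(len(CONDITIONS), summarise_ant(example))
```

In the example, the second trial is an error, so only the first and third trials contribute to the mean reaction time.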
For a detailed description of the CANTAB see http://www.cantab.com. In the present study the CANTAB working memory battery was administered as recommended by the manufacturer, incorporating a test of Spatial Memory Span (SSP) and a test of Spatial Working Memory (SWM). A motor screening test was carried out before CANTAB administration to rule out visual or comprehension problems and to familiarise participants with the study procedures. In this test a pink cross appeared on a black laptop screen, and children were asked to touch the center of the cross with the dominant hand. The SWM tests the number of items that can be held in working memory: participants observe the laptop screen as a pattern of boxes appears, then remember and replicate the pattern by touching boxes displayed on the screen. The SWM starts with two boxes (items) and progresses to a maximum of nine boxes, but ends after two failed attempts. The SSP involves the presentation of colored squares on the laptop screen and yields three outcomes: the longest sequence of squares that participants can remember ('span length'), 'total errors' (the number of times an incorrect box is chosen), and 'total usage errors' (the number of times boxes are chosen out of the sequence in which they were presented).
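The progression and stopping rule described above (start at two boxes, progress to at most nine, stop after two failed attempts) can be sketched as follows. This is a hypothetical illustration, not CANTAB's implementation: `attempt` stands in for presenting a pattern of a given length and scoring the child's response, and we assume that the two failed attempts must occur at the same sequence length, with the span taken as the longest length reproduced correctly.

```python
from typing import Callable


def run_span_test(attempt: Callable[[int], bool], start: int = 2,
                  maximum: int = 9, max_failures: int = 2) -> int:
    """Return the span length: the longest sequence length reproduced correctly.

    `attempt(n)` is a hypothetical callback that presents a pattern of n boxes
    and returns True if the child replicates it correctly. Assumption: the test
    ends when `max_failures` attempts fail at the same length.
    """
    span = 0
    length = start
    while length <= maximum:
        failures = 0
        # Retry the same length until it succeeds or the failure limit is reached.
        while failures < max_failures and not attempt(length):
            failures += 1
        if failures == max_failures:
            break  # two failed attempts at this length end the test
        span = length  # this length was reproduced correctly
        length += 1
    return span


if __name__ == "__main__":
    # A simulated child who can reproduce patterns of up to five boxes.
    print(run_span_test(lambda n: n <= 5))
```

A callback that always succeeds reaches the ceiling of nine boxes, while one that fails twice at the first length yields a span of zero.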
The CAS is better established for use in children and adults than the other two cognitive tests. In the present study it was administered exactly as recommended by the test authors. It must be administered by psychologists and comprises four sub-scales (planning, attention, perceptual processing, and memory), each tested using three assessments on the laptop.
To collect test-retest reliability data and to test for changes associated with the intervention in the exploratory RCT, the ANT and CANTAB data were collected on three occasions: 3 weeks before the intervention; immediately before the intervention began (week 0); and after the 10-week intervention or control conditions (end of week 10). Because encouraging reliability data were already available for the CAS in children of this age, and resources were limited, reliability data were not collected for the CAS, and it was administered only at weeks 0 and 10 for the exploratory RCT. The research psychologists administering the CAS, and the research assistants entering the pre- and post-intervention ANT and CANTAB data, were blinded to group allocation and to the nature of the study.
Intervention Study: Intervention and Control Group Allocation and Treatment
Immediately after collection of retest data at week 0, a statistician independent of the present study used a computer to randomise the six schools to receive either the Intervention or the Control PE for ten weeks. Before randomisation, the schools had been matched pair-wise to form three pairs with similar socio-economic profile (assessed using an area-based measure), size, geographical location, and availability of space for PE. The local council PE specialists responsible for all public primary schools in Glasgow were asked to devise a 10-week experimental PE curriculum for the intervention, consisting solely of the most aerobically active components of the existing curriculum. In the intervention group, these PE specialists delivered one session per week and the usual classroom teacher delivered the other. Teachers received training in the experimental Intervention PE programme and were encouraged to make the sessions 'as physically active as possible', to 'minimise instruction time', and to 'minimise/avoid any time children were waiting to use equipment, or standing around; minimise object control tasks'.
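A minimal sketch of pair-matched allocation is shown below, assuming that randomisation was carried out within each matched pair so that every pair contributed one school to each arm. This is consistent with three schools per arm but is not stated explicitly in the text. The school labels and the function name `randomise_pairs` are hypothetical, and the code does not reproduce the software actually used by the independent statistician.

```python
import random


def randomise_pairs(pairs, seed=None):
    """Allocate one school of each matched pair to each arm at random.

    `pairs` is a list of (school_a, school_b) tuples; the labels are hypothetical.
    A seed can be supplied so that the allocation is reproducible and auditable.
    """
    rng = random.Random(seed)
    allocation = {}
    for pair in pairs:
        # Shuffle the two schools; the first goes to Intervention, the second to Control.
        intervention, control = rng.sample(pair, 2)
        allocation[intervention] = "Intervention"
        allocation[control] = "Control"
    return allocation


if __name__ == "__main__":
    matched_pairs = [("School A", "School B"), ("School C", "School D"), ("School E", "School F")]
    print(randomise_pairs(matched_pairs, seed=2024))
```

Randomising within pairs guarantees balance on the matching variables (socio-economic profile, size, location, space for PE) across arms, which simple randomisation of six schools would not.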
There is evidence that numerous psychological variables can change as a result of any intervention, perhaps because of the increased attention paid to study participants. To control for improvements in psychological variables caused simply by intervening, and so that any differences between groups could more plausibly be attributed to the difference in aerobically intense PE, the control and intervention groups were matched for intervention time. Over the same 10-week period, the three schools randomly allocated to the control condition received the standard Scottish elementary school PE curriculum, but PE was increased from one to two hours per week; in both groups, one of the two weekly hours of PE was delivered by a specialist and the other by the class teacher. To reduce the risk of bias in parent ratings of behavior and in participant expectations, children and parents were not told which group was hypothesised to change, and outcome measures were made blind to group allocation. During the 10-week winter school term in which the present study took place, standard PE consisted largely of skill development (e.g. object control such as throwing and catching a ball, and balance). This lack of emphasis on aerobic activities increased the contrast between the intervention and control groups. Physical activity during the sessions was measured by accelerometry. Researchers directly observed PE sessions in two randomly selected teacher-directed and two randomly selected specialist-directed intervention classes during the first two weeks of the intervention, in order to identify problems in implementation, to answer questions about the intervention, and to encourage delivery of the PE intervention as 'prescribed'.
Objectively measured physical activity and sedentary behavior
Habitual physical activity data were collected at week 0 (baseline) by asking participating children to wear an Actigraph GT1M accelerometer (http://www.theactigraph.com) for 7 days. Actigraphs were worn over the right hip on a waist belt and used as described previously [23–25], with 1-minute epochs. Evidence-based cut points [23–25] were applied to the accelerometry output to define sedentary behavior (< 1100 counts per minute), light intensity physical activity (1100-3200 counts per minute), and moderate-vigorous intensity physical activity (MVPA, > 3200 counts per minute). The choice of cut points and epochs varies widely between studies, but the options selected for the present study were age-appropriate, and the choice of cut point and epoch has only a small impact on measured time spent sedentary and time spent in MVPA. The validity of the Actigraph in children has been demonstrated repeatedly against criterion methods of energy expenditure and direct observation [24, 26]. Reliability of Actigraph-measured habitual physical activity in Scottish 5-6 year olds is high provided that at least three days of data are collected; in the present study, accelerometry data were therefore excluded unless at least 3 days with at least 9 hours of data per day were obtained. Only data collected between 7 am and 11 pm were included in analyses.
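The cut points and data-inclusion rules above can be combined in a short Python sketch. It classifies 1-minute epochs as sedentary (< 1100 counts per minute), light (1100-3200) or MVPA (> 3200), keeps only epochs between 07:00 and 23:00, and requires at least 3 days with at least 9 hours (540 minutes) of data. It assumes the input epochs have already been restricted to wear time, since non-wear detection is not described here; the function names and the choice to report mean minutes per valid day are illustrative assumptions.

```python
from datetime import datetime


def classify_epoch(counts_per_min: int) -> str:
    """Apply the cut points from the text to one 1-minute epoch."""
    if counts_per_min < 1100:
        return "sedentary"
    if counts_per_min <= 3200:
        return "light"
    return "MVPA"


def summarise_accelerometry(epochs: list[tuple[datetime, int]],
                            min_days: int = 3, min_minutes_per_day: int = 9 * 60):
    """Mean minutes per valid day in each intensity category, or None if excluded.

    `epochs` holds (timestamp, counts_per_minute) pairs for 1-minute epochs and is
    assumed to be already restricted to wear time (non-wear handling is not shown).
    """
    per_day = {}
    for timestamp, counts in epochs:
        if not 7 <= timestamp.hour < 23:  # keep only 07:00-22:59
            continue
        day = per_day.setdefault(timestamp.date(), {"sedentary": 0, "light": 0, "MVPA": 0})
        day[classify_epoch(counts)] += 1
    # A day is valid only if it contributes at least 9 hours of epochs.
    valid_days = [d for d in per_day.values() if sum(d.values()) >= min_minutes_per_day]
    if len(valid_days) < min_days:
        return None  # fewer than 3 valid days: exclude this child's data
    return {
        category: sum(d[category] for d in valid_days) / len(valid_days)
        for category in ("sedentary", "light", "MVPA")
    }
```

Note the boundary handling: exactly 1100 counts per minute is classed as light and exactly 3200 as light, matching the ranges as stated in the text.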
Statistical analysis and power
Reliability of the ANT and CANTAB
Intraclass correlation coefficients (ICC) and the standard error of measurement (SEM), a measure of within-subject variation arising from biological variability or from equipment 'noise' or error, were calculated. The coefficient of variation (CV, the standard deviation expressed as a percentage of the mean) and limits of agreement (LOA) were calculated using SPSS software and the reliability spreadsheet at http://www.sportsci.org/resource/stats. Although the required level of reliability of a test depends on the application, there is general agreement in the psychological literature that ICCs should exceed 0.75.
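The reliability statistics can be sketched in plain Python for a children-by-occasions table of scores. This is an illustration under stated assumptions, not a reproduction of SPSS or the sportsci.org spreadsheet. The ICC shown is the two-way, consistency, single-measure form often labelled ICC(3,1), chosen here because the text does not state the ICC model. The SEM is taken as the square root of the residual mean square (the within-subject SD, which for two occasions equals the SD of the difference scores divided by √2). The CV is that SEM as a percentage of the grand mean, and the 95% limits of agreement are computed from the first two occasions as the mean difference ± 1.96 SD of the differences.

```python
from math import sqrt
from statistics import mean, stdev


def reliability(scores):
    """Test-retest statistics for `scores`: one list per child, one value per occasion."""
    n, k = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]                        # per child
    col_means = [mean(row[j] for row in scores) for j in range(k)]   # per occasion

    # Two-way ANOVA sums of squares (children x occasions, one score per cell).
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Assumption: ICC(3,1), two-way mixed model, consistency, single measures.
    icc = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    sem = sqrt(ms_error)           # within-subject SD
    cv = 100 * sem / grand         # SEM as a percentage of the grand mean

    # Bland-Altman 95% limits of agreement between the first two occasions.
    diffs = [row[1] - row[0] for row in scores]
    bias, sd_diff = mean(diffs), stdev(diffs)
    loa = (bias - 1.96 * sd_diff, bias + 1.96 * sd_diff)
    return {"icc": icc, "sem": sem, "cv_percent": cv, "loa": loa}


if __name__ == "__main__":
    print(reliability([[10, 12], [20, 19], [30, 33], [40, 41]]))
```

With this toy data the ICC is high because the spread between children is large relative to the test-retest differences, illustrating why the ICC alone can mask absolute disagreement that the SEM and LOA make visible.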
The exploratory RCT examined changes in the cognitive outcomes measured by the ANT, CANTAB, and CAS. We also measured parent ratings of child behavior using the short form of the Conners' Parent Rating Scale, on the grounds that increases in physical activity could have favorable effects on behavior. All data were checked for normality using graphical summaries, assessment of skewness, descriptive statistics, and tests of normality. For initial between-group comparisons of the cognitive data, t tests were carried out on the change in each variable over time. A general linear model was applied to all psychological and behavioral outcome measures, with the follow-up score as the response variable; 'group' (Intervention or Control), socio-economic status (SES), gender, and school (nested within group) as factors; and age and baseline (week 0) score as covariates. The study was a pilot, intended to produce the data needed to adequately power a full-scale RCT, and so was not formally powered. However, a sample of around 60 children (30 per group) was considered both practical and adequate for an exploratory study.
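Written out as an equation, one reading of the general linear model described above is given below. The notation is ours: $y_i^{(10)}$ and $y_i^{(0)}$ are child $i$'s week 10 and week 0 scores, $\tau_g$ is the effect of group $g$ (the intervention effect of interest is $\tau_{\text{Intervention}} - \tau_{\text{Control}}$), $\sigma_{s(g)}$ is the effect of school $s$ nested within group $g$, $\alpha$ and $\delta$ are the SES and gender effects, $\beta_1$ and $\beta_2$ are the covariate slopes for age and baseline score, and $\varepsilon_i$ is residual error. Including only main effects, with no interaction terms, is an assumption, since the text does not mention interactions.

```latex
y_i^{(10)} = \mu + \tau_{g(i)} + \sigma_{s(i)\,(g(i))} + \alpha_{\mathrm{SES}(i)} + \delta_{\mathrm{gender}(i)}
           + \beta_1\,\mathrm{age}_i + \beta_2\,y_i^{(0)} + \varepsilon_i
```

Adjusting for the baseline score in this way (an analysis of covariance) typically gives a more precise estimate of the group effect than comparing raw change scores, which is why the t tests on change scores are described as initial comparisons.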