Study design and sample collection and analysis
IRB approval and consent
This study was approved by the IRB of Mayo Clinic-Arizona and the IRB of Arizona State University. All parents signed informed consent forms after the study was explained to them.
Advertising
The study advertisement was emailed to several thousand ASD families on the email lists of the ASU Autism/Asperger’s Research Program and the Zoowalk for Autism Research. Other local autism groups such as the Autism Society of Greater Phoenix also helped advertise the study. Finally, participants were invited to share the study advertisement with their network of friends.
Participants
The inclusion criteria were:
1) Mother of a child 2–5 years of age
2) Child has ASD or has typical development (TD), including both neurological and physical development
3) Child's ASD diagnosis verified by the Autism Diagnostic Interview-Revised (ADI-R) [17]
The exclusion criteria were:
1) Mother currently taking a vitamin/mineral supplement containing folic acid and/or vitamin B12
2) Mother currently taking, or having taken within the past 2 months, any vitamin supplement
3) Mother pregnant or planning to become pregnant in the next 6 months
The recruitment period ran from August 2016 until July 2017. Thirty mothers who have a child with ASD (ASD-M) and twenty-nine mothers who have TD children (TD-M) were recruited for this study. Originally, there were three additional ASD-M participants; two were disqualified because their child did not meet the ADI-R criteria, and one was disqualified because the child did not have an official ASD diagnosis from a psychiatrist, licensed psychologist, or developmental pediatrician. The two groups were age-matched: enrollment was done on a rolling basis, and the control group was recruited so that the two groups had a similar average age. All mothers came from the greater Phoenix, Arizona area. All mothers in the ASD-M group had a child previously diagnosed with ASD, and the diagnoses were confirmed using the ADI-R, a 2-h structured parent interview that is one of the primary tools used for clinical and research diagnosis of ASD [17]. All ADI-R interviews were conducted by Elena L. Pollard, a certified ADI-R rater who has conducted over 300 ADI-R evaluations.
Diet and medical history
An estimate of dietary intake during the previous week was obtained using the Block Brief 2000 Food Frequency Questionnaire (Adult version) from Nutrition Quest (www.nutritionquest.com). Medical histories and current medical symptoms were collected from the mothers using a self-report survey. Symptoms were collected because there is little research on the health of mothers of children with ASD. In these surveys, pesticide exposure was defined as any pesticide use in the home during pregnancy. Prenatal supplement use was recorded only as whether any prenatal supplement was taken; the specific type of supplement was not recorded. These variables were included to account for potential cofactors in the metabolic analysis.
Biological sample collection
Urine collections and blood draws were conducted for both groups over a 12-month period from September 2016 to August 2017. For both groups, most blood draws took place in fall/winter, with a few in spring or summer. Fasting whole blood samples were collected in the morning at the Mayo Clinic, and all urine samples were first-morning collections. Samples were stored in -80 °C freezers at Mayo Clinic and ASU until all samples had been collected, and then all samples were sent together to Metabolon for testing. Samples were frozen for 1 to 12 months (average 8 months).
Laboratory tests
Laboratory measurements were conducted by Mayo Clinic, the Metabolic and Oxidative Stress Laboratory at the University of Arkansas for Medical Sciences, and Metabolon Inc. as described below.
Mayo Clinic
Mayo Clinic laboratories measured vitamin B12, folate, methylmalonic acid, homocysteine, isoprostane, vitamin D, vitamin E, ferritin, hCG, and MTHFR variants as described below.
Vitamin B12 (cyanocobalamin) was measured quantitatively with a Beckman Coulter Access competitive binding immunoenzymatic assay. Briefly, serum is treated with alkaline potassium cyanide and dithiothreitol to denature binding proteins and convert all forms of vitamin B12 to cyanocobalamin. Cyanocobalamin from the serum competes with a particle-bound anti-intrinsic factor antibody for binding to an intrinsic factor – alkaline phosphatase conjugate, so the particle-bound signal decreases as the vitamin B12 concentration increases. After washing, alkaline phosphatase activity on a chemiluminescent substrate is measured and compared against a multi-point calibration curve of known cyanocobalamin concentrations.
Folate (vitamin B9) was measured quantitatively with a Beckman Coulter Access competitive binding receptor assay. Briefly, serum folate competes with a folic acid – alkaline phosphatase conjugate for binding to solid phase-bound folate binding protein. After washing, alkaline phosphatase activity on a chemiluminescent substrate is measured and compared against a multi-point calibration curve of known folate concentrations. The folate assay is designed to have equal affinities for pteroylglutamic acid (folic acid) and 5-methyltetrahydrofolic acid (methyl-THF), so the result reflects both.
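In both competitive assays, the signal falls as analyte concentration rises, so a sample's concentration is read off a decreasing calibration curve. The sketch below illustrates that inversion with piecewise-linear interpolation; the interpolation model and the calibrator values are assumptions for illustration, not the Beckman Coulter Access curve-fitting algorithm.

```python
import numpy as np

def concentration_from_signal(signal, cal_conc, cal_signal):
    """Read a concentration off a competitive-assay calibration curve.

    In a competitive assay the chemiluminescent signal falls as the analyte
    concentration rises, so calibrator signals must strictly decrease.
    Piecewise-linear interpolation is an assumption made for illustration;
    the analyzer's own curve-fitting model may differ.
    """
    cal_conc = np.asarray(cal_conc, dtype=float)
    cal_signal = np.asarray(cal_signal, dtype=float)
    if np.any(np.diff(cal_signal) >= 0):
        raise ValueError("calibrator signals must decrease as concentration increases")
    # np.interp needs increasing x values, so reverse both arrays.
    # Signals outside the calibrated range are clamped to the end calibrators.
    return float(np.interp(signal, cal_signal[::-1], cal_conc[::-1]))

# Hypothetical calibrators (concentration and signal in arbitrary units).
cal_conc = [50, 200, 500, 1000, 2000]
cal_signal = [9000, 6000, 3500, 2000, 1000]
# Approximately 350: midway between the 200 and 500 calibrators.
print(concentration_from_signal(4750, cal_conc, cal_signal))
```

For a two-site (sandwich) assay such as the ferritin assay, the signal instead rises with concentration, so the calibration arrays would be used without reversal.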
Methylmalonic acid (MMA) was measured quantitatively by liquid chromatography tandem mass spectrometry (LC-MS/MS). Briefly, serum is mixed with d3-methylmalonic acid as an internal standard, isolated by solid phase extraction, separated on a C18 column, and analyzed in negative ion mode. Because succinic acid is a structural isomer of methylmalonic acid with the same mass, chromatographic conditions and mass transitions were chosen to distinguish the two compounds.
Homocysteine was measured quantitatively by LC-MS/MS. Serum is spiked with d8-homocystine as an internal standard, reduced to break disulfide bonds, and deproteinized with formic acid and trifluoroacetic acid in acetonitrile. Measurement of total homocysteine and d4-homocysteine (reduced from d8-homocystine) is performed in positive ion mode with electrospray ionization.
Urine F2-Isoprostane (8-isoprostane) was measured quantitatively by LC-MS/MS after separation from prostaglandin F2 alpha. Urine is spiked with deuterated F2-isoprostane and deuterated prostaglandin F2 alpha, then positive pressure filtered. A mixed mode anion exchange turbulent flow column is used to clean up samples which are then separated on a C8 column and analyzed in negative ion mode.
Vitamin D (25-hydroxyvitamin D2 and D3) was measured quantitatively by LC-MS/MS. The internal standard, d6-25-hydroxyvitamin D3, is added to serum before protein precipitation with acetonitrile. Online turbulent flow chromatography is used to further clean up the samples prior to separation on a C18 column and analysis in positive ion mode. The D2 and D3 forms are measured separately; results are reported as D2, D3, and their sum.
Vitamin E was measured quantitatively by LC-MS/MS. A d6-alpha-tocopherol internal standard is added to serum, and proteins are precipitated with acetonitrile. The supernatant undergoes online turbulent flow chromatography for sample cleanup, is separated on a C18 column, and is analyzed in positive ion mode.
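Each LC-MS/MS method above adds an isotope-labeled internal standard, which corrects for losses and ion suppression by quantifying on the ratio of analyte to internal-standard peak area. A minimal sketch of that idea follows, assuming an unweighted linear calibration of response ratio against concentration; the calibrator values are hypothetical and the laboratory's actual regression weighting is not stated in the text.

```python
import numpy as np

def fit_response_ratio_line(cal_conc, cal_analyte_area, cal_is_area):
    """Fit response ratio (analyte area / internal-standard area) vs. concentration.

    An unweighted straight line is an assumption; laboratories often use
    weighted (e.g. 1/x) regression instead.
    """
    ratio = np.asarray(cal_analyte_area, float) / np.asarray(cal_is_area, float)
    slope, intercept = np.polyfit(np.asarray(cal_conc, float), ratio, deg=1)
    return slope, intercept

def quantify(analyte_area, is_area, slope, intercept):
    """Invert the calibration line for one sample."""
    return (analyte_area / is_area - intercept) / slope

# Hypothetical calibrators whose response ratio is 0.2 * concentration + 0.01.
cal_conc = np.array([0.0, 1.0, 2.0, 5.0, 10.0])
cal_is_area = np.array([1000.0, 980.0, 1010.0, 995.0, 1005.0])
cal_analyte_area = cal_is_area * (0.2 * cal_conc + 0.01)
slope, intercept = fit_response_ratio_line(cal_conc, cal_analyte_area, cal_is_area)

# The sample's internal-standard signal is suppressed to 800, but the ratio
# still recovers a concentration of about 4.0 because both signals drop together.
print(quantify(800.0 * 0.81, 800.0, slope, intercept))
```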
Serum ferritin was measured quantitatively with a Beckman Coulter Access two-site immunoenzymatic (sandwich) assay. Serum ferritin binds mouse anti-ferritin that is immobilized on paramagnetic particles; ferritin is also bound by a goat anti-ferritin – alkaline phosphatase conjugate. After washing, alkaline phosphatase activity on a chemiluminescent substrate is measured and compared against a multi-point calibration curve of known ferritin concentrations.
MTHFR mutation analysis was performed for the A1298C and C677T variants using Hologic Invader assays. DNA was isolated from whole blood and amplified in the presence of probes for both wildtype and variant sequences. Hybridization of sequence-specific probes to genomic DNA leads to enzymatic cleavage of the probe, releasing an oligonucleotide that binds to a fluorescently labeled cassette. This second hybridization results in generation of a fluorescent signal that is specific to the wildtype or variant allele. The MTHFR gene mutations are measured as categorical variables that indicate whether a sample has the mutation.
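The Invader readout yields one fluorescent signal per allele, and the genotype follows from their relative strength before being collapsed into the categorical "has the mutation" variable. The sketch below shows that logic under stated assumptions: the variant-fraction cut-offs are hypothetical placeholders rather than Hologic specifications, and counting heterozygous samples as carrying the mutation is an assumption.

```python
def call_genotype(wildtype_signal, variant_signal, low=0.2, high=0.8):
    """Call a biallelic genotype from background-corrected allele signals.

    `low` and `high` are hypothetical cut-offs on the variant fraction,
    not published Invader assay specifications.
    """
    total = wildtype_signal + variant_signal
    if total <= 0:
        raise ValueError("no allele-specific signal above background")
    variant_fraction = variant_signal / total
    if variant_fraction < low:
        return "wildtype"
    if variant_fraction <= high:
        return "heterozygous"
    return "homozygous variant"

def has_mutation(genotype):
    """Categorical variable for the analysis.

    Assumption: heterozygous and homozygous samples both count as carrying the mutation.
    """
    return genotype != "wildtype"

# Hypothetical C677T signals for three samples.
for wt, var in [(1000, 20), (520, 480), (30, 900)]:
    g = call_genotype(wt, var)
    print(g, has_mutation(g))
```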
The Metabolic and Oxidative Stress Laboratory (MOSL), located at Arkansas Children’s Research Institute, performed the measurements described below.
Sample preparation for measurement of plasma methylation and oxidative stress metabolites
For determination of total thiol concentrations (homocysteine, cysteine, cysteinyl-glycine, glutamyl-cysteine, and glutathione), disulfide bonds were reduced and protein-bound thiols were released by adding 50 μl of freshly prepared 1.43 M sodium borohydride solution containing 1.5 μM EDTA and 66 mM NaOH, together with 10 μl of n-amyl alcohol, to 200 μl of plasma. After gentle mixing, the solution was incubated at 4 °C for 30 min with gentle shaking. To precipitate proteins, 250 μl of ice-cold 10% meta-phosphoric acid was added and the sample was incubated for 20 min on ice. After centrifugation at 18,000 × g for 15 min at 4 °C, the supernatant was filtered through a 0.2 μm nylon filter, and a 20 μl aliquot was injected into the high-performance liquid chromatography (HPLC) system.
For determination of free thiols and methylation metabolites, proteins were precipitated by adding 250 μl of ice-cold 10% meta-phosphoric acid, and the sample was incubated for 10 min on ice. After centrifugation at 18,000 × g for 15 min at 4 °C, the supernatant was filtered through a 0.2 μm nylon filter, and a 20 μl aliquot was injected into the HPLC system.
HPLC with coulometric electrochemical detection
The methodological details for metabolite elution and electrochemical detection have been described previously [18, 19]. The analyses were performed by HPLC using a Shimadzu solvent delivery system (ESA model 580) and a reverse phase C18 column (5 μm; 4.6 × 150 mm; MCM, Inc., Tokyo, Japan) obtained from ESA, Inc. (Chelmsford, MA). A 20 μl aliquot of plasma extract was injected directly onto the column using a Beckman autosampler (model 507E). All plasma metabolites were quantified using a model 5200A Coulochem II electrochemical detector (ESA, Inc., Chelmsford, MA) equipped with a dual analytical cell (model 5010) and a guard cell (model 5020). Plasma metabolite concentrations were calculated from peak areas and standard calibration curves using the HPLC software.
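MOSL also calculated the metabolite ratios SAM/SAH (S-adenosylmethionine/S-adenosylhomocysteine), fGSH/GSSG (free reduced glutathione/oxidized glutathione), tGSH/GSSG (total glutathione/oxidized glutathione), and fCysteine/fCystine (free cysteine/free cystine), as well as the percentage of oxidized glutathione.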
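The ratios are simple quotients of the measured concentrations. The percentage of oxidized glutathione has more than one definition in the literature; the sketch below assumes one common form that counts each GSSG molecule as two glutathione units, which may differ from MOSL's exact formula. Input values are hypothetical.

```python
def mosl_ratios(sam, sah, fgsh, tgsh, gssg, fcys, fcystine):
    """Methylation and redox ratios; all inputs must share one concentration unit."""
    return {
        "SAM/SAH": sam / sah,
        "fGSH/GSSG": fgsh / gssg,
        "tGSH/GSSG": tgsh / gssg,
        "fCysteine/fCystine": fcys / fcystine,
        # Assumed definition: GSSG holds two glutathione units, so oxidized
        # glutathione is expressed in GSH equivalents (2 * GSSG).
        "percent oxidized GSH": 100.0 * 2 * gssg / (fgsh + 2 * gssg),
    }

# Hypothetical concentrations in one common unit.
print(mosl_ratios(sam=0.09, sah=0.02, fgsh=2.0, tgsh=6.0, gssg=0.2, fcys=20.0, fcystine=40.0))
```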
Metabolon Inc
Metabolon Inc. measured 595 metabolites in whole blood samples using methods similar to those of a previous study [20]. Briefly, individual samples were subjected to methanol extraction and then split into aliquots for analysis by ultrahigh performance liquid chromatography/mass spectrometry (UHPLC/MS). The global biochemical profiling analysis comprised four arms: two reverse phase chromatography methods with positive ionization, optimized for hydrophilic compounds (LC/MS Pos Polar) and hydrophobic compounds (LC/MS Pos Lipid); reverse phase chromatography with negative ionization (LC/MS Neg); and a hydrophilic interaction liquid chromatography (HILIC) method coupled to negative ionization (LC/MS Polar) [21]. All methods alternated between full-scan MS and data-dependent MSn scans. The scan range varied slightly between methods but generally covered 70–1000 m/z.
Metabolites were identified by automated comparison of the ion features in the experimental samples to a reference library of chemical standard entries that included retention time, molecular weight (m/z), preferred adducts, in-source fragments, and associated MS spectra; identifications were curated by visual inspection for quality control using software developed at Metabolon. Identification of known chemical entities was based on comparison to metabolomic library entries of purified standards [22]. Metabolites that were not confirmed with a standard are marked throughout the paper with an asterisk (*). Measurements below the detection limit were replaced with the lowest detected measurement of that metabolite divided by the square root of two.
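The replacement rule for below-detection values can be sketched per metabolite as follows. It assumes non-detects are stored as NaN and reads "next lowest measurement" as the lowest detected value of that metabolite; both are assumptions about the data layout.

```python
import numpy as np

def impute_below_detection(values):
    """Replace below-detection entries (stored as NaN) for one metabolite.

    Each missing entry becomes the lowest detected value divided by sqrt(2);
    reading "next lowest measurement" as the lowest detected value is an assumption.
    """
    x = np.asarray(values, dtype=float).copy()
    missing = np.isnan(x)
    if missing.all():
        raise ValueError("no detected values to impute from")
    x[missing] = np.nanmin(x) / np.sqrt(2.0)
    return x

print(impute_below_detection([4.0, np.nan, 2.0, np.nan]))
```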
Statistical analysis
Univariate analysis
For the univariate analysis, each metabolite or ratio was tested for whether the population means or medians of the ASD-M and TD-M groups are equal, against the alternative hypothesis that they are not. To choose the testing method, the Anderson-Darling test [23] was first applied to each sample. If both samples of a particular metabolite or ratio were drawn from normal distributions, an F-test was then performed to determine whether the two population variances were equal. If at least one of the two samples was not drawn from a normal distribution, the two-sample Kolmogorov-Smirnov test [24] was applied to examine whether the two samples were drawn from unknown distributions with the same shape. This pre-analysis yielded four distinct scenarios for a particular metabolite or ratio: (i) both samples were drawn from normal distributions with equal population variances, (ii) both samples were drawn from normal distributions with unequal population variances, (iii) both samples were drawn from unknown distributions with the same shape, and (iv) the samples were drawn from distinctly different distributions. For scenarios (i), (ii), (iii), and (iv), the standard Student t-test (t=), the Welch test (t ≠) [25], the Mann-Whitney U test (MW) [26], and the Welch t-test (t ≠ †) were applied, respectively, at a significance level of α = 0.05. If the p-value was less than α, the null hypothesis was rejected; otherwise, it could not be rejected.
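The test-selection procedure can be sketched with SciPy as below. This is an illustration under assumptions, not the authors' code: the two-sided F-test is written out by hand, the Kolmogorov-Smirnov test (which checks for identical distributions) stands in for the "same shape" check as the text describes, and the sample data are hypothetical.

```python
import numpy as np
from scipy import stats

def is_normal(x, alpha=0.05):
    """Anderson-Darling test: True if normality is not rejected at level alpha."""
    res = stats.anderson(x, dist="norm")
    # SciPy tabulates critical values at 15, 10, 5, 2.5 and 1 percent.
    levels = np.asarray(res.significance_level, float)
    idx = np.flatnonzero(np.isclose(levels, alpha * 100))
    if idx.size == 0:
        raise ValueError("alpha must be one of the tabulated levels")
    return res.statistic < res.critical_values[idx[0]]

def f_test_pvalue(x, y):
    """Two-sided F-test for equal variances of two normal samples."""
    f = np.var(x, ddof=1) / np.var(y, ddof=1)
    cdf = stats.f.cdf(f, len(x) - 1, len(y) - 1)
    return 2 * min(cdf, 1 - cdf)

def compare_groups(x, y, alpha=0.05):
    """Return (test label, p-value) following scenarios (i)-(iv)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    if is_normal(x, alpha) and is_normal(y, alpha):
        if f_test_pvalue(x, y) >= alpha:  # (i) equal variances
            return "t=", float(stats.ttest_ind(x, y, equal_var=True).pvalue)
        return "t≠", float(stats.ttest_ind(x, y, equal_var=False).pvalue)  # (ii)
    # KS tests whether both samples share one distribution; used here as the
    # "same shape" check described in the text.
    if stats.ks_2samp(x, y).pvalue >= alpha:  # (iii)
        return "MW", float(stats.mannwhitneyu(x, y, alternative="two-sided").pvalue)
    return "t≠†", float(stats.ttest_ind(x, y, equal_var=False).pvalue)  # (iv)

rng = np.random.default_rng(0)
asd_m = rng.normal(10.0, 2.0, size=30)   # hypothetical metabolite values
td_m = rng.normal(12.0, 2.0, size=29)
print(compare_groups(asd_m, td_m))
```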
Some of the data analyzed below are categorical. These data were analyzed with the chi-square (χ2) test of independence [27], which tests whether two categorical variables are independent. Here it was used to determine whether each recorded categorical variable is independent of group membership, that is, of whether the mother has previously had a child with ASD.
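For a binary variable such as presence of an MTHFR variant, the test reduces to a 2 × 2 contingency table of group by category. The counts below are hypothetical and only show how SciPy's `chi2_contingency` would be applied.

```python
from scipy.stats import chi2_contingency

# Hypothetical counts. Rows: ASD-M (n = 30), TD-M (n = 29);
# columns: variant present, variant absent.
table = [[14, 16],
         [9, 20]]

# For 2x2 tables SciPy applies Yates' continuity correction by default;
# pass correction=False for the uncorrected Pearson statistic. The text does
# not state which form was used.
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}, dof = {dof}")
```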
To assess the robustness of the hypothesis tests, a false discovery rate (FDR) was also calculated for each metabolite [28]. This was done by recomputing the p-value for many subsets of the mothers and calculating the fraction of these p-values that were not significant (> 0.05). The subsets were obtained by leaving out one mother at a time, every combination of two mothers, and every combination of three mothers. This produced 1770 p-values for each metabolite, from which the FDR was computed.
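The resampling behind this robustness score can be sketched as follows. Note that it is a leave-k-out stability measure and not the Benjamini-Hochberg FDR procedure. For the 59 mothers, leaving out one gives 59 subsets and leaving out two gives C(59, 2) = 1,711, for 1,770 in total; also leaving out three would add C(59, 3) = 32,509 subsets. The Mann-Whitney test stands in here, as an assumption, for whichever test the univariate step selected.

```python
import itertools
import numpy as np
from scipy.stats import mannwhitneyu

def leave_k_out(n, max_k):
    """Yield index arrays that drop every combination of 1..max_k participants."""
    everyone = np.arange(n)
    for k in range(1, max_k + 1):
        for dropped in itertools.combinations(range(n), k):
            yield np.delete(everyone, list(dropped))

def resampled_pvalues(values, is_asd, max_k=2):
    """Recompute a two-group p-value on every leave-k-out subset.

    Mann-Whitney U stands in for whichever test the univariate step selected.
    """
    values = np.asarray(values, float)
    is_asd = np.asarray(is_asd, bool)
    pvals = []
    for keep in leave_k_out(len(values), max_k):
        v, g = values[keep], is_asd[keep]
        pvals.append(mannwhitneyu(v[g], v[~g], alternative="two-sided").pvalue)
    return np.asarray(pvals)

# Subset count for the 59 mothers when one or two are left out: 59 + 1711.
print(sum(1 for _ in leave_k_out(59, 2)))  # 1770
```

The fraction of the returned p-values at or below 0.05, or its complement, then summarizes how stable a result is when a few mothers are dropped.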
The area under the curve (AUC) of the receiver operating characteristic (ROC) curve was also calculated for each metabolite. The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) as the classification threshold is varied. The higher the AUC, the better the measurement classifies the two groups of mothers [29].
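The AUC has an equivalent rank interpretation: it is the probability that a randomly chosen positive scores higher than a randomly chosen negative, which equals the Mann-Whitney U statistic divided by the product of the group sizes. A minimal sketch, assuming larger values point toward the positive (ASD-M) group:

```python
import numpy as np

def roc_auc(positives, negatives):
    """Area under the ROC curve via pairwise comparisons.

    Counts the fraction of (positive, negative) pairs in which the positive
    scores higher, with ties counting one half; this equals the Mann-Whitney
    U statistic divided by n_pos * n_neg. Assumes larger values point toward
    the positive (ASD-M) group.
    """
    pos = np.asarray(positives, float)[:, None]
    neg = np.asarray(negatives, float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

print(roc_auc([3, 4, 5], [0, 1, 2]))  # 1.0: perfect separation
print(roc_auc([1, 2], [1, 2]))        # 0.5: no separation
```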
A test was considered significant if the p-value was less than or equal to 0.05 and the FDR value was less than or equal to 0.1.
Multivariate analysis
While the univariate analyses tested for equal population means or medians of individual metabolites and ratios, they do not show how well these differences separate the two groups of mothers. To examine the extent of the differences between the recorded observations of the two groups, Fisher Discriminant Analysis (FDA) was applied [30]. This technique defines a projection direction in the data space that maximizes the squared difference between the means of the projected observations of the two groups divided by the sum of the variances of the projected observations. The objective function, J, used to compute the projection direction is:
$$ J=\frac{{\left({\overline{t}}_1-{\overline{t}}_2\right)}^2}{s_1^2+{s}_2^2} \tag{1} $$
Here, \( {\overline{t}}_1=\frac{1}{n_1}{\sum}_{i=1}^{n_1}{t}_{1,i} \) and \( {\overline{t}}_2=\frac{1}{n_2}{\sum}_{i=1}^{n_2}{t}_{2,i} \) are the means of the observations of the two groups after orthogonal projection onto the direction vector, and \( {s}_1^2=\frac{1}{n_1-1}{\sum}_{i=1}^{n_1}{\left({t}_{1,i}-{\overline{t}}_1\right)}^2 \) and \( {s}_2^2=\frac{1}{n_2-1}{\sum}_{i=1}^{n_2}{\left({t}_{2,i}-{\overline{t}}_2\right)}^2 \) are the sample variances of the projected data points. The orthogonal projection of the i-th observation of group k (k = 1, 2), \( {\boldsymbol{x}}_{k,i} \), is \( {t}_{k,i}={\boldsymbol{x}}_{k,i}^T\boldsymbol{p} \), where \( \boldsymbol{p} \) is the unit-length direction vector. The projection coordinate \( {t}_{k,i} \) is often referred to as a score. Essentially, FDA produces a projection direction that represents a tradeoff between maximally separating the two groups of mothers and minimizing the spread of the projected data within each group. FDA is used here to build a multivariate model that classifies mothers into the two groups.
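Because the sample variance of the scores equals \( \boldsymbol{p}^T S_k \boldsymbol{p} \), with \( S_k \) the sample covariance matrix of group k, J in Eq. (1) is a ratio whose maximizer is proportional to \( (S_1+S_2)^{-1}(\bar{\boldsymbol{x}}_1-\bar{\boldsymbol{x}}_2) \). The NumPy sketch below computes that direction and evaluates J; the data are hypothetical, and it assumes \( S_1+S_2 \) is invertible, which holds for the small variable sets used per model.

```python
import numpy as np

def fda_direction(X1, X2):
    """Unit vector p that maximizes J in Eq. (1).

    Because s_k^2 = p^T S_k p (S_k = sample covariance of group k), J is a
    Rayleigh-type quotient maximized by p proportional to
    (S_1 + S_2)^{-1} (mean_1 - mean_2). Assumes S_1 + S_2 is invertible.
    """
    S = np.atleast_2d(np.cov(X1, rowvar=False)) + np.atleast_2d(np.cov(X2, rowvar=False))
    p = np.linalg.solve(S, X1.mean(axis=0) - X2.mean(axis=0))
    return p / np.linalg.norm(p)

def fisher_J(X1, X2, p):
    """Objective J from Eq. (1) for a given direction p."""
    t1, t2 = X1 @ p, X2 @ p  # scores (projection coordinates)
    return (t1.mean() - t2.mean()) ** 2 / (t1.var(ddof=1) + t2.var(ddof=1))

rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0, 0.0], 1.0, size=(30, 3))   # hypothetical group 1
X2 = rng.normal([1.0, 0.5, 0.0], 1.0, size=(29, 3))   # hypothetical group 2
p = fda_direction(X1, X2)
print(p, fisher_J(X1, X2, p))
```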
FDA works well with continuous (real-valued) data. However, some of the data were discrete, such as the MTHFR gene mutation information. For classification tasks involving both continuous and discrete data, logistic regression was used. Logistic regression is similar to linear regression, but its response is a categorical variable: binary (binomial) here, since there are two groups, or multinomial when there are more than two categories. The logistic regression model predicts the probability that a sample belongs to the ASD-M group or to the TD-M group, and the sample is classified as belonging to the group with the higher predicted probability [31, 32].
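A binary logistic regression with a mix of continuous and 0/1 predictors can be fitted with a few Newton-Raphson (iteratively reweighted least squares) steps. The sketch below is self-contained NumPy; the tiny ridge penalty is an assumption that keeps the Hessian invertible if the groups happen to separate perfectly, and the data are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, ridge=1e-6, n_iter=50, tol=1e-10):
    """Binary logistic regression fitted by Newton-Raphson (IRLS).

    y = 1 for ASD-M and 0 for TD-M. X may mix continuous columns with 0/1
    columns (e.g. a hypothetical MTHFR indicator). The tiny ridge penalty is
    an assumption that keeps the Hessian invertible if the groups separate.
    """
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-A @ w))
        grad = A.T @ (prob - y) + ridge * w
        hess = A.T @ (A * (prob * (1.0 - prob))[:, None]) + ridge * np.eye(A.shape[1])
        step = np.linalg.solve(hess, grad)
        w -= step
        if np.max(np.abs(step)) < tol:
            break
    return w

def predict_group(w, X):
    """Assign each sample to the group with the higher predicted probability."""
    A = np.column_stack([np.ones(len(X)), X])
    p_asd = 1.0 / (1.0 + np.exp(-A @ w))
    return np.where(p_asd >= 0.5, "ASD-M", "TD-M"), p_asd

rng = np.random.default_rng(0)
y = np.r_[np.ones(30), np.zeros(29)]
continuous = rng.normal(np.where(y == 1, 1.5, -1.5), 1.0)   # hypothetical metabolite
binary = rng.integers(0, 2, size=y.size)                     # hypothetical 0/1 variable
X = np.column_stack([continuous, binary])
w = fit_logistic(X, y)
labels, p_asd = predict_group(w, X)
print(np.mean((labels == "ASD-M") == (y == 1)))  # training accuracy
```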
The multivariate analysis used both FDA and logistic regression. The data were split into multiple subsets for analysis: (i) the 20 measurements from the FOCM/TS pathways; (ii) the same 20 measurements plus additional nutritional information; (iii) the 20 FOCM/TS metabolites, the additional nutritional information, and the MTHFR gene information; and (iv) the 20 FOCM/TS metabolites, the additional nutritional information, the MTHFR gene information, and 50 metabolites from the broad metabolomics analysis. The additional nutritional markers were B12, folate, ferritin, methylmalonic acid (MMA), and vitamin E. The 50 Metabolon metabolites were those with the highest AUC values from the corresponding ROC curves. These steps reduced the total number of variables from 621 to 76 for case (iv). All combinations of two through ten variables were analyzed in each subset. FDA was used for subsets (i) and (ii), and logistic regression was used for subsets (iii) and (iv). FDA was used to keep the methodology consistent with prior work [8], while logistic regression was needed for subsets (iii) and (iv) because they contain the MTHFR gene information, which consists of binary variables. A reduced set of variables was analyzed for each of the four cases, instead of the full variable set, to alleviate concerns about overfitting of the classification models.
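The exhaustive search over all combinations of two through ten variables can be sketched as a generator over `itertools.combinations` scored by a caller-supplied function. The `score` callable is hypothetical (for example, the cross-validated accuracy of an FDA or logistic model fitted on that combination), and `n_models` shows how quickly the number of candidate models grows.

```python
import heapq
import itertools
import math

def n_models(n_vars, min_size=2, max_size=10):
    """Number of variable combinations of size min_size..max_size."""
    return sum(math.comb(n_vars, k) for k in range(min_size, max_size + 1))

def best_subsets(variables, score, min_size=2, max_size=10, top=5):
    """Score every combination of min_size..max_size variables, keep the best.

    `score` is a hypothetical callable (e.g. LOOCV accuracy of an FDA or
    logistic model fitted on the combination); a generator plus heapq keeps
    memory bounded by `top` rather than by the number of combinations.
    """
    candidates = (
        (score(combo), combo)
        for k in range(min_size, max_size + 1)
        for combo in itertools.combinations(variables, k)
    )
    return heapq.nlargest(top, candidates, key=lambda c: c[0])

# Toy example with a hypothetical additive score.
weights = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}
print(best_subsets(list(weights), lambda combo: sum(weights[v] for v in combo),
                   min_size=2, max_size=3, top=2))
print(n_models(20))  # combinations of 2-10 variables from the 20 FOCM/TS measurements
```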
Furthermore, leave-one-out cross-validation (LOOCV) [32] was used to independently assess classification accuracy. LOOCV removes the first observation (one participant's data), determines a model using (Eq. 1) from the remaining n − 1 observations, and then applies this model to the left-out observation to decide whether it is correctly or incorrectly classified as belonging to the ASD-M or TD-M group. Next, the second observation is left out, while the first observation is included, to determine a second model using (Eq. 1), which is then used to classify the second observation. This procedure is repeated until each observation has been left out once, which gives the overall rates of correctly classified and misclassified observations. To score the classifications, the samples of the ASD-M group were defined as positives and those of the TD-M group as negatives. The decision boundary for assigning the label “ASD-M” or “TD-M” to a data point was based on a kernel density estimate of the scores (projection coordinates) that the FDA model computed for the positives (ASD-M group). More precisely, the decision boundary was set at a chosen one-sided confidence level, such that a score less than or equal to this boundary was labeled ASD-M and a score larger than this boundary was labeled TD-M. The confidence level was chosen to reduce the difference between the type I and type II errors.
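The LOOCV loop with an FDA model and a kernel-density decision boundary might look like the sketch below. Several details are assumptions: the FDA direction is oriented so that ASD-M scores are lower, the boundary is read as the score at which the Gaussian KDE of the training ASD-M scores reaches the confidence level, and the confidence level is fixed at a hypothetical 0.95 rather than tuned to balance type I and type II errors. The data are hypothetical.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import gaussian_kde

def fda_direction(X_pos, X_neg):
    """FDA direction oriented so that positives (ASD-M) get lower scores."""
    S = np.atleast_2d(np.cov(X_pos, rowvar=False)) + np.atleast_2d(np.cov(X_neg, rowvar=False))
    p = np.linalg.solve(S, X_neg.mean(axis=0) - X_pos.mean(axis=0))
    return p / np.linalg.norm(p)

def kde_boundary(pos_scores, confidence):
    """Score b with P(score <= b) = confidence under a Gaussian KDE of positive scores."""
    kde = gaussian_kde(pos_scores)
    spread = 10.0 * pos_scores.std()
    lo, hi = pos_scores.min() - spread, pos_scores.max() + spread
    return brentq(lambda b: kde.integrate_box_1d(lo, b) - confidence, lo, hi)

def loocv_accuracy(X, is_pos, confidence=0.95):
    """Leave each participant out once, refit, and classify the left-out one."""
    X, is_pos = np.asarray(X, float), np.asarray(is_pos, bool)
    correct = 0
    for i in range(len(X)):
        train = np.arange(len(X)) != i
        X_tr, y_tr = X[train], is_pos[train]
        p = fda_direction(X_tr[y_tr], X_tr[~y_tr])
        boundary = kde_boundary(X_tr[y_tr] @ p, confidence)
        predicted_pos = X[i] @ p <= boundary   # score <= boundary -> ASD-M
        correct += int(predicted_pos == is_pos[i])
    return correct / len(X)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (30, 2)),    # hypothetical ASD-M (positives)
               rng.normal(3.0, 1.0, (29, 2))])   # hypothetical TD-M (negatives)
is_pos = np.r_[np.ones(30, bool), np.zeros(29, bool)]
print(loocv_accuracy(X, is_pos))
```

For the logistic regression subsets, the same loop structure would apply with the logistic model and its probability rule replacing the FDA direction and KDE boundary.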