Machine Learning Approach to Predict Childhood Neurodevelopmental Outcomes in the FinnBrain Birth Cohort Study : Importance of Serum Biomarkers
Lund, Riikka (2024-05-19)
This publication is subject to copyright regulations. The work may be read and printed for personal use. Use for commercial purposes is prohibited.
Open access
The permanent address of the publication is:
https://urn.fi/URN:NBN:fi-fe2024060342681
Abstract
The aim of this study was to determine whether serum biomarkers predict behavioural and socio-emotional problems in children participating in the FinnBrain Birth Cohort Study. The biomarkers were measured from maternal serum during pregnancy and from the children’s own serum at the five-year follow-up. A further aim was to identify other factors that may co-influence the outcomes. The outcomes of interest were the Brief Infant Toddler Social Emotional Assessment Problem and Competence scores from the two-year follow-up and the Strengths and Difficulties total difficulties scores from the four- and five-year follow-ups.
The original data contained 6051 features and 1642 observations, including a panel of 13 biomarkers. After exploration and cleaning, the data were split into training and test datasets. The machine learning model was developed on the training data using a five-fold grid search cross-validation approach. The key steps included comparison and tuning of regressors and classifiers as well as techniques to mitigate class imbalance. Generalisation performance was evaluated on the hold-out test dataset, and features predicting the outcomes were identified using permutation importance and SHAP techniques.
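The workflow described in this paragraph (a stratified hold-out split followed by a five-fold cross-validated grid search on the training portion) can be illustrated with a minimal sketch using only the Python standard library. Everything here is an assumption made for illustration: the synthetic one-feature data, the toy nearest-neighbour classifier standing in for the regressors and classifiers compared in the thesis, the balanced-accuracy score, and all function names. It is not the code used in the study.

```python
import random

def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds so each fold keeps a similar class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, label in enumerate(labels) if label == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def knn_predict(train_x, train_y, x_new, n_neighbors):
    """Toy 1-D nearest-neighbour vote (hypothetical stand-in for a real model)."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x_new))
    votes = sum(train_y[i] for i in order[:n_neighbors])
    return int(2 * votes > n_neighbors)

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, which is less misleading than accuracy under imbalance."""
    recalls = []
    for cls in sorted(set(y_true)):
        preds = [p for t, p in zip(y_true, y_pred) if t == cls]
        recalls.append(sum(p == cls for p in preds) / len(preds))
    return sum(recalls) / len(recalls)

def grid_search_cv(x, y, grid, k=5, seed=0):
    """Return the grid value with the best mean validation score over k folds."""
    folds = stratified_folds(y, k, seed)
    best_param, best_score = None, -1.0
    for n_neighbors in grid:
        scores = []
        for held_out in range(k):
            val = folds[held_out]
            train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
            tx, ty = [x[i] for i in train], [y[i] for i in train]
            preds = [knn_predict(tx, ty, x[i], n_neighbors) for i in val]
            scores.append(balanced_accuracy([y[i] for i in val], preds))
        mean_score = sum(scores) / k
        if mean_score > best_score:
            best_param, best_score = n_neighbors, mean_score
    return best_param, best_score

# Synthetic, imbalanced toy data (assumption): 40 positives, 160 negatives.
rng = random.Random(42)
y = [1] * 40 + [0] * 160
x = [rng.gauss(3.0 if label else 0.0, 1.0) for label in y]

# Hold out one stratified fold (20 %) as the test set; tune on the rest only.
folds = stratified_folds(y, k=5, seed=1)
test_idx = folds[0]
train_idx = [i for fold in folds[1:] for i in fold]
train_x, train_y = [x[i] for i in train_idx], [y[i] for i in train_idx]

best_k, best_score = grid_search_cv(train_x, train_y, grid=[1, 5, 15])

# Final check on the untouched hold-out data with the selected setting.
test_pred = [knn_predict(train_x, train_y, x[i], best_k) for i in test_idx]
test_score = balanced_accuracy([y[i] for i in test_idx], test_pred)
```

The point of the design is that the hold-out fold plays no part in choosing the hyperparameter, so the final score estimates generalisation rather than tuning fit.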
Acceptable performance levels were achieved using an XGBoost classifier and weighted target features for the models predicting the total difficulties outcomes, but not for the Problem and Competence outcomes. The generalisation performance of the models on the hold-out test data was moderate (ROC-AUC 0.63-0.66). Gestational TSH levels were among the most important features predicting total difficulties at both the four- and five-year follow-ups. In addition, several other biomarkers, including LDL, APOA1, Trigly, FT4, Glucose HK2 and insulin, predicted the five-year outcome with weaker influence. Furthermore, numerous other protective and risk factors were identified. Children’s own biomarkers were not associated with total difficulties. The results suggest that gestational imbalance in thyroid, lipid and glucose metabolism, in combination with numerous other prenatal and early-life factors, influences the total difficulties outcome at the five-year follow-up.
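To make the reported metric concrete: ROC-AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, so 0.5 is chance level and 0.63-0.66 is a modest improvement over it. The sketch below computes it by pairwise comparison and also shows a common class-weight heuristic, the ratio of negatives to positives, which is the value often passed to XGBoost's `scale_pos_weight` parameter. The abstract does not state how the target weights were actually set, so that part is an assumption, as are the function names and the toy labels and scores.

```python
def roc_auc(y_true, scores):
    """Share of positive/negative pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def positive_class_weight(y_true):
    """Negatives per positive: a common heuristic for up-weighting the minority class."""
    n_pos = sum(y_true)
    return (len(y_true) - n_pos) / n_pos

# Toy example (made-up values, not study data): 4 negatives, 2 positives.
labels = [0, 0, 0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.45]

auc = roc_auc(labels, scores)            # 6 of 8 pairs ranked correctly -> 0.75
weight = positive_class_weight(labels)   # 4 negatives / 2 positives -> 2.0
```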
This study is important in advancing our understanding of the early-life factors associated with emotional and behavioural problems in childhood, and it provides predictive markers for the early detection of individuals at risk.