Prediction of Children’s Subjective Well-Being from Physical Activity and Sports Participation Using Machine Learning Techniques: Evidence from a Multinational Study
Date Issued
2025
Author(s)
Souza-Lima, Josivaldo De
Ferrari, Gérson Luis De Moraes
Yánez-Sepúlveda, Rodrigo Alejandro
Giakoni-Ramírez, Frano
Muñoz-Strale, Catalina
Alarcón-Aguilar, Javiera
Parra-Saldías, Maribel
Duclos-Bastias, Daniel Michel
Godoy-Cumillaf, Andrés Esteban Roberto
Merellano-Navarro, Eugenio
DOI
https://doi.org/10.3390/children12081083
Abstract
Highlights: What are the main findings? Machine learning models, particularly XGBoost and LightGBM, predict children’s subjective well-being with up to 50% explained variance, surpassing traditional regression. Sports participation, including exercise frequency, emerges as a key predictor, with linear benefits observed across diverse global samples. What is the implication of the main finding? These results support the development of targeted sports programs to enhance child well-being, leveraging advanced predictive tools. The findings advocate for integrating physical literacy into educational policies to address global inactivity trends in youth.
Background/Objectives: Traditional models such as ordinary least squares (OLS) struggle to capture non-linear relationships in children’s subjective well-being (SWB), which is associated with physical activity. This study evaluated machine learning (ML) for predicting SWB, focusing on sports participation, and explored theoretical prediction limits using a global dataset. It addresses a gap in understanding complex patterns across diverse cultural contexts.
Methods: We analyzed 128,184 records from the ISCWeB survey (ages 6–14, 35 countries), with self-reported data on sports frequency, emotional states, and family support. To assess cross-country generalizability, we used GroupKFold cross-validation (grouped by country) and leave-one-country-out (LOCO) validation, which yielded mean R² = 0.45 ± 0.05, indicating robustness beyond cultural patterns. We used SHAP for interpretability and bootstrapping for error estimation. No pre-registration was required for this secondary analysis.
Results: XGBoost and LightGBM outperformed OLS, achieving R² up to 0.504 in restricted datasets (sensitivity analysis excluding affective leakage: R² = 0.35). Sports-related variables (e.g., exercise frequency) were positively associated with SWB predictions (SHAP values: +0.15–0.25; incremental ΔR² = 0.06 over a base of demographic, family, and school variables). Using test–retest reliability from the literature (r = 0.74), the estimated irreducible RMSE was 0.941; XGBoost achieved RMSE = 1.323, approaching the predictability bound, with 68.1% of explainable variance captured (after noise adjustment). Partial dependence plots showed a linear association with exercise, with no sign of saturation, and a slight decline with age.
Conclusions: ML improves SWB prediction in children, highlighting associations with sports participation, and approaches the bound on predictable variance. These findings suggest potential for data-driven tools to identify patterns, such as through physical literacy pathways, that can inform physical activity interventions. However, longitudinal studies are needed to explore causality and address cultural biases in self-reports.
© 2025 Elsevier B.V., All rights reserved.
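The noise-adjusted figure in the Results can be checked with a few lines of arithmetic. The sketch below assumes, since the abstract does not state the formula, that test–retest reliability r caps the attainable R² and that the irreducible RMSE equals SD × sqrt(1 − r). The function names and the implied outcome SD are illustrative and do not come from the study.

```python
import math

def implied_outcome_sd(irreducible_rmse: float, reliability: float) -> float:
    """Outcome SD implied by irreducible RMSE = SD * sqrt(1 - r) (assumed relation)."""
    return irreducible_rmse / math.sqrt(1.0 - reliability)

def share_of_explainable_variance(r2: float, reliability: float) -> float:
    """Fraction of the reliability-capped variance captured (assumes R2_max = r)."""
    return r2 / reliability

# Values reported in the abstract
reliability = 0.74        # test-retest r from the literature
irreducible_rmse = 0.941  # estimated noise floor
best_r2 = 0.504           # best R2 in restricted datasets

sd = implied_outcome_sd(irreducible_rmse, reliability)
share = share_of_explainable_variance(best_r2, reliability)

print(f"implied outcome SD: {sd:.3f}")
print(f"share of explainable variance: {share:.1%}")
```

Under these assumptions, dividing the best R² by the reliability gives the reported 68.1%, which suggests the authors treated r as the ceiling on R². Other ceiling definitions would give a different share.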


