Poster No:
1462
Submission Type:
Abstract Submission
Authors:
Matthew Rosenblatt1, Link Tejavibulya1, Chris Camp1, Rongtao Jiang2, Margaret Westwater2, Stephanie Noble3, Dustin Scheinost1
Institutions:
1Yale University, New Haven, CT, 2Yale School of Medicine, New Haven, CT, 3Northeastern University, Boston, MA
Introduction:
Identifying reproducible and generalizable brain-phenotype associations is a central goal of neuroimaging. Consistent with this goal, prediction frameworks evaluate brain-phenotype models in unseen data. Most prediction studies train and evaluate a model in the same dataset. However, external validation, or the evaluation of a model in an external dataset, provides a better assessment of robustness and generalizability (1), and it may improve reproducibility. Yet, the statistical power of such studies has not been investigated. Here, we ran over 60 million simulations across several datasets, phenotypes, and sample sizes to better understand how the sizes of the training and external datasets affect statistical power.
Methods:
We used resting-state fMRI data from the Adolescent Brain Cognitive Development (ABCD) Study (2) (N=7822-7969), the Healthy Brain Network (HBN) Dataset (3) (N=1024-1201), the Human Connectome Project Development (HCPD) Dataset (4) (N=424-605), and the Philadelphia Neurodevelopmental Cohort (PNC) Dataset (5,6) (N=1119-1126). Resting-state functional connectomes were formed using the Shen 268 atlas (7).
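For concreteness, a minimal Python sketch of the connectome construction step follows. The Fisher z-transform and upper-triangle vectorization are common conventions assumed here rather than details stated above, and the function name and array shapes are illustrative.

import numpy as np

def connectome_from_timeseries(ts):
    """ts: (timepoints x 268) parcellated time series from the Shen atlas."""
    r = np.corrcoef(ts.T)              # 268 x 268 matrix of edge correlations
    np.fill_diagonal(r, 0.0)           # zero self-correlations before arctanh
    z = np.arctanh(r)                  # Fisher z-transform
    iu = np.triu_indices_from(z, k=1)  # indices of unique (upper-triangle) edges
    return z[iu]                       # vectorized connectome (35,778 features)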
We performed external validation, where a model was developed in one dataset and applied to the other three datasets. Ridge regression models (8) with 1% feature selection were trained to predict age, attention problems, and matrix reasoning from functional connectivity. We subsampled the training and external datasets at various sample sizes (Figure 1) to determine how sample size affects external validation performance (Pearson's r).
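The sketch below illustrates a single external-validation run under these choices. The univariate selection criterion (f_regression) and the default ridge penalty are assumptions; the exact feature-selection statistic and hyperparameter tuning of the original pipeline are not specified in this abstract.

import numpy as np
from scipy.stats import pearsonr
from sklearn.feature_selection import SelectPercentile, f_regression
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

def external_validation(X_train, y_train, X_ext, y_ext):
    """Train in one dataset and evaluate in an external dataset.

    X_*: (subjects x edges) vectorized connectomes; y_*: phenotype scores.
    In the simulations, rows of each dataset would be subsampled before
    this function is called.
    """
    model = make_pipeline(
        SelectPercentile(f_regression, percentile=1),  # keep top 1% of edges
        Ridge(),
    )
    model.fit(X_train, y_train)
    y_pred = model.predict(X_ext)
    r, p = pearsonr(y_ext, y_pred)  # Pearson's r and its p-value
    return r, p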
We defined the "ground truth" prediction performance as the performance obtained when using the full training and external datasets. We then calculated the fraction of simulations yielding significant results; this fraction was treated as power for models with a significant ground truth effect and as the false positive rate for models with a nonsignificant ground truth effect (Figure 1). Among the significant prediction results, effect size inflation was calculated as the difference between the observed performance and the ground truth performance (Figure 2). We further compared our simulation results to the median sample sizes of external validation studies in the field (training sample: n=129; external sample: n=108) (1).
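A sketch of this summary step follows, assuming an alpha of 0.05 and the parametric p-value from Pearson's r; the significance criterion actually used in the simulations (e.g., permutation testing) is not specified in this abstract.

import numpy as np

def summarize_simulations(r_sims, p_sims, r_truth, p_truth, alpha=0.05):
    """r_sims, p_sims: performance and p-values over repeated subsamples for
    one dataset/phenotype/sample-size cell; r_truth, p_truth: full-sample
    ("ground truth") performance and p-value."""
    r_sims, p_sims = np.asarray(r_sims), np.asarray(p_sims)
    sig = p_sims < alpha
    frac_sig = sig.mean()
    # power if the ground truth effect is significant; false positive rate otherwise
    label = "power" if p_truth < alpha else "false_positive_rate"
    # inflation: observed minus ground truth, among significant results only
    inflation = np.median(r_sims[sig] - r_truth) if sig.any() else np.nan
    return {label: frac_sig, "median_inflation": inflation}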
Results:
Increasing the external sample size increased power, consistent with theoretical power curves, whereas decreasing the training dataset size shifted the power curve downward (Figure 1). For sample sizes similar to the median in the field, power ranged from 99.11-100.00% for age, 5.47-8.35% for attention problems, and 5.24-72.74% for matrix reasoning. For nonsignificant ground truth effects, the false positive rate was highest with large external samples and small training samples.
Effect size inflation was greatest for weakly predicted phenotypes and smallest for strongly predicted phenotypes, such as age (Figure 2). For the weakest predictive models, the training dataset size made little difference in inflation, likely because inflation is driven by low power, which depends on the external sample size. For stronger models (e.g., age), training size mattered more: there was little to no inflation, but smaller training sets produced worse predictions. For sample sizes comparable to the median in the field, median inflation ranged from Δr = -0.12 to -0.05 for age, 0.10 to 0.20 for attention problems, and -0.17 to 0.21 for matrix reasoning, where negative inflation indicates deflation.


Conclusions:
For attention problems and matrix reasoning, typical sample sizes for external validation in the field are underpowered. Relatedly, due to low power and publication bias, effect sizes may be overestimated for certain phenotypes. External validation is expected to become more widespread as the field confronts reproducibility challenges (9), and this work provides a starting point for understanding the sample sizes needed to power external validation studies adequately.
Modeling and Analysis Methods:
Classification and Predictive Modeling 1
Connectivity (e.g., functional, effective, structural) 2
Methods Development
Multivariate Approaches
Keywords:
FUNCTIONAL MRI
Machine Learning
Multivariate
Statistical Methods
Other - reproducibility; external validation; power
1|2 Indicates the priority used for review
References:
1. Yeung, A. W. K., More, S., Wu, J. & Eickhoff, S. B. (2022). Reporting details of neuroimaging studies on individual traits prediction: A literature survey. NeuroImage 256, 119275.
2. Casey, B. J. et al. (2018). The Adolescent Brain Cognitive Development (ABCD) study: Imaging acquisition across 21 sites. Dev. Cogn. Neurosci. 32, 43–54.
3. Alexander, L. M. et al. (2017). An open resource for transdiagnostic research in pediatric mental health and learning disorders. Sci. Data 4, 170181.
4. Somerville, L. H. et al. (2018). The Lifespan Human Connectome Project in Development: A large-scale study of brain connectivity development in 5–21 year olds. NeuroImage 183, 456–468.
5. Satterthwaite, T. D. et al. (2014). Neuroimaging of the Philadelphia neurodevelopmental cohort. NeuroImage 86, 544–553.
6. Satterthwaite, T. D. et al. (2016). The Philadelphia Neurodevelopmental Cohort: A publicly available resource for the study of normal and abnormal brain development in youth. NeuroImage 124, 1115–1119.
7. Shen, X., Tokoglu, F., Papademetris, X. & Constable, R. T. (2013). Groupwise whole-brain parcellation from resting-state fMRI data for network node identification. NeuroImage 82, 403–415.
8. Pedregosa, F. et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 2825–2830.
9. Marek, S. et al. (2022). Reproducible brain-wide association studies require thousands of individuals. Nature 603, 654–660.