Poster No:
1414
Submission Type:
Abstract Submission
Authors:
Maitrei Kohli¹, Pedro Da Costa², Robert Leech², James Cole¹
Institutions:
¹Centre of Medical Image Computing, Department of Computer Science, University College London, London, United Kingdom; ²Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, United Kingdom
First Author:
Maitrei Kohli
Centre of Medical Image Computing, Department of Computer Science, University College London
London, United Kingdom
Co-Author(s):
Pedro Da Costa
Institute of Psychiatry, Psychology & Neuroscience, King's College London
London, United Kingdom
Robert Leech
Institute of Psychiatry, Psychology & Neuroscience, King's College London
London, United Kingdom
James Cole, PhD
Centre of Medical Image Computing, Department of Computer Science, University College London
London, United Kingdom
Introduction:
Classification of individual-level disease stage in Alzheimer's disease (AD) is key for stratification of patients in clinical trials. Machine learning (ML) methods can make patient-specific predictions of disease stage using neuroimaging data. While the goals of these translational models are clear, the details of the analytic processing and modelling are not. This has led to conflicting results, arising from inconsistent methods, arbitrary processing steps, and false positives due to small sample sizes [1][2]. We propose a novel neuroimaging analysis approach based on automated machine learning (autoML) for predictive classification of individual-level AD stage. We hypothesize that our autoML-based approach will remove experimenter bias, eliminate arbitrary analytic choices, and improve predictive accuracy for clinically relevant outcomes.
Methods:
AutoML-multiverse framework
Our AutoML-Multiverse framework, building on prior work [3, 4], efficiently searches a prediction space formed from thousands of ML pipelines. It distils high-dimensional data into a low-dimensional configuration space that can be efficiently sampled using Bayesian Optimisation. Instead of identifying a single best pipeline, this approach constructs fully data-driven ensembles, combining complementary information from different pipelines to improve predictions. For our initial analyses, we used a configuration space built from 20,000 ML pipeline instances across 64 OpenML datasets, as per [3], and applied the pre-existing autoML system to our test dataset with a warm start.
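The abstract does not state how the data-driven ensembles are assembled. As a minimal sketch of one common option, the example below uses greedy forward ensemble selection over the validation-set probabilities of candidate pipelines. The technique, the function names (`greedy_ensemble`, `average`, `accuracy`), the pipeline names, and the toy numbers are all assumptions for illustration, not the authors' implementation.

```python
from typing import Dict, List, Sequence


def accuracy(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of cases where thresholding P(class 1) at 0.5 matches the label."""
    hits = sum(int(p >= 0.5) == y for p, y in zip(probs, labels))
    return hits / len(labels)


def average(members: List[Sequence[float]]) -> List[float]:
    """Element-wise mean of the members' predicted probabilities."""
    return [sum(col) / len(col) for col in zip(*members)]


def greedy_ensemble(
    pipeline_probs: Dict[str, Sequence[float]],
    labels: Sequence[int],
    max_size: int = 5,
) -> List[str]:
    """Forward selection with replacement: repeatedly add the pipeline whose
    inclusion gives the best validation accuracy; stop when nothing improves."""
    chosen: List[str] = []
    best_score = -1.0
    for _ in range(max_size):
        candidates = {
            name: accuracy(
                average([pipeline_probs[n] for n in chosen + [name]]), labels
            )
            for name in pipeline_probs
        }
        name, score = max(candidates.items(), key=lambda kv: kv[1])
        if score <= best_score:
            break
        chosen.append(name)
        best_score = score
    return chosen


# Toy validation data (made up for illustration, not ADNI).
labels = [1, 0, 1, 0, 1, 0]
pipeline_probs = {
    "pipeline_a": [0.9, 0.2, 0.4, 0.1, 0.8, 0.6],  # wrong on cases 3 and 6
    "pipeline_b": [0.6, 0.7, 0.9, 0.3, 0.7, 0.3],  # wrong on case 2
    "pipeline_c": [0.4, 0.4, 0.6, 0.6, 0.4, 0.4],  # wrong on cases 1, 4, 5
}
selected = greedy_ensemble(pipeline_probs, labels)
ensemble_probs = average([pipeline_probs[n] for n in selected])
print(selected, accuracy(ensemble_probs, labels))
```

In this toy case the two selected pipelines make errors on different cases, so their averaged probabilities classify every case correctly. This is the sense in which an ensemble can combine complementary information.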
Objective and Analysis
Our primary aim was stratification across three classification tasks: (i) cognitively normal (CN) vs. Alzheimer's disease (AD); (ii) AD vs. mild cognitive impairment (MCI) vs. CN; and (iii) stable MCI (sMCI) vs. fast progressive MCI (pMCI). We compared our autoML framework against nine separate ML models and a stacked ensemble of these models. Each task was run twice: first with only neuroimaging features and second with only age, sex, and MMSE scores.
Data
We used volumetric data extracted from T1-weighted structural MRI (sMRI) scans and clinical data sourced from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. Volumetric features were ventricles, hippocampus, whole-brain, entorhinal, fusiform, and mid-temporal regions. Data were corrected for intracranial volume (ICV) and standard scaled. Sample sizes were n = 606 (303 CN and 303 AD) for task 1, n = 1930 (760 CN, 867 MCI, and 303 AD) for task 2, and n = 369 (224 sMCI and 145 pMCI) for task 3.
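The abstract does not say which ICV correction was applied. The sketch below assumes the simple proportional method (volume divided by ICV), followed by z-scoring with statistics estimated on training data only. The function names and all numeric values are hypothetical, and a residual (regression-based) correction would be an equally plausible reading.

```python
from statistics import mean, pstdev
from typing import List


def icv_correct(volumes: List[float], icv: List[float]) -> List[float]:
    """Proportional correction: express each volume as a fraction of ICV."""
    return [v / total for v, total in zip(volumes, icv)]


def standard_scale(values: List[float], mu: float, sigma: float) -> List[float]:
    """Z-score values using a mean and SD estimated elsewhere (e.g. training set)."""
    return [(v - mu) / sigma for v in values]


# Made-up volumes in mm^3 for three hypothetical training participants.
train_hippocampus = [7000.0, 6000.0, 8000.0]
train_icv = [1_400_000.0, 1_500_000.0, 1_600_000.0]

train_ratio = icv_correct(train_hippocampus, train_icv)
mu, sigma = mean(train_ratio), pstdev(train_ratio)
train_scaled = standard_scale(train_ratio, mu, sigma)

# A held-out participant is scaled with the training statistics only.
test_scaled = standard_scale(icv_correct([6500.0], [1_450_000.0]), mu, sigma)
print(train_scaled, test_scaled)
```

Estimating the mean and standard deviation on the training split alone keeps information from the holdout set out of the preprocessing step.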
Results:
The results on the holdout test dataset are shown in Figure 1. Standalone ML models demonstrated limited predictive capability, and the stacked ensemble showed only a marginal improvement over the individual models. The autoML ensemble models consistently achieved higher accuracy across the tasks. Our findings also highlight the utility of neuroimaging (sMRI) data, particularly in differentiating sMCI from pMCI, where the autoML ensemble achieved 73% accuracy. This underlines the clinical relevance of sMRI-based methods for precise stratification in clinical settings.
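Accuracy figures are easier to interpret next to a majority-class baseline, i.e. the accuracy of always predicting the most frequent class. The short calculation below derives that baseline from the class counts given in the Data section. It assumes the holdout split has roughly the same class proportions as the full sample, which the abstract does not report.

```python
# Class counts as reported in the Data section (full samples, not the holdout split).
task_counts = {
    "task 1: CN vs AD": {"CN": 303, "AD": 303},
    "task 2: CN vs MCI vs AD": {"CN": 760, "MCI": 867, "AD": 303},
    "task 3: sMCI vs pMCI": {"sMCI": 224, "pMCI": 145},
}


def majority_baseline(counts):
    """Accuracy of always predicting the most frequent class."""
    return max(counts.values()) / sum(counts.values())


baselines = {task: majority_baseline(c) for task, c in task_counts.items()}
for task, acc in baselines.items():
    print(f"{task}: {acc:.3f}")
```

Under that assumption, the baselines are about 50% for task 1, 45% for task 2, and 61% for task 3, so the 73% reported for sMCI vs. pMCI sits roughly 12 percentage points above always predicting sMCI.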
Conclusions:
The preliminary results yield three insights. First, autoML ensembles exhibit superior predictive capabilities compared to individual pipelines, highlighting their efficacy in predictive modelling. Second, the autoML-multiverse approach, which assembles ensembles of pipelines in a data-driven manner, circumvents biases and arbitrary decision-making. Third, the integration of neuroimaging with autoML emerges as a potent and clinically relevant tool, particularly in tasks such as distinguishing slow from fast progressors, suggesting potential applicability in future clinical trials. These initial results are promising and encourage rigorous further analysis and experimentation to confirm their robustness and generalisability.
Disorders of the Nervous System:
Neurodegenerative/Late Life (e.g. Parkinson’s, Alzheimer’s) 2
Modeling and Analysis Methods:
Classification and Predictive Modeling 1
Methods Development
Keywords:
Degenerative Disease
Machine Learning
Multivariate
STRUCTURAL MRI
1|2 Indicates the priority used for review
1. Botvinik-Nezer, R. (2020), ‘Variability in the analysis of a single neuroimaging dataset by many teams’. Nature 582, 84–88.
2. Marek, S. (2022), ‘Reproducible brain-wide association studies require thousands of individuals’. Nature. 603(7902):654–60.
3. Da Costa, P.F. (2023), ‘Building a data-driven configuration space for automated machine learning’. Preprint.
4. Dafflon, J. (2022), ‘A guided multiverse study of neuroimaging analyses’. Nat Commun 13, 3758. https://doi.org/10.1038/s41467-022-31347-8