Poster No:
1994
Submission Type:
Abstract Submission
Authors:
Hoda Kalabizadeh1, Ludovica Griffanti2, Natalie Voets3, Grace Gillis2, Clare Mackay2, Ana Namburete1, Nicola Dinsdale1
Institutions:
1Department of Computer Science, University of Oxford, UK, 2Department of Psychiatry, University of Oxford, UK, 3Nuffield Department of Clinical Neurosciences, University of Oxford, UK
First Author:
Co-Author(s):
Natalie Voets
Nuffield Department of Clinical Neurosciences, University of Oxford
UK
Grace Gillis
Department of Psychiatry, University of Oxford
UK
Clare Mackay
Department of Psychiatry, University of Oxford
UK
Introduction:
Hippocampal volume (HV), typically measured via MRI segmentation, is a well-established biomarker for Alzheimer's disease (AD) [1]. Most automated segmentation methods have been developed and validated on research datasets. However, domain shifts [2] between clinical and research datasets often significantly degrade model performance. Because of differences in image acquisition (scanner shift) or disease severity (population shift), current segmentation methods are often unsuitable for clinical populations. We investigated the performance of popular segmentation methods on a clinical AD dataset. Fig 1a shows a schematic of the current data setting.
Methods:
For the clinical dataset, we used a sample of 29 patients from the Oxford Brain Health Clinic (BHC) [3], whose labels were used only for evaluation. The HarP [4] dataset, consisting of 133 labelled MRI volumes, was used as the research dataset. We explored the impact of increasing the number of training inputs by splitting images into hemispheres, and additionally by mirroring each hemisphere. Both datasets included cognitively unimpaired individuals, patients with mild cognitive impairment (MCI) and patients with dementia. Fig 1b compares the true left HV between the HarP and BHC populations.
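The hemisphere splitting and mirroring described above can be sketched as follows. This is a minimal illustration rather than the pipeline used in this work: the function names, the choice of `lr_axis` as the left-right axis, and the assumption that the midline sits at the centre of the array are all hypothetical.

```python
import numpy as np

def split_hemispheres(volume, lr_axis=0):
    """Cut a 3D volume into two equally sized halves along the left-right axis.

    Assumes the midline lies at the array centre (hypothetical; real data
    would normally be registered first). For an odd number of slices the
    central slice is dropped so both halves have the same shape.
    """
    n = volume.shape[lr_axis]
    half = n // 2
    index = [slice(None)] * volume.ndim
    index[lr_axis] = slice(0, half)
    first = volume[tuple(index)]
    index[lr_axis] = slice(n - half, n)
    second = volume[tuple(index)]
    return first, second

def expand_inputs(volume, lr_axis=0):
    """Return both hemispheres plus a mirrored copy of each (4 inputs per image)."""
    first, second = split_hemispheres(volume, lr_axis)
    mirrored = [np.flip(h, axis=lr_axis) for h in (first, second)]
    return [first, second] + mirrored

# Toy volume standing in for an MRI scan.
toy_volume = np.arange(5 * 4 * 3, dtype=float).reshape(5, 4, 3)
toy_inputs = expand_inputs(toy_volume)
```

The same operations would need to be applied to the label masks so that images and segmentations stay aligned. Which half corresponds to the anatomical left depends on the image orientation, which is not specified here.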
We tested several publicly available out-of-the-box (OOB) tools for automatic segmentation on the clinical dataset: FSL FIRST [5], FreeSurfer [6] and SynthSeg [7]. For a deep learning (DL) baseline, we used a UNet [8] trained on the HarP data as our labelled reference. To investigate generalisation to the unlabelled clinical data, we extended the UNet with data augmentation and unsupervised domain adaptation (UDA) approaches. Augmentation is an effective technique for improving a model's generalisability [9] by increasing its robustness to likely data variations. We explored both standard augmentation (affine transforms, flips, noise, intensity changes) and MRI-specific augmentation (motion, bias field). We implemented a UDA model [10] by adding a domain classifier to enable adversarial training, so that the network performs the main segmentation task while learning domain-invariant features. The domain classifier does not require segmentation masks and was trained using unlabelled clinical data.
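The adversarial objective behind such a domain classifier can be illustrated with two losses. The exact losses are not given in this abstract, so the sketch below is an assumption based on the general form of confusion-loss unlearning: a cross-entropy loss trains the domain classifier to tell research from clinical data, and a confusion loss (cross-entropy against a uniform distribution) is then used to update the feature extractor so that the classifier can no longer do so. All names and the toy logits are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def domain_loss(logits, domain_labels):
    """Cross-entropy for the domain classifier (e.g. 0 = research, 1 = clinical).

    Needs only a domain label per image, not a segmentation mask.
    """
    probs = softmax(logits)
    picked = probs[np.arange(len(domain_labels)), domain_labels]
    return float(-np.log(picked + 1e-12).mean())

def confusion_loss(logits):
    """Cross-entropy between the classifier output and a uniform distribution.

    It is smallest when every domain is predicted with equal probability,
    i.e. when the features carry no usable domain information.
    """
    probs = softmax(logits)
    return float(-np.log(probs + 1e-12).mean(axis=1).mean())

# Toy logits: a classifier that separates the two domains confidently,
# and one that cannot tell them apart.
confident_logits = np.array([[4.0, -4.0], [-4.0, 4.0]])
uninformative_logits = np.zeros((2, 2))
toy_domains = np.array([0, 1])

loss_classifier = domain_loss(confident_logits, toy_domains)
loss_confused = confusion_loss(uninformative_logits)
loss_not_confused = confusion_loss(confident_logits)
```

In a full training loop these losses would be alternated with the segmentation loss, each updating a different part of the network. That scheduling is omitted here.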
Results:
Fig 1c shows the Dice scores (DSC) for the segmentation methods, tested on the HarP and BHC datasets. Increasing the training data size improved most models. OOB models, which are commonly trained and/or validated on research populations, all struggled with our clinical population. As shown in the violin plot in Fig 1d, FreeSurfer and FIRST had instances of complete failure (DSC = 0). The UNet models with augmentation outperformed the OOB methods. However, performance on BHC data was worse than on HarP data, with some particularly low-performing outliers. UDA was comparable to the augmentation methods for most participants, but it led to particularly low DSC for certain individuals, possibly because correlation between disease state and scanner caused important information to be removed during domain adaptation. The UNet with basic augmentations was the best-performing model and was used for further analysis. Fig 1e shows a positive correlation (r = 0.70, p = 1.31×10⁻⁹) between true HV and achieved DSC, indicating that larger hippocampi were segmented with higher DSC. This may be expected, as individuals in HarP are on average younger and healthier than those in BHC, so larger BHC hippocampi are more similar to those in the HarP training data. Performance was worse for MCI/dementia patients. Fig 2 shows the brains corresponding to the (a) highest and (b) lowest DSC; brains in the latter have visibly greater ventricular atrophy.
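For reference, the two quantities reported above can be computed as in the sketch below. It assumes binary masks stored as NumPy arrays. The helper names, the voxel size and the toy masks are illustrative and are not taken from the study.

```python
import numpy as np

def dice_score(pred, truth):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty; returning 1 is a convention (assumption)
    return float(2.0 * np.logical_and(pred, truth).sum() / denom)

def volume_mm3(mask, voxel_size=(1.0, 1.0, 1.0)):
    """Volume of a binary mask: voxel count times the volume of one voxel."""
    return float(np.asarray(mask, dtype=bool).sum() * np.prod(voxel_size))

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1D samples."""
    return float(np.corrcoef(x, y)[0, 1])

# Toy masks: the ground truth has 8 voxels, the prediction 12, and they share 8.
truth_mask = np.zeros((4, 4, 4), dtype=bool)
truth_mask[1:3, 1:3, 1:3] = True
pred_mask = np.zeros((4, 4, 4), dtype=bool)
pred_mask[1:3, 1:3, 1:4] = True

toy_dsc = dice_score(pred_mask, truth_mask)                     # 2*8 / (12+8) = 0.8
failed_dsc = dice_score(np.zeros_like(truth_mask), truth_mask)  # empty prediction
```

An empty prediction against a non-empty ground truth gives DSC = 0, which corresponds to the failure cases noted for Fig 1d. Passing the per-participant true volumes and DSC values to `pearson_r` gives a correlation of the kind shown in Fig 1e.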
Conclusions:
Our findings highlight that domain shifts between research and clinical data extend beyond acquisition differences. Even when model generalisation or UDA techniques are applied, population shifts due to disease severity and extent of atrophy can still hinder the translation of research tools to clinical practice and should be considered in future model development.
Disorders of the Nervous System:
Neurodegenerative/ Late Life (eg. Parkinson’s, Alzheimer’s) 2
Modeling and Analysis Methods:
Segmentation and Parcellation 1
Neuroanatomy, Physiology, Metabolism and Neurotransmission:
Neuroanatomy Other
Novel Imaging Acquisition Methods:
Anatomical MRI
Keywords:
Aging
Degenerative Disease
Machine Learning
MRI
Segmentation
STRUCTURAL MRI
1|2 Indicates the priority used for review
References:
[1] Henry, M. S. (2013), 'The development of effective biomarkers for Alzheimer’s disease: a review', International Journal of Geriatric Psychiatry 28, 331–340.
[2] Che, T. (2021), 'Deep Verifier Networks: Verification of Deep Discriminative Models with Deep Generative Models', Proceedings of the AAAI Conference on Artificial Intelligence 35, 7002–7010.
[3] O’Donoghue, M. C. (2023), 'Oxford brain health clinic: protocol and research database', BMJ Open 13, e067808.
[4] Boccardi, M. (2015), 'Training labels for hippocampal segmentation based on the EADC-ADNI harmonized hippocampal protocol', Alzheimer’s & Dementia 11, 175–183.
[5] Patenaude, B. (2011), 'A Bayesian model of shape and appearance for subcortical brain segmentation', NeuroImage 56, 907–922.
[6] Fischl, B. (2002), 'Whole brain segmentation: automated labeling of neuroanatomical structures in the human brain', Neuron 33, 341–355.
[7] Billot, B. (2023), 'SynthSeg: Segmentation of brain MRI scans of any contrast and resolution without retraining', Medical Image Analysis 86, 102789.
[8] Ronneberger, O. (2015), 'U-Net: Convolutional Networks for Biomedical Image Segmentation', arXiv.org, https://arxiv.org/abs/1505.04597.
[9] Zhang, C. (2017), 'Understanding deep learning requires rethinking generalization', arXiv.org, https://arxiv.org/abs/1611.03530.
[10] Dinsdale, N. K. (2021), 'Deep learning-based unlearning of dataset bias for MRI harmonisation and confound removal', NeuroImage 228, 117689.