Evaluating Deep Learning Hippocampal Segmentation Pipelines for Alzheimer’s Disease

Poster No:

163 

Submission Type:

Abstract Submission 

Authors:

Jiongqi Qu1,2, Sophie Martin1,2, James Cole1,2

Institutions:

1Centre for Medical Image Computing, Department of Computer Science, University College London, London, United Kingdom, 2Dementia Research Centre, Queen Square Institute of Neurology, University College London, London, United Kingdom

First Author:

Jiongqi Qu, MRes  
Centre for Medical Image Computing, Department of Computer Science, University College London|Dementia Research Centre, Queen Square Institute of Neurology, University College London
London, United Kingdom|London, United Kingdom

Co-Author(s):

Sophie Martin, MRes  
Centre for Medical Image Computing, Department of Computer Science, University College London|Dementia Research Centre, Queen Square Institute of Neurology, University College London
London, United Kingdom|London, United Kingdom
James Cole, PhD  
Centre for Medical Image Computing, Department of Computer Science, University College London|Dementia Research Centre, Queen Square Institute of Neurology, University College London
London, United Kingdom|London, United Kingdom

Introduction:

Deep learning has recently shown considerable promise for hippocampal segmentation from structural MRI. However, most studies focus primarily on segmentation accuracy and overlook generalisability, which is key for the clinical deployment of automated methods. Here, we aimed to evaluate 1) sensitivity to Alzheimer's disease (AD), by assessing group differences in hippocampal volume between patients with AD and cognitively normal (CN) people, and 2) reliability, by assessing intra-patient agreement of hippocampal volumes. We evaluated three "off-the-shelf" deep learning approaches (FastSurfer [Henschel et al. 2020], SynthSeg [Billot et al. 2023], nnUNet [Isensee et al. 2021]), as well as an nnUNet trained from scratch, and benchmarked these models against an atlas-based method (FreeSurfer [Fischl 2012]).

Methods:

To assess sensitivity, 816 scans from the Alzheimer's Disease Neuroimaging Initiative (ADNI) [Jack Jr et al. 2008] and 1276 scans from the National Alzheimer's Coordinating Center (NACC) [Beekly et al. 2007] were used, with age- and sex-matched groups. To assess reliability, we used the BNU1, HNU1 and IPCAS1 datasets from the Consortium for Reliability and Reproducibility (CoRR) [Zuo et al. 2014] database, which contain 2 repeat scans (n=48), 10 repeat scans (n=28) and 2 repeat scans (n=29), respectively. Among the pipelines, the original nnUNet required sub-sectioned brain data, for which the pre-processing steps were unclear and involved human fine-tuning. It was also trained using a mixture of data from healthy people and patients with non-affective psychotic disorder. To avoid under-evaluating the pipeline's performance due to these issues, an nnUNet was retrained using 366 scans from the Open Access Series of Imaging Studies (OASIS) [LaMontagne et al. 2019] data.

Results:

For sensitivity to AD, all pipelines achieved large effect sizes (Cohen's d [Cohen 2013] > 0.5; Figure 1a). The effect sizes of the original nnUNet were substantially lower than those of the other pipelines. After retraining, nnUNet reached performance comparable to FreeSurfer. Generally, group-difference effect sizes were lower in NACC participants than in ADNI participants, though the Cohen's d values shared a similar ordering across the two datasets. FastSurfer returned absolute volumes that were most similar to FreeSurfer (see Figure 1b for the hippocampal volume distributions in the AD and CN groups). The original nnUNet often returned lower hippocampal volumes, while SynthSeg and the retrained nnUNet tended to overestimate the volumes.
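As a rough illustration of the effect-size measure reported above, the sketch below computes Cohen's d with a pooled standard deviation for two groups of hippocampal volumes. The function name `cohens_d` and the volume values are hypothetical and chosen only for illustration; the abstract does not state which software was used, and the numbers are not taken from ADNI or NACC.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance is the unbiased (n - 1) sample variance
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Made-up hippocampal volumes in mm^3 (illustration only, not study data)
cn_volumes = [3900.0, 4100.0, 4000.0, 4200.0, 3800.0]
ad_volumes = [3300.0, 3500.0, 3400.0, 3600.0, 3200.0]

d = cohens_d(cn_volumes, ad_volumes)
print(f"Cohen's d (CN vs AD) = {d:.2f}")
```

A positive value here means the CN group has the larger mean volume; swapping the argument order flips the sign but not the magnitude.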

For test-retest reliability (Figure 2), the BNU1 results indicated that the four deep learning pipelines had better segmentation stability than FreeSurfer, with higher median values and narrower, less overlapping confidence intervals. However, there were no significant differences between them. For the HNU1 data, both nnUNets produced unstable results for left and right hippocampal segmentation. However, their intra-class correlation coefficient (ICC(3,1)) [Weir 2005] values were still much higher than those of FreeSurfer (>0.94). In the IPCAS1 data, the original nnUNet was less stable and its confidence intervals overlapped with those of FreeSurfer by >60%.
Supporting Image: fig1.png
Supporting Image: fig2.png
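For readers unfamiliar with the ICC(3,1) reliability measure used above, the following is a minimal sketch of the standard two-way mixed-effects, consistency, single-measurement formula, ICC(3,1) = (MS_subjects - MS_error) / (MS_subjects + (k - 1) MS_error), where k is the number of repeat scans. The function name `icc_3_1` and the example volumes are hypothetical; the abstract does not specify the implementation used in the study.

```python
def icc_3_1(scores):
    """ICC(3,1): two-way mixed effects, consistency, single measurement.

    scores: list of rows, one row per subject, each holding k repeat values.
    """
    n = len(scores)      # number of subjects
    k = len(scores[0])   # number of repeat scans per subject
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between sessions
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Made-up volumes in mm^3: 4 subjects x 2 repeat scans (illustration only)
repeat_volumes = [
    [4010.0, 4030.0],
    [3620.0, 3590.0],
    [4380.0, 4400.0],
    [3850.0, 3880.0],
]

icc = icc_3_1(repeat_volumes)
print(f"ICC(3,1) = {icc:.3f}")
```

Because ICC(3,1) measures consistency rather than absolute agreement, a constant offset between sessions (for example, every second scan reading 50 mm^3 higher) does not lower the coefficient.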
 

Conclusions:

We found that deep learning hippocampal segmentation methods can achieve results comparable to or better than FreeSurfer, in terms of sensitivity to AD and test-retest reliability. These findings could help guide study design considerations or sample size calculations for research into the hippocampus in AD.
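To illustrate how an effect size could feed into a sample size calculation, the sketch below uses the common normal-approximation formula for a two-sided, two-sample comparison, n per group = 2 (z_{1-alpha/2} + z_{power})^2 / d^2. The function name `n_per_group` and the default alpha and power are assumptions made for illustration, not values from the study; an exact t-based calculation would give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided two-sample test.

    Normal approximation; d is the expected Cohen's d.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


# Hypothetical effect sizes, chosen only to show the trend
for d in (0.5, 0.8, 1.2):
    print(f"d = {d}: about {n_per_group(d)} participants per group")
```

Under this approximation the required sample size scales with 1/d^2, so a pipeline that yields larger group differences needs fewer participants to detect them.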

Future research could evaluate reducing similarity constraints and allowing more freedom in variation during training (especially as a ground truth hippocampal segmentation does not exist), data augmentation, increasing the proportion of AD data and minimising the amount of preprocessing required.

Disorders of the Nervous System:

Neurodegenerative/ Late Life (eg. Parkinson’s, Alzheimer’s) 1

Modeling and Analysis Methods:

Segmentation and Parcellation 2

Keywords:

Aging
Computing
Data analysis
Degenerative Disease
Machine Learning
MRI
Segmentation

1|2Indicates the priority used for review

References:

Beekly, Duane L et al. (2007). “The National Alzheimer’s Coordinating Center (NACC) database: the uniform data set”. In: Alzheimer Disease & Associated Disorders 21.3, pp. 249–258.

Billot, Benjamin et al. (2023). “SynthSeg: Segmentation of brain MRI scans of any contrast and resolution without retraining”. In: Medical image analysis 86, p. 102789.

Cohen, Jacob (2013). Statistical power analysis for the behavioral sciences. Academic press.

Fischl, Bruce (2012). “FreeSurfer”. In: Neuroimage 62.2, pp. 774–781.

Henschel, Leonie et al. (2020). “FastSurfer - A fast and accurate deep learning based neuroimaging pipeline”. In: NeuroImage 219, p. 117012.

Isensee, Fabian et al. (2021). “nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation”. In: Nature methods 18.2, pp. 203–211.

Jack Jr, Clifford R et al. (2008). “The Alzheimer’s disease neuroimaging initiative (ADNI): MRI methods”. In: Journal of Magnetic Resonance Imaging: An Official Journal of the International Society for Magnetic Resonance in Medicine 27.4, pp. 685–691.

LaMontagne, Pamela J et al. (2019). “OASIS-3: longitudinal neuroimaging, clinical, and cognitive dataset for normal aging and Alzheimer disease”. In: MedRxiv, pp. 2019–12.

Weir, Joseph P (2005). “Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM”. In: The Journal of Strength & Conditioning Research 19.1, pp. 231–240.

Zuo, Xi-Nian et al. (2014). “An open science resource for establishing reliability and reproducibility in functional connectomics”. In: Scientific data 1.1, pp. 1–13.