Poster No:
1372
Submission Type:
Abstract Submission
Authors:
JiHoon Kim1, Roshan Rane2,3,4, Kerstin Ritter2,3
Institutions:
1Freie Universität Berlin, Department of Education and Psychology, Berlin, Germany, 2Charité – Universitätsmedizin Berlin, Department of Psychiatry and Psychotherapy, Berlin, Germany, 3Bernstein Center for Computational Neuroscience, Berlin, Germany, 4Einstein Center for Neurosciences Berlin, Berlin, Germany
First Author:
JiHoon Kim
Freie Universität Berlin, Department of Education and Psychology
Berlin, Germany
Co-Author(s):
Roshan Rane
Charité – Universitätsmedizin Berlin, Department of Psychiatry and Psychotherapy; Bernstein Center for Computational Neuroscience; Einstein Center for Neurosciences Berlin
Berlin, Germany
Kerstin Ritter
Charité – Universitätsmedizin Berlin, Department of Psychiatry and Psychotherapy; Bernstein Center for Computational Neuroscience
Berlin, Germany
Introduction:
The Vision Transformer (ViT) is gaining attention for predictive modeling of associations between brain structure and phenotype in neuroimaging [1]. Adapting ViT to neuroimaging is challenging, however, because of its data-hungry nature, which carries a risk of overfitting and poor generalization when data are limited [2,3]. Transfer learning is crucial to overcoming these challenges [4]. Two in-domain pre-trained models, one for sex classification and the other for age estimation, have shown promise in neuroimaging [5,6]. However, a systematic comparison of how different kinds of pre-training affect the prediction of different phenotypes across diverse sample sizes is lacking. We therefore pre-train two in-domain models and systematically compare them in predicting different phenotypes across sample sizes.
Methods:
We use T1-weighted brain images from the UK Biobank dataset (N=36,598), subjected to minimal preprocessing: skull stripping, bias correction, and registration to the Montreal Neurological Institute (MNI) template. Images are downsampled to 96×96×96 voxels. We use a randomly initialized Vision Transformer and a DenseNet-121 Convolutional Neural Network (CNN), referred to as Vanilla ViT and Vanilla CNN.
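As a hedged illustration of the model setup, the following minimal sketch instantiates both backbones for single-channel 96×96×96 volumes, assuming the MONAI implementations of a 3D ViT and DenseNet-121; the patch size and other hyperparameters are placeholders, not the configuration used in this work.

```python
# Minimal sketch (assumption: MONAI implementations; hyperparameters are
# illustrative, not the authors' settings).
import torch
from monai.networks.nets import ViT, DenseNet121

# "Vanilla ViT": randomly initialized 3D Vision Transformer with a
# classification head for a binary target.
vanilla_vit = ViT(
    in_channels=1,
    img_size=(96, 96, 96),
    patch_size=(16, 16, 16),  # 6*6*6 = 216 patches per volume
    classification=True,
    num_classes=2,
)

# "Vanilla CNN": randomly initialized 3D DenseNet-121.
vanilla_cnn = DenseNet121(spatial_dims=3, in_channels=1, out_channels=2)

x = torch.randn(2, 1, 96, 96, 96)          # dummy batch of two volumes
vit_logits, _ = vanilla_vit(x)             # MONAI's ViT also returns hidden states
cnn_logits = vanilla_cnn(x)
print(vit_logits.shape, cnn_logits.shape)  # torch.Size([2, 2]) torch.Size([2, 2])
```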
We conduct two-stage experiments consisting of a pre-training and a fine-tuning stage (Figure 1). In the pre-training stage, we pre-train ViT models for sex classification (ViT w.SEX) and age estimation (ViT w.AGE). In the fine-tuning stage, we perform three binary classification tasks: biological sex, chronological age group (68–82 vs. 44–61), and high vs. low alcohol use. Four models (ViT w.SEX, ViT w.AGE, Vanilla ViT, and Vanilla CNN) are trained per task across sample sizes from 200 to 7,000. We evaluate model performance on a holdout set (N=3,055) using balanced accuracy (BACC). Lastly, we apply Chefer's feature visualization method to obtain classification relevance values from the ViT models and overlay the normalized mean values of true-positive samples onto the MNI template [7].
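To make the fine-tuning stage concrete, the sketch below transfers pre-trained encoder weights into a ViT, re-initializes the classification head for the new binary phenotype, and evaluates with balanced accuracy; the checkpoint file and data loader are hypothetical stand-ins.

```python
# Fine-tuning sketch (assumptions: hypothetical checkpoint path and holdout
# loader; MONAI ViT configured as in the sketch above).
import torch
from monai.networks.nets import ViT
from sklearn.metrics import balanced_accuracy_score

model = ViT(in_channels=1, img_size=(96, 96, 96), patch_size=(16, 16, 16),
            classification=True, num_classes=2)

# Transfer the pre-trained encoder; drop the old task head so it is
# re-initialized for the new target. The checkpoint name is hypothetical.
state = torch.load("vit_w_sex.pt", map_location="cpu")
state = {k: v for k, v in state.items()
         if not k.startswith("classification_head")}
model.load_state_dict(state, strict=False)

# ... fine-tune on 200-7,000 labeled samples of the target phenotype ...

# Holdout evaluation with balanced accuracy (BACC).
model.eval()
y_true, y_pred = [], []
with torch.no_grad():
    for volumes, labels in holdout_loader:  # hypothetical DataLoader (N=3,055)
        logits, _ = model(volumes)
        y_pred.extend(logits.argmax(dim=1).tolist())
        y_true.extend(labels.tolist())
print(f"BACC: {balanced_accuracy_score(y_true, y_pred):.4f}")
```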

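The overlay step at the end of the Methods can likewise be sketched under stated assumptions: per-subject relevance volumes from Chefer's method [7] are taken as already computed and resampled to template space, and all file names below are hypothetical.

```python
# Overlay sketch (assumption: per-subject relevance maps for true-positive
# predictions already exist in MNI space; file names are hypothetical).
import glob
import numpy as np
import nibabel as nib

maps = [nib.load(p).get_fdata() for p in glob.glob("relevance_tp_*.nii.gz")]
mean_map = np.mean(maps, axis=0)

# Min-max normalize the mean relevance to [0, 1].
mean_map = (mean_map - mean_map.min()) / (mean_map.max() - mean_map.min())

# Store in the MNI template's space so it can be overlaid for display.
template = nib.load("MNI152_T1_2mm.nii.gz")  # hypothetical template path
overlay = nib.Nifti1Image(mean_map, affine=template.affine)
nib.save(overlay, "mean_relevance_overlay.nii.gz")
```
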
Results:
After pre-training, ViT w.SEX achieved BACCs of 98.89% and 98.84%, and ViT w.AGE achieved mean absolute errors of 2.94 and 2.88 years on the validation and holdout sets, respectively. Figure 2(a) shows model performance on the three prediction tasks on the holdout set after fine-tuning. ViT w.SEX and ViT w.AGE outperformed the other models in sex and age-group prediction, respectively. In alcohol consumption prediction, ViT w.SEX performed better than Vanilla ViT at small sample sizes (200 to 1,000) and reached its best BACC of 61.53% at training size 7,000.
We observed different feature importance patterns for each model. Within each model, patterns were consistent across training sizes. Figure 2(b) shows the feature visualization for high alcohol use at training size 7,000. Notably, Vanilla ViT and ViT w.SEX share common features, but ViT w.SEX additionally highlighted the brainstem, right cerebellum, left corpus callosum, and left midcingulate cortex, setting it apart from Vanilla ViT.

Conclusions:
We show that phenotype prediction strongly depends on pre-training. The sex and age-group results underline the importance of aligning the pre-training task with the target phenotype. ViT w.SEX showed competitive results in alcohol consumption prediction, especially on smaller datasets, which may indicate that this prediction is confounded by sex.
We provide feature visualizations for high alcohol use. The literature consistently reports associations between alcohol intake and alterations in grey matter volume across widespread areas such as the cerebellum, frontal and temporal lobes, and various subcortical structures [8,9,10]. Our findings echo these observations, and the distinctly highlighted features may explain why ViT w.SEX outperforms Vanilla ViT.
We demonstrate that the effectiveness of in-domain pre-training varies with factors such as the size of the fine-tuning dataset and the relationship between the pre-training and target phenotypes.
Modeling and Analysis Methods:
Classification and Predictive Modeling 1
Methods Development 2
Keywords:
Data analysis
Machine Learning
Modeling
STRUCTURAL MRI
Other - Alcohol Consumption
1|2 indicates the priority used for review
References:
[1] Zhao Z (2023), 'Conventional machine learning and deep learning in Alzheimer's disease diagnosis using neuroimaging: A review', Frontiers in Computational Neuroscience 17:1038636
[2] Dosovitskiy A (2021), 'An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale', Available at: http://arxiv.org/abs/2010.11929
[3] Li J (2022), 'Transforming medical imaging with Transformers? A comparative review of key properties, current progresses, and future perspectives', Available at: http://arxiv.org/abs/2206.01136
[4] Gani H (2022), 'How to Train Vision Transformer on Small-scale Datasets?', Available at: http://arxiv.org/abs/2210.07240
[5] Cole JH (2017), 'Predicting brain age with deep learning from raw imaging data results in a reliable and heritable biomarker', Neuroimage 163:115–124
[6] Lu B (2022), 'A practical Alzheimer’s disease classifier via brain imaging-based deep learning on 85,721 samples', Journal of Big Data 9:101
[7] Chefer H (2020), 'Transformer Interpretability Beyond Attention Visualization', Available at: http://arxiv.org/abs/2012.09838
[8] Guggenmos M (2017), 'Quantitative neurobiological evidence for accelerated brain aging in alcohol dependence', Translational Psychiatry 7:1–7
[9] Nutt D (2021), 'Alcohol and the Brain', Nutrients 13:3938
[10] Topiwala A (2022), 'Alcohol consumption and MRI markers of brain structure and function: Cohort study of 25,378 UK Biobank participants', Neuroimage Clinical 35:103066