Brain-based classification of 17 diagnostic groups in the UK Biobank

Poster No:

1416 

Submission Type:

Abstract Submission 

Authors:

Ty Easley1, Lexi (Xiaoke) Luo1, Kayla Hannon1, Petra Lenzini1, Janine Bijsterbosch1

Institutions:

1Washington University in Saint Louis, Saint Louis, MO

First Author:

Ty Easley  
Washington University in Saint Louis
Saint Louis, MO

Co-Author(s):

Lexi (Xiaoke) Luo  
Washington University in Saint Louis
Saint Louis, MO
Kayla Hannon  
Washington University in Saint Louis
Saint Louis, MO
Petra Lenzini  
Washington University in Saint Louis
Saint Louis, MO
Janine Bijsterbosch  
Washington University in Saint Louis
Saint Louis, MO

Introduction:

Many studies have trained machine learning classifiers on features derived from non-invasive structural and/or functional neuroimaging data to differentiate between cases and healthy controls in a range of diseases (1). However, the vast majority of published diagnostic classification efforts are single-disease studies performed in disease-specific cohorts. As such, a comprehensive analysis across diseases in population data is lacking. This work uses the UK Biobank (UKB) dataset to systematically compare brain-based classification models across 17 different ICD-10 diagnostic groups.

Methods:

The 17 diagnostic groups were derived from Chapter V (mental and behavioral disorders) and Chapter VI (diseases of the nervous system) of the ICD-10, at two different levels of the hierarchy depending on sample size. Only groups with N ≥ 125 case participants were included, yielding 17 diagnostic groups (see x-axis in Fig. 1). In total, this work included N = 5,861 unique cases, each matched on sex, age, and resting-state head motion to healthy controls drawn from a pool of UKB participants with no ICD-10 labels in either Chapter V or VI.
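The abstract does not specify the matching algorithm, so as a rough sketch only: a simple greedy nearest-neighbor scheme over standardized sex, age, and head-motion values might look like the following (the function name and feature layout are hypothetical, not the authors' code):

```python
import numpy as np

def match_controls(cases, controls):
    """Greedily pair each case with the nearest unused control.

    `cases` and `controls` are arrays of shape (n, 3) holding
    sex, age, and head-motion values (a hypothetical layout;
    in practice these would be standardized first).
    Returns one index into `controls` per case.
    """
    controls = np.asarray(controls, dtype=float)
    used = np.zeros(len(controls), dtype=bool)
    matched = []
    for case in np.asarray(cases, dtype=float):
        d = np.linalg.norm(controls - case, axis=1)  # distance to every control
        d[used] = np.inf                             # each control used at most once
        j = int(np.argmin(d))
        used[j] = True
        matched.append(j)
    return np.array(matched)
```

Greedy matching is order-dependent; optimal-assignment or propensity-score matching would be a more rigorous (and more expensive) alternative.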

Random forest classification models with rigorous shuffle-splits (2) were trained to predict case versus control labels separately for each diagnostic group, estimating both the stability and the accuracy of case-control classification. In addition to the separate diagnosis-specific case-control classifications, we performed a multiclass classification across all groups. Separate classification models were trained on 20 different feature sets comprising either neuroimaging or sociodemographic features. Neuroimaging features were derived from structural (surface and volume) and resting-state functional data (network amplitudes and functional connectivity). Diagnostic classification accuracies were benchmarked against age classification (oldest vs. youngest) from the same feature sets and against additional classifier types (k-nearest neighbors and linear support vector machine). Statistical significance was computed as the empirical probability of classifying above chance accuracy (0.5).
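A minimal sketch of this evaluation scheme, assuming scikit-learn and treating the empirical p-value as the fraction of shuffle-split accuracies at or below chance (the exact split counts, hyperparameters, and p-value definition here are assumptions, not the authors' pipeline):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedShuffleSplit

def shuffle_split_classify(X, y, n_splits=100, test_size=0.2, seed=0):
    """Repeated stratified shuffle-split random-forest classification.

    Returns per-split test accuracies and an empirical p-value,
    computed as the fraction of splits at or below chance (0.5).
    The distribution over splits also gives a stability estimate.
    """
    splitter = StratifiedShuffleSplit(
        n_splits=n_splits, test_size=test_size, random_state=seed)
    accs = []
    for train, test in splitter.split(X, y):
        clf = RandomForestClassifier(n_estimators=100, random_state=seed)
        clf.fit(X[train], y[train])
        accs.append(accuracy_score(y[test], clf.predict(X[test])))
    accs = np.asarray(accs)
    p_emp = float(np.mean(accs <= 0.5))  # empirical P(accuracy <= chance)
    return accs, p_emp
```

Swapping `RandomForestClassifier` for `KNeighborsClassifier` or `LinearSVC` would reproduce the benchmark against the other classifier types mentioned above.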

Results:

Structural neuroimaging features failed to classify 16 of the 17 diagnostic groups significantly more accurately than chance (p ≳ 0.25); only the Demyelinating diseases group (G35-37) was classified significantly above chance (accuracy = 0.63, p = 0.013, n = 248). After incorporating functional neuroimaging and sociodemographic feature sets, only the Depression group (F32) was classified significantly above chance (accuracy = 0.58, p = 3.5e-3, n = 2,692) after correcting for multiple comparisons (Fig. 1). Both sociodemographic and functional neuroimaging features significantly classified patients in the Depression (F32) group, but prediction accuracies remained low. Multiclass classification accuracies were low for all but the largest classes (Fig. 2a), and the true positive rate was almost deterministically predicted by class size (pseudo-R² = 0.989, p = 6.14e-12; Fig. 2b). As a contrasting benchmark, age classification was highly accurate at both large (accuracy = 0.94, p = 1.4e-66, n = 2,676) and small (accuracy = 0.90, p = 1.3e-15, n = 246) sample sizes.
Supporting Image: Figure_2.png
   ·Figure 2. (a) Confusion matrix of the multiclass classification for all 17 ICD-10 diagnostic groups plus healthy controls. (b) Sensitivity (true positive rate) vs. proportional group size.
 

Conclusions:

We showed that most ICD-10 diagnostic groups could not be classified above chance from neuroimaging or sociodemographic features in the UK Biobank. In particular, our findings shed light on the limited validity of the ICD-10 diagnostic ontology as a target for brain-based prediction. Consistent with other research pointing to the limited reliability of diagnostic coding systems, including the ICD-10 (3) and DSM-5 (4, 5), we demonstrate that these labels are suboptimal clinical targets for machine learning models and may impede meaningful biomarker discovery (6, 7). Our findings highlight the importance of sample size and effect size as drivers of diagnostic classification accuracy, and provide an important benchmark for future work in the UKB and beyond.

Disorders of the Nervous System:

Neurodegenerative/Late Life (e.g., Parkinson's, Alzheimer's)
Psychiatric (e.g., Depression, Anxiety, Schizophrenia) 2

Education, History and Social Aspects of Brain Imaging:

Education, History and Social Aspects of Brain Imaging

Modeling and Analysis Methods:

Classification and Predictive Modeling 1

Keywords:

DISORDERS
Machine Learning
Psychiatric Disorders

1|2Indicates the priority used for review
Supporting Image: Figure_1.png
   ·Figure 1. Summary of classification accuracy distributions across all 17 diagnostic groups for structural, functional, and sociodemographic classification features.
 

References:

1. B. Rashid, V. Calhoun, Towards a brain-based predictome of mental illness. Hum. Brain Mapp. 41, 3468–3535 (2020).
2. K. Dadi, et al., Population modeling with machine learning can enhance measures of mental health. Gigascience 10 (2021).
3. J. Stausberg, N. Lehmann, D. Kaczmarek, M. Stein, Reliability of diagnoses coding with ICD-10. Int. J. Med. Inform. 77, 50–57 (2008).
4. S. A. Shankman, et al., Reliability and validity of severity dimensions of psychopathology assessed using the Structured Clinical Interview for DSM-5 (SCID). Int. J. Methods Psychiatr. Res. 27 (2018).
5. D. A. Regier, et al., DSM-5 field trials in the United States and Canada, Part II: test-retest reliability of selected categorical diagnoses. Am. J. Psychiatry 170, 59–70 (2013).
6. A. Nikolaidis, et al., Suboptimal phenotypic reliability impedes reproducible human neuroscience. bioRxiv, 2022.07.22.501193 (2022).
7. M. Gell, et al., The Burden of Reliability: How Measurement Noise Limits Brain-Behaviour Predictions. bioRxiv, 2023.02.09.527898 (2023).