Diagnosis-informed neuro-subtyping of autism spectrum disorder

Poster No:

1426 

Submission Type:

Abstract Submission 

Authors:

Zehua Chen¹, Peilun Song¹, Yaping Wang¹, Xiujuan Geng²

Institutions:

¹Zhengzhou University, Zhengzhou; ²The Chinese University of Hong Kong, Hong Kong

First Author:

Zehua Chen  
Zhengzhou University
Zhengzhou

Co-Author(s):

Peilun Song  
Zhengzhou University
Zhengzhou
Yaping Wang  
Zhengzhou University
Zhengzhou
Xiujuan Geng  
The Chinese University of Hong Kong
Hong Kong

Introduction:

Autism Spectrum Disorder (ASD) is a complex neurodevelopmental disorder with heterogeneous symptoms and neurobiological features, and this heterogeneity is a major obstacle to accurate diagnosis and treatment [6]. Clustering individuals with ASD into subtypes whose members share more consistent neural features could help overcome these obstacles. However, existing neural-subtyping findings remain preliminary and vary across studies. Moreover, it is unclear whether the degree of deviation from the healthy population can be used for ASD subtyping. We aimed to identify reliable ASD subtypes using clustering methods that exploit diagnostic labels, with functional connectivity (FC) as the neural features [4].

Methods:

Data were drawn from ABIDE I and II and comprised 1877 subjects (1030 healthy controls and 847 ASD subjects). Data were preprocessed following the standard processing pipeline. FC was estimated between each pair of the 116 regions defined by the AAL atlas and then Fisher z-transformed. To reduce the effects of non-ASD factors, age, sex and site were regressed out of the FC values before clustering. We first reduced feature dimensionality using two approaches. The first is an orthogonal extension of projective non-negative matrix factorization (OPNNMF), a data-driven method for extracting biologically interpretable and reproducible feature representations [8]. The second summarizes FC profiles at the network level, i.e., it computes within- and between-network connectivity instead of ROI-based FC [3]. This yielded 78 network-level connectivity features (12 within- and 66 between-network FCs).

After feature reduction, we performed semi-supervised clustering with HeterogeneitY through DiscRiminative Analysis (HYDRA) and, for comparison, unsupervised k-means clustering. HYDRA uses both the input features and the diagnostic labels (control or ASD) to fit multiple SVM hyperplanes that separate ASD subjects from controls, while simultaneously assigning each ASD subject to the cluster associated with one of these hyperplanes. In total, four approaches were compared: OPNNMF with HYDRA, OPNNMF with k-means, network-based reduction with HYDRA, and network-based reduction with k-means.
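A minimal NumPy sketch of two of the preprocessing steps described above: Fisher z-transforming ROI-to-ROI correlations, averaging them into within- and between-network features (12 networks give 12 + 66 = 78 values), and regressing age, sex and site out of the features with ordinary least squares. The ROI-to-network assignment, the use of the mean as the network summary, one-hot site coding and applying the regression to the reduced features are all assumptions for illustration; the abstract does not specify them, and the toy data are simulated, not ABIDE.

```python
import numpy as np

def fisher_z(r):
    """Fisher r-to-z transform; r is clipped so the diagonal (r = 1) stays finite."""
    return np.arctanh(np.clip(r, -0.999999, 0.999999))

def network_level_fc(z, net_labels):
    """Mean Fisher-z FC within each network and between each pair of networks.

    z          : (n_roi, n_roi) symmetric Fisher-z FC matrix
    net_labels : (n_roi,) network index of each ROI (assumed >= 2 ROIs per network)
    Returns n_net within-network values followed by n_net*(n_net-1)/2 between-network values.
    """
    nets = np.unique(net_labels)
    within, between = [], []
    for i, a in enumerate(nets):
        ia = np.flatnonzero(net_labels == a)
        upper = np.triu_indices(len(ia), k=1)          # ROI pairs, diagonal excluded
        within.append(z[np.ix_(ia, ia)][upper].mean())
        for b in nets[i + 1:]:
            ib = np.flatnonzero(net_labels == b)
            between.append(z[np.ix_(ia, ib)].mean())
    return np.array(within + between)

def regress_out(features, age, sex, site):
    """Residuals of each feature column after OLS on age, sex and one-hot site."""
    sites = np.unique(site)
    site_dummies = (site[:, None] == sites[None, 1:]).astype(float)  # first site = reference
    design = np.column_stack([np.ones(len(age)), age, sex, site_dummies])
    beta, *_ = np.linalg.lstsq(design, features, rcond=None)
    return features - design @ beta

# Toy usage on simulated time series (not ABIDE data).
rng = np.random.default_rng(0)
n_roi, n_net, n_sub = 116, 12, 40
net_labels = np.arange(n_roi) % n_net                # hypothetical ROI-to-network assignment
X = np.vstack([
    network_level_fc(fisher_z(np.corrcoef(rng.standard_normal((200, n_roi)), rowvar=False)),
                     net_labels)
    for _ in range(n_sub)
])
age = rng.uniform(6, 30, n_sub)
sex = rng.integers(0, 2, n_sub)
site = rng.integers(0, 4, n_sub)
X_res = regress_out(X, age, sex, site)
print(X.shape)   # (40, 78)
```

Because the design matrix includes an intercept, the residualized features have zero mean in every column, so any remaining structure is not explained linearly by age, sex or site.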

Two cluster validity indices, the Silhouette Coefficient (SC) and the Calinski-Harabasz (CH) index, were used to assess clustering performance [1]. SC compares, for each sample, the mean distance to the other members of its own cluster with the mean distance to the members of the nearest other cluster [7]. CH is the ratio of between-cluster dispersion to within-cluster dispersion [2]. To evaluate reliability, clustering was repeated 100 times, each time on a stratified random sample of 90% of the subjects. The adjusted Rand index (ARI) [5] and label accuracy (ACC) were used to quantify reliability across the 100 repetitions: ARI quantifies the similarity between clustering results, and ACC measures the consistency of cluster labels across repetitions.
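The reliability protocol above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' code: k-means stands in for HYDRA (which is not part of scikit-learn), the stratification variable is a hypothetical label such as site, ARI and ACC compare each repetition with a reference clustering of the full sample, and ACC is taken to be the share of subjects whose label agrees with the reference after the best one-to-one relabelling of clusters (Hungarian matching), which is only one plausible reading of "label accuracy".

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, calinski_harabasz_score, silhouette_score
from sklearn.model_selection import train_test_split

def matched_accuracy(ref, labels, k):
    """Share of subjects whose label matches `ref` after the best one-to-one
    relabelling of clusters (Hungarian algorithm); one possible ACC definition."""
    counts = np.zeros((k, k), dtype=int)
    np.add.at(counts, (ref, labels), 1)            # contingency table ref x labels
    rows, cols = linear_sum_assignment(-counts)    # negate to maximise matched subjects
    return counts[rows, cols].sum() / len(ref)

def reliability(X, strata, k=2, n_repeats=100, frac=0.9, seed=0):
    """Re-cluster stratified subsamples and average SC, CH, ARI and ACC over repeats."""
    ref = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    idx = np.arange(len(X))
    scores = {"SC": [], "CH": [], "ARI": [], "ACC": []}
    for rep in range(n_repeats):
        sub, _ = train_test_split(idx, train_size=frac, stratify=strata,
                                  random_state=seed + rep + 1)
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=seed + rep + 1).fit_predict(X[sub])
        scores["SC"].append(silhouette_score(X[sub], labels))
        scores["CH"].append(calinski_harabasz_score(X[sub], labels))
        scores["ARI"].append(adjusted_rand_score(ref[sub], labels))
        scores["ACC"].append(matched_accuracy(ref[sub], labels, k))
    return {name: float(np.mean(vals)) for name, vals in scores.items()}

# Toy usage: two well-separated simulated groups, 10 repeats to keep it fast.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (60, 5)), rng.normal(4, 1, (60, 5))])
strata = rng.integers(0, 3, len(X))        # hypothetical stratification labels (e.g. site)
summary = reliability(X, strata, k=2, n_repeats=10)
print(summary)
```

Label matching is needed for ACC because cluster indices are arbitrary: the same partition can come back with labels 0 and 1 swapped, which ARI ignores by construction but a raw agreement rate would not.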

Results:

The number of OPNNMF components was searched from 1 to 1877 in steps of 5, and the number of clusters from 2 to 8. The optimal number of OPNNMF components was 1195 (Fig. 1(a) & (b)). A two-cluster solution performed best under all four clustering approaches (Fig. 1(c)-(f)), and OPNNMF combined with HYDRA performed best of the four. The average clustering accuracy exceeded 90% with the semi-supervised method. We further examined the FC matrices produced by each approach: all methods yielded a distinct FC pattern for each cluster (Fig. 2), and feature reduction with OPNNMF produced similar subgroup FC patterns regardless of the clustering method.
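The cluster-number search (k from 2 to 8) can be sketched as a simple loop that scores each solution with SC and CH. This is an assumption-laden illustration: it uses k-means on simulated data rather than OPNNMF features with HYDRA, it does not reproduce the component search, and the abstract does not state how SC and CH were combined when they disagree, so the sketch simply reports the best k under each index.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score, silhouette_score

def sweep_k(X, k_values=range(2, 9), seed=0):
    """Cluster X with k-means for each candidate k; return {k: (SC, CH)}."""
    results = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        results[k] = (silhouette_score(X, labels), calinski_harabasz_score(X, labels))
    return results

# Toy usage: two simulated groups, so a two-cluster solution should score best.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (80, 4)), rng.normal(6, 1, (80, 4))])
res = sweep_k(X)
best_by_sc = max(res, key=lambda k: res[k][0])   # highest silhouette
best_by_ch = max(res, key=lambda k: res[k][1])   # highest Calinski-Harabasz
print(best_by_sc, best_by_ch)
```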
Supporting Image: fig_1.jpg
Supporting Image: fig2.jpg
 

Conclusions:

Using a large cohort of ASD subjects, our results show that the diagnosis-informed (semi-supervised) subtyping method produced more distinct and reliable subtypes than unsupervised k-means clustering.

Disorders of the Nervous System:

Neurodevelopmental/ Early Life (eg. ADHD, autism) 2

Modeling and Analysis Methods:

Classification and Predictive Modeling 1
fMRI Connectivity and Network Modeling

Keywords:

Autism
FUNCTIONAL MRI
Machine Learning

1|2 Indicates the priority used for review

References:

1. Arbelaitz, O., et al. (2013), ‘An extensive comparative study of cluster validity indices’, Pattern Recognition, vol. 46, no. 1.
2. Caliński, T. and Harabasz, J. (1974), ‘A dendrite method for cluster analysis’, Communications in Statistics, vol. 3, no. 1, pp. 1-27.
3. Han, S., et al. (2019), ‘The distinguishing intrinsic brain circuitry in treatment-naïve first-episode schizophrenia: Ensemble learning classification’, Neurocomputing, vol. 365, pp. 44-53.
4. Hong, S.J., et al. (2019), ‘Atypical functional connectome hierarchy in autism’, Nature Communications, vol. 10, no. 1, p. 1022.
5. Hubert, L. and Arabie, P. (1985), ‘Comparing partitions’, Journal of Classification, vol. 2, no. 1, pp. 193-218.
6. Lombardo, M.V., et al. (2019), ‘Big data approaches to decomposing heterogeneity across the autism spectrum’, Molecular Psychiatry, vol. 24, no. 10, pp. 1435-1450.
7. Rousseeuw, P.J. (1987), ‘Silhouettes: A graphical aid to the interpretation and validation of cluster analysis’, Journal of Computational and Applied Mathematics, vol. 20, pp. 53-65.
8. Wen, J., et al. (2021), ‘Multi-scale semi-supervised clustering of brain images: Deriving disease subtypes’, Medical Image Analysis.