Compressed representation of brain genetic transcription

Poster No:

884 

Submission Type:

Abstract Submission 

Authors:

James Ruffle1, Henry Watkins1, Robert Gray1, Harpreet Hyare1, Michel Thiebaut de Schotten2, Parashkev Nachev1

Institutions:

1UCL Queen Square Institute of Neurology, London, UK, 2Groupe d’Imagerie Neurofonctionnelle, Institut des Maladies Neurodégénératives, UMR 5293, CNRS, CEA, Bordeaux, France

First Author:

James Ruffle  
UCL Queen Square Institute of Neurology
London, UK

Co-Author(s):

Henry Watkins  
UCL Queen Square Institute of Neurology
London, UK
Robert Gray  
UCL Queen Square Institute of Neurology
London, UK
Harpreet Hyare  
UCL Queen Square Institute of Neurology
London, UK
Michel Thiebaut de Schotten  
Groupe d’Imagerie Neurofonctionnelle, Institut des Maladies Neurodégénératives, UMR 5293, CNRS, CEA
Bordeaux, France
Parashkev Nachev  
UCL Queen Square Institute of Neurology
London, UK

Introduction:

The architecture of the brain is too complex to be intuitively surveyable without compressed representations that project its variation into a compact, navigable space. The task is especially challenging with high-dimensional data such as gene expression, where the joint complexity of anatomical and transcriptional patterns demands maximal compression. Established practice is to use standard principal component analysis (PCA), whose computational felicity is offset by limited expressivity, especially at high compression ratios.

Methods:

Using whole-brain, voxel-wise Allen Brain Atlas transcription data, we systematically compare compressed representations based on the most widely supported linear and non-linear methods: PCA, kernel PCA (kPCA), non-negative matrix factorisation (NMF), t-distributed stochastic neighbour embedding (t-SNE), uniform manifold approximation and projection (UMAP), and deep auto-encoding. We quantify reconstruction fidelity, anatomical coherence, and predictive utility with respect to signalling, microstructural, and metabolic targets.
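
The comparison can be sketched with off-the-shelf estimators. This is a minimal illustration assuming the scikit-learn API and synthetic stand-in data; the actual Allen Brain Atlas preprocessing is not reproduced here, and UMAP and the deep auto-encoder follow the same fit/transform pattern via umap-learn and PyTorch respectively:

```python
# Minimal sketch: fit 2-D embeddings with the linear and non-linear methods
# compared in this work. Synthetic stand-in data; the real input is voxel-wise
# Allen Brain Atlas transcription (rows = voxels, columns = genes).
import numpy as np
from sklearn.decomposition import PCA, KernelPCA, NMF
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = np.abs(rng.normal(size=(200, 50)))  # non-negative, as NMF requires

embeddings = {
    "PCA": PCA(n_components=2).fit_transform(X),
    "kPCA": KernelPCA(n_components=2, kernel="rbf").fit_transform(X),
    "NMF": NMF(n_components=2, init="nndsvda", max_iter=500).fit_transform(X),
    "t-SNE": TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X),
}
for name, Z in embeddings.items():
    print(name, Z.shape)  # every method maps 50 genes per voxel to 2 coordinates
```

Each method compresses the gene dimension to the same two-coordinate space, so downstream reconstruction and prediction metrics can be compared on equal footing.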

Results:

Qualitatively, PCA, kPCA, and NMF yielded less structured representations than t-SNE, UMAP, or the auto-encoder. Projecting each component into anatomical space revealed patterns of varying anatomical coherence. PCA, kPCA, and NMF broadly differentiated the cerebellum from the rest of the brain in the first component and (weakly for NMF) surface from deeper regions in the second, without regional specificity. t-SNE highlighted a dorsoventral gradient across the whole brain in the first component and a rostrocaudal one in the second. UMAP yielded finer-grained structure but exhibited abrupt regional variations of doubtful anatomical fidelity. The auto-encoder captured multiple scales of spatial organisation in an anatomically plausible manner, distributed across the two components. On held-out data, we evaluated each model's ability to reconstruct the source from representations of varying dimensionality and input image resolution. For all input resolutions and representational dimensionalities, the auto-encoder achieved the best root-mean-squared error (RMSE) (ANOVA p<0.0001) (Figure 1). Inspection of each 2D representation, annotated by each feature, revealed varying degrees of qualitative coherence (Figure 2). The most expressive representations (UMAP, t-SNE, and auto-encoding) yielded the most structured apparent relationships, with auto-encoding in particular revealing multiple scales of related organisation. In predicting the signalling, microstructural, and metabolic targets, the auto-encoder achieved the best average RMSE and R² on the held-out test set across all experiments (mean RMSE 0.1295, mean R² 0.3563). A one-way ANOVA showed a significant difference in performance across representational methods (p<0.0001). Tukey post-hoc comparison showed that the auto-encoder, UMAP, and t-SNE each yielded significantly better performance (by R²) than PCA, kPCA, and NMF (all p<0.0001). There was no significant difference between auto-encoder, UMAP, and t-SNE (p≥0.866), or between PCA, kPCA, and NMF (p≥0.555).
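
The evaluation logic can be sketched as follows. This assumes SciPy's `f_oneway` for the one-way ANOVA; the per-experiment scores below are synthetic stand-ins, not the study's results, and the Tukey post-hoc step would use e.g. statsmodels' `pairwise_tukeyhsd`:

```python
# Minimal sketch: held-out RMSE/R-squared metrics and a one-way ANOVA across
# representational methods. Scores are synthetic stand-ins, not the study's data.
import numpy as np
from scipy.stats import f_oneway

def rmse(y_true, y_pred):
    """Root-mean-squared error between target and prediction."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

rng = np.random.default_rng(0)
# One held-out R-squared per experiment, per representational method.
scores = {
    "auto-encoder": rng.normal(0.36, 0.02, size=20),
    "UMAP": rng.normal(0.35, 0.02, size=20),
    "PCA": rng.normal(0.20, 0.02, size=20),
}
F, p = f_oneway(*scores.values())
print(f"one-way ANOVA: F={F:.1f}, p={p:.3g}")
```

With groups differing in mean, the ANOVA rejects the null of equal performance, which is the gate to the pairwise Tukey comparisons reported above.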

Conclusions:

We show that deep auto-encoders yield superior representations across all metrics of performance and target domains, supporting their use as the reference standard for representing transcription patterns in the human brain.
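
For intuition, the auto-encoding principle can be reduced to a toy example: a linear auto-encoder trained by gradient descent on synthetic low-rank data. This is an illustrative sketch only; the study's model was a deep non-linear auto-encoder whose architecture is not reproduced here:

```python
# Toy linear auto-encoder: compress 50-dimensional rows to a 2-D code and back,
# trained by gradient descent on mean-squared reconstruction error.
# Synthetic low-rank data stands in for the transcription matrix.
import numpy as np

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))           # hidden 2-D structure
mixing = rng.normal(size=(2, 50))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 50))

lr = 5e-3
W_enc = rng.normal(scale=0.1, size=(50, 2))  # encoder: 50 -> 2
W_dec = rng.normal(scale=0.1, size=(2, 50))  # decoder: 2 -> 50

def recon(X):
    return (X @ W_enc) @ W_dec

rmse_before = float(np.sqrt(np.mean((X - recon(X)) ** 2)))
for _ in range(3000):
    Z = X @ W_enc
    G = (Z @ W_dec - X) / len(X)             # gradient of loss w.r.t. reconstruction
    g_dec = Z.T @ G                          # backprop through the decoder
    g_enc = X.T @ (G @ W_dec.T)              # backprop through the encoder
    W_enc -= lr * g_enc
    W_dec -= lr * g_dec
rmse_after = float(np.sqrt(np.mean((X - recon(X)) ** 2)))
print(f"reconstruction RMSE: {rmse_before:.3f} -> {rmse_after:.3f}")
```

Because the data here are genuinely low-rank, even this linear toy recovers a faithful 2-D code; the non-linear layers of a deep auto-encoder extend the same objective to the curved manifolds of real transcription data.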

Genetics:

Genetic Modeling and Analysis Methods 2
Transcriptomics 1

Neuroanatomy, Physiology, Metabolism and Neurotransmission:

Transmitter Receptors
White Matter Anatomy, Fiber Pathways and Connectivity

Physiology, Metabolism and Neurotransmission :

Physiology, Metabolism and Neurotransmission Other

Keywords:

Machine Learning
Myelin
Neurotransmitter
White Matter
Other - Allen Human Brain atlas; deep learning; representation learning; brain transcription

1|2 indicates the priority used for review
Supporting Image: Figure1.png
   ·Figure 1
Supporting Image: Figure2.png
   ·Figure 2
 

References:

Chen, T. & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. https://xgboost.readthedocs.io
McInnes, L., Healy, J. & Melville, J. (2018). UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv:1802.03426. https://umap-learn.readthedocs.io/en/latest/
Paszke, A. et al. (2019). PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems 32. https://pytorch.org
Pedregosa, F. et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825-2830. https://scikit-learn.org/stable/