Poster No:
1903
Submission Type:
Abstract Submission
Authors:
Bradley Baker1, Vince Calhoun2, Sergey Plis3
Institutions:
1TReNDs, Atlanta, GA, 2GSU/GATech/Emory, Decatur, GA, 3Georgia State University, Atlanta, GA
Introduction:
In this work, we present a novel empirical method for analyzing Deep Neural Network (DNN) learning dynamics, which builds on previous theoretical work studying changes in gradient rank in auto-differentiation. Our method uses the singular values of the gradient, and of the component matrices from which it is computed, to study the evolution of dominant modes during model training. Furthermore, because we analyze gradients dynamically within auto-differentiation, we have the unique opportunity to analyze these dynamics at the level of individual samples. Thus, whenever samples carry some kind of group label, we can perform statistical comparisons of gradient trajectories between groups without disturbing normal training behavior. This sets our method apart from post-hoc methods, which not only operate outside of training but can only compare classes in disjoint contexts.
To demonstrate the applicability of our method, we identify distinct training dynamics specific to Major Depressive Disorder (MDD), Bipolar Disorder (BPD), Schizophrenia, and Schizoaffective Disorder across multiple studies. We show how these dynamics differ between popular model architectures used in neuroimaging, including convolutional neural networks, transformers, and more.
Methods:
Reverse-mode auto-differentiation is a computational technique for computing the partial derivatives of complex functions, and has been widely applied to deep learning optimization in the form of back-propagation. Gradients for a neural network's weights are computed by first performing a "forward pass" through the network, recording the input activations at each layer. Then, a "backward pass" is performed in which the partial derivatives with respect to the output neurons are computed and propagated back through the preceding layers. The gradient at each layer is thus computed as an outer product between the accumulated input activations and the partial derivatives with respect to that layer's output.
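This outer-product structure can be verified directly in PyTorch. The sketch below (a minimal illustration, not the authors' implementation; the layer sizes and loss are arbitrary) checks that the weight gradient of a linear layer equals the outer product of the backpropagated output derivatives and the recorded input activations, accumulated over the batch:

```python
import torch

torch.manual_seed(0)
batch, d_in, d_out = 4, 5, 3
x = torch.randn(batch, d_in)        # input activations recorded in the forward pass
layer = torch.nn.Linear(d_in, d_out)

y = layer(x)                        # forward pass: y = x @ W.T + b
loss = y.pow(2).sum()               # arbitrary scalar loss for illustration
loss.backward()                     # backward pass fills layer.weight.grad

delta = 2 * y                       # dL/dy for this loss
manual_grad = delta.T @ x           # outer product accumulated over the batch
assert torch.allclose(layer.weight.grad, manual_grad, atol=1e-5)
```

Because the gradient is assembled from these two factors, its rank is bounded by the smaller of their ranks, which is what makes its singular-value spectrum informative.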
To analyze the spectrum of the gradient during training, we compute the singular value decomposition (SVD) of the gradient and of its constituent matrices. Because we work within auto-differentiation, we can compute the SVD of gradients belonging to particular samples, and thus to particular groups. We then perform statistical tests between singular value trajectories during training to evaluate group differences.
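A hypothetical sketch of this per-group spectral comparison is shown below: each sample's weight gradient is computed individually, its singular values are extracted, and the per-group distributions are compared with a t-test. The layer, data, labels, and loss are illustrative placeholders, not the authors' pipeline; repeating this procedure at each training step would yield the singular value trajectories described above.

```python
import torch
from scipy import stats

torch.manual_seed(0)
layer = torch.nn.Linear(8, 8)
data = torch.randn(20, 8)                    # toy samples
labels = torch.tensor([0] * 10 + [1] * 10)   # e.g. HC vs. SZ group labels

def singular_values(sample):
    """Singular values of one sample's weight gradient."""
    layer.zero_grad()
    layer(sample.unsqueeze(0)).pow(2).sum().backward()
    return torch.linalg.svdvals(layer.weight.grad)

svals = torch.stack([singular_values(s) for s in data])  # (samples, modes)

# Test each singular-value index for a group difference at this training step.
for k in range(svals.shape[1]):
    t, p = stats.ttest_ind(svals[labels == 0, k], svals[labels == 1, k])
```

In practice, per-sample gradients can also be obtained in one pass with vectorized autodiff utilities; the explicit loop here just makes the per-sample structure obvious.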
We begin with a Multi-Layer Perceptron (MLP) on FreeSurfer volumes. We then analyze functional MRI from the COBRE and FBIRN data sets. To assess dynamics in recurrent models, we perform Spatially Constrained Independent Component Analysis using the NeuroMark template, which provides 53 neurologically relevant, spatially independent maps and associated time series. Using the time-series data, we show that our method can reveal group-specific gradient dynamics in 1D and 2D CNNs, LSTMs, and the BERT transformer. We then use the spatial maps to demonstrate group-specific gradient dynamics in 3D CNNs.
Results:
The MLP autoencoder trained on the FreeSurfer volumes shows significant differences between schizophrenia (SZ) and healthy control (HC) groups in the middle singular values of the output layer, while the corresponding classifier shows more group differences in the input layer. The LSTM, BERT, and 1D CNN models all show significant differences between male SZ and HC groups: the LSTM shows differences mostly in hidden neurons, the 1D CNN at the output layer, and BERT across the entire model for the autoencoder task, with almost no significant differences in the classifier.
Conclusions:
In this work, we have demonstrated a novel, dynamic introspection technique for DNNs applied to neuroimaging. Our method exploits the inherent structure of auto-differentiation to provide an analysis that does not affect training and that can be used to compare group differences. We applied the method to multiple neuroimaging studies and demonstrated how group differences emerge at different points in training and in different parts of the architecture.
Disorders of the Nervous System:
Psychiatric (eg. Depression, Anxiety, Schizophrenia)
Modeling and Analysis Methods:
Classification and Predictive Modeling 2
Exploratory Modeling and Artifact Removal
Methods Development 1
Keywords:
Data analysis
Machine Learning
Modeling
Psychiatric Disorders
1|2 Indicates the priority used for review
References:
Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.-R., & Samek, W. (2015). On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PloS One, 10(7), e0130140.
Baehrens, D., Schroeter, T., Harmeling, S., Kawanabe, M., Hansen, K., & Müller, K.-R. (2010). How to explain individual classification decisions. The Journal of Machine Learning Research, 11, 1803-1831.
Du, Y., et al. (2020). NeuroMark: An automated and adaptive ICA based pipeline to identify reproducible fMRI markers of brain disorders. NeuroImage: Clinical, 28, 102375.
Keator, D. B., et al. (2016). The Function Biomedical Informatics Research Network data repository. NeuroImage, 124, 1074-1079.
Mayer, A. R., et al. (2013). Functional imaging of the hemodynamic sensory gating response in schizophrenia. Human Brain Mapping, 34(9), 2302-2312.
Oh, J., Oh, B.-L., Lee, K.-U., Chae, J.-H., & Yun, K. (2020). Identifying schizophrenia using structural MRI with a deep learning algorithm. Frontiers in Psychiatry, 11, 16.
Patel, P., Aggarwal, P., & Gupta, A. (2016). Classification of schizophrenia versus normal subjects using deep learning. In Proceedings of the Tenth Indian Conference on Computer Vision, Graphics and Image Processing (pp. 1-6).
Rahman, M. M., Lewis, N., & Plis, S. (2022). Geometrically guided saliency maps. In ICLR 2022 Workshop on PAIR^2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data.
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision (pp. 618-626).
Simonyan, K., Vedaldi, A., & Zisserman, A. (2013). Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034.
Sundararajan, M., Taly, A., & Yan, Q. (2017). Axiomatic attribution for deep networks. In International Conference on Machine Learning (pp. 3319-3328). PMLR.