Experience-dependent plasticity in fMRI: an investigation using linear regression models & VideoMAE

Poster No:

1480 

Submission Type:

Abstract Submission 

Authors:

Katrina Portelli1, Simon Dahan1, Yourong Guo2, Anderson Winkler3, Logan Williams1, Emma Robinson1

Institutions:

1King's College London, London, London, 2King’s College London, London, London, 3University of Texas Rio Grande Valley, Brownsville, TX

First Author:

Katrina Portelli  
King's College London
London, London

Co-Author(s):

Simon Dahan  
King's College London
London, London
Yourong Guo  
King's College London
London, London
Anderson Winkler, Dr.  
University of Texas Rio Grande Valley
Brownsville, TX
Logan Williams  
King's College London
London, London
Emma Robinson, Dr  
King's College London
London, London

Introduction:

Self-supervision algorithms that approximate neural processing of speech offer potential for data-driven hypotheses of sensory processing and model-free extraction of spatio-temporal features from complex stimuli [1,3,6]. We take this approach to investigate experience-dependent changes in brain activity following repeated movie watching, using data from the Human Connectome Project (HCP). We use a vision transformer to construct a latent encoding of movie frames that predicts neural responses to movie stimuli. Following validation, we use this model to identify areas whose activity changes with repeated exposure to the same movies.

Methods:

Functional magnetic resonance imaging (fMRI) data were obtained from the HCP 7T release [9]. 184 participants were scanned while passively viewing a series of movie clips, acquired in up to four separate sessions. At the end of each session, the same test-retest movie clip was presented. We limited analyses to participants who had completed all four runs of interest (n = 172). fMRI responses were projected from volume space to the cortical surface and aligned using MSMAll [2,7].

Movie frames were encoded using a video masked autoencoder (VideoMAE) [8], pre-trained on the Kinetics-400 action recognition dataset [4]. Spatio-temporal activations were extracted from 3-second movie segments and linearly regressed against brain activity at the central time point of each segment. A ridge regression model was trained for each subject, regularised with Tikhonov penalties and accounting for haemodynamic lag using the approach of [3]. Training data consisted of fMRI responses compiled from all sessions, excluding the test-retest clips.
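A minimal sketch of this encoding-model step, using synthetic arrays in place of real VideoMAE activations and fMRI data; the array shapes, lag count, and use of scikit-learn's RidgeCV to pick the Tikhonov penalty are illustrative assumptions, not the exact HCP pipeline:

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Hypothetical shapes: T time points, F VideoMAE features, V cortical vertices.
rng = np.random.default_rng(0)
T, F, V, n_lags = 200, 64, 10, 3

features = rng.standard_normal((T, F))   # stand-in for VideoMAE activations
bold = rng.standard_normal((T, V))       # stand-in for fMRI responses

# Account for haemodynamic lag by stacking time-shifted copies of the
# features (a finite-impulse-response design, in the spirit of [3]).
lagged = np.hstack([np.roll(features, lag, axis=0) for lag in range(1, n_lags + 1)])
lagged[:n_lags] = 0  # zero out rows that wrapped around

# Ridge (Tikhonov-regularised) regression, one weight map per vertex;
# RidgeCV selects the penalty strength by cross-validation.
model = RidgeCV(alphas=np.logspace(-2, 4, 7)).fit(lagged, bold)
predicted = model.predict(lagged)  # shape (T, V)
```

In practice one model is fit per subject, with held-out test-retest clips reserved for evaluation rather than predicting on the training data as above.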

The linear model's performance was assessed for each session by generating correlation maps at the vertex level (fig.1b), using Pearson correlation between the predicted and actual time series in response to the test-retest clip. Correlation values were averaged across the 360 regions of the HCP multimodal parcellation, propagated to individuals as described in [2]. Correlations were then compared across sessions using repeated-measures permutation analysis of linear models (PALM), corrected for multiple comparisons.
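The vertex-wise correlation and parcel-averaging steps can be sketched as follows; the toy 4-vertex, 2-parcel labelling and noisy predicted series are illustrative stand-ins for the 59k-vertex surfaces and 360-region HCP parcellation used in the abstract:

```python
import numpy as np

def vertex_correlations(pred, actual):
    """Pearson r between predicted and actual time series at each vertex.

    pred, actual: (T, V) arrays; returns a length-V vector of correlations.
    """
    p = pred - pred.mean(axis=0)
    a = actual - actual.mean(axis=0)
    return (p * a).sum(axis=0) / (
        np.sqrt((p ** 2).sum(axis=0)) * np.sqrt((a ** 2).sum(axis=0))
    )

def parcel_means(r, labels, n_parcels):
    """Average vertex-level correlations within each parcel of a labelling."""
    return np.array([r[labels == k].mean() for k in range(n_parcels)])

rng = np.random.default_rng(1)
actual = rng.standard_normal((50, 4))            # stand-in fMRI time series
pred = actual + 0.5 * rng.standard_normal((50, 4))  # noisy prediction
labels = np.array([0, 0, 1, 1])                  # toy 2-parcel labelling

r = vertex_correlations(pred, actual)            # one r per vertex
parcel_r = parcel_means(r, labels, 2)            # one mean r per parcel
```

The resulting per-parcel values are what would then be carried into the cross-session permutation comparison.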

Results:

Here we focus on fMRI responses from scanning sessions 1 and 4. We first assessed correlations between the linear model's predicted fMRI time series and the actual fMRI time series from session 1. Significant correlations were found in 87 and 91 parcels in the left and right hemispheres, respectively (fig.1c). These areas fall functionally within the dorsal and ventral visual streams, auditory association areas, and frontal regions related to decision-making and reward.

Comparing correlation coefficients in the significant parcels from session 1 to the same parcels in session 4 revealed significant decreases in prediction accuracy (fig.2). These decreases were observed in 88 parcels (44/87 in the left and 44/91 in the right hemisphere), including the left and right superior and inferior temporal cortex, frontal opercular cortex, orbitofrontal cortex, posterior cingulate cortex and MT+ cortex. Activity in these regions has been linked to the processing of dynamic visual stimuli [5].

In a subset of parcels, changes in accuracy were lateralised. Parcels specific to the right hemisphere included the premotor eye field, retrosplenial cortex, area PFm, area LO3 and visual area 8. In contrast to the left hemisphere, prediction accuracy in the right early visual areas V2 and V3 was not significantly different between sessions.
Supporting Image: ohbm_figure1.jpg
Supporting Image: ohbm_figure2.jpg
 

Conclusions:

We show that VideoMAE-derived features reliably predict neural responses to movie clips in visual processing areas. Repeated exposure to the same movie stimuli was associated with a change in prediction accuracy in a subset of these areas, demonstrating the potential of these models for identifying regions undergoing experience-dependent neural plasticity and advancing our mechanistic understanding of sensory processing.

Learning and Memory:

Learning and Memory Other 2

Modeling and Analysis Methods:

Classification and Predictive Modeling 1

Keywords:

Design and Analysis
FUNCTIONAL MRI
Learning
Machine Learning
Other - self-supervised learning

1|2 indicates the priority used for review

References:

1. Caucheteux, C., Gramfort, A. & King, J.R. (2023). Evidence of a predictive coding hierarchy in the human brain listening to speech. Nature Human Behaviour, 7, 430–441.
2. Glasser, M.F., Coalson, T., Robinson, E. et al. (2016). A multi-modal parcellation of human cerebral cortex. Nature, 536, 171–178.
3. Huth, A., de Heer, W., Griffiths, T. et al. (2016). Natural speech reveals the semantic maps that tile human cerebral cortex. Nature, 532, 453–458.
4. Kay, W., Carreira, J., Simonyan, K. et al. (2017). The Kinetics human action video dataset. arXiv preprint arXiv:1705.06950.
5. Lahnakoski, J.M., Salmi, J., Jääskeläinen, I.P. et al. (2012). Stimulus-related independent component and voxel-wise analysis of human brain activity during free viewing of a feature film. PLoS ONE, 7(4), e35215.
6. Millet, J., Caucheteux, C., Orhan, P. et al. (2022). Toward a realistic model of speech processing in the brain with self-supervised learning. arXiv preprint arXiv:2206.01685.
7. Robinson, E., Garcia, K., Glasser, M.F. et al. (2018). Multimodal surface matching with higher-order smoothness constraints. NeuroImage, 167, 453–465.
8. Tong, Z., Song, Y., Wang, J. & Wang, L. (2022). VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training. Advances in Neural Information Processing Systems, 35, 10078–10093.
9. Van Essen, D.C., Smith, S.M., Barch, D.M. et al. (2013). The WU-Minn Human Connectome Project: an overview. NeuroImage, 80, 62–79.