Poster No:
1975
Submission Type:
Abstract Submission
Authors:
Juhyeon Lee1, JunHo Seo2, Hyunsung Kim2, Minki Kim3, Sangsoo Jin1, ByungJun Lee2, Jong-Hwan Lee1
Institutions:
1Department of Brain and Cognitive Engineering, Korea University, Seoul, Korea, Republic of, 2Department of Artificial Intelligence, Korea University, Seoul, Korea, Republic of, 3Department of Computer Convergence Software, Korea University, Sejong, Korea, Republic of
First Author:
Juhyeon Lee
Department of Brain and Cognitive Engineering, Korea University
Seoul, Korea, Republic of
Co-Author(s):
JunHo Seo
Department of Artificial Intelligence, Korea University
Seoul, Korea, Republic of
Hyunsung Kim
Department of Artificial Intelligence, Korea University
Seoul, Korea, Republic of
Minki Kim
Department of Computer Convergence Software, Korea University
Sejong, Korea, Republic of
Sangsoo Jin
Department of Brain and Cognitive Engineering, Korea University
Seoul, Korea, Republic of
ByungJun Lee
Department of Artificial Intelligence, Korea University
Seoul, Korea, Republic of
Jong-Hwan Lee
Department of Brain and Cognitive Engineering, Korea University
Seoul, Korea, Republic of
Introduction:
Deep neural networks have facilitated the investigation of human brain information processing across vision, speech, and language within the brain-encoding framework [1]. Despite the remarkable advances of deep reinforcement learning (RL) in real-world applications and its potential contributions to neuroscience [2], investigations using deep RL models have been confined to non-real-world scenarios [3]. This study leverages a real-world human RL paradigm and a Deep Q-Network (DQN) [4] to explore the neural representations of human RL from the perspective of a deep RL agent.
Methods:
fMRI data were acquired while subjects performed the Photographer paradigm [5]. Subjects navigated Google Street View and captured images to maximize a reward reflecting the CLIP [6] similarity between the embedding of the captured image and the embedding of a target text (Fig. 1a). In each of the five cities, every subject captured eight images. We analyzed preprocessed fMRI data from 32 subjects (M/F=16/16, mean age=23.2), split into two groups: 16 subjects from the 2022 experiment (group 1) and 16 from the 2023 experiment (group 2). Beta-valued brain maps for the capture trials were obtained using a general linear model.
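For illustration, a minimal sketch of the capture reward under stated assumptions: the OpenAI CLIP package with the ViT-B/32 variant (the specific CLIP model, the cosine-similarity form of the reward, and any score scaling are not specified in the abstract and are assumed here).

    import torch
    import clip
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    # CLIP variant is an assumption; the abstract only states that CLIP [6] was used.
    model, preprocess = clip.load("ViT-B/32", device=device)

    def capture_reward(image_path: str, target_text: str) -> float:
        """Cosine similarity between the captured image embedding and the target text embedding."""
        image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
        text = clip.tokenize([target_text]).to(device)
        with torch.no_grad():
            img_emb = model.encode_image(image)
            txt_emb = model.encode_text(text)
        img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
        txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
        return (img_emb @ txt_emb.T).item()  # cosine similarity in [-1, 1]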
A DQN-based RL agent independently performed the Photographer paradigm. A pretrained CLIP image encoder was attached to the DQN to extract image embeddings during street exploration (Fig. 1b). Implemented in PyTorch, the RL agent was trained to maximize the discounted episodic return over 3,000,000 timesteps with a discount factor of 0.999, a learning rate of 0.0001, and a target-network update rate of 0.005.
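A minimal PyTorch sketch of the Q-head and a single update step under the reported hyperparameters (discount 0.999, learning rate 1e-4, target update rate 0.005). The embedding dimension, hidden size, action count, Huber loss, and the interpretation of the update rate as a soft (Polyak) target update are assumptions, not details given in the abstract.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    GAMMA, LR, TAU = 0.999, 1e-4, 0.005  # values reported in the abstract

    class DQN(nn.Module):
        # Q-head on top of a CLIP image embedding; hidden size and action count are illustrative.
        def __init__(self, embed_dim=512, n_actions=4):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(embed_dim, 256), nn.ReLU(),
                nn.Linear(256, n_actions),
            )

        def forward(self, clip_embedding):
            return self.net(clip_embedding)

    q_net, target_net = DQN(), DQN()
    target_net.load_state_dict(q_net.state_dict())
    optimizer = torch.optim.Adam(q_net.parameters(), lr=LR)

    def dqn_update(batch):
        # batch: (state_emb, action, reward, next_state_emb, done) tensors from a replay buffer
        s, a, r, s2, done = batch
        q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            target = r + GAMMA * (1 - done) * target_net(s2).max(dim=1).values
        loss = F.smooth_l1_loss(q_sa, target)
        optimizer.zero_grad(); loss.backward(); optimizer.step()
        # soft target-network update with rate TAU = 0.005 (assumed interpretation)
        with torch.no_grad():
            for p, tp in zip(q_net.parameters(), target_net.parameters()):
                tp.mul_(1 - TAU).add_(TAU * p)
        return loss.item()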
Images captured by the subjects were fed to the trained RL agent, and embeddings were extracted from each layer/module. For representational similarity analysis (RSA) in each subject, we calculated a representational dissimilarity matrix (RDM) between trials for each layer's embeddings, excluding within-run blocks. Similarly, a neural RDM was obtained from the multivoxel beta-values within a searchlight of four-voxel radius around each center voxel. The similarity between the RL agent's RDM and the neural RDM was quantified via Spearman's ρ, which was z-scored and smoothed with a 6 mm FWHM Gaussian kernel. One-sample t-tests were applied to the individual RSA maps to generate group inference maps, either for all subjects or for each group in the reproducibility analysis.
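A minimal sketch of the RSA comparison for one layer and one searchlight. The correlation-distance metric for the RDMs, the Fisher z-transform as the z-scoring step, and the exact form of the within-run exclusion are assumptions for illustration.

    import numpy as np
    from scipy.spatial.distance import pdist, squareform
    from scipy.stats import spearmanr

    def rdm(features):
        # features: (n_trials, n_features); correlation distance is an assumed metric
        return squareform(pdist(features, metric="correlation"))

    def rsa_similarity(model_features, neural_betas, run_labels):
        """Spearman's rho between a layer RDM and a searchlight neural RDM,
        excluding trial pairs from the same run, then Fisher z-transformed."""
        model_rdm, neural_rdm = rdm(model_features), rdm(neural_betas)
        run_labels = np.asarray(run_labels)
        iu = np.triu_indices(len(run_labels), k=1)          # upper-triangular trial pairs
        between_run = run_labels[iu[0]] != run_labels[iu[1]]  # drop within-run pairs
        rho, _ = spearmanr(model_rdm[iu][between_run], neural_rdm[iu][between_run])
        return np.arctanh(rho)  # z-scored value, to be smoothed and entered into group t-tests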

Results:
Subjects learned strategies to increase their reward scores across runs (p=0.026; Fig. 2a). The RL agent achieved an average score of 98.5 across the five cities, indicating successful training. The agent's capture probability increased marginally across runs (p=0.09; Fig. 2b), and its trajectories showed it traversing streets to capture high-scoring images containing objects closely aligned with the target (Fig. 2c). In the all-subject analysis, representations in the lower layers of the image encoder resembled those in early visual areas, whereas the middle layers resembled higher-order visual areas and higher-order cognitive areas such as the frontal lobe [7] (Fig. 2de). The DQN hidden layers were associated with the medial prefrontal cortex [8], superior parietal cortex [3], posterior cingulate cortex [9], and putamen [10]. The output layer exhibited high similarity with the paracentral gyrus and the primary motor cortex (M1).
The two groups showed similar neural representations for the image encoder, with greater statistical significance in group 2 (Fig. 2fg). However, group 1 displayed greater similarity to the DQN in areas related to reward and cognitive functions. Reinforcement learning was more evident in group 1 than in group 2 based on reward scores [5], and the number of subjects who suspected that the object 'person' yielded a high reward score differed between groups (11 vs. 6). Beta-values in the superior occipital gyrus were greater in group 2 during exploration (two-sample t-test, p<0.01).

Conclusions:
To our knowledge, this study is the first to demonstrate reproducible hierarchical neural representations of human RL in a real-world scenario using a deep RL agent.
Higher Cognitive Functions:
Higher Cognitive Functions Other 2
Learning and Memory:
Learning and Memory Other
Modeling and Analysis Methods:
Classification and Predictive Modeling
Multivariate Approaches 1
Other Methods
Keywords:
Computational Neuroscience
Computing
FUNCTIONAL MRI
Learning
Machine Learning
Modeling
Multivariate
Other - Naturalistic Imaging; Reinforcement learning; Representational Similarity Analysis
1|2 Indicates the priority used for review
References:
[1] A. Saxe, S. Nelli, and C. Summerfield, “If deep learning is the answer, what is the question?,” Nature Reviews Neuroscience, vol. 22, no. 1, pp. 55–67, 2021.
[2] M. Botvinick, J. X. Wang, W. Dabney, K. J. Miller, and Z. Kurth-Nelson, “Deep Reinforcement Learning and Its Neuroscientific Implications,” Neuron, vol. 107, no. 4, pp. 603–616, Aug. 2020, doi: 10.1016/j.neuron.2020.06.014.
[3] L. Cross, J. Cockburn, Y. Yue, and J. P. O’Doherty, “Using deep reinforcement learning to reveal how the brain encodes abstract state-space representations in high-dimensional environments,” Neuron, 2020, doi: 10.1016/j.neuron.2020.11.021.
[4] V. Mnih et al., “Playing atari with deep reinforcement learning,” arXiv preprint arXiv:1312.5602, 2013.
[5] S. Jin, J. Lee, and J.-H. Lee, “How to Be a Good Photographer: Multi-modal Learning In a Real-life Environment,” presented at the Organization for Human Brain Mapping (OHBM), Montreal, Canada, Jul. 2023.
[6] A. Radford et al., “Learning transferable visual models from natural language supervision,” in International conference on machine learning, PMLR, 2021, pp. 8748–8763.
[7] Q. Zhou, C. Du, and H. He, “Exploring the brain-like properties of deep neural networks: a neural encoding perspective,” Machine Intelligence Research, vol. 19, no. 5, pp. 439–455, 2022.
[8] M. S. Tomov, P. A. Tsividis, T. Pouncy, J. B. Tenenbaum, and S. J. Gershman, “The neural architecture of theory-based reinforcement learning,” Neuron, vol. 111, no. 8, pp. 1331–1344.e8, 2023.
[9] J. M. Pearson, S. R. Heilbronner, D. L. Barack, B. Y. Hayden, and M. L. Platt, “Posterior cingulate cortex: adapting behavior to a changing world,” Trends in cognitive sciences, vol. 15, no. 4, pp. 143–151, 2011.
[10] N. Viñas-Guasch and Y. J. Wu, “The role of the putamen in language: a meta-analytic connectivity modeling study,” Brain Structure and Function, vol. 222, pp. 3991–4004, 2017.
Acknowledgment: This work was supported by the National Research Foundation (NRF) grant (NRF-2021M3E5D2A01022515, No. RS-2023-00218987), by the Electronics and Telecommunications Research Institute (ETRI) grant [23ZS1100, Core Technology Research for Self-Improving Integrated Artificial Intelligence System] funded by the Korea government (MSIT), and by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-00079, Artificial Intelligence Graduate School Program [Korea University]).