Towards semantic visual decoding of naturalistic movies with high-density diffuse optical tomography

Poster No:

2423 

Submission Type:

Abstract Submission 

Authors:

Wiete Fehner1, Zachary Markow1, Morgan Fogarty1, Aahana Bajracharya2, Dana Wilhelm3, Alexander Huth4, Joseph Culver3

Institutions:

1Washington University in St. Louis, St. Louis, MO, 2Washington University in St. Louis, St. Louis, MO, 3Washington University School of Medicine, St. Louis, MO, 4The University of Texas at Austin, Austin, TX

First Author:

Wiete Fehner, MS  
Washington University in St. Louis
St. Louis, MO

Co-Author(s):

Zachary Markow  
Washington University in St. Louis
St. Louis, MO
Morgan Fogarty, MS  
Washington University in St. Louis
St. Louis, MO
Aahana Bajracharya, MA, MS  
Washington University in St. Louis
St. Louis, MO
Dana Wilhelm  
Washington University School of Medicine
St. Louis, MO
Alexander Huth  
The University of Texas at Austin
Austin, TX
Joseph Culver, PhD  
Washington University School of Medicine
St. Louis, MO

Introduction:

Functional magnetic resonance imaging (fMRI) studies have validated advanced decoding approaches for visual scenes, semantic categories, and natural language using naturalistic stimuli such as movies and podcasts [5], [6], [9]. These advances are central to the development of brain-computer interfaces, which aim to restore communication abilities impaired by conditions such as aphasia or locked-in syndrome [2].
While fMRI is the natural choice in terms of signal-to-noise ratio, whole-brain field of view (FOV), and spatial specificity, its use in naturalistic settings is limited by scanner size, physical constraints, and cost. In contrast, high-density diffuse optical tomography (HD-DOT) is portable and cost-efficient, allowing cortical brain mapping to be extended to naturalistic environments. This study aims to develop an experimental paradigm for decoding semantic categories from natural movie scenes using very high-density DOT (VHD-DOT). A primary challenge in using DOT is the lack of anatomical co-registration across sessions; establishing repeatable, consistent cap placement across sessions is therefore critical for successful decoding.
Our group has shown that HD-DOT can decode visual stimuli such as checkerboards and naturalistic movie clips [4], [7], [10]. However, the field of view in those studies was limited to visual regions, and the diversity of the training data was restricted. Here, we have expanded the FOV to cover the entire head and incorporated a more diverse set of training movies, a significant step towards decoding semantic categories from naturalistic movies. A detailed mapping of semantic categories is also crucial for natural language decoding [5], [9].

Methods:

One participant underwent two 83-minute sessions of VHD-DOT imaging, employing a cap equipped with 255 sources and 252 detectors at 9.75 mm inter-optode spacing. To ensure repeatability, a precision protocol involving photometric alignment and facial markers was used for cap placement [1].
Each session included three 10-minute training and testing movie runs. The training movies were all unique, whereas the testing runs consisted of repetitions of three unique 1-minute clips. All clips were drawn from previously validated fMRI studies [5], [6] and had been manually labeled with semantic categories from WordNet [8] by Huth et al. [5].
Data processing followed the established DOT pipeline of Eggebrecht et al. [3], including pre-processing, image reconstruction, and spectroscopy. Beta maps for individual runs were generated with a general linear model (GLM) for both auditory and visual localizer tasks. To assess between-session repeatability, correlation coefficients were computed between all possible pairs of task blocks across sessions, and the mean correlation over these pairs was taken as the measure of repeatability. The same correlation analysis was applied to the test movie clips to evaluate repeatability across runs. Finally, template-based decoding was applied to the test movie data (Figure 1).
Supporting Image: Figure1_OHBM2024.jpg
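To make the repeatability and template-matching computations concrete, the following is a minimal sketch in Python/NumPy of the two analyses described above. The function names, array shapes, and the leave-one-out template construction are illustrative assumptions rather than the exact pipeline implementation; the actual analysis operates on GLM beta maps and reconstructed hemodynamic voxel responses from the pipeline of Eggebrecht et al. [3].

```python
# Minimal sketch with assumed array shapes; the real pipeline uses GLM beta
# maps and reconstructed hemodynamic voxel responses, not these toy arrays.
import numpy as np
from itertools import combinations


def mean_pairwise_correlation(block_maps):
    """Repeatability score: mean Pearson correlation over all pairs of maps.

    block_maps: (n_blocks, n_voxels) array, one beta map per task block,
    pooled across sessions (or per-run clip responses for the movie data).
    """
    pairs = combinations(range(block_maps.shape[0]), 2)
    corrs = [np.corrcoef(block_maps[i], block_maps[j])[0, 1] for i, j in pairs]
    return float(np.mean(corrs))


def template_decode(test_response, templates):
    """Template-based decoding: pick the clip template that is most
    correlated with a held-out test response.

    test_response: (n_features,) flattened voxel-by-time response to one clip
    templates:     (n_clips, n_features) mean responses to each unique clip,
                   built from the remaining repetitions (a hypothetical
                   leave-one-out construction).
    Returns the index of the best-matching clip and all correlation scores.
    """
    scores = np.array([np.corrcoef(test_response, t)[0, 1] for t in templates])
    return int(np.argmax(scores)), scores
```

In this framing, a clip is decoded correctly when the highest correlation corresponds to the true clip identity.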
 

Results:

An experimental design for semantic visual decoding with VHD-DOT was developed. Correlation analysis of the localizer tasks indicates a high degree of repeatability across imaging sessions (Figure 1). Within-session repeatability was also established across the test movie runs in both imaging sessions: for repetitions of the same clip, voxel time courses were positively correlated, producing a right-shifted correlation histogram, whereas the correlation distribution for non-matching clips was centered near zero, confirming distinct clip-specific activation patterns. The template-based decoding results provided additional support for these findings, particularly for clips 1 and 2 from the first session (Figure 2).
Supporting Image: Figures2_OHBM2024.jpg
 

Conclusions:

The VHD-DOT task and movie data show high repeatability, which is essential for more demanding decoding analyses. This work lays the foundation for semantic visual decoding of naturalistic movies using full-head VHD-DOT.

Higher Cognitive Functions:

Higher Cognitive Functions Other

Language:

Language Comprehension and Semantics

Modeling and Analysis Methods:

Methods Development 2

Novel Imaging Acquisition Methods:

NIRS 1
Imaging Methods Other

Keywords:

Cognition
Data analysis
Design and Analysis
Language
Near Infra-Red Spectroscopy (NIRS)
Other - Semantics; HD-DOT; Novel Imaging Methods; Naturalistic Imaging

1|2 Indicates the priority used for review

References:

Bajracharya, A. et al. (2023). Precision Functional Mapping of Cortical Activity Using High-Density Diffuse Optical Tomography (HD-DOT). https://doi.org/10.1364/boda.2023.jtu4b.15
Dobkin, B. H. (2007). Brain-computer interface technology as a tool to augment plasticity and outcomes for neurological rehabilitation. The Journal of Physiology, 579(3), 637–642. https://doi.org/10.1113/jphysiol.2006.123067
Eggebrecht, A. T. et al. (2014). Mapping distributed brain function and networks with diffuse optical tomography. Nature Photonics, 8(6), 448–454. https://doi.org/10.1038/nphoton.2014.107
Fishell, A. K. et al. (2019). Mapping brain function during naturalistic viewing using high-density diffuse optical tomography. Scientific Reports, 9(1). https://doi.org/10.1038/s41598-019-45555-8
Huth, A. G. et al. (2012). A Continuous Semantic Space Describes the Representation of Thousands of Object and Action Categories across the Human Brain. Neuron, 76(6), 1210–1224. https://doi.org/10.1016/j.neuron.2012.10.014
Huth, A. G. et al. (2016). Decoding the Semantic Content of Natural Movies from Human Brain Activity. Frontiers in Systems Neuroscience, 10. https://doi.org/10.3389/fnsys.2016.00081
Markow, Z. E. et al. (2023). Template- and model-based decoding of movie identities with high-density diffuse optical tomography of neural hemodynamics. https://doi.org/10.1117/12.2649294
Miller, G. A. (1995). WordNet: A Lexical Database for English. Communications of the ACM, 38, 39–41. https://doi.org/10.1145/219717.219748
Tang, J. et al. (2023). Semantic reconstruction of continuous language from non-invasive brain recordings. Nature Neuroscience, 26(5), 858–866. https://doi.org/10.1038/s41593-023-01304-9
Tripathy, K. et al. (2021). Decoding visual information from high-density diffuse optical tomography neuroimaging data. NeuroImage, 226, 117516. https://doi.org/10.1016/j.neuroimage.2020.117516