Poster No:
1055
Submission Type:
Abstract Submission
Authors:
Joan Orpella1,2, Francesco Mantegna1, M. Florencia Assaneo3, David Poeppel1,4
Institutions:
1New York University, New York, NY, 2Georgetown University, Washington, DC, 3Universidad Nacional Autónoma de México, Santiago de Querétaro, Mexico, 4Ernst Strüngmann Institute, Frankfurt, Germany
First Author:
Joan Orpella
New York University|Georgetown University
New York, NY|Washington, DC
Co-Author(s):
Francesco Mantegna
New York University
New York, NY
M. Florencia Assaneo
Universidad Nacional Autónoma de México
Santiago de Querétaro, Mexico
David Poeppel
New York University|Ernst Strüngmann Institute
New York, NY|Frankfurt, Germany
Introduction:
Speech production models converge on a sequence of necessary processing stages (e.g., phonological retrieval, phonetic encoding) (1,2). Establishing the neural substrates of these stages has proven surprisingly challenging, primarily because most experiments are correlational. Recent ECoG studies (3,4) overcome this limitation by using machine learning to decode speech from neural data, but their spatial coverage remains limited. Here, we combined MEG, which offers excellent temporal resolution and whole-brain coverage, with a decoding approach to track participants' sequence of speech representations over time and across the brain. To avoid artifacts from muscle activity during the preparation and execution of speech, we used internal speech, which closely mirrors overt speech (5,6).
Methods:
Thirty-one subjects participated: 22 (mean age = 28.19, SD = 6.57) were tested on the syllables pa-ta-ka and 9 (mean age = 23.00, SD = 7.94) on ta-tu-ti. Participants either internally produced or passively viewed one of three visually presented syllables (Fig 1A). We estimated the onset of each subject's internal productions from their productions in an overt version of the task. To determine whether and when each internally produced syllable could be decoded from the noninvasive neural data, we trained and tested a linear classifier at each millisecond for each possible pairwise syllable contrast, separately for each participant and condition. Each participant's average decoding time course across contrasts was subjected to a permutation test to determine the time windows of significant decoding (Fig 1B). To establish the neural sources of syllable decoding, we projected each participant's sensor data to source space and performed a decoding analysis for each source and pairwise syllable contrast, taking as input features the time samples spanning each identified decoding peak (trough to trough; Fig 1B). The within-participant averages of the pairwise decoding results at each source were morphed to a standard source space and subjected to a permutation test to determine spatial clusters of significant syllable decoding for each peak (Fig 1C). We also acquired electromyographic data from participants' upper lip and jaw to confirm that the MEG signals were not contaminated by facial micromovements.
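The abstract does not specify the analysis toolchain; the following is a minimal sketch of the time-resolved pairwise decoding described above, assuming MNE-Python and scikit-learn. The variable names, the logistic-regression classifier, the AUC scoring, and the 5-fold cross-validation are illustrative assumptions, not the authors' exact pipeline.

```python
import numpy as np
from mne.decoding import SlidingEstimator, cross_val_multiscore
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

def timecourse_decoding(epochs_a, epochs_b):
    """Cross-validated decoding over time for one pairwise syllable contrast.

    epochs_a, epochs_b: mne.Epochs for the two syllables, time-locked to the
    estimated internal-production onset (placeholder inputs, for illustration).
    Returns one AUC value per time sample.
    """
    X = np.concatenate([epochs_a.get_data(), epochs_b.get_data()])  # trials x sensors x times
    y = np.concatenate([np.zeros(len(epochs_a)), np.ones(len(epochs_b))])
    clf = make_pipeline(StandardScaler(), LogisticRegression(solver="liblinear"))
    # One linear classifier fit and scored independently at every time sample
    # (every millisecond at a 1 kHz sampling rate).
    decoder = SlidingEstimator(clf, scoring="roc_auc", n_jobs=-1)
    return cross_val_multiscore(decoder, X, y, cv=5, n_jobs=-1).mean(axis=0)

# The three pairwise time courses are averaged per participant, and the group
# average is tested against chance (0.5) with a permutation test (e.g.,
# mne.stats.permutation_cluster_1samp_test) to find significant windows (Fig 1B).
```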

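The source-space stage can be sketched in the same spirit: one classifier per source, with the time samples of a single decoding peak (trough to trough) as features, followed by a group-level spatial cluster test. The linear SVM, the helper name per_source_decoding, and the fsaverage morph target are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

def per_source_decoding(stcs_a, stcs_b, tmin, tmax):
    """Decode one syllable pair at every source within one peak window.

    stcs_a, stcs_b: lists of single-trial mne.SourceEstimate objects.
    tmin, tmax: the trough-to-trough bounds of one decoding peak (seconds).
    Returns one cross-validated accuracy per source.
    """
    Xa = np.stack([s.copy().crop(tmin, tmax).data for s in stcs_a])
    Xb = np.stack([s.copy().crop(tmin, tmax).data for s in stcs_b])
    X = np.concatenate([Xa, Xb])  # trials x sources x window samples
    y = np.concatenate([np.zeros(len(Xa)), np.ones(len(Xb))])
    clf = make_pipeline(StandardScaler(), LinearSVC())
    # One classifier per source; its features are the samples in the window.
    return np.array([cross_val_score(clf, X[:, src, :], y, cv=5).mean()
                     for src in range(X.shape[1])])

# Per-participant accuracy maps (averaged over the three contrasts) would then
# be morphed to a standard brain (mne.compute_source_morph) and tested against
# chance with a spatial cluster permutation test, e.g.:
#   adjacency = mne.spatial_src_adjacency(src_fsaverage)
#   mne.stats.spatio_temporal_cluster_1samp_test(
#       acc_maps[:, np.newaxis, :] - 0.5, adjacency=adjacency)
```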
Results:
Fig 1B shows that the identity of syllables that differ only in their onset stop consonant, and that were internally planned and produced by participants, can be robustly recovered from the MEG data with high temporal resolution. The succession of decoding peaks suggests that a sequence of distinct neural representations underlies internal syllable production. This succession is markedly different from that obtained when participants passively viewed the syllables (Fig 1B). We then determined the neural sources of syllable decoding corresponding to the identified peaks. Fig 1C shows the resulting sequence of neural speech representations decoded from the planning and internal production of speech. The sequence, which includes robust decoding from left speech motor and auditory areas immediately preceding and following the expected syllable onset time (based on subjects' overt productions), adheres to current processing models. Data from the ta-tu-ti cohort validated these results and extended them by revealing a greater presence of auditory (vs. motor) representations for this syllable set.
Conclusions:
Subtle phonemic contrasts can be recovered from the neural substrates that current models identify for each stage of the speech production process. These results resolve longstanding questions regarding the informational content of previously reported dynamic brain activity patterns for speech production. The data also provide direct evidence for the generation of precise sensory predictions during speech production, so far only inferred from speaking-induced suppression (7,8) or from responses to altered feedback (9,10), and support the close parallels between the core processes of internal and overt speech.
Language:
Speech Production 1
Modeling and Analysis Methods:
EEG/MEG Modeling and Analysis
Multivariate Approaches
Motor Behavior:
Motor Planning and Execution 2
Novel Imaging Acquisition Methods:
MEG
Keywords:
Language
Machine Learning
MEG
Motor
Multivariate
Other - Speech; Production; Decoding; Machine learning
1|2 Indicates the priority used for review
References:
1. Levelt, W. J. M. (1989), "Speaking: From intention to articulation". The MIT Press.
2. Guenther, F. H. (2016), "Neural Control of Speech". The MIT Press.
3. Metzger, S. L. et al. (2023), "A high-performance neuroprosthesis for speech decoding and avatar control". Nature, vol. 620, pp. 1037–1046.
4. Wang, R. et al. (2023), "Distributed feedforward and feedback cortical processing supports human speech production". Proc. Natl. Acad. Sci., vol. 120, pp. 1–12.
5. Tian, X. & Poeppel, D. (2010), "Mental imagery of speech and movement implicates the dynamics of internal forward models". Front. Psychol., vol. 1, pp. 1–23.
6. Soroush, P. Z. et al. (2022), "The nested hierarchy of overt, mouthed, and imagined speech activity evident in intracranial recordings". NeuroImage, vol. 269, pp. 1–26.
7. Flinker, A. et al. (2010), "Single-trial speech suppression of auditory cortex activity in humans". J. Neurosci., vol. 30, pp. 16643–16650.
8. Numminen, J., Salmelin, R. & Hari, R. (1999), "Subject’s own speech reduces reactivity of the human auditory cortex". Neurosci. Lett., vol. 265, pp. 119–122.
9. Houde, J. F. & Jordan, M. I. (1998), "Sensorimotor adaptation in speech production". Science, vol. 279, pp. 1213–1216.
10. Tremblay, S., Shiller, D. M. & Ostry, D. J. (2003), "Somatosensory basis of speech production". Nature, vol. 423, pp. 866–869.