Poster No:
2251
Submission Type:
Abstract Submission
Authors:
Alejandro De La Vega1, Jérôme Dockès2, Kendra Oudyk3, James Kent1, Jean-Baptiste Poline3
Institutions:
1University of Texas at Austin, Austin, TX, 2Inria, Palaiseau, France, 3McGill University, Montreal, Quebec
First Author:
Alejandro De La Vega
University of Texas at Austin
Austin, TX
Co-Author(s):
James Kent
University of Texas at Austin
Austin, TX
Introduction:
Over 5,000 neuroimaging articles are published yearly, representing a vast but unwieldy knowledge base. Meta-analysis can help make sense of this deluge, but annotating relevant information is painstaking and time-consuming. Although there have been successful efforts to automate meta-analysis using text mining (e.g., Neurosynth; Yarkoni, 2011), these were limited to frequency-based features, which cannot differentiate fine-grained cognitive constructs or extract detailed methodological information. Breakthroughs in large language models (LLMs) promise to enable high-quality information retrieval with little labeled training data (Labrak et al., 2023). In two studies, we evaluate the performance of pre-trained LLMs for extracting information relevant to automated neuroimaging meta-analysis, such as methods and demographics.
Methods:
We used OpenAI's GPT, a commercial LLM trained on a domain-general corpus, to extract information from the unstructured text of neuroimaging articles using prompt-based Zero-Shot Learning (ZSL), and evaluated results against expert human annotations. In Study 1, we identified 751 studies from the NeuroVault database that users had annotated for "task" using the Cognitive Atlas ontology (Poldrack et al., 2011), providing a rare sample of annotated fMRI studies. Using only the abstracts, we applied ZSL to predict the task across 129 unique labels (e.g., 'stroop', 'go/no-go task', 'monetary incentive delay task').
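A zero-shot task-classification prompt of the kind described above can be sketched as follows. The prompt wording, function name, and label subset are illustrative assumptions; the actual prompts live in the publang package.

```python
# Sketch of a zero-shot task-classification prompt. Wording is an
# illustrative assumption, not the exact prompt used in the study.

TASK_LABELS = ["stroop", "go/no-go task", "monetary incentive delay task"]  # 129 labels in the study

def build_task_prompt(abstract: str, labels: list[str]) -> str:
    """Build a zero-shot prompt asking the model to pick one Cognitive Atlas task."""
    label_list = "\n".join(f"- {label}" for label in labels)
    return (
        "You are annotating fMRI studies with Cognitive Atlas task labels.\n"
        f"Candidate tasks:\n{label_list}\n\n"
        f"Abstract:\n{abstract}\n\n"
        "Answer with exactly one task label from the list."
    )

# The prompt would then be sent to a chat-completion model, e.g. (hypothetical call):
# response = client.chat.completions.create(
#     model="gpt-4",
#     messages=[{"role": "user", "content": build_task_prompt(text, TASK_LABELS)}],
# )
```

Because the candidate labels are supplied in the prompt itself, no task-specific training data is needed; swapping the label list adapts the same prompt to a new annotation scheme.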
In Study 2, we evaluated GPT's ability to extract demographic information from the full text of articles and compared it to a heuristic based on Poldrack et al. (2017). Our approach consists of two steps: a semantic search to identify the most relevant section of each paper, and a zero-shot prompt to extract the participant count. We separated each article into sections of <2,000 characters and generated a latent embedding for each section using OpenAI's Ada model. By computing the distance between a search query ('How many participants or subjects were recruited for this study?') and each section, we ranked sections by relevance, then used ZSL to "identify groups of participants". We evaluated the extracted data against 200 articles annotated for participant demographics, including the number of participants in each group. We used the first half of the annotated studies to prototype our approach and evaluated performance on a hold-out set of 103 studies. The methods used here are incorporated into an open-source package (publang) that simplifies the application of LLMs to information retrieval, and will be used to incorporate these features into the Neurosynth ecosystem.
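The chunk-and-rank step can be sketched in plain Python. In the study the embeddings came from OpenAI's Ada model; here they are stood in for by precomputed vectors, and the paragraph-packing heuristic is an assumption about how articles were split.

```python
import math

def split_sections(text: str, max_chars: int = 2000) -> list[str]:
    """Greedily pack paragraphs into sections of at most max_chars characters.

    Illustrative splitting heuristic; assumes paragraphs are shorter than max_chars.
    """
    sections, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > max_chars:
            sections.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        sections.append(current)
    return sections

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank_sections(query_emb: list[float], section_embs: list[list[float]]) -> list[int]:
    """Return section indices ordered from most to least similar to the query."""
    sims = [cosine_similarity(query_emb, e) for e in section_embs]
    return sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
```

The top-ranked section would then be fed to the zero-shot extraction prompt, keeping the LLM call within its context window while still reading the most relevant part of the paper.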
Results:
In Study 1, GPT-3 matched abstracts to the correct task with 87% accuracy. Using GPT-4, the latest model (reportedly over 1 trillion parameters), we achieved 100% accuracy. In comparison, a support vector classifier (SVC) trained on tf-idf vectorized abstracts (7,006 features) achieved only 34% accuracy on the test set (Fig 1).
In Study 2, GPT-3 extracted sample size with an excellent 13.8% Mean Absolute Percentage Error (MAPE) and perfect recall, making a prediction for all studies. The heuristic algorithm had a comparable error rate (15% MAPE) but poor recall, making a prediction for only 58/103 studies (Fig 2). Upon inspection, we found that GPT's true accuracy was likely higher than reported, as "errors" were often due to the behavioral sample size being extracted in addition to the final imaging sample, suggesting prompt engineering could further improve performance. Preliminary qualitative analyses showed that GPT was also able to extract additional information such as age, gender, and disease status.
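The scoring used above can be sketched as follows, assuming ground-truth and predicted participant counts per study, with None marking studies where an extractor made no prediction (the function name is ours, not from the study's code):

```python
def evaluate_sample_size(true_counts: list[int], predicted_counts: list):
    """Score extracted sample sizes.

    Returns (MAPE over answered studies as a percentage, recall = fraction
    of studies for which any prediction was made).
    """
    answered = [(t, p) for t, p in zip(true_counts, predicted_counts) if p is not None]
    recall = len(answered) / len(true_counts)
    mape = 100.0 * sum(abs(t - p) / t for t, p in answered) / len(answered)
    return mape, recall
```

Note that MAPE is computed only over answered studies, so a low-recall extractor like the heuristic can still post a competitive error rate while skipping the hard cases.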

Figure 1. Predicting task from article abstract

Figure 2. Accuracy of extracted sample size
Conclusions:
Zero Shot Learning using general-purpose LLMs retrieves information from articles with high accuracy and recall without the need to train or develop domain-specific models. Extracted information is well suited to assist human-guided systematic literature synthesis. Paired with a flexible ecosystem for meta-analysis, such as Neurosynth, this technology can enable a powerful new class of automated fine-grained neuroimaging meta-analysis.
Modeling and Analysis Methods:
Classification and Predictive Modeling
Methods Development 2
Neuroinformatics and Data Sharing:
Informatics Other 1
Keywords:
FUNCTIONAL MRI
Machine Learning
Meta-Analysis
1|2 Indicates the priority used for review
References:
Labrak, Y., et al. (2023). "A Zero-Shot and Few-Shot Study of Instruction-Finetuned Large Language Models Applied to Clinical and Biomedical Tasks." arXiv. http://arxiv.org/abs/2307.12114.
Poldrack, R., et al. (2011). "The Cognitive Atlas: Toward a Knowledge Foundation for Cognitive Neuroscience." Frontiers in Neuroinformatics 5 (September): 17.
Yarkoni, T., et al. (2011). "Large-Scale Automated Synthesis of Human Functional Neuroimaging Data." Nature Methods 8 (8): 665–70.