Poster No:
2234
Submission Type:
Abstract Submission
Authors:
Sebastian Urchs1, Alyssa Dai1, Arman Jahanpour1, Michelle Wang1, Nikhil Bhagwat1, Brent McPherson1, Rémi Gau1, David Keator2, Jeffrey Grethe3, Satrajit Ghosh4, David Kennedy5, Romain Valabrègue6, Stephen Whitmarsh7, Stéphane Lehéricy8, Yaroslav Halchenko9, Mallar Chakravarty10, Jean-Baptiste Poline1
Institutions:
1McConnell Brain Imaging Centre, The Neuro, McGill University, Montreal, Quebec, 2Psychiatry and Human Behavior, University of California, Irvine, CA, 3Department of Neurosciences, School of Medicine, University of California, San Diego, CA, 4Harvard/MIT, Boston, MA, 5University of Massachusetts Chan Medical School, Worcester, MA, 6Paris Brain Institute (ICM), Paris, Ile De France, 7Data Analysis Core facility, Sorbonne Université, Institut du Cerveau - Paris Brain Institute, Paris, Paris, 8Pitié-Salpêtrière Hospital, AP-HP, Paris, Ile De France, 9Dartmouth College, Hanover, NH, 10Brain Imaging Centre, Douglas Research Centre, Montreal, Quebec
First Author:
Sebastian Urchs
McConnell Brain Imaging Centre, The Neuro, McGill University
Montreal, Quebec
Co-Author(s):
Alyssa Dai
McConnell Brain Imaging Centre, The Neuro, McGill University
Montreal, Quebec
Arman Jahanpour
McConnell Brain Imaging Centre, The Neuro, McGill University
Montreal, Quebec
Michelle Wang
McConnell Brain Imaging Centre, The Neuro, McGill University
Montreal, Quebec
Nikhil Bhagwat
McConnell Brain Imaging Centre, The Neuro, McGill University
Montreal, Quebec
Brent McPherson
McConnell Brain Imaging Centre, The Neuro, McGill University
Montreal, Quebec
Rémi Gau
McConnell Brain Imaging Centre, The Neuro, McGill University
Montreal, Quebec
David Keator, PhD
Psychiatry and Human Behavior, University of California
Irvine, CA
Jeffrey Grethe
Department of Neurosciences, School of Medicine, University of California
San Diego, CA
David Kennedy
University of Massachusetts Chan Medical School
Worcester, MA
Stephen Whitmarsh
Data Analysis Core facility, Sorbonne Université, Institut du Cerveau - Paris Brain Institute
Paris, Paris
Introduction:
Sharing and combining neuroimaging datasets is impactful [1], mandated by funders and journals [2], and essential for obtaining the increasingly large samples needed to identify and validate robust brain-behaviour markers [3]. However, the growing amount and detail of participant information in datasets also raises questions about data governance [4] and legal constraints on data sharing [5], which can differ across institutions. These constraints hamper both the pooling of data in centrally curated data platforms and the open sharing of data.
Federated data governance is an alternative approach that stores and curates data locally under the control of the collecting institution (data owner), and connects and integrates data in a decentralised manner through the adoption of common technical standards and protocols. Recent efforts in genomics show the promise of this federated model [6], but also reveal a greater need for coordination on terminologies and an often prohibitive technical complexity.
Neurobagel provides user-friendly tools for a research group to 1) annotate their own data using existing, standardised vocabularies and 2) create harmonised representations of the subject data in a local Neurobagel node, in turn 3) enabling cross-dataset subject search based on specific attributes. Here we introduce a federation architecture that builds on existing Neurobagel tools to provide participant-level search across decentralised nodes, which remain under the control of their local data owners, while respecting site-specific data visibility constraints.
Methods:
The federation API (or f-API) is a Dockerized API built on the existing local Neurobagel node architecture. Each Neurobagel node exposes a node API (or n-API) for querying the data it holds, and the node owner can choose how restrictive the query results are (Fig. 1). The f-API and n-API are designed so that tools can be built to consume either API. The f-API supports two use cases: 1) deployment to the public internet as the federation engine for openly accessible Neurobagel nodes, and 2) deployment within an institution to enable internal federated queries across nodes managed by different local data owners (e.g., research groups). The node-specific control over query result granularity helps ensure that, regardless of federation scope, individual dataset sharing constraints can be satisfied.
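The node-level control over result granularity described above can be illustrated with a minimal sketch. It is not the Neurobagel implementation: the `Participant` record, the `return_aggregated` flag, the `min_age` filter, and the response fields are all hypothetical names chosen for illustration. The sketch only shows the idea that the same query can return either participant-level records or an aggregate count, depending on a setting chosen by the node owner.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # Hypothetical, simplified participant record (not the Neurobagel data model).
    subject_id: str
    age: float
    diagnosis: str


def node_response(dataset_name, matches, return_aggregated):
    """Shape a node's answer according to the owner's visibility setting."""
    response = {"dataset": dataset_name, "num_matching_subjects": len(matches)}
    if return_aggregated:
        # Restrictive node: expose only the count, no participant-level records.
        response["subjects"] = None
    else:
        # Permissive node: expose the matching participant-level records.
        response["subjects"] = [
            {"subject_id": p.subject_id, "age": p.age, "diagnosis": p.diagnosis}
            for p in matches
        ]
    return response


def query_node(dataset_name, participants, min_age, return_aggregated):
    """Apply a (hypothetical) minimum-age filter, then shape the response."""
    matches = [p for p in participants if p.age >= min_age]
    return node_response(dataset_name, matches, return_aggregated)
```

Because the count is present in both response shapes, a client written against one shape can still summarise results from a more restrictive node.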
The f-API reads an index of known nodes it can federate over, which by default includes all public Neurobagel nodes so that they are readily searchable even by an internally deployed f-API. When a query is received, the f-API forwards it asynchronously to all known nodes and combines the responses, which contain complete or aggregated information about matching participants, into a single set of results that is returned to the user (Fig. 2). The f-API also exposes the list of nodes it federates over, allowing a user to query only a specific subset of nodes. Adding a new node to the f-API simply requires updating the local index of nodes and restarting the f-API service.
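The fan-out and merge step described above might look roughly like the following sketch. It makes several assumptions: the in-memory `NODE_INDEX`, the function names, and the `min_age` query are hypothetical; each node is simulated in-process instead of being reached over HTTP; and the handling of failed nodes is an illustrative choice that the abstract does not specify.

```python
import asyncio

# Hypothetical in-memory stand-in for the index of known nodes.
# A real deployment would read node URLs from a local index file.
NODE_INDEX = {
    "node-a": {"aggregated": False, "records": [{"id": "sub-01", "age": 64}]},
    "node-b": {"aggregated": True, "records": [{"id": "sub-07", "age": 70},
                                               {"id": "sub-08", "age": 31}]},
}


def list_nodes():
    """Expose the nodes that can be federated over."""
    return sorted(NODE_INDEX)


async def query_one_node(name, node, min_age):
    """Simulate one node query; a real client would send an HTTP request."""
    await asyncio.sleep(0)  # placeholder for network latency
    matches = [r for r in node["records"] if r["age"] >= min_age]
    return {
        "node": name,
        "num_matching_subjects": len(matches),
        # Aggregated nodes return only the count, not the records.
        "subjects": None if node["aggregated"] else matches,
    }


async def federated_query(min_age, node_names=None):
    """Forward the query to the selected nodes concurrently and merge results."""
    selected = list(node_names) if node_names is not None else list_nodes()
    tasks = [query_one_node(n, NODE_INDEX[n], min_age) for n in selected]
    # return_exceptions=True keeps one failing node from sinking the whole query
    # (an assumption about desirable behaviour, not documented in the abstract).
    responses = await asyncio.gather(*tasks, return_exceptions=True)
    combined = {"results": [], "failed_nodes": []}
    for name, resp in zip(selected, responses):
        if isinstance(resp, Exception):
            combined["failed_nodes"].append(name)
        else:
            combined["results"].append(resp)
    return combined
```

Calling `asyncio.run(federated_query(60))` queries every node in the index, while `asyncio.run(federated_query(60, ["node-a"]))` restricts the search to a chosen subset, mirroring the node selection described above.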

Figure 1

Figure 2
Results:
Our public f-API searches over 23,500 participants and 342 datasets from 3 public Neurobagel nodes, 2 of which provide only aggregated results. The index of public Neurobagel nodes is available on GitHub, making it easy to update. The graphical Neurobagel query tool (query.neurobagel.org) has been updated to be compatible with the federation API and lets users choose which nodes to include in a query. An internal f-API was tested at the Douglas Research Centre, where internal users can now search harmonised local data alongside data from the publicly available Neurobagel nodes.
Conclusions:
The Neurobagel federation architecture will allow local data owners to participate in a federated data governance system by adopting existing Neurobagel tools for annotation and deployment of local nodes. We will expand this architecture by aligning with existing protocols for federated authentication [7], granting data owners finer-grained control over what users can query about their data.
Neuroinformatics and Data Sharing:
Databasing and Data Sharing 1
Workflows 2
Keywords:
Data Organization
Open Data
Open-Source Code
Open-Source Software
Other - Data Federation
1|2 Indicates the priority used for review
References:
1. Milham MP, Craddock RC, Son JJ, Fleischmann M, Clucas J, Xu H, et al. Assessment of the impact of shared brain imaging data on the scientific literature. Nat Commun. 2018;9: 2818.
2. NOT-OD-21-013: Final NIH Policy for Data Management and Sharing. [cited 26 Nov 2023]. Available: https://grants.nih.gov/grants/guide/notice-files/NOT-OD-21-013.html
3. Marek S, Tervo-Clemmens B, Calabro FJ, Montez DF, Kay BP, Hatoum AS, et al. Reproducible brain-wide association studies require thousands of individuals. Nature. 2022; 1–7.
4. Boeckhout M, Zielhuis GA, Bredenoord AL. The FAIR guiding principles for data stewardship: fair enough? Eur J Hum Genet. 2018;26: 931–936.
5. The General Data Protection Regulation. 2016/679 Apr 27, 2016. Available: http://data.europa.eu/eli/reg/2016/679/2016-05-04
6. Rehm HL, Page AJH, Smith L, Adams JB, Alterovitz G, Babb LJ, et al. GA4GH: International policies and standards for data sharing across genomic research and healthcare. Cell Genom. 2021;1. doi:10.1016/j.xgen.2021.100029
7. Rueda M, Ariosa R, Moldes M, Rambla J. Beacon v2 Reference Implementation: a toolkit to enable federated sharing of genomic and phenotypic data. Bioinformatics. 2022;38: 4656–4657.