FMRIPrep preprocessing of the UK Biobank using CBRAIN for NeuroHub

Poster No:

2236 

Submission Type:

Abstract Submission 

Authors:

Xuan Mai PHAM1,2,3, Rida Abou-Haidar1,2,3, Natacha Beck1,2,3, Sergiy Boroday1,2,3, Samir Das1,2,3, Xavier Lecours-Boucher1,2,3, Brent McPherson1,3, Darcy Quesnel1,2,3, Pierre Rioux1,2,3, Bryan Caron1,2,3, Jean-Baptiste Poline1,3, Alan Evans1,2,3

Institutions:

1Montreal Neurological Institute, McGill University, Montreal, Quebec, Canada, 2McGill Centre for Integrative Neuroscience, Montreal, Quebec, Canada, 3Ludmer Centre for Neuroinformatics and Mental Health, Montreal, Quebec, Canada

First Author:

Xuan Mai PHAM  
Montreal Neurological Institute, McGill University|McGill Centre for Integrative Neuroscience|Ludmer Centre for Neuroinformatics and Mental Health
Montreal, Quebec, Canada|Montreal, Quebec, Canada|Montreal, Quebec, Canada

Co-Author(s):

Rida Abou-Haidar  
Montreal Neurological Institute, McGill University|McGill Centre for Integrative Neuroscience|Ludmer Centre for Neuroinformatics and Mental Health
Montreal, Quebec, Canada|Montreal, Quebec, Canada|Montreal, Quebec, Canada
Natacha Beck  
Montreal Neurological Institute, McGill University|McGill Centre for Integrative Neuroscience|Ludmer Centre for Neuroinformatics and Mental Health
Montreal, Quebec, Canada|Montreal, Quebec, Canada|Montreal, Quebec, Canada
Sergiy Boroday  
Montreal Neurological Institute, McGill University|McGill Centre for Integrative Neuroscience|Ludmer Centre for Neuroinformatics and Mental Health
Montreal, Quebec, Canada|Montreal, Quebec, Canada|Montreal, Quebec, Canada
Samir Das  
Montreal Neurological Institute, McGill University|McGill Centre for Integrative Neuroscience|Ludmer Centre for Neuroinformatics and Mental Health
Montreal, Quebec, Canada|Montreal, Quebec, Canada|Montreal, Quebec, Canada
Xavier Lecours-Boucher  
Montreal Neurological Institute, McGill University|McGill Centre for Integrative Neuroscience|Ludmer Centre for Neuroinformatics and Mental Health
Montreal, Quebec, Canada|Montreal, Quebec, Canada|Montreal, Quebec, Canada
Brent McPherson  
Montreal Neurological Institute, McGill University|Ludmer Centre for Neuroinformatics and Mental Health
Montreal, Quebec, Canada|Montreal, Quebec, Canada
Darcy Quesnel  
Montreal Neurological Institute, McGill University|McGill Centre for Integrative Neuroscience|Ludmer Centre for Neuroinformatics and Mental Health
Montreal, Quebec, Canada|Montreal, Quebec, Canada|Montreal, Quebec, Canada
Pierre Rioux  
Montreal Neurological Institute, McGill University|McGill Centre for Integrative Neuroscience|Ludmer Centre for Neuroinformatics and Mental Health
Montreal, Quebec, Canada|Montreal, Quebec, Canada|Montreal, Quebec, Canada
Bryan Caron, Dr.  
Montreal Neurological Institute, McGill University|McGill Centre for Integrative Neuroscience|Ludmer Centre for Neuroinformatics and Mental Health
Montreal, Quebec, Canada|Montreal, Quebec, Canada|Montreal, Quebec, Canada
Jean-Baptiste Poline  
Montreal Neurological Institute, McGill University|Ludmer Centre for Neuroinformatics and Mental Health
Montreal, Quebec, Canada|Montreal, Quebec, Canada
Alan Evans  
Montreal Neurological Institute, McGill University|McGill Centre for Integrative Neuroscience|Ludmer Centre for Neuroinformatics and Mental Health
Montreal, Quebec, Canada|Montreal, Quebec, Canada|Montreal, Quebec, Canada

Introduction:

NeuroHub (https://neurohub.ca) is a core platform of McGill University's Healthy Brains, Healthy Lives initiative (https://www.mcgill.ca/hbhl/). It offers researchers an overarching data and computational platform to store and analyze data, collaborate with colleagues and work with computational infrastructure. In particular, NeuroHub provides the McGill research community with a unifying and efficient point of access to the UK Biobank (Miller et al., 2016). CBRAIN (Sherif et al., 2014) allows scientists to launch large-scale analyses using advanced scientific tools through an easy-to-use web-based user interface.
CBRAIN has been used to perform large-scale preprocessing of the UK Biobank imaging data.

Methods:

All UK Biobank data is downloaded and packaged into a single, access-controlled instance. This instance is maintained on the Digital Research Alliance of Canada's Beluga system through a large-scale, multi-year data storage allocation. The downloaded images are arranged together with the associated tabular participant entries into a BIDS structure (see Figure 1). The imaging data is then preprocessed by the NeuroHub team using tools such as CIVET (Kim et al., 2005), TractoFlow (Theaud et al., 2020) and fMRIPrep (Esteban et al., 2018). Derived data is shared back with other application members, which avoids unnecessary duplication of computational effort. As of November 2023, preprocessing of the UK Biobank subjects with fMRIPrep has been completed using CBRAIN and the outputs have been packaged for user-level access.

The fMRIPrep pipeline was run on 37,732 UK Biobank subjects with neuroimaging data, and its outputs are available both on the CBRAIN portal and directly on Beluga. The dataset is voluminous: it contains 16,935,185 files and uses 117 terabytes of disk space. Producing the files took the HPC systems nearly 5 months and over 103 years of computing time. During processing, over two billion intermediate files were produced. The fMRIPrep task parameters include several output spaces, giving users multiple options depending on their area of interest.
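To put these totals in perspective, the short Python sketch below derives approximate per-subject averages from the figures reported above. It assumes decimal terabytes (1 TB = 1000 GB) and a 365.25-day year, and it treats "over 103 years" as exactly 103, so the compute figure is a lower bound. The variable names are illustrative only.

```python
# Totals reported in the abstract.
N_SUBJECTS = 37_732
N_FILES = 16_935_185
DISK_TB = 117          # assumed to be decimal terabytes
COMPUTE_YEARS = 103    # "over 103 years", so this is a lower bound

HOURS_PER_YEAR = 365.25 * 24  # assumption: average calendar year

# Per-subject averages.
compute_hours_per_subject = COMPUTE_YEARS * HOURS_PER_YEAR / N_SUBJECTS
files_per_subject = N_FILES / N_SUBJECTS
gb_per_subject = DISK_TB * 1000 / N_SUBJECTS

print(f"compute hours per subject: {compute_hours_per_subject:.1f}")
print(f"files per subject:         {files_per_subject:.0f}")
print(f"disk space per subject:    {gb_per_subject:.1f} GB")
```

Under these assumptions, each subject accounts for roughly 24 hours of computing time, about 450 output files and a little over 3 GB of disk space.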

Results:

Authorized users can access the fMRIPrep outputs through the NeuroHub portal and from the command line on the host HPC system. All modes of access adhere to the data use agreement with the UK Biobank.

The portal offers a secure and user-friendly graphical user interface to the 37,732 fMRIPrep outputs. By selecting the file(s) of interest, users can open, expand and visualize them directly in the browser without having to download them (see Figure 2).

On the Beluga system, the outputs of the fMRIPrep pipeline are packed into 189 SquashFS files, which must be mounted via an Apptainer container. Each file is named after the range of subject IDs it covers: the first file, fmriprep_000_1000011-1025826.sqfs, holds subject IDs 1000011 to 1025826. Each SquashFS file contains about 200 subjects. Because the amount of data and the number of files are so large, we recommend that users not copy any of the SquashFS files. Instead, we offer multiple ways to access the contents directly. Detailed documentation describes how to browse the fMRIPrep outputs interactively and how to process them non-interactively with tools and scripts.
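As an illustration of the naming scheme described above, the following Python sketch finds the SquashFS file whose ID range covers a given subject. It is not part of the NeuroHub documentation: the pattern is inferred from the single example file name in the text, the function names are hypothetical, and the second file name in the usage example is made up for illustration. A match only means the ID falls within the file's range, not that the subject is necessarily present in it.

```python
import re
from typing import Iterable, Optional, Tuple

# Pattern inferred from the one example name given in the text:
#   fmriprep_<index>_<first subject ID>-<last subject ID>.sqfs
_SQFS_NAME = re.compile(r"^fmriprep_(\d+)_(\d+)-(\d+)\.sqfs$")


def parse_sqfs_name(name: str) -> Tuple[int, int, int]:
    """Return (file index, first subject ID, last subject ID) for a file name."""
    match = _SQFS_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected SquashFS file name: {name!r}")
    index, first_id, last_id = (int(group) for group in match.groups())
    return index, first_id, last_id


def find_archive(names: Iterable[str], subject_id: int) -> Optional[str]:
    """Return the first file whose ID range covers subject_id, or None."""
    for name in names:
        _, first_id, last_id = parse_sqfs_name(name)
        if first_id <= subject_id <= last_id:
            return name
    return None


archives = [
    "fmriprep_000_1000011-1025826.sqfs",  # name taken from the text
    "fmriprep_001_1025830-1051234.sqfs",  # hypothetical, for illustration only
]
print(find_archive(archives, 1012345))
```

In practice the list of names would come from a directory listing on Beluga, and the selected file would then be mounted through Apptainer as described in the documentation.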

Conclusions:

NeuroHub provides McGill researchers with a unifying and efficient point of access to the UK Biobank. Users can access the UK Biobank fMRIPrep outputs on HPC resources through the NeuroHub and CBRAIN portals as well as the command line. The coordinated preprocessing of UK Biobank data efficiently leverages available compute resources and avoids costly duplication of storage and effort. In this way, NeuroHub offers a wide spectrum of preprocessed data (diffusion-weighted imaging, CIVET outputs and now fMRIPrep outputs) that users can work with without running the pipelines themselves, so they can focus on advancing their research.

Neuroinformatics and Data Sharing:

Databasing and Data Sharing 1
Workflows 2
Informatics Other

Keywords:

Computational Neuroscience
Computing
Data analysis
Data Organization
Informatics
Other - Preprocessing data

1|2 Indicates the priority used for review
Supporting Image: Figure1UKBiobankfMRIprepoutputsflow.png
Supporting Image: Figure2ExamplefMRIPrepoutputinCBRAIN.png
 

References:

Das S, Zijdenbos AP, Harlap J, Vins D, Evans AC, “LORIS: a web-based data management system for multi-center studies,” Front. Neuroinformatics, vol. 5, 2012, doi: 10.3389/fninf.2011.00037.

Esteban O, Markiewicz CJ, Blair RW, Moodie CA, Isik AI, Erramuzpe A, Kent JD, Goncalves M, DuPre E, Snyder M, Oya H, Ghosh SS, Wright J, Durnez J, Poldrack RA, Gorgolewski KJ. fMRIPrep: a robust preprocessing pipeline for functional MRI. Nat Meth. 2018; doi:10.1038/s41592-018-0235-4

Harding RJ, Bermudez P, Beauvais M, Bellec P, Hill S, Knoppers BM, Pavlidis P, Poline J-B, Roskams J, Stikov N, Stone J, Strother S, CONP Consortium, Evans, AC (2022, February 10). The Canadian Open Neuroscience Platform – An Open Science Framework for the Neuroscience Community. https://doi.org/10.31219/osf.io/eh349

Kim, J.S., Singh, V., Lee, J.K., Lerch, J., et al.: Automated 3-D extraction and evaluation of the inner and outer cortical surfaces using a Laplacian map and partial volume effect classification. Neuroimage 27(1), 210–221 (2005)

Miller, K, Alfaro-Almagro, F, Bangerter, N et al. Multimodal population brain imaging in the UK Biobank prospective epidemiological study. Nat Neurosci 19, 1523–1536 (2016). https://doi.org/10.1038/nn.4393

Rioux P, Kiar G, Hutton A, Evans AC and Brown ST, 2020 Deploying large fixed file datasets with SquashFS and Singularity. PEARC ’20: Practice and Experience in Advanced Research Computing, https://dl.acm.org/doi/10.1145/3311790.3401776

Sherif T, Rioux P, Rousseau M-E, Kassis N, Beck N, Glatard T, Adalat R, Das S, Evans AC (2014) “CBRAIN: a web-based, distributed computing platform for collaborative neuroimaging research,” Front. Neuroinformatics, vol. 8, May 2014, doi: 10.3389/fninf.2014.00054

Theaud, G., Houde, J.-C., Boré, A., Rheault, F., Morency, F., Descoteaux, M., TractoFlow: A robust, efficient and reproducible diffusion MRI pipeline leveraging Nextflow & Singularity, NeuroImage, 2020, https://doi.org/10.1016/j.neuroimage.2020.116889.