Poster No:
1911
Submission Type:
Abstract Submission
Authors:
Peter Van Dyken¹, Mohamed Yousif¹, Ali Khan²
Institutions:
¹Schulich School of Medicine and Dentistry, London, Ontario, ²University of Western Ontario, London, Ontario
First Author:
Peter Van Dyken
Schulich School of Medicine and Dentistry
London, Ontario
Co-Author(s):
Mohamed Yousif
Schulich School of Medicine and Dentistry
London, Ontario
Ali Khan
University of Western Ontario
London, Ontario
Introduction:
The widespread adoption of the Brain Imaging Data Structure (BIDS) specification (Gorgolewski et al., 2016) has enabled a robust ecosystem of neuroimaging apps that process BIDS-formatted datasets (Gorgolewski et al., 2017). Common to these apps is an indexer that reads and parses the dataset's file names and returns the matching files in response to queries. pybids has been the de facto standard library for this purpose (Yarkoni et al., 2019), but it is slow, a problem especially evident on large datasets: indexing a 100,000-file dataset can take several minutes. This creates a computational bottleneck for downstream apps and hampers interactive programming. Previous attempts to address this problem, notably ancpbids and bids2table, were also written in Python, which fundamentally limits their speed. To overcome this limitation, we developed rsbids, a BIDS indexer written in the Rust programming language. Unlike Python, Rust is compiled and can achieve speeds equal to or greater than those of C++ libraries. Unlike C++ libraries, Rust libraries are relatively easy to compile, and the compiler enforces memory safety, making them easier to write and maintain (Bugden & Alahmar, 2022). rsbids can be installed from PyPI and used seamlessly in Python programs. It has a pybids-compatible API, allowing it to serve as a drop-in replacement for pybids. Here, we compare the performance of rsbids to that of previously developed BIDS indexers.
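To make the indexer's role concrete, the following is a minimal sketch of indexing and querying, assuming only the BIDS convention that file names are chains of `key-value` entities joined by underscores and ending in a suffix and extension. The helper names (`parse_entities`, `build_index`, `query`) and the example paths are hypothetical; this is not the implementation of rsbids, pybids, or any other library, and it ignores JSON sidecar metadata and validation.

```python
import re
from pathlib import PurePosixPath

# One BIDS entity looks like "sub-01" or "task-rest": key, hyphen, value.
ENTITY = re.compile(r"([A-Za-z0-9]+)-([A-Za-z0-9]+)")


def parse_entities(path):
    """Parse a BIDS-style path into a dict of entities (hypothetical helper)."""
    name = PurePosixPath(path).name
    # Split at the first dot so multi-part extensions such as "nii.gz" survive.
    stem, _, extension = name.partition(".")
    *pairs, suffix = stem.split("_")
    record = {"path": path, "suffix": suffix}
    record["extension"] = "." + extension if extension else ""
    for pair in pairs:
        match = ENTITY.fullmatch(pair)
        if match:
            record[match.group(1)] = match.group(2)
    return record


def build_index(paths):
    """Index without metadata: only file names are parsed."""
    return [parse_entities(path) for path in paths]


def query(index, **filters):
    """Return paths matching every filter; a filter may list several accepted values."""
    results = []
    for record in index:
        matched = True
        for key, wanted in filters.items():
            accepted = wanted if isinstance(wanted, (list, tuple, set)) else [wanted]
            if record.get(key) not in accepted:
                matched = False
                break
        if matched:
            results.append(record["path"])
    return results


# Made-up example paths, for illustration only.
paths = [
    "sub-01/eeg/sub-01_task-rest_run-1_eeg.set",
    "sub-01/eeg/sub-01_task-rest_run-2_eeg.set",
    "sub-02/eeg/sub-02_task-rest_run-1_eeg.set",
    "sub-02/eeg/sub-02_task-rest_run-1_channels.tsv",
]
index = build_index(paths)
single_subject = query(index, sub="01")
intersection = query(index, sub=["01", "02"], run="1", suffix="eeg")
```

Here `single_subject` holds both files of subject 01, and `intersection` holds only the run-1 `eeg` files of the two subjects. A real indexer would also have to handle sidecar metadata and much larger file counts, which is where runtime becomes a concern.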
Methods:
Benchmarks were measured using the HBN EO/EC task dataset (LM et al., 2022), which comprises 177,065 files and is available on OpenNeuro. All benchmarks were run on a CentOS Linux 7 system with x86_64 architecture, an Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz, a locally mounted SSD, and CPython 3.11.2. rsbids was compared to pybids, ancpbids, and bids2table across four tasks; a tool was excluded from any task it could not perform. First, the dataset was indexed without metadata: file names were read and parsed, but JSON sidecar files were not read (bids2table excluded). Second, the dataset was indexed with metadata: as in the first task, but the metadata in JSON sidecar files were also indexed (ancpbids excluded). Third, a single subject was queried from the indexed dataset, retrieving all files associated with that subject. Fourth, 14 subjects, one run, and one suffix were queried from the indexed dataset, retrieving all files matching the intersection of these entity values. Each reported runtime is the mean of five replicates.
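The replicate averaging described above can be sketched as follows. The abstract does not state which timing tool was used, so the use of `time.perf_counter` and the `benchmark` helper are assumptions made for illustration only.

```python
import statistics
import time


def benchmark(task, replicates=5):
    """Run `task` several times and return the mean wall-clock time in seconds.

    Hypothetical helper: `task` is any zero-argument callable, for example a
    function that indexes a dataset or runs one query.
    """
    durations = []
    for _ in range(replicates):
        start = time.perf_counter()
        task()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)


# Stand-in workload; a real benchmark would call an indexer here.
mean_seconds = benchmark(lambda: sum(range(100_000)))
```

Timing each replicate separately, rather than timing the whole loop once, also leaves the individual durations available if the spread across replicates is of interest.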
Results:
When indexing without metadata, rsbids completed in 1.59 s, compared to 30.59 s for ancpbids and 45.80 s for pybids. When metadata was included, rsbids took 11.22 s, compared to 104.2 s for bids2table and 294.7 s for pybids. When querying a single subject, rsbids took 13.1 ms, slower than bids2table (7.41 ms) but faster than pybids (383 ms) and ancpbids (4,300 ms). In the complex query, however, rsbids outperformed all other tools, taking 15.1 ms compared to 188 ms for pybids, 399 ms for bids2table, and 40,000 ms for ancpbids.

Figure: Runtime of four BIDS indexing apps in indexing and querying tasks
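The relative performance implied by these runtimes can be worked out directly. The sketch below uses only the mean runtimes reported in the Results; the `fold_speedups` helper and the task labels are hypothetical names introduced for illustration.

```python
# Mean runtimes as reported in the Results.
# Indexing tasks are in seconds, query tasks in milliseconds;
# ratios within a task are unitless, so the units need not match across tasks.
runtimes = {
    "index without metadata (s)": {"rsbids": 1.59, "ancpbids": 30.59, "pybids": 45.80},
    "index with metadata (s)": {"rsbids": 11.22, "bids2table": 104.2, "pybids": 294.7},
    "query single subject (ms)": {
        "rsbids": 13.1, "bids2table": 7.41, "pybids": 383.0, "ancpbids": 4300.0,
    },
    "query complex (ms)": {
        "rsbids": 15.1, "bids2table": 399.0, "pybids": 188.0, "ancpbids": 40000.0,
    },
}


def fold_speedups(task_runtimes, reference="rsbids"):
    """Return each tool's runtime divided by the reference tool's runtime.

    A value above 1 means the reference tool was faster by that factor;
    a value below 1 means the other tool was faster.
    """
    base = task_runtimes[reference]
    return {
        tool: runtime / base
        for tool, runtime in task_runtimes.items()
        if tool != reference
    }


speedups = {task: fold_speedups(times) for task, times in runtimes.items()}

for task, ratios in speedups.items():
    for tool, ratio in sorted(ratios.items()):
        print(f"{task}: rsbids vs {tool}: {ratio:.1f}x")
```

A ratio below 1 appears only for bids2table in the single-subject query, the one task in which rsbids was not the fastest tool.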
Conclusions:
rsbids achieved roughly 9- to 29-fold speed gains for dataset indexing compared to other libraries, based on the runtimes reported above. Querying was also generally faster than with other libraries, especially ancpbids. Notably, although bids2table was faster when querying a single subject, its runtime rose substantially for the more complex query (from 7.41 ms to 399 ms), whereas that of rsbids was largely unaffected (13.1 ms to 15.1 ms). rsbids is currently in an alpha release and has yet to implement the dataset validation supported by pybids. This, along with testing, documentation, and configurability, will be emphasized in the next stages of development. Yet even at this early stage, rsbids demonstrates that it can relieve the indexing bottleneck for very large datasets, giving it a promising future as the standard BIDS indexing library.
Modeling and Analysis Methods:
Methods Development 1
Neuroinformatics and Data Sharing:
Workflows 2
Informatics Other
Keywords:
Informatics
Open-Source Code
Open-Source Software
Workflows
1|2 Indicates the priority used for review
References:
Bugden, W., & Alahmar, A. (2022). Rust: The Programming Language for Safety and Performance. arXiv.
Gorgolewski, K. J., et al. (2016). The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Scientific Data, 3(1), 160044.
Gorgolewski, K. J., et al. (2017). BIDS apps: Improving ease of use, accessibility, and reproducibility of neuroimaging data analysis methods. PLOS Computational Biology, 13(3), e1005209.
LM, A., et al. (2022). HBN EO/EC task. OpenNeuro.
Yarkoni, T., et al. (2019). PyBIDS: Python tools for BIDS datasets. Journal of Open Source Software, 4(40), 1294.