Improved Pipeline Development and Quality Control using File-tree and FSL-pipe

Poster No:

2266 

Submission Type:

Abstract Submission 

Authors:

Michiel Cottaar1, Paul McCarthy2

Institutions:

1Oxford University, Oxford, United Kingdom, 2Oxford University, Oxford, United Kingdom

First Author:

Michiel Cottaar  
Oxford University
Oxford, United Kingdom

Co-Author:

Paul McCarthy  
Oxford University
Oxford, United Kingdom

Introduction:

Neuroimaging pipelines typically produce a wide variety of files (especially if temporary and intermediate files are counted). Because different parts of a pipeline must be able to find these files on disk, most pipelines only work with one very specific file structure. This limits the interoperability between pipelines.

To overcome this, one could either encourage a common file structure across all pipelines (e.g., BIDS1) or make it easier to change the file structure used by an existing pipeline. We present two new tools that do the latter: file-tree2 and FSL-pipe3.

In this abstract, we illustrate the benefits of these tools for pipeline developers and for running quality control across many subjects (even on output from pipelines not based on file-tree or FSL-pipe).

Methods:

File-tree is both a simple file format for defining a directory structure and a Python library for accessing that structure (Figure 1A)2. Rather than hard-coding file paths, a pipeline can query the file-tree for the path of any input, intermediate, or output file. FSL-pipe3 builds on file-tree to support writing declarative pipelines.
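As a rough illustration of the idea only (this is neither file-tree's file format nor its Python API), the sketch below lets a dictionary of hypothetical path templates play the role of a file-tree. Pipeline code asks for files by short name, so only the templates need to change when the directory layout changes. The template names and the directory layout are assumptions.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical templates standing in for a file-tree (illustration only):
# each short name maps to a path template with {placeholders}.
TEMPLATES = {
    "T1w": "sub-{subject}/anat/T1w.nii.gz",
    "brain": "sub-{subject}/anat/T1w_brain.nii.gz",
    "affine": "sub-{subject}/reg/T1w_to_MNI.mat",
}


def get_path(name: str, base_dir: str = ".", **placeholders: str) -> Path:
    """Return the path for template `name`, with its placeholders filled in."""
    return Path(base_dir) / TEMPLATES[name].format(**placeholders)


# Pipeline steps refer only to short names such as "brain"; switching to a
# different directory layout means editing TEMPLATES, not the steps.
print(get_path("brain", "/data", subject="01").as_posix())
# prints /data/sub-01/anat/T1w_brain.nii.gz
```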

Results:

Benefits for quality control
Quality control (QC) across many subjects can be tedious. Such QC often relies on summary measures or static 2D images, which do not capture the full complexity of 3D neuroimaging data. Alternatively, one can open each subject's data in an interactive viewer; however, repeatedly setting up a tool like FSLeyes4 for each subject (potentially with several complementary image or time-series views, each with its own display parameters) can be very time-consuming.

FSLeyes can now load file-trees, which speeds this up considerably: users set up a plot to their liking for a single subject and can then switch, with a single click, to the equivalent plot for any other subject (Figure 1B). The plot in this example could be produced using either the file-tree describing the full HCP directory structure5 that is already included in FSL, or a custom file-tree describing only the relevant part of the directory (Figure 1C).
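The subject switching described above can be pictured with the following Python sketch. It is an assumption about the general mechanism, not FSLeyes' implementation: a saved view pairs hypothetical path templates with display settings, the subjects available on disk are found by matching existing files against a template, and switching subject re-fills the templates while keeping every display setting.

```python
from __future__ import annotations

import re
from pathlib import Path

# Hypothetical view: which files to show (as templates) and how to display them.
VIEW = {
    "sub-{subject}/anat/T1w.nii.gz": {"cmap": "greyscale"},
    "sub-{subject}/anat/T1w_brain_mask.nii.gz": {"cmap": "red", "alpha": 0.5},
}


def template_to_regex(template: str) -> re.Pattern[str]:
    """Turn 'sub-{subject}/x' into a regex with one named group per placeholder.

    Assumes each placeholder appears at most once in the template.
    """
    # With a capturing group, re.split alternates literal text and placeholder names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>[^/]+)"
        for i, part in enumerate(parts)
    )
    return re.compile(pattern + r"\Z")


def find_subjects(base_dir: str, template: str) -> list[str]:
    """Return the subject IDs for which the templated file exists under base_dir."""
    regex = template_to_regex(template)
    base = Path(base_dir)
    found = set()
    for path in base.rglob("*"):
        match = regex.match(path.relative_to(base).as_posix())
        if match:
            found.add(match.group("subject"))
    return sorted(found)


def switch_subject(view: dict[str, dict], subject: str) -> dict[str, dict]:
    """Same files and display settings, resolved for another subject."""
    return {tmpl.format(subject=subject): settings for tmpl, settings in view.items()}
```

For example, `find_subjects("/data", "sub-{subject}/anat/T1w.nii.gz")` would list every subject with a T1-weighted image under a hypothetical `/data` directory, and `switch_subject(VIEW, "02")` gives the same layout for subject 02.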

Benefits for pipeline development
FSL-pipe is a new tool for writing pipelines in a declarative manner. Rather than describing what the computer must do and in what order, one provides the computer with a set of "recipes", each describing how to produce specific output files from specific input files. For example, Figure 2A defines a multi-step registration6,7 pipeline with three recipes. Given a file-tree that defines where the input and output files are located (Figure 2B), FSL-pipe combines these recipes into a pipeline by inferring the dependencies between them (Figure 2C). FSL-pipe also ensures that the relevant output directories exist before any job is run.
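To show what inferring dependencies amounts to, here is a minimal Python sketch (not FSL-pipe's code or API) in which three hypothetical registration recipes only declare the template names they read and write. The recipe and template names are assumptions. Linking each input to the recipe that produces it gives a dependency graph, which the standard-library `graphlib.TopologicalSorter` turns into a valid execution order.

```python
from __future__ import annotations

from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical recipes: each declares which template names it reads and writes.
# They are deliberately listed out of order; nothing here states when to run them.
RECIPES = {
    "apply_warp": {"inputs": ["T1w", "MNI", "warp"], "outputs": ["T1w_in_MNI"]},
    "nonlinear_reg": {"inputs": ["T1w", "MNI", "affine"], "outputs": ["warp"]},
    "linear_reg": {"inputs": ["T1w", "MNI"], "outputs": ["affine"]},
}


def infer_dependencies(recipes: dict[str, dict]) -> dict[str, set[str]]:
    """Map each recipe to the recipes that produce one of its inputs."""
    producer = {out: name for name, r in recipes.items() for out in r["outputs"]}
    # Inputs that no recipe produces (T1w, MNI) are treated as pre-existing data.
    return {
        name: {producer[i] for i in r["inputs"] if i in producer}
        for name, r in recipes.items()
    }


order = list(TopologicalSorter(infer_dependencies(RECIPES)).static_order())
print(order)  # ['linear_reg', 'nonlinear_reg', 'apply_warp']
```

Because the order is derived from the declared inputs and outputs, adding or reordering recipes never requires editing a hand-written sequence of steps.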

The code in Figure 2A is a fully functional pipeline with a command-line interface that allows the user to:
1. Run only part of the pipeline (by requesting specific output files or selecting specific subject IDs), either from the command line or through a GUI. By default, this pipeline runs for all subjects that have a T1-weighted image.
2. Choose whether to overwrite output files that already exist or to keep them.
3. Choose whether jobs run sequentially, in parallel using dask8, or on a computing cluster. FSL-pipe ensures that jobs run in the right order (and, where possible, in parallel).
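The options above can likewise be sketched in plain Python, as an illustration under stated assumptions rather than FSL-pipe's implementation: starting from the requested outputs, walk backwards to the recipes that are actually needed (skipping outputs that already exist unless overwriting), then run the selected jobs in dependency order, submitting each batch of ready jobs to a thread pool. The recipes, the `existing` set, and the `run_one` callback are hypothetical; a real tool would check the disk and hand jobs to dask or a cluster scheduler instead.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical registration recipes (illustration only).
RECIPES = {
    "linear_reg": {"inputs": ["T1w", "MNI"], "outputs": ["affine"]},
    "nonlinear_reg": {"inputs": ["T1w", "MNI", "affine"], "outputs": ["warp"]},
    "apply_warp": {"inputs": ["T1w", "MNI", "warp"], "outputs": ["T1w_in_MNI"]},
}
PRODUCER = {out: name for name, r in RECIPES.items() for out in r["outputs"]}


def select_jobs(requested: list[str], existing: set[str] | None = None,
                overwrite: bool = False) -> set[str]:
    """Recipes needed to produce `requested`, walking backwards through inputs."""
    existing = existing or set()
    needed: set[str] = set()
    stack = list(requested)
    while stack:
        name = stack.pop()
        recipe = PRODUCER.get(name)
        # Raw inputs (no recipe) must already exist; existing outputs are kept
        # unless the user asked to overwrite them.
        if recipe is None or recipe in needed or (name in existing and not overwrite):
            continue
        needed.add(recipe)
        stack.extend(RECIPES[recipe]["inputs"])
    return needed


def run_jobs(jobs: set[str], run_one: Callable[[str], object],
             max_workers: int = 4) -> list[str]:
    """Run `jobs` in dependency order; jobs that are ready together run in parallel."""
    graph = {
        job: {PRODUCER[i] for i in RECIPES[job]["inputs"] if PRODUCER.get(i) in jobs}
        for job in jobs
    }
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    finished: list[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            ready = sorter.get_ready()
            # Simplification: wait for the whole batch before releasing its
            # dependants (a real scheduler would release them job by job).
            list(pool.map(run_one, ready))
            sorter.done(*ready)
            finished.extend(ready)
    return finished


jobs = select_jobs(["T1w_in_MNI"], existing={"affine"})
print(sorted(jobs))  # ['apply_warp', 'nonlinear_reg']
print(run_jobs(jobs, run_one=lambda job: print("running", job)))
```

Requesting only `T1w_in_MNI` while the affine already exists skips the linear registration; passing `overwrite=True` would bring it back into the selection.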

[Figure 1: supporting image qc-figure.jpg]
[Figure 2: supporting image pipeline-figure.jpg]

Conclusions:

File-tree and FSL-pipe are currently being adopted throughout FSL and in version 2 of the UK Biobank pipeline. However, they are generic tools that could also benefit pipelines not based on FSL, including non-neuroimaging pipelines.

Documentation and tutorials:
- File-tree (including movie on QC): https://open.win.ox.ac.uk/pages/fsl/file-tree/
- FSL-pipe: https://open.win.ox.ac.uk/pages/fsl/fsl-pipe/

Neuroinformatics and Data Sharing:

Databasing and Data Sharing
Workflows 1
Informatics Other 2

Keywords:

Data Organization
Informatics
Open Data
Open-Source Code
Open-Source Software
Workflows
Other - pipelines

1|2 Indicates the priority used for review

References:

1. Gorgolewski, K.J. et al. (2016) ‘The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments.’, Sci Data, 3, p. 160044. doi:10.1038/sdata.2016.44.
2. Cottaar, M. (2022) ‘File-tree: define the content of a structured directory for visualisation or pipelines’. Zenodo. doi:10.5281/zenodo.6576809.
3. Cottaar, M. (2022) ‘Pipe-tree: declarative pipelines based on FileTrees’. Zenodo. doi:10.5281/zenodo.6577070.
4. McCarthy, P. (2023) ‘FSLeyes’. Zenodo. doi:10.5281/zenodo.8376979.
5. Glasser, M.F. et al. (2013) ‘The minimal preprocessing pipelines for the Human Connectome Project.’, Neuroimage, 80, pp. 105–24. doi:10.1016/j.neuroimage.2013.04.127.
6. Jenkinson, M. et al. (2002) ‘Improved optimization for the robust and accurate linear registration and motion correction of brain images.’, Neuroimage, 17(2), pp. 825–41. doi:10.1016/s1053-8119(02)91132-8.
7. Andersson, J.L., Jenkinson, M. and Smith, S. (2007) ‘Non-linear registration, aka spatial normalisation’. Oxford University. Available at: http://www.fmrib.ox.ac.uk/analysis/techrep/.
8. Dask Development Team (2016) ‘Dask: Library for dynamic task scheduling’. Available at: https://dask.org.