Junifer and Julearn: From Neuroimaging to Machine Learning models without expert-level coding skills

Poster No:

1404 

Submission Type:

Abstract Submission 

Authors:

Federico Raimondo1,2, Synchon Mandal1,2, Sami Hamdan1,2, Shammi More1,2, Leonard Sasse1,2,3, Vera Komeyer1,2,4, Amir Omidvarnia1,2, Simon Eickhoff1,2, Kaustubh Patil1,2

Institutions:

1Institute of Neuroscience and Medicine (INM-7: Brain and Behaviour), Research Centre Jülich, Jülich, Germany, 2Institute of Systems Neuroscience, Heinrich Heine University Düsseldorf, Düsseldorf, Germany, 3Max Planck School of Cognition, Leipzig, Germany, 4Department of Biology, Faculty of Mathematics and Natural Sciences, Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany

First Author:

Federico Raimondo  
Institute of Neuroscience and Medicine (INM-7: Brain and Behaviour), Research Centre Jülich|Institute of Systems Neuroscience, Heinrich Heine University Düsseldorf
Jülich, Germany|Düsseldorf, Germany

Co-Author(s):

Synchon Mandal  
Institute of Neuroscience and Medicine (INM-7: Brain and Behaviour), Research Centre Jülich|Institute of Systems Neuroscience, Heinrich Heine University Düsseldorf
Jülich, Germany|Düsseldorf, Germany
Sami Hamdan  
Institute of Neuroscience and Medicine (INM-7: Brain and Behaviour), Research Centre Jülich|Institute of Systems Neuroscience, Heinrich Heine University Düsseldorf
Jülich, Germany|Düsseldorf, Germany
Shammi More  
Institute of Neuroscience and Medicine (INM-7: Brain and Behaviour), Research Centre Jülich|Institute of Systems Neuroscience, Heinrich Heine University Düsseldorf
Jülich, Germany|Düsseldorf, Germany
Leonard Sasse  
Institute of Neuroscience and Medicine (INM-7: Brain and Behaviour), Research Centre Jülich|Institute of Systems Neuroscience, Heinrich Heine University Düsseldorf|Max Planck School of Cognition
Jülich, Germany|Düsseldorf, Germany|Leipzig, Germany
Vera Komeyer  
Institute of Neuroscience and Medicine (INM-7: Brain and Behaviour), Research Centre Jülich|Institute of Systems Neuroscience, Heinrich Heine University Düsseldorf|Department of Biology, Faculty of Mathematics and Natural Sciences, Heinrich-Heine-University Düsseldorf
Jülich, Germany|Düsseldorf, Germany|Düsseldorf, Germany
Amir Omidvarnia  
Institute of Neuroscience and Medicine (INM-7: Brain and Behaviour), Research Centre Jülich|Institute of Systems Neuroscience, Heinrich Heine University Düsseldorf
Jülich, Germany|Düsseldorf, Germany
Simon Eickhoff  
Institute of Neuroscience and Medicine (INM-7: Brain and Behaviour), Research Centre Jülich|Institute of Systems Neuroscience, Heinrich Heine University Düsseldorf
Jülich, Germany|Düsseldorf, Germany
Kaustubh Patil  
Institute of Neuroscience and Medicine (INM-7: Brain and Behaviour), Research Centre Jülich|Institute of Systems Neuroscience, Heinrich Heine University Düsseldorf
Jülich, Germany|Düsseldorf, Germany

Introduction:

Thanks to big data and computational power, the study of brain-cognition relationships using neuroimaging and machine learning (ML) has gained significant popularity [1]. Importantly, decisions in data processing [2] and predictive modeling [3] strongly impact the results. Moreover, misconceptions about ML procedures can distort or invalidate findings [4], exacerbating the neuroimaging reproducibility crisis [5]. These decisions and their implementation can become increasingly complex, posing challenges for early-career researchers: they need proficiency in diverse skills to handle large-scale datasets, algorithms, and complex ML set-ups, as well as domain-specific knowledge for experiment design and interpretation. To mitigate these issues, we introduce two complementary tools: Junifer, a feature-extraction tool that does not require coding, and Julearn, an easy-to-use ML library. Together, they aim to bridge preprocessed neuroimaging data and ML-based analysis by emphasizing ease of use and maintainability.

Methods:

ML-based neuroimaging analysis entails four steps: 1) preprocessing (e.g. using FreeSurfer, fMRIPrep, CAT, AFNI, ANTs, SPM, etc.), 2) feature extraction, 3) ML model building and benchmarking, and 4) post-hoc model analysis (e.g. evaluating feature importance) [6]. All steps entail technical and conceptual challenges. However, steps 2 and 3 are more error-prone due to the complexity of the required code or scripts. We propose to facilitate step 2 with Junifer and step 3 with Julearn.
Junifer (https://juaml.github.io/junifer) is a no-code tool that allows parametrizing each step of a feature extraction pipeline in a text file, using the simple YAML syntax. By specifying the dataset (or its structure) and a list of markers to compute, the user can easily compute all the features required for their ML models. With just a few more lines of text, all the processing can be run on computational clusters. Among Junifer's most prominent features are a vast list of built-in datasets, parcellations, masks and markers, as well as processing in native and standard spaces (various MNI templates). Feature extraction is transparent and reproducible, as the full pipeline configuration is stored within each output file.
Julearn (https://juaml.github.io/julearn) is designed for building and evaluating ML models. Built on top of the highly influential scikit-learn [7], Julearn offers a robust interface for building complex ML pipelines in a user-friendly way that is also suitable for novice programmers. It allows users to evaluate and compare ML models in a CV-consistent manner, that is, with all data-dependent steps fitted within each cross-validation (CV) fold, minimizing the risks of leakage and overestimation of results. Further features include neuroimaging-specific models, corrected statistical tests for comparing ML models [9], interactive results visualization, and inspection tools that link to post-hoc analysis.
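To make the corrected model comparison concrete, the following is a minimal sketch of the corrected resampled t-test of Nadeau and Bengio [9], written directly with NumPy, SciPy and scikit-learn. It is not Julearn's API: the function name `corrected_ttest`, the synthetic data and the two compared models are assumptions chosen for illustration only.

```python
import numpy as np
from scipy import stats
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import RepeatedKFold, cross_val_score


def corrected_ttest(scores_a, scores_b, n_train, n_test):
    """Two-sided corrected resampled t-test on paired CV scores.

    Hypothetical helper for illustration; not part of Julearn.
    """
    diff = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    n = diff.size
    # A plain paired t-test scales the variance by 1/n. The correction adds
    # n_test/n_train because training sets overlap across CV folds, so the
    # per-fold scores are not independent.
    corrected_var = (1.0 / n + n_test / n_train) * np.var(diff, ddof=1)
    t_stat = diff.mean() / np.sqrt(corrected_var)
    p_value = 2.0 * stats.t.sf(abs(t_stat), df=n - 1)
    return t_stat, p_value


# Synthetic regression data standing in for tabular neuroimaging features.
X, y = make_regression(n_samples=200, n_features=50, noise=10.0, random_state=0)

# A fixed random_state makes both models see exactly the same splits,
# so the scores are paired fold by fold.
cv = RepeatedKFold(n_splits=5, n_repeats=5, random_state=0)
scores_ridge = cross_val_score(Ridge(alpha=1.0), X, y, cv=cv, scoring="r2")
scores_ols = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2")

n_test = len(y) // 5          # samples per test fold in 5-fold CV
n_train = len(y) - n_test     # samples per training fold
t_stat, p_value = corrected_ttest(scores_ridge, scores_ols, n_train, n_test)
print(f"corrected t = {t_stat:.3f}, p = {p_value:.3f}")
```

Because the correction term inflates the variance estimate, the corrected statistic is always smaller in magnitude than the uncorrected paired t statistic computed on the same scores, which makes the comparison more conservative.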

Results:

To exemplify, we evaluated the performance of an ML model that predicts chronological age from cortical and subcortical functional connectivity (FC) in native space. A 34-line Junifer configuration file suffices to compute the FC for all participants and resting-state recordings (N=4800) in the HCP-YA dataset, combining a cortical and a subcortical parcellation (Figure 1-A). With 42 more lines of Python code, we can evaluate the performance of a neuroimaging-specific linear regression model based on Connectome-Based Predictive Modelling (CBPM) [8], using nested cross-validation and hyperparameter tuning (Figure 1-B).
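The nested cross-validation scheme mentioned above can be sketched with plain scikit-learn as follows. This is an illustrative analogue under stated assumptions, not the code behind Figure 1-B and not Julearn's interface: the data are synthetic, and univariate feature selection (`SelectKBest` with `f_regression`) followed by linear regression stands in for CBPM, which it only loosely resembles. The hyperparameter grid is likewise hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a subjects x connectivity-edges feature table.
X, y = make_regression(
    n_samples=150, n_features=300, n_informative=20, noise=5.0, random_state=0
)

# All data-dependent steps live inside the pipeline, so they are re-fitted
# on the training portion of every fold and never see the test data.
pipeline = Pipeline(
    [
        ("scale", StandardScaler()),
        ("select", SelectKBest(score_func=f_regression)),
        ("regress", LinearRegression()),
    ]
)

inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)  # tunes k
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)  # estimates error

# Inner loop: pick the number of selected features (hypothetical grid).
search = GridSearchCV(
    pipeline,
    param_grid={"select__k": [10, 20, 50]},
    cv=inner_cv,
    scoring="neg_mean_absolute_error",
)

# Outer loop: evaluate the whole tuning procedure on held-out folds.
results = cross_validate(
    search,
    X,
    y,
    cv=outer_cv,
    scoring="neg_mean_absolute_error",
    return_estimator=True,
)

mae = -results["test_score"]
best_k = [est.best_params_["select__k"] for est in results["estimator"]]
print("MAE per outer fold:", np.round(mae, 2))
print("Selected k per outer fold:", best_k)
```

Keeping the hyperparameter search inside the outer loop means the reported error reflects the full model-selection procedure rather than a single tuned model, which is what guards against optimistic estimates.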

Conclusions:

Junifer and Julearn enable researchers to easily and correctly bridge neuroimaging and ML models. Junifer provides the tools to process large-scale neuroimaging datasets and extract tabular features for ML applications. Julearn allows users to build and compare ML models from any tabular dataset while minimizing code complexity, thereby lowering the coding skills required to perform advanced ML-based analysis.

Modeling and Analysis Methods:

Classification and Predictive Modeling 1

Neuroinformatics and Data Sharing:

Workflows 2

Keywords:

Data analysis
Design and Analysis
Machine Learning
Open-Source Code
Open-Source Software

1|2 Indicates the priority used for review
Supporting Image: figure1.png
 

Provide references using author date format

[1] J. Wu, J. Li, S. B. Eickhoff, D. Scheinost, and S. Genon, ‘The challenges and prospects of brain-based prediction of behaviour’, Nat Hum Behav, vol. 7, no. 8, Art. no. 8, Aug. 2023, doi: 10.1038/s41562-023-01670-1.
[2] G. Antonopoulos, S. More, F. Raimondo, S. B. Eickhoff, F. Hoffstaedter, and K. R. Patil, ‘A systematic comparison of VBM pipelines and their application to age prediction’, NeuroImage, vol. 279, p. 120292, Oct. 2023, doi: 10.1016/j.neuroimage.2023.120292.
[3] S. More et al., ‘Brain-age prediction: A systematic comparison of machine learning workflows’, NeuroImage, vol. 270, p. 119947, Apr. 2023, doi: 10.1016/j.neuroimage.2023.119947.
[4] L. Sasse et al., ‘On Leakage in Machine Learning Pipelines’. arXiv, Nov. 07, 2023. doi: 10.48550/arXiv.2311.04179.
[5] K. J. Gorgolewski and R. A. Poldrack, ‘A Practical Guide for Improving Transparency and Reproducibility in Neuroimaging Research’, PLOS Biology, vol. 14, no. 7, p. e1002506, Jul. 2016, doi: 10.1371/journal.pbio.1002506.
[6] S. M. Lundberg and S.-I. Lee, ‘A Unified Approach to Interpreting Model Predictions’, in Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds., Curran Associates, Inc., 2017. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf
[7] F. Pedregosa et al., ‘Scikit-learn: Machine Learning in Python’, Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[8] X. Shen et al., ‘Using connectome-based predictive modeling to predict individual behavior from brain connectivity’, Nat Protoc, vol. 12, no. 3, pp. 506–518, Mar. 2017, doi: 10.1038/nprot.2016.178.
[9] C. Nadeau and Y. Bengio, ‘Inference for the Generalization Error’, Machine Learning, vol. 52, no. 3, pp. 239–281, Sep. 2003, doi: 10.1023/A:1024068626366.