Clarifying the reliability paradox: poor test-retest reliability attenuates group differences

Poster No:

1851 

Submission Type:

Abstract Submission 

Authors:

Povilas Karvelis1, Andreea Diaconescu1,2,3,4

Institutions:

1CAMH, Toronto, ON, 2Department of Psychology, University of Toronto, Toronto, ON, 3Institute of Medical Sciences, University of Toronto, Toronto, ON, 4Department of Psychiatry, University of Toronto, Toronto, ON

First Author:

Povilas Karvelis  
CAMH
Toronto, ON

Co-Author:

Andreea Diaconescu  
CAMH|Department of Psychology, University of Toronto|Institute of Medical Sciences, University of Toronto|Department of Psychiatry, University of Toronto
Toronto, ON|Toronto, ON|Toronto, ON|Toronto, ON

Introduction:

Cognitive tasks that produce robust group effects tend to have poor test-retest reliability – a phenomenon known as the reliability paradox (Hedge et al., 2018). This holds for simple summary statistics of task behavior (Hedge et al., 2018), for computational measures obtained by modelling task behavior (Karvelis et al., 2023), and for task-based fMRI activations (Elliott et al., 2020). Most of the literature on this issue highlights how poor test-retest reliability undermines correlational individual-differences research as well as translational personalized and precision psychiatry efforts. Our aim here is to demonstrate that poor test-retest reliability is detrimental not only for studying individual differences, but also for studying group differences (e.g., patients vs. controls).

Methods:

To illustrate our argument, we ran model simulations. We generated synthetic datasets with varying levels of between-subject and error variance and investigated how these affected test-retest reliability, individual differences, within-subject effects, and between-subject effects. We used the intra-class correlation coefficient (ICC) to estimate test-retest reliability, Cohen's d to estimate group-difference effect sizes, and Pearson's r to estimate correlations. While our analysis is general and applies to any comparison of groups, to make the demonstration more intuitive we considered two illustrative cases: 1) comparing patients vs. controls and 2) comparing two groups created via a median split.
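For concreteness, a minimal Python sketch of one such simulation is given below. This is an illustrative reimplementation rather than our exact analysis code: the function names and parameter values are assumptions, and the two-session ICC is approximated here by the between-session Pearson correlation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def simulate_sessions(n=200, var_between=1.0, var_error=1.0, group_shift=0.5):
    """Simulate two test-retest sessions for two equal-sized groups."""
    # Stable per-subject 'true' score: between-subject variance plus a
    # constant shift for the patient group (group_shift in true-score units).
    group = np.repeat([0, 1], n // 2)  # 0 = controls, 1 = patients
    true_score = rng.normal(0.0, np.sqrt(var_between), n) + group_shift * group
    # Independent measurement noise in each session (error variance).
    session1 = true_score + rng.normal(0.0, np.sqrt(var_error), n)
    session2 = true_score + rng.normal(0.0, np.sqrt(var_error), n)
    return group, session1, session2

def cohens_d(x, y):
    # Pooled-SD standardized mean difference.
    pooled_sd = np.sqrt((x.var(ddof=1) + y.var(ddof=1)) / 2.0)
    return (x.mean() - y.mean()) / pooled_sd

group, s1, s2 = simulate_sessions(var_between=1.0, var_error=1.0)
reliability = stats.pearsonr(s1, s2)[0]           # two-session reliability proxy
d_obs = cohens_d(s1[group == 1], s1[group == 0])  # observed group difference
print(f"reliability ~ {reliability:.2f}, observed d = {d_obs:.2f}")
```

Sweeping var_between and var_error over a grid in this way traces out how test-retest reliability, observed correlations, and observed group effect sizes co-vary.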

Results:

First, our simulations reproduce the reliability paradox and clarify the intuition behind it: robust group effects are achieved by minimizing overall variance, not just between-subject variance (Fig. 1). Second, and most importantly, our simulations show that poor test-retest reliability attenuates observed between-subject effects just as much as it attenuates observed correlations – this was equally true in both cases under consideration (patients vs. controls and median split; Fig. 2).
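These attenuation effects are consistent with classical psychometric results and can be summarized analytically. As a hedged sketch, assuming the standard decomposition of observed variance into between-subject and error variance:

$$\mathrm{ICC} = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_e^2}, \qquad r_{\mathrm{obs}} = r_{\mathrm{true}}\sqrt{\mathrm{ICC}_x\,\mathrm{ICC}_y}, \qquad d_{\mathrm{obs}} \approx d_{\mathrm{true}}\sqrt{\mathrm{ICC}}$$

When one of the two variables (e.g., group membership) is measured without error, the observed correlation also shrinks by a factor of $\sqrt{\mathrm{ICC}}$, which is why the attenuation of r and d is identical in the two cases.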
Supporting Image: Fig1.png
Supporting Image: Fig2.png

Conclusions:

Our work highlights that the reliability paradox has even wider implications than originally stated: low test-retest reliability undermines not only individual differences but also group differences research. Note that this applies not only to studying patient groups but also to many other areas of research: sex differences, ethnic differences, age differences, etc. Overall, our findings further stress that improving test-retest reliability of cognitive measures is of paramount importance for improving the quality of research.

Modeling and Analysis Methods:

Activation (e.g., BOLD task-fMRI) 2
Methods Development 1
Other Methods

Keywords:

Cognition
Computational Neuroscience
Modeling
Psychiatric Disorders
Other - Reliability

1|2 indicates the priority used for review

References:

Elliott, M. L., et al. (2020). What is the test-retest reliability of common task-functional MRI measures? New empirical evidence and a meta-analysis. Psychological Science, 31(7), 792-806.

Hedge, C., et al. (2018). The reliability paradox: Why robust cognitive tasks do not produce reliable individual differences. Behavior Research Methods, 50(3), 1166-1186.

Karvelis, P., et al. (2023). Individual differences in computational psychiatry: A review of current challenges. Neuroscience & Biobehavioral Reviews, 105137.