Identifying direct risk factors in UK Biobank with simultaneous Bayesian-frequentist model-averaged hypothesis testing using Doublethink

Arning, N., Fryer, H. R. and D. J. Wilson (2025)
medRxiv doi: 10.1101/2024.01.01.24300687 (preprint)

Big data approaches to discovering non-genetic risk factors have lagged behind genome-wide association studies that routinely uncover novel genetic risk factors for diverse diseases. Instead, epidemiology typically focuses on candidate risk factors. Since modern biobanks contain thousands of potential risk factors, candidate approaches may introduce bias, inadequately control for multiple testing, and overlook important signals. Doublethink, a novel model-averaged hypothesis testing approach, offers a solution that simultaneously controls the Bayesian false discovery rate (FDR) and frequentist familywise error rate (FWER) while accounting for uncertainty in variable selection. Here we investigate direct risk factors for COVID-19 hospitalization from among 1,912 variables in 201,917 UK Biobank participants by implementing a Doublethink-based exposome-wide association study using Markov chain Monte Carlo. Focusing on the 2020 outbreak, we find nine individual variables and six groups of variables exposome-wide significant at 9% FDR and 0.05% FWER. We identify significant direct effects among relatively overlooked risk factors including psychiatric disorders, dementia and prior infection, which we evaluate in relation to studies of other populations. We detect significant direct effects among some commonly reported risk factors like age, sex and obesity, but not others like diabetes, cardiovascular disease and hypertension, which may be mediated instead through variables representing general comorbidity. Doublethink produces interchangeable posterior odds and p-values for individual variables and arbitrary groups, facilitating flexible and powerful post-hoc hypothesis testing. We discuss the potential impact and limitations of joint Bayesian-frequentist hypothesis testing, including the benefits of an agnostic exposome-wide approach to discovery.

See also Fryer, Arning and Wilson (2025).