Doublethink: simultaneous Bayesian-frequentist model-averaged hypothesis testing
Fryer, H. R., Arning, N. and Wilson, D. J. (2025)
arXiv doi: 10.48550/arXiv.2312.17566 (preprint)
Establishing the frequentist properties of Bayesian approaches widens their appeal and offers new understanding. In hypothesis testing, Bayesian model averaging addresses the problem that conclusions are sensitive to variable selection. But Bayesian false discovery rate (FDR) guarantees are sensitive to subjective prior assumptions. Here we show that Bayesian model-averaged hypothesis testing is a closed testing procedure that controls the frequentist familywise error rate (FWER) in the strong sense. To quantify the FWER, we use the theory of regular variation and likelihood asymptotics to derive a chi-squared tail approximation for the model-averaged posterior odds. Convergence is pointwise as the sample size grows and, in a simplified setting subject to a minimum effect size assumption, uniform. The 'Doublethink' method computes simultaneous posterior odds and asymptotic $p$-values for model-averaged hypothesis testing. We explore Doublethink through a Mendelian randomization study and simulations, comparing it to approaches such as LASSO, stepwise regression, the Benjamini-Hochberg procedure, the harmonic mean $p$-value and $e$-values. We consider the limitations of the approach, including finite-sample inflation, and mitigations, such as testing groups of correlated variables. We discuss the benefits of Doublethink, including post-hoc variable selection, and its wider implications for the theory and practice of hypothesis testing.
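The core pipeline of the abstract, model-averaging posterior odds over variable subsets and calibrating them with a chi-squared tail, can be sketched schematically. This is a minimal illustration, not the paper's method: marginal likelihoods are approximated by BIC, the model prior is flat, and the chi-squared (df = 1) calibration of the odds is a leading-order stand-in for the tail approximation the paper actually derives (whose constants differ). All function names and data here are illustrative assumptions.

```python
import itertools
import math
import numpy as np

def log_marginal_bic(y, design):
    # BIC approximation to the log marginal likelihood of a Gaussian
    # linear model with the given design matrix (a common, crude stand-in
    # for a proper marginal likelihood).
    n = len(y)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    loglik = -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)
    return loglik - 0.5 * design.shape[1] * math.log(n)

def logsumexp(vals):
    # Numerically stable log(sum(exp(vals))).
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def model_averaged_posterior_odds(y, X, target):
    # Posterior odds that predictor `target` belongs in the model,
    # averaged over all subsets of the remaining predictors
    # (flat prior over models).
    n, p = X.shape
    ones = np.ones((n, 1))
    others = [j for j in range(p) if j != target]
    with_t, without_t = [], []
    for r in range(len(others) + 1):
        for sub in itertools.combinations(others, r):
            base = [X[:, [j]] for j in sub]
            d_out = np.hstack([ones] + base)                    # target excluded
            d_in = np.hstack([ones, X[:, [target]]] + base)     # target included
            without_t.append(log_marginal_bic(y, d_out))
            with_t.append(log_marginal_bic(y, d_in))
    return math.exp(logsumexp(with_t) - logsumexp(without_t))

def asymptotic_p_value(odds):
    # Schematic chi-squared (df=1) tail calibration of the posterior odds;
    # the paper's derivation gives the exact constants, which differ from
    # this leading-order sketch.
    stat = max(2.0 * math.log(odds), 0.0)
    return math.erfc(math.sqrt(stat / 2.0))  # chi2(1) survival function

# Toy data: only predictor 0 truly affects the response.
rng = np.random.default_rng(0)
n = 200
X = rng.standard_normal((n, 3))
y = 2.0 * X[:, 0] + rng.standard_normal(n)

odds0 = model_averaged_posterior_odds(y, X, target=0)
p0 = asymptotic_p_value(odds0)
```

Because the odds for the target variable are averaged over every configuration of the other predictors, the conclusion about predictor 0 does not hinge on one particular variable selection, which is the sensitivity the abstract says model averaging addresses.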
See also Arning, Fryer and Wilson (2025).