## Generalized mean *p*-values for combining dependent tests: comparison of generalized central limit theorem and robust risk analysis [version 1; peer review: 2 approved]

D. J. Wilson (2020)

*Wellcome Open Research* **5**:55

The test statistics underpinning several methods for combining *p*-values are special cases
of the generalized mean *p*-value (GMP), including the minimum (Bonferroni procedure), harmonic mean
and geometric mean. A key assumption influencing the practical performance of such methods
concerns the dependence between *p*-values. Approaches that do not require specific knowledge
of the dependence structure are practically convenient. Vovk and Wang derived significance
thresholds for GMPs under the worst-case scenario of arbitrary dependence using results from
Robust Risk Analysis (RRA).
Here I calculate significance thresholds and closed testing procedures using the Generalized
Central Limit Theorem (GCLT). GCLT formally assumes independence, but enjoys a degree of
robustness to dependence. The GCLT thresholds are less stringent than RRA thresholds, with the
disparity increasing as the exponent of the GMP (*r*) increases. I motivate a model of *p*-value
dependence based on a Wishart-Multivariate-Gamma distribution for the underlying log-likelihood
ratios. In simulations under this model, the RRA thresholds produced tests that were usually
less powerful than Bonferroni, while the GCLT thresholds produced tests more powerful than
Bonferroni, for all *r* > -∞. […] *r* > -1/2 […] *p*-value procedure and Simes' (1986) test
represent good compromises in the power-robustness trade-off for combining dependent tests.
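As a rough illustration of the statistics named above, the following minimal Python sketch computes the generalized mean of a set of *p*-values for a chosen exponent *r*. It recovers the minimum (*r* = -∞), the harmonic mean (*r* = -1) and the geometric mean (*r* = 0, taken as a limit), alongside the Bonferroni-adjusted minimum and Simes' (1986) combined *p*-value. This is not the author's code: the function names are hypothetical, and it assumes every *p*-value lies in (0, 1]. It also deliberately stops short of the RRA or GCLT significance thresholds, because a raw GMP is not in general a valid *p*-value and must be compared against a calibrated threshold.

```python
import math
from typing import Sequence


def generalized_mean_p(pvalues: Sequence[float], r: float) -> float:
    """Generalized (power) mean of p-values with exponent r (hypothetical helper).

    r = -inf -> minimum, r = -1 -> harmonic mean,
    r = 0 -> geometric mean (the limiting case), r = 1 -> arithmetic mean.
    """
    p = list(pvalues)
    if not p:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < x <= 1.0 for x in p):
        raise ValueError("p-values must lie in (0, 1]")
    n = len(p)
    if r == -math.inf:
        return min(p)
    if r == 0:
        # Geometric mean via the mean of logs, i.e. the r -> 0 limit.
        return math.exp(sum(math.log(x) for x in p) / n)
    # Direct evaluation; x ** r can overflow for very negative r and tiny p-values.
    return (sum(x ** r for x in p) / n) ** (1.0 / r)


def bonferroni_p(pvalues: Sequence[float]) -> float:
    """Bonferroni-adjusted minimum: n * min(p), capped at 1 (hypothetical helper)."""
    p = list(pvalues)
    return min(1.0, len(p) * min(p))


def simes_p(pvalues: Sequence[float]) -> float:
    """Simes (1986) combined p-value: min over i of n * p_(i) / i (hypothetical helper)."""
    p = sorted(pvalues)
    n = len(p)
    # The i = n term equals p_(n) <= 1, so no cap at 1 is needed.
    return min(n * p[i] / (i + 1) for i in range(n))


if __name__ == "__main__":
    p = [0.01, 0.04, 0.2, 0.5]
    for r in (-math.inf, -1, 0, 1):
        print(f"r = {r}: GMP = {generalized_mean_p(p, r):.4f}")
    print(f"Bonferroni: {bonferroni_p(p):.4f}, Simes: {simes_p(p):.4f}")
```

Because power means are non-decreasing in the exponent, the minimum is always the smallest of these statistics and the arithmetic mean the largest for the same data. Turning each statistic into a test with a controlled false positive rate is what the RRA or GCLT thresholds are for; a log-sum-exp formulation would be a more robust choice than direct evaluation for large negative *r*.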