Overview


Note this version of the tutorial addresses an error in the original paper in which the harmonic mean p-value for set \(\mathcal{R}\) is compared against a threshold \(\alpha_{|\mathcal{R}|}\), whereas \(\alpha_L\) should be used. See correction: http://blog.danielwilson.me.uk/2019/07/correction-harmonic-mean-p-value-for.html


The harmonic mean p-value (HMP) is a method for performing a combined test of the null hypothesis that no p-value is significant (Wilson, 2019). Unlike Fisher’s (1934) method, it is robust to dependence between the p-values, making it much more broadly applicable. Like Bonferroni correction, the HMP controls the strong-sense family-wise error rate (ssFWER), but it is potentially much more powerful. It is also more powerful than the BH procedure (Benjamini and Hochberg, 1995) which controls both the weak-sense family-wise error rate (wsFWER) and the false discovery rate (FDR), in the sense that the HMP tends to find one or more p-values or groups of p-values significant more often than BH procedure finds one or more p-values significant.

Method Robust to dependence Indicative power1 Controls
Significance very rare Significance uncommon Significance common FDR wsFWER ssFWER
Fisher \(\times\) \(\bullet\) \(\bullet\,\bullet\,\bullet\,\bullet\) \(\bullet\,\bullet\,\bullet\,\bullet\,\bullet\) \(\checkmark\) \(\checkmark\) \(\checkmark\)
HMP \(\checkmark\) \(\bullet\,\bullet\,\circ\) \(\bullet\,\bullet\,\bullet\) \(\bullet\,\bullet\,\bullet\,\bullet\) \(\checkmark\) \(\checkmark\) \(\checkmark\)
BH \(\checkmark\) \(\bullet\,\bullet\,\circ\) \(\bullet\,\bullet\,\circ\) \(\bullet\,\bullet\,\circ\) \(\checkmark\) \(\checkmark\) \(\times\)
Bonferroni \(\checkmark\) \(\bullet\,\bullet\,\circ\) \(\bullet\,\bullet\) \(\bullet\,\bullet\) \(\checkmark\) \(\checkmark\) \(\checkmark\)

There are two approaches to applying the HMP:

  1. One compares \(\overset{\circ}{p}_{\mathcal{R}}\), the HMP of a set of p-values \(\mathcal{R}\), to significance threshold \(\alpha_L\,w_\mathcal{R}\).

  2. Equivalently, one compares \(p_{\overset{\circ}{p}_{\mathcal{R}}}\), an asymptotically exact HMP, to the significance threshold \(\alpha\,w_\mathcal{R}\).

In the above,

  • \(\overset{\circ}{p}_{\mathcal{R}}=\left(\sum_{i\in\mathcal{R}}w_{i}\right)/\left(\sum_{i\in\mathcal{R}}w_{i}/p_{i}\right)\) defines the HMP for the set of p-values \(\mathcal{R}\). It is computed in R by \(\texttt{hmp.stat}\).

  • \(\alpha_L\) is a significance threshold that depends on the desired family-wise error rate \(\alpha\) and the total number of individual p-values \(L\). It is computed in R by \(\texttt{qharmonicmeanp}\).

  • \(w_1 \dots w_L\) are weights for the individual p-values that must sum to one, i.e. \(\sum_{i=1}^{L}w_{i}=1\), and \(w_\mathcal{R} = \sum_{i\in\mathcal{R}} w_i\) equals the sum of weights of the individual p-values in set \(\mathcal{R}\). The HMP is robust to the choice of weights, so it is reasonable to start with equal weights (\(w_{i}=1/L\)).

  • The asymptotically exact p-value is computed by R using the \(\texttt{p.hmp}\) command, which takes into account the total number of individual p-values, \(L\).

In this tutorial, I will focus on the second approach because one usually wishes to quote in reports a p-value that can be directly compared to the usual significance threshold, e.g. \(\alpha=0.05\). When the subscript \(\mathcal{R}\) is dropped from \(\overset{\circ}{p}_{\mathcal{R}}\) and \(p_{\overset{\circ}{p}_{\mathcal{R}}}\) it means \(\mathcal{R}\) includes all the p-values. This is the “headline” HMP.

The method may be used as follows:

  • The headline HMP is deemed significant when \(\overset{\circ}{p}\leq\alpha_L\) or, equivalently, \(p_{\overset{\circ}{p}}\leq\alpha\). Here significant means that we reject the null hypothesis that none of the p-values are significant.

  • If the headline HMP is not significant, neither is the HMP for any subset. If the headline HMP is significant, subsets may also be significant. The significance thresholds are all pre-determined so the number of subsets that are tested does not affect them.

  • The HMP for a subset of p-values is deemed significant when \(\overset{\circ}{p}_{\mathcal{R}}\leq\alpha_L\,w_{\mathcal{R}}\) or, equivalently, \(p_{\overset{\circ}{p}_{\mathcal{R}}}\leq\alpha\,w_{\mathcal{R}}\). Here significant means that we reject the null hypothesis that none of the p-values in subset \(\mathcal{R}\) are significant.

Preliminaries

To use this tutorial, copy and paste the R code from your web browser to the R console. In the HTML version, you can select and copy R code simply by clicking within the code snippet (as long as JQuery is enabled in your web browser and the page was compiled with Rmarkdown v2). To ensure the tutorial runs correctly, execute each code snippet once, in order. This tutorial was compiled using Rmarkdown (Allaire et al., 2018), with equations rendered by Mathjax.

If you haven’t already installed the package, type at the R console:

install.packages("harmonicmeanp")

Once you have installed the package, load it in the usual way:

library(harmonicmeanp)
## Loading required package: FMStable

Example 1. Sliding Window Analysis

Download the 312457 p-values from chromosome 12 of the genome-wide association study (GWAS) for neuroticism (Okbay et al., 2016). This file is an excerpt of http://ssgac.org/documents/Neuroticism_Full.txt.gz. For usage conditions see http://ssgac.org/documents/ReadMe_genetic_variants_associated_with_swb.txt. It took me a few seconds to download the data excerpt. The 8 megabyte file contains rs identifiers and SNP positions as per human genome build GRCh37/hg19 as well as the p-values.

system.time((gwas = read.delim("http://www.danielwilson.me.uk/files/Neuroticism_ch12.txt",
  header=TRUE,as.is=TRUE)))
##    user  system elapsed 
##   1.461   0.087   2.803
head(gwas)
##            rs    pos      p
## 1   rs7959779 149478 0.3034
## 2   rs4980821 149884 0.5905
## 3 rs192950336 150256 0.1125
## 4  rs61907205 151213 0.4896
## 5   rs2368809 151236 0.7066
## 6   rs4018398 151469 0.9420

The harmonic mean p-value (HMP) is a statistic with which one can perform a combined test of the null hypothesis that none of the p-values is significant even when the p-values are dependent. In GWAS, p-values will often be dependent because of genetic linkage. The HMP can be used to test the null hypothesis that no SNPs on chromosome 12 are significant. Let’s do it manually by first calculating the HMP, assuming equal weights. Note that a total of L=6524432 tests were performed genome-wide, so this number must be used to determine the weights if we are to control the genome-wide ssFWER, even though we are only analysing the 312457 SNPs on chromosome 12 in this example.

L = 6524432
gwas$w = 1/L
R = 1:nrow(gwas)
(HMP.R = sum(gwas$w[R])/sum(gwas$w[R]/gwas$p[R]))
## [1] 0.0008734522

One of the remarkable properties of the HMP is that for small values (e.g. below 0.05), the HMP can be directly interpreted as a p-value (Wilson, 2019). Since the HMP equals \(\overset{\circ}{p}_{\mathcal{R}}=0.0008734522\) it can be directly interpreted, suggesting it is strongly significant (before multiple testing correction). To test this formally, first the HMP significance threshold is computed. For that I will assume a false positive rate of \(\alpha=0.05\), i.e. 5%.

# Specify the false positive rate
alpha = 0.05
# Compute the HMP significance threshold
(alpha.L = qharmonicmeanp(alpha, L))
## [1] 0.02593083

The multiple testing-adjusted threshold against which to evaluate the significance of the combined test is determined by the sum of the weights for the p-values being combined. The HMP for subset \(\mathcal{R}\) is significant when \(\overset{\circ}{p}_\mathcal{R}\leq \alpha_L w_\mathcal{R}\).

# Test whether the HMP for subset R is significance
w.R = sum(gwas$w[R])
alpha.L * w.R
## [1] 0.001241835

Therefore we can reject the null hypothesis of no association on chromosome 12 at level \(\alpha=0.05\) because 0.0008734522 is below 0.001241835.

An equivalent approach is to calculate an asymptotically exact p-value based on the HMP.

# Use p.hmp instead to compute the HMP test statistic and
# calculate its asymptotically exact p-value in one step
pharmonicmeanp(HMP.R, L=L, lower.tail=TRUE)
## [1] 0.00089113

As you can see, the asymptotically exact p-value of \(p_{\overset{\circ}{p}_{\mathcal{R}}}=0.00089113\) is very close to the HMP of \(\overset{\circ}{p}_{\mathcal{R}}=0.0008734522\) because the HMP is much smaller than one. Note however that direct interpretation of the HMP is anti-conservative compared to the asymptotically exact test, which is why the HMP had to be compared directly to the more stringent threshold \(\alpha_L=0.02593083\). The asymptotically exact p-value can be computed in one step:

# Note that the p.hmp function has been redefined to take argument L. Omitting L will issue a warning.
R = 1:nrow(gwas)
p.hmp(gwas$p[R],gwas$w[R],L)
##      p.hmp 
## 0.00089113

The combined p-value for chromosome 12 is useful because if the combined p-value is not significant, neither is any constituent p-value, after multiple testing correction, as always. Conversely, if the combined p-value is significant, there may be one or more subsets of constituent p-values that are also significant. These subsets can be hunted down because another useful property of the HMP is that the significance thresholds of these further tests are the same no matter how many combinations of subsets of the constituent p-values are tested. Specifically, for any subset \(\mathcal{R}\) of the L p-values, the HMP is compared against a threshold \(\alpha_L\,w_{\mathcal{R}}\) (equivalently, the asymptotically exact HMP is compared against a threshold \(\alpha\,w_{\mathcal{R}}\)), where \(w_{\mathcal{\mathcal{R}}}=\sum_{i\in\mathcal{R}}w_{i}\) and the \(w_{i}\)s are the weights of the individual p-values, constrained to sum to one. Assuming equal weights, \(w_{i}=1/L\), meaning that \(w_{\mathcal{R}}=\left|\mathcal{R}\right|/L\) equals the fraction of all tests being combined. In what follows I will mainly use the asymptotically exact p-values, rather than directly interpreting the HMP.

For example, separately test the p-values occurring at even and odd positions on chromosome 12:

R = which(gwas$pos%%2==0)
p.hmp(gwas$p[R],gwas$w[R],L)
##        p.hmp 
## 0.0009190465
w.R = sum(gwas$w[R])
alpha*w.R
## [1] 0.001200587
R = which(gwas$pos%%2==1)
p.hmp(gwas$p[R],gwas$w[R],L)
##        p.hmp 
## 0.0008647167
w.R = sum(gwas$w[R])
alpha*w.R
## [1] 0.001193928

Unsurprisingly, in view of genetic linkage, the two tests are both significant: for even positions, the combined p-value was \(p_{\overset{\circ}{p}_{\mathcal{R}}}=0.0009190465\) which was less than the significance threshold of \(\alpha\,w_{\mathcal{R}}=0.001200587\) and for odd positions, the combined p-value was \(p_{\overset{\circ}{p}_{\mathcal{R}}}=0.0008647167\) which was less than the significance threshold of \(\alpha\,w_{\mathcal{R}}=0.001193928\).

Comparing p-values with different significance thresholds can be confusing. Instead, it is useful to calculate adjusted p-values, which are compared directly to the nominal significance threshold \(\alpha\). An adjusted p-value is simply divided by its weight w. For example:

R = which(gwas$pos%%2==0)
p.R = p.hmp(gwas$p[R],gwas$w[R],L)
w.R = sum(gwas$w[R])
(p.R.adjust = p.R/w.R)
##      p.hmp 
## 0.03827487
R = which(gwas$pos%%2==1)
p.R = p.hmp(gwas$p[R],gwas$w[R],L)
w.R = sum(gwas$w[R])
(p.R.adjust = p.R/w.R)
##      p.hmp 
## 0.03621311

Now it is easy to see that we can rule out the null hypotheses of no significant p-values for both even-numbered and odd-numbered positions, assuming \(\alpha=0.05\).

Of course it makes little sense to combine p-values according to whether their position is an even or odd number. Instead we might wish to test the first 156229 SNPs on chromosome 12 separately from the second 156228 SNPs to begin to narrow down regions of significance.

R = 1:156229
p.R = p.hmp(gwas$p[R],gwas$w[R],L)
w.R = sum(gwas$w[R])
(p.R.adjust = p.R/w.R)
##   p.hmp 
## 12.1041
R = 156230:312457
p.R = p.hmp(gwas$p[R],gwas$w[R],L)
w.R = sum(gwas$w[R])
(p.R.adjust = p.R/w.R)
##      p.hmp 
## 0.01858945

This is much clearer: only in the second half of the chromosome can we reject the null hypothesis of no significant p-values at the \(\alpha=0.05\) level. For the first half of the chromosome, the adjusted p-value was \(p_{\overset{\circ}{p}_{\mathcal{R}}}/w_{\mathcal{R}}=12.1041\). While p-values must be 1 or below, adjusted p-values need not be. For the second half of the chromosome, the adjusted p-value was \(p_{\overset{\circ}{p}_{\mathcal{R}}}/w_{\mathcal{R}}=0.01858945\) which is below the standard significance threshold of \(\alpha=0.05\).

Note that it was completely irrelevant that we had already performed tests of even- and odd-positioned SNPs: as mentioned above, the significance thresholds are pre-determined by the \(w_{\mathcal{R}}\)’s no matter how many subsets of p-values are tested and no matter in what combinations. We can test any subset of the p-values without incurring further multiple testing penalties. For example, let’s test 1 megabase windows overlaping at 0.2 megabase intervals. Testing overlapping versus non-overlapping windows has no effect on the significance thresholds, but of course it has an effect on the resolution of our conclusions and on the computational time.

# Define overlapping sliding windows of 1 megabase at 0.2 megabase intervals
win.1M.beg = outer(0:floor(max(gwas$pos/1e6)),(0:4)/5,"+")*1e6
# Calculate the combined p-values for each window
system.time({
  p.1M = apply(win.1M.beg,c(1,2),function(beg) {
    R = which(gwas$pos>=beg & gwas$pos<(beg+1e6))
    p.hmp(gwas$p[R],gwas$w[R],L)
  })
})
##    user  system elapsed 
##   3.576   0.719   4.298
# Calculate sums of weights for each combined test
system.time({
  w.1M = apply(win.1M.beg,c(1,2),function(beg) {
    R = which(gwas$pos>=beg & gwas$pos<(beg+1e6))
    sum(gwas$w[R])
  })
})
##    user  system elapsed 
##   3.371   0.783   4.175
# Calculate adjusted p-value for each window
p.1M.adj = p.1M/w.1M 

Now plot them

# Took a few seconds, plotting over 312k points
gwas$p.adj = gwas$p/gwas$w
plot(gwas$pos/1e6,-log10(gwas$p.adj),pch=".",xlab="Position on chromosome 12 (megabases)",
  ylab="Adjusted significance (-log10 adjusted p-value)",
  ylim=sort(-log10(range(gwas$p.adj,p.1M.adj,na.rm=TRUE)))) 
arrows(win.1M.beg/1e6,-log10(p.1M.adj),(win.1M.beg+1e6)/1e6,len=0,col="green2",lwd=2)
# Superimpose the significance threshold, alpha, e.g. alpha=0.05
abline(h=-log10(0.05),col="black",lty=2)
# For comparison, plot the conventional GWAS threshold of 5e-8. Need to convert
# this into the adjusted p-value scale. Instead of comparing each raw p-value
# against a Bonferonni threshold of alpha/L=0.05/6524432, we would be comparing each
# against 5e-8. So the adjusted p-values p/w=p*L would be compared against
# 5e-8*L = 5e-8 * 6524432 = 0.3262216
abline(h=-log10(0.3262216),col="grey",lty=3)