Unpublished Preprints

These papers have been, or are about to be, submitted for peer review.

Severe infections emerge from the microbiome by adaptive evolution

Young, B. C., Wu, C.-H., Gordon, N. C., Cole, K., Price, J. R., Lui, E., Sheppard, A. E., Perera, S., Charlesworth, J., Golubchik, T., Iqbal, Z., Bowden, R., Massey, R. C., Paul, J., Crook, D. W., Peto, T. E. A., Walker, A. S., Llewelyn, M. J., Wyllie, D. H. and D. J. Wilson (2017)
bioRxiv doi:10.1101/116681 (preprint)

Bacteria responsible for the greatest global mortality colonize the human microbiome far more frequently than they cause severe infections. Whether mutation and selection within the microbiome accompany infection is unknown. We investigated de novo mutation in 1163 Staphylococcus aureus genomes from 105 infected patients with nose colonization. We report that 72% of infections emerged from the microbiome, with infecting and nose-colonizing bacteria showing parallel adaptive differences. We found 2.8-to-3.6-fold enrichments of protein-altering variants in genes responding to rsp, which regulates surface antigens and toxicity; agr, which regulates quorum-sensing, toxicity and abscess formation; and host-derived antimicrobial peptides. Adaptive mutations in pathogenesis-associated genes were 3.1-fold enriched in infecting but not nose-colonizing bacteria. None of these signatures were observed in healthy carriers or at the species level, suggesting disease-associated, short-term, within-host selection pressures. Our results show that infection, like a cancer of the microbiome, emerges through spontaneous adaptive evolution, raising new possibilities for diagnosis and treatment.

The harmonic mean p-value and model averaging by mean maximum likelihood

D. J. Wilson (2017)
bioRxiv doi:10.1101/171751 (preprint)

Analysis of 'big data' frequently involves statistical comparison of millions of competing hypotheses to discover hidden processes underlying observed patterns of data, for example in the search for genetic determinants of disease in genome-wide association studies (GWAS). Model averaging is a valuable technique for evaluating the combined evidence of groups of hypotheses, simultaneously testing multiple levels of groupings, and determining post hoc the optimal trade-off between group composition and significance. Here I introduce the harmonic mean p-value (HMP) for assessing model-averaged fit, which arises from a new method for model averaging by mean maximum likelihood (MAMML), underpinned by the generalized central limit theorem. Through a human GWAS for neuroticism and a joint human-pathogen GWAS for hepatitis C viral load, I show how HMP easily combines information to detect statistically significant signals among groups of individually nonsignificant hypotheses, enhancing the potential for scientific discovery. HMP and MAMML have broad implications for the analysis of large datasets by enabling model averaging for classical statistics.
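
As an illustration of the quantity named in this abstract, the following is a minimal Python sketch of a weighted harmonic mean of p-values. It assumes the usual harmonic-mean form (the sum of the weights divided by the sum of each weight over its p-value), with equal weights by default. The function name `harmonic_mean_p` and the example p-values are hypothetical, and the sketch omits the significance thresholds and adjustments derived in the preprint, so the returned value should not be read as a calibrated p-value on its own.

```python
from typing import Optional, Sequence


def harmonic_mean_p(
    pvalues: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Return the weighted harmonic mean of a group of p-values.

    Illustrative only: this is the raw (unadjusted) harmonic mean and
    does not apply any of the corrections described in the preprint.
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("at least one p-value is required")
    if weights is None:
        # Assumption: equal weights that sum to one.
        weights = [1.0 / n] * n
    if len(weights) != n:
        raise ValueError("weights and pvalues must have the same length")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in the interval (0, 1]")
    if any(w < 0.0 for w in weights) or sum(weights) <= 0.0:
        raise ValueError("weights must be non-negative with a positive sum")

    # Harmonic mean: sum of weights divided by the sum of weight / p.
    return sum(weights) / sum(w / p for w, p in zip(weights, pvalues))


if __name__ == "__main__":
    # Hypothetical group of four tests, three of them moderately small.
    group = [0.02, 0.03, 0.04, 0.9]
    print(f"harmonic mean of group: {harmonic_mean_p(group):.4f}")
```

Because each p-value enters through its reciprocal, the smallest p-values in a group dominate the result, which is how several individually modest signals can combine into a small group-level value.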