Unpublished Preprints

These papers have been, or are about to be, submitted for peer review.

Panton-Valentine leukocidin is the key determinant of Staphylococcus aureus pyomyositis in a bacterial genome-wide association study.

Young, B. C., Earle, S. G., Soeng, S., Sar, P., Kumar, V., Hor, S., Sar, V., Bousfield, R., Sanderson, N. D., Barker, L., Stoesser, N., Emary, K. R. W., Parry, C. M., Nickerson, E. K., Turner, P., Bowden, R., Crook, D., Wyllie, D., Day, N. P. J., Wilson, D. J. and C. E. Moore (2018)

Preprint: bioRxiv doi:10.1101/430538
Abstract: Pyomyositis is a severe bacterial infection of skeletal muscle, commonly affecting children in tropical regions and predominantly caused by Staphylococcus aureus. To understand the contribution of bacterial genomic factors to pyomyositis, we conducted a genome-wide association study of S. aureus cultured from 101 children with pyomyositis and 417 children with asymptomatic nasal carriage attending the Angkor Hospital for Children in Cambodia. We found a strong relationship between bacterial genetic variation and pyomyositis, with estimated heritability 63.8% (95% CI 49.2-78.4%). The presence of the Panton-Valentine leukocidin (PVL) locus increased the odds of pyomyositis 130-fold (p = 10^-17.9). The signal of association mapped both to the PVL-coding sequence and the sequence immediately upstream. Together these regions explained >99.9% of heritability. Our results establish staphylococcal pyomyositis, like tetanus and diphtheria, as critically dependent on expression of a single toxin and demonstrate the potential for association studies to identify specific bacterial genes promoting severe human disease.

GenomegaMap: within-species genome-wide dN/dS estimation from over 10,000 genomes.

D. J. Wilson and The CRyPTIC Consortium (2019)

Preprint: bioRxiv doi:10.1101/523316
Abstract: The dN/dS ratio provides evidence of adaptation or functional constraint in protein-coding genes by quantifying the relative excess or deficit of amino acid-replacing versus silent nucleotide variation. Inexpensive sequencing promises a better understanding of parameters such as dN/dS, but analysing very large datasets poses a major statistical challenge. Here I introduce genomegaMap for estimating within-species genome-wide variation in dN/dS, and I apply it to 3,979 genes across 10,209 tuberculosis genomes to characterize the selection pressures shaping this global pathogen. GenomegaMap is a phylogeny-free method that addresses two major problems with existing approaches: (i) it is fast no matter how large the sample size and (ii) it is robust to recombination, which causes phylogenetic methods to report artefactual signals of adaptation. GenomegaMap uses population genetics theory to approximate the distribution of allele frequencies under general, parent-dependent mutation models. Coalescent simulations show that substitution parameters are well-estimated even when genomegaMap's simplifying assumption of independence among sites is violated. I demonstrate the ability of genomegaMap to detect genuine signatures of selection at antimicrobial resistance-conferring substitutions in M. tuberculosis and describe a novel signature of selection in the cold-shock DEAD-box protein A gene deaD/csdA. The genomegaMap approach helps accelerate the exploitation of big data for gaining new insights into evolution within species.