Whole-genome sequencing to delineate Mycobacterium tuberculosis outbreaks: a retrospective observational study

Walker, T. M., Ip, C. L. C., Harrell, R. H., Evans, J. T., Kapatai, G., Dedicoat, M. J., Eyre, D. W., Wilson, D. J., Hawkey, P. M., Crook, D. W., Parkhill, J., Harris, D., Walker, A. S., Bowden, R., Monk, P., Smith, E. G. and T. E. A. Peto (2012)
Lancet Infectious Diseases doi:10.1016/S1473-3099(12)70277-3. (pdf)

Background
Tuberculosis incidence in the UK has risen in the past decade. Disease control depends on epidemiological data, which can be difficult to obtain. Whole-genome sequencing can detect microevolution within Mycobacterium tuberculosis strains. We aimed to estimate the genetic diversity of related M tuberculosis strains in the UK Midlands and to investigate how this measurement might be used to investigate community outbreaks.

Methods
In a retrospective observational study, we used Illumina technology to sequence M tuberculosis genomes from an archive of frozen cultures. We characterised isolates into four groups: cross-sectional, longitudinal, household, and community. We measured pairwise nucleotide differences within hosts and between hosts in household outbreaks and estimated the rate of change in DNA sequences. We used the findings to interpret network diagrams constructed from 11 community clusters derived from mycobacterial interspersed repetitive-unit-variable-number tandem-repeat data.

Findings
We sequenced 390 separate isolates from 254 patients, including representatives from all five major lineages of M tuberculosis. The estimated rate of change in DNA sequences was 0.5 single nucleotide polymorphisms (SNPs) per genome per year (95% CI 0.3-0.7) in longitudinal isolates from 30 individuals and 25 families. Divergence is rarely higher than five SNPs in 3 years. 109 (96%) of 114 paired isolates from individuals and households differed by five or fewer SNPs. More than five SNPs separated isolates from none of 69 epidemiologically linked patients, two (15%) of 13 possibly linked patients, and 13 (17%) of 75 epidemiologically unlinked patients (three-way comparison exact p<0.0001). Genetic trees and clinical and epidemiological data suggest that super-spreaders were present in two community clusters.

Interpretation
Whole-genome sequencing can delineate outbreaks of tuberculosis and allows inference about direction of transmission between cases. The technique could identify super-spreaders and predict the existence of undiagnosed cases, potentially leading to early treatment of infectious patients and their contacts.