Reanalysis of Chinese Treponema pallidum samples: all Chinese samples cluster with SS14-like group of syphilis-causing treponemes
BMC Research Notes volume 11, Article number: 16 (2018)
Treponema pallidum subsp. pallidum (TPA) is the causative agent of syphilis. Genetic analyses of TPA reference strains and human clinical isolates have revealed two genetically distinct groups of syphilis-causing treponemes, called Nichols-like and SS14-like groups. So far, no genetic intermediates, i.e. strains containing a mixed pattern of Nichols-like and SS14-like genomic sequences, have been identified. Recently, Sun et al. (Oncotarget 2016. https://doi.org/10.18632/oncotarget.10154) described a new “phylogenetic group” (called Lineage 2) among Chinese TPA strains. This lineage exhibited a “mosaic genomic structure” of Nichols-like and SS14-like lineages.
We reanalyzed the primary sequencing data (Project Number PRJNA305961) from the Sun et al. publication with respect to the molecular basis of Lineage 2. While Sun et al. based the analysis on several selected genomic single nucleotide variants (SNVs) and a subset of highly variable but phylogenetically poorly informative genes, which may confound the phylogenetic analysis, our reanalysis primarily focused on a complete set of whole genomic SNVs. Based on our reanalysis, only two separate TPA clusters were identified: one consisted of Nichols-like TPA strains, the other was formed by the SS14-like TPA strains, including all Chinese strains.
The bacterium Treponema pallidum subsp. pallidum (TPA) is the causative agent of syphilis. Other subspecies comprise Treponema pallidum subsp. pertenue (TPE) and Treponema pallidum subsp. endemicum (TEN), the causative agents of yaws and bejel, respectively. Since the pathogenic treponemes cannot be continuously cultivated under in vitro conditions, much of our understanding of these pathogens comes from the accumulation of genetic and genomic data. As previously shown by whole genome fingerprinting, analysis of several treponemal specific loci [3,4,5], whole genome sequence alignments of TPA reference strains [1, 6], and targeted whole genome sequencing of clinical isolates [7, 8], there is evidence for two separate genetic subclusters within TPA treponemes, called Nichols-like and SS14-like, which differ considerably at the DNA level (~ 400 nt differences).
Recently, Sun et al. sequenced and analyzed eight TPA samples (propagated in rabbit testes) from syphilis-positive Chinese patients and compared them to the available treponemal genomes. Based on their results, a new “phylogenetic group” of TPA strains was identified and named Lineage 2 (Lineage 1 = Nichols-like, Lineage 3 = SS14-like). This Lineage 2 exhibited a “mosaic genomic structure” characterized by the insertion of Lineage 1-derived genes into the Lineage 3-derived genomic backbone. The authors also indicated that the analyzed Chinese TPA strains (Lineage 2) might be derived from recombination or lateral gene transfer events between Lineage 1 and Lineage 3. To date, no genetic intermediates, i.e. strains containing a mixed pattern of Nichols-like and SS14-like genomic sequences, have been identified. Therefore, evidence for a new phylogenetic lineage would provide new insight into the diversity and the phylogenetic relationships of TPA strains.
We reanalyzed the primary sequencing data from Sun et al. with respect to the molecular basis of Lineage 2 using available SRA data (Illumina HiSeq 2500, 151 bp paired-end; Project Number PRJNA305961) and available TPA reference genomes. In contrast to Sun et al., we used BWA MEM (instead of Bowtie2) and both the Nichols and SS14 reference genome sequences for the genomic alignments and SNV analysis, supplemented with a de novo genome assembly analysis. Briefly, sequencing data were pre-processed with Trimmomatic. Sequencing reads were mapped to the reference genomes using BWA MEM. The resulting mappings were post-processed with Samtools to exclude low-quality (MAPQ < 10), secondary, and not properly paired mappings. SNVs for the individual sequenced samples were called using FreeBayes. Hard filters were applied to keep only high-quality variants, as recommended by the FreeBayes authors, based on depth (DP > 5) and variant call quality (QUAL > 50). SNVs in the multiple whole genome alignment were called using NUCmer. The results were used in the phylogenetic analysis and processed with a custom R script. Only SNVs detected in all analyzed samples were used in the analysis. For more details on the data collection and analysis, see Additional file 1.
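The hard-filtering step can be illustrated with a minimal Python sketch. It is not the authors' pipeline (their exact commands are in Additional file 1); it only applies the DP > 5 and QUAL > 50 thresholds given in the text to VCF-style lines. The function names and the example records are hypothetical, and the sketch assumes that the depth is reported as a `DP` key in the INFO column.

```python
def passes_hard_filter(vcf_line, min_depth=5, min_qual=50.0):
    """Return True if a VCF data line has DP > min_depth and QUAL > min_qual."""
    fields = vcf_line.rstrip("\n").split("\t")
    qual_text, info_text = fields[5], fields[7]
    if qual_text == ".":
        return False  # missing quality: cannot pass a quality threshold
    # INFO is a semicolon-separated list of KEY=VALUE pairs (bare flags are skipped).
    info = dict(item.split("=", 1) for item in info_text.split(";") if "=" in item)
    if "DP" not in info:
        return False
    return int(info["DP"]) > min_depth and float(qual_text) > min_qual


def filter_vcf(lines, min_depth=5, min_qual=50.0):
    """Keep header lines and the data lines that pass the hard filter."""
    return [
        line for line in lines
        if line.startswith("#") or passes_hard_filter(line, min_depth, min_qual)
    ]


# Hypothetical records, for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "ref\t100\t.\tA\tG\t812.3\t.\tDP=40;AF=1",   # passes both thresholds
    "ref\t250\t.\tC\tT\t12.0\t.\tDP=30;AF=0.5",  # fails QUAL > 50
    "ref\t900\t.\tG\tA\t300.0\t.\tDP=4;AF=1",    # fails DP > 5
]
filtered = filter_vcf(example)
```

Both comparisons are strict, following the thresholds as written in the text, so a variant with a depth of exactly 5 is rejected.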
In our analysis, we mapped 1.65–35.61% of all input read pairs to either the SS14 or the Nichols reference genome (see Additional file 2), with average coverage depth ranging between 58 and 1184× for both reference genomes (see Additional file 3). The remaining read pairs mapped primarily to the rabbit reference genome (57.01–89.80%; see Additional file 2). We achieved 99.19–99.35% and 98.87–99.08% genome coverage in the SS14 and Nichols genome alignments, respectively (see Additional file 3). The genomic regions which cannot be covered by uniquely mapped reads were located mainly in paralogous tpr genes (C, D, E, F, G, I, J, and K), RNA operons, and genes containing repetitive sequences, i.e. tp0433 (arp) and tp0470 (see Additional file 4). Overall mapping and genome coverage statistics calculated for the individual Chinese strains are summarized in additional files (see Additional files 2, 3 and 4).
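As an illustration of how such coverage figures are derived, the following Python sketch computes the breadth of coverage (percentage of reference positions with a depth of at least 1), the mean depth, and the list of uncovered intervals from a per-base depth vector. The function names and the toy depth values are hypothetical; in practice the depth vector would come from the post-processed mappings.

```python
def coverage_stats(depths):
    """Summarise a per-base depth list for one reference genome."""
    genome_length = len(depths)
    covered = sum(1 for depth in depths if depth >= 1)
    return {
        "breadth_percent": 100.0 * covered / genome_length,
        "mean_depth": sum(depths) / genome_length,
    }


def uncovered_intervals(depths):
    """Return 1-based, inclusive (start, end) intervals with zero depth."""
    intervals = []
    start = None
    for position, depth in enumerate(depths, start=1):
        if depth == 0 and start is None:
            start = position                      # a zero-depth run begins
        elif depth > 0 and start is not None:
            intervals.append((start, position - 1))  # the run ended at the previous base
            start = None
    if start is not None:                         # run extends to the end of the genome
        intervals.append((start, len(depths)))
    return intervals


# Toy depth vector for a 10-base "genome" (illustrative values only).
toy_depths = [12, 15, 0, 0, 0, 9, 20, 0, 31, 30]
stats = coverage_stats(toy_depths)
gaps = uncovered_intervals(toy_depths)
```

For real data, the uncovered intervals would be expected to fall mainly in the paralogous and repetitive regions listed above.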
Based on the multiple whole genome alignment of 14 TPA genome sequences, the number of identified SNVs obtained from the sequencing data for the individual Chinese samples ranged between 14 and 19 when compared to the SS14 reference genome and between 282 and 327 when compared to the Nichols reference genome (see Additional file 5). Moreover, an additional detailed variant calling analysis of the sequencing data of the individual Chinese samples (using FreeBayes) showed similar results (data not shown). Two separate branches, each with bootstrap support greater than 95%, were identified: one consisted of the Nichols, DAL-1, Chicago, and Sea 81-4 strains sharing the same phylogenetic cluster (Lineage 1), the other was formed by the SS14-like TPA strains, including all the Chinese strains (Lineage 3). The clustering of the Chinese samples within Lineage 3 is shown in Fig. 1. The clustering of the Chinese strains with genome sequences of clinical TPA isolates from different countries is shown in the supplementary material of Arora et al. (original Supplementary Figure 6).
In addition, we identified the tprD2 allele-specific sequence among the sequencing reads from the Chinese SRA data in all samples (see Additional file 6). While the Nichols reference genome harbors identical copies of tprC and tprD genes, the SS14 reference genome carries the tprD2 allele which is not identical to the tprC gene and differs from the tprD allele by more than 300 nucleotides. The sequence alignment of these alleles comprising the most variable region is shown in Additional file 7.
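The comparison behind these SNV counts can be sketched in a few lines of Python. This is a simplified illustration, not the NUCmer and R workflow used in the reanalysis: it assumes the genomes are already available as equal-length aligned strings, skips every alignment column in which any genome carries an "N", and counts the remaining differences between a sample and each reference. All names and sequences are hypothetical toy data.

```python
def informative_columns(alignment):
    """Indices of alignment columns in which no sequence has an 'N'."""
    length = len(next(iter(alignment.values())))
    return [
        i for i in range(length)
        if all(sequence[i] != "N" for sequence in alignment.values())
    ]


def snv_count(seq_a, seq_b, columns):
    """Number of differing positions between two aligned sequences."""
    return sum(1 for i in columns if seq_a[i] != seq_b[i])


def closest_reference(alignment, sample, references):
    """Return the reference with the fewest SNVs to `sample`, plus all counts."""
    columns = informative_columns(alignment)
    counts = {
        ref: snv_count(alignment[sample], alignment[ref], columns)
        for ref in references
    }
    return min(counts, key=counts.get), counts


# Toy alignment; "ref_A" and "ref_B" are stand-ins for two reference genomes.
toy_alignment = {
    "ref_A":    "ACGTACGTAC",
    "ref_B":    "ACGAACCTAT",
    "sample_1": "ACGAANCTAT",  # the N removes column 5 from every comparison
    "sample_2": "ACGAACCTAC",
}
best_1, counts_1 = closest_reference(toy_alignment, "sample_1", ["ref_A", "ref_B"])
best_2, counts_2 = closest_reference(toy_alignment, "sample_2", ["ref_A", "ref_B"])
```

A sample with far fewer differences to one reference than to the other, as reported above for the Chinese samples and the SS14 genome, is what one would expect from a member of that reference's group; the phylogenetic tree formalizes this comparison across all genomes at once.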
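One simple way to look for allele-specific sequence directly in raw reads, independent of any reference-based consensus, is a k-mer screen. The Python sketch below is an assumption-laden illustration rather than the method used in the reanalysis (which relied on de novo assembly and sequence alignment): it collects k-mers that occur in one allele but not in the other and counts the reads that contain at least one of them on either strand. The sequences, the value of k, and the function names are all hypothetical.

```python
def kmers(sequence, k):
    """Set of all substrings of length k in `sequence`."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def allele_specific_kmers(target, other, k):
    """k-mers present in `target` but absent from `other`."""
    return kmers(target, k) - kmers(other, k)


def reverse_complement(sequence):
    """Reverse complement of a DNA string (non-ACGT characters are kept as-is)."""
    return sequence.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_supporting_reads(reads, specific, k):
    """Count reads containing at least one allele-specific k-mer on either strand."""
    supported = 0
    for read in reads:
        for strand in (read, reverse_complement(read)):
            if kmers(strand, k) & specific:
                supported += 1
                break  # count each read once
    return supported


# Toy alleles differing in their central region (illustrative only).
allele_x = "ACGTTGCAAGGCTA"
allele_y = "ACGTTGCTTGGCTA"
K = 5
specific_y = allele_specific_kmers(allele_y, allele_x, K)

toy_reads = [
    "GTTGCTTGG",  # forward-strand read from allele_y
    "AGCCAAGCA",  # reverse-strand read from allele_y
    "GTTGCAAGG",  # read from allele_x
    "ACGTTGC",    # read from the region shared by both alleles
]
n_supporting = count_supporting_reads(toy_reads, specific_y, K)
```

With real data, k would need to be long enough that the allele-specific k-mers do not occur elsewhere in the treponemal or host genomes.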
Moreover, all Chinese samples were found to contain indels identical to those identified in the SS14 genome (see Additional file 6) when compared to the Nichols-like TPA strains.
The analysis of two loci, tp0136 and tp0548, which potentially differentiate the Nichols-like and SS14-like groups of TPA strains and isolates [11, 12], revealed that these genes were identical to the corresponding SS14 orthologous genes (see Additional file 8). In addition, the analysis of selected genes differentiating the Nichols and SS14 reference strains at three or more nucleotide positions (tp0131, tp0136, tp0179, tp0304, tp0346, tp0462, tp0488, tp0515, tp0548, tp0558, and tp1031) showed them to be identical or nearly identical to those of the SS14 strain (data not shown).
The identification of two genetically distinct TPA lineages has been described by earlier genetic studies [5, 6] and these findings have been supported by recent whole genome sequencing studies [7, 8]. In Arora et al., the phylogenetic analysis of 28 clinical isolates from different countries showed a clear separation of TPA isolates into two clusters, SS14-like and Nichols-like, although the Nichols-like cluster revealed greater variability. Moreover, Pinto et al. described three clades: SS14-like (clade I), Nichols-like (clade II) and clade III (represented by only a single genome, that of the TPA Sea 81-4 strain). It remains to be clarified whether this putative clade III will be supported by additional strains in the future. However, the Sea 81-4 strain shares the same phylogenetic cluster as the Nichols-like TPA strains (Fig. 1). Until now, no genetic intermediates having a mosaic structure of Nichols-like and SS14-like nucleotide sequences within TPA strains have been identified.
Sun et al. described a new Lineage 2 of TPA based on a phylogenetic analysis of several genomic SNVs and sequences of tpr genes, presenting the Chinese strains as SS14-like strains containing recombined sequences originating from Nichols-like strains. This mosaic structure of Lineage 2 was characterized by the insertion of Lineage 1-derived genes (in particular tprC, D, G, and J genes) into the Lineage 3-derived genomic backbone.
There are, however, several issues in the analyses presented by Sun et al. The authors reported 99.99% genome coverage for all samples using the TPA Nichols reference genome (original Table 1 in Sun et al.). However, TPA genomes contain several repetitive regions (representing ~ 1% of the genome length) that cannot be covered by uniquely mapped reads. These regions comprise mainly tpr genes and RNA operons. Long-read sequencing, such as Pacific Biosciences, Oxford Nanopore or even Roche/454, could help to sequence repeat-containing and paralogous regions, but it was not performed by Sun et al. The Bowtie2 settings used by the authors, without proper post-processing, caused treponemal reads to be mapped to wrong genomic locations and host (rabbit) reads to be mapped to the treponemal reference genome. The read mapping stringency, together with the use of an inappropriate reference sequence (TPA Nichols), resulted in false chimeric sequences, designated as “Lineage 2”. Inappropriate consensus assembly conditions, combined with the absence of de novo assembly, also meant that the tprD2 allele-specific sequences present in the raw data were overlooked and filtered out during assembly. Unlike the SS14 genome, which contains the tprC and tprD2 alleles, the Nichols reference genome harbors identical alleles in the tprC and tprD genes.
Sun et al. (original Figure 4A) used alignment settings and phylogenetic trees of tprC/D loci to draw evolutionary inferences. The use of tpr genes alone to disentangle evolutionary relationships is problematic since these genes are likely subject to intra-strain genomic recombination events [3, 14,15,16] as well as selection pressures, which may confound phylogenetic analyses.
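The kind of post-processing whose absence is criticized here can be sketched as a filter on SAM records. The Python example below is a minimal illustration, not the authors' exact procedure: it applies the three criteria named in the methods (MAPQ below 10, secondary alignments, and not properly paired reads are excluded) using the standard SAM FLAG bits. The function name and the SAM lines are hypothetical. It corresponds roughly to what `samtools view -q 10 -f 2 -F 256` expresses, although the command actually used in the reanalysis is not stated in the text.

```python
# Standard SAM FLAG bits (SAM format specification).
FLAG_PROPER_PAIR = 0x2    # each segment properly aligned according to the aligner
FLAG_SECONDARY = 0x100    # secondary alignment


def keep_alignment(sam_line, min_mapq=10):
    """Return True if a SAM record passes the three post-processing filters."""
    fields = sam_line.split("\t")
    flag, mapq = int(fields[1]), int(fields[4])
    if flag & FLAG_SECONDARY:
        return False                 # drop secondary alignments
    if not flag & FLAG_PROPER_PAIR:
        return False                 # drop reads that are not properly paired
    return mapq >= min_mapq          # drop low-quality mappings (MAPQ < 10)


# Hypothetical SAM records: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
toy_sam = [
    "r1\t99\tref\t100\t60\t8M\t=\t200\t108\tACGTACGT\tIIIIIIII",   # proper pair, unique
    "r2\t355\tref\t500\t60\t8M\t=\t600\t108\tACGTACGT\tIIIIIIII",  # secondary alignment
    "r3\t99\tref\t700\t3\t8M\t=\t800\t108\tACGTACGT\tIIIIIIII",    # MAPQ below 10
    "r4\t65\tref\t900\t60\t8M\t=\t5000\t4108\tACGTACGT\tIIIIIIII", # not properly paired
]
kept_names = [line.split("\t")[0] for line in toy_sam if keep_alignment(line)]
```

Reads that map equally well to paralogous tpr genes or RNA operons typically receive a low MAPQ, so a filter of this kind removes them instead of placing them at an arbitrary copy, which is one reason why complete genome coverage should not be expected from short reads.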
To date, all clinical isolates typed using the tp0136 and tp0548 genes, which are routinely used in the sequencing-based molecular typing scheme of syphilis-causing strains, grouped with either the Nichols-like or the SS14-like TPA group [11, 12, 17, 18]. Moreover, the more widely used enhanced CDC typing scheme, which sequences an 83 nt-long fragment of the tp0548 gene, showed that 94.5% of 1974 characterized clinical isolates from different countries belong to the SS14-like group, which is consistent with the findings related to a recent spread of an epidemic cluster. For all the Chinese strains, tp0136 and tp0548 genes together with 11 other loci (differentiating Nichols and SS14 reference strains at three or more nucleotide positions) were shown to be identical or nearly identical to the SS14-like TPA strains.
Our reanalysis was based on all whole genomic SNVs rather than a subset of several genomic SNVs and highly variable but phylogenetically poorly informative genes (i.e. tpr genes), which confounded the analysis of Sun et al. and resulted in misleading conclusions. Based on the whole genome SNV reanalysis, only two separate clusters were identified: one consisted of Nichols-like TPA strains, the other was formed by the SS14-like TPA strains, including all the Chinese strains. Our data clearly showed that all Chinese strains clustered within SS14-like TPA strains.
Only available SRA data deposited in the NCBI SRA database (Project Number PRJNA305961) were reanalysed.
Only ~ 99% of genome coverage can be achieved for all TPA Chinese strains due to several repetitive regions which cannot be covered by uniquely mapped reads.
Paralogous and repetitive genomic regions comprising tpr genes were excluded during the processing of the sequencing data; therefore, the mosaic structure identified in tpr genes by Sun et al. cannot be proved.
TPA: Treponema pallidum subsp. pallidum
TPE: Treponema pallidum subsp. pertenue
TEN: Treponema pallidum subsp. endemicum
SRA: sequence read archive
SNV: single nucleotide variant
Šmajs D, Norris SJ, Weinstock GM. Genetic diversity in Treponema pallidum: implications for pathogenesis, evolution and molecular diagnostics of syphilis and yaws. Infect Genet Evol. 2012;12:191–202.
Mikalová L, Strouhal M, Čejková D, Zobaníková M, Pospíšilová P, Norris SJ, et al. Genome analysis of Treponema pallidum subsp. pallidum and subsp. pertenue strains: most of the genetic differences are localized in six regions. PLoS ONE. 2010;5:e15713.
Gray RR, Mulligan CJ, Molini BJ, Sun ES, Giacani L, Godornes C, et al. Molecular evolution of the tprC, D, I, K, G and J genes in the pathogenic genus Treponema. Mol Biol Evol. 2006;23:2220–33.
Harper KN, Ocampo PS, Steiner BM, George RW, Silverman MS, Bolotin S, et al. On the origin of treponematoses: a phylogenetic approach. PLoS Negl Trop Dis. 2008;2:e148.
Nechvátal L, Pětrošová H, Grillová L, Pospíšilová P, Mikalová L, Strnadel R, et al. Syphilis-causing strains belong to separate SS14-like or Nichols-like groups as defined by multilocus analysis of 19 Treponema pallidum strains. Int J Med Microbiol. 2014;304:645–53.
Pětrošová H, Pospíšilová P, Strouhal M, Čejková D, Zobaníková M, Mikalová L, et al. Resequencing of Treponema pallidum ssp. pallidum strains Nichols and SS14: correction of sequencing errors resulted in increased separation of syphilis treponeme subclusters. PLoS ONE. 2013;8:e74319.
Arora N, Schuenemann VJ, Jäger G, Peltzer A, Seitz A, Herbig A, et al. Origin of modern syphilis and emergence of a contemporary pandemic cluster. Nat Microbiol. 2016;2:16245.
Pinto M, Borges V, Antelo M, Pinheiro M, Nunes A, Azevedo J, et al. Genome-scale analysis of the non-cultivable Treponema pallidum reveals extensive within-patient genetic variation. Nat Microbiol. 2016;2:16190.
Sun J, Meng Z, Wu K, Liu B, Zhang S, Liu Y, et al. Tracing the origin of Treponema pallidum in China using next-generation sequencing. Oncotarget. 2016. https://doi.org/10.18632/oncotarget.10154.
Centurion-Lara A, Giacani L, Godornes C, Molini BJ, Brinck Reid T, Lukehart SA. Fine analysis of genetic diversity of the tpr gene family among treponemal species, subspecies and strains. PLoS Negl Trop Dis. 2013;7:e2222.
Flasarová M, Pospíšilová P, Mikalová L, Vališová Z, Dastychová E, Strnadel R, et al. Sequencing-based molecular typing of Treponema pallidum strains in the Czech Republic: all identified genotypes are related to the sequence of the SS14 strain. Acta Derm Venereol. 2012;92:669–74.
Grillová L, Pětrošová H, Mikalová L, Strnadel R, Dastychová E, Kuklová I, et al. Molecular typing of Treponema pallidum in the Czech Republic during 2011 to 2013: increased prevalence of identified genotypes and of isolates with macrolide resistance. J Clin Microbiol. 2014;52:3693–700.
Giacani L, Iverson-Cabral SL, King JC, Molini BJ, Lukehart SA, Centurion-Lara A. Complete genome sequence of the Treponema pallidum subsp. pallidum Sea81-4 strain. Genome Announc. 2014;2:e00333-14.
Centurion-Lara A, LaFond RE, Hevner K, Godornes C, Molini BJ, Van Voorhis WC, et al. Gene conversion: a mechanism for generation of heterogeneity in the tprK gene of Treponema pallidum during infection. Mol Microbiol. 2004;52:1579–96.
Giacani L, Molini BJ, Kim EY, Godornes BC, Leader BT, Tantalo LC, et al. Antigenic variation in Treponema pallidum: TprK sequence diversity accumulates in response to immune pressure during experimental syphilis. J Immunol. 2010;184:3822–9.
Giacani L, Brandt SL, Puray-Chavez M, Reid TB, Godornes C, Molini BJ, et al. Comparative investigation of the genomic regions involved in antigenic variation of the TprK antigen among treponemal species, subspecies, and strains. J Bacteriol. 2012;194:4208–25.
Gallo Vaulet L, Grillová L, Mikalová L, Casco R, Rodríguez Fermepin M, Pando MA, et al. Molecular typing of Treponema pallidum isolates from Buenos Aires, Argentina: frequent Nichols-like isolates and low levels of macrolide resistance. PLoS ONE. 2017;12:e0172905.
Mikalová L, Grillová L, Osbak K, Strouhal M, Kenyon C, Crucitti T, et al. Molecular typing of syphilis-causing strains among human immunodeficiency virus-positive patients in Antwerp, Belgium. Sex Transm Dis. 2017;44(6):376–9.
Marra C, Sahi S, Tantalo L, Godornes C, Reid T, Behets F, et al. Enhanced molecular typing of Treponema pallidum: geographical distribution of strain types and association with neurosyphilis. J Infect Dis. 2010;202:1380–8.
Conceptualization: MS, JO, NA, KN, FGC, DS. Formal analysis: MS, JO, NA, KN, FGC. Funding acquisition: MS, LM, DS. Investigation: MS, JO, LM, NA, KN, FGC, DS. Writing—original draft: MS, JO, DS. Writing—review and editing: MS, JO, LM, NA, KN, FGC, DS. All authors read and approved the final manuscript.
Access to computing and storage facilities, owned by parties and projects contributing to the National Grid Infrastructure MetaCentrum, provided under the program “Projects of Large Research, Development, and Innovations Infrastructures” (CESNET LM2015042), is greatly appreciated. We acknowledge the CF New Generation Sequencing Bioinformatics supported by the CIISB research infrastructure (LM2015043 funded by MEYS CR) for their support with obtaining scientific data presented in this paper. We also thank Thomas Secrest (Secrest Editing, Ltd.) for his assistance with the English revision of the manuscript.
The authors declare that they have no competing interests.
Availability of data and materials
Sequencing data for eight Chinese TPA strains (SHC-0, SHD-R, SHE-V, SHG-I2, B3, C3, K3, and Q3) analyzed during the current study are available in the NCBI SRA database, Project Number PRJNA305961 (https://www.ncbi.nlm.nih.gov/sra/?term=PRJNA305961). All data generated during this study are included in this article and its Additional information files. The datasets designated as “not shown” are available from the corresponding author on request.
Consent for publication
Ethics approval and consent to participate
This work was supported by grants from the Grant Agency of the Czech Republic to DS and MS (GA17-25455S, GJ17-25589Y), by grant from the Czech Health Research Council to DS (17-31333A), and by funds from the Faculty of Medicine of the Masaryk University to junior researchers LM and MS.
Data analysis and methods used in the reanalysis of Chinese Treponema pallidum samples.
Mapping statistics of input read pairs mapped to the reference genomes. Sequencing reads derived from the Chinese strain SRA data were mapped to the Treponema pallidum subsp. pallidum (TPA) SS14 and Nichols reference genomes and to the rabbit genome. Statistics were calculated from post-processed mappings; repetitive and homologous sequences and PCR duplicated reads were excluded from the statistics.
Genome coverage statistics for individual Chinese strains. Sequencing reads derived from the Chinese strain SRA data were mapped to the TPA SS14 and Nichols reference genomes. The number of bases with a coverage depth of 1 or more, the number of bases with more than 10× coverage depth, and the average/median coverage depth are shown. Statistics were calculated from previously post-processed mappings; repetitive and homologous regions and PCR duplicated reads were excluded from statistical analysis.
A. Genome coverage statistics: number of non-covered bases for individual Chinese strains. The SS14 genome (CP004011.1) was used as a reference for mapping; coordinates are according to CP004011.1. The list includes all positions without any coverage by mapped reads. B. Genome coverage statistics: number of non-covered bases for individual Chinese strains. The Nichols genome (CP004010.2) was used as a reference for mapping; coordinates are according to CP004010.2. The list includes all positions without any coverage by mapped reads.
Number of SNVs from whole genome alignments produced by NUCmer. Only SNVs detected in all analyzed genomes (i.e., positions with the “N” base in any of the compared genomes were not considered) were used in the analysis. Genes tp0433 (arp), tp0470, and tp0897 (tprK) were excluded from analyses. Chinese strains are shown in bold.
Analysis of indels (deletions/insertions) between SS14-like and Nichols-like TPA strains. The Nichols genome (CP004010.2) was used as a reference for the comparison of TPA strains.
Alignment of tprD/tprD2 alleles. The tprD and tprD2 alleles were downloaded from the NCBI GenBank database for the reference Nichols and SS14 TPA strains, CP004010.2 and CP004011.1, respectively. While the Nichols reference genome harbors identical copies of tprC and tprD genes, the SS14 reference genome carries the tprD2 allele, which is not identical to the tprC gene and differs from the tprD allele by roughly 320 nucleotides. As shown in the alignment, we were able to identify the tprD2 allele (in positions 800–1791 according to the SS14 tprD2 allele) among the sequencing reads from the Chinese SRA data. The alignment was performed using SeqMan software (DNASTAR, Madison, WI, USA).
The phylogenetic trees of the tp0136 and tp0548 genes. The phylogenetic trees were constructed using the Maximum Likelihood method based on the Tamura–Nei model. The bar scale represents the number of substitutions per site. The analysis involved 14 TPA nucleotide sequences including eight derived from the Chinese samples: SHC-0, SHD-R, SHE-V, SHG-I2, B3, C3, K3, and Q3. The T. pallidum subsp. pertenue Fribourg-Blanc sequence was used as an outgroup. There were totals of 1547 and 1317 positions in the final dataset for tp0136 and tp0548 genes, respectively. For both genes, two separate clusters were identified: one cluster of Nichols-like TPA strains (TPA Lineage 1), and a second cluster of SS14-like TPA strains including all tested Chinese strains (TPA Lineage 3). Both clusters were supported by bootstrap values greater than 95%.
Genomic SNVs used for phylogenetic analysis. List of SNVs used for construction of phylogenetic trees. Only SNVs detected in all analyzed genomes (i.e., positions with the “N” base in any of the compared genomes were not considered) were used in the analysis. Genes tp0433 (arp), tp0470, and tp0897 (tprK) were excluded from analyses. Altogether, 2444 unique SNV positions were identified when the TPE Fribourg-Blanc genome sequence was used as an outgroup. Coordinates (positions) and gene annotations are according to the SS14 genome (CP004011.1); “NA” = not annotated (IGR, intergenic region); “.” = deletion.
Cite this article
Strouhal, M., Oppelt, J., Mikalová, L. et al. Reanalysis of Chinese Treponema pallidum samples: all Chinese samples cluster with SS14-like group of syphilis-causing treponemes. BMC Res Notes 11, 16 (2018). https://doi.org/10.1186/s13104-017-3106-7
- Treponema pallidum
- Genome sequencing
- Phylogenetic analysis
- Single nucleotide variant