Genome-based analysis of non-ribosomal peptide synthetase and type-I polyketide synthase gene clusters in all type strains of the genus Herbidospora

The genus Herbidospora comprises actinomycetes belonging to the family Streptosporangiaceae and currently contains five recognized species. Although other genera of this family often produce bioactive secondary metabolites, no secondary metabolites have yet been reported from Herbidospora strains. In the present study, to assess their potential as secondary metabolite producers, we sequenced the whole genomes of the five type strains and searched them for non-ribosomal peptide synthetase (NRPS) and type-I polyketide synthase (PKS) gene clusters, which encode the major secondary metabolite biosynthetic pathways in actinomycetes. The genome sizes of Herbidospora cretacea NBRC 15474T, Herbidospora mongoliensis NBRC 105882T, Herbidospora yilanensis NBRC 106371T, Herbidospora daliensis NBRC 106372T and Herbidospora sakaeratensis NBRC 102641T were 8.3, 9.0, 7.9, 8.5 and 8.6 Mb, respectively. Each genome contained 15–18 modular NRPS and PKS gene clusters. In total, 32 NRPS and PKS pathways were identified, of which 9 were conserved in all 5 strains, 8 were shared by 2–4 strains, and the remaining 15 were strain-specific. We predicted the chemical backbone structures of the non-ribosomal peptides and polyketides synthesized by these gene clusters from the module number and domain organization of the NRPSs and PKSs. The relationship between the 16S rRNA gene sequence-based phylogeny of the five strains and the distribution of their NRPS and PKS gene clusters is also discussed. The genomes of Herbidospora strains carry as many NRPS and PKS gene clusters as those of Streptomyces, and the products of these clusters have yet to be isolated. Herbidospora members are likely to synthesize large and diverse metabolites, many of whose chemical structures have yet to be reported. In addition to the clusters conserved within the genus, each strain possesses strain-specific gene clusters, indicating diversity in these pathways.
This diversity could be accounted for by genus-level vertical inheritance and by recent acquisition of these gene clusters during evolution. This genome analysis suggests that Herbidospora strains are an untapped and attractive source of novel secondary metabolites.


Background
Actinomycetes are rich sources of bioactive secondary metabolites. In particular, members of the genus Streptomyces have attracted attention as the most useful screening sources for new drug leads. Since the discovery of streptomycin from Streptomyces griseus, a large number of antibiotics have been identified from cultures of this genus [1,2]. Consequently, the chance of finding novel secondary metabolites in Streptomyces members has dwindled in recent years, and the focus of screening has shifted to less exploited genera of rare actinomycetes. For example, members of the family Streptosporangiaceae are reported to be a promising source, and many novel compounds have been isolated from genera in this family such as Streptosporangium [3].
The genus Herbidospora was established as a new genus of the family Streptosporangiaceae in 1993 and currently contains five species: Herbidospora cretacea, Herbidospora yilanensis, Herbidospora daliensis, Herbidospora sakaeratensis and Herbidospora mongoliensis [4–7]. Although this genus belongs to the family Streptosporangiaceae, no secondary metabolites have been reported from Herbidospora strains in over 20 years, which motivated us to assess the potential of Herbidospora members as secondary metabolite producers.
Recent actinomycete genome projects revealed that each actinomycete genome encodes various biosynthetic pathways, of which half to three quarters are non-ribosomal peptide synthetase (NRPS) and polyketide synthase (PKS) pathways [8]. This suggests that non-ribosomal peptides and polyketides are the major secondary metabolites of actinomycetes [8]. Non-ribosomal peptides, polyketides and their hybrid compounds often show pharmaceutically useful bioactivities, and many have been developed into drugs such as antibiotics, anticancer agents and immunosuppressants. Therefore, the NRPS and PKS genes of actinomycete strains are often assessed to screen for potential secondary metabolite producers [9,10].
The genes for the synthesis of each non-ribosomal peptide and/or polyketide are generally organized into a gene cluster, in which NRPS and PKS genes play the main roles in synthesizing the non-ribosomal peptide and polyketide chains, respectively. NRPSs and type-I PKSs are mega-synthases containing multiple catalytic domains organized into modules, each of which carries out one cycle of chain elongation. Typically, each module contains at least three domains: a condensation (C) domain, an adenylation (A) domain and a thiolation (T) domain in NRPS modules, and a ketosynthase (KS) domain, an acyltransferase (AT) domain and an acyl carrier protein (ACP) domain in type-I PKS modules. Optional domains may also be present in a module to chemically modify the growing chain. The products are assembled from simple building blocks such as acyl-CoA and amino-acid units according to an accepted model known as the assembly line rule [11]; therefore, the chemical structures of the peptide and polyketide backbones can be predicted from the domain organization of NRPS and PKS gene clusters, respectively.
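To make the assembly line rule concrete for type-I PKSs, the sketch below maps each module's optional reductive domains to the state it leaves at the β-carbon of the growing chain. It assumes a simplified colinear model in which every module (including the loading module) adds one two-carbon main-chain unit and all listed domains are active; the `PksModule` class, the substrate labels and the side-chain table are illustrative names, not part of any tool used in this study.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Side chain contributed by each extender unit (None = no branch); simplified, illustrative table.
EXTENDER_SIDE_CHAIN = {
    "malonyl-CoA": None,
    "methylmalonyl-CoA": "CH3",
    "ethylmalonyl-CoA": "C2H5",
}


@dataclass
class PksModule:
    """Hypothetical record of one type-I PKS module (assumes all domains are active)."""
    at_substrate: str                                  # predicted AT domain specificity
    optional: set[str] = field(default_factory=set)    # any of "KR", "DH", "ER"


def beta_carbon_state(module: PksModule) -> str:
    """Reduction state left at the beta-carbon by one module's optional domains."""
    opt = module.optional
    if {"KR", "DH", "ER"} <= opt:
        return "CH2"    # keto -> hydroxyl -> double bond -> fully reduced
    if {"KR", "DH"} <= opt:
        return "C=C"    # keto -> hydroxyl -> double bond
    if "KR" in opt:
        return "CH-OH"  # keto -> hydroxyl
    return "C=O"        # no KR, so the keto group is retained


def predict_backbone(modules: list[PksModule]) -> list[tuple[str, str | None]]:
    """One (beta-carbon state, side chain) pair per module, in assembly-line order."""
    return [(beta_carbon_state(m), EXTENDER_SIDE_CHAIN.get(m.at_substrate)) for m in modules]


def main_chain_carbons(modules: list[PksModule]) -> int:
    """Assumes each module, including the loading module, adds two main-chain carbons."""
    return 2 * len(modules)


example = [
    PksModule("malonyl-CoA"),
    PksModule("methylmalonyl-CoA", {"KR"}),
    PksModule("malonyl-CoA", {"KR", "DH"}),
    PksModule("malonyl-CoA", {"KR", "DH", "ER"}),
]
print(predict_backbone(example))
# [('C=O', None), ('CH-OH', 'CH3'), ('C=C', None), ('CH2', None)]
print(main_chain_carbons(example))  # 8
```

Under this model, seven modules would give a C14 main chain, and four DH-KR pairs would yield four double bonds, which is the kind of reasoning applied to the gene clusters described in the Results.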
In this study, we sequenced the whole genomes of all type strains of the genus Herbidospora, because no Herbidospora genome sequence had been registered in public databases when we began this study. We then examined the NRPS and type-I PKS gene clusters in these genome sequences and predicted the chemical backbone structures of their metabolites, to assess the potential of the genus as a source of secondary metabolites and to provide information on the novelty and diversity of its NRPS and PKS pathways. We also discuss how this diversity was acquired during the evolution of Herbidospora species, based on the relationship between the distribution of these pathways and the taxonomic position of each strain.

Whole-genome sequencing
Genomic DNA of H. cretacea NBRC 15474T, H. mongoliensis NBRC 105882T, H. yilanensis NBRC 106371T, H. daliensis NBRC 106372T and H. sakaeratensis NBRC 102641T was prepared from liquid-dried cells in ampoules supplied by the NBRC culture collection, using a Qiagen EZ1 tissue kit and an EZ1 Advanced instrument (Qiagen), and sequenced by paired-end sequencing on a MiSeq (Illumina). The sequence redundancy (coverage) of the five draft genomes ranged from 61.7- to 70.6-fold. The reads were assembled using Newbler version 2.6, and the assemblies were subsequently assessed using GenoFinisher [12].
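Sequence redundancy is the average sequencing depth, i.e. total sequenced bases divided by genome size. A minimal sketch of that arithmetic follows, counting each mate of a read pair as one read; the read count and read length in the usage line are hypothetical round numbers, not values from this study.

```python
def fold_coverage(read_count: int, mean_read_length: float, genome_size_bp: int) -> float:
    """Average depth (redundancy): total sequenced bases / genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return read_count * mean_read_length / genome_size_bp


# Hypothetical run: 2,000,000 reads of 250 bp against an 8.0 Mb genome.
print(fold_coverage(2_000_000, 250, 8_000_000))  # 62.5
```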

Analysis of NRPS and type-I PKS gene clusters
Coding sequences in the draft genome sequences were predicted using Prodigal version 2.6 [13]. NRPS and type-I PKS gene clusters were identified as previously reported [9,10]. PKS and NRPS genes containing only a single domain were considered atypical and excluded from the analysis; we focused on multi-domain genes.
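The single-domain exclusion, and the module counts used throughout the Results, can be sketched as simple filters over per-gene domain calls. The input format (a gene identifier mapped to an ordered list of domain labels) is a hypothetical representation, and counting one module per KS or A domain is a simplifying assumption that ignores split or incomplete modules.

```python
from __future__ import annotations


def multi_domain_genes(domain_calls: dict[str, list[str]], min_domains: int = 2) -> dict[str, list[str]]:
    """Keep NRPS/PKS genes with at least `min_domains` detected domains."""
    return {gene: doms for gene, doms in domain_calls.items() if len(doms) >= min_domains}


def count_modules(domains: list[str]) -> int:
    """Approximate module count: one per PKS KS domain plus one per NRPS A domain."""
    return sum(1 for d in domains if d in ("KS", "A"))


# Hypothetical domain calls for three genes.
calls = {
    "gene_1": ["C", "A", "T", "C", "A", "T", "TE"],  # two-module NRPS
    "gene_2": ["KS"],                                # single domain: excluded
    "gene_3": ["KS", "AT", "ACP", "C", "A", "T"],    # PKS/NRPS hybrid
}
kept = multi_domain_genes(calls)
print(sorted(kept))                                    # ['gene_1', 'gene_3']
print({g: count_modules(d) for g, d in kept.items()})  # {'gene_1': 2, 'gene_3': 2}
```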

Searches for orthologous gene clusters among strains
A BLASTP search was performed with the NCBI Protein BLAST program against the non-redundant protein sequence database. Genes from different strains were considered orthologous when their closest homologs in the BLASTP search were the same and their domain organizations were identical or nearly identical.
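The two-part orthology criterion (same closest BLASTP homolog and same domain organization) can be expressed as grouping genes by a shared key. This sketch requires exact identity of the domain string; the study also accepted "almost the same" organizations, which would need a tolerance rule that is not specified here. Strain names, gene identifiers and hit labels below are hypothetical placeholders.

```python
from __future__ import annotations

from collections import defaultdict


def group_orthologs(genes: list[tuple[str, str, str, str]]) -> dict[tuple[str, str], list[tuple[str, str]]]:
    """Group (strain, gene_id, best_hit, domain_string) records by (best_hit, domain_string).

    Groups spanning several strains are candidate orthologs; single-strain groups
    correspond to strain-specific genes.
    """
    groups: dict[tuple[str, str], list[tuple[str, str]]] = defaultdict(list)
    for strain, gene_id, best_hit, domains in genes:
        groups[(best_hit, domains)].append((strain, gene_id))
    return dict(groups)


records = [
    ("strain_A", "a_001", "hit_X", "C-A-T-C-A-T"),
    ("strain_B", "b_017", "hit_X", "C-A-T-C-A-T"),
    ("strain_C", "c_042", "hit_X", "C-A-T-TE"),   # same hit, different domains
    ("strain_C", "c_050", "hit_Y", "KS-AT-ACP"),
]
for key, members in group_orthologs(records).items():
    print(key, members)
```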

Prediction of metabolites derived from NRPS and/or type-I PKS gene clusters
We used antiSMASH (antibiotics and Secondary Metabolite Analysis Shell) [14], a web server for secondary metabolite gene cluster analysis, to predict the substrates of A domains and AT domains. Based on these substrates and the assembly line rule [11], we predicted the amino acid compositions of the peptide chains and the chemical structures of the polyketide chains synthesized by NRPS and type-I PKS gene clusters, respectively.
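Under the same colinear model, an NRPS product can be written down directly from the A domain substrate predictions in module order, one residue per module. In the sketch below, the substrate list is assumed to have been transcribed by hand from antiSMASH results (no particular antiSMASH output format is assumed), and `Xaa`, the conventional symbol for an unspecified amino acid, marks modules without a confident prediction.

```python
from __future__ import annotations

from collections import Counter


def predict_peptide(a_domain_substrates: list[str | None]) -> str:
    """One residue per NRPS module, in assembly-line order; None -> 'Xaa' (unpredicted)."""
    return "-".join(s.capitalize() if s else "Xaa" for s in a_domain_substrates)


def composition(a_domain_substrates: list[str | None]) -> Counter:
    """Residue counts of the predicted peptide, e.g. to compare with module counts."""
    return Counter(s.capitalize() if s else "Xaa" for s in a_domain_substrates)


# Hypothetical six-module NRPS with two Ser-specific A domains and four unpredicted ones.
modules = ["ser", None, None, "ser", None, None]
print(predict_peptide(modules))    # Ser-Xaa-Xaa-Ser-Xaa-Xaa
print(dict(composition(modules)))  # {'Ser': 2, 'Xaa': 4}
```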

Phylogenetic tree based on 16S rRNA gene sequences
16S rRNA gene sequences were downloaded from the 'Sequence Information' section of the NBRC Culture Catalogue [15] and aligned using ClustalX2 [16]. A phylogenetic tree was reconstructed using the neighbor-joining method [17], and the resulting tree topology was evaluated by bootstrap analysis [18]. The 16S rRNA gene sequence of Acrocarpospora corrugata NBRC 13972T was used as the outgroup.
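For readers unfamiliar with the neighbor-joining method [17], the sketch below implements its core loop on a precomputed pairwise distance matrix and returns an unrooted tree in Newick format. It is illustrative only: the study's distance model, alignment handling, outgroup rooting and bootstrap resampling are not reproduced, and the five-taxon matrix in the usage lines is a toy example, not Herbidospora data.

```python
from __future__ import annotations

import itertools


def neighbor_joining(labels: list[str], dist: list[list[float]]) -> str:
    """Unrooted neighbor-joining tree (Saitou & Nei 1987) as a Newick string."""
    # Distance table keyed by node name; internal nodes are named by their Newick subtree.
    d = {a: {b: dist[i][j] for j, b in enumerate(labels)} for i, a in enumerate(labels)}
    nodes = list(labels)
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}  # net divergence of each node
        # Join the pair that minimises the Q criterion (the first pair wins ties).
        i, j = min(
            itertools.combinations(nodes, 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))  # branch length i -> new node
        lj = d[i][j] - li                                     # branch length j -> new node
        new = f"({i}:{li:.4f},{j}:{lj:.4f})"
        d[new] = {new: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[new][k] = d[k][new] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    a, b = nodes
    # The last two nodes are joined by one edge; its whole length is placed on one side.
    return f"({a}:{d[a][b]:.4f},{b}:0.0000);"


labels = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(labels, dist))
# ((c:4.0000,(a:2.0000,b:3.0000):3.0000):2.0000,(d:2.0000,e:1.0000):0.0000);
```

Rooting this unrooted tree on an outgroup, as done with Acrocarpospora corrugata in the study, would be a separate step.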

Results and discussion
We sequenced the whole genomes of all the type strains in the genus Herbidospora. The genome sizes ranged from 7.9 to 9.0 Mb, which is intermediate within the ranges of Streptomyces strains (5.0–11.9 Mb) and of strains in the family Streptosporangiaceae (5.5–13 Mb). Each of the five strains possessed 15–18 gene clusters for NRPS, PKS/NRPS hybrid and type-I PKS pathways, similar to the numbers found in Streptomyces [8,10,19–22]. The numbers of the three types of gene clusters in each strain are listed in Table 1, and Table 2 details all the clusters found in each genome. Orthologous genes and gene clusters are aligned in the same row of Table 2. Because these orthologous genes showed the same domain organization, their gene clusters are expected to synthesize the same products, as shown in the 'Presumable product' column of Table 2. Among the 32 gene clusters (nrps-1 to -16, pks/nrps-1 to -4 and pks-1 to -12) identified in the 5 strains, 9 were conserved in all strains, 8 were shared by 2–4 strains, and 15 were strain-specific. During this study, the draft genome sequence of H. cretacea NRRL B-16917 was released in the GenBank/EMBL/DDBJ databases (accession no. JODQ00000000.1). However, it is questionable whether strain NRRL B-16917 is H. cretacea, because its 16S rRNA gene showed higher sequence similarity to those of the type strains of H. yilanensis (99.1 %), H. sakaeratensis (98.8 %) and H. daliensis (98.3 %) than to that of the type strain of H. cretacea (98.0 %), and it was not placed close to the type strain of H. cretacea in a phylogenetic tree based on 16S rRNA gene sequences (data not shown). Because the species assignment of strain NRRL B-16917 is unclear, we did not analyze its NRPS and PKS gene clusters and focused on those of the five type strains. Table 2 indicates that nine presumable products (those of nrps-1 to -7, pks/nrps-1 and pks-1) are common to all five type strains of the genus Herbidospora.
Nrps-1 is assumed to be involved in the synthesis of a siderophore similar to albachelin [23]. Nrps-2 to -6 are predicted to synthesize non-ribosomal peptides comprising 4, 4, 3, 2 and 2 amino acids, respectively, based on their module numbers. Nrps-7 had only a single NRPS module; therefore, we were unable to predict the chemical structure of its peptide product. Pks/nrps-1 is a PKS/NRPS hybrid gene encoding a protein comprising three modules, responsible for incorporating a starter unit, a polyketide unit and an amino-acid unit, respectively. The pks-1 gene clusters contained seven PKS genes whose assembly line comprised nine modules. Based on the assembly line rule and the substrates of their AT domains, these gene clusters are predicted to synthesize the polyketide chain shown in Fig. 1a. This structure shares features with antifungal polyene compounds, containing multiple conjugated carbon-carbon double bonds and multiple hydroxyl groups.
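The three-way classification used in this section and the next (conserved, shared, strain-specific) follows directly from a presence/absence table of orthologous clusters across strains. The sketch below uses only three clusters whose distributions are stated in the text, as an illustration; it is not the full Table 2, and the short strain keys are shorthand introduced here.

```python
from __future__ import annotations

from collections import Counter

STRAINS = frozenset({"cretacea", "mongoliensis", "yilanensis", "daliensis", "sakaeratensis"})


def classify(presence: dict[str, set[str]], strains: frozenset[str] = STRAINS) -> dict[str, str]:
    """Label each cluster by how many of the analysed strains carry it."""
    labels = {}
    for cluster, found_in in presence.items():
        if not found_in or not found_in <= strains:
            raise ValueError(f"{cluster}: strain set must be a non-empty subset of the analysed strains")
        if found_in == strains:
            labels[cluster] = "conserved"
        elif len(found_in) == 1:
            labels[cluster] = "strain-specific"
        else:
            labels[cluster] = "shared"
    return labels


presence = {
    "nrps-1": set(STRAINS),                                  # all five type strains
    "nrps-8": {"yilanensis", "daliensis", "sakaeratensis"},
    "pks-6": {"cretacea"},
}
labels = classify(presence)
print(labels)  # {'nrps-1': 'conserved', 'nrps-8': 'shared', 'pks-6': 'strain-specific'}
print(Counter(labels.values()))
```

Applied to all 32 clusters, the same tally would give the 9/8/15 split reported above.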

Gene clusters shared between/among two to four strains
Nrps-8 gene clusters were present in three strains (H. yilanensis NBRC 106371T, H. daliensis NBRC 106372T and H. sakaeratensis NBRC 102641T). The nrps-8 gene clusters had two modules; therefore, their products were predicted to be dipeptides. Nrps-9 gene clusters, present in H. yilanensis NBRC 106371T and H. sakaeratensis NBRC 102641T, possessed five modules. According to the predicted substrates of the A domains in each module, the products would be hexapeptides including glycine (Gly), aspartic acid (Asp), lysine and threonine (Thr) as building blocks. Nrps-10 and nrps-11 gene clusters were present in H. cretacea NBRC 15474T and H. mongoliensis NBRC 105882T. The nrps-10 gene clusters contained six modules, two of whose A domains were predicted to incorporate serine (Ser); therefore, the products were predicted to be hexapeptides including two Ser residues. In contrast, nrps-11 had only a single module, and we were unable to predict its peptide product. [...] products would be molecules derived from C14 polyketide chains. The substrates of the AT domains in modules 5 and 6 were predicted to be methylmalonyl-CoA or ethylmalonyl-CoA, and those of all the remaining modules were malonyl-CoA. Four dehydratase (DH)-ketoreductase (KR) pairs and one DH-enoylreductase (ER)-KR trio were present as optional domains in the gene clusters; therefore, four keto groups would be reduced to four conjugated double bonds, and one keto group would be fully reduced to a methylene. Hence, we predicted the polyketide backbone structure shown in Fig. 1b. Pks-3 genes were present in H. cretacea NBRC 15474T, H. yilanensis NBRC 106371T and H. daliensis NBRC 106372T. They contained only a single module and showed low sequence similarity to characterized PKS genes (data not shown); therefore, we were unable to predict their metabolites. Pks-4 genes were present in H. yilanensis NBRC 106371T and H. daliensis NBRC 106372T. They were predicted to be iterative type-I PKSs for enediyne synthesis (PksE), because they showed higher sequence similarity to PksEs than to typical modular type-I PKSs (data not shown) and contained a KR-DH domain pair, characteristic of PksE, downstream of the ACP. Their products would be polyketide compounds with an enediyne core [24]. Pks-5 gene clusters were present in H. cretacea NBRC 15474T and H. mongoliensis NBRC 105882T; however, they were not completely sequenced, so we were unable to predict the whole chemical structure of the polyketide chain.
H. cretacea NBRC 15474T possessed two specific PKS gene clusters, named pks-6 and pks-7. The pks-6 gene cluster contained 35 modules encoded by 12 PKS genes; to the best of our knowledge, this is the largest type-I PKS gene cluster reported to date. The polyketide backbone predicted for pks-6 (Fig. 1d) is most likely novel, because no similar compounds were found in our database searches. The pks-7 gene cluster contained six modules encoded by four PKS genes, and its metabolites were predicted to be hexaketide compounds with two C-C double bonds and one hydroxyl group.
H. mongoliensis NBRC 105882T possessed a specific PKS/NRPS gene cluster and three specific PKS gene clusters, named pks/nrps-2, pks-8, pks-9 and pks-10. Pks/nrps-2 contained 1 PKS module and 13 NRPS modules, encoded by one PKS gene and seven NRPS genes, respectively. According to the module numbers and the A domain substrates, its product was predicted to be a large polyketide-non-ribosomal peptide hybrid compound including Ser, Thr and asparagine (Asn). The pks-8 gene cluster contained seven PKS genes encoding 21 modules; as shown in Fig. 1f, its products would be large polyketide compounds with 6 C-C double bonds and 11 hydroxyl groups. The pks-9 gene cluster contained three PKS genes encoding nine modules, and its products were predicted to be nonaketide compounds with three conjugated double bonds and five hydroxyl groups (Fig. 1g). The pks-10 gene encoded only a single module; therefore, we were unable to predict the chemical structure of its metabolite.
H. daliensis NBRC 106372T possessed two specific NRPS gene clusters, named nrps-12 and nrps-13. The nrps-12 gene cluster encoded 22 modules, and its products were predicted to be peptide compounds comprising 22 amino acids, including 2 phenylalanine (Phe), 7 Asp, 1 Ser, 1 valine (Val), 3 tyrosine (Tyr) and 1 Asn residues. In contrast, nrps-13 contained only two modules, whose A domains were predicted to incorporate Gly and alanine (Ala), respectively; hence, its products would be dipeptides composed of Gly and Ala.
H. sakaeratensis NBRC 102641T possessed three specific NRPS gene clusters, one specific PKS/NRPS hybrid gene cluster and two specific PKS gene clusters, named nrps-14 to -16, pks/nrps-4, pks-11 and pks-12. In the nrps-14 gene cluster, three NRPS genes encoded six modules. The A domains of four modules were predicted to incorporate Val, Ser, Asn and Asn, suggesting that nrps-14 would produce hexapeptides including Val, Ser and two Asn residues. Nrps-15 had only one module, but its domain organization (A-C-T) differed from that of typical NRPS modules (C-A-T). Because a CoA-ligase, an ACP and an NRPS comprising only a single C domain were also encoded adjacent to the nrps-15 gene (data not shown), this gene cluster might synthesize compounds composed of a starter molecule and a Gly residue loaded by the unusual NRPS. Nrps-16 also had only a single module; therefore, we were unable to predict its peptide products. The pks/nrps-4 gene cluster encoded at least 15 PKS modules and one NRPS module, but it was not completely sequenced, because the sequence of a PKS gene named s08-orf1 was partial and the adjacent genes remain unknown. Although we could not predict the whole chemical structure synthesized by this gene cluster, its product would include a C28 or longer polyketide chain. The pks-11 gene cluster encoded at least 20 modules, although s22-orf1 was not completely sequenced and the adjacent genes were unknown. Its product was predicted to be a large compound including a polyhydroxylated polyketide chain, as shown in Fig. 1i. The pks-12 gene cluster encoded only a single module; therefore, we were unable to predict the chemical structures of its products.

Distribution and evolutionary history of NRPS and PKS gene clusters
We constructed a phylogenetic tree of the type strains of the genus Herbidospora based on 16S rRNA gene sequences and traced the evolutionary histories of the NRPS and PKS pathways by mapping the inferred points of acquisition and loss of each gene cluster onto the tree (Fig. 2). Nine gene clusters, underlined in Fig. 2, appear to have been acquired early in the evolution of the genus Herbidospora, because they are conserved in all the type strains. By contrast, the 15 strain-specific gene clusters, indicated by asterisks in Fig. 2, were probably acquired relatively recently, because they map to the terminal branches of the tree. Gene clusters shared by 2–4 strains are shown in boldface in Fig. 2.

Fig. 2 Phylogenetic tree of the type strains of the genus Herbidospora, based on 16S rRNA gene sequences, depicting the inferred ancestry of NRPS and PKS gene clusters. Bootstrap values (>50 %) from 1000 replicates are shown at branch nodes. Arrows indicate acquisitions and losses of NRPS and PKS gene clusters. Gene clusters conserved in all five strains, shared by two to four strains, and specific to each strain are underlined, boldfaced and asterisked, respectively.

[...] NBRC 102641T, suggesting that they were acquired just after the lineage branched off from H. mongoliensis and lost just before the emergence of H. sakaeratensis. Nrps-8 gene clusters are present in H. yilanensis NBRC 106371T, H. sakaeratensis NBRC 102641T and H. daliensis NBRC 106372T, suggesting that they were acquired just before the divergence of these three species. Similarly, the nrps-9 and pks-4 gene clusters would also have been acquired at the same point; however, these clusters appear to have been lost during the evolution of H. daliensis and H. sakaeratensis, respectively. To test this hypothesis, we conducted phylogenetic analyses of the NRPSs and PKSs in gene clusters conserved among more than 4 strains. Except for pks-1, all the resulting phylogenetic trees showed the same topology (Fig. 3) as the tree based on 16S rRNA gene sequences (Fig. 2). This supports the view that nrps-1 to -7, pks/nrps-1 and pks-2 were acquired early in evolution and inherited vertically. In contrast, pks-1 may not have been inherited in the same manner as these gene clusters, because the topology of its phylogenetic tree differed from those of the other gene clusters and of the 16S rRNA gene tree.
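One explicit way to place gains and losses on a species tree, in the spirit of Fig. 2, is Dollo parsimony: each cluster is gained once, at the most recent common ancestor of all strains carrying it, and lost in every maximal clade below that point that lacks it. The text does not state which reconstruction method was used, so this is a swapped-in illustration. The nested-tuple tree below is a hypothetical topology, chosen only to be consistent with H. yilanensis, H. daliensis and H. sakaeratensis forming a clade; it is not the published Fig. 2 tree.

```python
from __future__ import annotations

from typing import Union

# A tree is a leaf name or a tuple of subtrees (hypothetical nested-tuple representation).
Tree = Union[str, tuple]


def leaves(t: Tree) -> frozenset[str]:
    """All strain names below a node."""
    if isinstance(t, str):
        return frozenset([t])
    return frozenset().union(*(leaves(child) for child in t))


def mrca(t: Tree, taxa: frozenset[str]) -> Tree:
    """Smallest subtree whose leaves include every taxon in `taxa`."""
    if isinstance(t, tuple):
        for child in t:
            if taxa <= leaves(child):
                return mrca(child, taxa)
    return t


def dollo_events(t: Tree, carriers: set[str]) -> tuple[frozenset[str], list[frozenset[str]]]:
    """Dollo parsimony: one gain at the MRCA of carriers, losses in carrier-free clades below it."""
    taxa = frozenset(carriers)
    if not taxa or not taxa <= leaves(t):
        raise ValueError("carriers must be a non-empty subset of the tree's leaves")
    origin = mrca(t, taxa)
    losses: list[frozenset[str]] = []

    def walk(node: Tree) -> None:
        if not leaves(node) & taxa:
            losses.append(leaves(node))  # whole clade lacks the cluster: one loss event
        elif isinstance(node, tuple):
            for child in node:
                walk(child)

    walk(origin)
    return leaves(origin), losses


# Hypothetical topology, for illustration only.
tree = ("cretacea", ("mongoliensis", ("yilanensis", ("daliensis", "sakaeratensis"))))
for name, carriers in {
    "nrps-9": {"yilanensis", "sakaeratensis"},
    "pks-4": {"yilanensis", "daliensis"},
}.items():
    gain_clade, lost_in = dollo_events(tree, carriers)
    print(name, sorted(gain_clade), [sorted(c) for c in lost_in])
```

On this topology, both clusters are gained at the ancestor of the three-species clade and lost in H. daliensis and H. sakaeratensis, respectively, matching the scenario described above.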

Conclusions
We conclude the following: (1) the genomes of Herbidospora strains carry as many NRPS and PKS gene clusters as those of other actinomycetes such as Streptomyces, although their products have yet to be isolated; (2) members of the genus Herbidospora are likely to synthesize large and diverse metabolites, many of whose chemical structures have yet to be reported; (3) each strain possesses 1–6 strain-specific NRPS and/or PKS gene clusters in addition to those conserved within the genus, indicating diversity in these pathways; and (4) the diversity of NRPS and PKS pathways in each strain has increased through genus-level vertical inheritance and relatively recent acquisition of these gene clusters during the evolution of the genus.
In summary, we sequenced the whole genomes of all five type strains of the genus Herbidospora and examined their NRPS and PKS gene clusters. Each strain harbored 15–18 modular NRPS and PKS gene clusters, and comparison of these clusters identified 32 NRPS and PKS pathways across the 5 strains. Of these, 9 were conserved in all 5 strains, 8 were shared by 2–4 strains, and the remaining 15 were strain-specific, indicating strain-level diversity in these pathways. These strains harbor a wealth of NRPS and PKS pathways, many of whose products are large and have yet to be discovered. This study also provides information on the likely number and molecular structures of secondary metabolites, such as non-ribosomal peptides and polyketides, potentially produced by these strains, and suggests that Herbidospora strains are an untapped and attractive source of novel secondary metabolites.