Comparative phylogenomics and multi-gene cluster analyses of the Citrus Huanglongbing (HLB)-associated bacterium Candidatus Liberibacter

Background Huanglongbing (HLB, previously known as citrus greening) is associated with Candidatus Liberibacter species and is a serious threat to citrus production worldwide. The pathogen is a Gram-negative, unculturable, phloem-limited bacterium for which little genomic information is available. Expanding the genetic knowledge of this organism may provide a better understanding of the pathogen and help in developing effective strategies for the control and management of HLB. Results Here, we report the cloning and characterization of an additional 14.7 Kb of new genomic sequence from three different genomic regions of Candidatus Liberibacter asiaticus (Las). Sequence variation analyses among the available Ca. Liberibacter species sequences, as well as the newly cloned 1.5 Kb of the rpoB gene from different Ca. Liberibacter strains, identified INDELs and SNPs. Phylogenetic analysis of the deduced protein sequences from the cloned regions characterizes the HLB-associated Candidatus Liberibacter as a new clade in the sub-division of the α-proteobacteria. Conclusion Comparative analyses of the cloned gene regions of Candidatus Liberibacter with members of the order Rhizobiales suggest overall conservation of gene structure and order, albeit with minor variations including gene decay, as indicated by the identified pseudogenes. The newly cloned gene regions contribute to our understanding of the molecular aspects of genomic evolution of Ca. Liberibacter.


Background
Huanglongbing (HLB), previously known as citrus greening and associated with Candidatus Liberibacter species, is the most serious threat to citrus production because it reduces fruit quality and kills trees [1]. Currently, three major forms of the disease are recognized, each associated with a different Candidatus Liberibacter species: Ca. Liberibacter asiaticus (Las), Ca. Liberibacter africanus (Laf), and Ca. Liberibacter americanus (Lam) [2]. The bacteria associated with HLB are Gram-negative, unculturable, phloem-restricted α-proteobacteria and are transmitted by psyllids [1,3]. Until recently, the available genetic information included the 16S rRNA, the rplKAJL-rpoBC operon and the OMP regions [4][5][6][7][8][9]. Collectively, this amounts to ~15.6 Kb of non-redundant DNA sequence from these three regions of the genome. In our recent report, we cloned and characterized 8.56 Kb of genomic sequences from a Las strain using a genomic walking method [10].
Studies of the comparative gene organization and gene order among related bacteria can lead to an improved understanding of the functional significance of gene arrangements among them [11,12]. Such information can be derived either from phylogenetic profiles [13] or from comparative genome analyses [14]. The information may also provide insight into these organisms' evolutionary history and metabolic capabilities [15].
In this study, we report cloning and characterization of an additional 14.7 Kb of new genomic DNA from Las following the recently described modified genomic walking method [10]. We further discuss the comparative phylogenomics of Ca. Liberibacter species gene structure and evolution to understand the gene organization of this obligate plant pathogen.

Cloning of new genomic regions, gene characteristics and homology studies
Primers were designed based on the previous sequence information and incorporated into the walking strategy as detailed earlier [10]. Using the sequence data from the genomic walk and primers listed in Table 1, genomic DNA fragments were amplified as PCR products directly from the DNA extracts of infected tissue. For details, see Additional file 1.

Region-1
This region was extended by 3,218 bp at the 5' end and by 2,673 bp at the 3' end of the previously reported sequence [10]. With this newly cloned sequence and the previously known sequence, 17.1 Kb of DNA sequence is now known from this region. BLAST-based similarity searches of the 5' end identified a partial gene coding for fimbrial assembly protein PilP and a full-length gene coding for phosphoserine aminotransferase (serC). However, there is a 950 bp DNA sequence between this gene and the previously identified pseudogene D-3-phosphoglycerate dehydrogenase in which no gene models were identifiable. A partial sequence (2.3 Kb) of the rpoC gene was identified at the 3' end. In total, there are now 12 genes and one pseudogene in this region.

Region-2
In this study, the 5' end of region-2 was extended by 1,776 bp and the 3' end by 1,908 bp. This brings the DNA sequence known in this region of the genome to 6.6 Kb. Importantly, we have now obtained the full-length sequences of the 16S region, as BLAST searches have identified a pseudogene caused by a frame shift mutation next to the 16S rRNA gene suggesting that we have walked out of the 16S rRNA gene region. At the 3' end, we cloned 1,908 bp of new DNA sequence of the 23S rRNA gene.

Region-3
This region was extended beyond the previously reported sequence [10] by a further 4,150 bp at the 5' end and by 1,037 bp at the 3' end. With this new DNA sequence, the sequence known for the omp gene region of this organism is now 10.5 Kb long. BLAST similarity searches identified a partial gene coding for a putative transmembrane protein (rpsB) at the 5' end, followed by four full-length genes and one pseudogene: elongation factor Ts (tsf), uridylate kinase (pyrH), ribosome recycling factor (frr), a frame-shift-induced pseudogene for undecaprenyl pyrophosphate synthetase (uppS), and phosphatidate cytidylyltransferase (cdsA2). Similarly, at the 3' end, the remaining sequence of the gene coding for the full-length lipid A biosynthesis acyl-[acyl-carrier-protein]-UDP-N-acetylglucosamine O-acyltransferase was cloned, as well as a partial DNA sequence of the gene coding for a phosphatidate cytidylyltransferase-like protein (COG3494). In total, there are now 11 genes and a pseudogene known in this region.
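The extension lengths reported for the three regions can be tallied to check them against the total of new sequence stated for this study. The short sketch below uses only the base-pair figures given in the text; the variable and function names are illustrative.

```python
# Extension lengths (bp) reported in the text for each region: (5' end, 3' end).
extensions_bp = {
    "Region-1": (3218, 2673),
    "Region-2": (1776, 1908),
    "Region-3": (4150, 1037),
}


def region_totals(extensions):
    """Return the amount of newly cloned sequence per region, in bp."""
    return {name: five + three for name, (five, three) in extensions.items()}


totals = region_totals(extensions_bp)
grand_total_bp = sum(totals.values())

for name, bp in totals.items():
    print(f"{name}: {bp} bp")
print(f"Total new sequence: {grand_total_bp} bp ({grand_total_bp / 1000:.2f} Kb)")
```

The three regions sum to 14,762 bp, in line with the roughly 14.7 Kb of new sequence reported.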

Inter-strain and inter-species sequence variations
At the time when this study was conducted, a total of 86 Ca. Liberibacter sequences were publicly available in the GenBank databases. These sequences from Las, Laf and Lam strains were compared to identify sequence variations within and among the three genomic regions to define the cumulative diversity.
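As an illustration of how SNPs and INDELs can be tallied once two sequences have been aligned, the following is a minimal sketch. It is not the procedure used in this study: the function name and the two toy sequences are hypothetical, and the input is assumed to be a pre-computed pairwise alignment with "-" marking gaps.

```python
def classify_variants(ref_aln, qry_aln, gap="-"):
    """Scan two aligned sequences of equal length.

    Returns (snps, indels):
      snps   -> list of (column, ref_base, qry_base)
      indels -> list of (start_column, length, kind), where kind is
                "insertion" (gap in reference) or "deletion" (gap in query)
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    snps, indels = [], []
    run_start, run_kind = None, None
    for i, (r, q) in enumerate(zip(ref_aln.upper(), qry_aln.upper())):
        if r == gap and q == gap:
            kind = None            # gap in both rows: not a variant
        elif r == gap:
            kind = "insertion"
        elif q == gap:
            kind = "deletion"
        else:
            kind = None
            if r != q:
                snps.append((i, r, q))
        if kind != run_kind:
            # Close the indel run that just ended, then open a new one if needed.
            if run_kind is not None:
                indels.append((run_start, i - run_start, run_kind))
            run_start, run_kind = (i, kind) if kind is not None else (None, None)
    if run_kind is not None:
        indels.append((run_start, len(ref_aln) - run_start, run_kind))
    return snps, indels


# Hypothetical toy alignment, not real Ca. Liberibacter sequence.
snps, indels = classify_variants("ATGCC--GTA", "ATGTCAAG-A")
print("SNPs:", snps)
print("INDELs:", indels)
```

Consecutive gap columns of the same kind are merged into a single INDEL event, so a 2 bp insertion is counted once rather than twice.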

Phylogenetic analysis
Topology of the 16S rRNA tree for nine Ca. Liberibacter asiaticus strains, two Ca. Liberibacter africanus strains and two Ca. Liberibacter americanus strains segregated according to the existing species classification (Fig. 1A). Interestingly, within Ca. Liberibacter asiaticus, the strains from Florida, USA and Sao Paulo, Brazil were interspersed with multiple strains from China, suggesting a lack of geographic grouping within this species. Among the three species, only the Ca. Liberibacter africanus strains showed longer inner branches, owing to their subspecies status. With respect to the other bacteria used in the analyses, the 16S rRNA Neighbor-joining tree and the tree derived from the concatenated protein sequence (3401 aa) for Ca. Liberibacter asiaticus showed similar topologies (Fig. 1A-B). Both trees placed Ca. Liberibacter species at the bottom of the order Rhizobiales with high bootstrap values of 96-100. The tree topology indicates that Ca. Liberibacter segregated earlier from the Rhizobiales and Rickettsiales in the α-proteobacterial division, suggesting its independent evolution as a sub-division.
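To make the idea of a concatenated-protein tree more concrete, the sketch below shows one common way to join per-gene protein alignments into a single supermatrix and to compute a simple pairwise p-distance, the kind of value a distance-based method such as Neighbor-joining takes as input. This is an assumption-laden illustration rather than the pipeline used here: the taxon names, toy sequences, function names and the choice of p-distance are all hypothetical.

```python
def concatenate_alignments(gene_alignments):
    """Join per-gene alignments (dict: taxon -> aligned sequence) into one supermatrix.

    Taxa missing from a gene are padded with gaps so every row has equal length.
    """
    taxa = sorted({t for aln in gene_alignments for t in aln})
    parts = {t: [] for t in taxa}
    for aln in gene_alignments:
        lengths = {len(s) for s in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a gene alignment must have equal length")
        width = lengths.pop()
        for t in taxa:
            parts[t].append(aln.get(t, "-" * width))
    return {t: "".join(p) for t, p in parts.items()}


def p_distance(a, b, gap="-"):
    """Proportion of differing sites among columns where neither sequence has a gap."""
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else float("nan")


# Hypothetical toy alignments for two genes; taxonC lacks the second gene.
gene1 = {"taxonA": "MKV", "taxonB": "MRV", "taxonC": "MKV"}
gene2 = {"taxonA": "LLSA", "taxonB": "LLSA"}

supermatrix = concatenate_alignments([gene1, gene2])
for taxon, seq in supermatrix.items():
    print(taxon, seq)
print("d(A,B) =", p_distance(supermatrix["taxonA"], supermatrix["taxonB"]))
```

Padding missing genes with gaps keeps all rows the same length, and skipping gapped columns in the distance prevents the padding from inflating divergence.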

Gene organization and gene order conservation
Region-1
Comparison of the gene order in Region-1 from nine bacterial species, including Ca. Liberibacter species, suggests conservation among these bacteria from tufB (EF-Tu) to the rpoC gene, with minor variations (Fig. 2; R. etli figure not included).

Region-2
The gene order of Region-2 was 16S rRNA-tRNA(Ile)-tRNA(Ala)-23S rRNA and was relatively conserved among these eight bacterial species, with minor variations, especially for the genes cloned from the 5' end (Fig. 3).

Region-3
Overall conservation of gene structure was observed for the omp region, where the gene order of all the 11 genes (rpsB-tsf-Tu-pyrH-frr-uppS-cdsA2-yaeL-omp-lpxD-fabZ-lpxA) was conserved in seven bacteria (S. meliloti, B. quintana, B. melitensis, R. etli, A. tumefaciens, M. loti and Ca. L. asiaticus) (Fig. 4). In these bacteria, there was another conserved gene of the COG03494 group (uncharacterized conserved protein), similar to phosphatidate cytidylyltransferase, at the 3' end of this 11-gene cluster. In the other six bacteria (M. loti, S. meliloti, B. melitensis, A. tumefaciens, B. quintana and R. etli), this gene is followed by the lpxA gene. This region was least conserved in the bacterium R. bellii.
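Gene order conservation of this kind can be checked programmatically once each genome's gene order is available as an ordered list. The sketch below is illustrative only: the reference list is a subset of the Region-3 cluster named above, while the comparison orders (including the placeholder "geneX") and the function names are hypothetical, and each gene name is assumed to occur once per list.

```python
def conserved_adjacencies(reference, other):
    """Reference-adjacent gene pairs that are also adjacent (either orientation) in `other`."""
    other_pairs = {frozenset(pair) for pair in zip(other, other[1:])}
    return [
        (a, b)
        for a, b in zip(reference, reference[1:])
        if frozenset((a, b)) in other_pairs
    ]


def is_colinear(reference, other):
    """True if the genes shared by both lists occur in the same relative order."""
    shared = set(reference) & set(other)
    return [g for g in reference if g in shared] == [g for g in other if g in shared]


# Subset of the Region-3 gene cluster as named in the text.
reference = ["pyrH", "frr", "uppS", "cdsA2", "yaeL", "omp", "lpxD", "fabZ", "lpxA"]

# Hypothetical comparison orders, invented for illustration.
with_insertion = ["pyrH", "frr", "uppS", "cdsA2", "yaeL", "omp", "lpxD", "fabZ", "geneX", "lpxA"]
rearranged = ["frr", "pyrH", "uppS"]

kept = conserved_adjacencies(reference, with_insertion)
print(f"{len(kept)} of {len(reference) - 1} adjacencies conserved")
print("colinear with insertion:", is_colinear(reference, with_insertion))
print("colinear when rearranged:", is_colinear(reference, rearranged))
```

Adjacency counts are sensitive to inserted genes, whereas the colinearity test ignores insertions and only reports changes in relative order, so the two measures separate "minor variations" from genuine rearrangements.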

Discussion
In this study, we report cloning and characterization of new genomic regions from Ca. Liberibacter species and compare the overall sequence diversity in the sequences of these bacteria in the GenBank.
Comparison of DNA sequences for the 16S, omp and rpoB genes from GenBank and the sequences cloned in this study shows that there is very little sequence variation among the different Ca. Liberibacter species, suggesting strong host and/or environmental selection and a genetically stable lineage of the pathogen. The 16S sequences were more conserved among the Las strains, while a slightly higher degree of sequence variation was noted for the Laf strains. Alignments of a ~1.5 Kb region of rpoB from Las strains and Laf strains revealed that the strain from China differed by two SNPs from the Japan, Florida and Brazil strains, which were identical at this locus. These two SNPs were possibly introduced into the Chinese strain after its separation from the other Las strains.
Our phylogenetic analyses, based on the 16S rRNA and the concatenated protein sequences from eight genes, place Ca. Liberibacter species as a new clade in the sub-division of the α-proteobacteria [2]. This agrees with the previously reported 16S rRNA and omp gene based phylogenetic analyses [16,17]. The inclusion of the new genes gave results similar to those based upon the 16S rRNA and the omp gene, suggesting that future inclusion of these genes along with the 16S rRNA and other omp genes should improve studies of Ca. Liberibacter strain diversity, especially in situations where the other two conserved genes fail to differentiate the strains. Our results also suggest that Ca. Liberibacter evolved along with the members of the orders Rhizobiales and Rhodobacterales after the separation of the order Rickettsiales, but branched out before the expansion of the order Rhizobiales. The comparative genomic analysis of eight of these bacteria, based on the three genomic loci cloned from Ca. Liberibacter asiaticus, revealed overall gene order and operon conservation with some notable differences. Especially, the selective incorporation/retention of a NDP-

Conclusion
The genomic regions cloned in this study have provided new information for better understanding molecular aspects of genomic evolution of Ca. Liberibacter and taxonomically related bacteria.

Outer membrane protein region (Region-3) comparative organization