Skip to main content

Evolutionary significance and diversification of the phosphoglucose isomerase genes in vertebrates



Phosphoglucose isomerase (PGI) genes are important multifunctional proteins whose evolution has, until now, not been well elucidated because of the limited number of completely sequenced genomes. Although the multifunctionality of this gene family has been considered as an original and innate characteristic, PGI genes may have acquired novel functions through changes in coding sequences and exon/intron structure, which are known to lead to functional divergence after gene duplication. A whole-genome comparative approach was used to estimate the rates of molecular evolution of this protein family.


The results confirm the presence of two isoforms in teleost fishes and only one variant in all other vertebrates. Phylogenetic reconstructions grouped the PGI genes into five main groups: lungfishes/coelacanth/cartilaginous fishes, teleost fishes, amphibians, reptiles/birds and mammals, with the teleost group being subdivided into two subclades comprising PGI1 and PGI2. This PGI partitioning into groups is consistent with the synteny and molecular evolution results based on the estimation of the ratios of nonsynonymous to synonymous changes (Ka/Ks) and divergence rates between both PGI paralogs and orthologs. Teleost PGI2 shares more similarity with the variant found in all other vertebrates, suggesting that it has less evolved than PGI1 relative to the PGI of common vertebrate ancestor.


The diversification of PGI genes into PGI1 and PGI2 is consistent with a teleost-specific duplication before the radiation of this lineage, and after its split from the other infraclasses of ray-finned fishes. The low average Ka/Ks ratios within teleost and mammalian lineages suggest that both PGI1 and PGI2 are functionally constrained by purifying selection and may, therefore, have the same functions. By contrast, the high average Ka/Ks ratios and divergence rates within reptiles and birds indicate that PGI may be involved in different functions. The synteny analyses show that the genomic region harbouring PGI genes has independently undergone genomic rearrangements in mammals versus the reptile/bird lineage in particular, which may have contributed to the actual functional diversification of this gene family.


The early vertebrate evolution has been characterised by a number of whole genome duplications (WGD) [14]. Two rounds of WGD namely 1R and 2R have occurred in vertebrate common ancestor [58]. An additional WGD, relatively more recent, has specifically occurred in teleost fish common ancestor [9, 10]. Whole genome, fragment or single gene duplication is amongst the central evolutionary mechanisms that create genomic innovation through the generation of new genetic variants that confer to organisms a better adaptive capacity to their environment [1114]. Gene duplication produces two paralogues, one of which is free of selective constraints and may accumulate deleterious mutations and eventually becomes a pseudogene [15, 16]. Such non-functional duplicates can be retained in the genome but remain unexpressed and dysfunctional, although more recent studies on gene duplication suggest that pseudogenes might serve some functions [1719]. The pseudogenes are deleted from the genome or become very divergent from the parental gene so that they are no longer recognisable as such [16, 20]. In rare cases, both paralogues are maintained active because they differ in their functional aspects [16, 21, 22]. The evolutionary scenario under which the duplicate adopts a part of the function of the parental gene is known as subfunctionalisation [15, 16, 23], while neofunctionalisation is applied where one duplicate evolves a different and novel function [2325]. The neofunctionalisation requires important genetic changes in the key amino acid positions that are central determinants of the function of the protein [23, 26]. Changes in nucleotides and/or amino acid composition are not the only genetic changes that can lead to functional divergence after gene duplication [27, 28]. Genomic rearrangements including spontaneous genomic deletions may occur and cause a complete loss of some introns, which may lead to changes in exon length and nucleotide compositions [29, 30]. The fusion of remaining adjacent exons following intron loss may create new variants that conserve a part of the parental function or involve a completely different novel function [31, 32].

The observations of differential conservation or loss of duplicates has led to the question of why some variants are conserved in some species and lost in closely related species [33, 34]. This has also raised the question whether the absence of certain duplicates in genomes implies the loss of specific functions or whether the function that they were fulfilling is accomplished by other members of the same gene network or physiological pathways [24]. The relative frequency of duplicate loss or conservation varies considerably among organisms, even between species within the same lineage [35, 36]. An exhaustive analyses of aquaporin and claudin genes in teleost fishes revealed significant differences in the number of paralogs of these gene families between species [3739]. Other examples of differential duplicate conservation or loss have been reported for many gene families in euteleosts [40, 41]. It cannot be ruled out that the failure to identify duplicates in teleost genomes was an artefact of sequence incompleteness due to low sequencing coverage for these genomes [42, 43]. The availability of many genomes distributed over different kingdoms now allows to address such questions (at a large scale).

The phosphoglucose isomerage (PGI) is amongst the genes that have retained both paralogs after teleost-specific WGD [4447]. The PGI protein gene has been identified as a key enzyme of the glycolysis pathway, where it insures the inter conversion between d-glucose-phosphate and d-fructose-6-phosphate [44, 48]. PGI is also involved in other functions, including thermal adaptation, differentiation and mediation of maturation inducer activity, which might result from secondary effects of the basic function [44, 48, 49]. The protein structure of PGI genes in relation to their genomic evolution has been investigated by a few studies using a limited number of taxa [4547] that did not cover the whole vertebrates. It has been thus demonstrated that the electric charges of teleost PGI1 and PGI2 have significantly diverged, which was interpreted as a sub-functionalisation indicating that the two PGI paralogs have evolved to have different functions after duplication [47]. The same authors tried to infer the origin of this sub-functionalisation by applying an evolutionary model to identity sign of positive selection after PGI duplication. However, the results were not clear, indicating that the evolutionary processes that has led to the functional divergence of PGI1 and PGI2 are still not completely understood and need further evaluation. Although large divergence rate between duplicates could be the result of positive selection on novel function, relaxation of selective constraints, and even loss of function, the levels of sequence divergence can provide information on the process of neofunctionalisation among duplicates [50, 51]. The divergence rates of protein sequences are expected to be higher in duplicates that have evolved novel functions compared to those that did not undergone neofunctionalisation. Therefore, an estimation of the ratios of non-synonymous to synonymous changes and divergence rates between PGI1 and PGI2 paralogs and orthologs among and within lineages may help to infer the evolutionary origin and support previous findings on the functional divergence of teleost PGI1 and PGI2.

The objective of the present study was to infer and retrace the evolutionary history of the PGI genes in vertebrates by combining similarity, phylogenetic and conserved synteny analyses. Another objective of this study was to estimate the divergence times between PGI pairwise paralogs and orthologs among and within lineages in order to infer the origin and confirm the functional diversification of this gene family that has been previously reported [46, 47]. The results show that, in addition to the functional divergence resulting from amino acid changes, complex genomic rearrangements including inversion, intron gain and intron deletion have also affected the region harbouring PGI genes after duplication, which has probably led to their actual functional diversification.


Synteny analyses

The genomic location of PGI genes identified in all analysed species as well as their flanking genes is shown in Table 1. Two PGI genes were found in all teleost fishes. The similarity and synteny analyses resealed that these two isoforms correspond to two variants that have been previously characterised in Danio rerio. These two isoforms are referred as PGI1 and PGI2 in this study. Only one PGI isoform was found the in the holostei, the spotted gar Lepisosteus oculatus (Table 1). The first and/or second flanking gene(s) was lost in some of the species. Therefore, the two first adjacent upstream flanking genes: MPHOSPH6: (UFG1) and HSD17B2 (UFG2) of PGI1 isoform were presented. Likewise, the three first adjacent downstream flanking genes: LSM14A (DFG1), SI:DKEY (DFG2) and, THAP9 (DFG3) are also shown in Table 1. Similarly, the two first upstream flanking genes (KIAA0355 and LSM14A) and downstream adjacent flanking genes (WTIP and HSD17B2) of PGI2 are also indicated in Table 1. PGI1 was located on corresponding chromosomes 25 and 6 in D. rerio and Oryzias latipes genomes [52], respectively, while PGI2 was located on corresponding Dicentrarchus labrax (LG5) and Gasterosteus aculeatus (GroupII) chromosomes [53]. More importantly, O. latipes PGI1 and PGI2 are located on two distinct chromosomes (6 and 3) that share a high degree of synteny [54], suggesting that they may have resulted from a duplication of the same genomic region. Also, only one PGI gene was identified in amphibians, reptiles, birds and mammals. The two PGI isoforms identified in fishes are located on different chromosomes. The order of PGIs and their flanking genes is shown in Fig. 1 and Table 1. This order varied between PGI isoforms, between lineages and even within lineages. The PGI1 gene in most teleost species (including the Atlantic cod, Gadus morhua, the zebrafish, D. rerio, Astyanax mexicanus, the two pufferfish Tetraodon nigorviridis and Takifugu rubripes) is flanked upstream and downstream by the hydroxysteroid (17-beta) dehydrogenase 2 (HSD17B2) and LSM14A mRNA processing body assembly factor a (LSM14AA) genes, respectively (Fig. 2a). By contrast, in some species including the stickleback, G. aculeatus, medaka, O. latipes, the tilapia, Oreochromis niloticus, Amazon molly, Poecilia Formosa and the platyfish, Xiphophorus maculatus, the PGI1 gene is flanked upstream by the M-phase phosphoprotein 6 (MPHOSPH6) gene (Fig. 2a), which is the upstream gene of HSD17B2 gene. The second teleost isoform, PGI2 is flanked upstream by the KIAA0355 gene whereas its downstream flanking gene is Wilms tumor 1 interacting protein (WTIP) (Fig. 2a). Only in O. latipes is PGI2 flanked downstream by a different gene, HSD17B2 (Fig. 2a) i.e. the upstream flanking gene of PGI1.

Table 1 Gene and protein ID and genomic location of PGI loci identified in the different species
Fig. 1

The general organisation of genes surrounding PGI genes

Fig. 2

Conserved synteny of PGI and their upstream and downstream flanking genes: teleost fishes (a), Amphibians, reptiles, birds, mammals, hagfishes, lamprey and the coelacanth (b). The different colors surrounding PGI isoforms indicate the different flanking upstream and downstream flanking genes. The names of different flanking genes are indicated in Table 1

The PGI gene identified in amphibians (Duttaphrynus melanostictus and Xenopus tropicalis), reptiles (Pelodiscus sinensis, Anolis carolinensis) and birds (Ficedula albicollis, Anas platyrhynchos and Meleagris gallopavo) is flanked upstream by KIAA0355 whereas the downstream flanking gene is WTIP (Fig. 2b), which is also the downstream flanking gene of teleost PGI2 genes. The mammalian PGI gene has the same upstream flanking gene (KIAA0355) as amphibians, reptiles and birds but their downstream flanking gene is the programmed cell death 2-like gene (PDCD2L) in Mus musculus, Rattus norvegicus, Pongo abelii, Macaca mulatta, Homo sapiens (Fig. 2b) and ENSECAT00000021114 in Equus caballus, uncharacterized protein ECO:0000313 in Sus scrofa, ENSOCUT00000001459 in Oryctolagus cuniculus and uncharacterized protein, ECO:0000313, in Felis catus. The similarity search in Ensembl using BLAT showed very high similarity in these uncharacterised proteins to PDCD2L, suggesting that all mammals analysed in this study have the same downstream gene, PDCD2L, a locus located just before the ubiquitin-like modifier activating enzyme 2 (UBA2), which flanks the WTIP at its upstream part. In other words, there are two genes between mammalian PGI gene and the WTIP gene, which flanks the PGI of vertebrates including teleost PGI2. Based on these syntenic analyses, the PGI gene in amphibians, reptiles, birds and mammals present more similarities with the PGI2 of fishes, and can therefore be qualified as PGI2, as annotated in the GenBank and Ensembl database for most of the species.

In the lamprey, Petromyzon marinus, only one PGI gene was found and its upstream and downstream flanking genes were not identified (Fig. 2b). This PGI gene was the only gene found on the scaffold where it is located. Similarity search using fishes flanking genes did not allow identifying the bordering genes in other scaffolds of the genome. The failure to identify the lamprey PGI flanking genes could be due to the quality of the assembly, which is problematic in this species. In the coelacanth, Latimeria chalumnae, only one PGI gene was identified, which is located on the same scaffold as its downstream flanking gene, WTIP. Its upstream flanking gene was found at the extremity of a different scaffold (Scaffold JH127461.1). The similarity and syntenic analyses indicated that this PGI is more similar to the PGI2 of the other species than to the PGI1 gene. The BLAST search using teleost fish PGI1 gene and its upstream and downstream genes did not reveal the presence of another PGI gene in coelacanth and lamprey, suggesting that these two species have only one PGI isoform. The similarity search identified only one PGI isoform in the elephant shark, Callorhinchus milii, and the lancelet Branchiostoma floridae, and their flanking genes were not successfully identified.

To better understand the origin of PGI gene, I searched for its presence in invertebrate genomes. Similarity search identified only one gene in the Fruitfly, Drosophila melanogaster, flanked upstream and downstream by FBtr0088650 and FBgn0002552, respectively (Fig. 2b). The similarity search indicates that these two genes are different from the upstream and downstream flanking genes of PGI genes found in the other species. The D. melanogaster (DM) PGI gene has a smaller number of exons (5 exons only) compared to the PGI genes of all vertebrate species (Additional file1: Table S1).

Structure of PGI genes

Most of the PGI genes identified in this study comprise 18 exons interrupted by 17 introns. The PGI1 gene of L. oculatus (LO) and S. scrofa (SS) has an additional exon (Additional file 1: Table S1). In T. nigroviridis (TN), both PGI isoforms have 17 exons. In A. mexicanus (AM), PGI1 has 17 exons and PGI2 18 exons whereas in G. morhua (GM), PGI1 consists of 18 exons and PGI2 17 exons. P. sinensis (PS) also have 17 exons and 16 introns (Additional file 1: Table S1). The smaller number of exon in vertebrates was found in P. marinus (MP) and M. gallopavo (MG), 11 and 15 exons, respectively (Additional file 1: Table S1). The multiple alignment of DM PGI with other vertebrate PGI did not show differences in protein sequence lengths. By contrast, the multiple alignment of nucleotide sequences of PM PGI with D. rerio PGI1 and PGI2 shows that the upstream and downstream exons were lost in this species. Similar results were observed for MG whose the amino acid sequence alignment with GG shows that the first part the sequence is missing, suggesting a deletion the first exons.

In fishes, the largest exon for both PGI isoforms is exon 18 with a maximum length of 2227 bp for P. formosa (PF) PGI1 (Additional file 1: Table S1). In the pufferfish, T. rubripes (TR) and T. nigroviridis, the largest exons were respectively exons 12 and 11 with 153 bp each. in G. morhua (GM), the largest exon for both PGI isoforms was exon 12 with a total length of 153 bp (Additional file 1: Table S1). In all fish species, the smallest exon was exon 11 with 44 bp except for T. nigroviridis where the smallest exon was exon 13 with a total length of 22 bp. In M. gallopavo, the smallest exon is comprised of 44 bp but it was exon 8 instead of exon 11 like in the other species (Additional file 1: Table S1). Likewise, the shortest exon for the other eukaryote PGI genes was exon 11 with 44 bp except in L. oculatus PGI, whose exon 11 was of 22 bp in length. In P. sinensis, the shortest exon was exon 1, which count 21 bp. In D. melanogaster the largest exon was exon 1 whereas the shortest was exon 3. The exon 11 was the most conserved between PGI isoforms and between species in term of length. Other exons such as 5, 6, 7, 8, 9 and 11 were also very conserved in term of length between PGI isoforms but also between species. (Additional file 1: Table S1) The upstream and downstream exons seemed to be more variable in term of size for all PGI genes and in all species. The largest and smallest intron was not the same for PGI paralogs and were also variable between PGI orthologs, i.e. between species (Additional file 1: Table S2). For example in G. aculeatus, the largest introns were introns 3–4 (1857 bp) for PGI1 and intron 9–10 (525 bp) for PGI2 whereas the smallest introns were respectively intron 5–6 (75 bp) and 4–5 (84 bp) (Additional file 1: Table S2). In the pufferfish, the largest PGI1intron was intron 1–2 (1079 bp) in T. rubripes and 14–15 (2101 bp) for T. nigroviridis (Additional file 1: Table S2). The smallest intron for PGI1 in both species was the same, intron 12–13 but the length was very different, 76 bp for T. rubripes versus 4 bp for T. nigroviridis.

Principal component analysis

The principal component analysis (PCA) allowed to differentiate the PGI genes into different groups based on the number of exons as well as their length (Fig. 3). All vertebrate PGI genes were clustered in a same group, except L. oculatus (LO), M. gallopavo (MG), P. marinus (PM) and P. sinensis (PS) PGI gene, and T. nigroviridis PGI1(TN1) which are completely distinct on the PCA plot. The PGI of these species are also not grouped together on the PCA plot (Fig. 3). The position of LO on the PCA plot is determined by the exon E16 whereas that of DM is due to the exons E2, E4 and E5 (Fig. 3). The position of PM and TN1 on the PCA map is essentially determined by E3 while that of PS is explained by E17 (Fig. 3).

Fig. 3

Principal component analysis map showing the distribution of PGI genes in different vertebrates according to their exon length. The name given to the PGI gene on the PCA plot represents the initials of the scientific name of each species, followed by a number which refer to the isoform. The blue color indicates the exon. The red color represents all vertebrate PGI clustered together in the same group while the green indicates exceptions, i.e. PGI genes isolated from this group

The repartition of PGI on the PCA plot based on the intron information allowed differentiating vertebrate PGI in three different groups; teleost, reptile/bird and mammal. D. rerio PGI1 (DR1) and PGI2 (DR2), and A. mexicanus PGI2 (AM2) are the only PGI genes that are isolated from other teleost PGI genes. The coelacanth L. chalumnae PGI (LC) and lizard A. carolinensis (AC) are very isolated from other PGI genes, but also isolated among themselves (Fig. 4). While the position of LC on the PCA plot is essentially explained by introns I6 and I7, that of AC seems to be determined by more number of introns including I6, I7, I10, I11, I13 and I17 (Fig. 4). Within the mammal lineage only M. musculus (MM) and P. abelii (PA) PGI genes are isolated in a different sub-group. The mammal group is essentially defined by introns I4, I8, I12, I14 and I19 whereas bird/reptile group is explained by I10, I11, I13 and I17. The position of the teleost group (formed by PGI1 and PGI2 paralogs) on the PCA map is not related to intron length except for DR1, DR2 and AM2, whose the position on the PCA map seems to be related to the intron size (Fig. 4). The lamprey, P. marinus (PM) and the spotted gar, L. oculatus (LO) PGI genes are isolated but closely related on the PCA map and their repartition their position is essentially explained by I2, I16 and I18 (Fig. 4).

Fig. 4

Distribution of PGI genes according to the length of their introns. The name given to PGI on the PCA plot represents the initials of the scientific name of the species, followed by a number which indicates to the isoform. The introns are indicated in blue. PGI groups of fishes and mammals are indicated in red and purple, respectively. The turquoise color represents bird/reptile PGI while bright color represents the lamprey (P. marinus), the spotted gar (L. oculatus) and the coelacanth (L. chalumnae) PGI that are isolated from the other groups

Phylogenetic analyses

The phylogenetic analysis recovered five main PGI lineage-dependent groups: lungfishes/coelacanth/cartilaginous fishes, teleost fishes, amphibians, reptiles/bird and mammals (Fig. 5). The teleost group is subdivided into two different clades comprising the PGI1 and PGI2 genes, respectively (Fig. 5). All teleost PGI1 genes are grouped in the same clade whereas the PGI2 genes are grouped in a different clade. According to the phylogenetic tree, teleost PGI1 and PGI2 are each other sister group and shows a common origin. The phylogenetic tree do not clearly show which of these two teleost PGI isoforms is more related to the PGI gene of reptiles, birds and mammals which all belong to the same clade, a result expected for WGD that produces two sister groups. The phylogenetic tree including the lamprey PGI sequence is not presented here because it was not well resolved, which may be due to the wrong frameshift found in the annotation of the PGI gene in this species. The within-species phylogenetic results of PGIs in the teleost lineage are not congruent with the taxonomic relationships between species. For example G. aculeatus PGI1 is closely related to G. morhua PGI1 whereas its PGI2 is more related to PGI2 of other species such as O. latipes, T. nigroviridis or T. rubripes. Likewise, M. cephalus PGI1 is closely related to that of P. formosa and X. maculatus while its PGI2 showed more similarity with D. labrax PGI2. Possible reasons for this discrepancy are different evolutionary constraints that may have been impacted on PGI isoforms in these species after duplication.

Fig. 5

Maximum likelihood phylogenetic tree of vertebrate PGIs. The phylogenetic tree was conducted on 50 protein sequences using MEGA software version 6. The values at the nodes represent the bootstrap values from 1000 replicates. The tree was rooted with hagfishes and lancelet PGI protein sequences

Natural selection

The average Ka/Ks ratio (Table 2) measured between teleost PGI1 and PGI2 paralogs is 0.64. The Ka/Ks ratio measured was higher than 1 in P. formosa (1.50), G. morhua (2.07) and D. labrax (1.74) and below the average in all other teleost species (Table 2). This average is higher than that measured between teleost PGI1 and PGI2 orthologs, which were 0.07 and 0.06, respectively. The Ka/Ks ratio measured between teleost PGI1 and mammal PGI was 1.96 whereas the average between teleost PGI2 and mammal PGI was 1.30 (Table 2). The average Ka/Ks ratio measured within mammal, reptile and bird lineages was 0.13, 1.38 and 1.43, respectively.

Table 2 Natural selection and functional divergence parameters

The average divergence rate between teleost PGI1 and PGI2 paralogs was 1.27, which was higher than the rate between teleost PGI1 orthologs (0.38) and teleost PGI1 orthologs (0.31) (Table 2). The average divergence rate of teleost PGI1/mammal PGI and teleost PGI1/mammal PGI2 were 2.68, and 2.49, respectively. The average divergence rates within reptile (2.88) and bird (3.31) lineages were significantly higher than that within mammal lineage (0.13) (Table 2). The average GC content between teleost PGI1/PGI2 paralogs was 53 % whereas it was 54 % for teleost PGI1 orthologs and 50 % for PGI2 orthologs (Table 2). The comparison between teleost and mammal showed similar values 56 % for teleost PGI1/mammal PGI and 54 % for teleost PGI2/mammal PGI. The average GC content within lineages was 47 % for bird, 53 % for reptile and 56 % for mammal (Table 2). For all pairwise comparisons, no codon bias was found. The substitution-rate-ratio (SRR) was 1:1:1:1 for all pairwise PGI orthologs and paralogs.

Secondary structure

The comparison of the secondary structure between species revealed a number of changes (including differences in the number of helices, strands and coils) that were observed between PGI1 and PGI2 after duplication (Additional file 2: Figure S1A). The number of α helices of PGI1 and PGI2 was respectively, 22 and 24, the number of β-strands 12 and 11 and the number of coils 36 and 35. The TMD has the same length and is comprised of 22 amino acid residues. A number of substitutions have also occurred in the TMD, including those at positions 1: T-S, 4: M-T, 7:A-V, 9:V-I and 18:V-I. Any of these substitutions were associated with changes in the electric charge of the corresponding amino acid. Thirteen substitutions resulted in electric charge changes in PGI1 and 7 amino acid replacements associated with changes of electric charges in PGI2 (Additional file 2: Figure S1B). Four residues were with an RSA value, 3 of which were common, and only one different and located at the position 18 of the TMD. There were more amino acid residues with RSA value coil and structure compared to helices (Additional file 2: Figure S1C).


The current study aimed at inferring the evolutionary history and functional divergence of PGI genes in vertebrates. Two paralogues were identified in teleost fishes and only one isoform in all other lineages. The phylogenetic reconstructions, which incorporated protein sequences from completely sequenced genome with sequences from individual genomic characterisation recovered five main lineage-specific groups supported by high bootstrap values. This phylogenetic topology is not only supported by the conserved synteny results, but also by the molecular evolution analyses based on the estimation of the average ratio of nonsynonymous to synonymous changes and the divergence rates. The combined phylogenetic reconstructions and synteny-based analyses revealed, as expected, that vertebrate PGI genes originated from two rounds of genome duplications and their functional diversification derived from both amino acid changes and post-duplication rearrangements including inversion, intron gain (e.g. insertion of a genomic fragment) or loss (e.g. by fusion of exons) followed by the fusion of adjacent exons.

Origin and diversification of vertebrate PGI genes

The phylogenetic reconstructions, similarity and conserved synteny analyses allowed the identification of two PGI isoforms (PGI1 and PGI2) in teleost fishes, in agreement with previous findings [46, 47], and only one PGI gene in the other vertebrates including ray-finned fishes, amphibians, birds and mammals. The diversification of PGI gene into PGI1 and PGI2 has likely resulted from the additional WGD that has specifically occurred within the teleost lineage [52]. The phylogenetic tree partitioned vertebrate PGI genes into five main groups (lungfishes/coelacanth/cartilaginous fishes, teleosts, amphibians, reptiles/birds and mammals), which are also supported by the molecular evolution results (Ka/Ks ratios and divergence rates), and synteny-based analyses. Overall, groups identified with the phylogenetic reconstructions are also supported by the repartition of PGI genes on the PCA plot according to their intron length. These results are consistent with previous findings indicating that reptiles and birds together have similar intron length, which are different from intron size of mammal and teleost genes [53, 54]. The positioning of D. rerio (DR) and P. marinus (PM) PGI on the PCA plot out of the teleost group is also consistent with findings from previous studies that have demonstrated that the intron size has expanded in these species compared to other teleosts [55]. The introns are less constrained by natural selection than protein coding fragments of the genome and therefore evolve faster. This may explain why they allowed to differentiate PGI genes into lineage-specific groups. This is in agreement with the absence of well differentiated groups on the PCA map showing the distribution of PGI genes according to exon length.

Although this cannot be inferred from the phylogenetic results, the synteny analyses support that teleost PGI2 is common to the other vertebrates including lungfishes, lamprey and the coelacanth. This suggests that this gene might existed in the genome of common vertebrate ancestor. This contradicts previous findings that the first duplication that gave rise to the vertebrate common PGI gene has occurred at the origin of hagfishes (agnatha) [56]. The presence of a PGI gene similar to the vertebrate isoform in hagfishes (E. yangi) and cartilaginous fishes, the elephant shark (C. milii), which is closely related to the invertebrates PGI (D. melanogaster) strongly supports for the existence of a PGI locus in the common ancestor of all vertebrates.

The teleost PGI1 and PGI2 are not adjacent in the genome, and then did not result from tandem duplications. Instead, they may have resulted from the duplication of large genomic fragments. Although it was not possible to determine whether all teleost PGI paralogs are located on corresponding chromosomes because most of them are located on unordered scaffolds or unknown random chromosomes, medaka PGI1 and PGI2 were identified on two distinct chromosomes (6 and 3) with a high degree of synteny [57]. These two chromosomes are probably corresponding chromosomes that resulted from WGD [57]. These results suggest that the two teleost PGI paralogs may have resulted from teleost-specific WGD that took place after the divergence of teleosts and lungfishes from their common ancestor [3, 52, 58, 59]. This interpretation is in agreement with previous findings based on the observation that the phylogenic position of PGI duplication coincides with the estimated teleost-specific WGD [46, 47].

Natural selection and functional divergence

The significantly higher average pairwise Ka/Ks ratio (0.64) between teleost PGI1 and PGI2 paralogs compared to teleost PGI1 (0.07) or PGI2 (0.06) orthologs implies that each of these isoforms is highly constrained by the function(s)that it plays within this lineage through an intense purifying selection. Purifying selection is supported by the average divergence rate between PGI1 and PGI2 paralogs, which was significantly higher than the average rate recorded for PGI1 or PGI2 orthologs (1.27 versus 0.38 and 0.31, respectively). By contrast, for each species within the teleost lineage, PGI1 and PGI2 paralogs seems subject to high synonymous substitution rates resulting from directional selection, in accordance with the high divergence rates of the PGI1 and PGI2 paralogs. The average pairwise ratio Ka/Ks measured in certain teleost species such as P. formosa, G. morhua and D. labrax were exceptionally high (1.50, 2.07 and 1.74, respectively). This strongly supports the hypothesis of positive selection due to higher synonymous changes that may have resulted from changes in amino acid composition, codon bias or increased mutation rate. The substitution rate ratios measured in this study indicated that there was no codon bias, and the percentage of GC content for all pairwise comparisons was around 54 %, suggesting that the higher Ka/Ks ratios recorded for these three species resulted from an increased mutation rate compared to the remaining species. Given the heterogeneity of environments inhabited by teleost fishes it can be expected that positive selection acts differently on duplicates between species [60]. Certain species colonise environments where others cannot inhabit because of environmental constraints. Therefore, genes essential to adaptation and survival to these environments might be differently constrained, which may explain the differences in Ka/Ks ratios amongst teleosts.

The low average Ka/Ks ratio (0.13) and divergence rate (0.13) within mammals could be indicative of a more recent divergence of PGI orthologs compared to the avian and reptilian lineages. They may also indicate that mammalian PGI orthologs have the same functions in this lineage, i.e. they are being constrained by their different functions. By contrast, the higher Ka/Ks ratios (1.38 and 1.43, respectively) in the reptile and bird lineages may indicate that PGI genes have evolved different or novel functions. This interpretation is strongly supported by the average divergence rate, which was respectively 22 and 25 times the divergence rate measured within the mammal lineage. It has been suggested that the multi-functionality of PGI genes in mammals resulted from gradual changes in amino acid sequences [47]. The conservation of amino acid structure and the electric charge of PGI proteins measured in this study revealed amino acid substitutions between PGI1 and PGI2, some of which were associated with changes in the electric charges of the corresponding residues. There was divergence in the electric charge of PGI amino acid residues after duplication, which occurred more frequently in PGI2 compared to PGI1. Such changes in protein structure can be interpreted as a neo- and/or subfunctionalization, driven by functional constraints differentially exerted on PGI1 and PGI2 isoforms. These new results on the secondary structure and amino acid properties of PGI1 and PGI2 corroborate previous findings [47]. Indeed, the divergent evolution of the electric charges of PGI duplicates have been shown to reflect the specialisation of PGI isoforms [46, 47].

Genomic rearrangements after PGI duplication

The combined length of the five D. melanogaster PGI exons was equivalent to the total length of the 18 exons of fishes and other vertebrate PGI genes, suggesting that the lower number of PGI exons in this species may have resulted from intron deletion and the fusion of adjacent exons following introns loss. This interpretation is supported by the alignment of amino acid sequences of genes (Additional file 3: Figure S2), which showed that D. melanogaster PGI has a length similar to that of its analogue in the other lineages. The sequence similarity of D. melanogaster PGI with that of mammals, birds, reptiles and amphibians is also equivalent to similarities found between these lineages. These results strongly support that the lower number of introns in D. melanogaster PGI compared to other species resulted from intron loss followed by fusions of adjacent exons. On the other hand, as demonstrated by the alignment of gene sequences (Additional file 3: Figure S2), the lower number of exons (15 exons) of the GM PGI compared to the number found in GG, implies that some exons at the upstream part of the gene have been lost in this species. The lower number of exons (11 exons) of PM PGI may be seen as a result of sequence incompleteness materialised by a sequencing gap in the PGI nucleotide sequence of GG (Additional file 3: Figure S2). However, sequence alignments provide strong evidence that the lower number of exons of the PGI of this species results from the deletion of some exons at both upstream and downstream parts of the sequence. Interestingly, the E13 has a total length of 130 bp in all mammal species in except in SS (44 bp), which also has a E14 different exon length compared to other mammals (84 versus 77 bp). More importantly, the exons E15, E16, E17 and E18 of the SS PGI have respectively the same length as mammal E14, E15, E16 and E17. This provides strong evidence that the additional exon of this species resulted from an insertion of a genomic DNA fragment within E13, which has led to two different and shorter exons (E13 and E14). By contrast, the additional exon of L. oculatus seems to have resulted from a re-organisation of the whole gene after duplication. Indeed, the PGI exons in this species have a different length compared to other vertebrates. This is supported by the contradictory phylogenetic and synteny results that respectively identified the PGI of L. oculatus as PGI1 and PGI2. Such post duplication genomic rearrangements may also explain why the PGI of L. oculatus which a holostean is grouped together with teleost PGI1 and closely related to D. labrax in the phylogenetic tree.

The order of genes surrounding PGIs support the idea that complex genomic rearrangements have occurred after duplication of the genomic fragments harbouring the PGI genes. The micro-synteny around PGI is preserved in most of the species, but in some cases, the first or first two flanking genes were lost. The succession of the three first adjacent downstream flanking genes of the PGI gene in birds/reptiles/amphibians is WTIP-UBA2-PDCD2L, whereas in the mammals analysed, WTIP was not found, and the order of the two flanking genes was inverted (PDCD2L-UBA2). The most parsimonious explanation to this finding is that the genomic fragment harbouring PDCD2L and UBA2 was inverted after duplication, probably after the split of mammals from other tetrapods. The PDCD2L gene was then lost in amphibians while both the UBA2 and PDCD2L genes were lost in the teleost fishes. The re-annotation of the region surrounding PGI did not allow identifying any of these genes, indicating that their misidentification is not due to incomplete sequencing. These genes have been truly lost after duplication, as was the HSD17B2 gene, which was lost in certain fish species after PGI1/PGI2 duplication, but was conserved in the other species. The downstream flanking gene of O. latipes PGI2 is the same as the upstream flanking gene of PGI1 of other teleosts such as T. nigroviridis, G. morhua, D. rerio and A. mexicanus, while its upstream flanking gene is the same as that PGI2 of the other species. The upstream flanking gene of O. latipes PGI1 is the same as that of the remaining teleost species, but its downstream flanking gene is SI:DKEY, which corresponds to the second downstream flanking gene of PGI1 in the other species. The first downstream gene was probably lost in this species, probably through post-duplication rearrangements. These are examples of micro-synteny conservation between paralogs while the sequence homology signals were lost in many cases.


The combined similarity search, conserved synteny and phylogenetic reconstruction analyses conducted in this study allowed an exhaustive clarification of the evolutionary history of PGI genes in vertebrates. The phylogenetic reconstructions differentiated vertebrate PGI genes into different groups, which were also supported by the synteny-based results and the selective and divergence tests. The results further showed that one PGI isoform, teleost PGI2 is shared by all vertebrate species analysed. PGI2 might be involved in the same biochemical pathways or physiological networks in vertebrates. The conservation of amino acid structure and the electric charge of PGI proteins, together with the evolutionary analyses based on Ka/Ks ratios and divergence rates, supports a functional diversification of teleost PGI as previously suggested [47]. Glycolysis, which is the main pathway in which the PGI genes are involved, is an energy metabolic production resource common to all eukaryotic organisms. This may explain why one of the PGI duplicates, PGI2, is shared by all vertebrate species. The PGI isoform specific to teleost fishes may play specific functions within this lineage as evidenced by different selective pressures and divergence rates. These probable novel functions have to be identified and investigated.


Identification of PGI orthologs and their flanking genes

The protein and nucleotide sequences of PGI1 and PGI2 of Mugil cephalus published by Grauvogel et al. [45] were extracted from the GenBank database using the accession number provided by the authors. These well-characterised PGI sequences were blasted against the sea bass, Dicentrarchus labrax genome (, which allowed the identification of two PGI isoforms in this species. Two PGI loci were considered as paralogs or orthologs when the two corresponding nucleotide or protein sequences match on aligned blocks with an average length of at least 80 % with ≥70 % identify. I thus performed synteny-based analyses which consisted of identifying the putative PGI exon–intron structure and the comparison of exon/intron length within and between the main vertebrate lineages from hagfishes to mammals. They also consisted on performing a comprehensive comparative analysis of the genomic region harbouring PGI genes. I also performed a re-annotation of the region potentially harbouring PGI genes when a PGI gene was not previously identified in a given species. Thus the genomic location on chromosomes or scaffolds, as well as their exact position in their genomic entities were determined. Their upstream and downstream flanking genes were then determined and when they were not previously identified, the genomic region potentially harbouring them was re-annotated. To identify the PGI orthologs in the other teleosts, the two isoforms identified in the sea bass genome were blasted against the genome of teleosts available in the Ensembl Genome Browser. The same criteria mentioned above (aligned blocks of protein sequences that match with an average length of at least 80 % with ≥70 % identity) were applied to identify real orthologs. The upstream and downstream genes of each PGI were identified in the other teleost species by similarity search using the nucleotide and protein sequences of genes that flank the PGIs in sea bass. The PGI orthologs of the other vertebrates were obtained from Ensembl by means of blast search using the teleost PGI sequences. When a PGI gene could not be identified in a given species and its the upstream and downstream flanking genes found, their sequences were used for PGI gene identification. Likewise, in cases where a PGI gene and none of its flanking genes were found in a species, the genomic DNA fragment harbouring PGI and the flanking genes was extracted and re-annotated using de novo and/or similarity-based annotation approaches. For the similarity-based annotations, a gene was considered as a PGI or flanking locus when it matched the well-characterised PGI sequences on aligned blocks with an average length of at least 80 % with ≥70 % identify. The protein and nucleotide sequences of predicted genes from the de novo annotation were confirmed as PGI or flanking loci by blast against the well characterised PGI genes using the above criteria. For the species whose genome has not been completely sequenced, the accession numbers of the PGI genes were obtained from the literature and then used to identify the corresponding sequence in GenBank. The information on exon–intron structure of each PGI locus whose genome is not available in the Ensembl Genome Browser was extracted from the transcript-summary table that can be downloaded from the Blast/Blat research output results.

Sequence alignment, phylogenetic and principal component analyses

A phylogenetic tree was reconstructed using protein sequences of PGI genes of species belonging to the main vertebrate lineages. The protein sequences of all PGI isoforms identified in vertebrate species were aligned using MAFFT version 7 ( The Gblocks Server ( was used to improve the alignment. The well-aligned blocks were then used to reconstruct a phylogenetic tree using MEGA software version 6. The maximum likelihood method with the Jones–Taylor–Thornton (JTT) substitution model was used to constructed the phylogenetic tree, which was rooted with the PGI protein sequences of the lancelet, Branchiostoma floridae and the hagfish, Eptatretus yangi. Principal component analysis (PCA) was performed on both PGI exon and intron length separately using the ade4 packages of the R software version v.64 3.1.1. Principal component analysis (PCA) was performed within a phylogenetic context on both PGI exon and intron lengths. Although not commonly used in the comparative genomics, it was considered particularly useful to illustrate the relationship between PGI genes based on their exon and intron lengths. Functional divergence between duplicates could be the results of changes in amino acid residues of the coding sequences, but it could also be related to changes in non-coding regions (including introns) which can lead to functional divergences between duplicates. It has been demonstrated that functional divergence can be caused by amino acid substitutions in coding sequences or alterations of exon/intron structure [61].

Tests for selection

The nonsynonymous (dN) and synonymous (dS) ratio (dN/dS), also known as Ka/Ks ratio or ω was used to measure the evolutionary selective pressure exerted on genes. Pairwise comparisons of Ka/Ks ratios were thus conducted between teleost PGI1 and PGI2 paralogs and, between reptile/bird and mammal PGI. Pairwise comparisons were also conducted between PGI of the latter two groups and teleost PGI2. The intra-specific Ka/Ks ratios between PGI1-PGI2 paralog pairs was calculated in each teleost fish using Nei and Gojobori method implemented in Ka/Ks_Calculator software [62]. There are several methods incorporated in Ka/Ks software calculator for the estimation of Ka/Ks ratios, which include NG [63], LPB [64, 65], MLPB [66] MLWL [66] and YN [67]. All the above listed methods were tested and the results were not significantly different between them. Finally, the NG method was applied and a Fisher’s exact test was used to access the significance of Ka/Ks >1 and Ka/Ks < 1 as implemented in Ka/Ks_Calculator software. Multiple comparison Turkey test was used to evaluate the significance of differences in Ka/Ks ratios between PGI orthologs and paralogs. The same approach was also used to calculate the inter-specific Ka/Ks ratios for each pairwise of PGI2 orthologs from all vertebrate groups that are analysed including teleost fishes. The divergence times were calculated between all PGI paralogs and orthologs using nucleotide sequences.

Secondary structure

The SABLE server ( was used for the functional annotation, which included finding the number of transmembrane domains, predicting the secondary structure, quantifying the relative solvent accessibility (RSA) of amino acid residues along the protein sequences, and identifying physico-chemical property profiles. The RSA represents the solvent-accessible surface areas normalised by the surface area of the residue in the unfolded state, and is used to measure the solvent surface accessible of amino acid residues in a protein. An RSA value of 0 means that the surface area is completely buried whereas an RSA value of 9 is indicative of a fully exposed surface area. The predicted structure were visualised using the POLYVIEW-2D viewer (

Availability of supporting data

All datasets supporting the results of this article are included in the article and its additional files.



phosphoglucose isomerase

UFG2 :

upstream flanking gen 1


upstream flanking gene 2

DFG1 :

downstream flanking gene 1

DFG2 :

downstream flanking gene 2

DFG3 :

downstream flanking gene 3


whole genome duplications


Multiple Alignment using Fast Fourier Transform




principal component analysis


nonsynonymous substitution


synonymous substitution



HSD17B2 :

hydroxysteroid (17-beta) dehydrogenase 2


LSM14A mRNA processing body assembly factor a


M-phase phosphoprotein 6

UBA2 :

ubiquitin-like modifier activating enzyme 2


programmed cell death 2-like gene


wilms tumor 1 interacting protein

DM :

Drosophila melanogaster

LO :

Lepisosteus oculatus

SS :

Sus scrofa

AM :

Astyanax mexicanus

GM :

Gadus morhua,

PS :

Pelodiscus sinensis

MP :

Petromyzon marinus

PF :

P. formosa

TR :

Takifugu rubripes

TN :

Tetraodon nigorviridis

LO :

Lepisosteus oculatus

MG :

Meleagris gallopavo

PS :

Pelodiscus sinensis

LC :

Latimeria chalumnae

AC :

Anolis carolinensis

MM :

Mus musculus

PA :

Pongo abelii

PM :

Petromyzon marinus


  1. 1.

    Donoghue PCJ, Purnell MA. Genome duplication, extinction and vertebrate evolution. Trends Ecol Evol. 2005;20(6):312–9.

    PubMed  Article  Google Scholar 

  2. 2.

    Ohno S. Evolution by gene duplication. New York: Springer; 1970.

    Google Scholar 

  3. 3.

    Taylor JS, Braasch I, Frickey T, Meyer A, Van de Peer Y. Genome duplication, a trait shared by 22,000 species of ray-finned fish. Genome Res. 2003;13:382–90.

    CAS  PubMed Central  PubMed  Article  Google Scholar 

  4. 4.

    Zhang G, Cohn MJ. Genome duplication and the origin of the vertebrate skeleton. Curr Opin Genet Dev. 2008;18:387–93.

    PubMed  Article  Google Scholar 

  5. 5.

    Dehal P, Boore JL. Two rounds of whole genome duplication in the ancestral vertebrate. PLoS Biol. 2005;3:e314.

    PubMed Central  PubMed  Article  Google Scholar 

  6. 6.

    McLysaght A, Hokamp K, Wolfe KH. Extensive genomic duplication during early chordate evolution. Nat Genet. 2002;31:200–4.

    CAS  PubMed  Article  Google Scholar 

  7. 7.

    Meyer A, Schartl M. Gene and genome duplications in vertebrates: the one-to-four (-to-eight in fish) rule and the evolution of novel gene functions. Curr Opin Cell Biol. 1999;11:699–704.

    CAS  PubMed  Article  Google Scholar 

  8. 8.

    Meyer A. Hox gene variation and evolution. Nature. 1998;391(225):227–8.

    Google Scholar 

  9. 9.

    Glasauer SMK, Neuhauss SCF. Whole genome duplication in teleost fishes and its evolutionary consequences. Mol Genet Genomics. 2014;289:1045–60.

    CAS  PubMed  Article  Google Scholar 

  10. 10.

    Meyer A, Van de Peer Y. From 2R to 3R: evidence for a fish specific genome duplication (FSGD). Bioessays. 2005;27:937–45.

    CAS  PubMed  Article  Google Scholar 

  11. 11.

    Duda TF Jr, Palumbi SR. Molecular genetics of ecological diversification: Duplication and rapid evolution of toxin genes of the venomous gastropod Conus. Proc Natl Acad Sci USA. 1999;96:6820–3.

    CAS  PubMed Central  PubMed  Article  Google Scholar 

  12. 12.

    Koop BF, Davidson WS. Genomics and the Genome Duplication in Salmonids. Fisheries for Global Welfare and Environment, 5th World Fisheries Congress 2008, pp 77–86.

  13. 13.

    Lynch M, Conery JS. The evolutionary demography of duplicate genes. J Struct Funct Genomics. 2003;3:35–44.

    CAS  PubMed  Article  Google Scholar 

  14. 14.

    Warren IA, Ciborowski KL, Casadei E, Hazlerigg DG, Martin S, Jordan WC, Sumner S. Extensive Local gene duplication and functional divergence among paralogs in Atlantic salmon. Genome Biol Evol. 2014;6(7):1790–805.

    PubMed Central  PubMed  Article  Google Scholar 

  15. 15.

    Kellogg E. What happens to genes in duplicated genomes. Proc Natl Acad Sci USA. 2003;100:4369–71.

    CAS  PubMed Central  PubMed  Article  Google Scholar 

  16. 16.

    Zhang J. Evolution by gene duplication: an update. Trends Ecol Evol. 2003;18(6):292–8.

    Article  Google Scholar 

  17. 17.

    Harisson PM, et al. Molecular fossils in the human genome: identification and analysis of the pseudogenes in chromosomes 21 and 22. Genome Res. 2002;12:272–80.

    Article  Google Scholar 

  18. 18.

    Ota T, Nei M. Evolution of immunoglobulin VH pseudogenes in chickens. Mol Biol Evol. 1995;12:94–102.

    CAS  PubMed  Article  Google Scholar 

  19. 19.

    Zheng D, Zhang Z, Harrison PM, Karro J, Carriero N. Integrated pseudogene annotation for human chromosome 22: evidence for transcription. J Mol Biol. 2005;349:27–45.

    CAS  PubMed  Article  Google Scholar 

  20. 20.

    Wagner A. The fate of duplicated genes: loss or new function? Bioessays. 1998;20:785–8.

    CAS  PubMed  Article  Google Scholar 

  21. 21.

    Hughes AL. Gene duplication and the origin of novel proteins. PNAS. 2005;102(25):8791–2.

    CAS  PubMed Central  PubMed  Article  Google Scholar 

  22. 22.

    Nowak MA, Boerlijst MC, Cooke J, Smith JM. Evolution of genetic redundancy. Nature. 1997;388:167–71.

    CAS  PubMed  Article  Google Scholar 

  23. 23.

    Zhang J, Rosenberg HF, Nei M. Positive Darwinian selection after gene duplication in primate ribonuclease genes. Proc Natl Acad Sci USA. 1998;95:3708–13.

    CAS  PubMed Central  PubMed  Article  Google Scholar 

  24. 24.

    Force A, Lynch M, Pickett FB, Amores A, Yan YL, Postlethwait J. Preservation of duplicate genes by complementary, degenerative mutations. Genetics. 1999;151:1531–45.

    CAS  PubMed Central  PubMed  Google Scholar 

  25. 25.

    Hughes AL. The evolution of functionally novel proteins after gene duplication. Proc R Soc Lond Ser B. 1994;256:119–24.

    CAS  Article  Google Scholar 

  26. 26.

    Hughes AL. Adaptive evolution of genes and genomes. New York: Oxford University Press; 1999. ISBN 0195116267

    Google Scholar 

  27. 27.

    Bader M. Genome rearrangements with duplications. BMC Bioinform. 2010;11:S27.

    Article  Google Scholar 

  28. 28.

    Ponomarev VA, Makarova KS, Aravind L, Koonin EV. Gene duplication with displacement and rearrangement: origin of the bacterial replication protein PriB from the single-stranded DNA-binding protein Ssb. J Mol Microbiol Biotechnol. 2003;5(4):225–9.

    CAS  PubMed  Article  Google Scholar 

  29. 29.

    Ciccarelli FD, von Mering C, Suyama M, Harrington ED, Izaurralde E, Bork P. Complex genomic rearrangements lead to novel primate gene function. Genome Res. 2005;15(3):343–51.

    CAS  PubMed Central  PubMed  Article  Google Scholar 

  30. 30.

    Sémon M, Wolfe KH. Rearrangement rate following the whole-genome duplication in teleosts. Mol Biol Evol. 2007;24(3):860–7.

    PubMed  Article  Google Scholar 

  31. 31.

    Hahn Y, Bera TK, Gehlhaus K, Kirsch IR, Pastan IH, Lee B. Finding fusion genes resulting from chromosome rearrangement by analyzing the expressed sequence databases. PNAS. 2004;101(36):13257–61.

    CAS  PubMed Central  PubMed  Article  Google Scholar 

  32. 32.

    Marsh JA, Teichmann SA. How do proteins gain new domains? Genome Biol. 2010;11:126.

    PubMed Central  PubMed  Article  Google Scholar 

  33. 33.

    Davis JC, Petrov DA. Preferential duplication of conserved proteins in eukaryotic genomes. PLoS Biol. 2004;2(3):0319.

    Article  Google Scholar 

  34. 34.

    Schnable JC, Pedersen BS, Subramaniam S, Freeling M. Dose-sensitivity, conserved non-coding sequences, and duplicate gene retention through multiple tetraploidies in the grasses. Front Plant Genet Genom. 2011;2(1):2.

    CAS  Google Scholar 

  35. 35.

    Brunet FG, Crollius HR, Paris M, Aury J-M, Gibert P, Jaillon O, Laudet V, Robinson-Rechavik M. Gene loss and evolutionary rates following whole-genome duplication in teleost fishes. Mol Biol Evol. 2006;23(9):1808–16.

    CAS  PubMed  Article  Google Scholar 

  36. 36.

    Makino T, McLysaght A. Positionally biased gene loss after whole genome duplication: evidence from human, yeast, and plant. Genome Res. 2012;22:2427–35.

    CAS  PubMed Central  PubMed  Article  Google Scholar 

  37. 37.

    Finn RN, Cerdà J. Aquaporin evolution in fishes. Front Physiol. 2011;2:44.

    CAS  PubMed Central  PubMed  Article  Google Scholar 

  38. 38.

    Loh YH, Christoffels A, Brenner S, Hunziker W, Venkatesh B. Extensive expansion of the claudin gene family in the teleost fish, Fugu rubripes. Genome Res. 2004;14:1248–57.

    CAS  PubMed Central  PubMed  Article  Google Scholar 

  39. 39.

    Tine M, Kuhl H, Gagnaire P-A, Louro B, Desmarais E, Martins RST, Hecht J, Knaust F, Belkhir K, Klages S, et al. European sea bass genome and its variation provide insights into adaptation to euryhalinity and speciation. Nat Commun. 2014;5:5770.

    CAS  PubMed Central  PubMed  Article  Google Scholar 

  40. 40.

    Cortesio F, Musilová Z, Stieb SM, Hart NS, Siebeck UE, Malmstrøm M, Tørresen OK, Jentoft S, Cheney KL, Marshall NJ, et al. Ancestral duplications and highly dynamic opsin gene evolution in percomorph fishes. PNAS. 2015;112(5):1493–8.

    Article  Google Scholar 

  41. 41.

    Crow KD, Stadler PF, Lynch VJ, Amemiya C, Wagner GP. The ‘‘Fish-Specific’’ hox cluster duplication is coincident with the origin of teleosts. Mol Biol Evol. 2006;23(1):121–36.

    CAS  PubMed  Article  Google Scholar 

  42. 42.

    Nembaware V, Crum K, Kelso J, Seoighe C. Impact of the presence of paralogs on sequence divergence in a set of mouse-human orthologs. Genome Res. 2002;12:1370–6.

    CAS  PubMed Central  PubMed  Article  Google Scholar 

  43. 43.

    Wirthlin M, Lovell PV, Jarvis ED, Mello CV. Comparative genomics reveals molecular features unique to the songbird lineage. BMC Genom. 2014;15:1082.

    Article  Google Scholar 

  44. 44.

    Avise JC, Kitto GB. Phosphoglucose isomerase gene duplication in the bony fishes: an evolutionary history. Biochem Genet. 1973;8:113–32.

    CAS  PubMed  Article  Google Scholar 

  45. 45.

    Grauvogel C, Brinkmann H, Petersen J. Evolution of the glucose-6-phosphate isomerase: the plasticity of primary metabolism in photosynthetic eukaryotes. Mol Biol Evol. 2007;24(8):1611–21.

    CAS  PubMed  Article  Google Scholar 

  46. 46.

    H-w Kao, Lee S-C. Phosphoglucose isomerases of hagfish, zebrafish, gray mullet, toad, and snake, with reference to the evolution of the genes in vertebrates. Mol Biol Evol. 2002;19(4):367–74.

    Article  Google Scholar 

  47. 47.

    Sato Y, Nishida M. Post-duplication charge evolution of phosphoglucose isomerases in teleost fishes through weak selection on many amino acid sites. BMC Evol Biol. 2007;7:204.

    PubMed Central  PubMed  Article  Google Scholar 

  48. 48.

    Jeffery CJ, Bahnson BJ, Chien W, Ringe D, Petsko A. Crystal structure of rabbit phosphoglucose isomerase, a glycolytic enzyme that moonlights as neuroleukin, autocrine motility factor, and differentiation mediator. Biochemistry. 2000;39:955–64.

    CAS  PubMed  Article  Google Scholar 

  49. 49.

    Hall JG. The Adaptation of Enzymes to Temperature: Catalytic characterization of glucosephosphate isomerase homologues isolated from Mytilus edulis and Isognomon alatus, bivalve molluscs inhabiting different thermal environments. Mol Biol Evol. 1985;2(3):251–69.

    CAS  PubMed  Google Scholar 

  50. 50.

    Arbiza L, Dopazo J, Dopazo H. Positive selection, relaxation, and acceleration in the evolution of the human and chimp genome. PLoS Comput Biol. 2006;4(2):e38. doi:10.1371/journal.pcbi.0020038.

    Article  Google Scholar 

  51. 51.

    Kondrashov FA, Rogozin IB, Wolf YI, Koonin EV. Selection in the evolution of gene duplications. Genome Biol. 2002;3:research0008-research0008.0009.

    Article  Google Scholar 

  52. 52.

    Amores A, Force A, Yan YL, Joly L, Amemiya C, Fritz A, Ho RK, Langeland J, Prince V, Wang YL, et al. Zebrafish hox clusters and vertebrate genome evolution. Science. 1998;282:1711–4.

    CAS  PubMed  Article  Google Scholar 

  53. 53.

    Gelfman S, Burstein D, Penn O, Savchenko A, Amit M, Schwartz S, Pupko T, Ast G. Changes in exon–intron structure during vertebrate evolution affect the splicing pattern of exons. Genome Res. 2012;22:35–50.

    CAS  PubMed Central  PubMed  Article  Google Scholar 

  54. 54.

    Zhang Q, Edwards SV. The evolution of intron size in amniotes: a role for powered flight? Genome Biol Evol. 2012;4(10):1033–43.

    PubMed Central  PubMed  Article  Google Scholar 

  55. 55.

    Wang J, Ye LH, Liu QZ, Peng LY, Liu W, Yi XG, Wang YD, Xiao J, Xu K, Hu FZ, et al. Rapid genomic DNA changes in allotetraploid fish hybrids. Heredity. 2015;2015:1–9.

    Google Scholar 

  56. 56.

    Fisher SE, Shaklee JB, Ferris SD, Whitt GS. Evolution of five multilocus isozyme systems in the chordates. Genetica. 1980;52(53):73–85.

    Google Scholar 

  57. 57.

    Takeda H. Draft genome of the medaka fish: a comprehensive resource for medaka developmental genetics and vertebrate evolutionary biology. Dev Growth Differ. 2008;50:S157–66.

    CAS  PubMed  Article  Google Scholar 

  58. 58.

    Postlethwait JH, Woods IG, Ngo-Hazelett P, Yan YL, Kelly PD, Chu F, Huang H, Hill-Force A, Talbot WS. Zebrafish comparative genomics and the origins of vertebrate chromosomes. Genome Res. 2000;10:1890–902.

    CAS  PubMed  Article  Google Scholar 

  59. 59.

    Taylor JS, Van de Peer Y, Braasch I, Meyer A. Comparative genomics provides evidence for an ancient genome duplication event in fish. Philos Trans R Soc Lond B Biol Sci. 2001;356:1661–79.

    CAS  PubMed Central  PubMed  Article  Google Scholar 

  60. 60.

    Kotrschal A, Taborsky B. Environmental change enhances cognitive abilities in fish. PLoS Biol. 2010;8(4):e1000351. doi:10.1371/journal.pbio.1000351.

    PubMed Central  PubMed  Article  Google Scholar 

  61. 61.

    Xua G, Guo C, Shan H, Kong H. Divergence of duplicate genes in exon–intron structure. Proc Natl Acad Sci USA. 2012;109(4):1187–92.

    Article  Google Scholar 

  62. 62.

    Zhang Z, Li J, Zhao X, Wang J, Wong G, Yu J. KaKs Calculator: calculating Ka and Ks through model selection and model averaging. Genom Proteom Bioinform. 2006;4:259–63.

    CAS  Article  Google Scholar 

  63. 63.

    Nei M, Gojobori T. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol. 1986;3:418–26.

    CAS  PubMed  Google Scholar 

  64. 64.

    Li WH. Unbiased estimation of the rates of synonymous and nonsynonymous substitution. J Mol Evol. 1993;36:96–9.

    CAS  PubMed  Article  Google Scholar 

  65. 65.

    Pamilo P, Bianchi NO. Evolution of the Zfx and Zfy genes: rates and interdependence between the genes. Mol Biol Evol. 1993;10:271–81.

    CAS  PubMed  Google Scholar 

  66. 66.

    Tzeng Y-H, Pan R, Li W-H. Comparison of three methods for estimating rates of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol. 2004;21:2290–8.

    CAS  PubMed  Article  Google Scholar 

  67. 67.

    Yang Z, Nielsen R. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol Biol Evol. 2000;17:32–43.

    CAS  PubMed  Article  Google Scholar 

Download references


I thank Dr. Peter Teske, Dr. Prof. François Bonhomme and Dr. Bruno Guinand for his comments on the manuscript. I also thank the Max Planck Institute for Molecular Genetics (Berlin, Germany) particularly Dr. Ricahard Reinhardt and Heiner Kuhl for using the sea bass genome resources and the Department of Zoology, University of Johannesburg (South Africa) for providing the financial supports (postdoctoral fellowship) that allowed the manuscript preparation. Many thanks also the two anonymous reviewers who made very interesting comments that have helped to improve the manuscript.

Authors’ information

MT is an evolutionary biologist who performed his Doctorate Thesis in Evolutionary Biology and Ecology at the University of Montpellier II, in the department Integrative Biology of the Institute of Sciences and Evolution, under the direction of Dr. François Bonhomme (CNRS) and Dr. Jean-Dominique Durand (IRD). His research focused on genetic and physiological mechanisms underlying the adaptation of fishes to environmental variations including changes in ambient salinity. He performed his first postdoctoral research in Genomics and Evolution in the laboratory of Dr. Richard Reinhardt at the Max Plank Institute for Molecular Genetics in Berlin. Afterward, he performed a second postdoc in Genomics and Next Generation Sequencing at the Genome Centre at MIP Cologne. More specifically, he worked on the sequencing project of the European sea bass (Dicentrarchus labrax L.) genome, which was published to Nature Communication. He is current doing his research at the Molecular Zoology Laboratory at Johannesburg University where he work on a project that aims on investigating thermal adaptation and the role of phenotypic plasticity in coastal invertebrates and/or fishes by studying populations of species whose ranges span multiple South African marine biogeographic provinces.

Competing interests

Eventual competing interests (including personal communications or additional permissions, related manuscripts), sources of financial support, corporate involvement and patent holdings are disclosed.

Author information



Corresponding author

Correspondence to Mbaye Tine.

Additional files

Additional file 1: Table S1. A: Exon count and their corresponding length of the PGI genes in different vertebrate species analysed in this study. B: Intron count of PGI genes and their corresponding length in the different species analysed in this study. The code given to the PGI genes in the table corresponds to the initials of the scientific name of each species, followed by a number, which refers to the isoform.

Additional file 2: Figures S1. A: Secondary structure of PGI1 and PGI2. B: Comparison of electric charges of amino acid residues between PGI1 and PGI2. C: Comparison of relative sovant accessibility (RSA) between PGI1 and PGI2).

Additional file 3: Figures S2. A: CLUSTAL multiple alignment amino acid sequences of Drosophila melanogaster PGI with mammalian (MM: Macaca mulatta), Avian (GG: Gallus gallus), reptile (AC: Anolis carolinenesis), amphibian (DuM: Duttaphrynus melanostictus) and teleost (DR: Danio rerio) PGIs. B: Multiple alignment of amino acid sequences M. gallopavo (MG) PGI with G. gallus (GG) PGI. C: Multiple alignment of nucleotide sequences P. marinus (PM) PGI with D. rerio PGI1 and PGI2.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Tine, M. Evolutionary significance and diversification of the phosphoglucose isomerase genes in vertebrates. BMC Res Notes 8, 799 (2015).

Download citation


  • Duplication
  • Divergence
  • Function
  • Fishes
  • Synteny