- Short Report
- Open Access
Evolutionary patterns of RNA-based gene duplicates in Caenorhabditis nematodes coincide with their genomic features
BMC Research Notes volume 5, Article number: 398 (2012)
RNA-based gene duplicates (retrocopies) play pivotal roles in many physiological processes. Functional retrocopies have been systematically identified in several mammals, fruit flies, plants, zebrafish and other chordates. However, no study of this kind of duplication in Caenorhabditis nematodes has been reported.
We identified 43, 48, 43, 9, and 42 retrocopies, of which 6, 15, 18, 3, and 13 formed chimeric genes, in C. brenneri, C. briggsae, C. elegans, C. japonica, and C. remanei, respectively. At least 5 chimeric types exist in Caenorhabditis species, of which a retrocopy recruiting both N and C termini is the most common. Evidence from different analyses suggests that many retrocopies and almost all chimeric genes may be functional in these species. About half of the retrocopies in each species have coordinates in other species, and we suggest that retrocopies from closely related species may help in identifying retrocopies in a given species.
A number of retrocopies and chimeric genes exist in Caenorhabditis genomes, and some of them may be functional. The evolutionary patterns of these genes may correlate with genomic features of the genus, such as the activity of retroelements, the high mutation and deletion rates, and the large proportion of genes subject to trans-splicing.
Gene duplicates that pass through an RNA intermediate, termed retrocopies, are nucleotide sequences formed by the reintegration of reverse-transcribed mRNAs into new genomic regions. Recently formed retrocopies have three key features: (1) a poly(A) tail; (2) direct repeats; and (3) a lack of introns. Moreover, a retrocopy can incorporate sequences from multiple parental source regions and form a hybrid coding sequence, creating a gene chimera[2, 3]. Most retrocopies were thought to be functionless evolutionary dead ends, because they probably lacked expression potential and would degenerate during evolution[4, 5]. However, sporadic studies have found that some retrocopies are functional[6–8]. A few years ago, Betran et al. identified numerous retrocopies in Drosophila and mammals, and concluded that some of them substituted for the functions of their parental genes to escape X inactivation during spermatogenesis[9, 10]. Subsequently, functional retrocopies have been systematically identified in mammals, fruit flies, plants[13–15], zebrafish and other chordates. Some retrocopies show evidence of positive selection[18–22], indicating their potential importance in adaptation.
C. elegans is used extensively as a model organism for diverse biological processes. Progress in many research fields, including genetics, molecular biology, and developmental biology, can be attributed to this species. The availability of genome sequences of C. elegans and its relatives provides an opportunity for comparative genomics and evolutionary biology to address features of this genus. Generally, the divergence times separating most species of Caenorhabditis nematodes span many millions of years[23, 24]. The extent of genome divergence among species can be large in terms of single nucleotide changes, genome rearrangement, intron gain and loss, and gene family dynamics[23–30]. A large fraction of genes in Caenorhabditis genomes are arranged in operons[31–33] and subject to trans-splicing. Moreover, two species, C. briggsae and C. elegans, have populations composed primarily of self-fertilizing hermaphrodites, a mode of reproduction that has originated independently multiple times[28, 35]. Many Caenorhabditis species are associated with other animals. For example, C. briggsae has been found in association with snails, C. remanei with isopods, snails, and other invertebrates, and C. japonica with the shield bug Parastrachia japonensis (Heteroptera, Cydnidae) for most of its lifetime[36, 38]. Because of these genomic and organismal features, the evolutionary dynamics of retrocopies that reside in these genomes may be important in adaptive evolution.
Here we identify retrocopies and chimeric genes (together referred to as “new genes” for convenience) in 5 Caenorhabditis nematodes: C. brenneri, C. briggsae, C. elegans, C. japonica, and C. remanei (Additional file1: Figure S1 shows their phylogenetic relationships). We then compare their abundance among these species. Using expression data from a public database and RT-PCR experiments, we inferred expression of new genes in C. elegans. To explore the functionality of new genes, we deduced the open reading frames of retrocopies through comparison with their parental genes, in combination with patterns of conservation.
Dataset compilation and preprocessing
We studied retrocopies in 5 Caenorhabditis species (C. brenneri, C. briggsae, C. elegans, C. japonica, and C. remanei) using available genome sequences. Data (including genome and peptide sequences) were downloaded from WormBase (release WS215) via its FTP server (ftp://ftp.wormbase.org/). Repetitive sequences in each genome were masked using RepeatMasker. All expression data for C. elegans (ESTs and mRNAs, hereafter called ESTs for convenience) were downloaded from the University of California, Santa Cruz, database (assembly Aug 2010, http://genome.ucsc.edu/). Contaminants and low-quality and low-complexity sequences within these expression data were screened and trimmed using SeqClean with NCBI’s UniVec as the screening file.
We modified retrocopy discovery procedures described elsewhere to identify retrocopies in each genome. First, all peptide sequences were queried against the masked genomic sequences of the same species using TBLASTN (E-value ≤ 0.1). Second, a series of custom Perl scripts was used to parse the TBLASTN results and join putative exons. We selected the best-matching DNA segment for each protein and aligned the two using GENEWISE. The structure of the selected DNA sequence (exons, introns, and whether it is a pseudogene) was inferred from its alignment with the corresponding protein sequence, and only multi-exon sequences were retained for subsequent analyses. In parallel, we extracted from the TBLASTN results nearby homology matches (distance < 40 bp) that were unlikely to be separated by introns and merged them, requiring that the query and merged target sequences align over more than 50 amino acids with amino acid identity above 30%. After similarity searches of the merged sequences against the multi-exon proteins using FASTA, we selected the closest hit as the candidate parental protein. To confirm the absence of all introns in a retrocopy, we compared each merged sequence together with its 10,000-bp flanking regions to its candidate parental protein using GENEWISE. We also required a GENEWISE score > 35 to ensure a certain degree of similarity. Finally, we manually checked the absence of all introns and assigned parental gene-retrocopy relationships for each species.
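The merging and filtering step can be illustrated with a minimal Python sketch. It is not the authors' Perl code: the `Hit` record, its field names, and the decision to sum alignment lengths when merging are assumptions made for illustration, and the hits are assumed to come from one query protein on one strand of one scaffold. Only the thresholds (gap < 40 bp, alignment > 50 amino acids, identity > 30%) are taken from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    start: int      # genomic start of the match (1-based, inclusive)
    end: int        # genomic end of the match (inclusive)
    aln_len: int    # aligned length in amino acids
    identical: int  # number of identical residues in the alignment


MAX_GAP_BP = 40      # matches closer than this are merged (from the text)
MIN_ALN_AA = 50      # merged alignment must exceed this length (from the text)
MIN_IDENTITY = 0.30  # merged alignment must exceed this identity (from the text)


def merge_nearby_hits(hits: List[Hit], max_gap: int = MAX_GAP_BP) -> List[Hit]:
    """Merge hits whose genomic gap is smaller than max_gap.

    Simplification (assumption): alignment lengths and identical-residue
    counts are summed, which over-counts if two hits overlap.
    """
    merged: List[Hit] = []
    for h in sorted(hits, key=lambda x: x.start):
        if merged and h.start - merged[-1].end - 1 < max_gap:
            last = merged[-1]
            merged[-1] = Hit(last.start, max(last.end, h.end),
                             last.aln_len + h.aln_len,
                             last.identical + h.identical)
        else:
            merged.append(Hit(h.start, h.end, h.aln_len, h.identical))
    return merged


def passes_filter(hit: Hit) -> bool:
    """Apply the length and identity thresholds to a merged hit."""
    return (hit.aln_len > MIN_ALN_AA
            and hit.identical / hit.aln_len > MIN_IDENTITY)
```

A merged hit that passes the filter would still be only a candidate; the intron check against the parental protein and the manual inspection described above are not modelled here.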
dN and dS estimation and dN/dS ratio test
Pairwise dN and dS statistics for all retrocopies and their parental genes were estimated using the YN00 program of PAML4. We conducted a likelihood ratio test to determine whether dN/dS between pairs of duplicates was significantly different from 0.5. The Codeml program of PAML4 was run twice for each gene pair (first with ω fixed at 0.5 and then with ω estimated freely), and twice the difference in log likelihood between the two runs was compared to a χ2 distribution with one degree of freedom. A dN/dS ratio smaller than 0.5 with a P-value smaller than 0.01 may indicate that the retrocopy is subject to evolutionary constraint and is functional[14, 41].
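As a worked illustration of the test statistic, the following Python sketch computes 2ΔlnL from two log likelihoods and converts it to a P-value under a χ2 distribution with one degree of freedom. The function names are hypothetical and the log likelihoods would come from the two Codeml runs; the sketch does not call PAML itself. For one degree of freedom the upper tail probability equals erfc(sqrt(x/2)), so only the standard library is needed.

```python
import math


def lrt_pvalue(lnl_fixed: float, lnl_free: float):
    """Likelihood ratio test for nested models differing by one parameter.

    lnl_fixed: log likelihood with omega fixed at 0.5 (null model)
    lnl_free:  log likelihood with omega estimated (alternative model)
    Returns (statistic, p_value).
    """
    stat = max(2.0 * (lnl_free - lnl_fixed), 0.0)  # guard against tiny negatives
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def is_constrained(omega: float, lnl_fixed: float, lnl_free: float,
                   omega_null: float = 0.5, alpha: float = 0.01) -> bool:
    """Criterion from the text: omega below 0.5 and P below 0.01."""
    _, p_value = lrt_pvalue(lnl_fixed, lnl_free)
    return omega < omega_null and p_value < alpha
```

Because the test is two-sided with respect to ω, the direction (ω below 0.5) is checked separately from significance.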
Identification of chimeric retrocopies
Here, we considered a retrocopy to be chimeric if the flanking coding sequence(s) that it recruits are longer than 50 bp according to the WormBase annotation. If the coding sequence of a protein overlapped a retrocopy by more than 90 bp, we compared the coding sequences recruited by the retrocopy to the genomic sequence and flanking 10,000-bp regions of its parental gene to ensure that the recruited sequences derived from other regions rather than from the parental gene and its flanking regions. We also selected retrocopies in which at least two parental introns were absent from alignable regions, which we regarded as definite retrocopies, and used ClustalX to align them with the corresponding chimeric genes to scrutinize their relationships.
Distribution of new genes in other Caenorhabditis species
To examine the distribution of the retrocopies of each species in the other Caenorhabditis genomes, we used a procedure similar to the retrocopy screen described above. Briefly, retrocopies from each species were queried against each of the other 4 Caenorhabditis genomes using TBLASTX (E-value ≤ 0.1), and nearby homology matches (distance < 40 bp) were extracted and merged. Each merged sequence together with its flanking 10,000 bp was then compared to the corresponding parental protein, and was considered a conserved retrocopy in that species when the alignable region between the query and the merged target sequence met the following criteria: (1) longer than 30 amino acids; (2) GENEWISE score > 35; (3) amino acid identity above 30%; and (4) the merged sequence was intronless. We checked the candidates manually one by one and discarded false positives. A retrocopy was considered species-specific if its matched sequences in all other species failed these criteria. For chimeric genes in each species, we identified orthologs in the other 4 species using reciprocal best BLAST hits (RBB) to determine their distribution. We carefully compared the genomic positions of new genes identified by the different methods to scrutinize their relationships.
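The reciprocal best hit idea used for chimeric genes can be sketched in a few lines of Python. This is a generic illustration, not the authors' pipeline: the input format (a dictionary from query and subject identifiers to a score) and the function names are assumptions, and ties are resolved by keeping the first best-scoring hit encountered.

```python
from typing import Dict, Set, Tuple

Scores = Dict[Tuple[str, str], float]  # (query_id, subject_id) -> score


def best_hits(scores: Scores) -> Dict[str, str]:
    """Return the highest-scoring subject for every query."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}
```

For example, if gene `a2` of species A hits `b2` best but `b2` hits `a1` best, the pair is not reported, which is the intended conservative behaviour of the criterion.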
Pairwise whole genome alignment
Soft-masked genome sequences for each species were downloaded from WormBase (release WS215). Pairwise whole genome alignments among C. brenneri, C. briggsae, C. elegans, C. japonica, and C. remanei were carried out using LASTZ. The Chain/Net package was then used for post-processing.
Transcription analysis in Caenorhabditis elegans
We conducted transcription analysis in C. elegans because sufficient expression data are available for this species in public databases. To ensure that an EST derives from a retrocopy rather than from its parental gene, we used a stringent pipeline to retain only high-quality mappings. We mapped all 391,185 cleaned sequences to the genome using BLAT, and retained 280,241 sequences that met the following criteria: mapping length ≥ 150 bp, identity ≥ 98%, coverage within the mapping ≥ 97%, and coverage of the whole transcript ≥ 75%. When a transcript mapped to multiple genomic loci, we selected the best mapping and discarded ambiguous mappings (difference in BLAT scores < 2%). In total, we obtained high-quality, unambiguous mappings for 277,204 ESTs (71%). Subsequently, we compared the genomic positions of retrocopies with the mapped positions of ESTs. A retrocopy was considered confidently expressed if it overlapped an EST by more than 150 bp, and probably expressed if it overlapped an EST by more than 100 bp (including ambiguous mappings). For each chimeric gene, we searched its coding sequence against the cleaned expression data using BLASTN (E-value ≤ 1e-20), and considered it expressed if a matching EST contained more than 50 bp from both the retrocopy and the recruited sequences. Only 2 chimeric genes failed this criterion. Expression of new genes lacking EST support was checked using RT-PCR experiments.
Total RNA was extracted from C. elegans samples (N2 strain, mixed stage) using TRIzol reagent (Invitrogen, Carlsbad, CA, USA). To avoid genomic contamination, we treated total RNA with DNase I (Promega). Unique primers were used to amplify target sequences, and RNA that had not been reverse-transcribed was used as a negative control. Purified products were sequenced on an ABI 3730 DNA Analyzer (Applied Biosystems, Foster City, California, United States) and the resulting sequences were deposited in GenBank [GenBank: JN655873-JN655883].
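The mapping filter can be summarised in a short Python sketch. The `Mapping` record and its field names are hypothetical, BLAT output parsing is omitted, and the interpretation of "difference in BLAT scores < 2%" as a difference relative to the best score is an assumption; the numeric thresholds are those given in the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mapping:
    est_id: str
    map_len: int              # mapped length in bp
    identity: float           # fraction of identical bases, 0..1
    cov_in_mapping: float     # coverage within the mapped region, 0..1
    cov_of_transcript: float  # fraction of the whole transcript covered, 0..1
    score: float              # alignment score (assumed positive)


def passes_quality(m: Mapping) -> bool:
    """Quality thresholds quoted in the text."""
    return (m.map_len >= 150 and m.identity >= 0.98
            and m.cov_in_mapping >= 0.97 and m.cov_of_transcript >= 0.75)


def pick_unambiguous(mappings: List[Mapping],
                     min_rel_diff: float = 0.02) -> Optional[Mapping]:
    """Best mapping for one EST, or None if absent or ambiguous.

    Assumption: a mapping is ambiguous when the two best scores differ
    by less than 2% of the best score.
    """
    good = sorted((m for m in mappings if passes_quality(m)),
                  key=lambda m: m.score, reverse=True)
    if not good:
        return None
    if len(good) > 1 and (good[0].score - good[1].score) / good[0].score < min_rel_diff:
        return None
    return good[0]
```

Under this reading, an EST that maps almost equally well to a retrocopy and to its parental gene is dropped rather than assigned to either locus.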
Results and discussion
Retrocopies in Caenorhabditis nematodes
Using a modified retrocopy discovery procedure, we identified retrocopies in 5 Caenorhabditis genomes (C. brenneri, C. briggsae, C. elegans, C. japonica, and C. remanei), as summarized in Table1 (for more information, see Additional file2: Table S1). Initially, we obtained 141, 90, 82, 365, and 134 retrocopies for C. brenneri, C. briggsae, C. elegans, C. japonica, and C. remanei, respectively. However, these numbers may be overestimates for the following reasons. First, the genome assemblies of C. brenneri, C. japonica, and C. remanei are fragmented and contain many short scaffolds, which may produce false positives owing to a failure to detect some introns. Second, nematode introns are lost at a very high rate, so some false positives may be paralogues that lost introns during evolution. Third, natural heterozygosity may cause parts of a genome assembly to be represented by alternative alleles, especially in gonochoristic Caenorhabditis species. To overcome these problems, we required that retrocopies reside on chromosomes (or scaffolds) longer than 50 kb and on a different chromosome (or scaffold) from their parental genes. If multiple retrocopies shared one parental gene, we selected the one located on the longest chromosome, or the one with the lowest dS value compared to the parental gene. As a result, we obtained 43, 48, 43, 9, and 42 retrocopies for C. brenneri, C. briggsae, C. elegans, C. japonica, and C. remanei, respectively (Table1).
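The filtering rules in this paragraph can be expressed as a small Python sketch. The `Candidate` record and its field names are hypothetical. The text does not say how the two selection rules (longest chromosome, lowest dS) are combined, so the sketch assumes that scaffold length is compared first and dS is used only to break ties.

```python
from dataclasses import dataclass
from typing import Dict, List

MIN_SCAFFOLD_BP = 50_000  # scaffolds must be longer than 50 kb (from the text)


@dataclass
class Candidate:
    retro_id: str
    parent_id: str
    scaffold: str         # scaffold or chromosome carrying the retrocopy
    parent_scaffold: str  # scaffold or chromosome carrying the parental gene
    scaffold_len: int     # length of the retrocopy's scaffold in bp
    dS: float             # synonymous divergence from the parental gene


def filter_candidates(cands: List[Candidate]) -> List[Candidate]:
    """Keep at most one retrocopy per parental gene after the location filters.

    Assumed tie-break order: longer scaffold first, then lower dS.
    """
    kept: Dict[str, Candidate] = {}
    for c in cands:
        if c.scaffold_len <= MIN_SCAFFOLD_BP or c.scaffold == c.parent_scaffold:
            continue  # short scaffold, or same scaffold as the parental gene
        current = kept.get(c.parent_id)
        if current is None or (c.scaffold_len, -c.dS) > (current.scaffold_len, -current.dS):
            kept[c.parent_id] = c
    return list(kept.values())
```

Requiring a different scaffold is conservative: it removes intron-loss paralogues created by local duplication, at the cost of also discarding genuine retrocopies that inserted near their parent.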
A retrocopy is thought to be functional if it meets the following criteria: its open reading frame is intact; it is subject to purifying selection; and it is expressed. Here, we inferred the open reading frames of retrocopies by comparing alignable regions with their parental genes. To explore the conservation of retrocopies, we examined the ratio of nonsynonymous substitutions per site (dN) to synonymous substitutions per site (dS) between them and their functional parental genes. A dN/dS ratio (also termed ω) significantly lower than 0.5 indicates functional constraint on both genes. We determined that the open reading frames of the majority of retrocopies in each species were intact (Table1, and Additional file2: Table S1), and that a small number were disrupted (by frameshift mutations, premature stop codons, or both). We also found that most retrocopies were subject to purifying selection in Caenorhabditis nematodes (Table1, Additional file2: Table S1). We obtained expression profiles for retrocopies in C. elegans using expression data available from a public database. Of 43 retrocopies in total, 33 were confidently determined to be expressed, and 1 was probably expressed (Additional file2: Table S1). Additionally, we confirmed the expression of eight other retrocopies via RT-PCR experiments, and found that the open reading frame of the only retrocopy without any expression support was disrupted. In summary, in C. elegans, all 32 retrocopies subject to purifying selection are expressed, and 5 disrupted retrocopies are also expressed (Additional file2: Table S1).
Taken together, the majority of retrocopies in Caenorhabditis nematodes have intact open reading frames and are subject to purifying selection, although some with interrupted open reading frames are also subject to purifying selection (5, 5, 3, 7 retrocopies in C. brenneri, C. briggsae, C. japonica, and C. remanei, respectively). In C. elegans, only one retrocopy, which has an interrupted open reading frame, was not supported by expression data. Therefore, we suggest that a substantial portion of retrocopies in the genomes of Caenorhabditis species are likely functional. This pattern is similar to that in non-mammalian chordates. It might result from the loss of most non-functional retrocopies owing to the high mutation and deletion rates in Caenorhabditis[49, 50], with only the small fraction of functional retrocopies retained over evolutionary time.
Chimeric genes in Caenorhabditis nematodes
We found that 6, 15, 18, 3, and 13 retrocopies recruited nearby regions and formed chimeric genes in C. brenneri, C. briggsae, C. elegans, C. japonica, and C. remanei, respectively. Table2 summarizes attributes of chimeric genes in these 5 Caenorhabditis species (more information in Additional file2: Table S1). The open reading frames of a large number of chimeric retrocopies are intact compared to their parental genes (Table2). However, some chimeric retrocopies have interrupted open reading frames. To understand how interrupted retrocopies form chimeric genes, we scrutinized the structures of retrocopies and the corresponding chimeric genes. This analysis was performed for retrocopies that lost at least 2 parental introns, to ensure that they truly originated by retroposition, and the results are shown in Table2 (see Additional file2: Table S1 for details). At least 5 chimeric types exist in Caenorhabditis species, with more types in C. elegans and C. remanei (Figure1). In accordance with results in zebrafish, type III chimeric genes are the most common in Caenorhabditis species. That is, about three-quarters of the chimeric genes subjected to our analysis recruited both N and C termini and formed new chimeric coding sequences, and the remainder comprise the other chimeric types. Interestingly, frameshift mutations and premature stop codons were always located in noncoding regions of the chimeric gene, such as upstream of the start codon or in introns (Figure1). The transformation of part of a retrocopy sequence into non-coding sequence in chimeric genes has been reported before in other species[14, 16], indicating its generality. These results suggest that both intact and interrupted retrocopies can form chimeric genes, and that the interrupted regions are transformed into non-coding regions in chimeric genes regardless of chimeric type.
Table2 also shows that the ω values for many retrocopy-parental gene pairs are significantly less than 0.5. An ω value below 0.5 is a conservative indication that both the parental gene and the retrocopy are subject to purifying selection. Some cases in which ω is significantly less than 0.5 may therefore be genuine, but others may be false positives, and more robust tests should be used because newly originated genes are frequently subject to weaker purifying selection or to positive selection during their evolution. We confirmed expression support from either mRNA or EST sequences in public databases for 17 out of 18 chimeric genes in C. elegans (Additional file2: Table S1). However, we failed to confirm expression of the remaining chimeric retrocopy using RT-PCR experiments. That many chimeric retrocopies are under evolutionary constraint and are expressed indicates that the majority of chimeric genes in Caenorhabditis species may be functional. It is interesting that many functional chimeric genes reside in the small Caenorhabditis genomes. A possible reason is that the majority of transcripts in these species are subject to trans-splicing[52, 53]. These trans-spliced transcripts could be reverse-transcribed and integrated into new genomic regions, forming chimeric genes. Caenorhabditis nematodes are relatively constant, thus some of these newly originated chimeric genes are retained and modified adaptively to accommodate environmental change.
Distribution of retrocopies and chimeric genes
We screened each species of Caenorhabditis for homologous copies of new genes in the other four species (see methods); their distribution is shown in Additional file3: Table S2 and Additional file4: Table S3. In C. brenneri, C. briggsae, C. elegans, C. japonica, and C. remanei, we found that 22, 13, 13, 5, and 18 retrocopies and 2, 3, 3, 1, and 5 chimeric genes are species-specific. For retrocopies, this number may be affected by the high rate of intron loss in Caenorhabditis nematodes. We therefore took into account the results of whole genome alignment (synteny), and found that 19, 25, 22, 3, and 18 retrocopies in C. brenneri, C. briggsae, C. elegans, C. japonica, and C. remanei have coordinates in other species (Additional file3: Table S2); the following analyses are based on these retrocopies. In previous research, Bai et al. found that only 1 retrogene in the D. melanogaster genome was species-specific. Our research shows that more retrocopies are species-specific and fewer are shared in the genus Caenorhabditis. These results corroborate the high divergence of Caenorhabditis nematodes reported in previous studies[28, 54].
Generally, the dS values of retrocopies that have coordinates in other Caenorhabditis species are higher than those of species-specific retrocopies (P = 0.053, 0.016, 0.057, 0.007 in C. brenneri, C. briggsae, C. elegans, and C. remanei, Mann–Whitney U-test). The differences are significant at the 0.05 level in C. briggsae and C. remanei and marginal in C. brenneri and C. elegans; given that nonparametric tests have less power to detect a significant difference, we regard the trend as consistent across species. The majority of retrocopies that have coordinates in other Caenorhabditis species have intact open reading frames compared to their parental genes. The same is true for the proportion of retrocopies whose ω values are significantly less than 0.5. In contrast, more than half of species-specific retrocopies have frameshift mutations or premature stop codons compared to their parental sequences, except in C. elegans, where the proportion is 30.8%. Only a small portion of species-specific retrocopies in each species have ω values significantly less than 0.5. In summary, retrocopies that have coordinates in other Caenorhabditis species are older than species-specific ones, and their retention suggests that they have essential functions and are under evolutionary constraint. However, since recently originated genes can quickly become essential[55, 56], some of these species-specific retrogenes may have functions in Caenorhabditis and would be good candidates for subsequent functional experiments.
We compared the chromosome positions of retrocopies identified using the different methods, and found that only a small portion of them overlapped (data not shown). The number of overlapping retrocopies seems to correlate with phylogenetic position: more overlaps were found when retrocopies of closely related species were used as queries, and fewer when retrocopies of distantly related species were used. We obtained the same distribution pattern for chimeric genes. This may be because the mutation rate in Caenorhabditis species is high[28, 50, 57] and the deletion rate of neutral regions in Caenorhabditis genomes is also high[49, 50]. As a result, retrocopies in these species degenerate quickly if they do not become functional immediately. On the other hand, the sequencing, assembly and annotation of some genomes may be incomplete or problematic, and the absence of parental genes for these or other reasons may hinder obtaining a full list of retrocopies in these species. However, by using retrocopies of closely related species for homology screening, we could find retrocopies that were missed when screening a single genome alone. We suggest that this method is a useful complement for identifying retrocopies in a species when the retrocopies of its close relatives are available. The high mutation rate in Caenorhabditis species may also explain why chimeric genes formed by species-specific retrocopies are not necessarily species-specific.
Here, we identified and compared retrocopies in 5 Caenorhabditis nematodes and explored their functionality. Most retrocopies have intact open reading frames and are conserved, suggesting that a majority of retrocopies in each genome may be functional. Moreover, expression data from a public database and RT-PCR experiments demonstrated that almost all retrocopies in C. elegans are expressed. In Caenorhabditis nematodes, at least 5 chimeric types exist, and the most common type is a retrocopy recruiting both N and C termini and forming a new chimeric coding sequence. Interrupted retrocopies can form chimeric genes in these species, with the interrupted region transformed into non-coding sequence. Using homology screening, we obtained the distribution of each retrocopy and chimeric gene in the other 4 Caenorhabditis species.
Kaessmann H, Vinckenbosch N, Long MY: RNA-based gene duplication: mechanistic and evolutionary insights. Nat Rev Genet. 2009, 10 (1): 19-31.
Buzdin A, Gogvadze E, Kovalskaya E, Volchkov P, Ustyugova S, Illarionova A, Fushan A, Vinogradova T, Sverdlov E: The human genome contains many types of chimeric retrogenes generated through in vivo RNA recombination. Nucleic Acids Res. 2003, 31 (15): 4385-4390. 10.1093/nar/gkg496.
Buzdin A, Ustyugova S, Gogvadze E, Vinogradova T, Lebedev Y, Sverdlov E: A new family of chimeric retrotranscripts formed by a full copy of U6 small nuclear RNA fused to the 3 ' terminus of L1. Genomics. 2002, 80 (4): 402-406. 10.1006/geno.2002.6843.
Jeffs P, Ashburner M: Processed Pseudogenes in Drosophila. Proc R Soc Lond B Biol Sci. 1991, 244 (1310): 151-159. 10.1098/rspb.1991.0064.
Mighell AJ, Smith NR, Robinson PA, Markham AF: Vertebrate pseudogenes. FEBS Lett. 2000, 468 (2–3): 109-114.
Long MY, Langley CH: Natural-Selection and the Origin of Jingwei, a Chimeric Processed Functional Gene in Drosophila. Science. 1993, 260 (5104): 91-95. 10.1126/science.7682012.
Mccarrey JR, Thomas K: Human Testis-Specific Pgk Gene Lacks Introns and Possesses Characteristics of a Processed Gene. Nature. 1987, 326 (6112): 501-505. 10.1038/326501a0.
Wang W, Brunet FG, Nevo E, Long M: Origin of sphinx, a young chimeric RNA gene in Drosophila melanogaster. Proc Natl Acad Sci U S A. 2002, 99 (7): 4448-4453. 10.1073/pnas.072066399.
Betran E, Thornton K, Long M: Retroposed new genes out of the X in Drosophila. Genome Res. 2002, 12 (12): 1854-1859. 10.1101/gr.6049.
Emerson JJ, Kaessmann H, Betran E, Long MY: Extensive gene traffic on the mammalian X chromosome. Science. 2004, 303 (5657): 537-540. 10.1126/science.1090042.
Pan D, Zhang LQ: Burst of Young Retrogenes and Independent Retrogene Formation in Mammals. PLoS One. 2009, 4 (3): e5040. 10.1371/journal.pone.0005040.
Bai Y, Casola C, Feschotte C, Betran E: Comparative genomics reveals a constant rate of origination and convergent acquisition of functional retrogenes in Drosophila. Genome Biol. 2007, 8 (1): R11. 10.1186/gb-2007-8-1-r11.
Wang W, Zheng HK, Fan CZ, Li J, Shi JJ, Cai ZQ, Zhang GJ, Liu DY, Zhang JG, Vang S, et al: High rate of chimeric gene origination by retroposition in plant genomes. Plant Cell. 2006, 18 (8): 1791-1802. 10.1105/tpc.106.041905.
Zhu ZL, Zhang Y, Long MY: Extensive Structural Renovation of Retrogenes in the Evolution of the Populus Genome. Plant Physiol. 2009, 151 (4): 1943-1951. 10.1104/pp.109.142984.
Zhang YJ, Wu YR, Liu YL, Han B: Computational identification of 69 retroposons in Arabidopsis. Plant Physiol. 2005, 138 (2): 935-948. 10.1104/pp.105.060244.
Fu B, Chen M, Zou M, Long M, He S: The rapid generation of chimerical genes expanding protein diversity in zebrafish. BMC Genomics. 2010, 11 (1): 657. 10.1186/1471-2164-11-657.
Chen M, Zou M, Fu BD, Li X, Vibranovski MD, Gan XN, Wang DQ, Wang W, Long MY, He SP: Evolutionary Patterns of RNA-Based Duplication in Non-Mammalian Chordates. PLoS One. 2011, 6 (7): e21466. 10.1371/journal.pone.0021466.
Zhang Y, Lu SJ, Zhao SQ, Zheng XF, Long MY, Wei LP: Positive selection for the male functionality of a co-retroposed gene in the hominoids. BMC Evol Biol. 2009, 9: 252. 10.1186/1471-2148-9-252.
Tracy C, Rio J, Motiwale M, Christensen SM, Betran E: Convergently Recruited Nuclear Transport Retrogenes Are Male Biased in Expression and Evolving Under Positive Selection in Drosophila. Genetics. 2010, 184 (4): 1067-U1296. 10.1534/genetics.109.113522.
Betran E, Long M: Dntf-2r, a young Drosophila retroposed gene with specific male expression under positive Darwinian selection. Genetics. 2003, 164 (3): 977-988.
Babushok DV, Ohshima K, Ostertag EM, Chen XS, Wang YF, Mandal PK, Okada N, Abrams CS, Kazazian HH: A novel testis ubiquitin-binding protein gene arose by exon shuffling in hominoids. Genome Res. 2007, 17 (8): 1129-1138. 10.1101/gr.6252107.
Ohshima K, Igarashi K: Inference for the Initial Stage of Domain Shuffling: Tracing the Evolutionary Fate of the PIPSL Retrogene in Hominoids. Mol Biol Evol. 2010, 27 (11): 2522-2533. 10.1093/molbev/msq138.
Cutter AD: Divergence times in Caenorhabditis and Drosophila inferred from direct estimates of the neutral mutation rate. Mol Biol Evol. 2008, 25 (4): 778-786. 10.1093/molbev/msn024.
Coghlan A, Wolfe KH: Fourfold faster rate of genome rearrangement in nematodes than in Drosophila. Genome Res. 2002, 12 (6): 857-867. 10.1101/gr.172702.
Stein LD, Bao ZR, Blasiar D, Blumenthal T, Brent MR, Chen NS, Chinwalla A, Clarke L, Clee C, Coghlan A, et al: The genome sequence of Caenorhabditis briggsae: A platform for comparative genomics. PLoS Biol. 2003, 1 (2): 166-
Gotoh O: Divergent structures of Caenorhabditis elegans cytochrome P450 genes suggest the frequent loss and gain of introns during the evolution of nematodes. Mol Biol Evol. 1998, 15 (11): 1447-1459. 10.1093/oxfordjournals.molbev.a025872.
Cho SC, Jin SW, Cohen A, Ellis RE: A phylogeny of Caenorhabditis reveals frequent loss of introns during nematode evolution. Genome Res. 2004, 14 (7): 1207-1220. 10.1101/gr.2639304.
Kiontke K, Gavin NP, Raynes Y, Roehrig C, Piano F, Fitch DHA: Caenorhabditis phylogeny predicts convergence of hermaphroditism and extensive intron loss. Proc Natl Acad Sci U S A. 2004, 101 (24): 9003-9008. 10.1073/pnas.0403094101.
Robertson HM: Two large families of chemoreceptor genes in the nematodes Caenorhabditis elegans and Caenorhabditis briggsae reveal extensive gene duplication, diversification, movement, and intron loss. Genome Res. 1998, 8 (5): 449-463.
Lynch M, Conery JS: The evolutionary fate and consequences of duplicate genes. Science. 2000, 290 (5494): 1151-1155. 10.1126/science.290.5494.1151.
Rödelsperger C, Dieterich C: More than expected: Operon turnover in three Caenorhabditis species. https://bioinformatics.cs.vt.edu/~murali/conference-fayfaars/2007-ismb-eccb/ISMBECCB07/Posters/M10Dieterich.pdf
Blumenthal T, Evans D, Link CD, Guffanti A, Lawson D, Thierry-Mieg J, Thierry-Mieg D, Chiu WL, Duke K, Kiraly M, et al: A global analysis of Caenorhabditis elegans operons. Nature. 2002, 417 (6891): 851-854. 10.1038/nature00831.
Cutter AD, Agrawal AF: The evolutionary dynamics of operon distributions in eukaryote genomes. Genetics. 2010, 185 (2): 685-693. 10.1534/genetics.110.115766.
Zorio DAR, Cheng NN, Blumenthal T, Spieth J: Operons as a common form of chromosomal organization in C. elegans. Nature. 1994, 372 (6503): 270-272. 10.1038/372270a0.
Kiontke KC, Felix MA, Ailion M, Rockman MV, Braendle C, Penigault JB, Fitch DH: A phylogeny and molecular barcodes for Caenorhabditis, with numerous new species from rotting fruits. BMC Evol Biol. 2011, 11: 339. 10.1186/1471-2148-11-339.
Eisenmann DM: Wnt signaling (June 25, 2005), WormBook, ed. The C. elegans Research Community. WormBook. 2005, 10.1895/wormbook.1.7.1.http://www.wormbook.org,
Sudhaus W, Kiontke K: Phylogeny of Rhabditis subgenus Caenorhabditis (Rhabditidae, Nematoda). J Zool Syst Evol Res. 1996, 34 (4): 217-233.
Kiontke K, Hironaka M, Sudhaus W: Description of Caenorhabditis japonica n. sp (Nematoda : Rhabditida) associated with the burrower bug Parastrachia japonensis (Heteroptera : Cydnidae) in Japan. Nematology. 2002, 4: 933-941. 10.1163/156854102321122557.
Smit A, Hubley R, Green P: RepeatMasker Open-3.0. http://www.repeatmasker.org. 1996–2010.
Marques AC, Dupanloup I, Vinckenbosch N, Reymond A, Kaessmann H: Emergence of young human genes after a burst of retroposition in primates. PLoS Biol. 2005, 3 (11): 1970-1979.
Altschul SF, Madden TL, Schaffer AA, Zhang JH, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25 (17): 3389-3402. 10.1093/nar/25.17.3389.
Birney E, Durbin R: Dynamite: a flexible code generating language for dynamic programming methods used in sequence comparison. Proc Int Conf Intell Syst Mol Biol. 1997, 5: 56-64.
Pearson WR: Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol. 1990, 183: 63-98.
Yang ZH: PAML 4: Phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007, 24 (8): 1586-1591. 10.1093/molbev/msm088.
Harris RS: Improved pairwise alignment of genomic DNA. PhD Thesis. 2007, Pennsylvania City: The Pennsylvania State University
Kent WJ: BLAT–the BLAST-like alignment tool. Genome Res. 2002, 12 (4): 656-664.
Barriere A, Yang SP, Pekarek E, Thomas CG, Haag ES, Ruvinsky I: Detecting heterozygosity in shotgun genome assemblies: Lessons from obligately outcrossing nematodes. Genome Res. 2009, 19 (3): 470-480.
Witherspoon DJ, Robertson HM: Neutral evolution of ten types of mariner transposons in the genomes of Caenorhabditis elegans and Caenorhabditis briggsae. J Mol Evol. 2003, 56 (6): 751-769. 10.1007/s00239-002-2450-x.
Denver DR, Morris K, Lynch M, Thomas WK: High mutation rate and predominance of insertions in the Caenorhabditis elegans nuclear genome. Nature. 2004, 430 (7000): 679-682. 10.1038/nature02697.
Cai JJ, Petrov DA: Relaxed Purifying Selection and Possibly High Rate of Adaptation in Primate Lineage-Specific Genes. Genome Biol Evol. 2010, 2: 393-409. 10.1093/gbe/evq019.
Allen MA, Hillier LW, Waterston RH, Blumenthal T: A global analysis of C. elegans trans-splicing. Genome Res. 2011, 21 (2): 255-265. 10.1101/gr.113811.110.
Eisenmann DM: Wnt signaling (June 25, 2005), WormBook, ed. The C. elegans Research Community. WormBook. 2005, 10.1895/wormbook.1.7.1.http://www.wormbook.org,
Eisenmann DM: Wnt signaling (June 25, 2005), WormBook, ed. The C. elegans Research Community. WormBook. 2005, 10.1895/wormbook.1.7.1.http://www.wormbook.org,
Ding Y, Zhao L, Yang S, Jiang Y, Chen Y, Zhao R, Zhang Y, Zhang G, Dong Y, Yu H, et al: A Young Drosophila Duplicate Gene Plays Essential Roles in Spermatogenesis by Regulating Several Y-Linked Male Fertility Genes. PLoS Genet. 2010, 6 (12): e1001255. 10.1371/journal.pgen.1001255.
Chen S, Zhang YE, Long M: New Genes in Drosophila Quickly Become Essential. Science. 2010, 330 (6011): 1682-1685. 10.1126/science.1196380.
Mushegian AR, Garey JR, Martin J, Liu LX: Large-scale taxonomic profiling of eukaryotic model organisms: A comparison of orthologous proteins encoded by the human, fly, nematode, and yeast genomes. Genome Res. 1998, 8 (6): 590-598.
We thank Professor Asher D. Cutter of the University of Toronto for his critical reading of this manuscript and for helpful comments and suggestions that greatly improved it. We also thank Yuan Ren and Wei Chi for their help with experiments. This work was supported by a grant from the Major State Basic Research Development Program of China (973 Program) (No. 2007CB411601) and by the National Natural Science Foundation of China (No. 31071998).
All authors declare that they have no competing interests.
MZ carried out the analyses and drafted the manuscript. SH designed and participated in the analyses. GW participated in the design and helped to draft the manuscript. All authors read and approved the final manuscript.