Genome-wide detection of hybrid genes with multiple components in human
© Tzeng et al; licensee BioMed Central Ltd. 2009
Received: 30 July 2008
Accepted: 06 May 2009
Published: 06 May 2009
Previous studies have shown that gene hybrid events are one of the principal processes for generating new genes. Although some inter- and intra-species gene hybrid events have been reported, there is no well-organized method for large-scale detection of events with multiple components. In this study, we therefore focus on building an efficient method for exploring all candidate gene hybrid events in the human genome and provide useful results for further study.
We have developed a method, the Triad Comparison Algorithm (TCA), to detect all potential N-hybrid events (i.e., an N-hybrid gene and its N non-overlapping component regions derived from N different genes) in the human genome. The results reveal many complex N-hybrid events with multiple components (N > 2); the most complex N-hybrid genes detected in human by TCA are composed of six component regions. Interestingly, most of the hybrid events belong to the 3-hybrid category. Furthermore, a single gene may participate in different events: twelve genes were found to have dual identities across different N-hybrid events (i.e., they were identified both as hybrid genes and as component genes). This suggests that, to some extent, the gene hybrid mechanism has generated new genes during the evolutionary history of the human genome.
TCA is an efficient method for exploring all candidate hybrid genes in the human genome, and it provides useful results for evolutionary analysis. Its advantage is the ability to detect any kind of hybrid event in any species, including species with large genomes.
The emergence of new genes is fundamental to the evolution of lineage- or species-specific traits [1]. Duplication of chromosomal segments provides abundant raw material for the formation of new genes [2, 3]. In addition to gene duplications identified at different scales, recent studies have demonstrated that the fusion/fission mechanism may also play an important role in the enrichment of new genes and/or genes with multiple protein domains in various species [1, 4–12]. For example, most proteins in the SCOP [13] or Pfam [14] databases harbor two or more domains, resulting from a wide variety of domain combinations [15, 16]. Moreover, multiple functional domains in proteins have been considered essential units for the modular assembly of new genes [17–21]. Gene hybrid events across genomes have been shown to be useful for predicting functional associations of proteins, including physical interactions and complex formation. This prediction relies on the observation that two proteins functioning in the same complex in one organism are frequently fused into a single "Rosetta Stone" protein in another organism (i.e., the "Rosetta Stone" protein deciphers the interaction between the protein pair) [7, 22]. A previous study reported that the monkey king gene family of Drosophila melanogaster originated from retroposition followed by a gene fission event [9].
More recently, gene fusion/fission has been shown to contribute substantially to the evolution of multi-domain proteins in bacteria [23]. Several systematic methods have been proposed in the literature for detecting gene hybrid events [6, 12, 22, 24–26]. Because these methods are based on pair-wise sequence comparison, they are not efficient for genome-wide detection of hybrid events with an arbitrary number (N) of components: their computational complexity increases exponentially as N increases, which makes large-scale detection difficult, especially for events with many components.
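The scaling problem can be made concrete with a small, hypothetical calculation. Suppose a query gene shares local similarity with k other genes; a naive search that enumerates every set of N candidate components must consider C(k, N) sets, and all subsets together number 2^k. The sketch below uses Python's `math.comb` to tabulate this count. The choice of k = 50 is arbitrary, and the code illustrates combinatorial growth in general rather than the cost of any specific published method or of TCA.

```python
from math import comb


def naive_candidate_sets(k: int, n: int) -> int:
    """Number of distinct n-gene component sets that a brute-force search
    would enumerate for one query gene similar to k other genes."""
    return comb(k, n)


if __name__ == "__main__":
    k = 50  # hypothetical number of genes similar to one query gene
    for n in range(2, 7):
        print(f"N = {n}: {naive_candidate_sets(k, n):,} candidate sets")
```

With k = 50 the count rises from 1,225 sets at N = 2 to 15,890,700 sets at N = 6, which is why exhaustive enumeration of component sets quickly becomes impractical.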
In this study, we have built an efficient method for exhaustively exploring all candidate gene hybrid events (for any number of components N ≥ 2) under flexible criteria in the human genome. The results reveal that some hybrid events are complex and that some genes appear to have undergone such events multiple times in the human genome.
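The N-hybrid definition (an N-hybrid gene plus N non-overlapping component regions derived from N different genes) can be written as a validity check on a candidate event. The sketch below is a minimal illustration under stated assumptions, not TCA itself: the class `ComponentRegion` and the function `is_n_hybrid` are hypothetical names, and the default thresholds are taken from the criteria reported in the Results (identity > 0.7, E-value < 10^-10, component regions longer than 50 nucleotides).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentRegion:
    gene: str        # identifier of the component gene
    start: int       # 0-based start on the hybrid gene (inclusive)
    end: int         # end on the hybrid gene (exclusive)
    identity: float  # fractional identity of the local alignment
    evalue: float    # E-value of the local alignment


def is_n_hybrid(regions, min_len=50, min_identity=0.7, max_evalue=1e-10):
    """Check a candidate event against the N-hybrid definition:
    at least two regions, each from a different gene, each passing the
    length/identity/E-value filters, and none overlapping on the hybrid gene."""
    if len(regions) < 2:
        return False
    if len({r.gene for r in regions}) != len(regions):
        return False  # the N regions must come from N different genes
    for r in regions:
        if r.end - r.start <= min_len:
            return False
        if r.identity <= min_identity or r.evalue >= max_evalue:
            return False
    ordered = sorted(regions, key=lambda r: r.start)
    # Non-overlapping: each region ends at or before the next one starts.
    return all(a.end <= b.start for a, b in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    event = [
        ComponentRegion("geneA", 0, 120, 0.90, 1e-30),
        ComponentRegion("geneB", 150, 300, 0.85, 1e-20),
        ComponentRegion("geneC", 310, 420, 0.80, 1e-15),
    ]
    print(is_n_hybrid(event))
```

Because every threshold is a keyword argument, the "flexible criteria" mentioned above correspond in this sketch to calling `is_n_hybrid` with different values, for example `min_len=100`.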
Results and discussion
Complicated N-hybrid Events Have Longer Component Regions
Table 1. Number of detected N-hybrid events (# of N-hybrid events) under different length criteria for component regions (length in nucleotides; identity > 0.7 and E-value < 10^-10).
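Table 1 tallies events by N under increasing minimum lengths for component regions. A minimal sketch of such a tally is shown below, assuming each event is reduced to the list of its component-region lengths (a hypothetical representation) and that an event is simply discarded when any of its regions is too short, rather than re-evaluated with fewer components. The example lengths are invented for illustration.

```python
from collections import Counter


def count_by_n(events, min_len):
    """Tally events by N (number of component regions), keeping only
    events whose component regions are all longer than min_len nucleotides.

    events: list of events, each given as a list of component-region lengths.
    """
    return Counter(
        len(lengths)
        for lengths in events
        if all(length > min_len for length in lengths)
    )


if __name__ == "__main__":
    # Hypothetical events, each described by its component-region lengths.
    events = [[60, 80], [120, 90, 200], [150, 160, 170], [55, 300, 40, 500]]
    for cutoff in (50, 100):
        print(cutoff, dict(sorted(count_by_n(events, cutoff).items())))
```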
The proportion of each hybrid gene made up of component regions differs significantly among different N-hybrid events (P < 0.001, Kolmogorov-Smirnov test). Table S1 shows that, under all length criteria, 3-hybrid genes have on average the smallest proportion derived from their component genes. Although each component region in a 3-hybrid gene is usually longer than one in a 2-hybrid gene, 3-hybrid genes are themselves even longer, on average, than 2-hybrid genes; as a result, 3-hybrid events have a smaller proportion of component regions than 2-hybrid events. When the length criterion for component regions exceeds 100 nucleotides, only a few 4- and 5-hybrid events are found (Table 1). On the whole, the proportions of component regions in these 4- and 5-hybrid events are larger than those in 3-hybrid events.
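The comparison of proportions can be sketched in two steps: compute, for each hybrid gene, the fraction of its length covered by component regions, then compare the distributions for two values of N with the two-sample Kolmogorov-Smirnov statistic D, the largest vertical gap between their empirical cumulative distribution functions. The sketch below computes D in plain Python; for a P value, SciPy's `scipy.stats.ks_2samp` is a standard choice. Representing component regions as half-open (start, end) intervals and the example proportions are assumptions for illustration only.

```python
from bisect import bisect_right


def component_proportion(hybrid_length, regions):
    """Fraction of a hybrid gene covered by its non-overlapping component
    regions, given as (start, end) half-open intervals."""
    return sum(end - start for start, end in regions) / hybrid_length


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_a(x) - F_b(x)|,
    where F_a and F_b are the empirical CDFs of the two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:  # for step-function ECDFs the maximum occurs at a data point
        f_a = bisect_right(a, x) / len(a)
        f_b = bisect_right(b, x) / len(b)
        d = max(d, abs(f_a - f_b))
    return d


if __name__ == "__main__":
    print(component_proportion(1000, [(0, 200), (300, 450)]))
    two_hybrid = [0.62, 0.70, 0.81, 0.75]    # hypothetical proportions
    three_hybrid = [0.35, 0.48, 0.52, 0.66]  # hypothetical proportions
    print(ks_statistic(two_hybrid, three_hybrid))
```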
One Gene Could Participate in Different N-hybrid Events
Different N-hybrid events may share the same hybrid gene or some of their component genes. Table S2 (Additional File 3) gives the numbers of distinct hybrid genes and component genes among all N-hybrid events in which every component region is longer than 50 bp.
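Counting distinct genes, as in Table S2, only requires collecting hybrid and component identifiers across events. The sketch below assumes each event is stored as a (hybrid gene, component genes) pair, a hypothetical representation, and uses as input the four events described for Figures 3B and 3C; it does not reproduce the Table S2 counts, which cover the full event set.

```python
def distinct_genes(events):
    """Return the sets of distinct hybrid genes and component genes.

    events: list of (hybrid_gene, component_genes) pairs, one per event.
    """
    hybrids = {hybrid for hybrid, _ in events}
    components = {gene for _, comps in events for gene in comps}
    return hybrids, components


if __name__ == "__main__":
    # The four events described for Figures 3B and 3C.
    events = [
        ("NM_001012976", ["BC036758", "AK097920", "AK126238"]),
        ("NM_001012976", ["BC036758", "AK097920", "AK127320"]),
        ("AF424542", ["AK131276", "AF194537", "BC036758", "NM_207471"]),
        ("AF424542", ["AK131276", "AF194537", "AK094887", "NM_207471"]),
    ]
    hybrids, components = distinct_genes(events)
    print(len(events), len(hybrids), len(components))
```

Because the same hybrid gene can appear in several events, the number of distinct hybrid genes (two here) can be smaller than the number of events (four here).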
Figure 3B shows that NM_001012976 is a 3-hybrid gene with two possible combinations of three component genes: (BC036758, AK097920, AK126238) and (BC036758, AK097920, AK127320). The two combinations differ in only one component gene, AK126238 versus AK127320, which belong to different isoform groups. A similar phenomenon is found in some 4-hybrid events (Fig. 3C): the 4-hybrid gene AF424542 has two possible combinations of four components, (AK131276, AF194537, BC036758, NM_207471) and (AK131276, AF194537, AK094887, NM_207471), which share three component genes.
There are also more complicated cases. Figure 3D shows that two genes, AK127765 and AK124630, are both components of two different 3-hybrid genes, NM_001012976 and AK091740. In another example, shown in Figures 3B and 3C, the component gene BC036758 contributes to both a 3-hybrid gene, NM_001012976, and a 4-hybrid gene, AF424542. Similarly, the 4-hybrid event in Figure 3C and the 6-hybrid event in Figure 3A share two component genes, AK131276 and AF194537.
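The relationships described for Figures 3B and 3C can be recovered mechanically once events are grouped by hybrid gene: intersecting the alternative combinations of one hybrid gene gives the components they have in common, and intersecting the components of two different hybrid genes gives the genes they share. This is a minimal sketch with hypothetical function names, using only the events listed for Figures 3B and 3C as input.

```python
from collections import defaultdict
from itertools import combinations


def group_by_hybrid(events):
    """Collect the alternative component combinations of each hybrid gene."""
    groups = defaultdict(list)
    for hybrid, comps in events:
        groups[hybrid].append(set(comps))
    return groups


def core_components(groups):
    """Components present in every alternative combination of a hybrid gene."""
    return {h: set.intersection(*combos) for h, combos in groups.items()}


def shared_between_hybrids(groups):
    """Component genes used by two different hybrid genes."""
    used = {h: set.union(*combos) for h, combos in groups.items()}
    shared = {}
    for a, b in combinations(sorted(used), 2):
        common = used[a] & used[b]
        if common:
            shared[(a, b)] = common
    return shared


if __name__ == "__main__":
    events = [
        ("NM_001012976", ["BC036758", "AK097920", "AK126238"]),
        ("NM_001012976", ["BC036758", "AK097920", "AK127320"]),
        ("AF424542", ["AK131276", "AF194537", "BC036758", "NM_207471"]),
        ("AF424542", ["AK131276", "AF194537", "AK094887", "NM_207471"]),
    ]
    groups = group_by_hybrid(events)
    print(core_components(groups))
    print(shared_between_hybrids(groups))
```

On these four events the sketch finds BC036758 as the only component shared by NM_001012976 and AF424542, matching the description above.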
Multiple Origins of Hybrid Genes in Human
We found 12 genes with dual identities, i.e., genes identified as a component gene in one hybrid event and as a hybrid gene in another. For simplicity, we term them "mixed Rosetta Stone" (MRS) genes. They are listed in Table S3 (Additional File 4), and all corresponding events can be found in Additional File 5 and Additional File 6. All the MRS genes are contained in very similar events and usually share similar component genes.
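Identifying MRS genes amounts to finding identifiers that occur both as a hybrid gene and as a component gene. The sketch below also records the indices of the events in which each role occurs, the kind of cross-reference collected in Additional Files 5 and 6. The (hybrid, components) event representation and the gene names geneX, geneA, and so on are hypothetical.

```python
from collections import defaultdict


def mrs_genes(events):
    """Find genes with dual identities and the events in which they occur.

    events: list of (hybrid_gene, component_genes) pairs.
    Returns {gene: {"as_hybrid": [...], "as_component": [...]}} for every
    gene that is a hybrid gene in some event and a component in some event.
    """
    roles = defaultdict(lambda: {"as_hybrid": [], "as_component": []})
    for idx, (hybrid, comps) in enumerate(events):
        roles[hybrid]["as_hybrid"].append(idx)
        for gene in comps:
            roles[gene]["as_component"].append(idx)
    return {
        gene: role
        for gene, role in roles.items()
        if role["as_hybrid"] and role["as_component"]
    }


if __name__ == "__main__":
    events = [  # hypothetical events
        ("geneX", ["geneA", "geneB", "geneC"]),
        ("geneY", ["geneX", "geneD", "geneE"]),
        ("geneZ", ["geneX", "geneA", "geneF"]),
    ]
    print(mrs_genes(events))
```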
Proteins that are fused into a single "Rosetta Stone" protein frequently function in the same complex and are involved in the same interaction network [7, 22]. The MRS genes detected in the human genome therefore strongly indicate functional associations among the corresponding proteins, including physical interactions and complex formation. They also point to multiple occurrences of the gene hybrid mechanism and to complex network connections among MRS genes. An example of a complex hybrid gene network can be found in human neoplasia, where acquired gene fusions play a causal role in initiating the neoplastic process, either by activating proto-oncogenes or by creating hybrid genes [27]. The origins of the MRS genes are therefore worth further study in disease research.
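The Rosetta Stone reasoning can be expressed as a simple rule: any two component genes fused into the same hybrid gene are predicted to be functionally associated, and a pair supported by several hybrid genes is a stronger candidate. The sketch below applies that rule to hypothetical events. It is a simplified, within-genome version of the idea, not a reimplementation of the methods in [7, 22], and its output is a list of predictions rather than validated interactions.

```python
from collections import defaultdict
from itertools import combinations


def rosetta_links(events):
    """Predict functional links between component genes fused in the same
    hybrid gene, recording which hybrid genes support each link.

    events: list of (hybrid_gene, component_genes) pairs.
    Returns {(gene_a, gene_b): {supporting hybrid genes}} with gene_a < gene_b.
    """
    support = defaultdict(set)
    for hybrid, comps in events:
        for a, b in combinations(sorted(set(comps)), 2):
            support[(a, b)].add(hybrid)
    return dict(support)


if __name__ == "__main__":
    events = [  # hypothetical events
        ("geneX", ["geneA", "geneB", "geneC"]),
        ("geneY", ["geneA", "geneB", "geneD"]),
    ]
    for pair, hybrids in sorted(rosetta_links(events).items()):
        print(pair, sorted(hybrids))
```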
"3" is fundamental?
Remarkably, the majority of detected hybrid events are 3-hybrid regardless of the component length criterion used (Table 1), while only a small proportion of N-hybrid events have N > 3. This might indicate that hybrid mechanisms in the human genome largely involve at least three genes. According to previous literature, hybrid genes can be used to predict functional associations of proteins, including physical interactions and protein complex formation [7, 22, 23]. In gene-fusion events studied across four genomes (Escherichia coli, Haemophilus influenzae, Methanococcus jannaschii, and Saccharomyces cerevisiae), more than two proteins (~2.44 on average) are involved in each fusion event [4]. Furthermore, recent studies of protein-protein interaction networks in yeast have shown that the median connectivity of networks from various databases is 3 (while the mean connectivity varies from 4.11 to 6.61) [28, 29]. If the relatedness defined in our study is treated as a type of connection between genes, this may explain why most detected N-hybrid events in the human genome are 3-hybrid.
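The connectivity comparison can be sketched by treating each hybrid-component relation as an undirected edge and summarizing the resulting degree distribution with its median and mean, the two statistics quoted for the yeast networks. The edge definition and the example genes are hypothetical, and the example does not reproduce the analyses in [28, 29].

```python
from collections import defaultdict
from statistics import mean, median


def relatedness_degrees(events):
    """Degree of each gene when every hybrid-component relation is treated
    as an undirected edge between two genes.

    events: list of (hybrid_gene, component_genes) pairs.
    """
    neighbors = defaultdict(set)
    for hybrid, comps in events:
        for comp in comps:
            neighbors[hybrid].add(comp)
            neighbors[comp].add(hybrid)
    return {gene: len(nbrs) for gene, nbrs in neighbors.items()}


if __name__ == "__main__":
    events = [  # hypothetical events
        ("geneX", ["geneA", "geneB", "geneC"]),
        ("geneY", ["geneA", "geneD", "geneE"]),
    ]
    degrees = relatedness_degrees(events)
    print("median:", median(degrees.values()))
    print("mean:", round(mean(degrees.values()), 2))
```

In this toy graph each hybrid gene has degree 3 while most component genes have degree 1, so the median and mean summarize quite different aspects of the network, as they do for the yeast data.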
We propose the hypothesis that 3-hybrid events are the main component of the gene hybrid mechanism in the human genome. Although a complete human gene network based on functional roles is not currently available, our preliminary analysis of gene hybrid events still gives some insight into how human genes have been produced by the hybrid mechanism. Testing this hypothesis is left for future study.
In this study, we have developed a method, TCA, for detecting all potential N-hybrid events in the human genome. Our results reveal that hybridization between genes is an important way of generating new genes in this genome. They also show that one gene can be involved in different hybrid events and play opposite roles (hybrid gene versus component gene) in them. This phenomenon suggests that the hybrid mechanism occurred multiple times during human genome evolution, and further that it is not merely an accidental event but an ongoing process. Another important insight from our results is that 3-hybrid events may be the basic unit of the complex hybrid gene network.
The authors would like to express their thanks to Dr. Trees-Juen Chuang of the Genomics Research Center, Academia Sinica, Taiwan, for helpful discussions. We also thank Dr. Meng-Shin Shiao and Dr. Anuphap Prachumwat for valuable suggestions on the study. This research was supported by a fellowship to Y.H.T. from Academia Sinica, Taiwan, and funded by a grant to W.H.L. from the National Science Council, Taiwan, and by Academia Sinica, Taiwan.
- Long M, Betran E, Thornton K, Wang W: The origin of new genes: glimpses from the young and old. Nat Rev Genet. 2003, 4 (11): 865-875. 10.1038/nrg1204.
- Samonte RV, Eichler EE: Segmental duplications and the evolution of the primate genome. Nat Rev Genet. 2002, 3 (1): 65-72. 10.1038/nrg705.
- Ohno S: Evolution by Gene Duplication. 1970, Berlin: Springer.
- Enright AJ, Iliopoulos I, Kyrpides NC, Ouzounis CA: Protein interaction maps for complete genomes based on gene fusion events. Nature. 1999, 402 (6757): 86-90. 10.1038/47056.
- Akiva P, Toporik A, Edelheit S, Peretz Y, Diber A, Shemesh R, Novik A, Sorek R: Transcription-mediated gene fusion in the human genome. Genome Res. 2006, 16 (1): 30-36. 10.1101/gr.4137606.
- Enright AJ, Van Dongen S, Ouzounis CA: An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 2002, 30 (7): 1575-1584. 10.1093/nar/30.7.1575.
- Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D: Detecting protein function and protein-protein interactions from genome sequences. Science. 1999, 285 (5428): 751-753. 10.1126/science.285.5428.751.
- Yanai I, Wolf YI, Koonin EV: Evolution of gene fusions: horizontal transfer versus independent events. Genome Biol. 2002, 3 (5): research0024. 10.1186/gb-2002-3-5-research0024.
- Wang W, Yu H, Long M: Duplication-degeneration as a mechanism of gene fission and the origin of new genes in Drosophila species. Nat Genet. 2004, 36 (5): 523-527. 10.1038/ng1338.
- Zhang Z, Sun H, Zhang Y, Zhao Y, Shi B, Sun S, Lu H, Bu D, Ling L, Chen R: Genome-wide analysis of mammalian DNA segment fusion/fission. J Theor Biol. 2006, 240 (2): 200-208. 10.1016/j.jtbi.2005.09.016.
- Nurminsky DI, Nurminskaya MV, De Aguiar D, Hartl DL: Selective sweep of a newly evolved sperm-specific gene in Drosophila. Nature. 1998, 396 (6711): 572-575. 10.1038/25126.
- Snel B, Bork P, Huynen M: Genome evolution. Gene fusion versus gene fission. Trends Genet. 2000, 16 (1): 9-11. 10.1016/S0168-9525(99)01924-1.
- Andreeva A, Howorth D, Brenner SE, Hubbard TJ, Chothia C, Murzin AG: SCOP database in 2004: refinements integrate structure and sequence family data. Nucleic Acids Res. 2004, 32 (Database issue): D226-229. 10.1093/nar/gkh039.
- Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer EL: The Pfam protein families database. Nucleic Acids Res. 2004, 32 (Database issue): D138-141. 10.1093/nar/gkh121.
- Bornberg-Bauer E, Beaussart F, Kummerfeld SK, Teichmann SA, Weiner J: The evolution of domain arrangements in proteins and interaction networks. Cell Mol Life Sci. 2005, 62 (4): 435-445. 10.1007/s00018-004-4416-1.
- Orengo CA, Thornton JM: Protein families and their evolution-a structural perspective. Annu Rev Biochem. 2005, 74: 867-900. 10.1146/annurev.biochem.74.082803.133029.
- Patthy L: Modular assembly of genes and the evolution of new functions. Genetica. 2003, 118 (2–3): 217-231. 10.1023/A:1024182432483.
- Koonin EV, Wolf YI, Karev GP: The structure of the protein universe and genome evolution. Nature. 2002, 420 (6912): 218-223. 10.1038/nature01256.
- Vogel C, Berzuini C, Bashton M, Gough J, Teichmann SA: Supra-domains: evolutionary units larger than single protein domains. J Mol Biol. 2004, 336 (3): 809-823. 10.1016/j.jmb.2003.12.026.
- Doolittle RF: The multiplicity of domains in proteins. Annu Rev Biochem. 1995, 64: 287-314. 10.1146/annurev.bi.64.070195.001443.
- Vogel C, Bashton M, Kerrison ND, Chothia C, Teichmann SA: Structure, function and evolution of multidomain proteins. Curr Opin Struct Biol. 2004, 14 (2): 208-216. 10.1016/j.sbi.2004.03.011.
- Enright AJ, Ouzounis CA: Functional associations of proteins in entire genomes by means of exhaustive detection of gene fusions. Genome Biol. 2001, 2 (9): RESEARCH0034. 10.1186/gb-2001-2-9-research0034.
- Pasek S, Risler JL, Brezellec P: Gene fusion/fission is a major contributor to evolution of multi-domain bacterial proteins. Bioinformatics. 2006, 22 (12): 1418-1423. 10.1093/bioinformatics/btl135.
- Skrabanek L, Saini HK, Bader GD, Enright AJ: Computational prediction of protein-protein interactions. Mol Biotechnol. 2008, 38 (1): 1-17. 10.1007/s12033-007-0069-2.
- Apic G, Gough J, Teichmann SA: Domain combinations in archaeal, eubacterial and eukaryotic proteomes. J Mol Biol. 2001, 310 (2): 311-325. 10.1006/jmbi.2001.4776.
- Gabaldon T, Huynen MA: Prediction of protein function and pathways in the genome era. Cell Mol Life Sci. 2004, 61 (7–8): 930-944. 10.1007/s00018-003-3387-y.
- Hoglund M, Frigyesi A, Mitelman F: A gene fusion network in human neoplasia. Oncogene. 2006, 25 (18): 2674-2678. 10.1038/sj.onc.1209290.
- Prachumwat A, Li WH: Protein function, connectivity, and duplicability in yeast. Mol Biol Evol. 2006, 23 (1): 30-39. 10.1093/molbev/msi249.
- Bader JS, Chaudhuri A, Rothberg JM, Chant J: Gaining confidence in high-throughput protein interaction networks. Nat Biotechnol. 2004, 22 (1): 78-85. 10.1038/nbt924.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.