The plausible reason why the length of 5' untranslated region is unrelated to organismal complexity
© Chen et al; licensee BioMed Central Ltd. 2011
Received: 27 June 2011
Accepted: 27 August 2011
Published: 27 August 2011
Organismal complexity is thought to increase with the complexity of transcriptional and translational regulation. Supporting this notion, a recent study demonstrated a higher level of tissue-specific gene expression in human than in mouse. However, whether this correlation extends beyond mammals remains unclear. In addition, 5' untranslated regions (5'UTRs), which have undergone stochastic elongation during evolution and may therefore harbor more regulatory elements, may have played an important role in the emergence of organismal complexity. Although a lack of correlation between 5'UTR length and organismal complexity has been reported, the underlying mechanisms remain unexplored.
In this study, we use the number of cell types as the measure of organismal complexity and examine the correlation between (1) organismal complexity and transcriptional regulatory complexity, and (2) organismal complexity and 5'UTR length, by comparing the 5'UTRs and multiple-tissue expression profiles of human (Homo sapiens), mouse (Mus musculus), and fruit fly (Drosophila melanogaster). Transcriptional regulatory complexity is measured by the tissue specificity of gene expression and by the ratio of non-constitutively to constitutively expressed genes. We demonstrate that, whereas correlation (1) holds in the three-way comparison, correlation (2) does not. Results from larger datasets covering up to 17 species, ranging from yeast to human, also reject correlation (2). The failure of correlation (2) may have two causes. First, longer 5'UTRs do not contribute to increased tissue specificity of gene expression. Second, the larger numbers of common translational regulatory elements in longer 5'UTRs do not lead to increased organismal complexity.
Our study has extended the evidence base for the correlation between organismal complexity and transcriptional regulatory complexity from mammals to fruit fly, a representative invertebrate model organism. Furthermore, our results suggest that the elongation of 5'UTRs alone cannot lead to an increase in regulatory complexity or to the emergence of organismal complexity.
The evolution of organismal complexity is a fundamental issue in biology. A number of hypotheses have been proposed to explain the emergence of organismal complexity, including increases in gene/protein number [1–3], gains of noncoding regulatory elements [1, 2, 4, 5], and expansions of biological networks [2, 6]. A previous study provided evidence that human (a more complex organism) has a higher proportion of narrowly expressed genes (indicating greater transcriptional regulatory complexity) than mouse (a less complex organism) . However, owing to data limitations, that study compared only human and mouse, and the close relationship between these two mammalian species restricts its conclusions to a narrow evolutionary scope. For example, it remains unclear whether the suggested correlation between transcriptional regulatory complexity and organismal complexity extends to other vertebrates (e.g. birds or fishes) or to invertebrates. Furthermore, the source of the increased regulatory complexity in complex organisms has not been fully explained, although the elongation of 5' untranslated regions (5'UTRs) has been alluded to . Because 5'UTRs are associated with both transcriptional and translational cis-regulation [8–10], the elongation of these non-coding regions may have contributed to increased regulatory complexity . A recent analysis suggested that 5'UTR length was unrelated to organismal complexity . However, that analysis did not discuss possible reasons for the lack of correlation, nor did it take into account the phylogenetic relationships among the compared species (see the discussion of independent contrasts below). We therefore set out to re-examine the lack of correlation between 5'UTR length and organismal complexity and to investigate the potential underlying molecular mechanisms.
To this end, we analyzed the 5'UTR lengths of up to 17 species ranging from yeast to human. Furthermore, to examine the relationship between transcriptional regulatory complexity and 5'UTR length, we analyzed the gene expression data of human, mouse, and fruit fly, the species for which multiple-tissue gene expression data are available.
Notably, there has been some debate over how organismal complexity should be measured . However, most of the proposed measures cannot be applied in our study because they are difficult to quantify without bias for all of the compared species (e.g. functional complexity , number of transcription factor families , or phenotypic complexity ). We therefore selected the number of cell types, a generally accepted index , as the measure of organismal complexity.
We also noted that closely related species may share similar genetic features, levels of organismal complexity, and 5'UTR lengths. Such similarities can lead to overweighting of some lineages and to biased correlations between biological features . To reduce such biases, we applied the independent contrast method to the compared characteristics . Independent contrasts take the phylogenetic distances between the compared species into account and adjust the weighting of the compared biological features according to the phylogenetic tree of those species (see Methods).
Our results indicate that 5'UTR length correlates with neither organismal complexity nor the breadth/tissue specificity of gene expression. In addition, the larger numbers of common translational regulatory signals (upstream start codons and upstream open reading frames) in longer 5'UTRs do not contribute to increased organismal complexity. In other words, we provide evidence that logical connections (i), (iv), and (v) in Figure 1 are invalid. We therefore suggest that the elongation of 5'UTRs alone cannot explain the emergence of organismal complexity, even though transcriptional regulatory complexity does correlate positively with organismal complexity (connection (ii)) from fruit fly to mammals.
The increase in 5'UTR length is unrelated to the increase in organismal complexity
Table 1 The median/average 5'UTR lengths and the numbers of cell types of the compared organisms
Median/average length of 5'UTRs (bp)
No. of cell types^c
Human (Homo sapiens)
Chimpanzee (Pan troglodytes)
Mouse (Mus musculus)
Rat (Rattus norvegicus)
Chicken (Gallus gallus)
Cow (Bos taurus)
Dog (Canis familiaris)
Frog (Xenopus tropicalis)
Zebrafish (Danio rerio)
Tetraodon (Tetraodon nigroviridis)
Fugu (Takifugu rubripes)
Ascidian (Ciona intestinalis)
Fruit fly (Drosophila melanogaster)
Nematode (Caenorhabditis elegans)
Honeybee (Apis mellifera)
Mosquito (Anopheles gambiae)
Thale cress (Arabidopsis thaliana)
Rice (Oryza sativa)
Yeast (Saccharomyces cerevisiae)^f
Because 5'UTR lengths may differ between annotation systems, and because plants were not included in the above analysis, we used an independent dataset (UTRdb, see Methods) that also includes two plant species to re-evaluate the correlation between 5'UTR length and organismal complexity. In total, the 5'UTRs of 17 species, including 9 vertebrates, 5 invertebrates, 2 plants, and yeast, were analyzed (Table 1). The correlation between organismal complexity and 5'UTR length is again not statistically significant (R² = 0.001, P = 0.884; Figure 2(C)).
To control for the factor of lineage-specific gains/losses of genes, we extracted one-to-one orthologous genes from 11 vertebrate species from the Ensembl dataset and performed the analysis again. Note that we included only vertebrate species to ensure a large enough number of genes for the analysis. The correlation remains statistically insignificant (Additional file 1), suggesting that lineage-specific gene gains/losses do not affect our result. Therefore, connection (v) in Figure 1 is not supported by this multiple-species comparison.
The length of 5'UTR cannot fully explain the breadth or tissue specificity of gene expression
We have shown that organismal complexity does not increase with 5'UTR length. We next examined possible reasons for this lack of correlation by investigating logical connections (i), (ii), (iii), and (iv) in Figure 1, using the biological features of the three intensively studied species, because multiple-tissue (>10 tissues) gene expression data are available only for these species. We first analyzed the relationship between 5'UTR length and the breadth/tissue specificity of gene expression. Vinogradov and Anatskaya  showed that human had a higher fraction of non-constitutively expressed genes than mouse, a difference that was suggested to result from the longer 5'UTRs of human (logical connections (i) and (ii) in Figure 1). Under this view, organisms with longer 5'UTRs are expected to have a larger proportion of narrowly expressed genes (higher tissue specificity), because the presumably larger numbers of regulatory elements in longer 5'UTRs allow finer transcriptional regulation, which should in turn lead to increased organismal complexity.
To examine the validity of these logical connections, we compared the expression patterns of one-to-one orthologous genes among the three species across all available tissues (see Methods [21, 22]). There are two technical issues in this comparison. First, far more tissues have been examined experimentally in mammals (79 in human and 61 in mouse) than in fruit fly (17 tissues). This may inflate the proportion of "constitutively expressed genes" in fruit fly relative to mammals, because a gene is more likely to be expressed in all of 17 tissues than in all of 61 (or 79) tissues. Second, it is infeasible to compare "homologous" tissues between mammals and fruit fly. To address these issues, we randomly sampled 10 non-redundant tissues from each species 1,000 times and analyzed the expression profiles in the sampled tissues (Methods). The rationale of this analysis is that gene expression patterns should be more variable in complex organisms than in relatively simple ones. In other words, given the same number of tissues, more complex organisms should have fewer genes expressed in all of the examined tissues and higher levels of tissue specificity of gene expression. We then took three measurements for the analyzed genes in the sampled tissues: (a) the median 5'UTR length; (b) the ratio of "non-constitutively expressed" to "constitutively expressed" genes (Methods); and (c) the mean tissue specificity of gene expression (the "τ" statistic ).
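The resampling step described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes expression calls have already been made for every gene and tissue and are stored in a hypothetical dict mapping each gene to the set of tissues in which it is called expressed; the function name `resampled_ratios` is likewise hypothetical. Genes expressed in none of the sampled tissues are skipped, as described in Methods.

```python
import random

def resampled_ratios(expressed, tissues, n_tissues=10, n_iter=1000, seed=0):
    """Ratio of non-constitutively to constitutively expressed genes per draw.

    expressed: hypothetical dict gene ID -> set of tissues where the gene is
               called expressed (calls are assumed to be precomputed).
    tissues:   list of all non-redundant tissues available for the species.
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_iter):
        sampled = set(rng.sample(tissues, n_tissues))
        constitutive = non_constitutive = 0
        for gene_tissues in expressed.values():
            hits = gene_tissues & sampled
            if not hits:
                continue  # expressed in none of the sampled tissues: excluded
            if hits == sampled:
                constitutive += 1  # expressed in every sampled tissue
            else:
                non_constitutive += 1
        ratios.append(non_constitutive / constitutive if constitutive else float("inf"))
    return ratios
```

Summarizing the returned list (for example by its median) gives one value of measurement (b) per species; the fixed `seed` only makes the sketch reproducible.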
Increasing numbers of upstream start codons and upstream open reading frames do not contribute to increase in organismal complexity
Next, we examined the relationship between organismal complexity and the numbers of translational regulatory motifs in 5'UTRs (logical connections (iii) and (iv) in Figure 1). We used two common motifs, upstream start codons (uAUGs) and upstream open reading frames (uORFs), to represent the translational regulatory elements in 5'UTRs. This choice is reasonable because these elements occur frequently in 5'UTRs and can significantly down-regulate the translation of the main coding regions . Furthermore, within a species, the numbers of uAUGs and uORFs are positively correlated with 5'UTR length [25, 26]. We asked whether this also holds across species, using the 15-species Ensembl datasets to examine the correlation between 5'UTR length and the numbers of uAUGs/uORFs. Indeed, the numbers of uAUGs and uORFs are both positively correlated with 5'UTR length, with only one exception (the number of uAUGs vs. 5'UTR length for randomly selected transcripts; Additional file 4). Therefore, the general trend is that organisms with longer 5'UTRs tend to have more translational regulatory elements, which supports logical connection (iii) in Figure 1.
We also used the Ensembl datasets to examine whether the numbers of uAUGs/uORFs correlate with organismal complexity. The independent contrast analyses indicate that neither the number of uAUGs nor the number of uORFs per gene correlates significantly with organismal complexity (P ≥ 0.340; Additional file 5). Therefore, logical connection (iv) is not supported.
In sum, we provide evidence against two key assumptions (connections (i) and (iv) in Figure 1) of the hypothesis linking 5'UTR length to organismal complexity. Because these assumptions fail, the hypothesis itself (connection (v)) is not supported. We therefore suggest that the elongation of 5'UTRs is not the major contributor to increased organismal complexity.
We have demonstrated that the elongation of 5'UTR is not directly related to the increase in organismal complexity among human, mouse, and fruit fly (and also in several larger datasets). The possible reason for the lack of correlation is twofold. First, at the transcription level, 5'UTR length is not correlated with breadth/tissue specificity of gene expression. Second, at the translation level, the larger numbers of common translational regulatory elements in longer 5'UTRs do not lead to increased organismal complexity.
However, we emphasize that our results do support the correlation between organismal complexity and the complexity of gene regulation . It is well established that transcriptional/translational regulation involves a wide variety of trans- and cis-acting factors, of which 5'UTRs represent only a part. We cannot rule out the possibility that organismal complexity is associated with interactions between 5'UTRs and other regulatory factors, which would blur the correlation between 5'UTR length and organismal complexity. Furthermore, 5'UTRs may contain as-yet uncharacterized transcriptional/translational regulatory elements that, alone or in combination with other regulatory elements, may contribute to organismal complexity.
The apparent lack of correlation between 5'UTR length and organismal complexity is unexpected, because the elongation of 5'UTRs and the emergence of organismal complexity have been suggested to result from the same evolutionary process [27, 28]. It has been proposed that reductions in population size, and the consequent relaxation of selective constraint on genome evolution, led to the accumulation of regulatory elements and the emergence of organismal complexity . It therefore appears reasonable to expect an association between organismal complexity and 5'UTR length. However, as discussed above, 5'UTRs are not the only regulatory elements in the genome. For example, non-coding RNA-mediated gene regulation [30–32], nonsense-mediated decay , the lengths and interactions of protein-coding sequences , and 3'UTRs may all contribute to regulatory complexity . 5'UTRs thus represent only part of the complicated machinery of eukaryotic gene regulation. The proportion of the variation in transcriptional/translational regulation that is attributable to 5'UTRs remains unknown, and this proportion is likely to vary with biological conditions. It would be intriguing to study whether the collective length of all regulatory elements correlates significantly with organismal complexity. One potential approach is to integrate these features into a multiple regression model and analyze the contribution of each characteristic to the variation in organismal complexity.
Our study has extended the evidence base for the association between organismal complexity and transcriptional regulatory complexity from mammals to fruit fly. We also show that increased organismal complexity does not result directly from the elongation of 5'UTRs, because longer 5'UTRs do not contribute to higher regulatory complexity. Therefore, despite the proposed common evolutionary origin of these two biological phenomena, a single type of regulatory sequence (the 5'UTR) may not account for a feature as multi-faceted as organismal complexity.
We used two primary data sources for well-annotated 5'UTR information: Ensembl (version 56) and UTRdb (http://utrdb.ba.itb.cnr.it/; updated in July 2010) . For the Ensembl dataset, 11 vertebrate and 3 invertebrate species were selected (Table 1). The 5'UTR sequences and gene annotations (Ensembl version 56) were retrieved using BioMart . For the UTRdb dataset, 10 vertebrate, 4 invertebrate, and 2 plant species were selected (Table 1). The 5'UTR sequences of yeast (Saccharomyces cerevisiae) were retrieved from a recent publication  and added to both datasets for subsequent analyses. Note that the Ensembl dataset was used in all analyses of this study, whereas the UTRdb dataset was used only to examine logical connection (v).
Furthermore, we selected the most extensively studied species, namely human (Homo sapiens), mouse (Mus musculus), and fruit fly (Drosophila melanogaster), for additional analyses. Because these species have been studied intensively, their annotations are considerably less likely to contain errors than those of the other analyzed species. In addition, the large-scale gene expression data available for these three species enable us to analyze the correlation between 5'UTR length and gene expression patterns, which would be impossible for the other species. In addition to the Ensembl and UTRdb data, a RefSeq-CAGE dataset was also employed. For this dataset, only the RefSeq-annotated transcription start sites that were supported by CAGE tag clusters [18–20] were retained; the 5'UTR lengths derived from this dataset were therefore considered highly accurate. Chromosomal positions of tag clusters were downloaded from the FANTOM website for human and mouse [18, 19] and from a recent genome-wide study for fruit fly .
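One way to implement the CAGE filter is shown below. The text does not state exactly how "supported" was defined, so this sketch assumes that a RefSeq transcription start site counts as supported when it lies inside a tag cluster on the same chromosome and strand, with an optional tolerance `window` (a hypothetical parameter, 0 by default). Function names and the tuple/dict layouts are illustrative.

```python
from collections import defaultdict

def index_clusters(clusters):
    """Group CAGE tag clusters by (chromosome, strand).

    clusters: iterable of (chrom, strand, start, end), 1-based closed intervals.
    """
    index = defaultdict(list)
    for chrom, strand, start, end in clusters:
        index[(chrom, strand)].append((start, end))
    return index

def cage_supported(index, chrom, strand, tss, window=0):
    """True if the TSS lies inside (or within `window` bp of) a same-strand cluster."""
    return any(start - window <= tss <= end + window
               for start, end in index.get((chrom, strand), ()))

def filter_refseq_tss(transcripts, index, window=0):
    """Keep transcripts (dicts with 'chrom', 'strand', 'tss') whose TSS is CAGE-supported."""
    return [t for t in transcripts
            if cage_supported(index, t["chrom"], t["strand"], t["tss"], window)]
```

A linear scan per transcript keeps the sketch short; a genome-wide run would more likely use a sorted or interval-tree index.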
To further enhance data quality, several criteria were applied to filter the retrieved transcripts: each analyzed transcript must (a) have an annotated 5'UTR; (b) be a known transcript (rather than a novel or predicted transcript); and (c) have a known protein product. The last two conditions were employed to ensure that the 5' and 3' termini of the analyzed 5'UTRs were experimentally supported. For genes with alternatively spliced isoforms, we used two different criteria to select one transcript per gene for the Ensembl dataset (Table 1): (A) a randomly selected transcript; or (B) the transcript with a "pure 5'UTR" (i.e. a 5'UTR that does not overlap with the coding sequences of any other transcript). In the latter case, we further filtered out the pure 5'UTRs that matched any entry in the non-redundant (NR) protein database (ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nr.gz) using blastx  with default parameters and an E-value cutoff of 10⁻⁵. Analyses based on both selection criteria yield consistent results. For the UTRdb dataset, we randomly selected one transcript for each gene with alternatively spliced isoforms.
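Criterion (B) amounts to an interval-overlap test. The sketch below assumes genomic blocks are given as 1-based closed (start, end) intervals on the gene's strand and stored in a hypothetical per-gene dict; the tie-breaking rule (the first qualifying transcript ID in sorted order) is an assumption, since the text does not say how one transcript was chosen when several have pure 5'UTRs. The subsequent blastx screen against NR is not reproduced here.

```python
def overlaps(a, b):
    """True if two 1-based closed intervals (start, end) share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]

def is_pure_utr5(utr5_blocks, other_cds_blocks):
    """A 5'UTR is 'pure' if none of its blocks overlaps any CDS block of another isoform."""
    return not any(overlaps(u, c) for u in utr5_blocks for c in other_cds_blocks)

def pick_pure_utr5_transcript(transcripts):
    """transcripts: hypothetical dict transcript ID -> {"utr5": [...], "cds": [...]}.

    Returns the first transcript ID (sorted order) with a pure 5'UTR, or None.
    """
    for tid in sorted(transcripts):
        utr5 = transcripts[tid]["utr5"]
        other_cds = [blk for oid, t in transcripts.items() if oid != tid
                     for blk in t["cds"]]
        if utr5 and is_pure_utr5(utr5, other_cds):
            return tid
    return None
```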
As the measure of organismal complexity, we used the number of cell types because this indicator has been shown to be highly correlated with organismal complexity . The numbers of cell types of the compared species (Table 1) were retrieved from Vogel and Chothia's study .
Evaluating the correlation between organismal complexity and genetic features
The genetic characteristics of closely related species may not evolve independently, which may lead to biased correlations between genetic features . To eliminate such biases, we employed the "CONTRAST" module of PHYLIP  to derive the contrasts of the measured biological features (5'UTR lengths, the numbers of translational regulatory elements, and the numbers of cell types) with reference to the phylogenetic tree of the compared organisms. The process is summarized as follows. First, the phylogenetic tree was constructed based on the protein sequences of one-to-one orthologous genes of the compared species. Second, unweighted contrasts of the biological characteristics (e.g. 5'UTR length) were calculated for the internal nodes of the phylogenetic tree. Third, weighted contrasts were calculated according to the genetic distances between the nodes of the tree.
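The contrast calculation follows Felsenstein's (1985) algorithm. The sketch below is a plain-Python illustration of that standard algorithm, not PHYLIP's CONTRAST program or its output format; the `Node` class and the tree layout are hypothetical. For each pair of sister nodes it computes a standardized contrast (the difference divided by the square root of the summed branch lengths), an ancestral value weighted by inverse branch length, and an extended branch length for the ancestor that accounts for the uncertainty of that estimate.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Node:
    length: float                      # branch length to the parent (> 0 except at the root)
    name: Optional[str] = None         # species name for tips
    children: Tuple["Node", ...] = ()  # exactly two children for internal nodes

def independent_contrasts(node, traits):
    """Return (node value, extended branch length, standardized contrasts in post-order)."""
    if not node.children:
        return traits[node.name], node.length, []
    left, right = node.children
    x1, v1, c1 = independent_contrasts(left, traits)
    x2, v2, c2 = independent_contrasts(right, traits)
    contrast = (x1 - x2) / math.sqrt(v1 + v2)        # standardized contrast
    x_anc = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)  # inverse-branch-length weighted mean
    v_anc = node.length + v1 * v2 / (v1 + v2)        # lengthen branch for estimation error
    return x_anc, v_anc, c1 + c2 + [contrast]
```

A tree with n tips yields n - 1 contrasts. Running the function once per trait (e.g. 5'UTR length and number of cell types) on the same tree keeps the sign orientation of each contrast consistent between traits, which the regression step relies on.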
Spearman's correlations between the derived contrasts were then evaluated under a zero-intercept linear regression model using R (http://www.r-project.org). A zero-intercept model was used because no change in one biological characteristic is expected if the other characteristic does not change (e.g. no change in organismal complexity is expected if 5'UTR lengths do not change). Notably, the overall results hold even when the standard Spearman's correlation is used.
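The exact R calls used in the study are not stated. In R, a zero-intercept fit is typically written `lm(y ~ x - 1)` and a rank correlation `cor.test(x, y, method = "spearman")`. For readers working outside R, the sketch below computes the through-origin least-squares slope together with the uncentered R² that R reports for no-intercept models, and Spearman's rho with average ranks for ties. It is illustrative only and does not compute P values.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def regression_through_origin(x, y):
    """Least-squares slope of y = b * x (no intercept) and the uncentered R^2."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return sxy / sxx, sxy ** 2 / (sxx * syy)
```

Here `x` and `y` would be the paired standardized contrasts of two traits, e.g. 5'UTR length and number of cell types.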
Measurements of gene expression breadth and tissue specificity
Gene expression data of human and mouse were retrieved from the BioGPS website (http://biogps.gnf.org/downloads/). The datasets covered 79 human and 61 mouse tissues, where the levels of gene expression were measured using the Affymetrix microarray chips (U133A/GNF1H for human and GNF1M for mouse) . To determine the probe-gene associations, we blastn-aligned  the probe sequences against the complementary DNA (cDNA) sequences of known human and mouse protein coding genes retrieved from Ensembl version 56. Only the probes that could be completely matched to a cDNA with 100% identity were retained. The probes that matched more than one gene were excluded. In the cases where multiple probes matched the same gene, we retained the probe that had the highest sum of expression levels in all tissues. Accordingly, 15,834 human and 15,627 mouse genes were identified and subsequently analyzed. The gene expression data of adult fruit fly were retrieved from the FlyAtlas (http://flyatlas.org/drosophila_2.na23.annot.csv), which covered 17 tissues that were examined using the Affymetrix Drosophila Genome 2.0 Array . The probe-gene associations were determined as described above. Accordingly, 12,095 of the fruit fly genes were included in the subsequent analyses.
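The probe-to-gene rules above can be sketched as two small functions. Assumptions: blastn was run with tabular output (`-outfmt 6`, whose first four columns are query ID, subject ID, percent identity, and alignment length); probe lengths and a cDNA-to-gene lookup are available as dicts; and a "complete match" means 100% identity over the full probe length. Function and variable names are hypothetical.

```python
from collections import defaultdict

def perfect_hits(blast_rows, probe_len, transcript_gene):
    """Collect the genes hit by each probe over its full length at 100% identity.

    blast_rows: lines of blastn tabular output (-outfmt 6).
    probe_len: dict probe ID -> probe length (nt).
    transcript_gene: hypothetical dict cDNA ID -> gene ID.
    """
    hits = defaultdict(set)
    for row in blast_rows:
        fields = row.rstrip("\n").split("\t")
        probe, subject = fields[0], fields[1]
        pident, aln_len = float(fields[2]), int(fields[3])
        # 100% identity over an alignment as long as the probe implies an
        # ungapped match covering the entire probe
        if pident == 100.0 and aln_len == probe_len[probe]:
            hits[probe].add(transcript_gene[subject])
    return hits

def map_probes_to_genes(hits, expression):
    """Drop probes matching more than one gene; for genes with several probes,
    keep the probe with the highest summed expression across tissues."""
    gene_probes = defaultdict(list)
    for probe, genes in hits.items():
        if len(genes) == 1:
            (gene,) = genes
            gene_probes[gene].append(probe)
    return {gene: max(probes, key=lambda p: sum(expression[p]))
            for gene, probes in gene_probes.items()}
```

Probes that hit several cDNAs of the same gene collapse to a single gene ID and are kept, matching the rule that only probes matching more than one gene are excluded.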
Note that the numbers of examined tissues differ markedly between the mammalian species and fruit fly. To fairly reflect the differences in expression patterns among human, mouse, and fruit fly, we randomly sampled 10 non-redundant tissues from each species (or 17 tissues from human and mouse) 1,000 times and analyzed the expression profiles in the sampled tissues. A mammalian gene was considered expressed in a given tissue if its average difference (AD) value was larger than 200 . A fruit fly gene was regarded as expressed if it had at least 3 present calls out of 4 biological replicates . Genes that were not expressed in any of the 10 (or 17) sampled tissues were excluded. We then took three measurements for the analyzed genes in the sampled tissues: (a) the median 5'UTR length; (b) the ratio of "non-constitutively expressed genes" (genes not expressed in all of the 10 (or 17) sampled tissues) to "constitutively expressed genes" (genes expressed in all of the 10 (or 17) sampled tissues); and (c) the average tissue specificity of gene expression. Tissue specificity was measured by the modified τ statistic , which considers both the breadth and the level of a gene's expression. τ ranges from 0 to 1, with larger values indicating higher tissue specificity.
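For reference, the original τ of Yanai et al. is τ = Σ(1 - x_i/x_max)/(n - 1), summed over n tissues, where x_max is the gene's highest expression level. The text cites a modified version whose exact modification is not spelled out here, so the sketch below implements the original formula with an optional log2(x + 1) transform. Clipping negative AD values to zero and the `log_transform` flag are assumptions for illustration, not the study's documented choices.

```python
import math

def tau(values, log_transform=False):
    """Tissue-specificity index tau (original Yanai et al. formulation).

    values: expression levels of one gene across n >= 2 tissues.
    Negative values (possible for microarray AD values) are clipped to 0, and
    log_transform applies log2(x + 1); both are illustrative assumptions.
    """
    x = [max(v, 0.0) for v in values]
    if log_transform:
        x = [math.log2(v + 1) for v in x]
    if len(x) < 2:
        raise ValueError("tau needs at least two tissues")
    x_max = max(x)
    if x_max == 0:
        raise ValueError("gene is not expressed in any tissue")
    return sum(1 - v / x_max for v in x) / (len(x) - 1)
```

Averaging τ over the genes retained in one resampled tissue set gives one value of measurement (c); a gene expressed in a single tissue scores 1 and a uniformly expressed gene scores 0.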
Identification of translational regulatory elements in 5'UTRs
Identifying all translational regulatory elements in 5'UTRs is infeasible given our limited understanding of these elements. Instead, we counted two common regulatory elements that have been shown to significantly alter the level of protein translation: upstream start codons (uAUGs)  and upstream open reading frames (uORFs) . uAUGs were scanned in all three reading frames from the 5' cap to the 3' end of each 5'UTR. A uORF was defined as a putative open reading frame that starts at a uAUG and terminates at a stop codon within the 5'UTR. A uORF must be at least 9 nucleotides long, comprising a uAUG, a stop codon, and at least one codon in between. To avoid redundancy in counting uORFs, when multiple in-frame uAUGs were present, only the first (most 5') uAUG was used as the start of a uORF.
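The scanning rules above translate into a short routine. This sketch assumes the 5'UTR is given as a DNA or RNA string ending just before the main start codon; coordinates are 0-based and end-exclusive, and the function names are hypothetical. In each frame, the first uAUG opens a candidate uORF, later in-frame uAUGs are ignored until a stop codon closes it, and only ORFs of at least 9 nt are kept.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def count_uaugs(utr5):
    """Count AUG triplets at every position (i.e. in all three frames)."""
    seq = utr5.upper().replace("U", "T")
    return sum(1 for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG")

def find_uorfs(utr5, min_len=9):
    """Return (start, end) of uORFs fully contained in the 5'UTR."""
    seq = utr5.upper().replace("U", "T")
    uorfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame uAUG opens a candidate uORF
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:  # uAUG + >= 1 codon + stop
                    uorfs.append((start, i + 3))
                start = None  # a later in-frame uAUG may open a new uORF
    return sorted(uorfs)
```

An ORF that runs past the 3' end of the 5'UTR is not counted, consistent with the requirement that the stop codon lie within the 5'UTR.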
This study applies only bioinformatics analyses on data from the public domain. Therefore, no ethical approval or consent for data usage is required.
List of abbreviations
5'UTR: 5' untranslated region
CAGE: cap analysis of gene expression
uAUG: upstream start codon
uORF: upstream open reading frame
Feng-Chi Chen is supported by National Health Research Institutes intramural funding and the National Science Council, Taiwan (NSC 99-3112-B-400-012). We thank Dr. Ben-Yang Liao for helpful comments. We also thank Chia-Chian Kao for his assistance in processing microarray data, and Tsung-Kair Chang for suggestions of statistical tests. We are grateful to the three anonymous reviewers for constructive comments.
- Szathmary E, Jordan F, Pal C: Molecular biology and evolution. Can genes explain biological complexity? Science. 2001, 292 (5520): 1315-1316. 10.1126/science.1060852.
- Pray L: Eukaryotic genome complexity. Nature Education. 2008, 1 (1).
- Pennisi E: Why do humans have so few genes? Science. 2005, 309 (5731): 80. 10.1126/science.309.5731.80.
- Mattick JS: The genetic signatures of noncoding RNAs. PLoS Genet. 2009, 5 (4): e1000459. 10.1371/journal.pgen.1000459.
- Taft RJ, Pheasant M, Mattick JS: The relationship between non-protein-coding DNA and eukaryotic complexity. Bioessays. 2007, 29 (3): 288-299. 10.1002/bies.20544.
- Xia K, Fu Z, Hou L, Han JD: Impacts of protein-protein interaction domains on organism and network complexity. Genome Res. 2008, 18 (9): 1500-1508. 10.1101/gr.068130.107.
- Vinogradov AE, Anatskaya OV: Organismal complexity, cell differentiation and gene expression: human over mouse. Nucleic Acids Res. 2007, 35 (19): 6350-6356. 10.1093/nar/gkm723.
- Wilkie GS, Dickson KS, Gray NK: Regulation of mRNA translation by 5'- and 3'-UTR-binding factors. Trends Biochem Sci. 2003, 28 (4): 182-188. 10.1016/S0968-0004(03)00051-3.
- Sonenberg N, Hinnebusch AG: Regulation of translation initiation in eukaryotes: mechanisms and biological targets. Cell. 2009, 136 (4): 731-745. 10.1016/j.cell.2009.01.042.
- Jackson RJ, Hellen CU, Pestova TV: The mechanism of eukaryotic translation initiation and principles of its regulation. Nat Rev Mol Cell Biol. 2010, 11 (2): 113-127. 10.1038/nrm2838.
- Chen C, Chen S, Juan H, Huang H: Lengthening of 3'UTR increases morphological complexity in animal evolution. Nature Precedings. 2010, hdl:10101/npre.2010.4915.1.
- Adami C: What is complexity? Bioessays. 2002, 24 (12): 1085-1094. 10.1002/bies.10192.
- Meader SJ, Ponting CP, Lunter G: Massive turnover of functional sequence in human and other mammalian genomes. Genome Res. 2010.
- Tupler R, Perini G, Green MR: Expressing the human genome. Nature. 2001, 409 (6822): 832-833. 10.1038/35057011.
- Tenaillon O, Silander OK, Uzan JP, Chao L: Quantifying organismal complexity using a population genetic approach. PLoS ONE. 2007, 2 (2): e217. 10.1371/journal.pone.0000217.
- Vogel C, Chothia C: Protein family expansions and biological complexity. PLoS Comput Biol. 2006, 2 (5): e48. 10.1371/journal.pcbi.0020048.
- Felsenstein J: Phylogenies and the comparative method. Am Nat. 1985, 125: 1-15. 10.1086/284325.
- Kawaji H, Severin J, Lizio M, Forrest AR, van Nimwegen E, Rehli M, Schroder K, Irvine K, Suzuki H, Carninci P, Hayashizaki Y, Daub CO: Update of the FANTOM web resource: from mammalian transcriptional landscape to its dynamic regulation. Nucleic Acids Res. 2011, 39 (Database issue): D856-860.
- Kawaji H, Severin J, Lizio M, Waterhouse A, Katayama S, Irvine KM, Hume DA, Forrest AR, Suzuki H, Carninci P, Hayashizaki Y, Daub CO: The FANTOM web resource: from mammalian transcriptional landscape to its dynamic regulation. Genome Biol. 2009, 10 (4): R40. 10.1186/gb-2009-10-4-r40.
- Hoskins RA, Landolin JM, Brown JB, Sandler JE, Takahashi H, Lassmann T, Yu C, Booth BW, Zhang D, Wan KH, Yang L, Boley N, Andrews J, Kaufman TC, Graveley BR, Bickel PJ, Carninci P, Carlson JW, Celniker SE: Genome-wide analysis of promoter architecture in Drosophila melanogaster. Genome Res. 2011, 21 (2): 182-192. 10.1101/gr.112466.110.
- Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, Zhang J, Soden R, Hayakawa M, Kreiman G, Cooke MP, Walker JR, Hogenesch JB: A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci USA. 2004, 101 (16): 6062-6067. 10.1073/pnas.0400782101.
- Chintapalli VR, Wang J, Dow JA: Using FlyAtlas to identify better Drosophila melanogaster models of human disease. Nat Genet. 2007, 39 (6): 715-720. 10.1038/ng2049.
- Liao BY, Zhang J: Low rates of expression profile divergence in highly expressed genes and tissue-specific genes during mammalian evolution. Mol Biol Evol. 2006, 23 (6): 1119-1128. 10.1093/molbev/msj119.
- Calvo SE, Pagliarini DJ, Mootha VK: Upstream open reading frames cause widespread reduction of protein expression and are polymorphic among humans. Proc Natl Acad Sci USA. 2009, 106 (18): 7507-7512. 10.1073/pnas.0810916106.
- Lawless C, Pearson RD, Selley JN, Smirnova JB, Grant CM, Ashe MP, Pavitt GD, Hubbard SJ: Upstream sequence elements direct post-transcriptional regulation of gene expression under stress conditions in yeast. BMC Genomics. 2009, 10: 7. 10.1186/1471-2164-10-7.
- Iacono M, Mignone F, Pesole G: uAUG and uORFs in human and rodent 5'untranslated mRNAs. Gene. 2005, 349: 97-105.
- Lynch M, Scofield DG, Hong X: The evolution of transcription-initiation sites. Mol Biol Evol. 2005, 22 (4): 1137-1146. 10.1093/molbev/msi100.
- Reuter M, Engelstadter J, Fontanillas P, Hurst LD: A test of the null model for 5' UTR evolution based on GC content. Mol Biol Evol. 2008, 25 (5): 801-804. 10.1093/molbev/msn044.
- Lynch M: The frailty of adaptive hypotheses for the origins of organismal complexity. Proc Natl Acad Sci USA. 2007, 104 (Suppl 1): 8597-8604.
- Pauli A, Rinn JL, Schier AF: Non-coding RNAs as regulators of embryogenesis. Nat Rev Genet. 2011, 12 (2): 136-149. 10.1038/nrg2904.
- Inui M, Martello G, Piccolo S: MicroRNA control of signal transduction. Nat Rev Mol Cell Biol. 2010, 11 (4): 252-263.
- Costa FF: Non-coding RNAs, epigenetics and complexity. Gene. 2008, 410 (1): 9-17. 10.1016/j.gene.2007.12.008.
- Isken O, Maquat LE: The multiple lives of NMD factors: balancing roles in gene and genome regulation. Nat Rev Genet. 2008, 9 (9): 699-712. 10.1038/nrg2402.
- Kanapin AA, Mulder N, Kuznetsov VA: Projection of gene-protein networks to the functional space of the proteome and its application to analysis of organism complexity. BMC Genomics. 2010, 11 (Suppl 1): S4. 10.1186/1471-2164-11-S1-S4.
- Grillo G, Turi A, Licciulli F, Mignone F, Liuni S, Banfi S, Gennarino VA, Horner DS, Pavesi G, Picardi E, Pesole G: UTRdb and UTRsite (RELEASE 2010): a collection of sequences and regulatory motifs of the untranslated regions of eukaryotic mRNAs. Nucleic Acids Res. 2010, 38 (Database issue): D75-80.
- Kasprzyk A, Keefe D, Smedley D, London D, Spooner W, Melsopp C, Hammond M, Rocca-Serra P, Cox T, Birney E: EnsMart: a generic system for fast and flexible access to biological data. Genome Res. 2004, 14 (1): 160-169.
- Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M: The transcriptional landscape of the yeast genome defined by RNA sequencing. Science. 2008, 320 (5881): 1344-1349. 10.1126/science.1158441.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215 (3): 403-410.
- Carroll SB: Chance and necessity: the evolution of morphological complexity and diversity. Nature. 2001, 409 (6823): 1102-1109. 10.1038/35059227.
- Vibranovski MD, Lopes HF, Karr TL, Long M: Stage-specific expression profiling of Drosophila spermatogenesis suggests that meiotic sex chromosome inactivation drives genomic relocation of testis-expressed genes. PLoS Genet. 2009, 5 (11): e1000731. 10.1371/journal.pgen.1000731.
- Kozak M: Pushing the limits of the scanning mechanism for initiation of translation. Gene. 2002, 299 (1-2): 1-34. 10.1016/S0378-1119(02)01056-9.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.