Genome-wide comparative analysis of microRNAs in three non-human primates
© Brameier; licensee BioMed Central Ltd. 2010
Received: 22 January 2010
Accepted: 9 March 2010
Published: 9 March 2010
MicroRNAs (miRNAs) are negative regulators of gene expression in multicellular eukaryotes. With the recently completed sequencing of three primate genomes, the study of miRNA evolution within the primate lineage has only begun and may be expected to provide the genetic and molecular explanations for many phenotypic differences between human and non-human primates.
We scanned all three genomes of non-human primates, including chimpanzee (Pan troglodytes), orangutan (Pongo pygmaeus), and rhesus monkey (Macaca mulatta), for homologs of human miRNA genes. Besides sequence homology analysis, our comparative method relies on various postprocessing filters to verify other features of miRNAs, including, in particular, their precursor structure or their occurrence (prediction) in other primate genomes. Our study allows direct comparisons between the different species in terms of their miRNA repertoire, their evolutionary distance to human, the effects of filters, as well as the identification of common and species-specific miRNAs in the primate lineage. More than 500 novel putative miRNA genes have been discovered in orangutan that show at least 85 percent identity in precursor sequence. Only about 40 percent are found to be 100 percent identical with their human ortholog.
Homologs of human precursor miRNAs with perfect or near-perfect sequence identity may be considered to be likely functional in other primates. The computational identification of homologs with less similar sequence, instead, requires further evidence to be provided.
MicroRNAs (miRNAs) constitute a class of short endogenous non-coding RNA (ncRNA) sequences which directly function as negative regulators of gene expression at the post-transcriptional level in multicellular eukaryotes (see e.g. [1–3] for reviews). The ~70 nt long precursor of animal miRNAs (pre-miRNA) forms a typical hairpin-like stem-loop structure. The contained mature miRNA is only ~22 nt long and binds to complementary target sites in the untranslated region (UTR) of messenger RNA. Perfect base-pairing is found only for a 6-8 nt long seed region located at the 5' end of the miRNA. As a result, one miRNA may at least theoretically target hundreds of genes.
Comparative approaches to discover miRNA genes [4, 5] rely on sequence homology to known miRNAs, sequence profiles, characteristic secondary structure features and/or evolutionary conservation among different species [8–12]. Some approaches use both sequence and secondary structure conservation to known miRNA precursors [13, 14]. Berezikov et al. use phylogenetic shadowing to derive a general conservation profile from miRNA precursor sequences of 10 primate species which is used to search for new miRNAs. Ab initio approaches are able to discover miRNAs in a genome without using sequence homology or conservation (see e.g.  and references therein).
Three non-human primate genomes have been fully sequenced and are publicly available, including rhesus monkey (Macaca mulatta), chimpanzee (Pan troglodytes), and orangutan (Pongo pygmaeus). While for the first two species genome-wide comparative miRNA studies have been published recently [17, 18], the current list of miRNAs reported in miRBase  (most found in ) is still largely incomplete and comprises only 84 sequences. According to recent estimates supported by both genetic and fossil evidence , divergence of the human and ape (chimpanzee) lineages occurred about 6 million years ago (mya), orangutan and African apes diverged about 14 mya from their common ancestor, and hominoids and Old World monkeys (like rhesus macaque) about 23 mya.
The comparative method favored in this study uses various sequence- and structure-based filters to find miRNA homologs. A combination of multiple filters not only captures more diverse aspects of miRNAs, but also allows lower thresholds (lower specificity) to be used for each individual filter. This in turn is essential for detecting homologs that are more distant (in sequence) and allows a broader range of miRNA subtypes to be selected.
Furthermore, different filters and thresholds are applied for accepting and for rejecting a miRNA candidate, which sets aside a (small) third set of undecided predictions. This is to increase the confidence in both positive and negative predictions, i.e., to better control the number of false positives and false negatives.
The genomes of the three non-human primates were downloaded from the Ensembl database (release 50, http://www.ensembl.org). The currently known miRNAs in human were retrieved from the miRBase database  (release 12.0, http://microrna.sanger.ac.uk) and comprise 695 hairpin sequences and 692 different mature sequences. Many miRNAs in miRBase have been identified computationally in homology studies. The human hairpin sequences were aligned against the three primate genomes using NCBI BLAST  (offline version 2.2.18) with parameter settings -G 1 -E 1 -F F. Among various settings tested here, this has been found to increase the number of detected precursor homologs, compared to the standard settings.
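As a rough illustration of the search step, the sketch below assembles a command line for the legacy `blastall` executable with the gap and filter settings quoted above. The choice of the `blastn` program, the tabular output format, and all file and database names are assumptions made for this example; they are not stated in the text.

```python
def build_blast_command(query_fasta, database, output_path):
    """Assemble a legacy `blastall` call using the settings -G 1 -E 1 -F F."""
    return [
        "blastall",
        "-p", "blastn",      # assumption: nucleotide-vs-nucleotide search
        "-d", database,      # hypothetical name of a formatted genome database
        "-i", query_fasta,   # hypothetical FASTA file of human hairpin sequences
        "-o", output_path,   # hypothetical output file
        "-m", "8",           # tabular output, chosen here only for easy parsing
        "-G", "1",           # gap opening cost (from the text)
        "-E", "1",           # gap extension cost (from the text)
        "-F", "F",           # low-complexity filtering switched off (from the text)
    ]
```

The returned list could be passed to `subprocess.run` on a machine where the legacy BLAST binaries are installed.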
In a second-level BLAST analysis we check the conservation of mature miRNAs by aligning all mature sequences known in human against the precursor sequences predicted in the other primates. Because of their small size, some query sequences did not produce a BLAST hit or the alignment was incomplete. In these few cases the alignment had to be manually corrected and was extended to the length of the query sequence.
Secondary structure analysis
All distances are based on the Levenshtein distance or string edit distance d_edit, which is the minimum number of point mutations needed to transform one sequence into the other. By subtracting the length difference in Equation 1 we reduce its influence on the overall distance.
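The distance described here can be sketched as follows. Equation 1 is not reproduced in this text, so the function below only implements what the paragraph states: the edit distance minus the absolute length difference. Any further normalization used in the original equation is omitted, and the function names are illustrative.

```python
def levenshtein(a, b):
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def length_corrected_distance(a, b):
    """Edit distance with the length difference subtracted (assumed reading of Equation 1)."""
    return levenshtein(a, b) - abs(len(a) - len(b))
```

Because the edit distance is never smaller than the length difference, the corrected value is non-negative. The same functions apply to nucleotide strings and to structure strings alike.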
Filtering microRNA homologs
Multiple filtering steps are to be passed by a candidate sequence to be accepted as a homolog of a human miRNA precursor. Only one (the best) BLAST hit is selected and processed.
(1) Precursor sequence filter: minimum 85 percent sequence identity over an alignment length of at least 95 percent
(2) Structure sequence filter: minimum 85 percent identity in secondary structure sequence
(3) Hairpin filter: minimum 15 base pairs in the stem arm and only one terminal loop
(4) Seed filter: no mutations in the seed region of the mature sequence
A cascade of Perl scripts makes the filtering process fully automatic. The selection of thresholds is partly motivated by the comparative analysis in Section Results and discussion. An absolute maximum of -10 kcal/mol is imposed on the MFE of a hairpin structure. This is the highest value found among known human miRNAs. The same applies to the required minimum of 15 base pairs.
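A minimal sketch of the hairpin criteria, assuming the predicted structure is available in dot-bracket notation and that the 15 base pairs are counted over the whole stem. The function name and the way the two thresholds are combined are illustrative assumptions, not the author's Perl implementation.

```python
def passes_hairpin_filter(structure, mfe, min_pairs=15, max_mfe=-10.0):
    """Check MFE, base-pair count and single-loop topology of a dot-bracket string."""
    if mfe > max_mfe:
        return False  # folding energy above the -10 kcal/mol limit
    pairs = 0
    seen_close = False
    for ch in structure:
        if ch == "(":
            if seen_close:
                return False  # a new stem opens after a closing bracket: more than one loop
            pairs += 1
        elif ch == ")":
            seen_close = True
    # Brackets must balance and the stem must contain enough base pairs.
    return pairs == structure.count(")") and pairs >= min_pairs
```

A structure with a single run of opening brackets followed by a single run of closing brackets (interrupted only by unpaired positions) has exactly one terminal loop, which is what the scan verifies.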
The seed region is expanded to positions 2-9  (from the 5' end) and extracted from each human mature miRNA. The 8mers are aligned against the mature homologs using perfect matching by Perl regular expressions.
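The seed check can be sketched in a few lines. The study used Perl regular expressions; the Python version below is an equivalent illustration under the assumption that the 8mer may match anywhere in the mature homolog, and the function name is hypothetical.

```python
import re


def seed_is_conserved(human_mature, homolog_mature):
    """Return True if the human 8mer seed (positions 2-9) occurs unchanged in the homolog."""
    seed = human_mature[1:9]  # positions 2-9 counted from the 5' end (1-based)
    return re.search(re.escape(seed), homolog_mature) is not None
```

A single mutation inside positions 2-9 makes the exact match fail, whereas mutations elsewhere in the mature sequence leave the result unchanged.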
A miRNA candidate (best BLAST match) is instead rejected, and a homolog is said not to exist in a genome, if the sequence identity drops below 70 percent.
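The separate accept and reject thresholds produce three outcomes, which the following sketch makes explicit. It assumes that a candidate must pass all four filters to be accepted; the `Candidate` fields and the label strings are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    identity: float            # percent precursor sequence identity of the best BLAST hit
    coverage: float            # percent of the human precursor covered by the alignment
    structure_identity: float  # percent identity of the secondary structure sequence
    hairpin_ok: bool           # result of the hairpin filter
    seed_ok: bool              # result of the seed filter


def classify(c):
    """Return 'accepted', 'rejected' or 'undecided' for one candidate."""
    if c.identity < 70.0:
        return "rejected"  # homolog considered absent from the genome
    accepted = (c.identity >= 85.0 and c.coverage >= 95.0
                and c.structure_identity >= 85.0
                and c.hairpin_ok and c.seed_ok)
    return "accepted" if accepted else "undecided"
```

Candidates that are neither accepted nor rejected, for example those with 70 to 85 percent identity, fall into the undecided set.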
Results and discussion
Comparative sequence and structure analysis
miRNA homologs with perfect (100 percent) or near-perfect (around 98 percent) sequence identity can be assumed to be likely functional (as in human). Candidate sequences with lower similarity that still exceeds 85 percent, which is true for 38 percent of the precursor homologs found in orangutan, require the verification of more miRNA features.
MicroRNA gene identification in orangutan
Lists of positive and negative predictions from our analyses are provided in the supplementary material (see Section Additional files). Additional file 1 contains all 605 homologs of human precursor miRNAs found for orangutan, including 77 sequences which are already known (i.e. in miRBase). 18 homologs are identical or have an overlapping genome location with another miRNA. This leaves 510 newly discovered miRNAs in total.
Besides known miRNAs, candidates are marked in Additional file 1 that pass various other filters (see Section Methods). This allows a flexible combination of filtering criteria, including those derived from the precursor structure or the mature sequence. 526 orthologs (from 605) remain after applying the hairpin filter and 494 after the seed filter. Here, we also utilize the existence (detection) of a miRNA homolog in more than one primate species (besides human). This is to improve the reliability of predictions and helps to reduce the effect of possible sequencing errors. In our setup, 499 human precursor miRNAs are found to have a homologous sequence in all three primates. 563 miRNAs are conserved in both chimpanzee and orangutan, and 530 are shared between orangutan and rhesus macaque.
Additional file 2 lists all homologs of human mature miRNAs found in the orangutan precursors. The 682 entries include homologs of both 5' and 3' miRNAs, some originating from the same precursor. 611 human mature miRNAs are conserved in at least one orangutan precursor, resulting in 624 different sequences.
Identification of lineage-specific microRNAs
Another question of interest is which and how many miRNAs are lineage- or species-specific. Our analysis especially supports the identification of human-specific miRNAs. Since we cannot completely exclude the possibility that some homologs may not be found because of erroneous or incomplete genome assembly, we require the negative prediction of a miRNA to be confirmed by our method in at least two of the three non-human primate genomes at hand. Additional file 3 lists all 35 homologs which are missing in this way. 12 human miRNAs could not be identified in any of the three primate genomes and, thus, are the most likely to be human-specific. These in particular may be responsible for phenotypic differences between human and non-human primates, i.e., may help to explain what makes us human.
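The confirmation rule for missing homologs can be sketched as a simple count over per-species predictions. The input layout, the label strings and the miRNA names used in any call are hypothetical; only the "at least two of three genomes" criterion comes from the text.

```python
def human_specific_candidates(status_by_mirna, min_negative=2):
    """Map each miRNA rejected in at least `min_negative` genomes to its rejection count.

    status_by_mirna: dict of miRNA name -> dict of species -> prediction label,
    where the label 'rejected' marks a negative prediction.
    """
    result = {}
    for name, status in status_by_mirna.items():
        negatives = sum(1 for label in status.values() if label == "rejected")
        if negatives >= min_negative:
            result[name] = negatives
    return result
```

Entries with a count of three correspond to miRNAs not identified in any of the three genomes, which the text treats as the most likely human-specific ones.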
Sequence and structural similarities to a human miRNA are strong indications for a putative homolog to be transcribed and functional. Nevertheless, the expression levels of both miRNAs may differ due to alterations in the specific regulatory pathway that controls their expression. In addition, the regulatory effects, i.e., the selection and expression of target genes, may be significantly different. This is due to a fast evolution of miRNA binding sites  which led to many lineage- or species-specific sites and is just as responsible for what makes us different from other primates.
In this comparative study we searched the genomes of three non-human primates for miRNAs. The applied prediction algorithm (outlined in Section Methods) verifies multiple criteria based on similarities to known human miRNAs in sequence and structure to detect both closely-related and more distantly-related homologs. The parallel analysis allows, in particular, the prediction of a miRNA in multiple species to be used as an additional filter. In return, it provides some support for the configuration of the method, i.e., for the parameter settings (thresholds) and filter definitions used here. The other results of this study may be summarized as follows:
(1) A thorough and comprehensive search for novel orangutan miRNAs. More than 500 putative miRNA genes have been identified, where the precursor sequence is at least 85 percent identical to its ortholog in human.
(2) Both sequence distances and structure distances to human miRNAs have been found to be more similar for orangutan and rhesus macaque than for orangutan and chimpanzee, indicating a more similar evolutionary distance to human on the miRNA level.
(3) The proportion of identical or nearly identical precursor sequences with human has been found relatively small for all three primate species, considering the evolutionary distances and compared to the mature sequences. Only about 40 percent are the same in human and orangutan.
(4) Identification of common and lineage-specific miRNAs. 499 miRNA sequences are conserved in all primates investigated here. 35 human miRNAs have not been found in at least two non-human primate genomes, some of which might actually be human-specific.
The author wishes to thank Dr. Anne Averdam for critical reading of the manuscript.
- Bartel D: MicroRNAs: genomics, biogenesis, mechanism, and function. Cell. 2004, 116 (2): 281-297. 10.1016/S0092-8674(04)00045-5.
- Ambros V: The functions of animal microRNAs. Nature. 2004, 431 (7006): 350-355. 10.1038/nature02871.
- He L, Hannon G: MicroRNAs: small RNAs with a big role in gene regulation. Nat Rev Genet. 2004, 5 (7): 522-531. 10.1038/nrg1379.
- Brown J, Sanseau P: A computational view of microRNAs and their targets. Drug Discov Today. 2005, 10 (8): 595-601. 10.1016/S1359-6446(05)03399-4.
- Zhang B, Pan X, Wang Q, Cobb G, Anderson T: Computational identification of microRNAs and their targets. Comput Biol Chem. 2006, 30 (6): 395-407. 10.1016/j.compbiolchem.2006.08.006.
- Weber M: New human and mouse microRNA genes found by homology search. FEBS J. 2005, 272 (1): 59-73. 10.1111/j.1432-1033.2004.04389.x.
- Legendre M, Lambert A, Gautheret D: Profile-based detection of microRNA precursors in animal genomes. Bioinformatics. 2005, 21 (7): 841-845. 10.1093/bioinformatics/bti073.
- Lim L, Glasner M, Yekta S, Burge C, Bartel D: Vertebrate microRNA genes. Science. 2003, 299 (5612): 1540. 10.1126/science.1080372.
- Lai E, Tomancak P, Williams R, Rubin G: Computational identification of Drosophila microRNA genes. Genome Biol. 2003, 4: 42. 10.1186/gb-2003-4-7-r42.
- Ohler U, Yekta S, Lim L, Bartel D, Burge C: Patterns of flanking sequence conservation and a characteristic upstream motif for microRNA gene identification. RNA. 2004, 10 (9): 1309-1322. 10.1261/rna.5206304.
- Altuvia Y, Landgraf P, Lithwick G, Elefant N, Pfeffer S, Aravin A, Brownstein M, Tuschl T, Margalit H: Clustering and conservation patterns of human microRNAs. Nucleic Acids Res. 2005, 33 (8): 2697-2706. 10.1093/nar/gki567.
- Hertel J, Lindemeyer M, Missal K, Fried C, Tanzer A, Flamm C, Hofacker I, Stadler P: The expansion of the metazoan microRNA repertoire. BMC Genomics. 2006, 7: 25. 10.1186/1471-2164-7-25.
- Wang X, Zhang J, Li F, Gu G, He T, Zhang X, Li Y: MicroRNA identification based on sequence and structure alignment. Bioinformatics. 2005, 21 (18): 3610-3614. 10.1093/bioinformatics/bti562.
- Nam J, Shin K, Han J, Lee Y, Kim V, Zhang B: Human microRNA prediction through a probabilistic co-learning model of sequence and structure. Nucleic Acids Res. 2005, 33 (11): 3570-3581. 10.1093/nar/gki668.
- Berezikov E, Guryev V, van de Belt J, Wienholds E, Plasterk R, Cuppen E: Phylogenetic shadowing and computational identification of human microRNA genes. Cell. 2005, 120 (1): 21-24. 10.1016/j.cell.2004.12.031.
- Brameier M, Wiuf C: Ab initio identification of human microRNAs based on structure motifs. BMC Bioinformatics. 2007, 8: 478. 10.1186/1471-2105-8-478.
- Yue J, Sheng Y, Orwig K: Identification of novel homologous microRNA genes in the rhesus macaque genome. BMC Genomics. 2008, 9: 8. 10.1186/1471-2164-9-8.
- Baev V, Daskalova E, Minkov I: Computational identification of novel microRNA homologs in the chimpanzee genome. Comput Biol Chem. 2009, 33 (1): 62-70. 10.1016/j.compbiolchem.2008.07.024.
- Griffiths-Jones S, Grocock R, van Dongen S, Bateman A, Enright A: miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res. 2006, 34: D140-D144. 10.1093/nar/gkj112.
- Raaum R, Sterner K, Noviello C, Stewart C, Disotell T: Catarrhine primate divergence dates estimated from complete mitochondrial genomes: concordance with fossil and nuclear DNA evidence. J Hum Evol. 2005, 48 (3): 237-257. 10.1016/j.jhevol.2004.11.007.
- Altschul S, Gish W, Miller W, Myers E, Lipman D: Basic local alignment search tool. J Mol Biol. 1990, 215 (3): 403-410.
- Hofacker I: Vienna RNA secondary structure server. Nucleic Acids Res. 2003, 31 (13): 3429-3431. 10.1093/nar/gkg599.
- Gusfield D: Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. 1997, Cambridge University Press.
- Nahvi A, Shoemaker C, Green R: An expanded seed sequence definition accounts for full regulation of the hid 3' UTR by bantam miRNA. RNA. 2009, 15 (5): 814-822. 10.1261/rna.1565109.
- Saunders M, Liang H, Li W: Human polymorphism at microRNAs and microRNA target sites. PNAS. 2007, 104 (9): 3300-3305. 10.1073/pnas.0611347104.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.