Ensemble-based classification approach for micro-RNA mining applied on diverse metagenomic sequences
© ElGokhy et al.; licensee BioMed Central Ltd. 2014
Received: 9 November 2013
Accepted: 22 April 2014
Published: 6 May 2014
MicroRNAs (miRNAs) are endogenous ∼22 nt RNAs that have been identified in many species as powerful regulators of gene expression. Experimental identification of miRNAs is still slow, since miRNAs are difficult to isolate by cloning due to their low expression, low stability, tissue specificity and the high cost of the cloning procedure. Thus, computational identification of miRNAs from genomic sequences provides a valuable complement to cloning. Different approaches for identification of miRNAs have been proposed based on homology, thermodynamic parameters, and cross-species comparisons.
The present paper focuses on the integration of miRNA classifiers in a meta-classifier and the identification of miRNAs from metagenomic sequences collected from different environments. An ensemble of classifiers is proposed for miRNA hairpin prediction based on four well-known classifiers (Triplet-SVM, MiPred, Virgo and EumiR) that use non-identical features and have been trained on different data. Their decisions are combined using a single hidden layer neural network to increase prediction accuracy. Our ensemble classifier achieved 89.3% accuracy, 82.2% F-measure, 74% sensitivity, 97% specificity, 92.5% precision and 88.2% negative predictive value when tested on real miRNA and pseudo-miRNA hairpin data. The area under the receiver operating characteristic curve of our classifier is 0.9, which indicates high performance.
The proposed classifier yields a significant performance improvement relative to Triplet-SVM, Virgo and EumiR and a minor refinement over MiPred.
The developed ensemble classifier is used for miRNA prediction in mine drainage, groundwater and marine metagenomic sequences downloaded from the NCBI Sequence Read Archive. By consulting the miRBase repository, 179 sequences have been identified as highly probable miRNAs. Our new approach could thus be used for mining metagenomic sequences and finding new and homologous miRNAs.
The paper investigates a computational tool for miRNA prediction in genomic or metagenomic data. It has been applied to three metagenomic samples from different environments (mine drainage, groundwater and marine metagenomic sequences). The prediction results provide a set of highly probable miRNA hairpins as candidates for experimental validation by cloning. Among the candidates predicted by the ensemble are pre-miRNAs that have been validated using miRBase but were not recognized by some of the base classifiers.
MicroRNAs (miRNAs) are short (∼22 nucleotides), endogenously initiated non-coding RNAs that control gene expression post-transcriptionally, either by degrading target mRNAs or by inhibiting protein translation.
The prediction of miRNA genes is a challenging problem on the way to understanding post-transcriptional gene regulation. The two main strategies for miRNA identification are experimental cloning and in silico prediction. Because miRNA identification by experimental techniques is difficult, computational approaches have been developed to overcome some of the technical difficulties of the experimental approaches.
The miRNA identification problem is usually defined over pre-miRNAs because they are longer than mature miRNAs and, hence, more information can be extracted from their sequences. Moreover, the stem-loop hairpin secondary structure of pre-miRNAs is an essential feature used in the computational identification of miRNAs. However, many sequence fragments in a genome fold into a similar stem-loop hairpin structure without being genuine miRNA precursors.
Two major computational prediction strategies are used: homology-based methods and machine learning methods. Most miRNA prediction methods were developed to find homologous miRNAs in closely related species. These methods use comparative genomics information in addition to structural features extracted from the typical hairpin structures of known pre-miRNAs. ‘Blastn’, for example, follows the homology principle in miRNA prediction.
Comparative genomics is used to filter out most of the hairpins that are not conserved in related species. This filtering step makes the method unable to recognize new miRNAs that have no known close homologs. Therefore, attention turned to machine learning methods that distinguish real pre-miRNAs from other hairpin sequences with similar stem-loop features (pseudo-miRNAs). The early machine learning methods used to discriminate real from pseudo-miRNAs are miRScan, miRseeker, miRfinder and miRCheck.
A wide variety of support vector machine (SVM) systems have been built to improve miRNA prediction. The first two of these systems are miR-abela and Triplet-SVM.
The miR-abela algorithm predicted between fifty and one hundred novel pre-miRNAs, 30% of which have been verified experimentally as real miRNAs.
Triplet-SVM has been prominent due to its simplicity. In this method, a set of features is extracted and provided to an SVM classifier to differentiate between real and pseudo-miRNAs. It achieved a 90% recognition rate.
RNAmicro is a compound prediction method . It first applies a homology strategy to recognize conserved almost-hairpins in a multiple sequence alignment. Then it computes a vector of numerical descriptors from each almost-hairpin that is used by a support vector machine classifier.
Two other systems have been derived from the Triplet-SVM approach: MiPred and miREncoding. MiPred added two thermodynamic features (the minimum free energy, MFE, and its P-value) and obtained better results by using Random Forests instead of an SVM. miREncoding added several new features and tried to enhance SVM classification performance by using a feature selection algorithm.
Another SVM, miPred (distinct from MiPred), improved the accuracy of the previous SVM classifiers by making extensive use of thermodynamic features. It uses normalized features that are computed over a large number of shuffled versions of a given pre-miRNA. However, this method is not favored by biologists because of its lack of biological plausibility. In addition, the normalization process is computationally time-consuming.
microPred is another powerful SVM classifier that obtained better results than the previous classifiers thanks to a negative data set consisting of ncRNAs and pseudo hairpins, new biologically relevant features, feature selection, and extensive, systematic training and testing of the classifier system.
Virgo is a viral miRNA precursor prediction method. It extracts both sequence and structure features and feeds them to an SVM classifier to distinguish pre-miRNA hairpin sequences from pseudo-miRNA hairpin sequences. The method is more efficient than other ab initio methods for predicting viral and mammalian miRNAs.
EumiR, a eukaryotic microRNA precursor prediction server, accepts multiple query sequences and determines whether they are true miRNAs. EumiR and Virgo share the same prediction principle, but EumiR is trained on eukaryotic pre-miRNAs.
YasMiR is also an SVM for miRNA identification, and its novelty is twofold: firstly, many of its features incorporate the base-pairing probabilities provided by McCaskill’s algorithm; secondly, its classification performance has been improved by using a similarity (“profile”-based) measure between the training and testing miRNAs and a set of carefully chosen (“pivot”) RNA sequences.
In the present paper, a computational method is proposed for the identification of miRNA precursors. The method combines the outcomes of four previously developed classification approaches using a neural network, to enable more accurate prediction of miRNAs.
Our method determines whether a given sequence is a true or pseudo-miRNA by combining base classifiers built on support vector machines (SVM) and Random Forests (RF), both of which are well-established binary classifiers.
Our de-novo miRNA prediction method is applied on three metagenomic samples from different environments. The prediction results provide a set of highly probable miRNA hairpins for future laboratory testing. This may lead to the discovery of new miRNA candidates.
Specifically, the Triplet-SVM, MiPred, Virgo and EumiR classifiers are used for miRNA prediction, and their predictions are combined using a single hidden layer neural network to obtain a more accurate miRNA predictor.
Our ensemble classifier performed well on the test data, which encouraged us to use it to identify new miRNA candidates. We identified 106 sequences in the mine drainage metagenome, 55 in the groundwater metagenome and 18 in the marine metagenome as highly probable miRNAs.
This paper is organized as follows. Section ‘Background’ gives an overview of the miRNA prediction techniques. Section ‘Methods’ presents the proposed methodology. Section ‘Results and discussions’ analyses the prediction results. Section ‘Conclusions’ concludes the paper and suggests future work.
This section describes the data sets that are fed into our ensemble classifier. Then, the data preparation procedure is described. Finally, the structure of the adopted approach is explained in detail.
Traditionally, microbiology has concentrated on individual species in pure laboratory cultures. Therefore, the understanding of microbial communities has lagged behind the understanding of their individual members. Metagenomics is a new tool for studying microbes in complex communities: where they live and how they interact with their surrounding environments [17, 18]. Metagenomics (also known as environmental genomics or community genomics) is the study and analysis of the genomes of microbial organisms recovered directly from their natural environments [17, 19].
Whole-genome shotgun sequencing is the procedure of breaking up a target genomic region into many segments and sequencing them randomly. Through whole-genome shotgun sequencing of DNA collected from environmental samples, metagenomics enables the systematic determination of nucleotide sequences, followed by analysis of the structure, regulation and function of genes. The primary benefit of metagenomics is that it makes it possible to characterize the genetic diversity present in samples effectively, without the need to isolate and culture individual species in the laboratory.
In this paper, an ensemble approach is used for miRNA mining in three metagenomic sequences from different environments. These metagenomes (mine drainage, groundwater, and marine metagenomic sequences) have been sequenced in whole-genome shotgun sequencing projects. Details about these projects are available in [20–22].
Three samples of the considered metagenomes (mine drainage, groundwater and marine metagenomic sequences), each consisting of twenty contigs from the metagenome, have been randomly selected. Because the miRNA prediction problem is usually defined over pre-miRNAs and these stem-loop precursors are approximately 60 to 70 nucleotides long [23, 24], we developed a Perl script to divide each sample into fragments of 70 nucleotides each. Several studies use the same fragment length [10, 25]. Each fragment starts one nucleotide after the start of the previous fragment, so that the miRNA mining covers all possible fragments of the metagenomic sequences. This yields 97336 sequences for the mine drainage metagenome, 24625 sequences for the groundwater metagenome and 16709 sequences for the marine metagenome. A feature vector extracted from each fragment is then fed to the ensemble classifier, which decides whether the fragment is possibly a miRNA.
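The fragmentation step above was implemented by the authors as a Perl script that is not shown. The Python sketch below reproduces the described behavior under two assumptions of ours: windows that would run past the end of a contig are discarded (so a contig of length L yields L − 69 fragments), and no alphabet or strand conversion is applied beyond upper-casing. The function names are hypothetical.

```python
from typing import Iterator, List

WINDOW = 70  # fragment length used in the paper
STEP = 1     # one-nucleotide shift between consecutive fragments


def sliding_fragments(contig: str, window: int = WINDOW, step: int = STEP) -> Iterator[str]:
    """Yield every full-length fragment, shifting the start position by `step`."""
    contig = contig.upper()
    for start in range(0, len(contig) - window + 1, step):
        yield contig[start:start + window]


def fragment_sample(contigs: List[str], window: int = WINDOW) -> List[str]:
    """Fragment all contigs of a sample; contigs shorter than `window` yield nothing."""
    fragments: List[str] = []
    for contig in contigs:
        fragments.extend(sliding_fragments(contig, window))
    return fragments
```

Each resulting fragment would then be folded and converted into the feature vectors expected by the base classifiers.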
The ensemble classifier
The proposed ensemble approach combines the decisions of four miRNA predictors that have been trained on different data and features. The motivation for combining classifiers is that consensus predictors and meta-classifiers have achieved better performance in bioinformatics analyses, which makes a meta-classifier a good choice for our method. The performance of any classifier is affected by several factors, including the size of the training data set, its dimensionality, the number of classes to be differentiated and their mutual separability. Ensemble methods have been devised to reduce over-fitting and improve the performance of individual classifiers by fusing their decisions.
Ensemble design is commonly based on either bagging or boosting. In bagging (bootstrap aggregating), m models are fitted using m bootstrap samples and combined by averaging or voting. Bootstrap sampling aims at creating diversity in the training data, while averaging or voting aims at improving classification performance. Boosting is implemented either by varying the weights given to the data samples or by forming committees. Boosting is based on the idea that a strong classifier can be constructed from weak classifiers. Our proposed classifier adopts a hybrid scheme. The base classifiers were originally trained on different data samples of different dimensionality, so diversity in the training data is achieved (the bagging principle). The ensemble also belongs to the class of "committees" (the boosting principle). Our adopted scheme has the advantage of being non-linear because of the sigmoid activation functions in the hidden layer of the proposed neural network.
Main characteristics of the classifiers used in the proposed ensemble
Triplet-SVM (SVM): a vector of 32 structure-sequence features
MiPred (Random Forest): a vector of 32 structure-sequence features, MFE and P-value
Virgo (SVM): a vector of 512 structure-sequence features
EumiR (SVM): a vector of 512 structure-sequence features; trained on different eukaryotic pre-miRNAs
Ensemble (single hidden layer neural network): a vector of 4 dimensions (the outputs of the base classifiers)
The Triplet-SVM classifier was developed to predict whether a query sequence with a hairpin structure is a real miRNA precursor. Triplet-SVM uses a set of features that combines local contiguous structures with sequence information to characterize the hairpin structure of real versus pseudo-miRNAs.
The RNAfold program from the Vienna RNA package has been used to predict the secondary structure of the query sequences. In the predicted secondary structure, each nucleotide is either paired or unpaired, represented by brackets "(" or ")" and dots ".", respectively. A left bracket "(" indicates that the paired nucleotide is near the 5’-end and pairs with a nucleotide near the 3’-end, which is represented by a right bracket ")". The study uses "(" for both situations. With this convention, any 3 adjacent nucleotides have 8 possible structure combinations: "(((", "((.", "(..", "(.(", ".((", ".(.", "..(" and "...". Combining each of them with the identity of the middle nucleotide gives 32 (4 × 8) possible structure-sequence combinations, denoted as "U(((", "A((.", etc. These are the triplet elements, which represent the local structure-sequence features of the hairpin. The occurrences of all triplet elements are counted along a hairpin segment, yielding a 32-dimensional feature vector, which is then normalized to form the input vector to the SVM.
The SVM classifier was originally trained on the triplet element features of a set of real human pre-miRNAs from the miRNA Registry database as well as a set of pseudo-miRNAs from the NCBI RefSeq database. The training set contains 163 human pre-miRNAs (positive samples) and 168 pseudo-miRNAs (negative samples), randomly chosen.
Triplet-SVM achieved 90% accuracy in distinguishing real from pseudo-miRNA hairpins in the human genome and up to 90% precision in identifying pre-miRNAs from 11 other species (including C. briggsae, C. elegans, D. pseudoobscura, D. melanogaster, Oryza sativa, A. thaliana and the Epstein-Barr virus).
The MiPred classifier is a Random Forest-based method that differentiates real pre-miRNAs from pseudo-miRNAs using hybrid features. The features consist of the local structure-sequence features of the hairpin plus two thermodynamic features: the MFE of the secondary structure predicted by the Vienna RNA software package, and the P-value, i.e. the fraction of sequences in a set of dinucleotide-shuffled sequences whose MFE is lower than that of the original sequence. The P-value is determined using a Monte Carlo randomization test.
MiPred is one of the refinements of Triplet-SVM in which the SVM is replaced by the Random Forests ensemble learning algorithm. The Random Forest prediction model was trained on the same training data set used by the Triplet-SVM classifier. It achieved nearly 10% higher overall accuracy than Triplet-SVM on a new test data set.
The Virgo classifier is an efficient classifier that differentiates true pre-miRNAs from pseudo-miRNAs. It is based on sequence-structure features: the feature space covers both the sequence and its structural context. A sequence is folded using RNAfold, and the structural context of each overlapping triplet is determined. There are 64 possible nucleotide triplets, and each nucleotide in a triplet has two states: ‘1’ if it is bound and ‘0’ if it is unbound. Thus, such a feature (e.g. AUG001, AUG010, etc.) can take a total of 512 (64 × 8) possible values.
A support vector machine classifier (SVMlight) trained on these feature elements is used to distinguish miRNA precursor hairpins from pseudo-miRNA hairpins. The eukaryotic hairpin sequences used to train Virgo were derived from miRBase (release 8.0), and the pseudo-miRNA sequences were derived from coding regions of genes with no alternative transcripts. The coding sequences were batch-downloaded from Ensembl.
Virgo adopts a k-fold-like technique in the training phase and selects the classifier that achieves the best specificity. Another advantage of Virgo is its use of a kernel (a radial basis function) to find the hypersurface that optimally separates true from pseudo-miRNA hairpins.
The Virgo classifier performs better than previously reported machine learning methods for predicting viral and mammalian pre-miRNAs. The algorithm is fast and efficient, and it scales to genome-wide predictions not only on viral genomes but also on much larger eukaryotic genomes.
EumiR is a eukaryotic microRNA precursor prediction server from IGIB (Institute of Genomics and Integrative Biology) that can query multiple sequences to decide whether they are true miRNAs. EumiR uses the same prediction principle as Virgo: RNAfold is used to predict the secondary structure of the input sequence, and the sequence-structure feature space is defined as in Virgo.
EumiR uses the LibSVM package to differentiate pre-miRNA hairpins from pseudo-miRNA hairpins. It is trained on eukaryotic pre-miRNAs from different species as positive samples. EumiR is more efficient at predicting eukaryotic pre-miRNAs than viral microRNAs.
EumiR has better accuracy and sensitivity than miR-abela and BayesmiRNAfind on viral miRNA precursors from miRBase.
The EumiR server offers additional analysis options: it can run BLAST against miRBase and NCBI, run SSEARCH against miRBase, predict the secondary structure using RNAfold, and predict using BayesmiRNAfind and miR-abela.
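The triplet-element encoding can be sketched in Python as follows, assuming the dot-bracket string has already been produced by RNAfold. The sketch counts every position that has two neighbors and normalizes the counts to frequencies; the original Triplet-SVM implementation may restrict counting to the stem region or normalize differently. All names are illustrative.

```python
from itertools import product
from typing import Dict, List

NUCLEOTIDES = "ACGU"
# The 8 structure patterns of three adjacent positions once ")" is mapped to "("
STRUCTURES = ["".join(p) for p in product("(.", repeat=3)]
# 32 triplet elements such as "U(((" or "A((." (middle nucleotide + structure)
ELEMENTS = [n + s for n in NUCLEOTIDES for s in STRUCTURES]


def triplet_features(seq: str, dot_bracket: str) -> List[float]:
    """Return the normalized 32-dimensional triplet-element vector of a hairpin."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have the same length")
    seq = seq.upper().replace("T", "U")
    struct = dot_bracket.replace(")", "(")  # paired on either side is written "("
    counts: Dict[str, int] = {e: 0 for e in ELEMENTS}
    for i in range(1, len(seq) - 1):
        element = seq[i] + struct[i - 1:i + 2]
        if element in counts:  # skips windows whose middle letter is not A/C/G/U
            counts[element] += 1
    total = sum(counts.values())
    return [counts[e] / total if total else 0.0 for e in ELEMENTS]
```

For example, `triplet_features("GGGAAACCC", "(((...)))")` spreads its mass evenly over the seven observed elements, such as "G(((" and "A...".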
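MiPred's P-value can be sketched as below. The shuffle uses the Altschul-Erickson algorithm, which preserves dinucleotide counts exactly; whether MiPred uses this particular shuffling algorithm, and how many shuffles it draws, are assumptions here. The `mfe` argument stands in for a real folding routine (the ViennaRNA Python bindings, for instance, provide `RNA.fold`, which returns the structure together with its MFE); it is passed in as a plain function so that the sketch stays self-contained.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List


def _reaches_last(v: str, last_edge: Dict[str, str], last: str) -> bool:
    """Follow the chosen 'last' edges from v; True if they reach `last` without a cycle."""
    seen = set()
    while v != last:
        if v in seen:
            return False
        seen.add(v)
        v = last_edge[v]
    return True


def dinucleotide_shuffle(seq: str, rng: random.Random) -> str:
    """Altschul-Erickson shuffle: keeps dinucleotide counts and both end letters."""
    if len(seq) <= 2:
        return seq
    edges: Dict[str, List[str]] = defaultdict(list)
    for a, b in zip(seq, seq[1:]):
        edges[a].append(b)
    last = seq[-1]
    while True:
        # One outgoing edge per vertex (except the final letter) is reserved to be
        # used last; these edges must form a tree that points at the final letter.
        last_edge = {v: rng.choice(out) for v, out in edges.items() if v != last}
        if all(_reaches_last(v, last_edge, last) for v in last_edge):
            break
    order: Dict[str, List[str]] = {}
    for v, out in edges.items():
        rest = list(out)
        if v in last_edge:
            rest.remove(last_edge[v])
        rng.shuffle(rest)
        order[v] = rest + ([last_edge[v]] if v in last_edge else [])
    walk, cur = [seq[0]], seq[0]
    used: Dict[str, int] = defaultdict(int)
    for _ in range(len(seq) - 1):  # Eulerian walk through the shuffled edge lists
        nxt = order[cur][used[cur]]
        used[cur] += 1
        walk.append(nxt)
        cur = nxt
    return "".join(walk)


def mfe_p_value(seq: str, mfe: Callable[[str], float],
                n_shuffles: int = 1000, seed: int = 0) -> float:
    """Fraction of dinucleotide-shuffled sequences whose MFE is lower than the original's."""
    rng = random.Random(seed)
    observed = mfe(seq)
    lower = sum(1 for _ in range(n_shuffles)
                if mfe(dinucleotide_shuffle(seq, rng)) < observed)
    return lower / n_shuffles
```

With ViennaRNA installed, `mfe_p_value(seq, lambda s: RNA.fold(s)[1])` would plug in real folding energies; a small P-value indicates that the hairpin folds more stably than sequences of the same dinucleotide composition.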
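A minimal sketch of the 512-dimensional Virgo-style encoding follows. It assumes that "bound" means any bracket in the RNAfold dot-bracket string and returns raw counts over overlapping triplets, because the normalization used by Virgo and EumiR is not described here. The names are hypothetical.

```python
from itertools import product
from typing import List

# 64 nucleotide triplets x 8 bound/unbound patterns = 512 features, e.g. "AUG001"
FEATURES = [
    "".join(t) + "".join(b)
    for t in product("ACGU", repeat=3)
    for b in product("01", repeat=3)
]
INDEX = {f: i for i, f in enumerate(FEATURES)}


def virgo_like_features(seq: str, dot_bracket: str) -> List[int]:
    """Count each (triplet, binding-state) combination over overlapping triplets."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have the same length")
    seq = seq.upper().replace("T", "U")
    bits = "".join("0" if c == "." else "1" for c in dot_bracket)  # bracket -> bound
    counts = [0] * len(FEATURES)
    for i in range(len(seq) - 2):
        key = seq[i:i + 3] + bits[i:i + 3]
        if key in INDEX:  # ignore triplets containing N or other letters
            counts[INDEX[key]] += 1
    return counts
```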
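For reference, the standard radial basis function kernel and the resulting SVM decision function are written below. Here γ, α_i, y_i ∈ {+1, −1} and b are the usual SVM notation, with x_i the support vectors. They are not values reported for Virgo, whose kernel width is not given in this paper.

```latex
K(\mathbf{x}, \mathbf{z}) = \exp\!\left(-\gamma \,\lVert \mathbf{x} - \mathbf{z} \rVert^{2}\right), \qquad \gamma > 0

f(\mathbf{x}) = \operatorname{sign}\!\Big(\sum_{i} \alpha_i \, y_i \, K(\mathbf{x}_i, \mathbf{x}) + b\Big)
```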
Single hidden layer neural network
Neural networks are well known for their learning capabilities. In addition, they are model-free, i.e., they do not impose any restrictions on the statistical distribution of their input data. The neural-network-based ensemble relies on the following theorem: a single-hidden-layer feed-forward network with at most N hidden neurons (including biases) can learn N distinct input-output pairs with zero error (a certain amount of error can be tolerated by using fewer than N hidden neurons). This holds whether the activation function of the hidden neurons is the signum (hard-limit or threshold) function or the sigmoid (logistic) function. The activation function of the output neuron(s) is linear. The main advantage of this kind of network is that the hidden layer weights can be chosen randomly, while the output layer weights can be estimated optimally using the pseudo-inverse solution of an over-determined set of linear equations, which is also the solution that minimizes the squared error between the network outputs and the target outputs. In our proposed ensemble, the inputs are the output decisions of four known miRNA prediction classifiers, and the output is the corresponding ground-truth decision. Therefore, our objective is to learn the network weights that best map the decisions of the adopted classifiers into a single fused output decision [35, 36].
The hidden layer parameters (15 parameters): the 12 weights w_ik between the four inputs and the three hidden neurons, plus three biases for the hidden neurons.
The output layer parameters (4 parameters): the 3 weights w_kj between the three hidden neurons and the single output neuron, plus the bias of the output neuron.
The inputs to the neural network are the outputs of the classifiers described previously, and the target output is ‘1’ (‘-1’) for a true (false) miRNA. The MATLAB Neural Network Toolbox has been used for modelling and training the network. Sigmoid activation functions are used for the hidden layer neurons to preserve the non-linearity of the combined classifiers. The output of the ensemble is a linear weighted combination of the outputs of the hidden neurons, produced by a linear activation function in the output neuron, as shown in Figure 1.
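The mapping defined by these parameters can be written out explicitly. In the equation, x_1, ..., x_4 are the base-classifier decisions, b_k are the hidden biases and b_j is the output bias; the bias symbols are our notation, and the logistic form of σ is an assumption (MATLAB's tansig would be an equally plausible sigmoid). The second line checks the parameter counts given above.

```latex
\hat{y}(\mathbf{x}) = \sum_{k=1}^{3} w_{kj}\,\sigma\!\Big(\sum_{i=1}^{4} w_{ik}\,x_i + b_k\Big) + b_j,
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}

\underbrace{4 \times 3 + 3}_{\text{hidden layer}} = 15,
\qquad
\underbrace{3 \times 1 + 1}_{\text{output layer}} = 4
```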
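The training scheme described in the theorem paragraph, with random hidden weights and pseudo-inverse output weights, can be sketched in plain Python as follows. This is not the authors' MATLAB code. The uniform(−1, 1) initialization, the logistic sigmoid and the tiny ridge term added for numerical stability are assumptions, and all function names are hypothetical. The output weights come from the normal equations, which give the least-squares (pseudo-inverse) solution when the hidden-output matrix has full column rank.

```python
import math
import random
from typing import List, Sequence, Tuple

N_INPUTS, N_HIDDEN = 4, 3  # four base-classifier decisions, three hidden neurons


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def hidden_outputs(x: Sequence[float], w_in: List[List[float]], b_in: List[float]) -> List[float]:
    """Hidden activations plus a constant 1 that multiplies the output bias."""
    return [sigmoid(sum(w * xi for w, xi in zip(w_k, x)) + b_k)
            for w_k, b_k in zip(w_in, b_in)] + [1.0]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gauss-Jordan elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def train(X: List[List[float]], t: List[float], seed: int = 0,
          ridge: float = 1e-8) -> Tuple[List[List[float]], List[float], List[float]]:
    """Random hidden layer; output weights by (slightly regularized) least squares."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-1, 1) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
    b_in = [rng.uniform(-1, 1) for _ in range(N_HIDDEN)]
    H = [hidden_outputs(x, w_in, b_in) for x in X]  # one row per training sample
    k = N_HIDDEN + 1
    # Normal equations (H^T H + ridge*I) beta = H^T t
    hth = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0) for j in range(k)]
           for i in range(k)]
    htt = [sum(h[i] * ti for h, ti in zip(H, t)) for i in range(k)]
    return w_in, b_in, solve(hth, htt)


def predict(x: Sequence[float], w_in: List[List[float]], b_in: List[float],
            beta: List[float]) -> float:
    """Linear output neuron: weighted sum of hidden activations plus bias."""
    return sum(bk * hk for bk, hk in zip(beta, hidden_outputs(x, w_in, b_in)))
```

Each training row would hold the four base-classifier decisions for one hairpin, with target 1 or −1. The sign of `predict` then gives the fused decision, while its raw value can serve as a score for ROC analysis.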
The neural network parameters
The optimal hidden layer parameters
The optimal output layer parameters
Results and discussions
Performance evaluation and prediction results of our proposed ensemble classifier are discussed below.
Performance evaluation of the ensemble method using the classification statistics
A testing data set consisting of 500 known human pre-miRNAs retrieved from miRBase19 and 1000 pseudo hairpins retrieved from human RefSeq genes, all different from those used in training, has been used to test the performance of the trained ensemble classifier.
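The classification statistics reported in the abstract follow from a confusion matrix on this test set. In the sketch below, the counts are our own derivation, not counts reported by the paper: 74% sensitivity on 500 positives implies 370 true positives, and 97% specificity on 1000 negatives implies 970 true negatives. The class and field names are illustrative.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # real pre-miRNAs predicted as miRNA
    fn: int  # real pre-miRNAs predicted as pseudo
    tn: int  # pseudo hairpins predicted as pseudo
    fp: int  # pseudo hairpins predicted as miRNA

    def metrics(self) -> Dict[str, float]:
        sensitivity = self.tp / (self.tp + self.fn)
        specificity = self.tn / (self.tn + self.fp)
        precision = self.tp / (self.tp + self.fp)
        npv = self.tn / (self.tn + self.fn)  # negative predictive value
        total = self.tp + self.fn + self.tn + self.fp
        return {
            "accuracy": (self.tp + self.tn) / total,
            "sensitivity": sensitivity,
            "specificity": specificity,
            "precision": precision,
            "npv": npv,
            "f_measure": 2 * precision * sensitivity / (precision + sensitivity),
        }


# Counts implied by 74% sensitivity on 500 positives and 97% specificity on 1000 negatives
cm = ConfusionMatrix(tp=370, fn=130, tn=970, fp=30)
for name, value in cm.metrics().items():
    print(f"{name}: {100 * value:.1f}%")
```

The printed values round to the figures given in the abstract (89.3% accuracy, 82.2% F-measure, 92.5% precision and 88.2% negative predictive value). This shows that the reported statistics are mutually consistent with the 500/1000 test set.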
Performance of ensemble-based classifier versus the other adopted classifiers
Performance evaluation of the ensemble method using the receiver operating characteristic
ROC-based evaluation metrics of the adopted and designed classifiers
95% confidence interval
0.709 to 0.754
0.832 to 0.869
0.725 to 0.770
0.727 to 0.772
0.836 to 0.872
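Because the ensemble's output neuron is linear, its raw output can be used as a score for ROC analysis. The sketch below computes the AUC as the Mann-Whitney statistic and an approximate 95% interval using the Hanley-McNeil standard error. This interval is a simpler approximation than the DeLong et al. approach cited by the paper, and it is not necessarily how the intervals in the table were obtained. Function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as P(positive score > negative score), counting ties as 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def hanley_mcneil_ci(auc: float, n_pos: int, n_neg: int,
                     z: float = 1.96) -> Tuple[float, float]:
    """Approximate confidence interval of the AUC (Hanley and McNeil, 1982)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc * auc / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc * auc)
           + (n_neg - 1) * (q2 - auc * auc)) / (n_pos * n_neg)
    se = math.sqrt(var)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)
```

With 500 positives and 1000 negatives, `hanley_mcneil_ci` would be called with `n_pos=500` and `n_neg=1000`.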
The proposed classifier has been applied to the metagenomic data sets described in Section ‘Methods’, with the following results: 106 miRNA candidates were discovered in the mine drainage metagenome, 55 in the groundwater metagenome and 18 in the marine metagenome.
Samples of the obtained prediction results
Sequence: most trusted homologues (species)
Marine sequence 1: gma-MIR393f and oan-miR-1353 (Glycine max and Ornithorhynchus anatinus)
Marine sequence 9: osa-miR5072 and age-miR-513c-1 (Oryza sativa and Ateles geoffroyi)
Mine drainage sequence 1: ppt-miR1215 and pdi-miR7720 (Physcomitrella patens and Brachypodium distachyon)
Mine drainage sequence 18: pma-miR-138b and osa-miR1851 (Petromyzon marinus and Oryza sativa)
Mine drainage sequence 29: hco-miR-5983 and sme-miR-2167 (Haemonchus contortus and Schmidtea mediterranea)
Mine drainage sequence 35: ppt-miR537d and hma-miR-3005 (Physcomitrella patens and Hydra magnipapillata)
Mine drainage sequence 41: gga-miR-6611 and mtr-miR5037a (Gallus gallus and Medicago truncatula)
Mine drainage sequence 53: aly-miR3444 and hsa-miR-4440 (Arabidopsis lyrata and Homo sapiens)
Mine drainage sequence 67: lja-miR7526f and cte-miR-96 (Lotus japonicus and Capitella teleta)
Mine drainage sequence 72: cel-miR-90 and dps-miR-2543a-1 (Caenorhabditis elegans and Drosophila pseudoobscura)
Mine drainage sequence 88: hsa-miR-3167 and bdi-miR7711 (Homo sapiens and Brachypodium distachyon)
Groundwater sequence 1: ssc-miR-486-2 and hsa-miR-661 (Sus scrofa and Homo sapiens)
Groundwater sequence 10: csi-miR3950 and cel-miR-87 (Citrus sinensis and Caenorhabditis elegans)
Groundwater sequence 16: mmu-miR-8112 and tgu-miR-2981 (Mus musculus and Taeniopygia guttata)
Groundwater sequence 23: rco-miR156h and hsa-miR-4483 (Ricinus communis and Homo sapiens)
Groundwater sequence 37: osa-miR531 and ggo-miR-760 (Oryza sativa and Gorilla gorilla)
Groundwater sequence 50: hsv1-miR-H17 and mmu-miR-5131 (Herpes Simplex Virus 1 and Mus musculus)
A computational tool for miRNA prediction in genomic or metagenomic data has been developed. It has been tested on three metagenomic samples from different environments (mine drainage, groundwater and marine metagenomic sequences). The prediction results provide a set of highly probable miRNA hairpins as candidates for experimental validation by cloning.
The results obtained in this paper are very promising and open the way for future research in several directions. These directions include miRNA mining in genomic and metagenomic sequences, developing other approaches to ensemble classifier design, and applying feature selection methods to choose a reduced set of uncorrelated features for miRNA prediction.
Availability and requirements
The proposed method is a neural network ensemble classifier. The outputs of four known miRNA prediction methods (Triplet-SVM, MiPred, Virgo and EumiR), which use different miRNA features, are fed into a single hidden layer neural network that is trained to predict the likelihood that an input sample is a miRNA. The source code of each of the considered classifiers is freely accessible [2, 4, 14, 15]. The code for the neural network classifier is available as a supplementary file (see Additional file 1).
The training and testing data sets, consisting of known human pre-miRNAs and pseudo hairpins, have been retrieved from miRBase19 and human RefSeq genes, respectively (see the data sets in Additional file 2).
The approach is applied to metagenomic sequences from different environments (mine drainage, groundwater and marine metagenomic sequences) downloaded from the NCBI Sequence Read Archive [20–22] (the metagenomic samples are listed in Additional file 3).
106 miRNA candidates have been discovered in the mine drainage metagenome sample, 55 in the groundwater metagenome sample and 18 in the marine metagenome sample (the predicted sequences are listed in Additional file 4).
This research has been supported by the Ministry of Higher Education (MoHE) of Egypt through a Ph.D. fellowship. This work has been partially supported by Pharco Pharmaceutical Corporation, Alexandria, Egypt.
- Xu Y, Zhou X, Zhang W: MicroRNA prediction with a novel ranking algorithm based on random walks. Bioinformatics. 2008, 24 (13): i50-i58. 10.1093/bioinformatics/btn175.
- Xue C, Li F, He T, Liu G, Li Y, Zhang X: Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine. BMC Bioinformatics. 2005, 6 (1): 310. 10.1186/1471-2105-6-310.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215 (3): 403-410. 10.1016/S0022-2836(05)80360-2.
- Jiang P, Wu H, Wang W, Ma W, Sun X, Lu Z: MiPred: classification of real and pseudo microRNA precursors using random forest prediction model with combined features. Nucleic Acids Res. 2007, 35 (suppl 2): W339-W344.
- Lim LP, Lau NC, Weinstein EG, Abdelhakim A, Yekta S, Rhoades MW, Burge CB, Bartel DP: The microRNAs of Caenorhabditis elegans. Genes Dev. 2003, 17 (8): 991-1008. 10.1101/gad.1074403.
- Lai EC, Tomancak P, Williams RW, Rubin GM: Computational identification of Drosophila microRNA genes. Genome Biol. 2003, 4 (7): R42. 10.1186/gb-2003-4-7-r42.
- Bonnet E, Wuyts J, Rouzé P, Van de Peer Y: Detection of 91 potential conserved plant microRNAs in Arabidopsis thaliana and Oryza sativa identifies important target genes. Proc Natl Acad Sci USA. 2004, 101 (31): 11511-11516. 10.1073/pnas.0404025101.
- Jones-Rhoades MW, Bartel DP: Computational identification of plant microRNAs and their targets, including a stress-induced miRNA. Mol Cell. 2004, 14 (6): 787-799. 10.1016/j.molcel.2004.05.027.
- Sewer A, Paul N, Landgraf P, Aravin A, Pfeffer S, Brownstein MJ, Tuschl T, van Nimwegen E, Zavolan M: Identification of clustered microRNAs using an ab initio prediction method. BMC Bioinformatics. 2005, 6 (1): 267. 10.1186/1471-2105-6-267.
- Hertel J, Stadler PF: Hairpins in a haystack: recognizing microRNA precursors in comparative genomics data. Bioinformatics. 2006, 22 (14): e197-e202. 10.1093/bioinformatics/btl257.
- Zheng Y, Hsu W, Lee M, Wong L: Exploring essential attributes for detecting microRNA precursors from background sequences. Data Mining Bioinform. 2006, 4316: 131-145.
- Ng KLS, Mishra SK: De novo SVM classification of precursor microRNAs from genomic pseudo hairpins using global and intrinsic folding measures. Bioinformatics. 2007, 23 (11): 1321-1330. 10.1093/bioinformatics/btm026.
- Batuwita R, Palade V: microPred: effective classification of pre-miRNAs for human miRNA gene prediction. Bioinformatics. 2009, 25 (8): 989-995. 10.1093/bioinformatics/btp107.
- Shiva K, Faraz A, Vinod S: Prediction of viral microRNA precursors based on human microRNA precursor sequence and structural features. Virol J. 2009, 6 (1): 129. 10.1186/1743-422X-6-129.
- Eukaryotic microRNA precursor prediction server (EumiR). 2009, [http://miracle.igib.res.in/eumir/]
- Pasaila D, Mohorianu I, Sucila A, Pantiru S, Ciortuz L: Yet another SVM for miRNA recognition: yasMiR. Technical report, Citeseer. 2010
- Handelsman J: Metagenomics: application of genomics to uncultured microorganisms. Microbiol Mol Biol Rev. 2004, 68 (4): 669-685. 10.1128/MMBR.68.4.669-685.2004.
- Tor Y, Hermann T, Westhof E: Deciphering RNA recognition: aminoglycoside binding to the hammerhead ribozyme. Chem Biol. 1998, 5 (11): 277-283. 10.1016/S1074-5521(98)90286-1.
- Chen K, Pachter L: Bioinformatics for whole-genome shotgun sequencing of microbial communities. PLoS Comput Biol. 2005, 1 (2): e24. 10.1371/journal.pcbi.0010024.
- Mine drainage metagenome. 2009, [http://www.ncbi.nlm.nih.gov/Traces/wgs/?val=AAWO01]
- Groundwater metagenome. 2010, [http://www.ncbi.nlm.nih.gov/Traces/wgs/?val=ADIG01]
- Marine metagenome. 2010, [http://www.ncbi.nlm.nih.gov/Traces/wgs/?val=ADKQ01]
- Lee Y, Ahn C, Han J, Choi H, Kim J, Yim J, Lee J, Provost P, Rådmark O: The nuclear RNase III Drosha initiates microRNA processing. Nature. 2003, 425 (6956): 415-419. 10.1038/nature01957.
- Lund E, Güttinger S, Calado A, Dahlberg JE, Kutay U: Nuclear export of microRNA precursors. Science. 2004, 303 (5654): 95-98. 10.1126/science.1090599.
- Wang X, Zhang J, Li F, Gu J, He T, Zhang X, Li Y: MicroRNA identification based on sequence and structure alignment. Bioinformatics. 2005, 21 (18): 3610-3614. 10.1093/bioinformatics/bti562.
- Perrone MP, Cooper LN: When networks disagree: ensemble methods for hybrid neural networks. Technical report, DTIC Document. 1992
- Schapire RE: The strength of weak learnability. Mach Learn. 1990, 5 (2): 197-227.
- Hofacker IL, Fontana W, Stadler PF, Bonhoeffer LS, Tacker M, Schuster P: Fast folding and comparison of RNA secondary structures. Monatshefte für Chemie/Chem Mon. 1994, 125 (2): 167-188.
- Griffiths-Jones S: The microRNA Registry. Nucleic Acids Res. 2004, 32 (suppl 1): D109-D111.
- Pruitt KD, Maglott DR: RefSeq and LocusLink: NCBI gene-centered resources. Nucleic Acids Res. 2001, 29 (1): 137-140. 10.1093/nar/29.1.137.
- Freyhult E, Gardner PP, Moulton V: A comparison of RNA folding measures. BMC Bioinformatics. 2005, 6 (1): 241. 10.1186/1471-2105-6-241.
- Bonnet E, Wuyts J, Rouzé P, Van de Peer Y: Evidence that microRNA precursors, unlike other non-coding RNAs, have lower folding free energies than random sequences. Bioinformatics. 2004, 20 (17): 2911-2917. 10.1093/bioinformatics/bth374.
- Griffiths-Jones S, Grocock RJ, Van Dongen S, Bateman A, Enright AJ: miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res. 2006, 34 (suppl 1): D140-D144.
- Birney E, Andrews D, Cáccamo M, Chen Y, Clarke L, Coates G, Cox T, Cunningham F, Curwen V, Cutts T, Down T, Durbin R, Fernandez-Suarez XM, Flicek P, Gräf S, Hammond M, Herrero J, Howe K, Iyer V, Jekosch K, Kähäri A, Kasprzyk A, Keefe D, Kokocinski F, Kulesha E, London D, Longden I, Melsopp C, Meidl P, Overduin B, et al: Ensembl 2006. Nucleic Acids Res. 2006, 34 (suppl 1): 556-561.
- Huang S-C, Huang Y-F: Bounds on the number of hidden neurons in multilayer perceptrons. IEEE Trans Neural Netw. 1991, 2 (1): 47-55. 10.1109/72.80290.
- Huang G-B, Babri HA: Upper bounds on the number of hidden neurons in feedforward networks with arbitrary bounded nonlinear activation functions. IEEE Trans Neural Netw. 1998, 9 (1): 224-229. 10.1109/72.655045.
- DeLong ER, DeLong DM, Clarke-Pearson DL: Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988, 44 (3): 837-845. 10.2307/2531595.
- Fawcett T: An introduction to ROC analysis. Pattern Recognit Lett. 2006, 27 (8): 861-874. 10.1016/j.patrec.2005.10.010.
- Griffiths-Jones S, Saini HK, Van Dongen S, Enright AJ: miRBase: tools for microRNA genomics. Nucleic Acids Res. 2008, 36 (suppl 1): D154-D158.
- Turner M, Yu O, Subramanian S: Genome organization and characteristics of soybean microRNAs. BMC Genomics. 2012, 13 (1): 169. 10.1186/1471-2164-13-169.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License(http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.