Seforta, an integrated tool for detecting the signature of selection in coding sequences
© Camiolo et al.; licensee BioMed Central Ltd. 2014
Received: 9 January 2014
Accepted: 11 April 2014
Published: 16 April 2014
The majority of amino acid residues are encoded by more than one codon, and a bias in the usage of such synonymous codons has been repeatedly demonstrated. One assumption is that this phenomenon has evolved to improve the efficiency of translation by reducing the time required for the recruitment of isoacceptors. Codons read by the most abundant tRNA species are preferred at sites on the protein which are key for its functionality, a behavior which has been termed “translational accuracy”. Although this behavior has been observed in many species, no public domain software has yet been made available for its quantification.
We present here Seforta (Selection for Translational Accuracy), a program designed to quantify translational accuracy. It searches for synonymous codon usage bias in both conserved and non-conserved regions of coding sequences and computes a cumulative odds ratio and a Z-score. The specification of a set of preferred codons is desirable, but the program can also generate these. Finally, a randomization protocol calculates the probability that preferred codon combinations could have arisen by chance.
Seforta is the first public domain program able to quantify translational accuracy. It comes with a simple graphical user interface and can be readily installed and adjusted to the user's requirements.
In spite of the various mechanisms which have evolved to maintain mRNA translation accuracy, errors still arise at a rate of one every 10³ to 10⁴ codons. Between 10% and 50% of random residue substitutions compromise protein function through their effect on the product's three-dimensional structure. Coding sequence composition is expected to influence the rate of mistranslation errors [3, 4]. Because multiple aminoacyl tRNAs compete with one another for loading at the ribosome acceptor sites, codons corresponding to the most abundant tRNAs (preferred codons) tend to be translated with the highest fidelity. In E. coli, it has been demonstrated that the frequency of amino acid residue errors at preferred codons is approximately tenfold lower than at other codons. Missense errors are likely to induce their greatest deleterious effect when they occur in a region which is key for the protein's functionality. Preferred codons may therefore be non-homogeneously distributed along a coding sequence, reflecting the different evolutionary constraints acting on different parts of the gene product. This phenomenon has been noted in a number of genomes and is referred to as “selection for translational accuracy”. Despite this, there is as yet no public domain software available to quantify translational accuracy. Here we present such a program, which we have called Seforta (SElection FOR Translational Accuracy); it benefits from a simple graphical user interface, and is designed to uncover the signature of selection for accuracy together with the identification of an optimal codon set.
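The cumulative odds ratio and Z-score mentioned above can be illustrated with a small sketch. It assumes that the pooling follows the Mantel-Haenszel procedure given in the reference list, and that each stratum (for example, one amino acid in one gene) is summarized as a 2×2 table of preferred versus non-preferred codons at conserved versus non-conserved sites. The function name `mantel_haenszel` and the table layout are illustrative assumptions and are not taken from the Seforta source code.

```python
from math import sqrt


def mantel_haenszel(tables):
    """Pool 2x2 tables into a cumulative odds ratio and a Z-score.

    Each table is a tuple (a, b, c, d), assumed here to mean:
      a = preferred codons at conserved sites
      b = preferred codons at non-conserved sites
      c = non-preferred codons at conserved sites
      d = non-preferred codons at non-conserved sites
    """
    numerator = denominator = 0.0
    observed = expected = variance = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        numerator += a * d / n
        denominator += b * c / n
        observed += a
        # Hypergeometric mean and variance of `a` with fixed margins.
        expected += (a + b) * (a + c) / n
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    # The odds ratio is undefined when the denominator is zero;
    # it is reported as infinity in this sketch.
    odds_ratio = numerator / denominator if denominator > 0 else float("inf")
    z_score = (observed - expected) / sqrt(variance) if variance > 0 else 0.0
    return odds_ratio, z_score


# Two hypothetical strata, invented for illustration only.
odds_ratio, z_score = mantel_haenszel([(30, 10, 20, 20), (12, 8, 9, 11)])
print(odds_ratio, z_score)
```

An odds ratio above 1 together with a positive Z-score would indicate that preferred codons are over-represented at conserved sites, which is the pattern expected under selection for translational accuracy.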
Two alternative methods for the identification of preferred codons are available. The first is based on a comparison of codon frequencies between genes which are strongly or weakly expressed, using a conventional 2×2 contingency table analysis. Seforta identifies the two expression datasets to be compared either by selecting the two tails of the expression data distribution (based on a user-defined percentile value) or by dividing the dataset into a number of groups based on equal-sized intervals of expression level. When expression data are not available, the preferred codon lists can be identified via the correlation method proposed by Hershberg et al. [8, 9]. This method identifies which codon(s) increase in frequency as genes become more biased in their codon usage. Seforta calculates the overall codon bias of a gene by using the effective number of codons (Nc’), which measures bias without any prior assumption regarding the identity of the preferred codons, while also controlling for background nucleotide composition. Preferred codons are then identified from the strength of the Spearman correlation between codon frequency and the overall level of codon bias, the latter adjusted for background nucleotide composition [8, 9].
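The expression-based method can be sketched as follows. This is a minimal illustration, not Seforta's implementation: it assumes the percentile-tail option, a chi-square statistic without continuity correction as the 2×2 test, and a cutoff of 3.84 (the 5% critical value for one degree of freedom). All function names, the `synonymous` mapping and the example data are hypothetical.

```python
from collections import Counter


def count_codons(seq):
    """Count in-frame codons of a coding sequence."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def expression_tails(expression, percentile):
    """Return gene ids in the lowest and highest `percentile` of expression."""
    ranked = sorted(expression, key=expression.get)
    k = max(1, int(len(ranked) * percentile / 100))
    return ranked[:k], ranked[-k:]


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic of a 2x2 table (no continuity correction)."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / margins if margins else 0.0


def preferred_codons(sequences, expression, synonymous,
                     percentile=10, threshold=3.84):
    """Codons significantly enriched in highly expressed genes."""
    low, high = expression_tails(expression, percentile)
    high_counts = sum((count_codons(sequences[g]) for g in high), Counter())
    low_counts = sum((count_codons(sequences[g]) for g in low), Counter())
    preferred = {}
    for amino_acid, codons in synonymous.items():
        high_total = sum(high_counts[cd] for cd in codons)
        low_total = sum(low_counts[cd] for cd in codons)
        for codon in codons:
            a = high_counts[codon]   # this codon, high expression
            b = high_total - a       # synonymous alternatives, high expression
            c = low_counts[codon]    # this codon, low expression
            d = low_total - c        # synonymous alternatives, low expression
            if a * d > b * c and chi_square_2x2(a, b, c, d) > threshold:
                preferred.setdefault(amino_acid, []).append(codon)
    return preferred


# Invented toy data: four genes and a single amino acid (lysine).
sequences = {
    "g1": "AAG" * 15 + "AAA" * 35,
    "g2": "AAG" * 25 + "AAA" * 25,
    "g3": "AAG" * 25 + "AAA" * 25,
    "g4": "AAG" * 40 + "AAA" * 10,
}
expression = {"g1": 0.5, "g2": 5.0, "g3": 8.0, "g4": 120.0}
print(preferred_codons(sequences, expression, {"K": ["AAA", "AAG"]}, percentile=25))
```

Requiring both enrichment (`a * d > b * c`) and a significant statistic avoids labelling as preferred a codon that is significantly depleted in the highly expressed genes. The percentile should stay at or below 50 so that the two tails do not overlap.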
Availability and requirements
Project name: seforta (version 0.1)
Project home page: http://sourceforge.net/projects/seforta/
Operating system: Linux 64-bit
Programming language: C++, Java
Other requirements: xterm must be installed
License: GNU GPL
Availability of supporting data
The software source code, together with test files, is available at the project home page given above.
We would like to thank the PhD program entitled “Scienze e biotecnologie dei Sistemi Agrari e Forestali e delle Produzioni alimentari” and the “Master and Back” program 2011 (RAS - Regione Autonoma della Sardegna) for financial support.
- Kramer EB, Farabaugh PJ: The frequency of translational misreading errors in E. coli is largely determined by tRNA competition. RNA. 2007, 13: 87-96.
- Pakula AA, Sauer RT: Genetic analysis of protein stability and function. Annu Rev Genet. 1989, 23: 289-310. 10.1146/annurev.ge.23.120189.001445.
- Akashi H: Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics. 1994, 136: 927-935.
- Drummond DA, Wilke CO: The evolutionary consequences of erroneous protein synthesis. Nat Rev Genet. 2009, 10: 715-724. 10.1038/nrg2662.
- Precup J, Parker J: Missense misreading of asparagine codons as a function of codon identity and context. J Biol Chem. 1987, 262: 11351-11355.
- Akashi H: Translational selection and yeast proteome evolution. Genetics. 2003, 164: 1291-1303.
- Mantel N, Haenszel W: Statistical aspects of the analysis of data from retrospective analysis of disease. JNCI. 1959, 22: 719-
- Hershberg R, Petrov DA: Selection on codon bias. Annu Rev Genet. 2008, 42: 287-299. 10.1146/annurev.genet.42.110807.091442.
- Hershberg R, Petrov DA: General rules for optimal codon choice. PLoS Genet. 2009, 5: e1000556. 10.1371/journal.pgen.1000556.
- Novembre JA: Accounting for background nucleotide composition when measuring codon usage bias. Mol Biol Evol. 2002, 19: 1390-1394. 10.1093/oxfordjournals.molbev.a004201.
- Drummond DA, Wilke CO: Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution. Cell. 2008, 134: 341-352. 10.1016/j.cell.2008.05.042.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.