RCDI/eRCDI: a web-server to estimate codon usage deoptimization
© Puigbò et al; licensee BioMed Central Ltd. 2010
Received: 5 January 2010
Accepted: 31 March 2010
Published: 31 March 2010
The Relative Codon Deoptimization Index (RCDI) was developed by Mueller et al. (2006) as a measure of codon deoptimization. It compares how similar the codon usage of a gene is to that of a reference genome.
RCDI/eRCDI is a web application server that calculates the Relative Codon Deoptimization Index and a new expected value for the RCDI (eRCDI). The RCDI is used to estimate how similar the codon frequencies of a specific gene are to those of a given reference genome. The eRCDI is determined by generating random sequences whose G+C and amino acid composition is similar to that of the input sequences, and may be used as an indicator of the significance of the RCDI values. RCDI/eRCDI is freely available at http://genomes.urv.cat/CAIcal/RCDI.
This web server will be a useful tool for genome analysis, with three main uses:
- understanding host-virus phylogenetic relationships;
- inferring the potential host range of a virus and its replication strategy;
- in experimental virology, easing the design of genes for heterologous protein expression.
Some studies suggest that viral genomes may acquire replicative fitness through selection in one of two ways. First, selection may favour genomes with optimized codon usage, as observed for viruses with high replication rates such as Poliovirus [1, 2]. Alternatively, selection might favour genomes with deoptimized codon usage, as observed for viruses with low replication rates such as Papillomaviruses [3, 4]. However, some viruses may have both codon-optimized and codon-deoptimized genes, which may be associated with the lytic and lysogenic phases, respectively [5]. In addition, some authors propose that suboptimal codons may be used as a complementary strategy to develop polio vaccines [6–8].
The Relative Codon Deoptimization Index (RCDI) was developed by Mueller et al. (2006) [1] as a measure of codon deoptimization, comparing the codon usage of a given gene to that of a reference genome. Although the RCDI is a useful index for virus research, it has never been implemented for public use. In this article we describe the RCDI/eRCDI web server, which calculates the RCDI of a set of genes using the codon usage of a reference genome. However, the codon usage of a gene may be the product of compositional biases, such as G+C or amino acid composition, rather than of codon usage adaptation . We therefore implement an expected value of the RCDI (eRCDI) in order to test the significance of the RCDI values.
The RCDI/eRCDI program has been implemented in PHP as a web server and is freely available as one of the tools of the CAIcal server [10]. The program is also available for local use as a Perl script, which can calculate the RCDI and eRCDI for large numbers of sequences.
Three inputs are needed to calculate the RCDI:
1) a gene (or set of genes) in FASTA format;
2) the codon usage table of the reference genome, in the format of the codon usage database [11];
3) the genetic code.
The remaining input parameters relate to the calculation of the eRCDI and may be modified according to user requirements.

The web server provides four outputs:
1) Gene parameters: the RCDI value of each gene, the frequencies for all codons [(CiFa/CiFh)Ni; see description below] and the %G+C of each gene.
2) Global parameters: the mean %G+C and amino acid composition, and the number of sequences used to calculate the eRCDI.
3) Statistical tests: a Kolmogorov-Smirnov test for normality of the random-sequence RCDI values used to compute the eRCDI, and a chi-square goodness-of-fit test to evaluate whether the test sequences are homogeneous in %G+C and amino acid composition.
4) Expected RCDI: the mean RCDI value of the random sequences and the eRCDI value.
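The two file inputs can be read with a few lines of code. The sketch below is a minimal illustration, not the server's own parser. It assumes FASTA text and a reference table laid out as in the codon usage database [11], where each entry looks like "UUU 17.6(714298)": codon, frequency per thousand, then the count in parentheses. The helper names read_fasta and read_codon_usage are hypothetical.

```python
import re

# Assumed layout of one codon usage database entry: "UUU 17.6(714298)",
# i.e. codon, frequency per thousand, and the codon count in parentheses.
CUTG_ENTRY = re.compile(r"([ACGTU]{3})\s+[\d.]+\s*\(\s*(\d+)\s*\)")


def read_fasta(text):
    """Return {header: sequence} from FASTA-formatted text (upper-cased)."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks).upper()
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks).upper()
    return records


def read_codon_usage(text):
    """Return {codon: count} in the DNA alphabet (U converted to T)."""
    return {codon.replace("U", "T"): int(count)
            for codon, count in CUTG_ENTRY.findall(text.upper())}
```

The sketch keeps only the counts, because relative synonymous codon frequencies can be recomputed from them. RNA-alphabet codons are converted to DNA so that they match FASTA gene sequences.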
Relative Codon Deoptimization Index (RCDI)
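Following Mueller et al. (2006) [1], the RCDI of a gene is RCDI = Σi (CiFa/CiFh)Ni / N, where:
- CiFa is the relative frequency of codon i among the synonymous codons for its amino acid in the gene;
- CiFh is the same relative frequency in the reference (host) genome;
- Ni is the number of occurrences of codon i in the gene;
- N is the total number of codons in the gene.

A minimal Python sketch of this calculation follows. It assumes the standard genetic code (NCBI table 1) and a reference table given as codon counts, and the function names are hypothetical. Skipping stop codons and rejecting gene codons that are absent from the reference are choices made for this sketch, not documented server behaviour.

```python
from collections import Counter

# Standard genetic code (NCBI translation table 1), codons in T/C/A/G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def relative_frequencies(counts, code=CODE):
    """Frequency of each codon within its synonymous family (CiF)."""
    family_totals = Counter()
    for codon, n in counts.items():
        family_totals[code[codon]] += n
    return {codon: n / family_totals[code[codon]]
            for codon, n in counts.items() if family_totals[code[codon]] > 0}


def rcdi(gene, reference_counts, code=CODE):
    """Return (RCDI, {codon: (CiFa/CiFh)*Ni}) for one gene."""
    gene = gene.upper().replace("U", "T")
    codons = (gene[p:p + 3] for p in range(0, len(gene) - 2, 3))
    # Keep sense codons only; stop codons and ambiguous triplets are skipped.
    gene_counts = Counter(c for c in codons if c in code and code[c] != "*")
    if not gene_counts:
        raise ValueError("gene has no sense codons")
    cifa = relative_frequencies(gene_counts, code)
    cifh = relative_frequencies(reference_counts, code)
    terms = {}
    for codon, n in gene_counts.items():
        if cifh.get(codon, 0) == 0:
            raise ValueError(f"{codon} is absent from the reference table")
        terms[codon] = cifa[codon] / cifh[codon] * n
    return sum(terms.values()) / sum(gene_counts.values()), terms


def gc_percent(gene):
    """%G+C of a nucleotide sequence."""
    gene = gene.upper()
    return 100.0 * (gene.count("G") + gene.count("C")) / len(gene) if gene else 0.0
```

As an example, take a reference that uses all 64 codons equally and a gene consisting of ATG followed by two CTG codons. The RCDI is 13/3 (about 4.33), because the gene uses only one of the six leucine codons.

Within each amino acid the sum of (CiFa)²/CiFh is at least 1, a consequence of the Cauchy-Schwarz inequality. The RCDI therefore never falls below 1, and it equals 1 when the gene's synonymous codon frequencies match those of the reference.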
Expected Relative Codon Deoptimization Index (eRCDI)
Because the codon usage of a gene depends on its G+C content, it is essential to discern between two explanations. Codon usage differences may be statistically significant and arise from differences in codon preferences. Alternatively, they may be merely artifacts of internal biases in the G+C and/or amino acid composition of the query sequences. For this purpose, the estimation of expected values can be useful .

An expected value to test codon adaptation was first described for the Codon Adaptation Index (CAI) [9]. This expected value is calculated from N random sequences whose %G+C and amino acid composition are similar to those of the query sequences. Random sequences are generated by either a Poisson or a Markov method [9]. Using the same principle, an eRCDI can be calculated from the RCDI of each of these random sequences. The eRCDI value corresponds to the upper tolerance limit computed from the mean and standard deviation of these RCDI values, at the confidence level and coverage introduced by the user. The eRCDI estimated by our server therefore provides a threshold for deciding whether a gene is deoptimized and whether this deoptimization is the product of compositional biases (%G+C and/or amino acid composition).
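The eRCDI procedure can be sketched as follows. This is an illustration under stated assumptions, not the server's code:
- Random sequences are built by drawing amino acids from the query composition. Each synonymous codon is then chosen with a weight given by an independent-base G+C model, a simple stand-in for the Poisson and Markov methods.
- The one-sided upper tolerance limit uses a widely used normal approximation for the tolerance factor k, as given in the NIST/SEMATECH e-Handbook of Statistical Methods. This approximation is only reliable when the number of random sequences is reasonably large.

The names random_sequence, tolerance_factor and expected_rcdi are hypothetical. rcdi_fn is any function that returns the RCDI of a sequence, and families maps each amino acid to its synonymous codons under the chosen genetic code.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def random_sequence(aa_freqs, gc, n_codons, families, rng):
    """Random coding sequence: amino acids drawn from aa_freqs, then one
    synonymous codon weighted by an assumed independent-base G+C model."""
    base_p = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    aas = list(aa_freqs)
    picks = rng.choices(aas, weights=[aa_freqs[a] for a in aas], k=n_codons)
    codons = []
    for aa in picks:
        syn = families[aa]
        weights = [math.prod(base_p[b] for b in codon) for codon in syn]
        codons.append(rng.choices(syn, weights=weights, k=1)[0])
    return "".join(codons)


def tolerance_factor(n, confidence=0.95, coverage=0.99):
    """Approximate one-sided normal tolerance factor k (needs fairly large n)."""
    z_p = NormalDist().inv_cdf(coverage)
    z_g = NormalDist().inv_cdf(confidence)
    a = 1 - z_g ** 2 / (2 * (n - 1))
    b = z_p ** 2 - z_g ** 2 / n
    return (z_p + math.sqrt(z_p ** 2 - a * b)) / a


def expected_rcdi(rcdi_fn, aa_freqs, gc, n_codons, families,
                  n_random=500, confidence=0.95, coverage=0.99, seed=1):
    """Return (mean RCDI of random sequences, eRCDI upper tolerance limit)."""
    rng = random.Random(seed)
    values = [rcdi_fn(random_sequence(aa_freqs, gc, n_codons, families, rng))
              for _ in range(n_random)]
    k = tolerance_factor(n_random, confidence, coverage)
    return mean(values), mean(values) + k * stdev(values)
```

Under this sketch, a query gene whose RCDI exceeds the returned eRCDI would be flagged as more deoptimized than its %G+C and amino acid composition alone would predict. The G+C fraction gc should lie strictly between 0 and 1 so that every synonymous codon keeps a non-zero weight.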
Interpretation of the results
Based on the work of Mueller et al. (2006) [1], RCDI values may allow the translation rate to be estimated, both for individual viral genes and for the whole genome. The more similar the codon usage of viral genes is to that of host genes (an RCDI close to 1), the higher the translation rate is .

The first and most straightforward application is to improve the expression of proteins in heterologous expression systems. This can mean increasing the translation rate, or modulating it to obtain proper folding by choosing a sequence whose RCDI is close to that of the virus in its natural host.

From a genetic point of view, estimating the RCDI may also provide insight into the degree of coevolution between hosts and viruses. A low RCDI might indicate high adaptation to a host. A high RCDI, in contrast, might indicate that some genes are expressed in latency phases, or even that the virus has a low replication rate.
RCDI/eRCDI complements information from other indices
Two other indices, the CAI [12] and the ENC [13], can each be associated with an expected value. The expected CAI is obtained by generating random sequences [9]. The expected ENC is estimated by a formula that uses the %G+C at the third codon position [13]. In this article we introduce the eRCDI as a measure to assess the significance of the RCDI value. Thus, based on the RCDI and eRCDI, it is possible to estimate whether a gene is deoptimized and whether this deoptimization is the product of compositional biases (%G+C and/or amino acid composition).
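For comparison, the expected ENC mentioned above has a closed form. Wright [13] gives ENC = 2 + s + 29/(s² + (1 - s)²), where s is the G+C fraction at (synonymous) third codon positions.

The sketch below assumes s is supplied as a fraction. The helper gc3 is hypothetical and counts every complete codon rather than only synonymous sites, a simplification of Wright's definition.

```python
def gc3(sequence):
    """Fraction of G or C at third codon positions (all complete codons)."""
    seq = sequence.upper()
    thirds = [seq[p + 2] for p in range(0, len(seq) - 2, 3)]
    return sum(b in "GC" for b in thirds) / len(thirds) if thirds else 0.0


def expected_enc(gc3_fraction):
    """Wright's expected ENC when codon bias reflects only GC3 composition."""
    s = gc3_fraction
    if not 0.0 <= s <= 1.0:
        raise ValueError("GC3 must be a fraction between 0 and 1")
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)
```

With s = 0.5 the expected ENC is 60.5, and it falls to 31 or 32 at the compositional extremes (s = 0 or s = 1).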
The RCDI is a useful index for assessing how similar the codon usage of a given gene is to that of a reference genome and for testing deoptimization levels in viral genomes. Until now, however, it had never been implemented for public use. The RCDI/eRCDI server will therefore be a useful tool for genome analysis. In experimental virology, it will allow efficient gene design for heterologous protein expression. At a genetic level, it may help to establish and understand host-virus phylogenetic relationships and to infer the potential host range of a virus and its replication strategy.
Availability and requirements
Project name: RCDI/eRCDI
Project home page: http://genomes.urv.cat/CAIcal/RCDI
Operating system(s): Platform independent
Programming language: PHP (local version in Perl and Tcl/Tk)
Other requirements: the local version requires Perl and Tcl/Tk to be installed.
License: no restrictions for academic and non-commercial use.
List of abbreviations
RCDI: Relative Codon Deoptimization Index
CAI: Codon Adaptation Index
ENC: Effective Number of Codons
G+C percentage at the third position of the codon.
We thank the NIH Fellows Editorial Board for editorial assistance and Ben Busby for editing a late version of the manuscript. This work was supported by the Department of Health and Human Services intramural program (National Institutes of Health, National Library of Medicine). We also thank the two anonymous reviewers for their constructive suggestions.
- Mueller S, Papamichail D, Coleman JR, Skiena S, Wimmer E: Reduction of the rate of poliovirus protein synthesis through large-scale codon deoptimization causes attenuation of viral virulence by lowering specific infectivity. Journal of Virology. 2006, 80 (19): 9687-9696. 10.1128/JVI.00738-06.
- Burns CC, Shaw J, Campagnoli R, Jorba J, Vincent A, Quay J, Kew O: Modulation of poliovirus replicative fitness in HeLa cells by deoptimization of synonymous codon usage in the capsid region. Journal of Virology. 2006, 80 (7): 3259-3272. 10.1128/JVI.80.7.3259-3272.2006.
- Bravo IG, Muller M: Codon usage in papillomavirus genes: practical and functional aspects. Papillomavirus Report. 2005, 16: 63-72. 10.1179/095741905X24996.
- Cladel NM, Hu J, Balogh KK, Christensen ND: CRPV genomes with synonymous codon optimizations in the CRPV E7 gene show phenotypic differences in growth and altered immunity upon E7 vaccination. PLoS ONE. 2008, 3 (8): e2947. 10.1371/journal.pone.0002947.
- Karlin S, Blaisdell BE, Schachtel GA: Contrasts in codon usage of latent versus productive genes of Epstein-Barr virus: data and hypotheses. Journal of Virology. 1990, 64 (9): 4264-4273.
- Chumakov K, Ehrenfeld E: New generation of inactivated poliovirus vaccines for universal immunization after eradication of poliomyelitis. Clin Infect Dis. 2008, 47 (12): 1587-1592. 10.1086/593310.
- Coleman JR, Papamichail D, Skiena S, Futcher B, Wimmer E, Mueller S: Virus attenuation by genome-scale changes in codon pair bias. Science. 2008, 320 (5884): 1784-1787. 10.1126/science.1155761.
- Roy P, Boyce M, Noad R: Prospects for improved bluetongue vaccines. Nature Reviews Microbiology. 2009, 7 (2): 120-128. 10.1038/nrmicro2052.
- Puigbo P, Bravo I, Garcia-Vallve S: E-CAI: a novel server to estimate an expected value of Codon Adaptation Index (eCAI). BMC Bioinformatics. 2008, 9 (1): 65. 10.1186/1471-2105-9-65.
- Puigbo P, Bravo I, Garcia-Vallve S: CAIcal: A combined set of tools to assess codon usage adaptation. Biology Direct. 2008, 3 (1): 38. 10.1186/1745-6150-3-38.
- Nakamura Y, Gojobori T, Ikemura T: Codon usage tabulated from international DNA sequence databases: status for the year 2000. Nucleic Acids Research. 2000, 28: 292. 10.1093/nar/28.1.292.
- Sharp PM, Li WH: The Codon Adaptation Index - a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Research. 1987, 15: 1281-1295. 10.1093/nar/15.3.1281.
- Wright F: The 'effective number of codons' used in a gene. Gene. 1990, 87 (1): 23-29. 10.1016/0378-1119(90)90491-9.
- Puigbo P, Guzman E, Romeu A, Garcia-Vallve S: OPTIMIZER: a web server for optimizing the codon usage of DNA sequences. Nucleic Acids Research. 2007, 35: W126-131. 10.1093/nar/gkm219.
- Puigbo P, Romeu A, Garcia-Vallve S: HEG-DB: a database of predicted highly expressed genes in prokaryotic complete genomes under translational selection. Nucleic Acids Research. 2008, 36: D524-D527. 10.1093/nar/gkm831.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.