The Alternative Splicing Mutation Database: a hub for investigations of alternative splicing using mutational evidence

Background: Some mutations in the internal regions of exons occur within splicing enhancers and silencers, influencing the pattern of alternative splicing in the corresponding genes. To understand how these sequence changes affect splicing, we created a database of these mutations.
Findings: The Alternative Splicing Mutation Database (ASMD) serves as a repository for all exonic mutations not associated with splicing junctions that measurably change the pattern of alternative splicing. In this initial published release (version 1.2), only human sequences are present, but the ASMD will grow to include other organisms (see Availability and requirements section for the ASMD web address). This relational database allows users to investigate connections between mutations and features of the surrounding sequences, including flanking sequences, RNA secondary structures and strengths of splice junctions. Splicing effects of the mutations are quantified by the relative presence of alternative mRNA isoforms with and without a given mutation. This measure is further categorized by the accuracy of the experimental methods employed. The database currently contains 170 mutations in 66 exons, and these numbers increase regularly. We developed an algorithm to derive a table of oligonucleotide Splicing Potential (SP) values from the ASMD dataset. We present the SP concept and tools in detail in our corresponding article.
Conclusion: The current data set demonstrates that mutations affecting splicing are located throughout exons and might be enriched within local RNA secondary structures. Exons from the ASMD have below-average splicing junction strength scores, but the difference is small and is judged not to be significant.


Background
About 50% of mammalian genes exhibit alternative splicing (AS), the production of multiple mRNA isoforms from the same gene, often in a tissue- or developmental stage-specific manner. In humans, the number of different types of expressed mRNA appears to be two to three times higher than the total number of genes [1,2]. The regulation of alternative splicing is an intricate process that involves the interaction of dozens of spliceosomal proteins with a great variety of short sequence motifs inside exons and introns. These regulatory motifs are known as exonic splicing enhancers (ESEs), exonic splicing silencers (ESSs), intronic splicing enhancers (ISEs), and intronic splicing silencers (ISSs) [1,3]. Pre-mRNA secondary structures are also important players in the regulation of alternative splicing (see review [4]).
Significant progress in understanding AS has been achieved in experimental research that characterized a number of splicing enhancers and silencers [5][6][7][8][9] and also in several bioinformatics approaches for computational inference of ESEs and ESSs [10][11][12][13][14][15][16][17][18]. Despite this progress, one still cannot predict the tendency toward alternative splicing from genomic data. A set of mutations known to be associated with alternative splicing effects (reviewed by [9,19]) provides valuable raw material for a broad range of studies aiming to elucidate mechanisms of spliceosomal regulation.
To advance this area of research, we have created the Alternative Splicing Mutation Database (ASMD), a collection of human exon sequences with short (1-6 nucleotides) internal mutations that change the balance of alternatively spliced mRNA isoforms or cause the appearance of new mRNA isoforms. The ASMD includes only those mutations that change exonic enhancers and silencers and does not encompass those that change splice sites (deletion of existing splice junctions or creation of novel junctions). The ASMD is manually curated: each entry is verified against the published literature describing the influence of the mutation on alternative splicing. This information has been converted into a novel parameter, termed "Splicing Effect" or SE value. The SE value lies within the range [-1, +1] and reflects the effect of a mutation on an observed change in the pattern of alternative splicing. In the case of exon skipping, for example, SE = -1 means that a mutation causes 100% skipping of the constitutive wild-type exon. The database also contains an evaluation of the accuracy of the experimental techniques underlying the SE value for each mutation. The ASMD web site displays a range of information on every database entry, including splice site strength scores and putative RNA secondary structures.
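The exact formula for the SE value is given in the ASMD glossary and is not reproduced in this text. As a minimal sketch only, one definition that is consistent with the stated range of [-1, +1] and with the exon-skipping example above is the change in exon-inclusion fraction between mutant and wild type. The function name `splicing_effect` and its arguments are hypothetical, and the formula itself is an assumption.

```python
def splicing_effect(wt_inclusion: float, mut_inclusion: float) -> float:
    """Hypothetical SE: change in exon-inclusion fraction caused by a mutation.

    Both arguments are fractions in [0, 1] of transcripts that include the exon.
    ASSUMPTION: this is an illustrative definition, not the ASMD glossary formula.
    """
    for name, value in (("wt_inclusion", wt_inclusion),
                        ("mut_inclusion", mut_inclusion)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return mut_inclusion - wt_inclusion


# A constitutive exon (always included) that is fully skipped in the mutant.
print(splicing_effect(1.0, 0.0))  # -1.0
```

Under this assumed definition, negative values correspond to increased skipping and positive values to decreased skipping, which matches the sign convention used in the data analysis.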
There already exist many AS-related databases dating back to 1999. They are all important for their contributions to the understanding of alternative splicing. Nevertheless, the ASMD's focus on mutations sets it apart from each of these efforts. Analyzing a high-quality, curated database of mutations could conceivably lead to the identification of novel mediators of splicing and give a unique evaluation of the strength of splicing enhancers and silencers.

Construction and content
The Alternative Splicing Mutation Database (ASMD) version 1 uses a relational database (MySQL) to represent the relationships between the core entities: genes, mutations, and splicing effects. In addition, the database incorporates annotation information in the form of putative local RNA secondary structures and splice sites with their consensus value and log-odds scores. Finally, references, notes, and depositor information have been included in the database to facilitate long-term growth and collaboration.
All wild-type sequences are derived from the human Exon-Intron Database, most from version 35p1, some from version 36p1 [20,21]. Both wild-type and mutant exon sequences for each mutation are stored in the sequences table. Mutant sequences are generated by the incorporation of published mutations into the wild-type sequence. All sequences are then properly annotated in the sequence feature table. Splice site scores are calculated using both the consensus value and log-odds methods, as described in Zhang et al. 2005 [10]. Local RNA secondary structures are predicted using the RNALfold utility from the Vienna RNA package, version 1.6.1 [22], with default parameters and a window size of 30 nucleotides. Only structures with a minimum free energy (mfe) of -10.0 kcal/mol or lower were loaded into the database.
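Two of the steps above can be sketched in a few lines: building a mutant sequence from a wild-type exon, and keeping only predicted structures at or below the -10.0 kcal/mol threshold. This is an illustration under stated assumptions, not the ASMD loading code. The names `apply_mutation`, `LocalStructure`, and `keep_stable` are hypothetical, positions are assumed to be 1-based within the exon, and the structure records are assumed to have been parsed from the folding output beforehand.

```python
from dataclasses import dataclass
from typing import List

MFE_CUTOFF = -10.0  # kcal/mol, the threshold stated in the text


def apply_mutation(wild_type: str, position: int, ref: str, alt: str) -> str:
    """Replace `ref` with `alt` at a 1-based exon position (assumed convention).

    An empty `alt` models a deletion; an empty `ref` models an insertion.
    """
    if position < 1 or position > len(wild_type) + 1:
        raise ValueError(f"position {position} is outside the exon")
    start = position - 1
    observed = wild_type[start:start + len(ref)]
    if observed.upper() != ref.upper():
        raise ValueError(f"expected {ref!r} at {position}, found {observed!r}")
    return wild_type[:start] + alt + wild_type[start + len(ref):]


@dataclass
class LocalStructure:
    """Hypothetical record for one predicted local secondary structure."""
    start: int        # 1-based start of the structure within the exon
    dot_bracket: str  # structure in dot-bracket notation
    mfe: float        # minimum free energy in kcal/mol


def keep_stable(structures: List[LocalStructure]) -> List[LocalStructure]:
    """Keep only structures with an mfe of MFE_CUTOFF or lower."""
    return [s for s in structures if s.mfe <= MFE_CUTOFF]


print(apply_mutation("ACGTACGT", 3, "G", "T"))  # ACTTACGT
```

Checking the reference bases before substituting guards against off-by-one errors when a published mutation is mapped onto a stored exon sequence.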
Explanations of "Splicing Effect" (SE) values, determination of SE accuracy levels, and other parameters are provided in the glossary, which is accessible from the home page.

ASMD dataset and browsing features
The ASMD web site consists of three main sections: a home page, a search page, and a public depositions area. The home page is the starting point and provides connections to all parts of the site. The search page is used for locating mutations and splicing effects in the database; the complete search form is at the bottom of that page. Figure 1 shows four entries of the ASMD and Figure 2 shows the search form. Views of mutations as well as sequences of genes and exons are accessible from this page. Figure 3 shows part of the detailed mutation view, which is accessible through the ASMD identifier. The public depositions area contains instructions and forms for the submission of mutations, published references, and notes.
The ASMD sequence data is available in FASTA format from a link on the home page. The informational lines in the file contain characteristics of the gene, the mutation, and the associated splicing effect(s) while the sequence contains the wild-type exon in which the mutation occurs. An explanation of the FASTA-formatted data is available on the web site.
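A downloaded file of this kind could be read with a small generic FASTA reader such as the sketch below. The layout of the informational lines is documented on the web site and is not assumed here, so each header is returned as an unparsed string. The function name `read_fasta` and the example record are hypothetical.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA-formatted text stream.

    The header is returned without the leading '>' and is not parsed further,
    because the field layout of the ASMD header lines is not assumed here.
    """
    header = None
    chunks = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Made-up record, for illustration only.
example = io.StringIO(">record_1 hypothetical description\nACGTAC\nGTTT\n")
for header, sequence in read_fasta(example):
    print(header, len(sequence))
```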

ASMD usage
We expect researchers interested in understanding alternative splicing (AS) to use the ASMD in their investigations in two complementary ways. First, by searching the ASMD for genes, exons, and mutations of interest, researchers may be able to link observed AS isoforms with particular mutations and their correlated sequence features, such as putative RNA secondary structures. Second, by depositing new mutations and their splicing effects into the ASMD, researchers can interactively improve the power and utility of this resource. Because the ASMD focuses on the effects of mutations, it functions differently from other existing AS databases. Instead of receiving an exhaustive list of observed alternative splicing events for a gene or exon of interest, a researcher using the ASMD can expect to find a curated list of small mutations that are correlated with alternative splicing effects, as documented in the literature. This will enable researchers to design experiments accordingly, either to avoid duplication of effort or to further the understanding of AS regulation, both at specific loci and in general.

Future development
The main task for the ASMD is to expand its dataset to cover all known mutations that affect splicing. The process of culling examples from the literature continues and new mutations are being added monthly. We are in the process of updating our sequences to build 36.1 of the human genome. Updates for tools and calculations will be performed every six months as the database grows.
Currently, entries are limited to mutations inside human exons. In future releases we wish to expand the domain to include mutations inside introns and in other mammalian species. Accordingly, we plan to expand our analysis of RNA secondary structures into all parts of pre-mRNA including introns and splicing junctions. Once a sufficient variety of exonic and intronic mutations is obtained for a given gene, a new display will be added to capture the effects of multiple mutations on alternative splicing. Where data exist, this display could also capture the synergistic effects of multiple mutations, a phenomenon already documented in the literature [23].

ASMD data analysis
ASMD version 1.1 data demonstrate that mutations affecting splicing are located throughout exons and are not restricted to the ends near splice junctions (see Fig. 4). An analysis of 34 unique exons in the database shows that their splice site strengths have median scores slightly below those of all human exons (see Fig. 5). The difference is small, however, compared to the standard deviation and is judged not to be significant.
Mutations affecting splicing might also be enriched within local RNA secondary structures (LRSS). Further, those mutations within LRSS may specifically avoid loops and may have a special preference for "dangling ends" (bases adjacent to helices in free ends and multi-loops).
We first observed that there are no strong LRSS in wild-type exons with mutations conferring a positive splicing effect (i.e. decreased skipping). The only putative LRSS in this subset of exons has a calculated minimum free energy (mfe) of -9.0 kcal/mol. None of the splice-affecting mutations in that exon (exon #10 of the CFTR gene) coincides with this putative secondary structure.
The ASMD version 1.1 dataset contains 91 mutations conferring a negative splicing effect (i.e. increased skipping). There is a greater prevalence of putative LRSS in the exons carrying these mutations: 11% of the bases in these exon sequences are within putative LRSS. Compared with random expectation, the number of observed mutations within LRSS represents an average enrichment of 21% across ten different combinations of folding parameters. The mutations that occur within putative LRSS of -10 kcal/mol or stronger are ASMD IDs 12, 25, 46, 47, 49, 52, 60, 73, 112, and 116.
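The enrichment figure can be read as the relative excess of observed over expected mutations within LRSS. The sketch below assumes that "random expectation" means mutations fall uniformly over exon bases, so the expected count is the number of mutations multiplied by the fraction of bases within LRSS; the text does not spell this out. The function name `lrss_enrichment` and the example numbers are hypothetical and are not ASMD results.

```python
def lrss_enrichment(n_mutations: int, n_in_lrss: int,
                    fraction_bases_in_lrss: float) -> float:
    """Relative enrichment of mutations inside LRSS versus uniform expectation.

    ASSUMPTION: expected count = n_mutations * fraction of bases inside LRSS.
    Returns 0.0 for no enrichment and, for example, 0.2 for a 20% excess.
    """
    expected = n_mutations * fraction_bases_in_lrss
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return n_in_lrss / expected - 1.0


# Made-up numbers: 100 mutations, 12 inside LRSS, 10% of bases inside LRSS.
print(f"{lrss_enrichment(100, 12, 0.10):.2f}")  # 0.20
```

Averaging this quantity over several folding-parameter combinations would give a single summary value of the kind reported above.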
We also examined the presence of splice-affecting mutations in stems and loops, where stem positions were further broken down into base-pairings, bulges, and dangling ends. Over the same set of parameter combinations, the average percentage of mutations within loops, base-pairings, bulges, and dangling ends is 5, 40, 35, and 20%, respectively.
We judge the current data to indicate a slight trend toward splice-affecting mutations occurring within the stems of local RNA secondary structures, specifically at the "dangling ends." However, subsequent Monte Carlo simulations with the appropriate statistical tests (Chi-squared or Fisher exact) revealed none of these trends to be statistically significant (α = 0.1) with the current data. Statistical evaluation of a larger data set should be performed to confirm or reject these hypotheses.
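A Monte Carlo check of this kind can be sketched as follows. The code places mutations uniformly at random along a single exon and estimates how often the number landing inside structured positions reaches the observed count. It is a simplified illustration, not the simulation procedure used for the ASMD analysis, and it does not reproduce the Chi-squared or Fisher exact tests. The function name, its parameters, and the example inputs are hypothetical.

```python
import random
from typing import Iterable


def monte_carlo_p_value(exon_length: int,
                        structured_positions: Iterable[int],
                        n_mutations: int,
                        observed_in_structure: int,
                        n_trials: int = 10_000,
                        seed: int = 0) -> float:
    """One-sided empirical p-value for enrichment of mutations in structures.

    ASSUMPTION: mutations are placed independently and uniformly over the
    0-based positions 0..exon_length-1 of one exon.
    """
    rng = random.Random(seed)
    structured = set(structured_positions)
    at_least_as_extreme = 0
    for _trial in range(n_trials):
        count = sum(1 for _m in range(n_mutations)
                    if rng.randrange(exon_length) in structured)
        if count >= observed_in_structure:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_trials + 1)


# Made-up example: a 150-nt exon with positions 40..69 inside a structure.
p = monte_carlo_p_value(150, range(40, 70), n_mutations=5,
                        observed_in_structure=2)
print(f"empirical p = {p:.3f}")
```

With a threshold of α = 0.1, a trend would be called significant under this sketch only if the returned value fell below 0.1.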

Conclusion
The ASMD represents a collection of small internal exonic mutations, not associated with splicing junctions, that change the pattern of alternative splicing. The ASMD web site allows a user to explore the connections between mutations and features of their surrounding sequences, including putative RNA secondary structures and strengths of splice junctions. As the database grows, so too will the predictive power of associated tools and our understanding of the mechanisms regulating alternative splicing. By creating the ASMD public deposition area, we encourage the scientific community to participate in the development of the database.

Methods
All calculations were performed using the ASMD dataset version 1.1, which contained 119 mutations in 37 exons. The ASMD is implemented using MySQL and PHP on GNU/Linux.
A set of 20,433 sequences of human intron-containing protein coding genes from the Exon-Intron Database [20,21] was purged of all homologs (≥50% protein identity) and of genes with multiple repetitive domains (more than 4 repeats of the same 5-aa fragment) to obtain a reduced set of 11,316 human genes. This sample of nonredundant human genes is available from our web page http://hsc.utoledo.edu/depts/bioinfo/asmd/ as file "HS35.1.purge3.dEID".
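The repetitive-domain criterion can be illustrated with a short k-mer count. The sketch flags a protein when any 5-residue fragment occurs more than 4 times. Counting overlapping occurrences is an assumption, since the text does not say how repeats were counted, and the homolog purge at ≥50% identity is not shown. The function name `has_repetitive_domain` and the example sequences are hypothetical.

```python
from collections import Counter


def has_repetitive_domain(protein: str, k: int = 5, max_repeats: int = 4) -> bool:
    """Return True if any k-residue fragment occurs more than max_repeats times.

    ASSUMPTION: occurrences are counted at every position, so overlapping
    copies of a fragment are all included.
    """
    counts = Counter(protein[i:i + k] for i in range(len(protein) - k + 1))
    return any(n > max_repeats for n in counts.values())


# Made-up sequences: five copies of one fragment are flagged, four are not.
print(has_repetitive_domain("MKT" + "GPPGA" * 5))  # True
print(has_repetitive_domain("MKT" + "GPPGA" * 4))  # False
```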