ASMD dataset and browsing features
The ASMD web site consists of three main sections: a home page, a search page, and a public depositions area. The home page is the starting point and links to all parts of the site. The search page is used to locate mutations and splicing effects in the database. The complete search form is at the bottom of this page, and from it users can view mutations as well as the sequences of genes and exons. Figure 1 shows four entries of the ASMD and Figure 2 shows the search form. Figure 3 shows part of the detailed mutation view, which is reached through the ASMD identifier. The public depositions area contains instructions and forms for submitting mutations, published references, and notes.
The ASMD sequence data are available in FASTA format from a link on the home page. The header lines in the file describe the gene, the mutation, and the associated splicing effect(s), while each sequence is the wild-type exon in which the mutation occurs. An explanation of the FASTA-formatted data is available on the web site.
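Readers who want to process the downloaded file programmatically could use something like the Python sketch below. It reads FASTA records and splits each header into fields. The header layout it assumes, with '|'-separated key=value pairs, is hypothetical. The actual layout is described on the ASMD web site and may differ, so `parse_header` in particular would need adapting. The function names are illustrative only.

```python
from typing import Dict, Iterator, List, Optional, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # a sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def parse_header(header: str) -> Dict[str, str]:
    """Split a header into fields.

    HYPOTHETICAL layout: '|'-separated 'key=value' pairs. Adapt this to the
    header format documented on the ASMD web site.
    """
    fields: Dict[str, str] = {}
    for part in header.split("|"):
        key, sep, value = part.partition("=")
        if sep:  # ignore parts that contain no '='
            fields[key.strip()] = value.strip()
    return fields
```

Once the header fields are parsed, the (header, exon sequence) pairs can be filtered by gene or by splicing effect.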
ASMD usage
We expect that researchers interested in alternative splicing (AS) will use ASMD in two complementary ways. By searching ASMD for genes, exons, and mutations of interest, researchers may be able to link observed AS isoforms with particular mutations and with correlated sequence features, such as putative RNA secondary structures. By depositing new mutations and their splicing effects into ASMD, researchers will in turn help improve the power and utility of this resource.
Because ASMD focuses on the effects of mutations, it functions differently from existing AS databases. Instead of an exhaustive list of observed alternative splicing events for a gene or exon of interest, a researcher using ASMD can expect to find a curated list of small mutations that are correlated with alternative splicing effects, as documented in the literature. Researchers can then design experiments accordingly, either to avoid duplicating effort or to further the understanding of AS regulation, both at specific loci and in general.
Future development
The main task for ASMD is to expand its dataset to cover all known mutations that affect splicing. Culling examples from the literature is ongoing, and new mutations are added monthly. We are updating our sequences to build 36.1 of the human genome. Tools and calculations will be updated every six months as the database grows.
Currently, entries are limited to mutations inside human exons. In future releases we plan to extend the domain to mutations inside introns and to other mammalian species. Accordingly, we plan to extend our analysis of RNA secondary structures to all parts of the pre-mRNA, including introns and splice junctions. Once a sufficient variety of exonic and intronic mutations has been obtained for a given gene, a new display will be added to capture the effects of multiple mutations on alternative splicing. Where data exist, this display could also capture the synergistic effects of multiple mutations, a phenomenon already documented in the literature [23].
ASMD data analysis
ASMD version 1.1 data demonstrate that mutations affecting splicing are located throughout exons and are not restricted to the ends near splice junctions (see Fig. 4). An analysis of 34 unique exons in the database shows that their splice site strengths have median scores slightly below those of all human exons (see Fig. 5). The difference is small, however, compared to the standard deviation and is judged not to be significant.
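Both observations above rest on simple summaries. The following sketch shows one way to compute them. `relative_position` maps a mutation's position onto a scale from 0 (5' end) to 1 (3' end), so that exons of different lengths can be pooled into a histogram. `median_shift_in_sd` expresses a difference of medians in units of the reference standard deviation. Both functions are hypothetical illustrations. The 1-based exon coordinates and the five equal-width bins are assumptions. The splice-site scores are plain inputs, and no particular scoring method is implied.

```python
import statistics
from typing import Iterable, List, Sequence, Tuple


def relative_position(pos: int, exon_length: int) -> float:
    """Map a 1-based position within an exon to [0, 1] (0 = 5' end, 1 = 3' end)."""
    if exon_length < 2:
        raise ValueError("exon must be at least 2 nt long")
    if not 1 <= pos <= exon_length:
        raise ValueError("position lies outside the exon")
    return (pos - 1) / (exon_length - 1)


def position_histogram(mutations: Iterable[Tuple[int, int]], n_bins: int = 5) -> List[int]:
    """Count (position, exon_length) mutations per equal-width bin of relative position."""
    counts = [0] * n_bins
    for pos, length in mutations:
        r = relative_position(pos, length)
        counts[min(int(r * n_bins), n_bins - 1)] += 1  # r == 1.0 goes into the last bin
    return counts


def median_shift_in_sd(subset: Sequence[float], reference: Sequence[float]) -> float:
    """Difference of medians (subset minus reference) in units of the reference sample SD."""
    return (statistics.median(subset) - statistics.median(reference)) / statistics.stdev(reference)
```

A roughly flat histogram, rather than counts piled into the first and last bins, corresponds to mutations spread throughout the exon. A median shift that is small in absolute value relative to 1 corresponds to a difference that is small compared with the standard deviation.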
ASMD version 1.1 data suggest that mutations affecting splicing are somewhat enriched within local RNA secondary structures (LRSS). Further, mutations within LRSS may avoid loops and may preferentially occur at "dangling ends" (unpaired bases adjacent to helices in free ends and multi-loops).
We first observed that wild-type exons carrying mutations with a positive splicing effect (i.e. decreased skipping) contain no strong LRSS. The only putative LRSS in this subset of exons has a calculated minimum free energy (mfe) of -9.0 kcal/mol, and none of the splice-affecting mutations in that exon (exon 10 of the CFTR gene) coincide with it.
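The check described above asks whether any splice-affecting mutation lies inside a sufficiently stable putative LRSS. It can be sketched in Python as an interval-overlap test. The `LRSS` record, the 1-based inclusive coordinates, and the function names are hypothetical. The default cutoff of -10 kcal/mol is borrowed from the threshold applied to the negative-effect mutations. Under that cutoff, a -9.0 kcal/mol structure does not count as strong.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class LRSS:
    """Hypothetical record for a putative local RNA secondary structure."""
    start: int   # first exon position covered (1-based, inclusive)
    end: int     # last exon position covered (1-based, inclusive)
    mfe: float   # minimum free energy in kcal/mol; more negative = more stable


def strong_structures(structures: Sequence[LRSS], cutoff: float = -10.0) -> List[LRSS]:
    """Keep structures at least as stable as the cutoff (mfe <= cutoff)."""
    return [s for s in structures if s.mfe <= cutoff]


def mutations_in_structures(positions: Sequence[int], structures: Sequence[LRSS]) -> List[int]:
    """Return the mutation positions that fall inside any of the given structures."""
    return [p for p in positions if any(s.start <= p <= s.end for s in structures)]
```

Given only a single structure of -9.0 kcal/mol, `strong_structures` returns an empty list.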
The ASMD version 1.1 dataset contains 91 mutations conferring a negative splicing effect (i.e. increased skipping), and putative LRSS are more prevalent in the exons carrying these mutations: 11% of the bases in these exon sequences lie within putative LRSS. Averaged over ten combinations of folding parameters, the number of mutations observed within LRSS exceeds the random expectation by 21%. The mutations that occur within putative LRSS of -10 kcal/mol or stronger are ASMD IDs 12, 25, 46, 47, 49, 52, 60, 73, 112, and 116.
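The sketch below shows one way to compute such an enrichment figure. It pools all exons carrying negative-effect mutations. It then compares the observed fraction of mutations inside LRSS with the fraction of bases inside LRSS, which is the share expected if mutations were placed uniformly at random. Finally, it averages over folding-parameter combinations. The uniform-placement expectation, the `ExonFold` record, and the function names are assumptions for illustration. They do not describe the exact procedure behind the 21% figure.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Sequence, Set


@dataclass
class ExonFold:
    """Hypothetical per-exon folding result for one parameter combination."""
    length: int                                          # exon length in nt
    structured: Set[int] = field(default_factory=set)    # 1-based positions inside putative LRSS
    mutations: List[int] = field(default_factory=list)   # 1-based positions of splice-affecting mutations


def pooled_enrichment(exons: Sequence[ExonFold]) -> float:
    """Return observed / expected - 1, pooling mutations and bases over all exons.

    ASSUMPTION: the random expectation is uniform placement, so the expected
    fraction of mutations inside LRSS equals the fraction of bases inside LRSS.
    """
    n_mut = sum(len(e.mutations) for e in exons)
    n_mut_in = sum(sum(p in e.structured for p in e.mutations) for e in exons)
    n_bases = sum(e.length for e in exons)
    n_struct = sum(len(e.structured) for e in exons)
    if n_mut == 0 or n_struct == 0:
        raise ValueError("need at least one mutation and one structured base")
    observed = n_mut_in / n_mut
    expected = n_struct / n_bases
    return observed / expected - 1.0


def mean_enrichment(runs: Sequence[Sequence[ExonFold]]) -> float:
    """Average the pooled enrichment over several folding-parameter combinations."""
    return mean(pooled_enrichment(run) for run in runs)
```

In this formulation, a mean value of 0.21 would correspond to a 21% average enrichment.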
We also examined where splice-affecting mutations fall within stems and loops, further dividing stem positions into base pairs, bulges, and dangling ends. Over the same set of parameter combinations, the average percentages of mutations within loops, base pairs, bulges, and dangling ends are 5, 40, 35, and 20%, respectively.
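The breakdown above requires labelling every position of a folded exon. The Python sketch below does this from a structure in dot-bracket notation, the format written by common folding programs such as RNAfold. The labelling rules are assumptions made for illustration:

- Hairpin-loop bases are 'loop'.
- Unpaired bases in a loop whose closing pair encloses exactly one helix are 'bulge'. This covers bulges and interior loops alike.
- Unpaired bases next to a helix in the exterior region or in a multiloop are 'dangling'.
- Other multiloop bases are 'loop'.
- Exterior bases that do not touch a helix are 'unstructured'.

```python
from typing import List, Optional


def pair_table(db: str) -> List[Optional[int]]:
    """Partner index of each position in a dot-bracket string (None if unpaired)."""
    stack: List[int] = []
    pairs: List[Optional[int]] = [None] * len(db)
    for i, c in enumerate(db):
        if c == "(":
            stack.append(i)
        elif c == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif c != ".":
            raise ValueError(f"unexpected character {c!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def count_branches(pairs: List[Optional[int]], i: int, j: int) -> int:
    """Number of helices directly enclosed by the pair (i, j)."""
    k, branches = i + 1, 0
    while k < j:
        partner = pairs[k]
        if partner is not None:  # opening base of a directly enclosed helix
            branches += 1
            k = partner + 1      # skip over the whole helix
        else:
            k += 1
    return branches


def next_to_helix(pairs: List[Optional[int]], k: int) -> bool:
    """True if position k has a paired neighbour."""
    return (k > 0 and pairs[k - 1] is not None) or (k + 1 < len(pairs) and pairs[k + 1] is not None)


def classify(db: str) -> List[str]:
    """Label each position as 'pair', 'loop', 'bulge', 'dangling', or 'unstructured'.

    ASSUMED rules (illustrative, not necessarily those used for ASMD):
    hairpin loop -> 'loop'; loop enclosing one helix -> 'bulge';
    exterior or multiloop base next to a helix -> 'dangling';
    other multiloop base -> 'loop'; other exterior base -> 'unstructured'.
    """
    pairs = pair_table(db)  # also validates the structure
    labels: List[str] = []
    stack: List[int] = []   # open brackets enclosing the current position
    for k, c in enumerate(db):
        if c == "(":
            stack.append(k)
            labels.append("pair")
        elif c == ")":
            stack.pop()
            labels.append("pair")
        elif not stack:     # exterior region, including the free 5' and 3' ends
            labels.append("dangling" if next_to_helix(pairs, k) else "unstructured")
        else:
            opener = stack[-1]
            closer = pairs[opener]
            assert closer is not None
            branches = count_branches(pairs, opener, closer)
            if branches == 0:
                labels.append("loop")
            elif branches == 1:
                labels.append("bulge")
            else:           # multiloop
                labels.append("dangling" if next_to_helix(pairs, k) else "loop")
    return labels
```

Tallying `labels[p - 1]` over the 1-based positions p of mutations that lie inside LRSS gives per-category shares of the kind reported above.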
We judge the current data to indicate a slight trend toward splice-affecting mutations occurring within the stems of local RNA secondary structures, specifically at the "dangling ends." However, subsequent Monte Carlo simulations with appropriate statistical tests (chi-squared or Fisher's exact test) showed that none of these trends is statistically significant (α = 0.1) with the current data. Statistical evaluation of a larger dataset is needed to confirm or reject these hypotheses.
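The two kinds of test mentioned above can be sketched as follows. `monte_carlo_p` re-places each mutation uniformly at random within its own exon. It reports the fraction of simulations with at least as many mutations inside LRSS as observed. `fisher_greater` is a one-sided Fisher exact test computed from the hypergeometric tail. It works on a 2×2 table of bases, classified as mutated or not and as inside LRSS or not. The following are all assumptions, and the sketch is not the authors' exact procedure:

- the tuple input format,
- the function names,
- the uniform null model,
- treating each mutation as a distinct base.

A trend would be called significant only if p < α, with α = 0.1 as in the text.

```python
import random
from math import comb
from typing import List, Sequence, Set, Tuple

# Hypothetical input: (exon length, 1-based positions inside LRSS, 1-based mutation positions)
Exon = Tuple[int, Set[int], List[int]]


def count_in_lrss(exons: Sequence[Exon]) -> int:
    """Number of mutations that fall inside putative LRSS."""
    return sum(sum(p in structured for p in muts) for _, structured, muts in exons)


def monte_carlo_p(exons: Sequence[Exon], n_sim: int = 10_000, seed: int = 0) -> float:
    """One-sided Monte Carlo p-value for enrichment of mutations inside LRSS.

    Null model (ASSUMED): each mutation lies at a uniformly random position of its exon.
    """
    rng = random.Random(seed)
    observed = count_in_lrss(exons)
    at_least_as_extreme = 0
    for _ in range(n_sim):
        simulated = sum(
            rng.randint(1, length) in structured
            for length, structured, muts in exons
            for _mut in muts
        )
        if simulated >= observed:
            at_least_as_extreme += 1
    # add-one correction keeps the estimate strictly positive
    return (at_least_as_extreme + 1) / (n_sim + 1)


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact p-value, P(X >= a), for the 2x2 table [[a, b], [c, d]].

    Rows: mutated / unmutated bases; columns: inside / outside LRSS.
    """
    n_total = a + b + c + d
    in_lrss = a + c   # bases inside LRSS
    mutated = a + b   # mutated bases
    upper = min(in_lrss, mutated)
    tail = sum(comb(in_lrss, x) * comb(n_total - in_lrss, mutated - x) for x in range(a, upper + 1))
    return tail / comb(n_total, mutated)
```

The Monte Carlo version extends to other statistics by changing what is counted, for example the share of mutations at dangling ends.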