- Technical Note
- Open Access
BMC Research Notes volume 8, Article number: 70 (2015)
Sequence feature annotations (e.g., protein domain boundaries, binding sites, and secondary structure predictions) are an essential part of biological research. Annotations are widely used by scientists during research and experimental design, and are frequently the result of biological studies. A generalized and simple means of disseminating and visualizing these data via the web would be of value to the research community.
Mason is a solution for dissemination of sequence annotation data on the web. It is highly flexible, customizable, simple to use, and is designed to be easily integrated into web sites. Mason is open source and freely available at https://github.com/yeastrc/mason.
Annotating regions or features within nucleotide and protein sequences (such as locations of binding sites, conserved residues, transmembrane regions, protein domain boundaries, or protein secondary structure) is a ubiquitous part of biological research. Previous annotations are an essential component of experimental design and interpretation, and new sequence annotations are often the goal of new studies—themselves becoming part of subsequent experimental design and interpretation in future studies. Given the growth of sequence annotation data and the importance of these data in research, it is becoming increasingly important to effectively disseminate and visualize these data. Of particular importance is the ability to merge separate sequence annotations into a single view that allows for the interpretation of new data in the context of known annotations.
Aligning and displaying multiple sequence annotations is already a core feature of genome browsers, software designed for navigating whole genomes and capable of visualizing a very wide array of annotations for genetic loci. Prominent examples of genome browsers include the UCSC genome browser, GBrowse, the Ensembl genome browser, and JBrowse. While these tools are well-designed, mature, and feature rich, they are not designed to disseminate feature annotations for individual sequences outside the context of a broader genome. Other websites have developed web pages for displaying aligned feature annotations of individual protein sequences, including the UCSC Proteome Browser, the Protein Data Bank (PDB), InterPro, WormBase, and the Saccharomyces Genome Database (SGD). While well-designed and informative, these views are optimized for the particular features they are displaying. Additionally, they are only available as parts of their respective websites and not as a generalized distributable tool that may be integrated into other websites.
Full implementation details, including examples and documentation of the interfaces for callback functions, input data format, and the customization options are provided at the Mason GitHub site at https://github.com/yeastrc/mason. Additionally, this site includes several pre-built modules for common sources of sequence annotations. These are discussed in more detail in the Results section.
The data will be read in from the indicated file location and a Mason viewer will be automatically created at the location of the DIV. (Note: because of web browser security models, the JSON file must be served by a web server, and from the same web server address as the HTML file referencing it.) Alternatively, the JSON text may be placed within the page itself, by leaving out the file-location attribute and assigning the “masonData” variable equal to the text contents of the file inside of a <script> element. For full documentation, including the syntax of the JSON, examples, and download files for the generic JSON viewer, visit the Mason demo page at http://www.yeastrc.org/mason/.
The viewer is created from four inputs: the location on the page at which to build the viewer (a jQuery variable), the data to be displayed, the configuration parameters, and an object containing the customized callback functions that constitute a module for a given type of sequence annotation. Note that multiple Mason viewers may be added to the same page by making multiple calls to the viewer creation function.
Detailed documentation for installation, the input data format, configuration parameters, and the callback functions is available at the Mason GitHub site at https://github.com/yeastrc/mason.
Graphical user interface
The Mason viewer graphically represents a sequence horizontally, with position 1 on the left and the final position on the right. Each set of feature annotations is represented as a separate row, where each annotation includes a starting and ending position in the sequence. These annotations are represented as blocks in that row that start and end at the specified positions (Figure 1). Mason is capable of displaying multiple rows of annotations per viewer, which is intended for displaying multiple sets of annotations of the same type from separate sources (e.g., sets of secondary structure predictions from different programs or protein coverage from multiple proteomics experiments) (Figure 2). Because sequence positions are consistent between multiple rows in the Mason viewer, the positions of the annotations may be directly compared between the different rows. Additionally, multiple Mason viewers containing data of different types may be available on the same page (e.g., one viewer for secondary structure predictions and one viewer for disordered regions) (Figure 2). The positions in the sequences between different viewers also line up and may also be directly compared. Furthermore, Mason is aware of multiple instances of the Mason viewer on the same page, and provides a visual indication of how annotations in distinct viewers line up when the user moves the mouse pointer over an annotation of interest (or taps it on mobile devices) (Figure 2).
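The horizontal layout described above amounts to a linear mapping from sequence positions to pixels. The following is a minimal sketch of that mapping, not Mason's actual code; the function name `blockGeometry` and its parameters are hypothetical, and positions are assumed to be 1-based and inclusive.

```javascript
// Hypothetical helper (not part of Mason's API): map a 1-based, inclusive
// annotation range onto horizontal pixel coordinates for a row of fixed width.
function blockGeometry(start, end, sequenceLength, rowWidthPx) {
  if (start < 1 || end > sequenceLength || start > end) {
    throw new RangeError("annotation must satisfy 1 <= start <= end <= sequenceLength");
  }
  const pxPerResidue = rowWidthPx / sequenceLength;
  return {
    x: (start - 1) * pxPerResidue,           // left edge of the first residue
    width: (end - start + 1) * pxPerResidue  // covers both end positions
  };
}

// Positions 11-20 of a 100-residue sequence drawn in a 500 px wide row.
const geometry = blockGeometry(11, 20, 100, 500);
console.log(geometry); // { x: 50, width: 50 }
```

Because every row and every viewer would use the same mapping for a given sequence length and width, blocks at the same positions line up vertically, which is what makes the direct comparison between rows possible.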
Overlapping feature annotations
Feature annotations may sometimes overlap in the sequence. For example, annotation A may describe positions 2–10 and annotation B may describe positions 8–19, creating overlapping annotations for positions 8–10. Visually, this will appear as a single block from positions 2–19; however, a clickable icon will appear to the left of the row label to indicate that overlapping annotations are present. When clicked, that row will expand such that overlapping features are displayed in multiple rows, ensuring all distinct annotated features may be displayed (Figure 3).
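One way to picture the collapsed and expanded views is as two standard interval operations: merging overlapping ranges into single blocks, and greedily assigning ranges to rows so that no two in a row overlap. The sketch below illustrates both under the assumption that annotations are plain `{start, end}` objects with 1-based inclusive positions. The function names are hypothetical, and this is not necessarily how Mason implements the feature.

```javascript
// Collapsed view: merge annotations that share at least one position.
function mergeOverlapping(annotations) {
  const sorted = [...annotations].sort((a, b) => a.start - b.start || a.end - b.end);
  const merged = [];
  for (const { start, end } of sorted) {
    const last = merged[merged.length - 1];
    if (last && start <= last.end) {
      last.end = Math.max(last.end, end); // extend the current block
    } else {
      merged.push({ start, end });        // copy, so the input is not mutated
    }
  }
  return merged;
}

// Expanded view: place each annotation in the first row where it fits.
function assignRows(annotations) {
  const sorted = [...annotations].sort((a, b) => a.start - b.start || a.end - b.end);
  const rows = [];
  for (const annotation of sorted) {
    const row = rows.find(r => r[r.length - 1].end < annotation.start);
    if (row) row.push(annotation);
    else rows.push([annotation]);
  }
  return rows;
}

const annotations = [{ start: 2, end: 10 }, { start: 8, end: 19 }, { start: 25, end: 30 }];
const collapsed = mergeOverlapping(annotations); // blocks 2-19 and 25-30
const expanded = assignRows(annotations);        // row 1: 2-10 and 25-30; row 2: 8-19
```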
Tooltips and click events
Text to appear in a tooltip when the user mouses-over (or taps) on any annotated feature may be defined in a callback function passed into the Mason viewer creator (see Implementation). Examples include displaying the starting and ending positions and the confidence scores associated with the annotation. Likewise, the result of clicking (or double tapping) on any of the annotated features may be similarly defined via another callback function. This may be useful as a means for users to click through to another web page with more information about the specific annotation.
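As an illustration of this callback pattern, the sketch below defines one function that builds tooltip text and one that builds a click-through URL. The object layout, the property names `tooltipText` and `clickUrl`, the annotation fields, and the example.org address are all assumptions made for illustration; the names Mason actually expects are given in its documentation.

```javascript
// Hypothetical callback object; the property names are illustrative,
// not the names Mason expects.
const callbacks = {
  // Text shown when the user hovers over (or taps) a block.
  tooltipText(annotation) {
    return `${annotation.label}: positions ${annotation.start}-${annotation.end}, ` +
           `confidence ${annotation.confidence.toFixed(2)}`;
  },
  // Page opened when the user clicks (or double taps) a block.
  clickUrl(annotation) {
    return "https://example.org/annotations/" + encodeURIComponent(annotation.id);
  }
};

// Assumed annotation shape, for demonstration only.
const example = { id: "TM 1", label: "Transmembrane helix", start: 12, end: 34, confidence: 0.912 };
const tooltip = callbacks.tooltipText(example);
const url = callbacks.clickUrl(example);
console.log(tooltip); // Transmembrane helix: positions 12-34, confidence 0.91
console.log(url);     // https://example.org/annotations/TM%201
```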
Colors and shading
The color of the blocks in the Mason viewer may be customized via a callback function that has access to the data associated with the annotations. This enables a very broad range of capabilities regarding data visualization. Coloring schemes may range from simple (all blocks are the same color) to more sophisticated schemes that use shading to indicate annotation confidence scores or separate colors to indicate annotation properties (such as different colors for alpha-helices and beta-sheets in secondary structure predictions).
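A data-driven coloring scheme of the kind described here could look like the following sketch, which picks a base color by annotation type and blends it toward white as confidence drops. The function name `blockColor`, the `type` and `confidence` fields, and the specific colors are assumptions for illustration, not part of Mason.

```javascript
// Base colors per annotation type (arbitrary choices for this sketch).
const BASE_COLORS = {
  helix: [200, 30, 30],
  strand: [30, 60, 200]
};

// Hypothetical color callback: returns a hex color for one annotation.
function blockColor(annotation) {
  const base = BASE_COLORS[annotation.type] || [120, 120, 120];
  // Clamp confidence into [0, 1] before using it as a blend weight.
  const confidence = Math.min(1, Math.max(0, annotation.confidence));
  // confidence 1 gives the base color, confidence 0 gives white.
  const shaded = base.map(c => Math.round(255 - (255 - c) * confidence));
  return "#" + shaded.map(c => c.toString(16).padStart(2, "0")).join("");
}

const strongHelix = blockColor({ type: "helix", confidence: 1 }); // "#c81e1e"
const weakHelix = blockColor({ type: "helix", confidence: 0 });   // "#ffffff"
```

Computing the color from the data in this way avoids having to pre-define a fixed set of color classes.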
Lines noting positions of interest
Mason may also display vertical lines at specific positions in the rows to note positions of interest that aid in interpretation of the data. Examples would include noting cleavage sites in DNA sequences or trypsin cut sites in protein sequences (Figure 4). The positions at which to draw lines are passed into the Mason creator, the color of the lines is defined via callback functions, and the visibility of the lines may be toggled via a simple function call to the Mason viewer.
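For the trypsin example, the positions to mark could be computed from the sequence itself. The sketch below uses the commonly applied simplified rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P). The function name is hypothetical and the rule is a simplification, not something taken from Mason.

```javascript
// Returns the 1-based positions after which trypsin is expected to cut,
// using the simplified rule: cut after K or R, but not before P.
function trypsinCutSites(sequence) {
  const sites = [];
  // The last residue is skipped: there is nothing after it to cut from.
  for (let i = 0; i < sequence.length - 1; i++) {
    const residue = sequence[i];
    if ((residue === "K" || residue === "R") && sequence[i + 1] !== "P") {
      sites.push(i + 1);
    }
  }
  return sites;
}

const cutSites = trypsinCutSites("MKRPAKLR");
console.log(cutSites); // [ 2, 6 ]
```

Here the R at position 3 is skipped because it is followed by P, and the R at position 8 is skipped because it is the final residue.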
Mason may optionally show a summary bar on the right-hand side of the rows to visually indicate some type of summary statistic associated with the entire row of sequence annotations. Examples include showing protein quantitation data or protein sequence coverage for a given mass spectrometry run. Multiple rows containing summary bars effectively provide a horizontal bar graph for comparing summary statistics between rows. Custom colors, shading, tooltips, and click handlers may be defined for the summary bars using callback functions.
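As an example of a row-level summary statistic, protein sequence coverage can be computed as the fraction of positions covered by at least one annotation, so that overlapping peptides are not counted twice. The sketch below assumes `{start, end}` annotations with 1-based inclusive positions; the function name is hypothetical.

```javascript
// Fraction of the sequence covered by at least one annotation.
function sequenceCoverage(annotations, sequenceLength) {
  const covered = new Uint8Array(sequenceLength); // one flag per position
  for (const { start, end } of annotations) {
    const first = Math.max(1, start);
    const last = Math.min(end, sequenceLength);
    for (let position = first; position <= last; position++) {
      covered[position - 1] = 1;
    }
  }
  const coveredCount = covered.reduce((sum, flag) => sum + flag, 0);
  return coveredCount / sequenceLength;
}

// Two overlapping peptides covering positions 2-19 of a 20-residue protein.
const peptides = [{ start: 2, end: 10 }, { start: 8, end: 19 }];
const coverage = sequenceCoverage(peptides, 20);
console.log(coverage); // 0.9
```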
The Mason viewer has been integrated into two upcoming (not yet published) large-scale proteomics data resources (Figure 5). In the first case (Figure 5A), Mason is used to visualize the relative abundance of a protein and the relative abundance of the individual peptides used to identify that protein across many different conditions. This implementation of Mason makes use of the summary bar feature (to the right of the rows) to show overall relative protein abundance, makes use of data-driven coloring and shading to provide an indicator for relative abundances of the peptides, and makes use of Mason’s ability to disambiguate overlapping annotations to show relative abundances of distinct peptides that were identified and to provide links for viewing the underlying mass spectrometry data collected for each peptide. FeatureViewer and pViz.js would not be suitable solutions for this visualization, as the row level summaries and dynamic disambiguation of overlapping annotations are essential aspects of this view of the data. Additionally, coloring and shading that describe underlying values in the data (such as quality or quantity of identifications) would be difficult to accomplish by pre-defining classes of colors using CSS, which is the default coloring model used by pViz.js.
In the second case (Figure 5B), Mason is again used to visualize the coverage of a protein in many different conditions, but in this case those conditions are separate proteomics experiments in which the protein was identified. The top viewer in the figure visualizes the coverage of this protein in many different runs, where the shading of the red blocks indicates the strength of the identification, and the row-level summary bars to the right indicate the overall protein coverage in that run. In this type of data, identifying overlapping peptides for a protein in an experiment is very common, so the ability to handle many overlapping annotations for the protein is essential to effectively disseminating the data. Attempting to show all disambiguated peptides from all runs at once in multiple tracks would result in a much more cluttered and non-informative view. To provide context, the remaining viewers on the page display annotations for this protein from other sources and make use of Mason’s ability to communicate between instances of the viewer to show precisely how annotations in one viewer map to the others.
Several pre-built code examples are available for displaying data from common sources of sequence annotations. Working demos and downloads are available at http://www.yeastrc.org/mason/.
Generic JSON module
The Mason site includes code for reading and displaying data formatted as JSON adhering to a simplified schema (available on the web site). This module is suitable for providing a simple view of sequence annotation data from nearly any source, especially data that has many overlapping annotations. This module supports overlapping features, tooltips, links to external URLs, and row-level coloring.
Transmembrane and signal peptides
The Mason site includes code for displaying transmembrane and signal peptide predictions from the Philius prediction server. The code accepts a protein sequence directly, submits this to the Philius prediction server, and displays the results in the newly-built Mason viewer. Only the protein sequence is required, and there is no need to install or run Philius on the part of the web site operator.
The Mason site includes code for displaying predicted protein secondary structure as generated by the psipred program. This is accomplished by pointing the code to the URL for a .ss2 file (PSIPRED VFORMAT) that is generated by the psipred program; the code for accessing the data and converting it to JSON is provided. Consequently, psipred must be run in advance and the resulting file made available on a web server.
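To give a sense of what such a conversion involves, the sketch below groups consecutive residues with the same predicted state into start/end annotations. It is not the code distributed with Mason. It assumes the usual .ss2 layout of a `#` header line followed by one line per residue (position, amino acid, predicted state C/H/E, then three probabilities), and the sample text is made up for illustration.

```javascript
// Convert .ss2 text into helix (H) and strand (E) annotations.
// Assumed line layout: "<position> <residue> <state> <pCoil> <pHelix> <pStrand>".
function parseSs2(text) {
  const segments = [];
  let current = null;
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith("#")) continue; // skip header and blanks
    const [positionText, , state] = trimmed.split(/\s+/);
    const position = Number(positionText);
    if (current && current.state === state && current.end === position - 1) {
      current.end = position; // extend the running segment
    } else {
      current = { state, start: position, end: position };
      segments.push(current);
    }
  }
  return segments.filter(segment => segment.state !== "C"); // drop coil
}

// Made-up sample input in the assumed layout.
const sampleSs2 = [
  "# PSIPRED VFORMAT (PSIPRED V4.0)",
  "",
  "   1 M C   0.999  0.000  0.001",
  "   2 K H   0.100  0.890  0.010",
  "   3 L H   0.050  0.940  0.010",
  "   4 V E   0.100  0.050  0.850",
  "   5 G C   0.900  0.050  0.050"
].join("\n");

const structure = parseSs2(sampleSs2);
console.log(JSON.stringify(structure));
```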
The Mason site includes code for displaying predicted coiled-coil regions generated by the Paircoil2 program. This is accomplished by pointing the code to the URL for a .pc2 file that is generated by the Paircoil2 program; the code for accessing the data and converting it to JSON is provided. Consequently, Paircoil2 must be run in advance and the resulting file made available on a web server. This module also includes a custom options menu that allows the user to filter the data based on the P-score generated by Paircoil2.
The Mason site includes code for displaying predicted disordered regions generated by the DISOPRED program. This is accomplished by pointing the code to the URL for a .diso file that is generated by the DISOPRED program; the code for accessing the data and converting it to JSON is provided. Consequently, DISOPRED must be run in advance and the resulting file made available on a web server.
Availability and requirements
Project name: Mason
Project home page: https://github.com/yeastrc/mason
Operating system(s): Platform independent
Other requirements: None
License: Apache 2.0
Any restrictions to use by non-academics: None
Abbreviations
CSS: Cascading style sheets
DAS: Distributed annotation system
GUI: Graphical user interface
HTML: Hypertext markup language
SVG: Scalable Vector Graphics
Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, et al. The human genome browser at UCSC. Genome Res. 2002;12(6):996–1006.
Donlin MJ. Using the Generic Genome Browser (GBrowse). Curr Protoc Bioinformatics. 2009;Chapter 9(9):9.
Flicek P, Amode MR, Barrell D, Beal K, Billis K, Brent S, et al. Ensembl 2014. Nucleic Acids Res. 2014;42(Database issue):D749–55.
Skinner ME, Uzilov AV, Stein LD, Mungall CJ, Holmes IH. JBrowse: a next-generation genome browser. Genome Res. 2009;19(9):1630–8.
Hsu F, Pringle TH, Kuhn RM, Karolchik D, Diekhans M, Haussler D, et al. The UCSC Proteome Browser. Nucleic Acids Res. 2005;33(Database issue):D454–8.
Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, et al. The Protein Data Bank. Nucleic Acids Res. 2000;28(1):235–42.
Hunter S, Jones P, Mitchell A, Apweiler R, Attwood TK, Bateman A, et al. InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Res. 2012;40(Database issue):D306–12.
Harris TW, Baran J, Bieri T, Cabunoc A, Chan J, Chen WJ, et al. WormBase 2014: new views of curated biology. Nucleic Acids Res. 2014;42(Database issue):D789–93.
Cherry JM, Hong EL, Amundsen C, Balakrishnan R, Binkley G, Chan ET, et al. Saccharomyces Genome Database: the genomics resource of budding yeast. Nucleic Acids Res. 2012;40(Database issue):D700–5.
Garcia L, Yachdav G, Martin MJ. FeatureViewer, a BioJS component for visualization of position-based annotations in protein sequences. F1000Res. 2014;3:47.
Desiere F, Deutsch EW, King NL, Nesvizhskii AI, Mallick P, Eng J, et al. The PeptideAtlas project. Nucleic Acids Res. 2006;34(Database issue):D655–8.
Reynolds SM, Kall L, Riffle ME, Bilmes JA, Noble WS. Transmembrane topology and signal peptide prediction using dynamic bayesian networks. PLoS Comput Biol. 2008;4(11):e1000213.
Jones DT. Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol. 1999;292(2):195–202.
McDonnell AV, Jiang T, Keating AE, Berger B. Paircoil2: improved prediction of coiled coils from sequence. Bioinformatics. 2006;22(3):356–8.
Ward JJ, McGuffin LJ, Bryson K, Buxton BF, Jones DT. The DISOPRED server for the prediction of protein disorder. Bioinformatics. 2004;20(13):2138–9.
This work is supported by grants P41 GM103533 (to T.N.D.) from the National Institute of General Medical Sciences of the National Institutes of Health and the University of Washington Proteomics Resource (UWPR95794).
The authors declare that they have no competing interests.
DJ performed the programming and prepared online documentation. TND supported the project, provided scientific guidance, and contributed to the manuscript. MR conceived of and managed the project, and prepared the manuscript. All authors read and approved the final manuscript.
- Sequence annotation
- Data visualization
- Sequence feature annotation
- Feature annotation