mmView: a web-based viewer of the mmCIF format
© Svozil et al; licensee BioMed Central Ltd. 2011
Received: 16 September 2010
Accepted: 12 April 2011
Published: 12 April 2011
Structural biomolecular data are commonly stored in the PDB format. The PDB format is widely supported by software vendors because of its simplicity and readability. However, the PDB format cannot fully address many informatics challenges related to the growing amount of structural data. To overcome the limitations of the PDB format, a new textual format mmCIF was released in June 1997 in its version 1.0. mmCIF provides extra information which has the advantage of being in a computer readable form. However, this advantage becomes a disadvantage if a human must read and understand the stored data. While software tools exist to help to prepare mmCIF files, the number of available systems simplifying the comprehension and interpretation of the mmCIF files is limited.
In this paper we present mmView - a cross-platform web-based application that allows users to comfortably explore the structural data of biomacromolecules stored in the mmCIF format. The mmCIF categories can be easily browsed in a tree-like structure, and the corresponding data are presented in a well-arranged tabular form. The application also allows users to display and investigate biomolecular structures via the integrated Java application Jmol.
The mmView software system is primarily intended for educational purposes, but it can also serve as a useful research tool. The mmView application is offered in two flavors: as an open-source stand-alone application (available from http://sourceforge.net/projects/mmview) that can be installed on the user's computer, and as a publicly available web server.
The Protein Data Bank (PDB) is a publicly available central repository containing experimentally determined structures of proteins, nucleic acids and complex assemblies. The core of the system consists of relational databases together with the so-called "PDB archive", a collection of manually curated flat (i.e. ASCII) files. These files are available for download in three different formats: the legacy PDB format [2–4], the mmCIF format [5, 6] and the PDBML format.
The PDB format is still the most widely supported and used because of its simplicity. It uses fixed-format records (i.e. the individual entries must be placed in specified character positions) with a maximum line width of 80 characters (reminiscent of Fortran's 80-column punched cards), and allows for the description of atomic coordinates, chemical and biochemical features, experimental details of structure determination, and some structural features such as secondary structure assignments or biological assemblies. The current PDB format, being a legacy format, cannot fully address many informatics challenges related to the growing amount of structural data. The main limitation is the fixed-width format, which places absolute limits on the size of data items. For instance, the maximum number of atoms represented in a single PDB file is limited to 99 999, and large molecular systems, such as ribosomal units, cannot be represented in a single PDB entry. The experimental details are stored in REMARK records that are relatively easy for a human to read; however, the automatic extraction of information from these records is rather difficult. The PDB format also suffers from serious internal inconsistencies, such as the relation between the sequence defined by the SEQRES records and the sequence derived from the observed residues within the ATOM records.
To overcome the limitations of the PDB format, a new textual format, mmCIF, was released in June 1997 [5, 6]. mmCIF is an extension of the Crystallographic Information File (CIF) format developed by the International Union of Crystallography (IUCr) and used for the description of small-molecule structures and associated diffraction experiments. The format of the CIF dictionary (and the data based on that dictionary) conforms to a restricted version of the Self Defining Text Archive and Retrieval (STAR) representation. STAR is a general ontology framework defining a set of encoding rules. These rules are then used by the Dictionary Definition Language (DDL), which enables the definition of various terms needed by a given discipline. The DDL provides a convention for naming and defining data items within the dictionary, for declaring specific attributes of those data items, and for declaring relationships between data items. The STAR encoding rules and the DDL are widely used to develop a variety of domain-specific dictionaries, e.g. the powder diffraction dictionary or an NMR dictionary, including the Crystallographic Information File (CIF). The CIF dictionary was extended to include data relevant to macromolecular experiments, and the mmCIF (macromolecular CIF) was created. Version 1.0 of the mmCIF format was further expanded by more than 100 new definitions, leading to the release of mmCIF 2.0 in the fall of 2000.
The name construct is of the form _category.extension (e.g. _atom_site.group_PDB) where category defines a natural grouping of data items contained within a single loop_. The values corresponding to the names from the header are given in individual lines (corresponding to individual data items). Thus, e.g. the value of the isotropic atomic displacement parameter (given by _atom_site.B_iso_or_equiv name) is the 15th value in each of the lines (Figure 1). Unknown data values are represented by a question mark (?), and undefined data are represented by a period (.) (Figure 1).
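The loop_ layout described above can be illustrated with a minimal Python sketch. It assumes a simplified fragment in which values are separated by whitespace and contain no quoted strings or multi-line text fields, both of which occur in real mmCIF files. The function names parse_loop and is_missing, the shortened _atom_site header and the sample rows are hypothetical and serve only as an illustration.

```python
def parse_loop(text):
    """Parse one simplified mmCIF loop_ block into a list of row dictionaries.

    Assumption: values are whitespace-separated, with no quoted strings or
    semicolon-delimited text fields (real mmCIF files may contain both).
    """
    names, rows = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "loop_" or line.startswith("#"):
            continue
        if line.startswith("_"):
            # Header part: one _category.extension name per line.
            names.append(line)
        else:
            # Data part: one value per header name, in the same order.
            values = line.split()
            if len(values) != len(names):
                raise ValueError("row does not match the loop_ header")
            rows.append(dict(zip(names, values)))
    return rows


def is_missing(value):
    """'?' marks an unknown value and '.' an undefined one."""
    return value in ("?", ".")


# Hypothetical fragment with a shortened _atom_site header.
SAMPLE = """
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.type_symbol
_atom_site.label_alt_id
_atom_site.B_iso_or_equiv
_atom_site.pdbx_formal_charge
ATOM 1 N . 17.93 ?
ATOM 2 C . 15.20 ?
"""

rows = parse_loop(SAMPLE)
print(rows[0]["_atom_site.B_iso_or_equiv"])
print(is_missing(rows[0]["_atom_site.label_alt_id"]))
```

Looking values up by name rather than by position avoids having to count columns, which is the main difficulty a human reader faces with the raw format.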
Compared to PDB, mmCIF provides extra information which has the advantage of being in a computer-readable form. However, the biggest advantage of mmCIF - a well-formed, consistent (though also not perfect) and computer-readable format for storing macromolecular structural data - becomes a disadvantage if a human must read and understand the stored data, as demonstrated in the example above (Figure 1). This not only hinders the adoption of mmCIF files, but also raises barriers in the educational process, e.g. in classes of structural biology or structural bioinformatics.
The importance of the mmCIF format is demonstrated by the fact that it represents the data standard upon which the PDB is built. The PDB uses the data processing tool MAXIT, an integrated system that helps to ensure that the submitted data are consistent with the mmCIF dictionary. In addition, the schema of the PDB's core database is a subset of the conceptual schema specified by the mmCIF dictionary. Thus, knowledge of the mmCIF format becomes essential for every scientist dealing with biomacromolecular structures. While software tools exist to help prepare mmCIF files, the number of available systems simplifying the comprehension and interpretation of mmCIF files is limited. Therefore mmView, a web-based application allowing comfortable exploration of biomolecular structural data stored in the mmCIF format, was developed.
The View layer (Figure 2) controls access to the web services called from the mmView system. These include the PDB server, from which mmCIF files are downloaded, and the PubMed service, from which additional bibliographic information is retrieved. Several third-party special-purpose modules are employed in the mmView system. The key component is the mmLib parser - a library for the analysis and manipulation of macromolecular structural models stored in the mmCIF format. It significantly simplifies programmatic access to biomolecular structures by parsing mmCIF (or PDB) files directly into a set of Python objects (Figure 2). To display structures in 3D, the open-source molecular structure viewer Jmol is integrated into the mmView application, and its look and feel is adjusted using Jmol's scripting abilities.
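How the two web services might be addressed can be sketched as follows. This is an illustration only: the paper does not state which URLs mmView calls, so the RCSB file download address and the NCBI E-utilities efetch address below are assumptions based on the currently public endpoints, and the helper names are hypothetical. The sketch is written for Python 3, whereas mmView itself targets Python 2.5.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoints; the exact URLs used by mmView are not given in the paper.
PDB_DOWNLOAD = "https://files.rcsb.org/download/{pdb_id}.cif"
PUBMED_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def mmcif_url(pdb_id):
    """Build the download URL of the mmCIF file for a four-character PDB ID."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError("expected a four-character PDB ID")
    return PDB_DOWNLOAD.format(pdb_id=pdb_id)


def pubmed_abstract_url(pubmed_id):
    """Build an efetch URL that returns a PubMed record as plain text."""
    query = urlencode({
        "db": "pubmed",
        "id": str(int(pubmed_id)),
        "rettype": "abstract",
        "retmode": "text",
    })
    return PUBMED_EFETCH + "?" + query


def fetch_text(url, timeout=30):
    """Download a resource and decode it as UTF-8 (performs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


print(mmcif_url("1abc"))
print(pubmed_abstract_url(12345))
```

Keeping URL construction separate from the download step lets the former be checked without network access.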
Results and discussion
All information about a structure is presented in two different ways. The left-side menu is divided into two parts: the first part (Figure 3b, further referred to as the Aggregated view) presents the most important information about the structure, and the second part (Figure 3c, further referred to as the Category view) contains a list of all categories present in the analyzed mmCIF file.
The Aggregated view contains six topics covering various aspects of molecular structure. Individual topics, containing data combined from different mmCIF categories, simplify access to the most significant information. Hypertext links to the structure's original mmCIF categories are available on every Aggregated view page. In the Aggregated view, the standard data present in the mmCIF file may also be enriched by additional information obtained from external resources. Each of the topics is described in the integrated help. The Aggregated view contains the following topics.
Basic information about the investigated structure is summarized under the Structure info topic and contains the PDB ID, the full title of the primary reference, the names of the authors, and the dates of deposition and release of the structure. If the mmCIF file contains sections describing the presence of mutations, ligands, modified residues or the oligomeric state of the structure, such information is also displayed in this section.
A biomolecular structure consists of one or more entities, large units describing the chemistry of the molecules under investigation. Entities are of three types: polymer (DNA, RNA or protein), non-polymer (e.g. ions), and water. Entities consist of chemical components - all monomers (residues, ions, water) found in the structure. Entities are summarized in the Entities topic, and components are displayed in the Chemical components topic (for its description, see next paragraph).
The Entities topic gives a general overview of all entities found in the structure. If the entity is of the polymer type, a short description and the sequence in one-letter codes (codes enclosed in brackets correspond to non-standard residues) are displayed. Instead of the original mmCIF terminology used in the _entity_poly.type item (polypeptide D/L, polydeoxyribonucleotide, polyribonucleotide, polysaccharide D/L), entity types are labeled as protein, DNA, RNA or polysaccharide. For all other entities (non-polymer, water) the code and description are displayed. An image of the chemical structure appears when the pointer is moved over the non-polymer entity code.
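The relabeling of entity types and the handling of bracketed residue codes can be sketched in a few lines of Python. This is a hypothetical illustration rather than mmView's own code: it assumes that the chirality is written as a trailing "(D)" or "(L)" in the _entity_poly.type value and that non-standard residues appear in round brackets, as in "(MSE)". The names TYPE_LABELS, entity_label and split_sequence are invented for the example.

```python
import re

# Assumed mapping from the base _entity_poly.type term to the short label.
TYPE_LABELS = {
    "polypeptide": "protein",
    "polydeoxyribonucleotide": "DNA",
    "polyribonucleotide": "RNA",
    "polysaccharide": "polysaccharide",
}


def entity_label(poly_type):
    """Return the short label for an _entity_poly.type value.

    A trailing chirality marker such as "(L)" is dropped before the lookup;
    unrecognized types are returned unchanged.
    """
    base = re.sub(r"\s*\([DL]\)$", "", poly_type.strip(), flags=re.IGNORECASE)
    return TYPE_LABELS.get(base.lower(), poly_type)


def split_sequence(one_letter_sequence):
    """Split a one-letter sequence into residues, keeping bracketed codes whole."""
    return re.findall(r"\([^)]+\)|[A-Za-z]", one_letter_sequence)


print(entity_label("polypeptide(L)"))
print(split_sequence("AC(MSE)G"))
```

Returning unknown types unchanged means that a value outside the four listed families is still shown to the user instead of being silently mislabeled.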
Experimental details and conditions are summarized in the Experiment topic. The information provided depends on the type of experimental method. The two most common experimental approaches for biomolecular structure determination are X-ray crystallography and nuclear magnetic resonance (NMR).
For X-ray experiments the following attributes, if given in mmCIF, are displayed:
residual factors R for all reflections that satisfy the given resolution limits
high and low resolution of interplanar spacings for the reflection data used in the refinement
number of all and observed reflections
source of diffraction rays
crystal growth details
the pH at which the crystal was grown
the pH of the solution
software used for refinement
For NMR experiments the following attributes, if given in mmCIF, are displayed:
strength of the magnetic field
the pH at which the NMR data were collected
number of models
software used in structure modelling
Structure 3D view
show/hide unit cell
show/hide wireframe model of the structure
change size of the spacefill of atoms
change the background color
view the structure in a cartoon representation
explore the solvent accessible and van der Waals surfaces
rotate the structure
zoom the structure
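Several of the display options listed above correspond to short Jmol script commands. The Python sketch below shows one way such a script string could be assembled on the server side. It is not taken from the mmView sources: the function name jmol_script and its parameters are hypothetical, and the scripts actually sent by mmView may differ.

```python
def jmol_script(unit_cell=False, wireframe=False, spacefill_percent=None,
                background=None, cartoon=False, spin=False, zoom_percent=None):
    """Assemble a Jmol script string from a few display options.

    Hypothetical helper: only a small subset of Jmol commands is covered.
    """
    commands = [
        "unitcell on" if unit_cell else "unitcell off",
        "wireframe on" if wireframe else "wireframe off",
    ]
    if spacefill_percent is not None:
        # Sphere size as a percentage of the van der Waals radius.
        commands.append("spacefill %d%%" % spacefill_percent)
    if background is not None:
        commands.append("background %s" % background)
    commands.append("cartoon on" if cartoon else "cartoon off")
    commands.append("spin on" if spin else "spin off")
    if zoom_percent is not None:
        commands.append("zoom %d" % zoom_percent)
    return "; ".join(commands)


print(jmol_script(unit_cell=True, spacefill_percent=25,
                  background="white", cartoon=True))
```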
The expandable Category view (Figure 3c) lists all categories available in the mmCIF file. Each item represents the original table stored in the mmCIF file without presenting any additional data (e.g. citation information from the PubMed database). Only the categories present in the given mmCIF file are shown. The category help available for each category is distinguished from mmView's integrated help by an icon with a circled question mark. This icon indicates that the link leads to external resources, in this case to the HTML version of the IUCr standard mmCIF dictionary. In this way up-to-date information is always available to users of the mmView system.
mmView can display only the information that is available in the corresponding mmCIF file. If the data in the mmCIF file are not correct or the file contains errors, these are shown as they are. The number of requests that can be made to the PubMed server is currently limited to three requests per second (as of 19 July 2010). Thus, depending on the number of publications, generating the Citations topic can be slow. The dynamically generated left-side menu contains only categories present in the given mmCIF file. However, parsing large mmCIF files is slow and can therefore cause additional delays in displaying the left-side menu, especially for large structures. Another drawback of the mmView application is that at present it does not allow users to investigate custom mmCIF files. Only structures from the PDB database can be studied, and their PDB IDs must be known in advance, e.g. by using the search capability of the PDB web site.
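The PubMed request limit mentioned above can be respected with a small client-side throttle. The following Python sketch shows one possible approach, a sliding-window limiter that allows at most three calls per second. It is an illustration under stated assumptions, not the mechanism used in mmView; the class name RateLimiter and its parameters are hypothetical.

```python
import time


class RateLimiter:
    """Allow at most max_calls calls within any window of period seconds."""

    def __init__(self, max_calls=3, period=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock    # injectable so the limiter can be tested
        self.sleep = sleep
        self.calls = []       # timestamps of calls inside the current window

    def wait(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have left the sliding window.
            self.calls = [t for t in self.calls if now - t < self.period]
            if len(self.calls) < self.max_calls:
                break
            # Sleep until the oldest recorded call leaves the window.
            self.sleep(self.period - (now - self.calls[0]))
        self.calls.append(now)


limiter = RateLimiter(max_calls=3, period=1.0)
for pubmed_id in (101, 102, 103):   # hypothetical identifiers
    limiter.wait()                  # call this before each PubMed request
    print("request for", pubmed_id)
```

With this scheme a structure citing many publications still loads slowly, but the server's limit is never exceeded.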
Future development directly relates to the above-mentioned disadvantages of the mmView application. The delays can be avoided by deploying a relational database for storing critical data such as publication abstracts or lists of categories and topics of individual structures. A simple form for searching the PDB database directly from mmView, as well as the possibility to upload users' custom mmCIF files, are also planned for the next version of the application.
The mmView application provides a simple but powerful tool for researchers in the fields of structural biology and structural bioinformatics. mmView is well suited as an educational tool (it is successfully used by the authors in their course on structural bioinformatics), but it can also serve as a research tool for exploring the details of biomolecular structures. Its online version does not require any installation and provides an intuitive and easy-to-use interface. In addition, a version that can be installed locally on the end-user's computer is also available.
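The planned database-backed caching of publication abstracts could look roughly like the following Python sketch, which uses the standard sqlite3 module. The table layout, the class name AbstractCache and the fetch callback are hypothetical and are not part of the current mmView application.

```python
import sqlite3


class AbstractCache:
    """Store publication abstracts in SQLite so each one is fetched only once."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS abstract ("
            "pubmed_id INTEGER PRIMARY KEY, text TEXT NOT NULL)"
        )

    def get(self, pubmed_id, fetch):
        """Return the cached abstract, calling fetch(pubmed_id) on a miss."""
        row = self.db.execute(
            "SELECT text FROM abstract WHERE pubmed_id = ?", (pubmed_id,)
        ).fetchone()
        if row is not None:
            return row[0]
        text = fetch(pubmed_id)
        self.db.execute(
            "INSERT INTO abstract (pubmed_id, text) VALUES (?, ?)",
            (pubmed_id, text),
        )
        self.db.commit()
        return text


def slow_fetch(pubmed_id):
    # Stand-in for a real PubMed request.
    return "abstract of publication %d" % pubmed_id


cache = AbstractCache()
print(cache.get(42, slow_fetch))   # fetched and stored
print(cache.get(42, slow_fetch))   # served from the database
```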
Availability and requirements
Project name: mmView
Operating system: Platform independent, any modern web browser needed
Programming language: Python
Other requirements: Python 2.5.4, Django 1.0, NumPy 1.2.1, PymmLib 1.0, JRE 1.6
License: GNU General Public License (Version 2)
Any restrictions to use by non-academics: no restrictions
This project has been supported by Research grants MSM 6046137306 and MSM 6046137302. We would like to thank Bohdan Schneider for help with peculiarities of the PDB and mmCIF formats, for help with designing the Aggregated view topics, as well as for numerous discussions and insightful suggestions on this project. We also thank anonymous reviewers whose valuable comments and suggestions improved the mmView application and the manuscript considerably.
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Research. 2000, 28: 235-242. 10.1093/nar/28.1.235.
- The PDB Contents guide. [http://www.wwpdb.org]
- Henrick K, Feng Y, Bluhm WF, Dimitropoulos D, Doreleijers JF, Dutta S, Flippen-Anderson JL, Ionides J, Kamada C, Krissinel E: Remediation of the Protein Data Bank archive. Nucleic Acids Research. 2008, 36: D426-D433. 10.1093/nar/gkm937.
- Gu J, Bourne PE: Structural Bioinformatics. 2009, Wiley-Blackwell, 2nd edition.
- Fitzgerald PMD, Berman HM, Bourne PE, McMahon B, Watenpaugh K, Westbrook J: The mmCIF dictionary: community review and final approval. IUCr Congress and General Assembly: August 8-17; Seattle, WA. MSWK.CF.06. Acta Cryst. 1996.
- Bourne PE, Berman HM, McMahon B, Watenpaugh KD, Westbrook J, Fitzgerald PMD: The Macromolecular Crystallographic Information File (mmCIF). Methods in Enzymology. 1997, 277: 571-590.
- Westbrook J, Ito N, Nakamura H, Henrick K, Berman HM: PDBML: the representation of archival macromolecular structure data in XML. Bioinformatics. 2005, 21: 988-992. 10.1093/bioinformatics/bti082.
- Atomic Coordinate Entry Format Version 3.2. [http://www.wwpdb.org/documentation/format32/v3.2.html]
- Hall SR, Allen FH, Brown ID: A new standard archive file for crystallography. Acta Cryst. 1991, A47: 655-685.
- Hall SR: The STAR File: A new format for electronic data transfer and archiving. J Chem Inf Comp Sci. 1991, 31: 326-333.
- RCSB data dictionaries. [http://mmcif.rcsb.org/dictionaries/index.html]
- Hall SR, McMahon B: International Tables for Crystallography. 2006, International Union of Crystallography, 1st edition.
- NMR-STAR data dictionary. [http://www.bmrb.wisc.edu/formats.html]
- Schierz AC, Soldatova LN, King RD: Overhauling the PDB. Nature Biotechnology. 2007, 25: 437-442. 10.1038/nbt0407-437.
- RCSB software tools. [http://sw-tools.rcsb.org/]
- Painter J, Merritt EA: mmLib Python toolkit for manipulating annotated structural models of biological macromolecules. Journal of Applied Crystallography. 2004, 37: 174-178. 10.1107/S0021889803025639.
- van Rossum G, de Boer J: Interactively Testing Remote Servers Using the Python Programming Language. CWI Quarterly. 1991, 4: 283-303.
- Lutz M, Ascher D: Learning Python. 2003, O'Reilly Media, 2nd edition.
- Bassi S: A Primer on Python for Life Science Researchers. PLoS Computational Biology. 2007, 3: e199. 10.1371/journal.pcbi.0030199.
- Holovaty A, Kaplan-Moss J: The Definitive Guide to Django: Web Development Done Right. 2007, Apress.
- Django. [http://www.djangoproject.com/]
- SQLite. [http://www.sqlite.org/]
- MySQL. [http://www.mysql.com/]
- Sayers EW, Barrett T, Benson DA, Bolton E, Bryant SH, Canese K, Chetvernin V: Database resources of the National Center for Biotechnology Information. Nucleic Acids Research. 2010, 38: 5-16. 10.1093/nar/gkp967.
- Jmol: an open-source Java viewer for chemical structures in 3D. [http://www.jmol.org/]
- McLaughlin B, Edelson J: Java and XML. 2006, O'Reilly Media, 3rd edition.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.