GlycoForm and Glycologue: two software applications for the rapid construction and display of N-glycans from mammalian sources
© McDonald et al; licensee BioMed Central Ltd. 2010
Received: 24 February 2010
Accepted: 18 June 2010
Published: 18 June 2010
The display of N-glycan carbohydrate structures is an essential part of glycoinformatics. Several tools exist for building such structures graphically, by selecting from a palette of symbols or sugar names, or else by specifying a structure in one of the chemical naming schemes currently available.
Here we present two tools for displaying N-glycans found in the mammalian CHO (Chinese hamster ovary) cell line, both of which take as input a 9-digit identifier that uniquely defines each structure. The first of these, GlycoForm, is designed to display a single structure automatically from an identifier entered by the user. The display is updated in real time, using either symbols for the sugar residues or text only. Structures can be added to a library, which is recorded in a preference file and loaded automatically at startup. Individual structures can be saved in a variety of bitmap image formats. The second program, Glycologue, reads a file containing a column of nine-digit codes and displays the corresponding structures on screen; these can also be printed at high resolution.
A key advantage of both programs is the speed and facility with which carbohydrate structures can be drawn. It is anticipated that these programs will be useful to glycobiologists, systems biologists and biotechnologists interested in N-glycosylation systems in mammalian cells.
The explosion of interest in glycobiology in recent years, together with the complexity of its subject matter, has created a need for bioinformatics tools tailored to researchers who model glycosylation, analyse the carbohydrate content of glycoproteins, or engineer specific glycoforms recombinantly in bioprocessing systems.
While bioinformatics tools specific to glycobiology are still relatively few, applications for the construction and display of branched carbohydrate structures are beginning to appear. LiGraph, a web-based application based on the LINUCS linear notation, draws JPG and SVG images from a linear string representation of a sugar. LiGraph supports a wide variety of symbol formats, including the Heidelberg, Tokyo, Consortium for Functional Glycomics (CFG) and Oxford (UOXF) notations. A more recent example is GlycanBuilder, a Java tool that allows the user to draw N- or O-glycans using a wide range of core structures. GlycanBuilder also exports its glycan structures in a wide variety of bitmap and vector image formats. KegDraw, a powerful tool for drawing not only glycans but chemical structures generally, allows the user to draw the structure of a glycan using a specialized tool palette and then query the KEGG GLYCAN database for similar structures.
The N-glycan encoding system.

Digit  Meaning
1      Number of mannose residues
2      0 or 1 (fucosylation)
3      0 or 1
4      Extension level of branch 1
5      Extension level of branch 2
6      Extension level of branch 3
7      Extension level of branch 4
8      Number of galactose residues
9      Number of N-acetylneuraminic acid residues
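Because every digit has a fixed position, an identifier can be read as a simple record. The Python sketch below splits a code into the nine fields of the table. It is an illustration only, not GlycoForm's REALbasic code: the class and field names are hypothetical, treating digit 2 as a fucosylation flag follows the Discussion, digit 3 is kept as an unnamed 0/1 flag because its meaning is not given here, and accepting seven-digit codes is our assumption.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Seven required digits, optionally followed by digits 8 and 9.
_CODE = re.compile(r"[0-9]{7}(?:[0-9]{2})?")


@dataclass(frozen=True)
class GlycanCode:
    """Hypothetical record of the nine-digit identifier, in table order."""
    mannose: int                          # digit 1: number of mannose residues
    fucose: int                           # digit 2: 0 or 1 (fucosylation)
    digit3: int                           # digit 3: 0 or 1 (meaning not named here)
    branches: Tuple[int, int, int, int]   # digits 4-7: extension levels, branches 1-4
    galactose: Optional[int]              # digit 8: Gal residues, if supplied
    neuac: Optional[int]                  # digit 9: NeuAc residues, if supplied


def parse_code(code: str) -> GlycanCode:
    """Split a 7- or 9-digit identifier into its positional fields."""
    if _CODE.fullmatch(code) is None:
        raise ValueError(f"expected 7 or 9 digits, got {code!r}")
    d = [int(ch) for ch in code]
    # The table restricts digits 2 and 3 to the values 0 and 1.
    if d[1] > 1 or d[2] > 1:
        raise ValueError("digits 2 and 3 must be 0 or 1")
    gal, neuac = (d[7], d[8]) if len(d) == 9 else (None, None)
    return GlycanCode(d[0], d[1], d[2], (d[3], d[4], d[5], d[6]), gal, neuac)
```

For example, `parse_code("510123421").branches` gives `(1, 2, 3, 4)`; a seven-digit code leaves `galactose` and `neuac` as `None`.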
Glycosynthetic enzymes typically present in mammalian systems.
GlycoForm and Glycologue are written in the object-oriented language REALbasic and distributed as binary executable files for Windows, Mac OS X and Linux x86; both are freely available for download. The Mac OS X version is packaged as a Universal Binary, which runs on both Intel- and PowerPC-based Macintoshes. The source code is also available from the applications' website.
The greater part of the main window is devoted to the display of the oligosaccharide. If the window is resized, the drawable region rescales to accommodate the change. The current value of the magnification factor is shown at the top right of the main window and can be adjusted by the user (see Fig. 2). The default rendering method is CFG colour notation; menu options allow the user to toggle between text and symbol rendering and, for symbols, to choose between CFG colour, CFG black-and-white and UOXF notations. When a symbol set is chosen, the corresponding key is displayed to the right of the oligosaccharide structure.
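The text does not say how the drawing is rescaled when the window changes size. One common approach, sketched here under that assumption, is to scale the structure uniformly so that its bounding box fits the available area while keeping its aspect ratio; a user-chosen magnification could then be applied on top. The function name and the `margin` parameter are hypothetical.

```python
def fit_scale(content_w: float, content_h: float,
              view_w: float, view_h: float, margin: float = 0.0) -> float:
    """Largest uniform scale at which a content_w x content_h drawing fits
    inside a view_w x view_h area, leaving `margin` on every side."""
    if content_w <= 0 or content_h <= 0:
        raise ValueError("content dimensions must be positive")
    avail_w = max(0.0, view_w - 2 * margin)
    avail_h = max(0.0, view_h - 2 * margin)
    # The tighter dimension limits the scale, so the aspect ratio is preserved.
    return min(avail_w / content_w, avail_h / content_h)
```

Taking the minimum of the two ratios guarantees that neither dimension overflows; a wide, shallow glycan in a tall window is therefore limited by the window's width.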
GlycoForm takes advantage of the system-wide clipboard present in each of the supported operating systems. A timer control, which activates once every second, reads the contents of the clipboard and, if they are text, uses a regular expression to test for the presence of a valid identifier. When a valid match is found, the display region is updated automatically with the new structure using the current rendering parameters. Strings containing nine digits are accepted, but the last two digits are ignored, since only the first seven digits are needed to specify a structure uniquely. A single space between adjacent digits is permitted but not required, and any number of flanking spaces is allowed. No characters other than digits and spaces are accepted.
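A minimal Python sketch of the validation rules just described. It assumes that seven-digit strings are accepted as well as nine-digit ones (since only the first seven digits are significant) and that "space" means the ASCII space character; the pattern and function name are ours, not GlycoForm's.

```python
import re
from typing import Optional

# Seven significant digits, optionally followed by two more that are ignored.
# A single optional space may separate adjacent digits, any number of leading
# or trailing spaces is allowed, and any other character invalidates the text.
_ID_PATTERN = re.compile(r" *([0-9](?: ?[0-9]){6})(?:(?: ?[0-9]){2})? *")


def identifier_from_clipboard(text: str) -> Optional[str]:
    """Return the seven significant digits of a valid identifier, else None."""
    match = _ID_PATTERN.fullmatch(text)
    if match is None:
        return None
    return match.group(1).replace(" ", "")
```

In a program organised like GlycoForm, a once-per-second timer would pass the clipboard text to this function and redraw only when the result is not `None` and differs from the structure currently shown. Eight-digit strings and doubled spaces are rejected.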
Identifiers can be added to a library by pressing the Add button at the end of the edit-field array. The identifiers are shown as a list in the Library window, a floating window that can be opened or closed by clicking a small triangle at the upper right of the main window (see Fig. 2). An N-glycan identifier is selected for display by left-clicking it. Right-clicking (or control-clicking, with the Macintosh's single-button mouse) on any identifier in the library invokes a contextual menu with options to copy it to the clipboard or to remove it permanently from the list. GlycoForm saves the current library of codes to a preferences file, along with the current display preference (symbolic or text-only). This file, an XML document, is saved to disk automatically when the program terminates, in a location appropriate to the current operating system.
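The layout of GlycoForm's XML preferences file is not described here. Purely as an illustration, the sketch below round-trips a library and display mode through a hypothetical schema (a `preferences` root holding a `display` element and a `library` of `code` elements) using Python's standard `xml.etree.ElementTree`; the element names and the `"symbolic"` default are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def library_to_xml(codes: List[str], display: str) -> str:
    """Serialise the library and display mode (hypothetical schema)."""
    root = ET.Element("preferences")
    ET.SubElement(root, "display").text = display
    library = ET.SubElement(root, "library")
    for code in codes:
        ET.SubElement(library, "code").text = code
    return ET.tostring(root, encoding="unicode")


def library_from_xml(xml_text: str) -> Tuple[List[str], str]:
    """Read the library back, falling back to symbolic display if absent."""
    root = ET.fromstring(xml_text)
    display = root.findtext("display", default="symbolic")
    codes = [el.text or "" for el in root.iterfind("library/code")]
    return codes, display
```

Writing the string to a per-user configuration directory at shutdown, and reading it at startup, would reproduce the behaviour described above.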
GlycoBase is a relational database of 2-AB labelled N-glycans. GlycoForm parses the current glycan identifier to form the corresponding GlycoBase abbreviation, whose display can be toggled by the user via a menu item. Our implementation of the GlycoBase formalism does not yet handle N-acetyllactosamine (NLac) repeating units (branch extension levels 4-6; see Fig. 1(b)) and is therefore currently limited to structures with one galactose residue per branch, of the type shown in Fig. 2. Individual structures drawn by GlycoForm can be saved as image files in Portable Network Graphics (PNG), JPEG or Windows Bitmap (BMP) format.
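The limitation above reduces to a test on digits 4-7 of the identifier. A hedged sketch, using only what the text states (levels 4-6 denote NLac repeats) and treating any branch level of 4 or more as unsupported; the function and constant names are hypothetical.

```python
import re

_CODE = re.compile(r"[0-9]{7}(?:[0-9]{2})?")

# Branch extension levels 4-6 denote N-acetyllactosamine (NLac) repeats,
# which the GlycoBase abbreviation does not yet cover.
_FIRST_NLAC_LEVEL = 4


def glycobase_supported(code: str) -> bool:
    """True if no branch (digits 4-7) carries NLac repeat units."""
    if _CODE.fullmatch(code) is None:
        raise ValueError(f"expected 7 or 9 digits, got {code!r}")
    branch_levels = [int(ch) for ch in code[3:7]]
    return max(branch_levels) < _FIRST_NLAC_LEVEL
```

A front end could use such a check to grey out the GlycoBase menu item rather than display an incomplete abbreviation.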
Glycologue is a tool for displaying or printing multiple N-glycan structures simultaneously. By default, it reads the user's current GlycoForm library file and displays any structures found therein in a grid. Alternatively, it can load text files containing 9-digit identifiers, one per line. On systems supporting drag and drop, such a text file can be opened by dropping its icon onto that of the application. The number of grid elements is variable, but is fixed at 22 rows and 13 columns for the drag-and-drop method. Structures are drawn one column at a time, top to bottom, starting at the left-hand side of the main window. A secondary window displays the current list of identifiers; clicking on an identifier in the list highlights the corresponding image in the main Glycologue window.
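The column-by-column fill order, together with the blank-cell behaviour of empty input lines, can be sketched as follows. The 22-by-13 grid is the size quoted for drag and drop; the function names and the zero-based (row, column) convention are our own assumptions.

```python
from typing import Dict, List, Optional, Tuple

ROWS, COLS = 22, 13  # grid size stated for the drag-and-drop method


def read_codes(text: str) -> List[Optional[str]]:
    """One identifier per line; blank lines become empty cells (None)."""
    return [line.strip() or None for line in text.splitlines()]


def cell_for_index(i: int, rows: int = ROWS, cols: int = COLS) -> Tuple[int, int]:
    """Map the i-th entry (0-based) to (row, column), filling each column
    top to bottom before moving to the next column on the right."""
    if not 0 <= i < rows * cols:
        raise IndexError(f"entry {i} does not fit in a {rows}x{cols} grid")
    return i % rows, i // rows


def layout(text: str) -> Dict[Tuple[int, int], str]:
    """Assign each non-blank identifier to its grid cell."""
    return {cell_for_index(i): code
            for i, code in enumerate(read_codes(text)) if code is not None}
```

Because a blank line still consumes an index, it leaves its cell empty, which is what makes the set comparison in the next paragraph possible.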
Blank lines in the input file produce blank cells in the output, a feature that can be used to advantage when a graphical comparison between two or more sets of N-glycans is desired. The first step is to construct the superset of all Glycologue input files that are to be compared and to sort its codes in numerical order. The second step is to create, from each input file, a code file of the same length as the superset, with a blank line wherever a code in the superset is missing from that subset. This is difficult to accomplish manually but is straightforward with a script. The output files generated by such a script can then be loaded into Glycologue, where each N-glycan will appear at the same cell location as its counterpart in all the other files, including the superset. An example script, written in Perl, is provided as supplementary material [see Additional file 1].
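As an independent sketch of the same two steps, and not a transcription of the authors' Perl script, the Python function below builds the numerically sorted superset and pads each input set with empty entries. It works on in-memory lists rather than files, and its name is hypothetical.

```python
from typing import Dict, List, Tuple


def align_to_superset(
        code_sets: Dict[str, List[str]]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return (superset, aligned): every aligned list has the superset's
    length and holds "" wherever that set lacks the superset's code."""
    # Step 1: union of all codes, sorted in numerical order.
    superset = sorted({c for codes in code_sets.values() for c in codes}, key=int)
    # Step 2: one padded list per input set, position-matched to the superset.
    aligned: Dict[str, List[str]] = {}
    for name, codes in code_sets.items():
        present = set(codes)
        aligned[name] = [c if c in present else "" for c in superset]
    return superset, aligned
```

Writing `"\n".join(aligned[name])` to a text file yields a Glycologue input in which each missing code becomes a blank line, so corresponding structures occupy the same cell in every grid.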
Printing to PDF is supported natively by Mac OS X, and printing to PostScript by most Linux implementations. Windows users can use the free PDFCreator printer driver to produce PDF output. Because PDF output is vector-based, arrays of structures printed to PDF files are resolution-independent and can be read and manipulated by many existing graphics tools.
Glycologue displays structures in one of the symbol formats supported by its companion program, GlycoForm, selectable via a menu option. Text-only display is not offered, because in most instances the text would be too small to be legible. The identifier code is shown under each N-glycan by default but can be hidden by the user.
Results and Discussion
Of the symbol sets used by the two utilities, the CFG and UOXF formats each have their own merits: CFG is the most widely adopted and is used in one of the most popular glycobiology textbooks; the UOXF notation, with its ability to describe different linkage types, can in principle encode more information in a single structure. Moreover, the UOXF symbols themselves carry more information, for instance in the use of a hexagon shape for hexose and of solid fill to denote the N-acetylated variants of residues. A proposal for a new standard, which merges common elements of the CFG and UOXF symbolisms, has recently been presented.
Development of both tools is continuing. While the current implementations require the user to enter data in numeric form, we hope to offer input in other linear formats, such as the GlycoBase annotation system for 2AB-labelled N-glycans or Linear Code, as options in a future release. The display of structures in Glycologue is currently limited to a single window; this will be updated to allow larger collections to be displayed within a scrolling window. We further intend to include the full complement of CFG and UOXF symbols, as well as the newly proposed standard referred to above; the UOXF variant will be supported by an updated rendering system to cope with new linkage types and residue positions, and the encoding system will be expanded for both. In some instances, widening the range of allowable values of a digit might be sufficient. In the case of fucosylation, for instance, the second digit could be allowed to take values above 1, to encode fucosylation at different positions on the N-glycan or with different linkage types. The present system was designed specifically for mammalian N-glycans, but a similar approach, with alterations to the encoding system, could accommodate the structures found in other species. For example, extra digits could be used for additional residues, such as Xyl and Ara, which can occur in plant N-glycans, and for the pentaantennarity that can occur in certain ovomucoids from birds and fish [21, 22].
We have developed two new applications for the display of asparagine-linked oligosaccharides, both of which are freely available to the scientific community. The encoding system, which was initially developed specifically for the Chinese hamster ovary cell line, captures a subset of the N-glycan structures found in mammalian cells. A key advantage of the software is the speed with which structures can be specified and displayed: the succinct numeric encoding scheme permits faster display and rendering of N-glycans than other utilities currently available.
Availability and Requirements
Project home page: http://www.boxer.tcd.ie/gf
Operating system(s): Linux x86 (with GTK 2+); Mac OS X 10.4 or higher; Windows XP or higher
Programming language: REALbasic
The authors would like to thank Tania O'Connor for reviewing an earlier draft of the manuscript.
- Bohne-Lang A, Lang E, Förster T, von der Lieth CW: LINUCS: LInear Notation for Unique description of Carbohydrate Sequences. Carbohydrate Research. 2001, 336 (1): 1-11. 10.1016/S0008-6215(01)00230-0.
- Ceroni A, Dell A, Haslam SM: The GlycanBuilder: a fast, intuitive and flexible software tool for building and displaying glycan structures. Source Code Biol Med. 2007, 2: 3. 10.1186/1751-0473-2-3.
- Hashimoto K, Goto S, Kawano S, Aoki-Kinoshita KF, Ueda N, Hamajima M, Kawasaki T, Kanehisa M: KEGG as a glycome informatics resource. Glycobiology. 2006, 16 (5): 63R-70R. 10.1093/glycob/cwj010.
- Krambeck FJ, Betenbaugh MJ: A mathematical model of N-linked glycosylation. Biotech Bioeng. 2005, 92 (6): 711-728. 10.1002/bit.20645.
- McDonald AG, Boyce S, Tipton KF: ExplorEnz: the primary source of the IUBMB enzyme list. Nucl Acids Res. 37: D593-D597. 10.1093/nar/gkn582. [http://www.enzyme-database.org]
- REALbasic, published by REAL Software, Inc. [http://www.realsoftware.com]
- GlycoForm and Glycologue. [http://www.boxer.tcd.ie/gf]
- Campbell MP, Royle L, Radcliffe CM, Dwek RA, Rudd PM: GlycoBase and autoGU: tools for HPLC-based glycan analysis. Bioinformatics. 2008, 24 (9): 1214-1216. 10.1093/bioinformatics/btn090.PubMedView ArticleGoogle Scholar
- PDFcreator. [http://sourceforge.net/projects/pdfcreator]
- Restelli V, Wang M-D, Huzel N, Ethier M, Perreault H, Butler M: The effect of dissolved oxygen on the production and the glycosylation profile of recombinant human erythropoietin produced from CHO cells. Biotech Bioeng. 2006, 94: 481-494. 10.1002/bit.20875.
- Llop E, Gallego RG, Belalcazar V, Gerwig GJ, Kamerling JP, Segura J, Pascual JA: Evaluation of protein N-glycosylation in 2-DE: erythropoietin as a study case. Proteomics. 2007, 7: 4278-4291. 10.1002/pmic.200700572.
- Takamatsu S, Katsumata T, Inoue N, Watanabe T, Fujibayashi Y, Takeuchi M: Abnormal biantennary sugar chains are expressed in human chorionic gonadotropin produced in the choriocarcinoma cell line, JEG-3. Glycoconj J. 2004, 20: 473-481. 10.1023/B:GLYC.0000038293.37376.9f.
- Hokke CH, Bergwerff AA, van Dedem GWK, Kamerling JP, Vliegenthart JFG: Structural analysis of the sialylated N- and O-linked carbohydrate chains of recombinant human erythropoietin expressed in Chinese hamster ovary cells. Sialylation patterns and branch location of dimeric N-acetyllactosamine units. Eur J Biochem. 1995, 228: 981-1008. 10.1111/j.1432-1033.1995.tb20350.x.
- Sánchez O, Montesino R, Toledo JR, Rodríguez E, Díaz D, Royle L, Rudd PM, Dwek RA, Gerwig GJ, Kamerling JP, Harvey DJ, Cremata JA: The goat mammary glandular epithelial (GMGE) cell line promotes polyfucosylation and N,N'-diacetyllactosediaminylation of N-glycans linked to recombinant human erythropoietin. Arch Biochem Biophys. 2007, 464: 322-334. 10.1016/j.abb.2007.04.027.
- North SJ, Huang H-H, Sundaram S, Jang-Lee J, Etienne AT, Trollope A, Chalabi S, Dell A, Stanley P, Haslam SM: Glycomics profiling of Chinese hamster ovary cell glycosylation mutants reveals N-glycans of a novel size and complexity. J Biol Chem. 2010, 285: 5759-5775. 10.1074/jbc.M109.068353.
- Varki A, Cummings R, Esko J, Freeze H, Hart G, Marth J, (Eds): Essentials of Glycobiology. 1999, Cold Spring Harbor: Cold Spring Harbor Laboratory Press.
- Harvey DJ, Merry AH, Royle L, Campbell MP, Dwek RA, Rudd PM: Proposal for a standard system for drawing structural diagrams of N- and O-linked carbohydrates and related compounds. Proteomics. 2009, 9: 3796-3801. 10.1002/pmic.200900096.
- GlycoBase v2.0 Nomenclature System. [http://glycobase.nibrt.ie:8080/database/documents/abbreviations.pdf]
- Banin E, Neuberger Y, Altshuler Y, Halevi A, Inbar O, Dotan N, Dukler A: A novel Linear Code® nomenclature for complex carbohydrates. Trends Glycosci Glycotechnol. 2002, 14 (77): 127-137.
- Priem B, Gitti R, Bush CA, Gross KC: Structure of ten free N-glycans in ripening tomato fruit. Arabinose is a constituent of a plant N-glycan. Plant Physiol. 1993, 102: 445-458. 10.1104/pp.102.2.445.
- Brockhausen I, Hull E, Hindsgaul O, Schachter H, Shah RN, Michnick SW, Carver JP: Control of glycoprotein synthesis. Detection and characterization of a novel branching enzyme from hen oviduct, UDP-N-acetylglucosamine:GlcNAc β1-6 (GlcNAc β1-2)Man α-R (GlcNAc to Man) β-4-N-acetylglucosaminyltransferase VI. J Biol Chem. 1989, 264: 11211-11221.
- Taguchi T, Seko A, Kitajima K, Inoue S, Iwamatsu T, Khoo KH, Morris HR, Dell A, Inoue Y: Structural studies of a novel type of tetraantennary sialoglycan unit in a carbohydrate-rich glycopeptide isolated from the fertilized eggs of Indian Medaka fish, Oryzias melastigma. J Biol Chem. 1993, 268: 2353-2362.