- Research article
- Open Access
An evaluation of the evolution of the gene structure of dystroglycan
BMC Research Notes, volume 10, Article number: 19 (2017)
Dystroglycan (DG) is an adhesion receptor complex composed of two non-covalently associated subunits encoded by a single gene. The extracellular α-DG is highly and heterogeneously glycosylated and binds with high affinity to laminins, and the transmembrane β-DG binds intracellular dystrophin. Multiple cellular functions have been proposed for DG, but its role in skeletal muscle appears central, as demonstrated by the severe primary and secondary muscular dystrophy phenotypes collectively known as dystroglycanopathies. We recently analysed the molecular phylogeny of the DG core protein and identified the α/β interface, transmembrane and cytoplasmic domains of β-DG as the most conserved region. We also found that the IG2_MAT_NU region has been independently duplicated in multiple lineages.
To understand the evolution of dystroglycan in more depth, we investigated dystroglycan gene structure in 35 species representative of the phyla in which dystroglycan has been identified (i.e., all metazoan phyla except Ctenophora). The gene structure of three exons and two introns is remarkably conserved. However, additional lineage-specific introns, which interrupt the coding sequence at distinct points, were identified in multiple metazoan groups, most prominently in ecdysozoans.
A coding DNA sequence (CDS) intron that interrupts the sequence encoding the IG1 domain is universally conserved, and this intron is longer in gnathostomes (jawed vertebrates) than in other metazoans. Lineage-specific gain of additional introns has occurred notably in ecdysozoans, where multiple introns interrupt the large 3′ exon. More limited intron gain has also occurred in placozoans, cnidarians, urochordates and the DG paralogues of lamprey and teleost fish.
Dystroglycan (DG) is an adhesion receptor complex that provides mechanical stability to a wide variety of cells and tissues in mammals, zebrafish, Drosophila melanogaster and Caenorhabditis elegans. It forms a bridge that connects the internal cytoskeleton to the basement membrane extracellular matrix . The two subunits of dystroglycan, α-DG and β-DG, play different roles in this bridge. α-DG is highly glycosylated, located extracellularly and binds with high affinity to laminins and other laminin globular (LG) domain-containing proteins and proteoglycans . β-DG spans the plasma membrane and is anchored to the actin-binding protein dystrophin, thereby forming a direct link to the actin cytoskeleton .
DG has been associated with skeletal muscle function since its initial identification in rabbit sarcolemma . The calcium-dependent, high-affinity binding between α-DG and laminin is believed to depend mainly on interactions between carbohydrate moieties attached to the central, elongated mucin-like domain of α-DG and the C-terminal LG domains of laminin α chains . Conditional disruption of the dystroglycan gene in mice causes muscular dystrophy, and severe congenital muscular dystrophy phenotypes emerge when α-DG is hypoglycosylated [5, 6]. A subgroup of muscular dystrophies is now collectively referred to as dystroglycanopathies; these are classified as (i) primary, when the DG core protein is mutated [7–9], or (ii) secondary, when genetic alterations of glycosyltransferases, or of other proteins important for DG maturation, are involved .
The two subunits of DG are encoded by a single gene. The domain organization of the primary protein product is as follows: a signal peptide; immunoglobulin-like domain 1 (IG1); the S6 domain (so called because of its similarity to ribosomal protein S6); a mucin-like central region; immunoglobulin-like domain 2 (IG2); the so-called “α/β maturation interface” (MAT), which includes a 50-residue region of α-DG after the IG2 domain and the Gly-Ser site of proteolysis; a natively unfolded domain within the ectodomain of β-DG (NU); a single transmembrane domain; and a cytoplasmic region that includes the dystrophin-binding site (DBS) at its C-terminus (see Fig. 1) . The IG1 domain of α-dystroglycan (PDB:1U2C) adopts an immunoglobulin-like fold for which twitchin (PDB:1WIT) is the closest structural neighbour . The α-DG IG1 domain is also a very close structural neighbour of the natural cysteine peptidase inhibitor of Leishmania mexicana (PDB:2C34; Z-score 5.1, RMSD 3.2 Å over 82 residues) .
For the maturation of α-DG, the N-terminal region (IG1 and S6 domains) is considered highly important. Indeed, the N-terminal region in isolation displays a residual laminin-binding activity  and is likely to be important for directing the actions of a plethora of enzymes required for the glycosylation of α-DG [7, 14]. Based on pioneering recombinant protein analysis, the N-terminal domain of α-DG has been suggested to represent an autonomous module . This module can be liberated by furin-driven proteolysis [16, 17] within the extracellular space and/or body fluids [18–21]. It is also speculated that the IG1 domain might function in self-recognition (in cis) of carbohydrate moieties that protrude from the neighbouring mucin-like region, and therefore could have additional functions within the glycosylation and maturation pathway of the dystroglycan precursor molecule .
Recently, we conducted the first extensive evolutionary study of the dystroglycan core protein and demonstrated a high degree of conservation in all metazoan phyla except ctenophores, where DG is absent from the two available species, Mnemiopsis leidyi and Pleurobrachia bachei . Our study demonstrated that the most conserved region of DG encompasses the second IG-like domain (IG2), the α/β interface that is important for establishing non-covalent contacts between the two subunits, the ectodomain of β-DG (the MAT_NU module that includes the Gly-Ser α/β maturation site) and the transmembrane and cytoplasmic domains . A major, unexpected finding was that multiple, presumably independent, lineage-specific duplication/domain-shuffling events have led to repetitions of the IG2_MAT_NU module in species of hemichordates (2X), arthropods (2X), placozoans (2X) and, in particular, the cnidarian sea anemone Nematostella vectensis (6X).
Apart from information on the DG gene in a few mammalian species [22, 23] and on the alternatively spliced variants of Drosophila melanogaster , the gene organization of dystroglycans has not been investigated in detail. Here, we have investigated the evolution of the dystroglycan gene in the metazoan phyla previously identified to encode DG . In particular, we aimed to examine: (i) the overall degree of conservation of the exon–intron organization of the DG gene; (ii) the relationship between DG domain organization and exon structure, particularly with regard to the IG2_MAT_NU domain duplications identified previously in certain phyla; and (iii) whether differences in exon/intron organization have emerged by divergence in specific lineages.
Dystroglycan gene structure is remarkably conserved
Table 1 details the DG gene organization of 35 metazoan species representing the major metazoan phyla that we previously identified to encode DG . These prior studies did not identify DG in Ctenophora . The DG gene organisations identified are schematized in Fig. 1, which also shows how the encoded protein domains are distributed among the exons. DG gene structure is simple in all chordate species analysed to date (Fig. 1a), as well as in bivalve and gastropod molluscs and in annelids (Fig. 1e). In all these species, the DG gene includes a single intron within its coding DNA sequence (CDS). This intron interrupts the sequence encoding the IG1 domain, and we therefore refer to it as the IG1-intron. Our survey shows that an intron at this position is universally present (Fig. 1), although its size varies (Table 1; see below). In Chordata, Cephalopoda, Arthropoda and Nematoda, the ATG-containing exon that precedes the IG1-intron is itself preceded by an additional large intron (40–60 kb in mammals; Table 1), designated the pre-ATG intron in Fig. 1. The DG genes of these species also include a relatively short (89–595 bp) non-coding exon, designated here the pre-ATG exon. This non-coding exon was not identified in the DG genes of urochordates, cephalochordates, bivalve and gastropod molluscs, or species representative of Annelida, Placozoa or Porifera (Fig. 1); however, a pre-ATG exon appears to be present in the DG gene of the cnidarian sea anemone Exaiptasia pallida (AIPGENE266). Because this exon lies 5′ to the ATG codon, we cannot at present rule out that the pre-ATG exon is unrecognized in other non-bilaterian species because of incomplete or inaccurate genome annotation. Indeed, in our previous analysis of the DG of the sponge Oscarella carmela, the predicted protein sequence was incomplete at the N-terminus .
We subsequently identified a complete predicted DG sequence in the demosponge Amphimedon queenslandica (Table 1). Strikingly, the DG gene of A. queenslandica, in common with those of deuterostomes, most molluscs and annelids, contains the entire protein coding sequence in only two exons (Fig. 1j). Although partial DG sequences can be identified in other sponges, e.g. O. carmela , it was not possible to identify an intact 5′UTR in these species. Thus, at present it is not possible to establish if the A. queenslandica DG gene structure is unique or representative of other sponges. In addition, we cannot rule out the possibility that alternative splicing takes place at the 5′ end of the DG gene in these species.
Evidence for dystroglycan gene duplications and intron gain in some metazoans
In some species of teleost fish, such as Takifugu rubripes, a duplication of the Dag1 gene has been established . We identified two additional bony fish species with two Dag1 paralogues (Table 2). Following the nomenclature of Pavoni et al., we designated as Dag1a the paralogue that contains an additional short (126–726 bp) intron (mini-intron) interrupting the sequence encoding the S6 domain of α-DG (Fig. 1b) . Interestingly, Petromyzon marinus (Cyclostomata) also has two DG genes, each with a gene structure similar to that of the Dag1 genes of bony fish (Table 2; Fig. 1a, b). A short intron interrupting the S6-encoding region is also present at a very similar location in the DG genes of nematodes (C. elegans, Fig. 1g) and cnidarians [Hydra magnipapillata, Fig. 1h, and Nematostella vectensis (Table 1)]. Based on the secondary and tertiary structure of the S6 domain of mouse DG, the mini-intron insertion site is predicted to fall within the loop connecting the antiparallel β3 and β4 strands, in the middle of the “floor” of the S6 domain. The insertion site is therefore not in register with the tertiary structure of the S6 domain (data not shown; further details can be found in ).
Additional introns that interrupt the coding exons at distinct points are present in representatives of some phyla; these appear to correspond to independent, lineage-specific intron gain events. Specifically, in the DG gene of Ciona intestinalis (urochordate), four additional introns (i.e., in addition to the IG1-intron) split the CDS (Fig. 1c). In the DG gene of Branchiostoma floridae, additional introns interrupt the sequence encoding the N-terminal region (Fig. 1d). The DG gene of the arthropod D. melanogaster (Fig. 1f) is particularly conspicuous for having acquired a large number of introns. In general, the locations of these introns are not in register with the domain organization or domain boundaries of the DG protein . A notable exception is the “mucin-module”, which appears to be encoded by the alternatively spliced exons 8 and 9 of the D. melanogaster DG gene (Fig. 1f). Additional introns are also present in the DG genes of other insects (e.g., Tribolium castaneum) and of species representative of other arthropod classes (Crustacea and Chelicerata) (Table 1). The DG gene of C. elegans (nematode) includes multiple introns that interrupt the regions encoding the IG2 domain and the cytoplasmic domain (Fig. 1g). In the DG gene of Trichoplax adhaerens (placozoan), additional introns interrupt the N-terminal encoding sequence, at a position similar to the additional introns of the B. floridae DG gene (Fig. 1d), and the region encoding the cytoplasmic domain (Fig. 1i). Whereas the 5′ non-coding exon is apparent in many bilaterian lineages (Fig. 1a, b, f, g), it could not be examined in the available cnidarian species, the placozoan or the sponge A. queenslandica, owing to uncertain annotation of the DG gene 5′ to the ATG codon (Fig. 1h, i).
Although this phylogenetic overview highlights the conservation of the large 3′ exon of DG genes, the additional introns found in multiple lineages show that few coding regions have been spared from intron gain. However, in the MAT_NU-encoding region around the α/β cleavage site, intron gain has occurred only in the urochordate lineage (Fig. 1c). Notably, the DG protein sequence of urochordates is exceptionally divergent from that of other metazoans .
In our prior study of the molecular phylogeny of the DG protein, we found that the region including the IG2_MAT_NU domains has been independently duplicated in a number of phyla (Hemichordata, Arthropoda, Placozoa, Cnidaria). This phenomenon is particularly striking in the DG of the sea anemone Nematostella vectensis (Cnidaria), in which six repetitions of IG2_MAT_NU are present . The current study of dystroglycan gene organization shows that these repetitions did not arise by duplication of an exon module: no additional IG2_MAT_NU protein module is encoded by a discrete exon of its own.
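The comparison between exon structure and protein modules can be made explicit with a small interval check: for each domain, find which exon-encoded segments it overlaps. A module "encoded by a unique exon" would coincide exactly with one segment. This is a minimal sketch under stated assumptions: the helper names are hypothetical, every intron is treated as falling between two codons (phase 0), and all domain coordinates are placeholders rather than values from Table 1 or Fig. 1 (only the intron position after residue 95 is taken from the human IG1-intron described below).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval of amino-acid positions (1-based, inclusive)."""
    start: int
    end: int

    def overlaps(self, other: "Interval") -> bool:
        return self.start <= other.end and other.start <= self.end


def exons_from_intron_sites(protein_length, intron_sites):
    """Split a protein into the segments encoded by successive coding exons.

    Simplifying assumption: each site k places an intron between residues
    k and k + 1, i.e. every intron is treated as phase 0."""
    bounds = [0] + sorted(intron_sites) + [protein_length]
    return [Interval(a + 1, b) for a, b in zip(bounds, bounds[1:])]


def classify_domain(domain, exons):
    """Describe how one protein domain relates to the exon-encoded segments."""
    hits = [e for e in exons if e.overlaps(domain)]
    if len(hits) == 1 and hits[0] == domain:
        return "encoded by a unique exon"
    if len(hits) == 1:
        return "inside a single exon"
    # Exon segments are contiguous, so n overlapping segments imply n - 1 introns.
    return f"interrupted by {len(hits) - 1} intron(s)"


# Hypothetical example: a 900-residue DG-like protein with one CDS intron
# after residue 95; the domain boundaries below are placeholders.
exons = exons_from_intron_sites(900, [95])
print(classify_domain(Interval(60, 160), exons))   # IG1-like domain
print(classify_domain(Interval(500, 700), exons))  # one IG2_MAT_NU-like repeat
```

With these placeholder values the first call reports a domain interrupted by one intron, and the second reports a module lying inside the large 3′ exon, the situation described above for every IG2_MAT_NU repeat.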
The IG1-intron has undergone lineage-specific expansion during metazoan evolution
In addition to establishing the universality of the IG1-intron, the phylogenetic comparison of DG gene organization revealed striking variation in the length of the IG1-intron. A major overall increase in the size of this intron is apparent across metazoan evolution: the intron is only 47 bp long in A. queenslandica (Porifera) and 48 bp in Capitella teleta (Annelida), yet reaches up to ~30 kb in Ornithorhynchus anatinus and other mammals (Table 1). Indeed, IG1-intron size was found to increase proportionally with genome size (Fig. 2a), and it also increases with the apparent overall size of the DAG1 gene (Fig. 2b). The underlying numerical data are presented in Additional file 1: Figure S1. To date, no additional coding sequences have been identified within the IG1-intron.
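A proportional relationship between intron length and genome size is conveniently tested on logarithmic axes, where proportionality corresponds to a slope of 1. The sketch below fits log10(intron length) against log10(genome size) by ordinary least squares and reports the Pearson correlation of the log values. It is an illustration only, not the KaleidaGraph procedure used for Fig. 2, and the example data are synthetic (not the values in Table 1 or Additional file 1). Because both axes are logged, the genome-size unit (e.g. Mb) only shifts the intercept.

```python
import math


def loglog_fit(genome_sizes, intron_lengths):
    """Least-squares fit of log10(intron length) on log10(genome size).

    Returns (slope, intercept, r). On log-log axes a slope close to 1
    corresponds to intron length scaling proportionally with genome size;
    r is the Pearson correlation of the log-transformed values."""
    if len(genome_sizes) != len(intron_lengths) or len(genome_sizes) < 3:
        raise ValueError("need at least three paired observations")
    xs = [math.log10(g) for g in genome_sizes]
    ys = [math.log10(i) for i in intron_lengths]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Synthetic data only (not Table 1): intron length set to exactly 10x genome size.
slope, intercept, r = loglog_fit([100, 1000, 3000], [1000, 10000, 30000])
```

For these synthetic points the fit recovers a slope and correlation of 1 and an intercept of 1 (log10 of the factor 10); real data would scatter around the line.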
Analysis of IG1-intron boundaries
We next aligned 50-nucleotide sequences spanning the exon/intron and intron/exon boundaries of the IG1-intron from 23 DG genes of representative dystroglycan-encoding species. The alignments show that the “AGGT exon–intron rule” is largely respected (Fig. 3a, c). However, there are notable exceptions: P. marinus, Strongylocentrotus purpuratus, C. elegans, H. magnipapillata and A. queenslandica at the exon–intron boundary (Fig. 3a), and Anolis carolinensis, Xenopus tropicalis, Danio rerio, Xiphophorus maculatus, Callorhinchus milii, S. purpuratus and C. elegans at the intron–exon boundary (Fig. 3c).
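The boundary check can be sketched as follows: extract a 50-nt window centred on each junction, then test the nucleotides on either side of it. We assume the consensus as usually tabulated in splice-junction catalogues: exon ...AG|GT... intron at the donor (exon–intron) site and intron ...AG|G... exon at the acceptor (intron–exon) site. The function names and the toy sequence are hypothetical, and the toy example uses a 4-nt flank so the windows stay readable.

```python
def junction_window(seq, junction, flank=25):
    """Return the 2 * flank nt centred on a splice junction (50 nt by default).

    `junction` is the 0-based index of the first nucleotide after the boundary."""
    if junction < flank or junction + flank > len(seq):
        raise ValueError("junction too close to a sequence end for this flank")
    return seq[junction - flank:junction + flank]


def check_donor(exon_side, intron_side):
    """Exon-intron (5' splice site) consensus: exon ...AG | GT... intron."""
    return {"exon_ends_AG": exon_side.upper().endswith("AG"),
            "intron_starts_GT": intron_side.upper().startswith("GT")}


def check_acceptor(intron_side, exon_side):
    """Intron-exon (3' splice site) consensus: intron ...AG | G... exon."""
    return {"intron_ends_AG": intron_side.upper().endswith("AG"),
            "exon_starts_G": exon_side.upper().startswith("G")}


# Toy gene (hypothetical sequence): exon 1 + intron + exon 2.
exon1, intron, exon2 = "ATGGCAAG", "GTAAGTCTTTTCAG", "GTTCC"
gene = exon1 + intron + exon2
donor = junction_window(gene, len(exon1), flank=4)
acceptor = junction_window(gene, len(exon1) + len(intron), flank=4)
print(check_donor(donor[:4], donor[4:]))
print(check_acceptor(acceptor[:4], acceptor[4:]))
```

Returning a dictionary of individual tests, rather than a single pass/fail flag, makes it easy to see which side of a junction departs from the consensus, as for the exceptions listed above.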
The IG1-intron interrupts the coding sequence of human DG between amino acid positions 95 and 96 . Alignment of DG protein sequences from the same 23 species with MUSCLE, followed by inspection of the IG1-intron splice sites, showed that the position of the IG1-intron is conserved (Fig. 3b, d). However, the intron location does not correspond to the protein domain structure or the IG1 domain boundaries; the IG1-intron interrupts the sequence encoding the middle of the third β-strand of the IG1 domain [11, 14] (Fig. 3e).
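Placing an intron "between amino acid positions 95 and 96" depends on how many coding nucleotides precede it. Under the standard convention, if n coding nucleotides (counted from the A of the ATG) lie upstream of the intron, it follows codon ⌊n/3⌋ and has phase n mod 3. The sketch below implements this convention. The offsets used are hypothetical: n = 285 is simply the value for which a phase-0 intron would fall between residues 95 and 96, and the true phase of the human IG1-intron is not stated in this section.

```python
def intron_position(nt_before_intron):
    """Map an intron insertion point in a CDS to (residue, phase).

    `nt_before_intron` counts coding nucleotides upstream of the intron,
    starting from the A of the ATG. Returns (k, phase): k complete codons
    precede the intron and `phase` (0, 1 or 2) is the number of bases of
    codon k + 1 that lie upstream of it."""
    if nt_before_intron < 0:
        raise ValueError("position must be non-negative")
    return nt_before_intron // 3, nt_before_intron % 3


def describe(nt_before_intron):
    k, phase = intron_position(nt_before_intron)
    if phase == 0:
        return f"phase 0, between residues {k} and {k + 1}"
    return f"phase {phase}, within codon {k + 1}"


# Hypothetical offsets (not taken from the human DAG1 annotation).
for n in (285, 286, 287):
    print(n, describe(n))
```

Once the residue is known, it can be mapped onto secondary-structure annotations (e.g. the β-strands of PDB 1U2C) to ask whether an intron falls within or between structural elements.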
As noted above, the smallest IG1-introns are found in A. queenslandica (sponge), in C. teleta (annelid) and in T. castaneum (insect) (Table 1). The IG1-intron sequences of A. queenslandica and C. teleta are well conserved (Fig. 3f). To the best of our knowledge, no distinct features of the IG1-intron sequence have been identified to date.
Conservation and diversification of DG gene structure
Collectively, these data demonstrate that the structure of the dystroglycan gene is highly conserved across many metazoan groups and features a universally conserved intron, designated here the IG1-intron, within the 5′ portion of the gene. Multiple independent intron gain events have occurred in some phyla, and gene duplication has occurred in some teleost fish and in Cyclostomata. Although multiple intron gain seems to be typical of the dystroglycan gene, to the best of our knowledge this is not the case for several other gene families. For example, the RpL14 gene of D. melanogaster has fewer introns than the human gene , and in the alpha-amylase gene family both intron gains and losses have been observed in Bilateria .
In Dag1a of some teleosts and the DG genes of insects, nematodes and cnidarians, one or more additional introns interrupt the region encoding the S6 domain. Although the IG2_MAT_NU domain region has been duplicated in species from various phyla , we did not identify any correlation between the protein domain organization of dystroglycan and its exon/intron structure. With the exception of urochordates, intron gain has not occurred in the MAT_NU region around the α/β processing site.
In vertebrates, a striking characteristic of Dag1 is its uncomplicated exon/intron arrangement, with only two relatively large (>15 kb) introns. Both introns are located within the 5′ portion of the gene, leaving the 3′ region essentially intron-less (Fig. 1). This gene structure is conserved across most chordates, whereas the DG genes of urochordates (Ciona intestinalis), arthropods (in particular Drosophila melanogaster) and nematodes (e.g., C. elegans) include multiple introns. Larger genomes generally contain genes with longer introns , and indeed IG1-intron size increases with genome size (Fig. 2). Studies of eukaryote genomes have indicated a prevalence of intron gain over intron loss; however, very few, if any, introns appear to have been gained during the last ∼100 million years of animal and plant evolution . A tendency for extensive intron loss at the 3′ ends of genes has been observed in the genomes of unicellular eukaryotes [33, 34]. The acquisition of additional introns may provide a mechanism for generating novel splice variants that tune gene function to particular developmental stages and/or tissue types . Interestingly, rapidly regulated genes are commonly intron-poor . However, dystroglycan undergoes a complex post-translational maturation process, and its pre- and post-transcriptional control steps, including intron splicing, are unlikely to be rate-limiting .
We found that the repetitions of the IG2_MAT_NU module observed in some species do not involve intron sequences; all of the tandem repetitions lie within the large 3′ exon of the DG gene. In general terms, the relationship between exon/intron boundaries and the boundaries of protein domains/modules is extremely variable. In some cases, single protein domains are encoded by single exons, but there are also many examples in which a single domain is interrupted by one or more introns . Interestingly, an exon/intron arrangement similar to that of the DG IG1 domain is present in some IG-domain-containing cell-surface receptors, for example CD4, CD3δ and NCAM , although these proteins share no significant amino acid sequence homology with DG.
The biological significance of the IG1 domain for DG function has been underscored by the recent identification of two novel compound heterozygous DG missense mutations, V74I and D111N, in a patient with asymptomatic hyperCKemia and hypoglycosylation of α-dystroglycan . The mutation T192M, within the β1 strand of the neighbouring S6 domain, also causes hypoglycosylation of α-DG, with consequent neuromuscular and brain phenotypes . Because the IG1 and S6 domains form an autonomous globular structural unit at the N-terminus of α-DG , the N-terminal region of DG is thought to have an as-yet unidentified autonomous function, extracellularly and/or intracellularly [18–21].
Further work will be needed to analyse the 5′ and 3′ untranslated regions (UTRs) of dystroglycan genes for possible conserved transcription factor binding sites and/or other regulatory elements such as miRNA hybridization sites [23, 40]. A preliminary search shows that organ-specific miRNA target sequences identified in the 3′ UTR of D. melanogaster DG (CCAAAGA for miR-9a in the myotendinous junction and UGCAAUA for miR-310s in the brain) [41, 42] are found exactly, or with minimal variation (1 nucleotide out of 7), in the dystroglycan mRNA of Homo sapiens (5′-CCAGAGA and 5′-UGCAAUA, respectively), Mus musculus (5′-CUAAAGA and 5′-UGCAAUA, respectively) and Hydra magnipapillata (5′-CAAAAGA, miR-9a-like). This conservation suggests that some of the regulatory mechanisms observed in Drosophila melanogaster might also be relevant to other species.
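The "1 nucleotide out of 7" criterion amounts to a Hamming distance of at most 1 between equal-length sites. The sketch below checks this for the sequences quoted above only. Converting T to U and weighting all positions equally (ignoring, for example, seed-position rules used by miRNA target predictors) are simplifying assumptions, not the procedure of the preliminary search.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length RNA/DNA sites."""
    a, b = a.upper().replace("T", "U"), b.upper().replace("T", "U")
    if len(a) != len(b):
        raise ValueError("sites must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Target sites quoted in the text; D. melanogaster sites are the reference.
DROSOPHILA = {"miR-9a": "CCAAAGA", "miR-310s": "UGCAAUA"}
OTHER_SPECIES = {
    ("Homo sapiens", "miR-9a"): "CCAGAGA",
    ("Homo sapiens", "miR-310s"): "UGCAAUA",
    ("Mus musculus", "miR-9a"): "CUAAAGA",
    ("Mus musculus", "miR-310s"): "UGCAAUA",
    ("Hydra magnipapillata", "miR-9a"): "CAAAAGA",
}


def compare_sites(max_mismatches=1):
    """Return {(species, miRNA): (mismatches, within_threshold)}."""
    result = {}
    for (species, mirna), site in OTHER_SPECIES.items():
        d = hamming(DROSOPHILA[mirna], site)
        result[(species, mirna)] = (d, d <= max_mismatches)
    return result


for key, value in compare_sites().items():
    print(key, value)
```

All five quoted sites differ from the Drosophila reference at no more than one position, consistent with the statement above.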
A summary model for evolution of the dystroglycan gene
Figure 4 presents a model of the evolutionary changes in the DG gene identified in our study. The model focuses on the phyla in which DG has been shown to be present, as established from genome-predicted protein sequences and the existence of corresponding mRNA transcripts . DG is not encoded in the two ctenophore species examined (Pleurobrachia bachei and Mnemiopsis leidyi), so phylum Ctenophora is not included in the model. Our previous study of the molecular phylogeny of the DG protein demonstrated that the IG2_MAT_NU region and the domains of β-dystroglycan are the most highly conserved regions and might reflect the ancestral form of DG . The current information on A. queenslandica DG indicates that the IG1 domain and IG1-intron have been part of the DG gene from its earliest origin. In contrast, the S6 domain appears to have been gained in the last common ancestor of placozoans, cnidarians and bilaterians, perhaps by exon shuffling (Fig. 4). The simple structure of the A. queenslandica DG gene, which closely resembles the DG gene structures of annelids, molluscs, the cephalochordate and jawed vertebrates, suggests that this gene organisation is likely to reflect the ancestral gene structure.
Given the different numbers, sizes and positions of the additional introns in the DG genes of some metazoan groups, it is reasonable to hypothesize that these introns were acquired independently, as lineage-specific evolutionary events [32, 43, 44]. In particular, because multiple additional introns are present in the DG genes of nematodes and arthropods (the major phyla of the Ecdysozoa), but not in those of annelids or molluscs (the major phyla of the Lophotrochozoa), we propose that intron gain occurred in the last common ancestor of the ecdysozoans, followed by phylum-specific intron gains or losses in nematodes and arthropods. The current data suggest that the pre-ATG exon was already present in the last common ancestor of the bilaterians (Figs. 1, 4). However, because some of the DG genes analysed in basal metazoans may be incompletely annotated at the 5′ end, this interpretation must remain provisional.
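The argument that intron gain preceded the ecdysozoan radiation is a parsimony argument: a single gain on the branch leading to Ecdysozoa explains the distribution with the fewest changes. As an illustration only, the sketch below applies the bottom-up pass of Fitch's small-parsimony algorithm to one presence/absence character ("multiple additional introns in the large 3′ exon") on a deliberately simplified topology. Both the character coding and the tree are assumptions made for this example (urochordates and placozoans are omitted, for instance), and the text does not state that a formal parsimony analysis was performed.

```python
from typing import Optional, Union

# A tree is a leaf name (str) or a tuple (node_name, left_subtree, right_subtree).
Tree = Union[str, tuple]


def fitch(tree: Tree, states: dict, labels: Optional[dict] = None):
    """Bottom-up pass of Fitch small parsimony for one discrete character.

    Returns (state_set, changes, labels): `labels` maps each internal node
    name to its Fitch state set, and `changes` is the minimum number of
    state changes required on this topology."""
    if labels is None:
        labels = {}
    if isinstance(tree, str):
        return {states[tree]}, 0, labels
    name, left, right = tree
    left_set, left_changes, _ = fitch(left, states, labels)
    right_set, right_changes, _ = fitch(right, states, labels)
    shared = left_set & right_set
    changes = left_changes + right_changes
    if shared:
        node_set = shared
    else:
        # Disjoint child sets: take the union and count one change.
        node_set = left_set | right_set
        changes += 1
    labels[name] = node_set
    return node_set, changes, labels


# Simplified topology and coding (assumptions for illustration only):
# 1 = multiple additional introns in the large 3' exon, 0 = absent.
TREE = ("Metazoa", "Porifera",
        ("Eumetazoa", "Cnidaria",
         ("Bilateria", "Chordata",
          ("Protostomia",
           ("Lophotrochozoa", "Annelida", "Mollusca"),
           ("Ecdysozoa", "Arthropoda", "Nematoda")))))
STATES = {"Porifera": 0, "Cnidaria": 0, "Chordata": 0, "Annelida": 0,
          "Mollusca": 0, "Arthropoda": 1, "Nematoda": 1}

root_set, changes, labels = fitch(TREE, STATES)
print(changes, labels["Ecdysozoa"], labels["Protostomia"])
```

Under this coding a single change suffices; a top-down pass (not shown) would resolve the Protostomia node to state 0 and place that change on the branch leading to Ecdysozoa. Coding other lineages differently, for example urochordates, would alter the count.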
Other intron gain events, such as the mini-introns within the S6- or mucin-domain-encoding regions, or introns 3′ to the S6 domain, appear to be entirely taxon- or lineage-specific and are therefore proposed to be of later evolutionary origin (Fig. 4). Although the organisation of the paralogous DAG1a and DAG1b genes is similar in lamprey and bony fish, it remains controversial whether the genome-wide duplications in the early vertebrate lineage occurred before or after the divergence of cyclostomes, especially given the independent gene losses and gains in extant lampreys [45–47].
In conclusion, although a simple DG gene organisation, with two coding exons separated by a single CDS intron, has been robustly conserved, significant divergence and intron gain have occurred in Ecdysozoa and Urochordata and, to a lesser extent, in the placozoan T. adhaerens. Generally, the newly gained exon/intron architectures are unrelated to protein domain boundaries. In particular, the duplication of IG2_MAT_NU regions identified in species from Hemichordata, Arthropoda, Cnidaria and Placozoa is not related to the intron–exon organisation of these DG genes. Further analyses will be needed to investigate whether these aspects of DG gene structure are relevant to genes encoding other cell adhesion molecules.
Identification of DG gene sequences throughout the metazoa
All the gene sequences investigated were retrieved either from the Ensembl database or from the Metazome v3.0 database from the University of California (http://www.metazome.net). Searches were completed by the end of January 2016. Protein sequence searches were performed with BLASTP at NCBI GenBank using default parameters and were based on the protein sequences studied in . The dystroglycan gene sequences identified were further confirmed by multiple sequence alignment in MUSCLE 3.8, using the human dystroglycan sequence as a reference. The accepted borders of the relevant dystroglycan domains were taken as described in .
Multiple sequence alignment
Multiple sequence alignments of nucleotide or protein sequences were constructed in MUSCLE 3.8  via the resources of EMBL/EBI (http://www.ebi.ac.uk/Tools/msa) and are presented in BoxShade 3.21 (http://www.ch.embnet.org/software/BOX_form.html). Secondary structure elements are reproduced from PDB 1U2C .
Graph presentations and fitted lines were generated using KaleidaGraph (Synergy Software).
Abbreviations
CDS: coding DNA sequence
LG: laminin G domain
References
Ervasti JM, Campbell KP. A role for the dystrophin-glycoprotein complex as a transmembrane linker between laminin and actin. J Cell Biol. 1993;122(4):809–23.
Sciandra F, Bozzi M, Bigotti MG, Brancaccio A. The multiple affinities of α-dystroglycan. Curr Protein Pept Sci. 2013;14(7):626–34.
Winder SJ. The complexities of dystroglycan. Trends Biochem Sci. 2001;26(2):118–24.
Ibraghimov-Beskrovnaya O, Ervasti JM, Leveille CJ, Slaughter CA, Sernett SW, Campbell KP. Primary structure of dystrophin-associated glycoproteins linking dystrophin to the extracellular matrix. Nature. 1992;355(6362):696–702.
Cohn RD, Henry MD, Michele DE, Barresi R, Saito F, Moore SA, Flanagan JD, Skwarchuk MW, Robbins ME, Mendell JR, et al. Disruption of DAG1 in differentiated skeletal muscle reveals a role for dystroglycan in muscle regeneration. Cell. 2002;110(5):639–48.
Michele DE, Barresi R, Kanagawa M, Saito F, Cohn RD, Satz JS, Dollar J, Nishino I, Kelley RI, Somer H, et al. Post-translational disruption of dystroglycan-ligand interactions in congenital muscular dystrophies. Nature. 2002;418(6896):417–22.
Hara Y, Balci-Hayta B, Yoshida-Moriguchi T, Kanagawa M, Beltrán-Valero de Bernabé D, Gündeşli H, Willer T, Satz JS, Crawford RW, Burden SJ, et al. A dystroglycan mutation associated with limb-girdle muscular dystrophy. N Engl J Med. 2011;364(10):939–46.
Dong M, Noguchi S, Endo Y, Hayashi YK, Yoshida S, Nonaka I, Nishino I. DAG1 mutations associated with asymptomatic hyperCKemia and hypoglycosylation of α-dystroglycan. Neurology. 2015;84(3):273–9.
Riemersma M, Mandel H, van Beusekom E, Gazzoli I, Roscioli T, Eran A, Gershoni-Baruch R, Gershoni M, Pietrokovski S, Vissers LE, et al. Absence of α- and β-dystroglycan is associated with Walker-Warburg syndrome. Neurology. 2015;84(21):2177–82.
Endo T. Glycobiology of α-dystroglycan and muscular dystrophy. J Biochem. 2015;157(1):1–12.
Bozic D, Sciandra F, Lamba D, Brancaccio A. The structure of the N-terminal region of murine skeletal muscle α-dystroglycan discloses a modular architecture. J Biol Chem. 2004;279(43):44812–6.
Adams JC, Brancaccio A. The evolution of the dystroglycan complex, a major mediator of muscle integrity. Biol Open. 2015;4(9):1163–79.
Smith BO, Picken NC, Westrop GD, Bromek K, Mottram JC, Coombs GH. The structure of Leishmania mexicana ICP provides evidence for convergent evolution of cysteine peptidase inhibitors. J Biol Chem. 2006;281(9):5821–8.
Bozzi M, Cassetta A, Covaceuszach S, Bigotti MG, Bannister S, Hübner W, Sciandra F, Lamba D, Brancaccio A. The structure of the T190M mutant of murine α-dystroglycan at high resolution: insight into the molecular basis of a primary dystroglycanopathy. PLoS ONE. 2015;10(5):e0124277.
Brancaccio A, Schulthess T, Gesemann M, Engel J. The terminal region of α-dystroglycan is an autonomous globular domain. Eur J Biochem. 1997;246(1):166–72.
Kanagawa M, Saito F, Kunz S, Yoshida-Moriguchi T, Barresi R, Kobayashi YM, Muschler J, Dumanski JP, Michele DE, Oldstone MB, et al. Molecular recognition by LARGE is essential for expression of functional dystroglycan. Cell. 2004;117(7):953–64.
Singh J, Itahana Y, Knight-Krajewski S, Kanagawa M, Campbell KP, Bissell MJ, Muschler J. Proteolytic enzymes and altered glycosylation modulate dystroglycan function in carcinoma cells. Cancer Res. 2004;64(17):6152–9.
Saito F, Saito-Arai Y, Nakamura A, Shimizu T, Matsumura K. Processing and secretion of the N-terminal domain of alpha-dystroglycan in cell culture media. FEBS Lett. 2008;582(3):439–44.
Saito F, Saito-Arai Y, Nakamura-Okuma A, Ikeda M, Hagiwara H, Masaki T, Shimizu T, Matsumura K. Secretion of N-terminal domain of α-dystroglycan in cerebrospinal fluid. Biochem Biophys Res Commun. 2011;411(2):365–9.
Hesse C, Johansson I, Mattsson N, Bremell D, Andreasson U, Halim A, Anckarsäter R, Blennow K, Anckarsäter H, Zetterberg H, et al. The N-terminal domain of α-dystroglycan, released as a 38 kDa protein, is increased in cerebrospinal fluid in patients with Lyme neuroborreliosis. Biochem Biophys Res Commun. 2011;412(3):494–9.
Heng S, Paule SG, Li Y, Rombauts LJ, Vollenhoven B, Salamonsen LA, Nie G. Posttranslational removal of α-dystroglycan N terminus by PC5/6 cleavage is important for uterine preparation for embryo implantation in women. FASEB J. 2015;29(9):4011–22.
Ibraghimov-Beskrovnaya O, Milatovich A, Ozcelik T, Yang B, Koepnick K, Francke U, Campbell KP. Human dystroglycan: skeletal muscle cDNA, genomic structure, origin of tissue specific isoforms and chromosomal localization. Hum Mol Genet. 1993;2(10):1651–7.
Leeb T, Neumann S, Deppe A, Breen M, Brenig B. Genomic organization of the dog dystroglycan gene DAG1 locus on chromosome 20q15.1-q15.2. Genome Res. 2000;10(3):295–301.
Schneider M, Baumgartner S. Differential expression of dystroglycan-splice forms with and without the mucin-like domain during Drosophila embryogenesis. Fly. 2008;2(1):29–35.
Coumou J, Narasimhan S, Trentelman JJ, Wagemakers A, Koetsveld J, Ersoz JI, Oei A, Fikrig E, Hovius JW. Ixodes scapularis dystroglycan-like protein promotes Borrelia burgdorferi migration from the gut. J Mol Med (Berl). 2016;94(3):361–70.
Baumgarten S, Simakov O, Esherick LY, Liew YJ, Lehnert EM, Michell CT, Li Y, Hambleton EA, Guse A, Oates ME, et al. The genome of Aiptasia, a sea anemone model for coral symbiosis. Proc Natl Acad Sci USA. 2015;112(38):11893–8.
Pavoni E, Cacchiarelli D, Tittarelli R, Orsini M, Galtieri A, Giardina B, Brancaccio A. Duplication of the dystroglycan gene in most branches of teleost fish. BMC Mol Biol. 2007;8:34.
Mount SM. A catalogue of splice junction sequences. Nucleic Acids Res. 1982;10(2):459–72.
Enerly E, Ahmadi H, Shalchian-Tabrizi K, Lambertsson A. Identification and comparative analysis of the RpL14 gene from Takifugu rubripes. Hereditas. 2003;139(2):143–50.
Da Lage JL, Maczkowiak F, Cariou ML. Phylogenetic distribution of intron positions in alpha-amylase genes of bilateria suggests numerous gains and losses. PLoS ONE. 2011;6(5):e19673.
Lynch M, Conery JS. The origins of genome complexity. Science. 2003;302(5649):1401–4.
Babenko VN, Rogozin IB, Mekhedov SL, Koonin EV. Prevalence of intron gain over intron loss in the evolution of paralogous gene families. Nucleic Acids Res. 2004;32(12):3724–33.
Sverdlov AV, Babenko VN, Rogozin IB, Koonin EV. Preferential loss and gain of introns in 3′ portions of genes suggests a reverse-transcription mechanism of intron insertion. Gene. 2004;338(1):85–91.
Jeffares DC, Mourier T, Penny D. The biology of intron gain and loss. Trends Genet. 2006;22(1):16–22.
Gorlova O, Fedorov A, Logothetis C, Amos C, Gorlov I. Genes with a large intronic burden show greater evolutionary conservation on the protein level. BMC Evol Biol. 2014;14(1):50.
Jeffares DC, Penkett CJ, Bähler J. Rapidly regulated genes are intron poor. Trends Genet. 2008;24(10):375–8.
Brancaccio A. DAG1, no gene for RNA regulation? Gene. 2012;497(1):79–82.
Ny T, Elgh F, Lund B. The structure of the human tissue-type plasminogen activator gene: correlation of intron and exon structures to functional and structural domains. Proc Natl Acad Sci USA. 1984;81(17):5355–9.
Williams AF, Barclay AN. The immunoglobulin superfamily-domains for cell surface recognition. Annu Rev Immunol. 1988;6:381–405.
Li H, Chen D, Zhang J. Analysis of intron sequence features associated with transcriptional regulation in human genes. PLoS ONE. 2012;7(10):e46784.
Yatsenko AS, Shcherbata HR. Drosophila miR-9a targets the ECM receptor dystroglycan to canalize myotendinous junction formation. Dev Cell. 2014;28(3):335–48.
Yatsenko AS, Marrone AK, Shcherbata HR. miRNA-based buffering of the cobblestone-lissencephaly-associated extracellular matrix receptor dystroglycan via its alternative 3′-UTR. Nat Commun. 2014;5:4906.
Coulombe-Huntington J, Majewski J. Intron loss and gain in Drosophila. Mol Biol Evol. 2007;24(12):2842–50.
Li W, Kuzoff R, Wong CK, Tucker A, Lynch M. Characterization of newly gained introns in Daphnia populations. Genome Biol Evol. 2014;6(9):2218–34.
Kuraku S. Palaeophylogenomics of the vertebrate ancestor-impact of hidden paralogy on hagfish and lamprey gene phylogeny. Integr Comp Biol. 2010;50(1):124–9.
Caputo Barucchi V, Giovannotti M, Nisi Cerioni P, Splendiani A. Genome duplication in early vertebrates: insights from agnathan cytogenetics. Cytogenet Genome Res. 2013;141(2–3):80–9.
Smith JJ, Keinath MC. The sea lamprey meiotic map improves resolution of ancient vertebrate genome duplications. Genome Res. 2015;25(8):1081–90.
Cunningham F, Amode MR, Barrell D, Beal K, Billis K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fitzgerald S, et al. Ensembl 2015. Nucleic Acids Res. 2015;43(Database issue):D662–9.
Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32(5):1792–7.
AB and JCA conceived the project. AB executed the project. Both authors discussed and analysed data, prepared figures and wrote the paper. Both authors read and approved the final manuscript.
The School of Biochemistry, University of Bristol, is acknowledged for hosting AB.
The authors declare that they have no competing interests.
Availability of data and materials
The data supporting the conclusions of this article are available in the Ensembl repository (http://www.ensembl.org/index.html) and the Metazome repository (http://www.metazome.net). The accession codes of all analysed sequences are reported in Tables 1 and 2.
This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.