Whole-genome of Mexican-crAssphage isolated from the human gut microbiome



crAssphage is a recently discovered phage described as the most abundant virus in the human gut microbiome. The majority of crAssphage proteins are unknown in sequence databases, and the phage's pathogenicity and epidemiology in humans remain unclear. Because it is one of the most abundant phages in the human gut microbiome, more investigation at the genomic level is necessary to improve our understanding, especially in the Latin American population.

Data description

In this article, we provide the whole genome of a crAssphage isolated from the human gut microbiome of the Mexican population, which we named Mexican-crAssphage. The genome consists of 96,283 bp, with a G+C content of 29.24% and 87 coding sequences. Notably, we did not find any transfer RNA genes in the genome sequence. We also sequenced enriched virus-like particles from 28 fecal samples and detected the Mexican-crAssphage genome in 8 of them (28.6%). To our knowledge, our data are the first whole-genome report of a crAssphage isolated from a Latin American population and provide valuable information for the experimental characterization of the most abundant human gut bacteriophage. The whole genome shotgun project of the Mexican-crAssphage is available at DDBJ/ENA/GenBank under the GenBank accession number MK069403.


The human body is inhabited by a high diversity of bacteria, archaea, fungi, protozoa, and viruses. These microbes are collectively known as the human microbiota, whereas their collective genomes form the human microbiome [1]. The human gut virome is dominated by bacteriophages [2], which infect their bacterial hosts and thereby affect the composition of the microbiome [1]. It has been proposed that bacteriophages may help shape the diversity and composition of the microbiota [1] and may also play a role in some diseases, such as inflammatory bowel disease [3] and type 1 and type 2 diabetes [4, 5].

A novel bacteriophage, named crAssphage, was recently discovered as the most abundant virus in the human gut microbiome [6]. Subsequently, a crAss-like family was discovered that appears to be abundant and widespread in diverse habitats, both animal-associated and environmental [7]. Various bacteria of the phylum Bacteroidetes appear to be the primary hosts of crAss-like phages [6, 7]. For example, ΦCrAss001, isolated from human feces, was the first member of the extensive crAssphage family to be grown in pure culture, and it infects the human gut symbiont Bacteroides intestinalis [8]. Recently, 98 complete circular genomes of crAss-like phages were reported, which helped to establish the classification of this phage family into four candidate subfamilies composed of 10 candidate genera [9]. Furthermore, crAssphage was not associated with diarrhea in Chinese patients [10]. crAssphage genomes have been isolated from the human gut in several geographical regions (Data file 1 in Table 1). However, no genome sequence from this phage family has been reported to date in a Latin American population. Because crAssphage is one of the most abundant phages in the human gut microbiome, more investigation at the genomic level is necessary to improve our understanding of its function, especially in the Latin American population.

Table 1 Overview of data files/data sets

Data description

Phage-enriched filtrates of fecal samples from 28 Mexican children were isolated using a modified protocol [11]. In brief, for each sample, 250 mg of feces were homogenized in SM buffer and centrifuged for 30 min at 4700×g. The supernatant was filtered through a 0.22 μm PES filter (720–1320, Nalgene, USA) and concentrated in an Amicon Ultra 15, 100 kDa unit (UFC910096, Millipore, USA). The Amicon unit was then washed with one volume of SM buffer, and the viral particles were concentrated in 200 µl of SM buffer. We extracted the DNA using the QIAamp MinElute Virus Spin kit (57704, QIAGEN, Hilden, Germany). DNA quality and quantity were measured using agarose gel electrophoresis and the Qubit high-sensitivity fluorometric assay (Cat. Q32851, Life Technologies, Carlsbad, CA, USA), respectively. The DNA was used to construct paired-end libraries with the Nextera XT DNA Library Preparation kit (Cat. FC-131-1024, Illumina, CA, USA), selecting an insert size of 400–600 bp with Ampure XP beads (Cat. A63882, Beckman Coulter, CA, USA). The libraries were analyzed with the 2100 Bioanalyzer instrument (Cat. 5067-1504, Agilent Technologies, CA, USA), and sequencing was performed on the Illumina NextSeq500 in a 300-cycle paired-end format (FC-404-2003; Illumina, CA, USA) at the National Institute of Genomic Medicine in Mexico City.

The reads were analyzed using FastQC version 0.11.5, and only reads with a quality above Q20 were used for further analysis. The resulting reads from each sample were mapped against the crAssphage reference genome (GenBank ID: JQ995537) using SMALT version 0.7.6. We then selected the sample with the highest number of reads mapped to the crAssphage genome for de novo genome assembly with SPAdes version 3.8.1. The resulting contigs were ordered using MeDuSa [12] with default parameters.
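The quality-filtering step can be illustrated with a short sketch. The text does not state which tool or criterion was used to discard reads (FastQC is primarily a reporting tool), so this example assumes Phred+33 encoding and keeps a read when its mean Phred score is above 20. The function names and the toy reads are hypothetical.

```python
def mean_phred(quality_line, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    scores = [ord(ch) - offset for ch in quality_line]
    return sum(scores) / len(scores) if scores else 0.0


def filter_fastq(lines, threshold=20):
    """Return the 4-line FASTQ records whose mean quality exceeds threshold.

    `lines` is a list of stripped FASTQ lines: header, sequence, '+', quality.
    """
    kept = []
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if mean_phred(qual) > threshold:
            kept.append((header, seq, plus, qual))
    return kept


# Toy input (made-up reads): 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
toy = ["@read1", "ACGT", "+", "IIII",
       "@read2", "ACGT", "+", "####"]
kept_reads = filter_fastq(toy)
print([record[0] for record in kept_reads])
```

A per-base trimming rule would be an equally plausible reading of "quality > Q20"; the mean-score rule is used here only because it is the simplest to show.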

The assembled Mexican-crAssphage genome was 96,283 bp long, with a G+C content of 29.24% (Data set 1 in Table 1). The read coverage of the Mexican-crAssphage genome was 188×. To visualize the read depth and the encoded proteins along the Mexican-crAssphage genome, we used DNAPlotter (Data file 2 in Table 1). A total of 87 coding sequences (CDS) were predicted using RAST [13]. They were largely co-oriented and organized in two blocks of CDS along the genome. These sequences were searched with BLAST against the non-redundant (NR) protein database using Blast2GO [14]. Of the 87 predicted proteins, 12 (13.8%) were unknown and 60 (69.0%) were annotated as hypothetical proteins. The genome encodes phage proteins, including proteins involved in nucleic acid manipulation (helicase, ligase, primase, and polymerase) and phage structural proteins. Notably, we did not find any transfer RNA genes in the genome sequence.
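The summary statistics in this paragraph can be recomputed from a sequence and the reported counts. The sketch below shows how G+C content, annotation fractions, and the approximate number of mapped bases implied by a mean depth can be derived. It assumes that G+C content is computed over unambiguous bases only, which the text does not specify; the function names and the toy sequence are hypothetical.

```python
def gc_content(sequence):
    """Percent G+C among called bases (A, C, G, T); N's are ignored."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    called = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / called if called else 0.0


def percent(part, total):
    """Share of `part` in `total`, as a percentage."""
    return 100.0 * part / total


# Values reported in the text.
GENOME_LENGTH = 96_283   # bp
TOTAL_CDS = 87
MEAN_DEPTH = 188         # reported read coverage

# Made-up toy sequence, only to exercise gc_content().
print(f"toy G+C: {gc_content('ATGCNNAT'):.2f}%")
print(f"unknown proteins: {percent(12, TOTAL_CDS):.1f}%")
print(f"hypothetical proteins: {percent(60, TOTAL_CDS):.1f}%")
# Mean depth times genome length approximates the total mapped bases.
print(f"approx. mapped bases: {MEAN_DEPTH * GENOME_LENGTH:,}")
```

Running `gc_content` on the sequence deposited under MK069403 would be the natural way to check the reported 29.24%, bearing in mind that including or excluding the uncalled bases changes the denominator slightly.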

Finally, the viral reads of the 28 samples were mapped against the Mexican-crAssphage genome using SMALT version 0.7.6. We detected this phage's genome in eight samples, meaning that the Mexican-crAssphage was present in 28.6% of the analyzed samples. This is the first crAssphage genome isolated from a Latin American population, and it can be used in different applications of human viral metagenomics to understand the impact that host genetics has in modulating the evolution of crAssphage across the world.
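The prevalence figure follows from counting the samples in which the genome was detected. The text does not give the detection criterion, so the sketch below assumes a hypothetical minimum number of mapped reads per sample. The sample names, read counts, and threshold are invented for illustration only.

```python
def prevalence(mapped_reads, min_reads=10):
    """Return (names of positive samples, percent of samples positive).

    `mapped_reads` maps a sample name to its number of reads mapped to the
    phage genome; `min_reads` is an assumed, hypothetical detection cutoff.
    """
    positives = sorted(name for name, n in mapped_reads.items() if n >= min_reads)
    pct = 100.0 * len(positives) / len(mapped_reads) if mapped_reads else 0.0
    return positives, pct


# Made-up counts for four imaginary samples.
toy_counts = {"S01": 5321, "S02": 0, "S03": 12, "S04": 3}
positive_samples, pct_positive = prevalence(toy_counts)
print(positive_samples, f"{pct_positive:.1f}%")
```

A cutoff based on the fraction of the genome covered, rather than a raw read count, would be a reasonable alternative when sequencing depth varies widely between samples.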


Deeper sequencing of viral particles should be used in the future to resolve a region of 100 uncalled bases (N's) in this Mexican-crAssphage genome. This region spans positions 40,116–40,215 of the reported genome. These are the only missing bases in the genome.
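Runs of uncalled bases such as the one described here can be located directly in a sequence string. The following is a minimal sketch with a hypothetical function name and a made-up toy sequence; it reports 1-based inclusive coordinates, matching the convention used for the positions above.

```python
import re


def find_n_runs(sequence):
    """Return (start, end, length) for each run of N, 1-based inclusive."""
    return [(m.start() + 1, m.end(), m.end() - m.start())
            for m in re.finditer(r"[Nn]+", sequence)]


# Made-up toy sequence with a 5-base gap; not the real genome.
toy_sequence = "ACGT" + "N" * 5 + "ACGT"
n_runs = find_n_runs(toy_sequence)
print(n_runs)

# Sanity check on the reported coordinates: an inclusive span of
# 40,116 to 40,215 contains 40215 - 40116 + 1 bases.
reported_gap_length = 40_215 - 40_116 + 1
print(reported_gap_length)
```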


  1. Bikel S, Valdez-Lara A, Cornejo-Granados F, Rico K, Canizales-Quintero S, Soberón X, Del Pozo-Yauner L, Ochoa-Leyva A. Combining metagenomics, metatranscriptomics and viromics to explore novel microbial interactions: towards a systems-level understanding of human microbiome. Comput Struct Biotechnol J. 2015;13:390–401.

  2. Minot S, Sinha R, Chen J, Li H, Keilbaugh S, Wu G, Lewis J, Bushman F. The human gut virome: Inter-individual variation and dynamic response to diet. Genome Res. 2011;21(10):1616–25.

  3. Norman J, Handley S, Baldridge M, Droit L, Liu C, Keller B, et al. Disease-specific alterations in the enteric virome in inflammatory bowel disease. Cell. 2015;160(3):447–60.

  4. Ma Y, You X, Mai G, Tokuyasu T, Liu C. A human gut phage catalog correlates the gut phageome with type 2 diabetes. Microbiome. 2018;6(1):24.

  5. Zhao G, Vatanen T, Droit L, Park A, Kostic AD, Poon TW, et al. Intestinal virome changes precede autoimmunity in type I diabetes-susceptible children. Proc Natl Acad Sci U S A. 2017;114(30):E6166–75.

  6. Dutilh B, Cassman N, McNair K, Sanchez S, Silva G, Boling L, et al. A highly abundant bacteriophage discovered in the unknown sequences of human faecal metagenomes. Nat Commun. 2014;5:4498.

  7. Yutin N, Makarova K, Gussow A, Krupovic M, Segall A, Edwards R, Koonin E. Discovery of an expansive bacteriophage family that includes the most abundant viruses from the human gut. Nat Microbiol. 2018;3:38–46.

  8. Shkoporov A, Khokhlova E, Fitzgerald C, Stockdale S, Draper L, Ross R, Hill C. ΦCrAss001, a member of the most abundant bacteriophage family in the human gut, infects Bacteroides. bioRxiv. 2018.

  9. Guerin E, Shkoporov A, Stockdale S, Clooney A, Ryan F, Draper L, Gonzalez-Tortuero E, Ross P, Hill C. Biology and taxonomy of crAss-like bacteriophages, the most abundant virus in the human gut. bioRxiv. 2018.

  10. Liang YY, Zhang E, Tong YG, Chen SP. CrAssphage is not associated with diarrhoea and has high genetic diversity. Epidemiol Infect. 2016;1(16):1–5.

  11. Reyes A, Wu M, McNulty NP, Rohwer F, Gordon J. Gnotobiotic mouse model of phage–bacterial host dynamics in the human gut. Proc Natl Acad Sci USA. 2013;110(50):20236–41.

  12. Bosi E, Donati E, Galardini M, Brunetti S, Sagot MF, Lió P, Crescenzi P, Fani R, Fondi M. MeDuSa: a multi-draft based scaffolder. Bioinformatics. 2015;31(15):2443–51.

  13. Aziz R, Bartels D, Best A, DeJongh M, Disz T, Edwards R, et al. The RAST server: rapid annotations using subsystems technology. BMC Genomics. 2008;9(1):75–90.

  14. Götz S, García-Gómez J, Terol J, Williams T, Nagaraj S, Nueda M, Robles M, Talón M, Dopazo J, Conesa A. High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Res. 2008;36(10):3420–35.

Authors’ contributions

MCE, EEM, and AOL conceived and designed the experiments, performed the experiments, and analyzed the data. FCG, AHR, FS, and AOL performed the experiments, analyzed the data, and contributed reagents/materials/analysis tools. SCQ, BELC, and AOL contributed reagents/materials. All authors read and approved the final manuscript.


We thank Juan Manuel Hurtado-Ramírez and Gamaliel López-Leal of IBT-UNAM for bioinformatics technical support, and Dr. Ricardo Alfredo Grande Cano and Gloria Tanahiry Vázquez Castro at IBT-UNAM for their experimental sequencing support. We thank Alfredo Mendoza-Vargas and the Unidad de Secuenciación Masiva of the Instituto Nacional de Medicina Genómica for their technical support in sequencing the samples.

Competing interests

The authors declare that they have no competing interests.

Availability of data materials

The data described in this Data Note (whole genome project) are freely available at DDBJ/EMBL/GenBank under the accession number PRJNA495080 and GenBank accession number MK069403. The data files 1 and 2 are available at and, respectively.

Consent for publication

Not applicable.

Ethics approval and consent to participate

The Ethics Committee of the National Institute of Genomic Medicine in Mexico City approved the study. The parents or guardians of the donors signed the informed consent form for participation, and the donors assented to participate.


The reported study was funded by the National Council for Science and Technology of Mexico (CONACyT), Grant No. SALUD-2014-C01-234188, and by DGAPA PAPIIT UNAM (IA203118).

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author information


Corresponding author

Correspondence to Adrián Ochoa-Leyva.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Cervantes-Echeverría, M., Equihua-Medina, E., Cornejo-Granados, F. et al. Whole-genome of Mexican-crAssphage isolated from the human gut microbiome. BMC Res Notes 11, 902 (2018).
