
Genomic characterization of bacteriophage BI-EHEC infecting strains of Enterohemorrhagic Escherichia coli



The aim of this research was to determine the genomic properties of BI-EHEC, a bacteriophage isolated in a previous study, for use in controlling Enterohemorrhagic Escherichia coli (EHEC). Genomic analysis is essential for assessing this bacteriophage for further application as a food preservative.


The genome of BI-EHEC was successfully annotated using multiPhATE2. Structural and lytic cycle-related proteins such as head, tail, capsid, and lysozyme (lysin) were annotated. The phylogenetic tree of the tail fiber protein and the BRIG results showed that BI-EHEC is similar to phages of the same host in the bacteriophage genome database. No virulence genes, antibiotic resistance genes (ARGs), or lysogenic proteins were found among the annotated genes, which implies that BI-EHEC follows a lytic life cycle. PHACTS analysis was performed to test this further and also predicted a lytic cycle. Analysis using CARD found that BI-EHEC does not contain residual ARGs under the recommended parameters. BI-EHEC was therefore confirmed as a lytic bacteriophage, making it a good candidate as a biocontrol agent.


Foodborne disease is often caused by consuming food contaminated by bacteria, and EHEC is one such foodborne bacterium. Conventional preservation methods have many disadvantages, such as the loss of nutritional and organoleptic value [1, 2]. Bacteriophages can be used as an alternative approach. Bacteriophages have two different life cycles. The lytic life cycle enables a bacteriophage to lyse its bacterial host and produce progeny; lytic phages are preferred as biocontrol agents because they minimize the probability of horizontal gene transfer. The lysogenic life cycle, by contrast, only enables replication of phage DNA within the host. Lytic bacteriophage is [3].

Our previous study isolated a bacteriophage from bovine intestine, referred to as BI-EHEC, which was found to be effective in controlling EHEC with a 91.02% reduction [4]. However, the genomic properties of BI-EHEC have not yet been analyzed. In this research, we used an in silico approach to determine whether BI-EHEC meets the criteria for a promising biocontrol agent.

Main text


Bacteriophage enrichment and purification

EHEC was grown on Luria Bertani (LB) agar (OXOID), incubated at 37 °C overnight, then stored in a refrigerator at 4 °C. Host bacteria were grown in LB broth in a water bath shaker (Lab Companion) at 120 rpm and 37 °C overnight. BI-EHEC from the stock solution was enriched by adding (1.63 ± 0.65) × 10¹⁰ PFU/mL of phage solution and 10⁸ CFU/mL (OD600 = 0.132) of its host bacteria to fresh LB broth (OXOID). The suspension was incubated in a water bath shaker at 120 rpm and 37 °C overnight, then centrifuged (Eppendorf) at 5488×g for 15 min. The pellet was discarded, and the supernatant was filtered through a 0.22 μm microfilter (Himedia, Mumbai, India). The purified bacteriophage stock can be kept at 4 °C with the addition of Ringer solution (OXOID) (1:9 v/v) for further steps. Additionally, the agar overlay method was performed to verify the activity and presence of BI-EHEC by observing clear plaques [5,6,7].

Isolation of bacteriophage genomic material

A 5 μL volume of DNase I (Geneaid) was added to (1.63 ± 0.65) × 10¹⁰ PFU/mL of purified bacteriophage, followed by incubation at 37 °C for 30 min. Then 6 μL of 0.05 M EDTA, 10 μL of 1% sodium dodecyl sulfate (SDS), and 6 μL of proteinase K (Geneaid) (10 mg/mL) were added, and the mixture was incubated at 37 °C for 1 h. Next, 600 μL of phenol–chloroform–isoamyl alcohol solution (25:24:1) was added and the mixture was centrifuged (Thermo) at 2655×g for 5 min. The upper phase was transferred to a new microtube, mixed with 500 μL of chloroform–isoamyl alcohol solution (24:1), and centrifuged at 2655×g for 5 min. The upper phase was again transferred to a new microtube. A 3 M sodium acetate solution, pH 5.2 (1:10), followed by isopropyl alcohol (1:1) (MERCK), was added, and the mixture was incubated in an ice bath for 15 min. The suspension was then centrifuged at 17,949×g for 10 min, and the supernatant was removed. About 700 μL of 70% ethanol was added to the pellet, and the mixture was centrifuged again at 17,949×g for 10 min. The supernatant was removed, and the pellet was dried. Finally, 50 μL of nuclease-free water (NFW) (Qiagen) was added to the pellet, and the DNA was stored at 4 °C [8].

Next-generation sequencing (NGS)

Genomic DNA (gDNA) obtained from the bacteriophage genomic isolation was sent to PT Genetika Science Indonesia for NGS using Oxford Nanopore Technologies (MinKNOW 20.06.9). Base calling was done using Guppy 4.0.11 in high-accuracy mode. Raw NGS data were filtered using Filtlong v.0.2.0 with default parameters and without an external reference [9]. De novo assembly of the Filtlong-filtered reads was done with Flye v.2.8.3 using the default parameters for Oxford Nanopore input [10]. Medaka 1.2.0 (default parameters) [11] was used to polish the assembled genome. The resulting FASTA was treated as the complete genome assembly of BI-EHEC.

Bioinformatic analysis

Genome annotation was carried out with multiPhATE2, using the default databases (Phantome, pVOGs) and supporting databases (NCBI virus genomes, NCBI Swissprot, CAZy) [12]. A phylogenetic tree of the tail fiber protein was constructed using MEGA X (nucleotide sequence) [13, 14]. BLAST analysis was carried out to identify the phages that BI-EHEC most closely resembles [15]. Two additional bacteriophages were chosen from the NCBI database for comparison with BI-EHEC using BRIG [16]. Virulence (eae, lpf, stx) and lysogenic (int, xis) genes were also compared with BI-EHEC via BRIG. Further analysis was done using CARD [17] to check for the possible presence of antimicrobial resistance genes (ARGs). PHACTS [18] was used to determine the life cycle of BI-EHEC.


Bacteriophage annotation

The BI-EHEC genome (GenBank accession number OL505078) is composed of 151,425 bp with 39% GC content. It has 12 tRNA-encoding regions and 352 open reading frames (ORFs). Genes associated with cell lysis, assembly, and packaging at the end of the lytic cycle were annotated, including a putative T4-like lysozyme, tail fiber assembly protein, gpH, and terminases. Other results include components associated with bacteriophage structure [3, 19]. The complete annotation can be seen in Additional file 1: Table S1, the genome map in Additional file 2: Figure S1, and selected results in Table 1.
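As an illustrative aside, summary statistics such as genome length and GC content can be recomputed from an assembled FASTA file. The following is a minimal Python sketch under stated assumptions: the function names and the toy record are hypothetical, and this is not the code used in the study.

```python
def read_fasta_sequence(text: str) -> str:
    """Concatenate all sequence lines of a FASTA string, skipping header lines."""
    return "".join(
        line.strip().upper()
        for line in text.splitlines()
        if line.strip() and not line.startswith(">")
    )


def gc_content(sequence: str) -> float:
    """Return the percentage of G and C among the unambiguous bases (A, C, G, T)."""
    counts = {base: sequence.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * (counts["G"] + counts["C"]) / total


# Toy record for illustration only; this is NOT BI-EHEC sequence data.
toy_fasta = ">toy_phage\nATGCATTA\nAATTGGCC\n"
toy_seq = read_fasta_sequence(toy_fasta)
print(len(toy_seq), gc_content(toy_seq))
```

Applied to the polished assembly, the same two functions would be expected to reproduce the reported length and GC percentage, provided the file contains a single contig.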

Table 1 Notable BI-EHEC annotation results

Phylogenetic analysis of tail fiber protein

Sixteen tail fiber protein sequences were obtained from NCBI databases to be compared with BI-EHEC [see Additional file 3: Table S2]. BI-EHEC tail fiber protein showed high similarity with Escherichia phage ukendt tail fiber protein (Figs. 1, 2).

Fig. 1

Unrooted phylogenetic tree of BI-EHEC tail fiber protein and other related phages. The tree is based on 100 bootstrap replications. The scale bar below the figure indicates 20% homology between the tail fiber protein sequences used. BI-EHEC is indicated by the arrow

Fig. 2

Comparative genomic analysis of BI-EHEC and the other bacteriophages. The inner circle is the BI-EHEC genome, used as the reference. The genomes compared against the BI-EHEC reference in the BRIG analysis were: unpublished bacteriophage DW-EC of ETEC (orange), E. coli phage anhysbys (blue) and E. coli phage ESCO13 (green). An additional annotation ring (red) was added; its contents are listed in Table 1. Colored rings represent regions present in both the reference and the compared genome. Conversely, white gaps indicate regions of BI-EHEC with no match in the compared genome


BLASTn analysis showed that BI-EHEC has the highest similarity to Escherichia phage ESCO13. Escherichia phage anhysbys and ESCO13 from the NCBI database were selected as comparison genomes for the BRIG analysis.

Separate BRIG analyses were also carried out comparing BI-EHEC against lysogenic [20] and virulence genes [21]. These yielded no colored rings, indicating that no such genes are present in BI-EHEC.


Analysis using PHACTS was performed to confirm that BI-EHEC has lytic life cycle properties. The average probability produced by PHACTS for BI-EHEC was 0.519 with a standard deviation of 0.05, so PHACTS classified it as a lytic bacteriophage, but non-confidently. However, PHACTS has a high confidence rate (up to 99%) in determining phage lifestyle. According to McNair et al. 2012 [18], there is a high chance that a non-confident prediction is nevertheless correct.
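To make the meaning of a "non-confident" call concrete, the sketch below applies one possible decision rule to the reported values. The rule is an assumption made for illustration: the two lifestyle probabilities are taken to sum to 1, and a call is treated as confident only when the two averaged probabilities differ by at least two standard deviations. The function name and the `n_sd` parameter are hypothetical and are not part of PHACTS itself.

```python
def classify_lifestyle(mean_lytic: float, sd: float, n_sd: float = 2.0):
    """Return (lifestyle, confident) from an averaged lytic probability.

    Assumptions (illustrative, not taken from the PHACTS source code):
    - the temperate probability is 1 - mean_lytic;
    - a call is 'confident' only if the two averaged probabilities
      differ by at least n_sd standard deviations.
    """
    if not 0.0 <= mean_lytic <= 1.0:
        raise ValueError("mean_lytic must be a probability in [0, 1]")
    if sd < 0.0:
        raise ValueError("sd must be non-negative")
    mean_temperate = 1.0 - mean_lytic
    lifestyle = "lytic" if mean_lytic >= mean_temperate else "temperate"
    confident = abs(mean_lytic - mean_temperate) >= n_sd * sd
    return lifestyle, confident


# Values reported in the text for BI-EHEC: mean 0.519, standard deviation 0.05.
print(classify_lifestyle(0.519, 0.05))
```

Under this assumed rule the reported values give a lytic call that is not confident, which matches the description in the text: the gap between the two probabilities (0.038) is smaller than two standard deviations (0.1).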

CARD analyzes molecular sequences to predict the resistome based on homology and SNP models. Analysis with the perfect and strict parameters yielded zero results, so the setting was changed to include loose hits. The complete result can be observed in Additional file 4: Table S3, with TriC as the highest result. The loose hits algorithm can detect lower-similarity (< 95%) and more distant homologs of ARGs [17]. However, because it yielded only results with less than 95% similarity, it can mislabel unrelated genes as antibiotic resistance genes.


Annotation using multiPhATE2 identified proteins necessary for the end of the lytic cycle as well as structural proteins (Table 1). No lysogenic genes were found among the successfully annotated CDSs.

The phage genome-packaging component consists of the portal protein, small terminase, and large terminase. Small terminase was annotated using multiPhATE2 (Table 1); this protein initiates genome packaging and regulates large terminase functions. Large terminase, meanwhile, is important for cleaving concatenated DNA molecules to initiate the packaging mechanism [22].

Phage assembly proceeds separately for the head, the tail, and the long tail fibers before they are joined to form a mature phage [23, 24]. Both the tail fiber assembly protein (Tfa) and gpH are involved in tail assembly. Tfa is a family of proteins that act as chaperones in folding phage fibers and play a role in determining host range specificity. The tape measure protein gpH is important in determining the length of the phage tail [25, 26].

The putative T4-like lysozyme is a hydrolytic enzyme that cleaves peptidoglycan bonds. It is produced during the late stage of the lytic cycle, when assembled phages are ready to be released into the environment. Lysin possesses two main domains: the N-terminal domain functions as a catalytic domain, while the C-terminal domain serves as a binding domain that targets and binds to specific peptidoglycan ligands [3].

Tail fiber functions as a receptor-binding protein (RBP) in many bacteriophages. RBPs play a role in phage host recognition and in interaction with other phages of the same host. For T4-like phages, the C-terminal and N-terminal regions of the tail fiber are important in determining receptor specificity as well as host range [27, 28]. The BI-EHEC tail fiber protein was closest to that of Escherichia phage ukendt, which has E. coli K-12 MG1655 as its host [29]. A similar genetic make-up might contribute to different phages having the same host range. It is beneficial to study a variety of tail fiber genes to expand knowledge of the host range covered by phage cocktails [14].

The annotations and BRIG analysis showed no lysogenic or virulence genes in BI-EHEC. Lysogenic bacteriophages utilize integrase and excisionase, encoded by int and xis, to integrate their DNA into the host's [6, 20, 22]. For virulence, the analysis covered three major virulence genes of EHEC: eae, lpf and stx, which encode intimin, long polar fimbriae (LPF) and Shiga toxin, respectively. Stx is the major virulence determinant of EHEC, while intimin and LPF aid the attachment of EHEC to its host cell [21].

Analysis using the CARD database was done to determine whether the phage carries over ARGs from the host. A temperate phage has a higher probability of carrying host genes: in this state, the phage integrates its genome into the host and, depending on conditions favorable to the host, can co-exist as a prophage embedded in the host DNA, from which it may carry host genes [30]. Carry-over of ARGs is nevertheless rare. It has been suggested that the transfer of ARGs by phages via transduction is up to 1000-fold less common than transfer by other means [31].

The initial CARD analysis used the perfect and strict hits only setting. This run yielded no results, which might indicate that no ARGs are present in BI-EHEC. Another analysis was conducted with the loose hits setting, which includes hits with less than 95% homology to entries in the database. This can be beneficial for detecting emerging threats. Unfortunately, it also produces homolog hits that might be unrelated to ARG function. By including as many hits as possible, loose hits can detect unknown proteins that could potentially be new antibiotic resistance proteins, but this makes the search less specific for actual antibiotic resistance proteins [17].
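The difference between the conservative setting and the loose setting can be sketched as a simple filter over hit records. The sketch below is illustrative only: the `Hit` record, its field names, and the placeholder hits are hypothetical and do not reproduce the actual CARD output format or the BI-EHEC results in Table S3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    gene: str        # name of the matched reference gene (placeholder values below)
    cutoff: str      # hit category: "Perfect", "Strict" or "Loose"
    identity: float  # percent identity to the reference sequence


CONSERVATIVE_CUTOFFS = {"Perfect", "Strict"}


def conservative_hits(hits):
    """Keep only hits in the Perfect or Strict categories."""
    return [hit for hit in hits if hit.cutoff in CONSERVATIVE_CUTOFFS]


def count_by_cutoff(hits):
    """Count hits per category, including categories with zero hits."""
    counts = {"Perfect": 0, "Strict": 0, "Loose": 0}
    for hit in hits:
        counts[hit.cutoff] = counts.get(hit.cutoff, 0) + 1
    return counts


# Placeholder hits for illustration only; NOT the BI-EHEC results.
example_hits = [
    Hit("placeholder_gene_1", "Loose", 41.0),
    Hit("placeholder_gene_2", "Loose", 33.5),
]
print(count_by_cutoff(example_hits))
print(conservative_hits(example_hits))
```

With only loose hits in the input, the conservative filter returns an empty list, which corresponds to the "no results" outcome described for the perfect and strict setting.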

Analysis using loose hits showed TriC as the highest possible match for BI-EHEC. The suggested mechanism of action of TriC is antibiotic efflux [31]. Although putative ARG hits were found during CARD analysis with the loose hits setting, the presence of ARGs could still be ruled out. Another study found that some proteins might be mistakenly labelled as ARGs when using CARD. This finding is common for phage genomes containing leftover DNA from host cells. It was also suggested that only conservative parameters be used in in silico analysis to achieve the best possible matches, implying that only perfect and strict hits are eligible to be included [31].

From the data produced by the multiPhATE2, BRIG, and CARD analyses, it can be concluded that BI-EHEC leans towards a strictly lytic life cycle and that no ARGs were found in the bacteriophage genome.


BI-EHEC was successfully annotated, including structural and lytic cycle-related genes. The phylogenetic tree of the tail fiber protein and the BRIG results showed that BI-EHEC is similar to phages of the same host in NCBI. There were no signs of virulence or lysogenic proteins among the annotated genes, and PHACTS analysis supported this further. CARD results indicate that no ARGs are present in BI-EHEC. It can be concluded that BI-EHEC is a promising candidate food preservative.


The lack of annotated genes in the databases, which resulted in many hypothetical protein hits, proved to be the limitation of this research.

Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.



EHEC: Enterohemorrhagic Escherichia coli

ARGs: Antimicrobial resistance genes

NGS: Next generation sequencing

BLAST: Basic local alignment search tool

MEGA: Molecular evolutionary genetics analysis

BRIG: BLAST ring image generator

PHACTS: Phage classification tool set

CARD: Comprehensive antibiotic resistance database

NCBI: National Centre for Biotechnology Information


  1. Berger CN, Sodha SV, Shaw RK, Griffin PM, Pink D, Hand P, Frankel G. Fresh fruit and vegetables as vehicles for the transmission of human pathogens. Environ Microbiol. 2010;12(9):2385–97.

  2. Moye Z, Woolston J, Sulakvelidze A. Bacteriophage applications for food production and processing. Viruses. 2018;10(205):1–22.

  3. Pastagia M, Schuch R, Fischetti VA, Huang DB. Lysins: the arrival of pathogen-directed anti-infectives. J Med Microbiol. 2013;62:1506–16.

  4. Lukman C, Yonathan C, Magdalena S, Waturangi DE. Isolation and characterization of pathogenic Escherichia coli bacteriophages from chicken and beef offal. BMC Res Notes. 2020;13(8):1–7.

  5. Crothers-Stomps C, Høj L, Bourne DG, Hall MR, Owens L. Isolation of lytic bacteriophages against Vibrio harveyi. J of Appl Microbiol. 2010;108(5):1744–50.

  6. Rasool MH, Yousaf R, Siddique AB, Saqalein M, Khurshid M. Isolation, characterization, and antibacterial activity of bacteriophages against methicillin-resistant Staphylococcus aureus in Pakistan. Jundishapur J Microbiol. 2016;9(10):1–8.

  7. Thung TY, Norshafawatie SBMF, Premarathne JMKJK, Chang WS, Loo YY, Kuan CH, New CY, Ubong A, Ramzi OSB, Mahyudin NA, Dayang FB, Jasimah WMR, Son R. Isolation of food-borne pathogen bacteriophages from retail food and environmental sewage. Int FRJ. 2018;24(1):450–4.

  8. O’Flynn G, Ross RP, Fitzgerald GF, Coffey A. Evaluation of a cocktail of three bacteriophages for biocontrol of Escherichia coli O157:H7. J App Environ Microbiol. 2004;70(6):3417–24.

  9. Wick RR, Menzel P. 2018. Filtlong. Accessed 2 August 2021:

  10. Kolmogorov M, Yuan J, Lin Y, Pevzner P. Assembly of long error-prone reads using repeat graphs. Nat Biotechnol. 2019.

  11. Nanopore Tech. 2021. Medaka. Accessed 2 August 2021:

  12. Zhou CLE, Kimbrel J, Edwards R, McNair K, Souza BA, Malfatti S. MultiPhATE2: code for functional annotation and comparison of phage genomes. G3. 2021;11(5):1–5.

  13. Kumar S, Stecher G, Li M, Knyaz C, Tamura K. MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol Biol Evol. 2018;35(6):1547–9.

  14. Kim GH, Kim JW, Kim J, Chae PJ, Lee JS, Yoon SS. Genetic analysis and characterization of a bacteriophage ØCJ19 active against enterotoxigenic Escherichia coli. Food Sci Anim Resour. 2020;40(5):746–57.

  15. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–10.

  16. Alikhan NF, Petty NK, Zakour NLB, Beatson S. BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons. BMC Genom. 2011;12(402):1–10.

  17. Alcock BP, Raphenya AR, Lau TTY, Tsang KK, Bouchard M, Edalatmand A, Huynh W, Nhuyen ALV, Cheng AA, Liu S, Min SY, et al. CARD 2020: antibiotic resistome surveillance with the comprehensive antibiotic resistance database. Nucleic Acids Res. 2020;48(1):517–25.

  18. McNair K, Bailey BA, Edwards RA. PHACTS, a computational approach to classifying the lifestyle of phages. Bioinformatics. 2012;28(5):614–8.

  19. Roussel C, Cordonnler C, Galla W, Goff OL, Thévenot J, Chalancon S, Thevenor-Sergentet D, Leriche F, Wiele TVd, et al. Increased EHEC survival and virulence gene expression indicate an enhanced pathogenicity upon simulated pediatric gastrointestinal conditions. Pediatr Res. 2016;80(5):734–43.

  20. Doss J, Culbertson K, Hahn D, Camacho J, Barekzi N. A review of phage therapy against bacterial pathogens of aquatic and terrestrial organisms. Viruses. 2017;9(50):1–10.

  21. Fauquet CM, Mayo MA, Maniloff J, Desselberger U, Ball LA, editors. The double stranded DNA viruses. Massachusetts: Academic press; 2005.

  22. Granoff A, Webster RG, editors. Encyclopedia of Virology (Second Edition). Amsterdam (NL): Elsevier.

  23. Yap ML, Rossmann MG. Structure and function of bacteriophage T4. Future Microbiol. 2014;9:11319–27.

  24. Häuser R, Blasche S, Dokland T, Haggård-Ljungquist E, Av B, Salas M, Casjens S, Molineux I, Uetz P. Bacteriophage protein–protein interactions. Adv Virus Res. 2012;83:219–98.

  25. North OI, Davidson AR. Phage proteins required for tail fiber assembly also bind specifically to the surface of host bacterial strains. J Bacteriol. 2020;2013(3):1–19.

  26. Simpson DJ, Sacher JC, Szymanski CM. Development of an assay for the identification of receptor binding proteins from bacteriophages. Viruses. 2016;8(1):17.

  27. Chen M, Zhang L, Abdelgader SA, Yu L, Xu J, Yao H, Lu C, Zhang W. Alterations in gp37 expand the host range of a T4-like phage. Appl Environ Microbiol. 2017;83(23):01576–617.

  28. Olsen NS, Forero-Junco L, Kot W, Hansen LH. Exploring the remarkable diversity of culturable Escherichia coli phages in the Danish wastewater environment. Viruses. 2020;12(9):986.

  29. Enault F, Briet A, Bouteille L, Roux S, Sullivan MB, Petit MA. Phages rarely encode antibiotic resistance genes: a cautionary tale for virome analyses. ISMEJ. 2017;11:237–47.

  30. Belinda L, Jiayuan C, Prasanth M, Yunsong Y, Xiaoting H, Sebastian L. A biological inventory of prophages in A. baumannii genomes reveal distinct distributions in classes, length, and genomic positions. Front Microbiol. 2020;11:3055.

  31. Zheng W, Xu W, Xu Y, Liao W, Zhao Y, Zheng X, Xu C, Zhou T, Cao J. The prevalence and mechanism of triclosan resistance in Escherichia coli isolated from urine samples in Wenzhou, China. Antimicrob Resist Infect Control. 2020;9(161):1–10.


The authors acknowledge research funding support from the Indonesian Ministry of Education and Culture through the National Research Grant 2020, Fundamental Research.


This study was funded by DIKTI 2020. The funder had no role in the design of the study, the collection and interpretation of data, or the writing of the manuscript.

Author information




MND: conducted the research, data analysis, and manuscript preparation under the supervision of DEW and YY. YY: advised on the bioinformatics part. DEW: principal investigator, designed the proposal, and supervised the research. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Diana Elizabeth Waturangi.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Table S1 Full annotations of BI-EHEC.

Additional file 2.

Figure S1 Genome map of BI-EHEC. Annotations were selected based on their role in the lytic cycle and/or in phage structure.

Additional file 3.

Table S2 List of tail fiber from NCBI database and its accession number.

Additional file 4.

Table S3 BI-EHEC CARD results (highest to lowest best identities).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Dewanggana, M.N., Waturangi, D.E. & Yogiara. Genomic characterization of bacteriophage BI-EHEC infecting strains of Enterohemorrhagic Escherichia coli. BMC Res Notes 14, 459 (2021).
