- Data note
- Open Access
Sequencing of E. coli strain UTI89 on multiple sequencing platforms
BMC Research Notes volume 13, Article number: 487 (2020)
The availability of matched sequencing data for the same sample across different sequencing platforms is essential for validating and effectively comparing those platforms. A commonly sequenced sample is the lab-adapted MG1655 strain of Escherichia coli; however, this strain is not fully representative of the more complex and dynamic genomes of pathogenic E. coli strains.
We present six new sequencing data sets for another E. coli strain, UTI89, an extraintestinal pathogenic strain isolated from a patient with a urinary tract infection. These matched whole genome sequencing data were generated using the PacBio RSII, Oxford Nanopore MinION R9.4, Ion Torrent, ABI SOLiD, and Illumina NextSeq sequencers. Together with other publicly available data sets, they give UTI89 a nearly complete suite of data generated on most second- and third-generation sequencers. These data can serve as an additional validation set for new sequencing technologies and analytical methods. More importantly, UTI89 is not just another E. coli strain: it is pathogenic, with a genome about 10% larger than that of MG1655, additional pathogenicity islands, and a large plasmid, features that are common among naturally occurring and disease-causing E. coli isolates. These data therefore provide a more medically relevant test set for algorithm development.
Control sequencing data generated on different platforms are extremely important for validating and effectively comparing sequencing technologies. A commonly sequenced sample that has been used extensively for these purposes is the MG1655 strain of E. coli. However, the MG1655 genome is smaller and less complex than those of some pathogenic E. coli strains [2, 3]. As part of control experiments, we sequenced UTI89, a uropathogenic E. coli (UPEC) strain originally isolated from a patient with an acute bladder infection, using several sequencing technologies: ABI SOLiD, Ion Torrent, PacBio, Oxford Nanopore, and Illumina. Our new data supplement previously published sequencing data generated using the Roche 454, Illumina HiSeq, and the original Oxford Nanopore Technologies MinION. With these new data sets, E. coli strain UTI89 now has a nearly complete set of raw sequence data generated using most second- and third-generation sequencers. For some technologies we have multiple data sets. For PacBio, for example, they span the first RSII sequencing chemistry (XL/C2), used in 2012, through the P6-C4 chemistry (current as of 2018), a progression that brought a more than fivefold increase in mean read length.
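To make the read-length comparison concrete, the sketch below computes the mean read length of a FASTQ stream and the fold change between two runs. It is a minimal illustration, assuming plain four-line FASTQ records; the helper names (`read_lengths`, `mean_read_length`, `fold_change`) and the toy reads are hypothetical and are not the authors' calculation or data.

```python
import io
from typing import Iterator, TextIO


def read_lengths(handle: TextIO) -> Iterator[int]:
    """Yield the sequence length of each record in a four-line FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of stream
        if not header.startswith("@"):
            raise ValueError(f"malformed FASTQ header: {header!r}")
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line (same length as the sequence)
        yield len(sequence)


def mean_read_length(handle: TextIO) -> float:
    """Arithmetic mean of all read lengths in the stream."""
    lengths = list(read_lengths(handle))
    if not lengths:
        raise ValueError("no reads found")
    return sum(lengths) / len(lengths)


def fold_change(newer_mean: float, older_mean: float) -> float:
    """How many times longer the newer mean read length is than the older one."""
    return newer_mean / older_mean


# Toy usage with hypothetical in-memory reads (not taken from the UTI89 data sets).
older = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nAC\n+\nII\n")
newer = io.StringIO(
    "@r1\n" + "A" * 16 + "\n+\n" + "I" * 16 + "\n"
    + "@r2\n" + "A" * 14 + "\n+\n" + "I" * 14 + "\n"
)
print(fold_change(mean_read_length(newer), mean_read_length(older)))  # 5.0
```

For a compressed file, `gzip.open(path, "rt")` returns a text stream that can be passed to `mean_read_length` in the same way. Note that for PacBio data the result depends on whether subreads or full polymerase reads are counted.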
The new data sets are summarized in Table 1. Details of library preparation and sequencing methods for the new datasets are presented below.
ABI SOLiD, Long Mate Pair
Genomic DNA was extracted from UTI89 grown overnight in Lysogeny Broth (LB) and used to generate Long Mate Pair (LMP) libraries with an insert size of 3–4 kb according to the manufacturer’s instructions, yielding a final library of 375 bp.
Ion Torrent, Personal Genome Machine (PGM)
Genomic DNA was extracted from UTI89 harbouring the pBAD33 plasmid, grown overnight in LB. Sequencing libraries were then generated using the Ion Xpress™ Plus gDNA library preparation protocol according to the manufacturer’s instructions.
PacBio, RSII, XL/C2 Chemistry
Genomic DNA was extracted from SLC-66 (UTI89 with a kanamycin cassette integrated into the phage HK022 integration site) grown overnight in LB. Large-insert (15 kb) native SMRTbell sequencing libraries were generated according to the manufacturer’s protocols.
Illumina, NextSeq, TruSeq Nano
Genomic DNA was extracted from UTI89 grown overnight in LB. Sequencing libraries were built using the Illumina TruSeq Nano DNA LT kit according to the manufacturer’s instructions, with DNA sheared to 350 bp.
Oxford Nanopore, MinION Mk1B Device, R9.4, 1D Ligation sequencing
Genomic DNA was extracted from UTI89 grown overnight in LB. Unsheared DNA (1 μg) was used to prepare sequencing libraries with the Ligation Sequencing Kit 1D R9 version (SQK-LSK108) according to the manufacturer’s instructions.
The prepared sequencing library was loaded onto a FLO-MIN106 R9.4 Spot-ON flow cell, and a 24 h sequencing run was performed. Base calling was subsequently performed using Oxford Nanopore’s Albacore Sequencing Pipeline Software (version 1.2.1) [18, 19].
PacBio, RSII, P6-C4 Chemistry
Genomic DNA was extracted from UTI89 grown overnight in LB. Large-insert (20 kb) native SMRTbell sequencing libraries were generated according to the manufacturer’s instructions.
Previously published data sets
There are three previously published data sets generated using other sequencing platforms or sequencer versions: Roche 454 [4, 24–30], Illumina HiSeq 2000 [5, 31–34], and the original Oxford Nanopore MinION with an R7 flow cell [6, 35, 36]. The data presented here complement these published data sets, which are also included in Table 1.
The following are limitations of these data:
The data were collected over several years, so the experimental steps were not all performed by the same person.
Some of the sequenced strains carry plasmids or other markers (pBAD33 and a kanamycin cassette; see details above).
Not every generation of sequencing machine or library preparation method was used.
Availability of data and materials
The data described in this Data note can be freely and openly accessed through the NCBI Sequence Read Archive (SRA). Please see Table 1 for accession numbers. Specifically, the experiment accessions for the newly presented data are SRX4387579, SRX4225380, SRX4387449, SRX4223297, SRX4387499, SRX5058882, and SRX5058883. The experiment accessions for the previously published data are SRX000179, ERX632843, ERX632844, and ERX987748.
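Each accession listed above encodes its archive of origin and record type in its prefix, following the INSDC Sequence Read Archive convention (for example, SRX for an experiment first submitted to NCBI and ERR for a run first submitted to EBI), and the reference list resolves them through identifiers.org. The sketch below is a minimal illustration with a hypothetical helper, `describe`: it classifies an accession and builds the same resolver link, but it does not query any archive.

```python
import re

# Resolver pattern used by the accession links in this Data note's reference list.
RESOLVER = "https://identifiers.org/ncbi/insdc.sra:{}"

# SRA-style accessions: archive letter (S = NCBI, E = EBI, D = DDBJ), "R",
# a record-type letter, then digits.
ACCESSION = re.compile(r"([SED])R([PSXR])(\d+)")
ARCHIVE = {"S": "NCBI", "E": "EBI", "D": "DDBJ"}
RECORD_TYPE = {"P": "study", "S": "sample", "X": "experiment", "R": "run"}


def describe(accession: str) -> dict:
    """Split an SRA-style accession into archive of origin, record type and resolver URL."""
    match = ACCESSION.fullmatch(accession)
    if match is None:
        raise ValueError(f"not an SRA-style accession: {accession!r}")
    archive, record_type, _digits = match.groups()
    return {
        "accession": accession,
        "archive": ARCHIVE[archive],
        "record_type": RECORD_TYPE[record_type],
        "url": RESOLVER.format(accession),
    }


for acc in ["SRX4387579", "SRR7517573", "ERX632843", "ERR908493"]:
    info = describe(acc)
    print(info["accession"], info["archive"], info["record_type"], info["url"])
```

Experiment accessions (SRX/ERX) group one or more run accessions (SRR/ERR); the reads themselves are stored per run, which is the level at which tools such as the SRA Toolkit's `prefetch` and `fasterq-dump` are typically used.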
Abbreviations
UPEC: Uropathogenic Escherichia coli
UTI: Urinary tract infection
LMP: Long mate pair
PGM: Personal genome machine
References
Blattner FR, Plunkett G 3rd, Bloch CA, Perna NT, Burland V, Riley M, et al. The complete genome sequence of Escherichia coli K-12. Science. 1997;277:1453–62.
Welch RA, Burland V, Plunkett G 3rd, Redford P, Roesch P, Rasko D, et al. Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proc Natl Acad Sci USA. 2002;99:17020–4.
Hayashi T, Makino K, Ohnishi M, Kurokawa K, Ishii K, Yokoyama K, et al. Complete genome sequence of enterohemorrhagic Escherichia coli O157:H7 and genomic comparison with a laboratory strain K-12. DNA Res. 2001;8:11–22.
Chen SL, Hung C-S, Xu J, Reigstad CS, Magrini V, Sabo A, et al. Identification of genes subject to positive selection in uropathogenic strains of Escherichia coli: a comparative genomics approach. Proc Natl Acad Sci USA. 2006;103:5977–82.
Sullivan MJ, Ben Zakour NL, Forde BM, Stanton-Cook M, Beatson SA. Contiguity: contig adjacency graph construction and visualisation; 2015. https://doi.org/10.7287/peerj.preprints.1037v1.
Sović I, Šikić M, Wilm A, Fenlon SN, Chen S, Nagarajan N. Fast and sensitive mapping of nanopore sequencing reads with GraphMap. Nat Commun. 2016;7:11307.
https://identifiers.org/ncbi/insdc.sra:SRX4387579. Accessed 17 May 2020.
https://identifiers.org/ncbi/insdc.sra:SRR7517573. Accessed 17 May 2020.
https://identifiers.org/ncbi/insdc.sra:SRR8247388. Accessed 17 May 2020.
Guzman LM, Belin D, Carson MJ, Beckwith J. Tight regulation, modulation, and high-level expression by vectors containing the arabinose PBAD promoter. J Bacteriol. 1995;177:4121–30.
https://identifiers.org/ncbi/insdc.sra:SRX4225380. Accessed 17 May 2020.
https://identifiers.org/ncbi/insdc.sra:SRR7352157. Accessed 17 May 2020.
https://identifiers.org/ncbi/insdc.sra:SRX4387449. Accessed 17 May 2020.
https://identifiers.org/ncbi/insdc.sra:SRR7517443. Accessed 17 May 2020.
https://identifiers.org/ncbi/insdc.sra:SRR7525090. Accessed 17 May 2020.
https://identifiers.org/ncbi/insdc.sra:SRX4223297. Accessed 17 May 2020.
https://identifiers.org/ncbi/insdc.sra:SRR7349974. Accessed 17 May 2020.
https://identifiers.org/ncbi/insdc.sra:SRX4387499. Accessed 17 May 2020.
https://identifiers.org/ncbi/insdc.sra:SRR7517493. Accessed 17 May 2020.
https://identifiers.org/ncbi/insdc.sra:SRX5058882. Accessed 17 May 2020.
https://identifiers.org/ncbi/insdc.sra:SRX5058883. Accessed 17 May 2020.
https://identifiers.org/ncbi/insdc.sra:SRR8240630. Accessed 17 May 2020.
https://identifiers.org/ncbi/insdc.sra:SRR8240631. Accessed 17 May 2020.
https://identifiers.org/ncbi/insdc.sra:SRX000179. Accessed 17 May 2020.
https://identifiers.org/ncbi/insdc.sra:SRR000868. Accessed 17 May 2020.
https://identifiers.org/ncbi/insdc.sra:SRR000869. Accessed 17 May 2020.
https://identifiers.org/ncbi/insdc.sra:SRR000870. Accessed 17 May 2020.
https://identifiers.org/ncbi/insdc.sra:SRR000871. Accessed 17 May 2020.
https://identifiers.org/ncbi/insdc.sra:SRR000872. Accessed 17 May 2020.
https://identifiers.org/ncbi/insdc.sra:SRR000873. Accessed 17 May 2020.
https://identifiers.org/ncbi/insdc.sra:ERX632843. Accessed 17 May 2020.
https://identifiers.org/ncbi/insdc.sra:ERX632844. Accessed 17 May 2020.
https://identifiers.org/ncbi/insdc.sra:ERR687900. Accessed 17 May 2020.
https://identifiers.org/ncbi/insdc.sra:ERR687901. Accessed 17 May 2020.
https://identifiers.org/ncbi/insdc.sra:ERX987748. Accessed 17 May 2020.
https://identifiers.org/ncbi/insdc.sra:ERR908493. Accessed 17 May 2020.
The authors wish to thank Tyson Clarke and Jonas Korlach of Pacific Biosciences for help with sequencing the PacBio XL/C2 data set. The authors also wish to thank the Next Generation Sequencing Platform and the GERMS Platform at the Genome Institute of Singapore for technical help and useful discussions related to the generation of these data.
This work was supported by the National Research Foundation, Singapore (NRF-RF2010-10), the Singapore Ministry of Health’s National Medical Research Council under two Clinician-Scientist Individual Research Grants (NMRC/CIRG/1357/2013 and NMRC/CIRG/1358/2013) and the Genome Institute of Singapore (GIS)/Agency for Science, Technology, and Research (A*STAR). The funders had no role in the design of the study, in the collection, analysis, or interpretation of the data, or in the writing of the manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Cite this article
Fenlon, S.N., Chee, Y.C., Chee, J.L.Y. et al. Sequencing of E. coli strain UTI89 on multiple sequencing platforms. BMC Res Notes 13, 487 (2020). https://doi.org/10.1186/s13104-020-05335-4
- Escherichia coli
- Urinary Tract Infection (UTI)
- Ion Torrent
- Oxford Nanopore