Genome data of four Pythium insidiosum strains from the phylogenetically-distinct clades I, II, and III

Objectives: We employed the Illumina NGS platform to sequence the genomes of 4 strains of the pathogenic oomycete Pythium insidiosum, the causative agent of pythiosis. These strains were isolated from humans in Thailand (n=3) and the United States (n=1), and phylogenetically classified into clade-I, -II, and -III. Our study augments the P. insidiosum genome database for exploration of the biology, evolution, and pathogenesis of the pathogen.

Data description: One paired-end library (180-bp insert) was prepared from the gDNA of each P. insidiosum strain, ATCC200269 (clade-I), Pi19 (clade-II), MCC18 (clade-II), and SIMI4763 (clade-III), for whole-genome sequencing on the Illumina HiSeq2000/HiSeq2500 NGS platforms. A range of 28.4-59.4 million raw reads, accounting for 3.0-7.3 Gb, was obtained and assembled into draft genomes of 47.1 Mb (15,153 contigs; 85% completeness; 19,329 open reading frames [ORFs]) for strain ATCC200269, 35.4 Mb (14,576 contigs; 83% completeness; 13,895 ORFs) for strain Pi19, 34.5 Mb (11,084 contigs; 84% completeness; 13,249 ORFs) for strain MCC18, and 47.1 Mb (15,162 contigs; 85% completeness; 19,340 ORFs) for strain SIMI4763. The genome data can be downloaded from the NCBI/DDBJ databases under the accessions BCFN00000000.1 (ATCC200269), BCFS00000000.1 (Pi19), BCFT00000000.1 (MCC18), and BCFU00000000.1 (SIMI4763).


Objective
Next-generation sequencing (NGS) makes it possible to sequence the genomes of multiple strains of the same microbial species quickly and at low cost [1]. The resulting data enable extensive comparative genomic analyses to better understand the biology, evolution, and pathogenesis of a pathogen of interest, and can serve as a comprehensive genetic resource for identifying diagnostic and therapeutic microbial markers. Here, we employed the Illumina HiSeq2000/HiSeq2500 NGS platforms to sequence the genomes of 4 strains (i.e., ATCC200269, Pi19, MCC18, and SIMI4763) of Pythium insidiosum, a pathogenic oomycete that infects humans and animals worldwide and causes pythiosis, an infectious condition with high morbidity and mortality [2][3][4]. These strains were isolated from human patients with pythiosis in Thailand (n = 3) and the United States (n = 1), and have been phylogenetically classified into clade-I (n = 1), clade-II (n = 2), and clade-III (n = 1) based on ribosomal deoxyribonucleic acid (rDNA) sequence analysis [5]. So far, draft genome sequences of 7 strains of P. insidiosum (including the synonym species Pythium destruens), isolated from humans, horses, and the environment in various countries, are available in public databases [6][7][8][9][10][11][12]. This study contributes additional genomic data to augment the public P. insidiosum genome database. Researchers can use these genome data as a basis to explore the biology, evolution, and pathogenesis of P. insidiosum, which could inform the development of preventive measures, reliable diagnostic assays, and effective therapeutic modalities for pythiosis.

Data description
The P. insidiosum strain ATCC200269 (phylogenetic clade-I) was isolated from a human patient in the United States, while the strains Pi19 (clade-II), MCC18 (clade-II), and SIMI4763 (clade-III) were isolated from human patients in Thailand. The identity (i.e., species) and genotype (i.e., clade) of each strain were confirmed by rDNA sequence analysis [accession numbers: AB898108 (for strain ATCC200269), AB898113 (Pi19), AB971183 (MCC18), and AB971189 (SIMI4763)] [5]. The organisms were cultured in Sabouraud dextrose broth with shaking (50-150 rpm) for one week at 37 °C. The resulting hyphal material of each strain was harvested and subjected to genomic deoxyribonucleic acid (gDNA) extraction using an established method [13]. The identity of each strain was re-assessed by rDNA sequence analysis of the extracted gDNA [5]. One paired-end library with a 180-bp gap was prepared for each gDNA sample before whole-genome sequencing on the Illumina HiSeq2000 (for strains Pi19 and MCC18) and HiSeq2500 (for strains ATCC200269 and SIMI4763) NGS platforms (Yourgene Bioscience, Taiwan), as previously described [6,7,10,12]. In brief, the Qiagen CLC Genomics Workbench software trimmed raw reads to ensure a read length of at least 35 bases. Cutadapt 1.8.1 [14]

In summary, the draft genomes of P. insidiosum strains ATCC200269 (genome size: 47.1 Mb), Pi19 (35.4 Mb), MCC18 (34.5 Mb), and SIMI4763 (47.1 Mb), isolated from human patients with pythiosis in Thailand and the United States, have been generated and made publicly available. These genome data could be a useful dataset for exploring the biology, evolution, and pathogenesis of P. insidiosum, which may lead to clinical applications for better management of patients with pythiosis.
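
The read-length criterion mentioned above (reads of at least 35 bases) can be illustrated with a minimal Python sketch. This is not the CLC Genomics Workbench or Cutadapt procedure used in the study; it only shows the idea of a minimum-length filter on FASTQ records. The function names `parse_fastq` and `filter_by_length` and the example reads are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

# A FASTQ record reduced to (header, sequence, quality string).
FastqRecord = Tuple[str, str, str]

MIN_READ_LENGTH = 35  # threshold stated in the text


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from well-formed, 4-line-per-record FASTQ input."""
    it = iter(lines)
    # zip over the same iterator consumes four lines per record.
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header.rstrip("\n"), seq.rstrip("\n"), qual.rstrip("\n")


def filter_by_length(
    records: Iterable[FastqRecord], min_len: int = MIN_READ_LENGTH
) -> Iterator[FastqRecord]:
    """Keep only reads whose sequence has at least min_len bases."""
    for header, seq, qual in records:
        if len(seq) >= min_len:
            yield header, seq, qual


# Illustrative input (made-up reads, not data from the study).
example = [
    "@read1", "A" * 40, "+", "I" * 40,
    "@read2", "A" * 20, "+", "I" * 20,
]
kept = list(filter_by_length(parse_fastq(example)))
```

In practice, a dedicated tool would also handle adapter removal, quality trimming, and keeping read pairs in sync, none of which this sketch attempts.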

Limitations
We used the Illumina HiSeq2000/HiSeq2500 short-read NGS platforms to sequence 4 genomes of P. insidiosum (strains ATCC200269, Pi19, MCC18, and SIMI4763). Users of the genome data should be aware that the sequencing-by-synthesis technique of the Illumina platforms builds its library by DNA amplification, which can introduce sequence coverage biases and substitution errors. As seen in the genome data of these P. insidiosum strains, the total bases ranged from 3.0 to 7.3 Gb, and the genome sequence coverage ranged from 74× to 154×. Another limitation of the study is the number and type of DNA libraries: the genome sequence of each P. insidiosum strain was obtained from only one paired-end library. As expected, all strains showed a less complete genome (83-85% CEGMA-based genome completeness), a higher number of contigs (11,084-15,162 contigs), and a smaller genome size (34.5-47.1 Mb) than the P. insidiosum reference genome (92% completeness; 1192 contigs; 53.2-Mb size), which was generated from one paired-end and three mate-pair libraries [8].
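
To make the fragmentation described above concrete, the following Python sketch derives a rough mean contig length (genome size divided by contig count) for each assembly and for the reference genome. The input figures are those reported in this article; the `Assembly` class and the mean-contig-length metric are illustrative choices made here, not statistics reported by the authors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assembly:
    name: str
    size_mb: float         # assembled genome size in Mb
    contigs: int           # number of contigs
    completeness_pct: int  # CEGMA-based completeness (%)

    @property
    def mean_contig_bp(self) -> float:
        """Average contig length in bp (genome size / contig count)."""
        return self.size_mb * 1_000_000 / self.contigs


# Values as reported in the text; the last entry is the reference genome [8].
ASSEMBLIES = [
    Assembly("ATCC200269", 47.1, 15153, 85),
    Assembly("Pi19", 35.4, 14576, 83),
    Assembly("MCC18", 34.5, 11084, 84),
    Assembly("SIMI4763", 47.1, 15162, 85),
    Assembly("reference [8]", 53.2, 1192, 92),
]

for a in ASSEMBLIES:
    print(f"{a.name:14s} {a.size_mb:5.1f} Mb  {a.contigs:6d} contigs  "
          f"mean contig ~{a.mean_contig_bp:,.0f} bp")
```

The mean is a coarse indicator only; contiguity is more commonly summarized with N50, which cannot be derived from the summary figures given here.
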

Abbreviations
CEGMA: Core Eukaryotic Genes Mapping Approach; DDBJ: DNA Data Bank of Japan; gDNA: Genomic deoxyribonucleic acid; NCBI: National Center for Biotechnology Information; NGS: Next-generation sequencing; ORF: Open reading frame; rDNA: Ribosomal deoxyribonucleic acid.