The first genome assembly of fungal pathogen Pyrenophora tritici-repentis race 1 isolate using Oxford Nanopore MinION sequencing

Objectives
The assembly of fungal genomes from short reads is hampered by long repetitive and low-GC regions. Long-read sequencing technologies, such as PacBio and Oxford Nanopore, can resolve many of these problematic regions, thereby providing an opportunity to improve fragmented genome assemblies derived from short reads only. Here, isolate 134 (Ptr134) of the necrotrophic fungal pathogen Pyrenophora tritici-repentis (Ptr), which causes tan spot disease on wheat, was sequenced on a MinION using Oxford Nanopore Technologies (ONT) to improve on a previous Illumina short-read genome assembly and provide a more complete genome resource for pan-genomic analyses of Ptr.

Results
The genome of Ptr134 sequenced on a MinION using ONT was assembled into 28 contiguous sequences with a total length of 40.79 Mb and a GC content of 50.81%. The long-read assembly provided 6.79 Mb of new sequence and 2846 extra annotated protein-coding genes compared to the previous short-read assembly. This improved genome sequence represents near-complete chromosomes, an important resource for large-scale and pan-genomic comparative analyses.


Introduction
The necrotrophic fungal pathogen Pyrenophora tritici-repentis (Ptr) is the causal agent of tan (or yellow) spot, a major disease of wheat (Triticum aestivum) [1]. A number of genomic sequencing projects have been undertaken for Ptr [2][3][4][5][6], the majority derived solely from Illumina sequence data. Many of these short-read assemblies are incomplete because the Ptr genome contains long repetitive regions and identical gene copies that are not resolved by short reads [5]. We therefore undertook the currently more affordable Oxford Nanopore Technologies (ONT) long-read sequencing of an Australian Ptr isolate 134 (Ptr134) that was previously sequenced by short-read (150 bp paired-end) Illumina technology [3].

Isolate collection and sequencing
The pathogenic isolate Ptr134 was isolated from tan spot infected leaves collected in Queensland, Australia, in 2001 and cultured in vitro from a single spore [7]. Ptr134 genomic DNA was extracted from 3-day-old mycelia grown in vitro in Fries 3 liquid medium, using a DNeasy Plant Mini Kit (Qiagen, Hilden, Germany). The DNA was further purified by phenol/chloroform extraction, followed by precipitation with sodium acetate and ethanol, and finally resuspended in TE buffer [3]. The Ptr134 genomic DNA was sequenced using a MinION (MIN-101B) Oxford Nanopore StarterPack, R9 (FLO-MINSP6) flow cell, flow cell priming kit (XP-FLP001) and Rapid Sequencing Kit SQK-RAD004, following the manufacturer's (Oxford Nanopore Technologies, Oxford, UK) protocol. After 24 h, ONT sequencing yielded 437,865 passed long reads with a total length of 2.6 Gb (65× genome coverage). The median and maximum read lengths obtained from the MinION were 4253 bp and 91,723 bp, respectively. Reads were base called in real time using MinKNOW version 127.0.0.1 software on a MacBook Pro (macOS version 10.13.6, 2.6 GHz Intel Core i7 processor and 16 GB 2400 MHz DDR4 memory) and written to a 1 TB Seagate Backup Plus Slim portable storage device (model SRC0VN2), at the Centre for Crop Disease and Management, Perth, Western Australia. Base calling used the MinKNOW Fast basecalling model, converting Fast5 to FastQ file format. Raw reads were classed as passed by MinKNOW when the average read quality score was > 7. The Ptr134 genome was also previously sequenced via Illumina HiSeq stranded (150 bp paired-end reads) by Novogene Co., Ltd (Hong Kong) to yield 3.2 Gb at 80× coverage [3].
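The coverage figures and the pass filter described above can be illustrated with a minimal Python sketch. It assumes a 40 Mb genome size (the value later supplied to the assembler) and assumes that the average read quality is computed by averaging per-base error probabilities; a plain arithmetic mean of Phred scores is the other common convention, and the text does not state which one MinKNOW applied. The function names are hypothetical and are not part of MinKNOW.

```python
import math


def mean_read_quality(quality_string: str, offset: int = 33) -> float:
    """Mean Phred quality of one FASTQ read.

    Assumption: the mean is taken over per-base error probabilities and
    converted back to the Phred scale (Phred+33 encoded quality string).
    """
    probabilities = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    return -10 * math.log10(sum(probabilities) / len(probabilities))


def passes_filter(quality_string: str, threshold: float = 7.0) -> bool:
    """True when the read would be classed as passed (mean quality > 7)."""
    return mean_read_quality(quality_string) > threshold


def coverage(total_bases: float, genome_size: float) -> float:
    """Average depth of coverage: sequenced bases divided by genome size."""
    return total_bases / genome_size


if __name__ == "__main__":
    genome_size = 40e6  # assumed genome size in bases
    print("ONT coverage:", round(coverage(2.6e9, genome_size)))
    print("Illumina coverage:", round(coverage(3.2e9, genome_size)))
    print("Toy read passes filter:", passes_filter("IIII####"))
```

With the yields reported in the text, the same arithmetic reproduces the stated 65× (ONT) and 80× (Illumina) coverage.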

Genome assembly of Ptr134
The passed FastQ data were error-corrected and assembled using linux-amd64 Canu 1.8 software [8], guided by a genome size of 40 Mb and the option for raw nanopore data. Illumina paired-end (PE) reads were trimmed for quality and for random hexamer primers at the 5′ read end using Trimmomatic v0.22 [9]. The high-quality trimmed Illumina reads were aligned to the Canu genome assembly using BWA 0.7.14-r1138 [10] and filtered for concordant PE read alignments using samtools 0.1.19-96b5f2294a [11]. The genome assembly was then corrected with the high-quality Illumina alignments using Pilon 1.23 [12] to generate a final polished Ptr134 sequence assembly, with 2407 SNPs, 164,237 small insertions (totalling 208,176 bases) and 123 small deletions (totalling 151 bases) corrected. After Canu and Pilon error correction, the average weighted Phred base quality scores for the Ptr134 ONT sequence and for the M4 isolate previously sequenced on PacBio RSII [3] were 36 and 37, respectively.
The ONT Ptr134 annotated genome has been deposited with DDBJ/ENA/GenBank under the updated accession MVBF02000000.
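To put the Phred scores reported for the polished assembly in context, the standard definition Q = -10 log10(p) can be inverted to give a per-base error probability. The short Python sketch below only converts the two reported averages (36 and 37); it does not reproduce the weighting used to obtain them, and the function name is hypothetical.

```python
def phred_to_error_probability(q: float) -> float:
    """Invert the Phred definition Q = -10 * log10(p)."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    # Average Phred scores reported in the text for the two assemblies.
    for label, q in (("Ptr134 ONT", 36), ("M4 PacBio RSII", 37)):
        p = phred_to_error_probability(q)
        print(f"{label}: Q{q} is about 1 error per {1 / p:,.0f} bases")
```

On this scale, Q36 and Q37 correspond to roughly one error per 4,000 and one per 5,000 bases, respectively.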

Genome assembly and annotation of Ptr134
The Ptr134 genome assembled into 28 contiguous sequences with a total length of 40.79 Mb and a GC content of 50.81% (Table 1). Ptr134 ONT (Version 2) contig length statistics showed marked improvements in comparison to the short-read assembly (Version 1) [3]. In comparison to the previous short-read assembly, the long-read assembly provided 6.79 Mb of new sequence. A total of 13,918 protein-coding genes were also predicted for the Ptr134 ONT assembly, 2,846 more than for the previous short-read assembly (Table 1). Although there was no improvement in the BUSCO scores for predicted protein-coding genes, the new predictions may be pathogen-specific genes located in the more complex regions, which are harder to assemble with short reads. The ONT Ptr134 annotated genome has been deposited with DDBJ/ENA/GenBank under the updated accession MVBF02000000 (Table 1).
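Summary statistics of the kind reported above (number of contigs, total length, GC content) can be computed directly from the contig sequences. The following Python sketch is illustrative only: it operates on toy sequences rather than the Ptr134 assembly, the function name is hypothetical, and N50 is included as a commonly used contig length statistic on the assumption that it is among those compared in Table 1.

```python
from typing import Dict, List


def assembly_stats(contigs: List[str]) -> Dict[str, float]:
    """Contig count, total length, GC percentage and N50 for an assembly."""
    lengths = sorted((len(seq) for seq in contigs), reverse=True)
    total = sum(lengths)
    if total == 0:
        raise ValueError("assembly contains no sequence")

    gc_bases = sum(seq.upper().count("G") + seq.upper().count("C") for seq in contigs)

    # N50: length of the contig at which the running sum of the
    # length-sorted contigs first reaches half of the total assembly length.
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        if running >= total / 2:
            n50 = length
            break

    return {
        "contigs": len(lengths),
        "total_length": total,
        "gc_percent": 100 * gc_bases / total,
        "n50": n50,
    }


if __name__ == "__main__":
    toy_contigs = ["GCGCATAT", "GGCC", "AT"]  # toy data, not Ptr134 sequence
    print(assembly_stats(toy_contigs))
```

In practice the contig sequences would be read from the assembly FASTA file before being passed to such a function.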
Furthermore, sequence breaks in Ptr134 relative to M4 chromosomes 1, 3, 7 and 10 reflect sequence variations between the two isolates. In particular, the Ptr134 sequence break relative to M4 chromosome 10 coincides with the chromosome 10 and 11 fusion site revealed previously by optical mapping of M4 [3]. This is the first ONT sequenced, assembled and annotated genome for a Ptr race 1 isolate. The improved ONT genome assembly of Ptr134, compared with the former Illumina assembly, will enable better characterization of important genes involved in pathogenicity, which are often contained in highly complex genomic regions [5], and will contribute to improved pan-genomic analyses of this important fungal pathogen. We demonstrate that ONT is a viable option for producing less fragmented, near-complete genome assemblies for fungal species. Using these methods, researchers can sequence and assemble isolates of interest 'in house' to create quality reference genomes.

Limitations
All methods have been made as consistent as possible for comparative analyses. However, this analysis used the databases, software and PacBio sequencing versions currently available, which may be updated in the future. The comparison of the two Australian long-read assemblies is only an indication of potential genome stability in Australia.