Generating long-read sequences using Oxford Nanopore Technology from Diospyros celebica genomic DNA

Objectives: Advances in sequencing technology have opened up vast opportunities for tree genomic research in the tropics. One such technology, sequencing from Oxford Nanopore Technologies (ONT), has attracted researchers to test and experiment with it because of its affordability and accessibility. To the best of our knowledge, there have been no published reports on the use of ONT for genomic analysis of Indonesian tree species. Progress in this area is promising for acquiring more genomic data for research purposes. Therefore, the present study was carried out to determine the effectiveness of ONT in generating long-read DNA sequences using DNA isolated from leaves and wood cores of Macassar ebony (Diospyros celebica Bakh.).
Data description: Long-read sequence data from leaves and wood cores of Macassar ebony were generated using the MinION device and MinKNOW v3.6.5 (ONT). As the first long-read sequence dataset for Macassar ebony, the data are of great importance for conserving genetic diversity, understanding molecular mechanisms, and the sustainable use of plant genetic resources in downstream applications.


Objective
Third-generation sequencing from Oxford Nanopore Technologies (ONT), which is capable of generating long-read sequences, was applied to fill existing technological gaps, particularly with respect to capital cost, use of native DNA/RNA samples, simplicity, portability, and ease of library preparation. In particular, the technology enables on-site analysis, which is a significant advantage given the constraints that gaps in current regulations (e.g. the Nagoya Protocol) place on permits for transferring samples from field to laboratory, both within the country and overseas [1,2]. On-site use of ONT requires less administrative effort for sample transfer and therefore accelerates data generation for immediate needs, such as urgent decision-making for species identification and conservation and even on-site forensic investigation. ONT can also be used in a hybrid approach with other sequencing platforms, such as short-read sequencing, in order to analyze missing fragments, structural variations, etc. [3]. In the tropics, research using ONT to dissect biodiversity is still limited and new findings are scarce, especially in tree genomic variation analysis. Problems with DNA/RNA yield and quality are still consistently encountered, depending on species and sample source, mainly because of complex chemical compounds (such as phenols) and sample accessibility. In addition, site conditions might also influence DNA yields, forcing the use of only one general protocol across the samples.
Open Access, BMC Research Notes. *Correspondence: siregar@apps.ipb.ac.id. Department of Silviculture, Faculty of Forestry and Environment, IPB University (Bogor Agricultural University), Bogor, Indonesia. Full list of author information is available at the end of the article.
Macassar ebony, an endemic and vulnerable species of Sulawesi (Celebes), Indonesia, was used in this experiment, which was designed to determine the efficacy of long-read sequencing using samples from both leaves and small wood cores collected in Celebes [4]. Results of this study are presented in Table 1.

Data description
Total genomic DNA was extracted from 15 individuals of Macassar ebony (Diospyros celebica Bakh.), from leaves (n = 11) and wood cores (n = 4) collected using a Pickering Punch in three provinces in Indonesia, namely Central Sulawesi, West Sulawesi, and South Sulawesi. Extraction followed a modified CTAB method [5] in which the CTAB buffer contained 10% CTAB, Tris HCl, 5 M NaCl, 0.5 M EDTA, 1% PVP, β-mercaptoethanol, and dH2O. DNA quality was evaluated by electrophoresis using a Gel Doc EZ System (Bio-Rad, USA), and DNA concentration was measured using a NanoPhotometer NP80 (IMPLEN, Germany).
Library preparation of the genomic DNA samples followed the Nanopore protocol for native barcoding of genomic DNA (with EXP-NBD104 and SQK-LSK109), version NBE_9065_v109_revJ_23May2018. Sequencing was done in two rounds using two flow cells (FLO-MIN106). The samples per flow cell and the native barcodes (NBD01-NBD12) used in the study are listed in Data file 1.
The sequencing run of the genomic DNA samples was performed using the MinION device and MinKNOW v3.6.5.
Sequencing was terminated when no pores were actively sequencing DNA. The high-accuracy base-calling mode was used to base-call the signal in the FAST5 files and output FASTQ files. Reads were separated by barcode, and the barcodes were set to be trimmed automatically from the reads (Data set 1). All samples were combined using the cat command in a Linux Mint terminal and analyzed with NanoStat v1.2.1 to assess read quality and read statistics. Distribution plots were generated using NanoPlot v1.31.0 [6] (Data file 2). We obtained 302 567 reads, 99.5% of which had a quality above Q7 (the nanopore default pass threshold). After inspection of the statistics, all reads were filtered with NanoFilt v2.7.1 [6]. Reads with a Q-score lower than 7 or shorter than 500 bp were filtered out, and the --headcrop and --tailcrop parameters were both set to 10. Filtering left 134 220 reads, which were then subjected to correction, trimming, and de novo assembly using Canu v2.0 [7] with the option genomeSize=800m. A second de novo long-read assembler, SMARTdenovo [8], was run with a minimum read length (-J) of 2000 to compare contig assemblies from plant DNA. SMARTdenovo used the corrected reads from the Canu correction stage and was therefore expected to give a better outcome than the Canu assembly. The assemblies comprised 358 contigs (N50 6.5 kb, GC 39.91%) for Canu and 39 contigs (N50 12.7 kb, GC 41.14%) for SMARTdenovo. The draft assembly was then polished (corrected) against the individual sequencing reads using medaka_consensus v1.0.3 [9] with the nanopore sequencing model parameter (-m) r941_min_high_g330 (Data file 3). The statistics of the resulting polished assembly were calculated using QUAST v5.0.2 [...] 7). These scaffold assemblies were then annotated using the GeSeq platform for organellar genomes [12], resulting in the GenBank annotation (Data file 8) and its visualization (Data file 9).
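The read filtering criteria described above (Q-score of at least 7, length of at least 500 bp, 10 bases cropped from each end) can be illustrated with a short Python sketch. This is a simplified illustration under stated assumptions, not NanoFilt's implementation: the `Read` tuple and the `mean_qscore` and `filter_reads` names are hypothetical, the mean quality is assumed to be derived from averaged error probabilities, and the thresholds are assumed to be checked before cropping.

```python
import math
from typing import Iterable, Iterator, NamedTuple


class Read(NamedTuple):
    """Hypothetical in-memory FASTQ record (not a NanoFilt type)."""
    name: str
    seq: str
    qual: str  # Phred+33 encoded quality string


def mean_qscore(qual: str) -> float:
    """Mean Phred score, averaged over error probabilities (assumption)."""
    probs = [10 ** (-(ord(c) - 33) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def filter_reads(
    reads: Iterable[Read],
    min_q: float = 7.0,
    min_len: int = 500,
    headcrop: int = 10,
    tailcrop: int = 10,
) -> Iterator[Read]:
    """Yield reads passing the thresholds, cropped at both ends."""
    for r in reads:
        # Length is checked first so empty reads never reach mean_qscore.
        if not r.seq or len(r.seq) < min_len:
            continue
        if mean_qscore(r.qual) < min_q:
            continue
        end = len(r.seq) - tailcrop
        yield Read(r.name, r.seq[headcrop:end], r.qual[headcrop:end])


if __name__ == "__main__":
    demo = [
        Read("keep", "A" * 600, "5" * 600),   # Q20, 600 bp
        Read("short", "A" * 100, "5" * 100),  # too short
        Read("lowq", "A" * 600, "#" * 600),   # Q2, below threshold
    ]
    for read in filter_reads(demo):
        print(read.name, len(read.seq))
```

Averaging error probabilities rather than raw Phred values prevents a few high-quality bases from masking a noisy read, which is why the sketch converts each score before taking the mean.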

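The N50 and GC values reported for the Canu and SMARTdenovo assemblies follow standard definitions, sketched below in Python. N50 is taken here as the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. The function names `n50` and `gc_percent` are hypothetical, and the sketch is not QUAST's implementation (for instance, it does not treat ambiguous bases specially).

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Contig length at which the cumulative sum reaches half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50 requires at least one contig")


def gc_percent(contigs: Sequence[str]) -> float:
    """Percentage of G and C bases across all contigs."""
    total = sum(len(s) for s in contigs)
    if total == 0:
        raise ValueError("gc_percent requires at least one base")
    gc = sum(s.upper().count("G") + s.upper().count("C") for s in contigs)
    return 100.0 * gc / total


if __name__ == "__main__":
    toy_contigs = ["GGCC", "AATT", "GCAT"]  # toy data, not from the study
    print(n50([len(c) for c in toy_contigs]))
    print(gc_percent(toy_contigs))
```

Because N50 depends only on contig lengths, an assembly with fewer but longer contigs can have a higher N50, as with the two assemblies compared above.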
Limitations
Long-read nanopore sequencing of the Macassar ebony tree was quite challenging. Extraction of genomic DNA should be optimized to obtain high-quality gDNA without excessive fragmentation. Fragmented DNA needs to be removed prior to library preparation, as short fragments may occupy nanopores in the flow cell and produce too many short reads in the sequencing output. Library preparation should be optimized as well. For example, DNA concentration was measured with a spectrophotometer, which could give a biased estimate; a DNA fluorometer is preferred for accurate quantification. Loading the correct DNA concentration into the MinION flow cell would enable optimal sequencing and pore occupancy. Achieving higher sequencing throughput is necessary to overcome the read accuracy limitation of the MinION observed in this study.
[...] [11][12][13][14][15][16][17][18][19][20] for details and links to the data.

Ethics approval and consent to participate
Biological material samples in forms of dried leaves and wood cores were collected from (i) South Sulawesi following permit approvals from South Sulawesi