Skip to main content

Generating long-read sequences using Oxford Nanopore Technology from Diospyros celebica genomic DNA

Abstract

Objectives

Development of sequencing technology has opened up vast opportunities for tree genomic research in the tropics. One of the aforesaid technologies named ONT (Oxford Nanopore Technology) has attracted researchers in undertaking testings and experiments due to its affordability and accessibility. To the best of our knowledge, there has been no published reports on the use of ONT for genomic analysis of Indonesian tree species. This progress is promising for further improvement in order to acquire more genomic data for research purposes. Therefore, the present study was carried out to determine the effectiveness of ONT in generating long-read DNA sequences using DNA isolated from leaves and wood cores of Macassar ebony (Diospyros celebica Bakh.).

Data description

Long-read sequences data of leaves and wood cores of Macassar ebony were generated by using the MinION device and MinKnow v3.6.5 (ONT). The obtained data, as the first long-read sequence dataset for Macassar ebony, is of great importance to conserve the genetic diversity, understanding the molecular mechanism, and sustainable use of plant genetic resources for downstream applications.

Objective

The third-generation sequencing from Oxford Nanopore Technologies (ONT) that is capable of generating long-read sequences was applied to fill the existing technological gaps, particularly with respect to capital cost, use of native DNA/RNA samples, simplicity, portability, ease of use for library preparation, etc. In particular, these technologies provided an on-site analysis that is of a significant advantage considering possible constraints due to existing gaps in the current regulation (e.g. Nagoya Protocol and others) on sample transfer permits either from field to laboratory, both within the country and overseas [1, 2]. The use of ONT on-site required fewer efforts in arranging the administration process for sample transfer, hence these functions enabled to accelerate data generation for various immediate needs, ones of which were urgent decision-making for species identification and conservation and even for on-site forensic investigation. The ONT could also be used in a hybrid system with other sequencing platforms, such as short-read sequencing in order to analyze missing fragments, structural variations, etc. [3]. In the tropics, research on the use of ONT to dissect biodiversity has been still limited due to the new finding scarcity, especially in regards to tree genomic variation analysis. Associated problems such as DNA/RNA yields and quality have been still consistently found depending on species and sample sources led by mainly more complex chemical compounds (such as phenols) and samples’ accessibility. In addition, site conditions might also influence the DNA yields forcing the use of only one general protocol across the samples. Macassar ebony—an endemic and vulnerable species in Sulawesi (Celebes), Indonesia, was utilized in the experiment and designed to determine the utilization efficacy aiming for long-read sequencing using samples from both leaves and small wood cores collected in Celebes [4]. Results of this study are presented in Table 1.

Table 1 Overview of data files/data sets

Data description

Total genomic DNA from 15 individuals of Macassar ebony (Diospyros celebica Bakh.) leaves (n = 11) and wood core (n = 4) collected by using Pickering Punch in three provinces in Indonesia, namely Central Sulawesi, West Sulawesi, and South Sulawesi, were extracted using a modified CTAB methods [5] in which the CTAB buffer contained CTAB 10%, Tris HCl, NaCl 5 M, EDTA 0.5 M, PVP 1%, β-Mercaptoethanol, and dH2O. DNA quality was evaluated by electrophoresis using a Gel Doc EZ System (Bio-Rad, USA) and DNA concentration was measured by using NanoPhotometer NP80 (IMPLEN, Germany).

The library preparation of genomic DNA sample was followed the Nanopore Protocol for Native barcoding genomic DNA (with EXP-NBD104 and SQK-LSK109), version NBE_9065_v109_revJ_23May2018. Sequencing was done in two rounds using two flowcells (FLO-MIN106). The list of samples per flowcell, as well as the native barcode (NBD01–NBD12) used in the study, were listed in Data File 1.

The sequencing run of genomic DNA samples was performed using the MinION device and MinKnow v3.6.5. Sequencing was terminated after no more pores actively sequenced the DNA. The high-accuracy base-calling mode was used to base-call the signal in FAST5 files and outputted FASTQ files. Samples were separated according to each barcode, where afterwards the barcodes were set to automatically trimmed from the reads (Data set 1). All samples were combined using cat command on Linux Mint terminal and analyzed by using NanoStat v1.2.1 to assess the reads quality and reads' statistics. Meanwhile, distribution plots were generated by using NanoPlot v1.31.0 [6] (Data file 2). We obtained 302 567 reads with 99.5% reads quality > Q7 (nanopore default passed quality). After statistic inspection, all reads quality was filtered through NanoFilt v2.7.1 [6]. Reads with Q-score lower than 7 and less than 500 bp were filtered out, with parameter –headcrop and –tailcrop of 10 were applied. Reads filtering resulted in 134 220 reads, then subject to correction, trimming and De novo assembly using Canu v2.0 [7] with option of genome Size = 800 m. Another De novo long-reads assembler was applied to compare the contig assemblies from plant DNA using SMARTdenovo [8] with minimum read length (−J) 2 000. SMARTdenovo utilized corrected reads 'step from Canu correction stage, thus expected to result in better outcome than the Canu assembly's. The contig assemblies were 358 (N50 6.5 kb, GC 39.91%) and 39 (N50 12.7 kb, GC 41.14%) for Canu and SMARTdenovo respectively. The draft assembly then was polished (corrected) against the individual sequencing reads using medaka_consensus v1.0.3 [9] with parameter model for nanopore sequencing (−m) r941_min_high_g330 (Data file 3). The resulting polished assembly statistics was calculated using QUAST v5.0.2 [10], with references of Diospyros celebica chloroplast (Data file 4) and Diospyros lotus genome (Data file 5). This statistic calculation informed how much reference genome fraction covered by the contig assemblies. The polished contigs then were chosen to construct scaffold using LINKS v1.8.7 [11] with default parameter (Data file 6). The resulted scaffolds were 266 (N50 11.3 kb, GC 39.91%) and 33 (N50 17.8 kb, GC 41.14%) for Canu and SMARTdenovo respectively. The longest scaffolds assemblies from both assembler (Canu 141.6 kb, SMARTdenovo 145 kb) were validated with QUAST v5.0.2 with D. celebica chloroplast to check the genome fraction that scaffolds covered (Data file 7). These scaffolds assemblies were then annotated by using GeSeq platform for Organellar Genomes [12], resulted in the GenBank annotation (Data file 8) and their visualization (Data file 9).

Limitations

The long-read sequencing of the Macassar ebony tree equipped with nanopore sequencing was quite challenging. Extraction of genomic DNA shall be optimized to obtain high-quality gDNA without excessive fragmentation. The resulting fragmented DNA required to be removed prior to library preparation as they might occupy nanopores within the flowcells and cause too many short reads across the sequencing outputs. The library preparation shall be optimized as well, for example, the DNA concentration was measured with a spectrophotometer, which could lead to a biased number of the aforementioned concentration. DNA fluorometer was preferred to accurately calculate DNA concentration. The correct DNA concentration loaded into the MinION flowcell would enable the optimal DNA sequencing process and pores occupancy. Achieving higher sequencing throughput is necessary to improve the read accuracy limitation of MinION as been observed in this study.

Availability of data and materials

The data described in this Data note can be freely and openly accessed on figshare [9, 10] (https://doi.org/10.6084/m9.figshare.13027991.v1, https://doi.org/10.6084/m9.figshare.13028069.v1, https://doi.org/10.6084/m9.figshare.13031195.v1, https://doi.org/10.6084/m9.figshare.13028177.v1, https://doi.org/10.6084/m9.figshare.13028180.v1, https://doi.org/10.6084/m9.figshare.13031702.v1, https://doi.org/10.6084/m9.figshare.13031693.v1, https://doi.org/10.6084/m9.figshare.13031708.v1, https://doi.org/10.6084/m9.figshare.13031714.v1, https://identifiers.org/insdc.sra:DRP006615). Please revert to Table 1 and references list [11,12,13,14,15,16,17,18,19,20] for details and links to the data.

Abbreviations

ONT:

Oxford Nanopore Technology

DNA:

Deoxyribonucleic acid

RNA:

Ribonucleic acid

CTAB:

Cetyl Trimethyl Ammonium Bromide

HCl:

Hydrochloric acid

NaCl:

Sodium chloride

EDTA:

Ethylenediaminetetraacetic acid

PVP:

Polyvinylpyrrolidone

gDNA:

Genomic deoxyribonucleic acid

References

  1. 1.

    Runtuwene LR, Tuda JSB, Mongan AE, Suzuki Y. On-site MinION sequencing. Adv Exp Med Biol. 2019;1129:143–50. https://doi.org/10.1007/978-981-13-6037-4_10.

    CAS  Article  PubMed  Google Scholar 

  2. 2.

    Mardiastuti A. Implementation of access and benefit sharing in Indonesia: review and case studies. JMHT. 2019;25(1):35–43. https://doi.org/10.7226/jtfm.25.1.35.

    Article  Google Scholar 

  3. 3.

    Amarasinghe SL, Su S, Dong X, Zappia L, Ritchie ME, Gouil Q. Opportunities and challenges in long-read sequencing data analysis. Genome Biol. 2020;1(1):1–16. https://doi.org/10.1186/s13059-020-1935-5.

    Article  Google Scholar 

  4. 4.

    Karlinasari L, Noviyanti N, Purwanto YA, Majiidu M, Dwiyanti FG, Rafi M, Damayanti R, Harnelly E, Siregar IZ. Discrimination and determination of extractive content of ebony (Diospyros celebica Bakh.) from celebes island by near-infrared spectroscopy. Forests. 2021;12(1):6. https://doi.org/10.3390/f12010006.

    Article  Google Scholar 

  5. 5.

    Doyle JJ, Doyle JL. Isolation of plant DNA from fresh tissue. Focus. 1990;12:13–5.

    Google Scholar 

  6. 6.

    De Coster W, D’Hert S, Schultz DT, Cruts M, Van Broeckhoven C. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics. 2018;34(15):2666–9. https://doi.org/10.1093/bioinformatics/bty149.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  7. 7.

    Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: Scalable and accurate long-read assembly via adaptive κ-mer weighting and repeat separation. Genome Res. 2017;27(5):722–36. https://doi.org/10.1101/gr.215087.116.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  8. 8.

    Liu H, Wu S, Li A, Ruan J. SMARTdenovo: A de novo assembler using long noisy reads. Preprints. 2020. https://doi.org/10.20944/preprints202009.0207.v1.

    Article  Google Scholar 

  9. 9.

    Oxford Nanopore Technologies. Medaka: Sequence correction provided by ONT Research. 2018. https://github.com/nanoporetech/medaka. Accessed 23 Sep 2020.

  10. 10.

    Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: Quality assessment tool for genome assemblies. Bioinformatics. 2013;29(8):1072–5. https://doi.org/10.1093/bioinformatics/btt086.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  11. 11.

    Warren RL, Yang C, Vandervalk BP, Behsaz B, Lagman A, Jones SJM, Birol I. LINKS: scalable, alignment-free scaffolding of draft genomes with long reads. GigaScience. 2015;4(1):35. https://doi.org/10.1186/s13742-015-0076-3.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  12. 12.

    Tillich M, Lehwark P, Pellizzer T, Ulbricht-Jones ES, Fischer A, Bock R, Greiner S. GeSeq—versatile and accurate annotation of organelle genomes. Nucleic Acids Res. 2017;45(W1):W6–11. https://doi.org/10.1093/nar/gkx391.

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  13. 13.

    Siregar IZ, Pratama R, Dwiyanti FG, Matra DD, Muhammad M. Sample list and barcode. Figshare. 2021. https://doi.org/10.6084/m9.figshare.13027991.v1.

    Article  Google Scholar 

  14. 14.

    Siregar IZ, Pratama R, Dwiyanti FG, Matra DD, Muhammad M. Raw reads statistic and plot. Figshare. 2021. https://doi.org/10.6084/m9.figshare.13028069.v1.

    Article  Google Scholar 

  15. 15.

    Siregar IZ, Pratama R, Dwiyanti FG, Matra DD, Muhammad M. Contig assembly. Figshare. 2021. https://doi.org/10.6084/m9.figshare.13031195.v1.

    Article  Google Scholar 

  16. 16.

    Siregar IZ, Pratama R, Dwiyanti FG, Matra DD, Muhammad M. Contig assembly statistic 1. Figshare. 2021. https://doi.org/10.6084/m9.figshare.13028177.v1.

    Article  Google Scholar 

  17. 17.

    Siregar IZ, Pratama R, Dwiyanti FG, Matra DD, Muhammad M. Contig assembly statistic 2. Figshare. 2021. https://doi.org/10.6084/m9.figshare.13028180.v1.

    Article  Google Scholar 

  18. 18.

    Siregar IZ, Pratama R, Dwiyanti FG, Matra DD, Muhammad M. Scaffold assembly. Figshare. 2021. https://doi.org/10.6084/m9.figshare.13031702.v1.

    Article  Google Scholar 

  19. 19.

    Siregar IZ, Pratama R, Dwiyanti FG, Matra DD, Muhammad M. Scaffold statistic. Figshare. 2021. https://doi.org/10.6084/m9.figshare.13031693.v1.

    Article  Google Scholar 

  20. 20.

    Siregar IZ, Pratama R, Dwiyanti FG, Matra DD, Muhammad M. Genbank annotation. Figshare. 2021. https://doi.org/10.6084/m9.figshare.13031708.v1.

    Article  Google Scholar 

  21. 21.

    Siregar IZ, Pratama R, Dwiyanti FG, Matra DD, Muhammad M. Scaffold annotation visualization. Figshare. 2021. https://doi.org/10.6084/m9.figshare.13031714.v1.

    Article  Google Scholar 

  22. 22.

    DNA Data Bank of Japan. 2020. https://identifiers.org/insdc.sra:DRP006615. Accessed 27 Jan 2021.

Download references

Acknowledgements

The authors thank (i) South Sulawesi Natural Resources Conservation Agency (BKSDA) for providing research permission and assisting during sample collection in Cani Sirenreng Nature Park, Kalaena Nature Reserve, and Mappu-Gandang Dewata National Park, (ii) South Sulawesi Investment and One-stop Integrated Services (DPMPTSP) for providing research permission in Coppo Village, Belabori Secondary Forest, and Tana Toro Protection Forest (iii) West Sulawesi Investment and One-stop Integrated Services (DPMPTSP) for providing research permit in Batu Ampa Protection Forest, and Sondoang Village, and (iv) Central Sulawesi Investment and One-stop Integrated Services (DPMPTSP) for providing research permission in Wawopada Village and Sausu Village. High appreciation also goes to Molecular Science Laboratory, Advanced Research Laboratory, IPB University, Bogor, Indonesia for its contribution in providing lab facilities.

Funding

This research was supported by the World Resources Institute's (WRI) Forest Program grant (Project Code: 04682, period March 1st, 2019–September 30th, 2020) titled “Timber Tracking Project IPB-WRI: Collection of physical timber reference material and setting up a reference data building pipeline for DNA of commercial timber species (Diospyros celebica. Bakh)” awarded by Norwegian International Climate & Forest Initiative (NICFI). Funding is used to cover research design, sample collection, laboratory expenses, sample preparation, sequencing, data collection, and analysis of results.

Author information

Affiliations

Authors

Contributions

IZS conceived and designed experiments of the study. MMD performed genomic DNA Extraction. FGD, RHP, DDM, and MMD performed experimental treatments and managed DNA sequencing analysis. RHP analyzed and interpreted the DNA sequencing data. RHP and IZS prepared the first draft of manuscript, while FGD, DDM, and MMD made major contributions in writing and editing the manuscript. All authors reviewed, discussed, and revised the contents of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Iskandar Zulkarnaen Siregar.

Ethics declarations

Ethics approval and consent to participate

Biological material samples in forms of dried leaves and wood cores were collected from (i) South Sulawesi following permit approvals from South Sulawesi Natural Resources Conservation Agency/BKSDA (No: SK.151/K.8/BIDTEK/KSA/5/2019 for Cani Sirenreng Nature Park and Kalaena Nature Reserve, and No: SK.295/K.8/BIDTEK/KSA/10/2019 for Mappu-Gandang Dewata National Park), and South Sulawesi Investment and One-stop Integrated Services/DPMPTSP (No: 16163/S.01/PTSP/2019 for Coppo Village, Bellabori Secondary Forest, and Tana Toro Protection Forest), (ii) West Sulawesi following permit approvals from West Sulawesi Investment and One-stop Integrated Services/DPMPTSP No. 000451/76.RP.PTSP.B/X/2019 for Batu Ampa Protection Forest, and Sondoang Village, and (iii) Central Sulawesi following permit approvals from Central Sulawesi Investment and One-stop Integrated Services/DPMPTSP No. 070/433/REK-PL/DPMPTSP/2019 for Wawopada Village and Sausu Village. The herbaria vouchers were identified by Mr Denny and are stored in Forest Research and Development Center and Nature Conservation (BZF)-Forestry and Environmental Research Development and Innovation Agency (FOERDIA) of the Ministry of Environment and Forestry of Republic of Indonesia (KLHK).

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Siregar, I.Z., Dwiyanti, F.G., Pratama, R. et al. Generating long-read sequences using Oxford Nanopore Technology from Diospyros celebica genomic DNA. BMC Res Notes 14, 75 (2021). https://doi.org/10.1186/s13104-021-05484-0

Download citation

Keywords

  • Diospyros celebica
  • Genome
  • ONT
  • Scaffolds
  • Long-read sequences
\