Table 3 Sequence summary and resource usage for tree construction and traversal of test datasets

From: Suffix tree searcher: exploration of common substrings in large DNA sequence sets

| Dataset | Size (Mbp) | Tree construction: Time¹ | Tree construction: Rate (kbp/s) | Tree construction: Peak memory (Gbp) | Tree construction: Disk usage | Full traversal: Time | Full traversal: Peak memory (Mbp) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 10 random 10 kbp seqs | 0.1 | 154 ms | 649.4 | 1.72 | 4.0 Mb | 21 ms | 2.3 |
| 100 random 10 kbp seqs | 1.0 | 733 ms | 1,364.3 | 1.72 | 39.7 Mb | 150 ms | 22.9 |
| 1,000 random 10 kbp seqs | 10.0 | 9.2 s | 1,084.9 | 1.72 | 397.1 Mb | 3.8 s | 24.0 |
| 10,000 random 10 kbp seqs | 100.0 | 1 m 56 s | 861.3 | 1.72 | 3.9 Gb | 5 m 10 s | 24.1 |
| 100,000 random 10 kbp seqs | 1,000.0 | 29 m 52 s | 558.1 | 1.72 | 38.8 Gb | 7 h 23 m | 25.2 |
| 10,000 random 100 bp seqs | 1.0 | 6.0 s | 166.4 | 1.72 | 111.6 Mb | 1.7 s | 22.0 |
| 10,000 random 1 kbp seqs | 10.0 | 13.7 s | 732.1 | 1.72 | 459.7 Mb | 25.5 s | 24.1 |
| 10,000 random 100 kbp seqs | 1,000.0 | 25 m 42 s | 648.6 | 1.72 | 38.5 Gb | 51 m 22 s | 24.3 |
| 10,000 random 1 Mbp seqs² | 10,000.0 | 12 h 29 m | 222.5 | 1.72 | 384.7 Gb | 9 h 18 m | 24.0 |
| 62 E. coli genomes | 310.5 | 21 m 55 s | 235.7 | 1.72 | 11.9 Gb | 2 m 33 s | 24.0 |
| 62 random seqs w/ E. coli lengths | 310.5 | 7 m 3 s | 734.0 | 1.72 | 11.9 Gb | 2 m 56 s | 24.0 |
| 4 Chlorella virus genomes | 0.9 | 815 ms | 1,074.5 | 1.72 | 41.8 Mb | 203 ms | 24.0 |
| Human genome (hg38)² | 3,209.3 | 3 h 58 m | 223.7 | 2.07 | 117.2 Gb | 45 m 33 s | 24.1 |
  1. Tree construction time includes the time to both Import sequences and Build the suffix index.
  2. The "10,000 random 1 Mbp" and human genome datasets used an external USB 2.0 hard drive.
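
The Rate column appears consistent with dataset size divided by tree construction time, although the table does not state the formula. The following is a minimal sketch of that arithmetic under this assumption. The helper names `parse_duration` and `rate_kbp_per_s` are hypothetical and are not part of Suffix tree searcher.

```python
import re

# Seconds per unit, for the duration strings used in the table ("154 ms", "29 m 52 s").
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}


def parse_duration(text: str) -> float:
    """Convert a duration such as '1 m 56 s' or '154 ms' to seconds."""
    total = 0.0
    # "ms" is listed first so that it is not mistaken for "m".
    for value, unit in re.findall(r"([\d.]+)\s*(ms|h|m|s)", text):
        total += float(value) * _UNIT_SECONDS[unit]
    return total


def rate_kbp_per_s(size_mbp: float, seconds: float) -> float:
    """Assumed definition: throughput in kbp/s = size in kbp / elapsed seconds."""
    return size_mbp * 1000.0 / seconds


if __name__ == "__main__":
    # First two rows of the table: (size in Mbp, construction time).
    for size_mbp, time_text in [(0.1, "154 ms"), (1.0, "733 ms")]:
        rate = rate_kbp_per_s(size_mbp, parse_duration(time_text))
        print(f"{size_mbp} Mbp in {time_text}: {rate:.1f} kbp/s")
```

For the two millisecond-resolution rows this reproduces the reported 649.4 and 1,364.3 kbp/s. Rows whose times are rounded to seconds or minutes agree only approximately, presumably because the published rates were computed from unrounded timings.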