
Table 3 Sequence summary and resource usage for tree construction and traversal of test datasets

From: Suffix tree searcher: exploration of common substrings in large DNA sequence sets

 

| Dataset | Size (Mbp) | Tree construction: Time¹ | Tree construction: Rate (kbp/s) | Tree construction: Peak memory (Gbp) | Tree construction: Disk usage | Full traversal: Time | Full traversal: Peak memory (Mbp) |
|---|---|---|---|---|---|---|---|
| 10 random 10 kbp seqs | 0.1 | 154 ms | 649.4 | 1.72 | 4.0 Mb | 21 ms | 2.3 |
| 100 random 10 kbp seqs | 1.0 | 733 ms | 1,364.3 | 1.72 | 39.7 Mb | 150 ms | 22.9 |
| 1,000 random 10 kbp seqs | 10.0 | 9.2 s | 1,084.9 | 1.72 | 397.1 Mb | 3.8 s | 24.0 |
| 10,000 random 10 kbp seqs | 100.0 | 1 m 56 s | 861.3 | 1.72 | 3.9 Gb | 5 m 10 s | 24.1 |
| 100,000 random 10 kbp seqs | 1,000.0 | 29 m 52 s | 558.1 | 1.72 | 38.8 Gb | 7 h 23 m | 25.2 |
| 10,000 random 100 bp seqs | 1.0 | 6.0 s | 166.4 | 1.72 | 111.6 Mb | 1.7 s | 22.0 |
| 10,000 random 1 kbp seqs | 10.0 | 13.7 s | 732.1 | 1.72 | 459.7 Mb | 25.5 s | 24.1 |
| 10,000 random 100 kbp seqs | 1,000.0 | 25 m 42 s | 648.6 | 1.72 | 38.5 Gb | 51 m 22 s | 24.3 |
| 10,000 random 1 Mbp seqs² | 10,000.0 | 12 h 29 m | 222.5 | 1.72 | 384.7 Gb | 9 h 18 m | 24.0 |
| 62 E. coli genomes | 310.5 | 21 m 55 s | 235.7 | 1.72 | 11.9 Gb | 2 m 33 s | 24.0 |
| 62 random seqs w/ E. coli lengths | 310.5 | 7 m 3 s | 734.0 | 1.72 | 11.9 Gb | 2 m 56 s | 24.0 |
| 4 Chlorella virus genomes | 0.9 | 815 ms | 1,074.5 | 1.72 | 41.8 Mb | 203 ms | 24.0 |
| Human genome (hg38)² | 3,209.3 | 3 h 58 m | 223.7 | 2.07 | 117.2 Gb | 45 m 33 s | 24.1 |

  1. Tree construction time includes the time to both Import sequences and Build the suffix index.
  2. The "10,000 random 1 Mbp" and human genome datasets used an external USB 2.0 hard drive.
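
The table does not say how the Rate (kbp/s) column was derived. The sketch below assumes it is the dataset size divided by the total tree construction time, which reproduces the first two rows to the precision shown. Other rows differ slightly, presumably because the printed times are rounded. The function names `parse_duration` and `rate_kbp_per_s` are hypothetical helpers written for this illustration and are not part of the tool described in the article.

```python
import re

# Seconds per unit for the duration strings used in the table
# (for example "154 ms", "9.2 s", "29 m 52 s", "12 h 29 m").
UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}


def parse_duration(text: str) -> float:
    """Convert a table-style duration string to seconds."""
    # "ms" is listed before "m" and "s" so that milliseconds match first.
    parts = re.findall(r"(\d+(?:\.\d+)?)\s*(ms|h|m|s)\b", text)
    if not parts:
        raise ValueError(f"unrecognised duration: {text!r}")
    return sum(float(value) * UNIT_SECONDS[unit] for value, unit in parts)


def rate_kbp_per_s(size_mbp: float, duration: str) -> float:
    """Assumed definition: rate = dataset size (kbp) / construction time (s)."""
    return size_mbp * 1000.0 / parse_duration(duration)


if __name__ == "__main__":
    # First two rows of the table: (size in Mbp, construction time).
    for size_mbp, time_text in [(0.1, "154 ms"), (1.0, "733 ms")]:
        print(f"{size_mbp} Mbp in {time_text}: "
              f"{rate_kbp_per_s(size_mbp, time_text):.1f} kbp/s")
```

Under this assumption, the first row (0.1 Mbp in 154 ms) gives about 649.4 kbp/s and the second (1.0 Mbp in 733 ms) about 1,364.3 kbp/s, matching the tabulated values.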