Figure 8 | BMC Research Notes

Figure 8

From: Cross-species protein sequence and gene structure prediction with fine-tuned Webscipio 2.0 and Scipio

Diagrams of the improvements introduced with the new Scipio version. The diagrams show how the gene reconstructions of the DHC genes improve in the cross-species search of the human homologs (query sequences) in elephant (target sequence) when different Scipio versions and parameters are used. (A) The baseline is the result of the search using the old Scipio v1.0. The maximal possible annotation is represented by the gene reconstructions based on the manually annotated elephant DHC genes (reference dataset, purple). The blue bars show the reconstruction with Scipio v1.5 using --blat_tilesize = 7, --exhaust_align_size = 500 and --exhaust_gap_size = 21 (dataset s1). The green bars are the results of the second search (dataset s2), which used the same parameters as the first search except for --blat_tilesize = 6 and --exhaust_gap_size = 18 (three times the tilesize). This dataset represents improvements that are independent of Scipio. The red bars represent searches with the same parameters as for dataset s1, except for the increased values --exhaust_align_size = 5,000 and --exhaust_gap_size = 25 (dataset s4). This search takes far longer to compute than the first search because the Needleman-Wunsch search is applied to longer regions. For the DHC1 gene, Scipio v1.0 maps too many amino acids of the human query sequence to the elephant genome; the negative bars for the other datasets therefore indicate that these datasets cover the correct number of 4561 amino acids. (B) This diagram depicts the number of gaps (parts of the human query sequence not matched in the elephant genome) and questionable introns (intron?; introns with uncommon splice sites) for the searches with the old Scipio version and with the new version, using the same parameter sets as in (A). The detailed values of the diagrams are given in the tables in Additional file 3.
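
The three Scipio v1.5 parameter sets named in the caption can be laid out side by side to make the differences between the searches explicit. The following is only a minimal sketch: the parameter names and values are taken from the caption, while the dictionary layout, the helper functions `to_flags` and `changed_from`, and the `--name=value` rendering are illustrative assumptions and not part of Scipio.

```python
# Parameter sets as stated in the caption (datasets s1, s2 and s4).
# The dict layout and helper names are hypothetical, for illustration only.
PARAMETER_SETS = {
    "s1": {"blat_tilesize": 7, "exhaust_align_size": 500, "exhaust_gap_size": 21},
    "s2": {"blat_tilesize": 6, "exhaust_align_size": 500, "exhaust_gap_size": 18},
    "s4": {"blat_tilesize": 7, "exhaust_align_size": 5000, "exhaust_gap_size": 25},
}


def to_flags(params):
    """Render a parameter dict as '--name=value' strings (assumed syntax)."""
    return [f"--{name}={value}" for name, value in params.items()]


def changed_from(base, other):
    """Return {parameter: (base value, other value)} for parameters that differ."""
    return {k: (base[k], other[k]) for k in base if base[k] != other[k]}


if __name__ == "__main__":
    for name, params in PARAMETER_SETS.items():
        print(name, " ".join(to_flags(params)))
    # Show which parameters distinguish s2 and s4 from the first search (s1).
    for name in ("s2", "s4"):
        print(name, "differs from s1 in:",
              changed_from(PARAMETER_SETS["s1"], PARAMETER_SETS[name]))
```

In this layout, s2 differs from s1 only in the tile size and the gap size, and s4 only in the two exhaustive-alignment parameters, which matches the description of the searches in (A).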
