ITScan: a web-based analysis tool for Internal Transcribed Spacer (ITS) sequences
© Ferro et al.; licensee BioMed Central Ltd. 2014
Received: 18 March 2014
Accepted: 19 November 2014
Published: 27 November 2014
Studies on fungal diversity and ecology aim to identify fungi and to investigate their interactions with each other and with the environment. DNA sequence-based tools are essential for these studies because they can speed up the identification process and access greater fungal diversity than traditional methods. The nucleotide sequence encoding the internal transcribed spacer (ITS) of the nuclear ribosomal RNA has recently been proposed as a standard marker for molecular identification of fungi and evaluation of fungal diversity. However, the analysis of large sets of ITS sequences involves many programs and steps, which makes this task laborious and time-consuming.
We developed the web-based pipeline ITScan, which automates the analysis of fungal ITS sequences generated either by Sanger or Next Generation Sequencing (NGS) platforms. Validation was performed using datasets containing ca. 2,000 to 40,000 sequences each.
ITScan is an online and user-friendly automated pipeline for fungal diversity analysis and identification based on ITS sequences. It speeds up a process which would otherwise be repetitive and time-consuming for users. The ITScan tool and documentation are available at http://evol.rc.unesp.br:8083/itscan.
Studies on fungal biodiversity use DNA sequence-based tools to generate molecular markers that identify rare species and determine associations in a microbial community. The technique is particularly powerful in characterizing fungal diversity in environmental samples containing many fungal species that do not grow, or grow poorly, in laboratory cultures. Many biodiversity studies are based on the nuclear ribosomal Internal Transcribed Spacer (ITS) region [3, 4], which is a small (~500 base-pair) region that occurs in multiple copies in the fungal nuclear genome and shows a high degree of variation even between closely related species.
The ITS region has recently been designated as a universal marker for molecular barcoding of fungi, that is, the default region for species identification. To determine the microbial diversity in environmental samples, the generated ITS sequences are grouped into operational taxonomic units (OTUs), often using the MOTHUR program and an OTU-based analysis approach [7, 8]. The use of multiple programs and stages of analysis makes the process laborious and time-consuming. In this work, we describe a web-based pipeline that automates the study of fungal diversity and identification based on ITS sequences.
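To illustrate what grouping sequences into OTUs means, the following is a minimal sketch of greedy, seed-based clustering by pairwise identity. It is not the algorithm used by MOTHUR or by ITScan: it assumes the sequences are already aligned to equal length, and the function names `identity` and `greedy_otus` and the 0.97 default threshold are illustrative assumptions rather than values taken from this work.

```python
def identity(a, b):
    """Fraction of matching positions between two aligned, equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and aligned to equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def greedy_otus(seqs, threshold=0.97):
    """Assign each sequence to the first OTU whose seed it matches at >= threshold.

    A sequence that matches no existing seed starts a new OTU and becomes its seed.
    The threshold is an assumed, user-tunable similarity cutoff.
    """
    otus = []  # list of (seed_sequence, member_list) pairs
    for s in seqs:
        for seed, members in otus:
            if identity(seed, s) >= threshold:
                members.append(s)
                break
        else:
            otus.append((s, [s]))
    return [members for _, members in otus]


# Toy aligned reads of length 10, clustered at 90% identity.
reads = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC"]
clusters = greedy_otus(reads, threshold=0.9)
```

In this toy example the first two reads differ at one of ten positions and fall into the same OTU, while the third starts a new one. Greedy clustering depends on input order, which is one reason dedicated tools use distance-matrix methods instead.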
Client Mode: deals with client-side concerns;
Request-Response Mode: handles server-side and business-logic concerns using coupled third-party programs and their business rules. The Pipeline Manager provides a Representational State Transfer (REST) service.
This architecture provides background information that helps to check for failures on both the client and server sides.
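The idea of a manager that couples components and reports failures can be sketched as follows. This is a hypothetical illustration in Python, not ITScan's actual implementation (which is written in Perl and Java): the class `PipelineManager`, its methods, the response dictionary layout and the two toy components are all assumptions made for the example.

```python
class PipelineManager:
    """Chains components that share one interface: each takes data and returns data."""

    def __init__(self):
        self.steps = []  # ordered (name, callable) pairs

    def register(self, name, component):
        self.steps.append((name, component))
        return self  # allow chained registration

    def run(self, data):
        """Run all steps in order; stop at the first failure and report it."""
        log = []
        for name, component in self.steps:
            try:
                data = component(data)
            except Exception as exc:
                log.append(f"{name}: failed ({exc})")
                return {"status": "error", "failed_step": name, "log": log}
            log.append(f"{name}: ok")
        return {"status": "ok", "result": data, "log": log}


# Two toy components standing in for coupled third-party programs.
def drop_empty(seqs):
    return [s for s in seqs if s]


def to_upper(seqs):
    return [s.upper() for s in seqs]


manager = PipelineManager().register("drop_empty", drop_empty).register("to_upper", to_upper)
response = manager.run(["acgt", "", "ttga"])
```

Because every component has the same call signature, a new step can be added with one `register` call, and the per-step log gives the kind of background information needed to locate a failure.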
Pipeline for fungal ITS analysis
ITScan requires a FASTA-formatted input file containing pre-processed sequences, i.e., high quality sequences (usually Phred ≥20) without primer and adaptor sequences. Pre-processing programs, such as SEQTRIM, SCATA, PANGEA, CANGS and PYRONOISE, can be used to trim data from different sequencing platforms (e.g. 454, Illumina, regular Sanger reads) and the resulting output files can then be read by ITScan.
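As a rough illustration of these input requirements, the sketch below reads FASTA records and checks that each contains only nucleotide characters. It also shows what the Phred threshold means: a Phred score Q corresponds to a base-call error probability of 10^(-Q/10), so Phred 20 is a 1% error rate. The function names and the accepted alphabet (A, C, G, T, N) are assumptions for this example and do not describe ITScan's own validation.

```python
import io


def phred_to_error(q):
    """Convert a Phred quality score to the base-call error probability."""
    return 10 ** (-q / 10)


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def is_nucleotide(seq, alphabet=frozenset("ACGTN")):
    """True if the sequence is non-empty and uses only the assumed alphabet."""
    return bool(seq) and set(seq) <= alphabet


example = io.StringIO(">read1\nACGT\nacg\n>read2\nNNXX\n")
records = list(read_fasta(example))
valid = [name for name, seq in records if is_nucleotide(seq)]
```

Here `read1` passes the check and `read2` is rejected because of the `X` characters. Quality trimming itself still has to happen upstream, since FASTA files carry no quality scores.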
The architectural model enables the user to develop web service components and to couple them into a new customized pipeline. R scripts produce graphs and spreadsheets representing rarefaction curves, the Shannon and Simpson diversity indices and the Chao1 richness estimator.
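For readers unfamiliar with these measures, the sketch below computes them from a list of per-OTU sequence counts using their standard textbook definitions. It is written in Python for illustration only; ITScan produces its results with R scripts, and its exact formulas (for example, which Simpson variant or Chao1 correction is used) are not specified here, so the choices below are assumptions.

```python
import math


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def simpson(counts):
    """Simpson index D = sum(p_i^2); some tools report 1 - D or 1 / D instead."""
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)


def chao1(counts):
    """Chao1 richness: S_obs + F1^2 / (2 * F2), where F1 and F2 are the numbers
    of singleton and doubleton OTUs. Falls back to the bias-corrected form
    S_obs + F1 * (F1 - 1) / 2 when there are no doubletons."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2


def rarefy(counts, m):
    """Expected number of OTUs in a random subsample of m sequences
    (one point on an analytical rarefaction curve)."""
    n = sum(counts)
    total = math.comb(n, m)
    return sum(1 - math.comb(n - c, m) / total for c in counts if c > 0)


otu_counts = [5, 2, 2, 1, 1, 1]  # toy data: sequences per OTU
summary = {
    "shannon": shannon(otu_counts),
    "simpson": simpson(otu_counts),
    "chao1": chao1(otu_counts),
    "rarefied_at_6": rarefy(otu_counts, 6),
}
```

Evaluating `rarefy` over a range of subsample sizes gives the points of a rarefaction curve; at the full sample size it returns the observed number of OTUs.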
ITScan has a user-friendly interface and can process up to three FASTA-formatted input files simultaneously and compare these files with each other. The pipeline was validated using Sanger sequences (Mantovani et al., in preparation) and large datasets (2,000 to 40,000 sequences) simulating results from Next Generation Sequencing (NGS), which were retrieved from the UNITE database.
Many programs that analyze fungal ITS sequences, such as FungalITSPipeline, QIIME and FHiTINGS, must be installed by the user and operated via the command line. ITScan has neither requirement because it was built with a web-based interface.
The ITScan pipeline has some limitations. For instance, it processes at most three FASTA files simultaneously. In addition, it relies on GenBank servers to run BLASTN searches, instead of implementing time-consuming local searches on annotated databases, which would improve taxonomic assignment. Future expansions of our servers will allow us to implement multi-sample analyses based on local annotated fungal ITS databases.
This work describes an architectural model that can be used with third-party bioinformatics programs. All components follow the same framework, which facilitates the development of new components. ITScan works with sequences derived from both Sanger and NGS technologies. The pipeline can process a single dataset or as many as three datasets to compare distinct biological samples. Output data include graphs and spreadsheets that are automatically generated to represent fungal diversity. ITScan includes a user manual and an example dataset. We validated ITScan using datasets containing ca. 2,000 to 40,000 sequences retrieved from the UNITE database. Using ITScan does not require computational expertise.
Availability and requirements
Project name: ITScan
Project home page: http://evol.rc.unesp.br:8083/itscan
Operating system(s): Platform independent
Programming language: Perl, Java
Other requirements: Web browser
License: ITScan web tool is freely available for all users. ITScan is open source under the GNU GPL license.
This work was supported by Fundação de Amparo à Pesquisa do Estado de São Paulo (Proc. 2011/50226-0). MF receives a doctoral fellowship (Proc. 2009/52289-9).
- Schoch CL, Seifert KA, Huhndorf S, Robert V, Spouge JL, Levesque CA, Chen W, Fungal Barcoding Consortium: Nuclear ribosomal internal transcribed spacer (ITS) region as a universal DNA barcode marker for Fungi. Proc Natl Acad Sci. 2012, 109: 6241-6246. 10.1073/pnas.1117018109.
- Sun X, Guo LD: Endophytic fungal diversity: review of traditional and molecular techniques. Mycology. 2012, 3 (1): 65-76.
- Rittenour WR, Ciaccio CE, Barnes CS, Kashon ML, Lemons AR, Beezhold DH, Green BJ: Internal transcribed spacer rRNA gene sequencing analysis of fungal diversity in Kansas City indoor environments. Environ Sci Process Impacts. 2014, 16 (1): 33-43. 10.1039/c3em00441d.
- Liu YT, Chen RK, Lin SJ, Chen YC, Chin SW, Chen FC, Lee CY: Analysis of sequence diversity through internal transcribed spacers and simple sequence repeats to identify Dendrobium species. Genet Mol Res. 2014, 13 (2): 2709-2717. 10.4238/2014.April.8.15.
- Bellemain E, Carlsen T, Brochmann C, Coissac E, Taberlet P, Kauserud H: ITS as an environmental DNA barcode for fungi: an in silico approach reveals PCR biases. BMC Microbiol. 2010, 10: 189. 10.1186/1471-2180-10-189.
- Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, Lesniewski RA, Oakley BB, Parks DH, Robinson CJ, Sahl JW, Stres B, Thallinger GG, Van Horn DJ, Weber CF: Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol. 2009, 75 (23): 7537-7541. 10.1128/AEM.01541-09.
- Schloss PD, Westcott SL: Assessing and improving methods used in operational taxonomic unit-based approaches for 16S rRNA gene sequence analysis. Appl Environ Microbiol. 2011, 77 (10): 3219-3226. 10.1128/AEM.02810-10.
- Links MG, Chaban B, Hemmingsen SM, Muirhead K, Hill JE: mPUMA: a computational approach to microbiota analysis by de novo assembly of operational taxonomic units based on protein-coding barcode sequences. Microbiome. 2013, 1: 23. 10.1186/2049-2618-1-23.
- Fowler M: Patterns of Enterprise Application Architecture. 2002, Boston: Addison-Wesley.
- Richardson L, Ruby S: RESTful Web Services: Web Services for the Real World. 2007, Sebastopol: O’Reilly Media.
- Falgueras J, Lara AJ, Fernández Pozo N, Cantón FR, Pérez Trabado G, Claros MG: SeqTrim: a high-throughput pipeline for pre-processing any type of sequence read. BMC Bioinform. 2010, 20 (11): 38.
- SCATA: Sequence Clustering and Analysis of Tagged Amplicons. http://scata.mykopat.slu.se/
- Giongo A, Crabb DB, Davis-Richardson AG, Chauliac D, Mobberley JM, Gano KA, Mukherjee N, Casella G, Roesch LF, Walts B, Riva A, King G, Triplett EW: PANGEA: pipeline for analysis of next generation amplicons. ISME J. 2010, 4 (7): 852-861. 10.1038/ismej.2010.16.
- Pandey RV, Nolte V, Schlotterer C: CANGS: a user-friendly utility for processing and analyzing 454 GS-FLX data in biodiversity studies. BMC Res Notes. 2010, 11 (3): 3.
- Quince C, Lanzen A, Curtis TP, Davenport RJ, Hall N, Head IM, Read LF, Sloan WT: Accurate determination of microbial diversity from 454 pyrosequencing data. Nat Methods. 2009, 6 (9): 639-641. 10.1038/nmeth.1361.
- Nilsson RH, Abarenkov K, Veldre V, Nylinder S, Wit P, Brosché S, Alfredsson JF, Ryberg M, Kristiansson E: An open source chimera checker for the fungal ITS region. Mol Ecol Resour. 2010, 10: 1076-1081. 10.1111/j.1755-0998.2010.02850.x.
- Katoh M, Kuma M: MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002, 30: 3059-3066. 10.1093/nar/gkf436.
- Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25 (17): 3389-3402. 10.1093/nar/25.17.3389.
- Booch G, Rumbaugh J, Jacobson I: Unified Modeling Language User Guide. 2005, Reading: Addison-Wesley Professional.
- Katayama T, Nakao M, Takagi T: TogoWS: integrated SOAP and REST APIs for interoperable bioinformatics Web services. Nucleic Acids Res. 2010, 38: W706-W711. 10.1093/nar/gkq386.
- Medina I, De Maria A, Bleda M, Salavert F, Alonso R, Gonzalez CY, Dopazo J: VARIANT: command Line, web service and web interface for fast and accurate functional characterization of variants found by next-generation sequencing. Nucleic Acids Res. 2012, 40: W40-W58. 10.1093/nar/gkr1174.
- Abarenkov K, Nilsson RH, Larsson K, Alexander IJ, Eberhardt U, Erland S, Høiland K, Kjøller R, Larsson E, Pennanen T, Sen R, Taylor AFS, Tedersoo L, Ursing BM, Vrålstad T, Liimatainen K, Peintner U, Kõljalg U: The UNITE database for molecular identification of fungi - recent updates and future perspectives. New Phytol. 2010, 186 (2): 281-285. 10.1111/j.1469-8137.2009.03160.x.
- Nilsson RH, Bok G, Ryberg M, Kristiansson E, Hallenberg N: A software pipeline for processing and identification of fungal ITS sequences. Source Code Biol Med. 2009, 4: 1. 10.1186/1751-0473-4-1.
- Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, Fierer N, Pena AG, Goodrich JK, Gordon JI, Huttley GA, Kelley ST, Knights D, Koenig JE, Ley RE, Lozupone CA, McDonald D, Muegge BD, Pirrung M, Reeder J, Sevinsky JR, Turnbaugh PJ, Walters WA, Widmann J, Yatsunenko T, Zaneveld J, Knight R: QIIME allows analysis of high-throughput community sequencing data. Nat Methods. 2010, 7 (5): 335-336. 10.1038/nmeth.f.303.
- Dannemiller KC, Reeves D, Bibby K, Yamamoto N, Peccia J: Fungal high-throughput taxonomic identification tool for use with next-generation sequencing (FHiTINGS). J Basic Microbiol. 2014, 54: 315-321. 10.1002/jobm.201200507.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.