fastQ_brew was developed in Perl and successfully tested on Microsoft Windows 7 Enterprise ver.6.1, Linux Ubuntu 64-bit ver.16.04 LTS, and Linux Mint 18.1 Serena. fastQ_brew relies only on modules that are part of the Perl Core Modules (http://perldoc.perl.org/index-modules-A.html), which makes it straightforward to install and run. fastQ_brew is composed of two separate packages: fastQ_brew.pm and fastQ_brew_Utilities.pm. fastQ_brew_Utilities.pm provides fastQ_brew.pm with the subroutines that handle FASTQ manipulation and quality control. The fastQ_brew object is instantiated by calling the constructor subroutine “new”, which creates a ‘blessed’ object; this object is then populated with its methods and properties by calling the load_fastQ_brew method. Once the object has been populated, the user can call run_fastQ_brew to begin processing the FASTQ data. Sample data are provided in the GitHub repository, and directions for usage are given in the README.md file.
The command-line arguments supplied to the fastQ_brew object are as follows: (1) -lib, which can be either sanger or illumina; (2) -path, the path to the input file (“./” can be used for the current directory on UNIX, or “.\” in Windows cmd); (3) -i, the name of the file containing the FASTQ reads; (4) -smry, returns a summary statistics table for the unfiltered and filtered data; (5) -qf, filters reads by Phred quality score (also called Q score): any read with an average Phred score below the threshold is removed, e.g. -qf = 20 removes reads whose average Phred score is below 20; (6) -lf, removes reads shorter than a specified length; (7) -trim_l, trims the specified number of bases from the left end of each read; (8) -trim_r, same as -trim_l except that bases are trimmed from the right end; (9) -adpt_l, removes a specified adapter sequence from the left end of a read; (10) -adpt_r, same as -adpt_l except that the adapter is removed from the right end; (11) -mis_l, allows a specified number of mismatches between the user-provided -adpt_l sequence and each read, e.g. with mismatch = 1, a hypothetical 3-base adapter, TAG, would match the left end of a read that starts with TAG itself or with any of its nine single-base variants, such as AAG or TAA; (12) -mis_r, same as -mis_l except that it applies to the -adpt_r sequence supplied by the user; (13) -dup, removes duplicate reads; (14) -no_n, removes reads that contain non-designated bases, i.e. bases that are not A, G, C or T, e.g. N; (15) -fasta, converts the FASTQ file to FASTA format; (16) -rev_comp, reverse complements the reads in the supplied FASTQ file; (17) -rna, converts each read in the supplied FASTQ file to the corresponding RNA sequence; (18) -clean, deletes temporary files created during the run.
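As a minimal illustration of how mismatch-tolerant adapter removal (options -adpt_l and -mis_l) can work, the Python sketch below compares the adapter with the read prefix of the same length and trims the read when the number of differing positions is within the allowed budget. It is not fastQ_brew's Perl code: the function names (hamming, trim_left_adapter, one_mismatch_variants) are hypothetical, and counting mismatches as substitutions only (Hamming distance, with no insertions or deletions) is an assumption suggested by the TAG example above.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def trim_left_adapter(read: str, qual: str, adapter: str, max_mismatches: int = 0):
    """Remove `adapter` from the left end of `read` if it matches within the budget.

    Assumption: mismatches are substitutions only. The quality string is
    trimmed by the same number of characters so that it stays aligned
    with the read. Returns the (possibly unchanged) read and quality strings.
    """
    n = len(adapter)
    prefix = read[:n]
    # A read shorter than the adapter can never match under this scheme.
    if len(prefix) == n and hamming(prefix, adapter) <= max_mismatches:
        return read[n:], qual[n:]
    return read, qual


def one_mismatch_variants(adapter: str, alphabet: str = "ACGT"):
    """List every sequence that differs from `adapter` at exactly one position."""
    return sorted(
        adapter[:i] + base + adapter[i + 1:]
        for i in range(len(adapter))
        for base in alphabet
        if base != adapter[i]
    )


if __name__ == "__main__":
    print(one_mismatch_variants("TAG"))
    print(trim_left_adapter("AAGCCGT", "IIIIIII", "TAG", max_mismatches=1))
```

For the adapter TAG, one_mismatch_variants yields nine sequences (three alternative bases at each of three positions), so with one mismatch allowed a read is trimmed if it starts with TAG or with any of those nine variants.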
If the summary option is selected, fastQ_brew returns a results table to STDOUT with summary statistics of the FASTQ data file before and after filtering. The summary table reports the maximum, minimum, and average GC% values across all reads; maximum, minimum, and average read lengths; maximum, minimum, and average Phred scores; and maximum, minimum, and average error probabilities. The Phred score (denoted as Q) represents the probability of an error for each base and is logarithmically related to the base-calling error probability, P, such that:
$$Q = - 10\log_{10}P$$
or
$$P = 10^{\frac{ - Q}{10}}$$
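The relationship between Q and P can be checked numerically with the short Python sketch below, which also decodes a FASTQ quality string and applies an average-Phred threshold of the kind used by -qf. This is an illustration only, not fastQ_brew's Perl implementation: the function names are hypothetical, the ASCII offsets (33 for sanger, 64 for illumina) are an assumption about how the two -lib encodings are interpreted, and the average is assumed to be the arithmetic mean of the per-base Q scores.

```python
import math

# Assumed ASCII offsets: Sanger (Phred+33) and older Illumina 1.3-1.7 (Phred+64).
OFFSETS = {"sanger": 33, "illumina": 64}


def phred_to_error(q: float) -> float:
    """P = 10^(-Q/10): error probability for a given Phred score."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Q = -10 * log10(P): Phred score for a given error probability."""
    return -10 * math.log10(p)


def decode_quality(qual: str, lib: str = "sanger") -> list:
    """Convert a FASTQ quality string into a list of integer Phred scores."""
    offset = OFFSETS[lib]
    return [ord(ch) - offset for ch in qual]


def mean_phred(qual: str, lib: str = "sanger") -> float:
    """Arithmetic mean of the per-base Phred scores (assumed averaging scheme)."""
    scores = decode_quality(qual, lib)
    return sum(scores) / len(scores)


def passes_quality_filter(qual: str, threshold: float, lib: str = "sanger") -> bool:
    """Keep a read only if its mean Phred score is at or above the threshold."""
    return mean_phred(qual, lib) >= threshold


if __name__ == "__main__":
    print(phred_to_error(20))                     # about 0.01, i.e. 1 error in 100
    print(decode_quality("I5", "sanger"))         # Phred scores 40 and 20
    print(passes_quality_filter("I5", 20))        # mean is 30, so the read is kept
```

Under these assumptions, a threshold of 20 corresponds to an average per-base error probability of roughly 1 in 100, and a threshold of 30 to roughly 1 in 1000.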
In the case of arguments 15–17 above (-fasta, -rev_comp, and -rna), a new file is generated for each option, whereas all other user-supplied arguments are chained together to return a single filtered file.