Optimus Primer: A PCR enrichment primer design program for next-generation sequencing of human exonic regions
© Lettre et al; licensee BioMed Central Ltd. 2010
Received: 25 March 2010
Accepted: 7 July 2010
Published: 7 July 2010
Polymerase chain reaction (PCR) remains a simple, flexible, and inexpensive method for enriching genomic regions of interest for next-generation sequencing. A major challenge in using PCR for this purpose is generating a very large number of functional PCR primers that will yield usable amplicons. For instance, an exon-only re-sequencing project targeting 100 genes, each with 10 exons, requires 1,000 primer pairs. In practice the situation is often more complex, because each gene may have several isoforms and large exons must be divided to maintain the desired amplicon size. Given only a list of gene names, our program Optimus Primer (OP) automatically takes all these variables into account and generates primers without requiring genome coordinates. More importantly, unlike other primer design programs, OP runs Primer3 iteratively: through a single interface, the user can specify up to four sets of design parameters of differing stringency, and regions for which design fails under one set are passed on to the next. This increases the probability that a functional PCR primer pair will be designed for all regions of interest in a single pass of the pipeline.
To demonstrate the effectiveness of the program, we designed PCR primers against 77 genes located in loci associated with ulcerative colitis as part of a candidate gene re-sequencing experiment. We achieved an experimental success rate of 93% (472 of 508 amplicons spanning the exonic regions of the 77 genes). Moreover, by automatically passing amplicons that failed primer design through three additional iterations of design parameters, we obtained an additional 170 successful primer pairs, or 34% more, in a single pass of OP than by conventional methods.
With only a gene list and PCR parameters, a user can produce hundreds of PCR primer designs for regions of interest with a high probability of success in a very short amount of time. Optimus Primer is an essential tool for researchers who want to pursue PCR-based enrichment strategies for next-generation re-sequencing applications. The program can be accessed at http://op.pgx.ca.
The development of next-generation sequencing (NGS) technologies has dramatically increased the size and scale of sequencing experiments. It is now possible to produce several gigabases of DNA sequence in a short period of time. To date, the cost of whole human genome sequencing remains prohibitive. Focusing NGS experiments on specific genomic regions is a cost-effective alternative to whole genome sequencing, but it requires enrichment of the targeted regions before library construction. Several hybridization-based methods, each with its own strengths and weaknesses, have been developed. While these DNA enrichment methods continue to be developed and improved, a simple and inexpensive alternative is PCR. PCR is a robust, well-understood, very accessible and flexible strategy for DNA enrichment. It also allows very specific amplification of targeted regions without the high background found in hybridization-based methods. Hundreds or even thousands of PCR amplicons that span selected genomic regions of interest can be pooled together and used as input material for the sequencing reaction. PCR has traditionally been used for classic Sanger chemistry-based sequencing of a few genes. However, the throughput of next-generation DNA re-sequencing is such that new tools are needed to facilitate the use of PCR as an enrichment strategy for these new sequencing methods.
Enrichment of exonic regions is of particular interest because the functional consequences of variation within these regions can be more easily inferred than those of variation in non-coding DNA. Surveying genetic variation by NGS across all exons of candidate genes, such as those identified in genome-wide association studies (GWAS), may contribute to the identification of the causal genes and variants, and therefore of the underlying biology of the disease.
Example PCR Primer Pairs Designed After Submitting TCF7L2
Optimus Primer (OP) is a web-based automated pipeline that requires the user to submit only a list of genes, or a list of regions of interest, and primer design parameters. The pipeline consists of four steps. First, all exons of all known isoforms of each submitted gene are identified using the RefSeq database; this step is skipped if regions of interest are submitted. A list of all unique exons for each gene is then generated, with exons or regions that lie close to each other (for example, < 25 base pairs apart) merged into a single element of the list. Second, the pipeline extracts the corresponding genomic sequences from the current build of the human genome (currently hg18/NCBI36), plus flanking sequence of a user-defined length to facilitate the design of the PCR primers. OP prioritizes placing the primers in these flanking regions to ensure complete coverage of the exonic regions. The user has the option of including or excluding sequence that has been masked with RepeatMasker. Additionally, polymorphisms from the current build of dbSNP (currently build 130) can be masked so that primers are not designed over locations with underlying SNPs. Third, Primer3, which has been integrated into the pipeline, designs PCR primers using the user-defined parameters. Exons or regions that are larger than the specified amplicon size are automatically split into smaller amplicons with a minimum 25 bp overlap, to ensure that every base can be amplified and sequenced. Exons for which no PCR primer design is possible using the initial parameters are passed on to a second iteration of Primer3 with modified design criteria defined by the user.
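The merging and splitting of target regions described above can be illustrated with a short Python sketch. This is not OP's source code, which is not reproduced in this article: the function names, the 0-based half-open coordinate convention and the 400 bp window size in the usage example are assumptions made for illustration. Only the 25 bp merge distance and the 25 bp minimum overlap come from the text. The sketch also tiles the target region itself and ignores the extra flanking sequence needed for primer placement.

```python
from typing import List, Tuple

# (start, end) in 0-based, half-open coordinates (an assumed convention).
Interval = Tuple[int, int]


def merge_close_intervals(intervals: List[Interval], max_gap: int = 25) -> List[Interval]:
    """Merge intervals that overlap or lie fewer than max_gap bases apart."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] < max_gap:
            # Close enough to the previous element: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def tile_region(start: int, end: int, max_len: int, overlap: int = 25) -> List[Interval]:
    """Split [start, end) into windows of at most max_len bases.

    Consecutive windows share `overlap` bases so that no base falls
    between two amplicons.
    """
    if max_len <= overlap:
        raise ValueError("max_len must be larger than overlap")
    tiles: List[Interval] = []
    pos = start
    while True:
        tile_end = min(pos + max_len, end)
        tiles.append((pos, tile_end))
        if tile_end == end:
            break
        # Step back by the overlap before starting the next window.
        pos = tile_end - overlap
    return tiles


if __name__ == "__main__":
    # Hypothetical exon coordinates pooled from several isoforms of one gene.
    exons = [(100, 200), (210, 300), (1000, 1900), (150, 220)]
    for region_start, region_end in merge_close_intervals(exons):
        print((region_start, region_end), tile_region(region_start, region_end, max_len=400))
```

In this toy input the first three exons collapse into a single region because they overlap or sit fewer than 25 bp apart, while the 900 bp exon is divided into three overlapping windows.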
Currently, the pipeline allows the user to define up to four iterations of Primer3 design criteria in a single pass, with up to 5 primer pairs for each amplicon, in an attempt to design PCR primers for all amplicons. The final step of the pipeline is to run all designed PCR primers through the UCSC Genome Browser in-silico PCR (isPCR) utility to validate the selected primer pairs. The isPCR utility checks that each primer pair is unique in the human genome, that the two primers lie on opposite strands and are the correct distance apart, and generates a report of the theoretical amplicons produced by the primer pair. OP then uses these data to generate a report of all primer designs, together with the percent coverage of each exon or region of each gene by isPCR-validated primer pairs. Primers designed with OP can then be used to amplify genes of interest as the enrichment step prior to library construction for NGS experiments. In particular, because PCR is flexible and easy to implement, OP is well suited to targeting genes that are difficult to enrich using solid- or liquid-based capture reagents, as well as genes that are very polymorphic. Additionally, for genes whose annotation changes from one build of the human genome to the next, PCR primers can be easily adapted, whereas probe-based capture reagents need to be re-synthesized.
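The iterative design strategy, in which only the regions that fail under one parameter set are retried under the next, can be sketched as follows. This is a minimal illustration under stated assumptions rather than OP's implementation. The `designer` callable stands in for a call to Primer3, and `toy_designer`, its parameter names (`primer_size`, `min_gc`, `max_gc`) and all numeric values are hypothetical and chosen only so that the example runs without external software.

```python
from typing import Callable, Dict, List, Optional, Tuple

Params = Dict[str, float]
PrimerPair = Tuple[str, str]
# A designer returns a primer pair, or None when no design is possible.
Designer = Callable[[str, Params], Optional[PrimerPair]]


def design_with_fallback(
    targets: Dict[str, str], passes: List[Params], designer: Designer
) -> Tuple[Dict[str, Tuple[int, PrimerPair]], List[str]]:
    """Try each parameter set in order; only failed targets move to the next pass."""
    results: Dict[str, Tuple[int, PrimerPair]] = {}
    pending = dict(targets)
    for pass_number, params in enumerate(passes, start=1):
        still_failing: Dict[str, str] = {}
        for name, sequence in pending.items():
            pair = designer(sequence, params)
            if pair is None:
                still_failing[name] = sequence
            else:
                results[name] = (pass_number, pair)
        pending = still_failing
        if not pending:
            break
    # Targets left in `pending` failed under every parameter set.
    return results, sorted(pending)


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def gc_fraction(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)


def toy_designer(sequence: str, params: Params) -> Optional[PrimerPair]:
    """Hypothetical stand-in for Primer3: take primers from both ends of the
    sequence and accept them only if their GC fraction is within bounds."""
    size = int(params["primer_size"])
    if len(sequence) < 2 * size:
        return None
    forward = sequence[:size]
    reverse = reverse_complement(sequence[-size:])
    for primer in (forward, reverse):
        if not params["min_gc"] <= gc_fraction(primer) <= params["max_gc"]:
            return None
    return forward, reverse


if __name__ == "__main__":
    # Hypothetical parameter sets, from most to least stringent.
    passes = [
        {"primer_size": 10, "min_gc": 0.40, "max_gc": 0.60},
        {"primer_size": 10, "min_gc": 0.20, "max_gc": 0.80},
        {"primer_size": 10, "min_gc": 0.05, "max_gc": 0.95},
    ]
    targets = {
        "easy": "ACGTACGTAC" + "T" * 20 + "GTACGTACGT",
        "mid": "ATGCATATGA" + "T" * 20 + "TCATATGCAT",
        "at_rich": "ATATATATAC" + "T" * 20 + "GTATATATAT",
        "short": "ACGT",
    }
    designed, unresolved = design_with_fallback(targets, passes, toy_designer)
    for name, (pass_number, pair) in designed.items():
        print(name, "designed in pass", pass_number, pair)
    print("no design possible:", unresolved)
```

Recording the pass in which each target succeeded keeps track of which primers were designed under relaxed criteria, which may be useful when deciding which amplicons to check first at the bench.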
Primer Design Parameters Used in the Four Passes of Primer3. Columns: optimum size (bp), minimum size (bp), maximum size (bp), minimum Tm (°C), maximum Tm (°C), product size range (bp).
PCR is currently the cheapest, simplest, most flexible approach for sample enrichment prior to NGS experiments. It also has some distinctive advantages over the less specific enrichment methodologies currently used for targeted next-generation re-sequencing. To capitalize on PCR-based methodologies, hundreds if not thousands of PCR primers need to be designed. To address this gap in bioinformatic tools, we have developed Optimus Primer (OP), a web-based automated PCR design pipeline that facilitates the simultaneous design of PCR primers for the enrichment of exonic regions in multiple genes. The tool is useful not only for the enrichment of exonic regions for NGS experiments but also, more generally, for other experiments that require the rapid design of PCR primers for multiple regions of interest, such as genotyping, Sanger sequencing and real-time PCR. With only a gene list and PCR parameters, a user can design hundreds of PCR primers in a very short timeframe.
We would like to acknowledge Christopher Beck, Tibor van Rooij and Sharon Marsh for their input and ideas.
This work was supported by La Fondation de l'Institut de Cardiologie de Montréal (GL) and Genome Canada and Genome Quebec (M.S.P), National Institute of Allergy and Infectious Diseases AI065687; AI067152 (JDR), National Institute of Diabetes and Digestive and Kidney Diseases DK064869; DK062432 (JDR) and the Crohn's and Colitis Foundation of America SRA512 (JDR).
- Ansorge WJ: Next-generation DNA sequencing techniques. N Biotechnol. 2009, 25: 195-203. 10.1016/j.nbt.2008.12.009.
- Summerer D: Enabling technologies of genomic-scale sequence enrichment for targeted high-throughput sequencing. Genomics. 2009.
- Tsai MF, Lin YJ, Cheng YC, Lee KH, Huang CC, Chen YT, Yao A: PrimerZ: streamlined primer design for promoters, exons and human SNPs. Nucleic Acids Res. 2007, 35: W63-65. 10.1093/nar/gkm383.
- ExonPrimer. [http://ihg2.helmholtz-muenchen.de/ihg/ExonPrimer.html]
- EasyExonPrimer. [http://18.104.22.168/~primer/EasyExonPrimer.html]
- Pruitt KD, Tatusova T, Maglott DR: NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2007, 35: D61-65. 10.1093/nar/gkl842.
- RepeatMasker Open-3.0. [http://www.repeatmasker.org]
- Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K: dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001, 29: 308-311. 10.1093/nar/29.1.308.
- Rozen S, Skaletsky H: Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol. 2000, 132: 365-386.
- Hinrichs AS, Karolchik D, Baertsch R, Barber GP, Bejerano G, Clawson H, Diekhans M, Furey TS, Harte RA, Hsu F: The UCSC Genome Browser Database: update 2006. Nucleic Acids Res. 2006, 34: D590-598. 10.1093/nar/gkj144.