DNA microarrays remain a popular technology for measuring gene expression and other global properties of the genome, with over 2200 experiments representing tens of thousands of samples published in ArrayExpress [1, 2] so far in 2012. Even as next-generation sequencing has begun to supersede microarrays for such measurements, many researchers still rely on them for a variety of applications. For instance, in a recent, highly cited sequencing study of plague (Yersinia pestis), Bos et al. used microarrays as a capture technology to concentrate samples for sequencing. Stransky et al. recently used microarrays to screen head and neck tumor samples prior to sequencing. These are just two of many examples. In this paper, we report on the potential for stable duplex formation between partially complementary oligonucleotide probes and unintended DNA targets. Such duplexes have significant implications for the capture of non-target material, whether in a wholly microarray-based experiment or in a sample concentration protocol.
The conventional wisdom surrounding the design of oligonucleotide microarrays, specifically those that rely on 50–60mer oligonucleotides for detection, was established in the early 2000s. Cross-hybridization is defined as a specific side reaction between a probe and an unintended target that forms a stable duplex. Microarray design pipelines generally attempt to avoid it either by screening for defined levels of sequence complementarity or by applying a thermodynamic cutoff; in the latter case, sequence complementarity is often still used as a pre-screen.
A common criterion for microarray design, used in many oligonucleotide design software pipelines either as a pre-screen or as the sole predictor of potential cross-hybridization, is based largely on an early paper from Kane et al. and is generally referred to as Kane’s first criterion. The criterion states that eliminating stretches of apparent complementarity longer than 15 nucleotides between a probe and unintended targets will eliminate cross-hybridization. This is very convenient for microarray designers, because it justifies the use of fast suffix-tree based methods for sequence screening with a word size that automatically excludes most entirely random short alignments. While shorter complementary stretches can be identified using Smith-Waterman alignment or other approaches [7, 8], it may be impossible to eliminate shorter cross-hybridizing stretches for every gene in expression experiments, due to the relatively limited sequence space explored by mRNAs and noncoding RNAs.
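As an illustration of what a stretch-based screen checks, the following minimal sketch computes the longest run of consecutive Watson-Crick complementary bases between a probe and a candidate target, and flags the pair if that run exceeds 15 nt. It is not the suffix-tree method used by design pipelines; it is a simple quadratic-time longest-common-substring scan against the reverse complement of the target, and the function names and the `max_stretch` parameter are hypothetical, chosen here for illustration only.

```python
# Illustrative sketch only: a brute-force stand-in for the fast
# suffix-tree screens mentioned in the text. All names are hypothetical.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_complementary_stretch(probe: str, target: str) -> int:
    """Length of the longest contiguous run in `probe` that is perfectly
    complementary (antiparallel) to some contiguous run in `target`.

    Equivalent to the longest common substring of `probe` and the
    reverse complement of `target`, found by dynamic programming.
    """
    probe = probe.upper()
    rc = reverse_complement(target)
    best = 0
    prev = [0] * (len(rc) + 1)
    for i in range(1, len(probe) + 1):
        curr = [0] * (len(rc) + 1)
        for j in range(1, len(rc) + 1):
            if probe[i - 1] == rc[j - 1]:
                # Extend the run that ended at the previous base pair.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def passes_stretch_screen(probe: str, target: str, max_stretch: int = 15) -> bool:
    """True if no complementary stretch longer than `max_stretch` exists."""
    return longest_complementary_stretch(probe, target) <= max_stretch
```

Lowering `max_stretch` (for example to 11) would tighten the screen toward the shorter stretches discussed in this paper, at the cost of rejecting many more candidate probes.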
Kane’s criteria have always been somewhat problematic because, in the original experiment, the effects of complementarity may have been confounded with unimolecular structure formation in the reagents. Later studies that attempted to validate the Kane criteria did not challenge the established lower bound of complementarity using constructed complementary stretches shorter than 15 nt. Chou et al. investigated the behavior of competing closest thermodynamic near neighbors, but their focus was competition between the intended target and the thermodynamically nearest neighbor, rather than the general hybridization potential of partially matched duplexes. The results presented here suggest that Kane’s first criterion may be insufficiently conservative to eliminate significant specific cross-hybridization in surface hybridization experiments.
To explore the hybridization potential of suboptimal duplexes, we first performed computational modeling of duplex formation in partially complementary oligonucleotide pairs, using the DNA Software hybridization modeling package based on the work of SantaLucia et al. [11, 12]. The interactions of 50mer oligonucleotides containing complementary stretches of 6 nt to 25 nt at different positions in the sequence were modeled, alone and in the presence of a perfect match competitor at different relative concentrations. The predicted behavior of these oligonucleotide pairs indicated that we could expect significant signal from specific, partial cross-hybridization due to complementary stretches forming as few as twelve consecutive base pairs, and detectable hybridization even from a complementary 9mer in an otherwise anticomplementary probe-target pair.
We then selected a typical perfect match oligonucleotide duplex pair, with GC content and Tm that are average for a probe set designed for the E. coli genome. We created permutations of the sequence of one of the perfect match partners, leaving a continuous complementary stretch of varying length either in the center of the molecule or positioned near one end. We synthesized the perfect match partners and a selection of the permuted, partially matched sequences to observe their hybridization behavior in solution and on the array.
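The construction described above can be sketched as follows. This is a hedged illustration, not the procedure actually used to generate the synthesized sequences: it assumes the permutation is a random shuffle of all bases outside a preserved window, and the function name, the `seed` parameter, and the use of Python's `random` module are assumptions made here. A random shuffle can by chance leave matching bases adjacent to the window, or create new complementary runs elsewhere, so any sequence produced this way would still need to be checked for its actual longest complementary stretch.

```python
import random


def permute_outside_window(seq: str, start: int, length: int, seed: int = 0) -> str:
    """Shuffle every base of `seq` except the window seq[start:start+length].

    The preserved window stays at its original position, so the result
    keeps a contiguous stretch that still matches the original partner,
    while overall length and base composition are unchanged.
    Hypothetical helper for illustration only.
    """
    if start < 0 or length < 0 or start + length > len(seq):
        raise ValueError("window must lie within the sequence")

    rng = random.Random(seed)  # seeded for reproducibility
    core = seq[start:start + length]

    # Pool the bases from both flanks and shuffle them together.
    flank = list(seq[:start] + seq[start + length:])
    rng.shuffle(flank)

    # Re-split the shuffled pool so the window returns to the same offset.
    left = "".join(flank[:start])
    right = "".join(flank[start:])
    return left + core + right
```

For a 50mer, calling `permute_outside_window(seq, 19, 12)` keeps a central 12 nt stretch, while `permute_outside_window(seq, 2, 12)` places the preserved stretch near one end.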
Here we report the results of hybridizing those permuted oligos to their binding partner, both in solution and at the microarray surface. The analysis of this permuted oligo pair confirms that a complementary stretch as short as 12 bp may produce significant signal from an unintended binding partner, especially in the absence of the intended target. This illusion of specific capture can give rise to incorrect interpretations of expression data. It is also a potential problem in sample concentration applications, where a transcript with relatively little complementarity may be captured and interpreted as if it were part of the intended target.