- Technical Note
- Open Access
Thermodynamically optimal whole-genome tiling microarray design and validation
BMC Research Notes volume 9, Article number: 305 (2016)
Microarray is an efficient apparatus to interrogate the whole transcriptome of species. Microarray can be designed according to annotated gene sets, but the resulted microarrays cannot be used to identify novel transcripts and this design method is not applicable to unannotated species. Alternatively, a whole-genome tiling microarray can be designed using only genomic sequences without gene annotations, and it can be used to detect novel RNA transcripts as well as known genes. The difficulty with tiling microarray design lies in the tradeoff between probe-specificity and coverage of the genome. Sequence comparison methods based on BLAST or similar software are commonly employed in microarray design, but they cannot precisely determine the subtle thermodynamic competition between probe targets and partially matched probe nontargets during hybridizations.
Using the whole-genome thermodynamic analysis software PICKY to design tiling microarrays, we can achieve maximum whole-genome coverage allowable under the thermodynamic constraints of each target genome. The resulted tiling microarrays are thermodynamically optimal in the sense that all selected probes share the same melting temperature separation range between their targets and closest nontargets, and no additional probes can be added without violating the specificity of the microarray to the target genome.
This new design method was used to create two whole-genome tiling microarrays for Escherichia coli MG1655 and Agrobacterium tumefaciens C58 and the experiment results validated the design.
Different types of microarray exist, and they all have DNA probes on the microarray surface to hybridize, or capture, targeted sequences in the samples that are poured over them. Microarrays differ on how their probes are designed and what are their intended targets. The most common microarrays are designed to detect gene expressions; their probes are designed according to annotated gene sets and are used to detect individual gene expressions. Although gene expression microarrays have been in use for more than a decade and have produced a large volume of biological data, they are gradually being replaced by next-generation sequencing (NGS) techniques because NGS techniques can detect novel RNA transcripts and provide a better dynamic range of measured gene expression values .
Nevertheless, microarrays are still being used in some other applications. For example, sequence fragment capturing microarrays work by grabbing specific genome fragments or RNA transcripts of interest to researchers, hence enriching the targeted samples; the captured fragments can then be sequenced and analyzed using NGS techniques [2, 3]. In this work, we focus on another application where microarrays are still viable—the whole-genome tiling microarrays. A tiling microarray is designed against a genome, not a gene set, and can be used to detect all transcription activities from either annotated genes or novel transcripts; the latter may include short regulatory RNAs that are the interest of recent studies [4–6].
We have developed a new tiling microarray design method based on the whole-genome thermodynamic analysis software Picky that was previously developed to design traditional microarrays [7–9]. We then used our new method to design two whole-genome tiling microarrays for Escherichia coli (E. coli) MG1655 and Agrobacterium tumefaciens (A. tumefaciens) C58. Picky can analyze a whole genome to identify thermodynamically unique probes. The melting temperatures of each probe candidate with its intended target and with its closest nontargets anywhere in the genome are calculated by Picky according to the thermodynamic nearest-neighbor models of prefect matches , mismatches [11–14], bulges , and dangling-ends . The equations used by Picky are deterministic according to thermodynamic principles, and Picky exhaustively applies these equations to all potential probe-target as well as probe-nontarget hybridizations. Picky design qualities have been quantitatively validated .
The main advantage of Picky over sequence-level comparison software such as BLAST  is its precision. Although BLAST is often used to estimate probe specificity by calculating its identity and match length to nontargets [19, 20], these estimates are less precise than thermodynamics. For example, we screened a previous tiling microarray probe set containing 409,807 probes for a bacterial species using Picky , and found 21,773 (5.3 %) of its probes have the potential to hybridize to nontargets (output W83.picky in Additional file 1). In Table 1, sample data from this probe set shows that probes of the same length (50 bp), the same identity to nontarget [24 bp (base-pair); <50 %] and the same match stretch to nontarget (14 bp; <30 %) can still have a wide estimated melting temperatures range from 28 to 68 °C. Furthermore, the lowest probe-to-target melting temperature at 54.33 °C is much lower than the highest probe-to-nontarget melting temperature at 68.56 °C. Essentially, it is impossible to set a single microarray hybridization temperature that allows all probes to function effectively. This probe set was designed using a sophisticated pipeline involving BLAST to screen for cross-hybridizations and was considered optimal by that standard . Nevertheless, thermodynamic equations are inherently nonlinear, thus a wide range of melting temperatures can still be calculated from the same sequence-level identities.
Our goal in this work is to adapt Picky, which was originally developed to take annotated gene sets as input and design traditional microarrays, for the design of tiling microarrays. We hope to achieve maximum probe coverage of the genome while maintaining the same thermodynamic specificity of Picky designed probes. After the tiling microarrays for E. coli MG1655 and A. tumefaciens C58 were designed using our new method and manufactured, the two bacteria were grown under 10 different treatment conditions to trigger gene expression changes. Subsequently, samples extracted from them were applied to the two tiling microarrays to validate their design quality and also to uncover novel transcripts.
Sample procurement and genome confirmation
The E. coli MG1655 strain was obtained from CGSC E. coli genetics resources at Yale University (CGSC #6300) . The A. tumefaciens C58 strain was obtained from Dr. Kan Wang’s lab at Iowa State University . Bacteria were recovered from the delivery medium and grown under standard conditions (37 °C in Luria–Bertani medium for MG1655 and 28 °C in YEP medium for C58). The QIAGEN DNeasy blood & Tissue kit (#69504) was used to extract total DNA from both bacteria. The Qubit 2.0 Flurometer was used to precisely quantify DNA concentration in the samples and the Experion DNA 12 K Analysis Kit was used to check the DNA quality. The total DNA was eluted in 100 uL buffer and 50 uL of that was sent for sequencing confirmation.
The genomes of the two bacteria MG1655 and C58 were resequenced using the Illumina HiSeq 2000 instrument and de novo assembled using the Velvet software . Minimus2, which is part of the AMOS software package, was used to merge Velvet contigs to form longer scaffolds . BLAT was then used to align merged contigs to the reference genomes . The alignment is important to correctly orient some contigs, find repeated contigs and fill in the gaps among aligned contigs. The reference genomes were used to guide the assembly of the contigs, but not the individual reads. The AT plasmid of C58 was not successfully assembled due to lack of matched contigs, thus the reference sequence was used in subsequent design.
We have found hundreds of single nucleotide polymorphisms between the assembled genomes and the reference genomes, which support our initial concern that the bacteria we obtained might not match the reference genome sequences exactly. These polymorphisms, which are summarized in Table 2, might cause slightly less precise tiling microarray design if left unidentified. The resequencing confirmation step is entirely optional but it helps improve the tiling microarray design quality. The resequencing data can be obtained from NCBI short read archive database with accession numbers SRX806374 and SRX806654 and the assembled new genomic sequences are provided in Additional file 2.
Tiling microarray design
Based on the assembled genome sequences, we designed the two whole-genome tiling microarrays for E. coli MG1655 and A. tumefaciens C58 using PICKY [9, 28]. The design process is summarized as follows. The genome sequences were broken up into 100 bp fragments without overlaps—these were treated as gene targets for probe design to ensure even distribution of the tiling probes. Separately, 50 bp fragments centered on the boundaries between the target fragments (25 bp on either side of a boundary) were extracted and treated as unintended targets for probe design (i.e., fragments to avoid) to ensure that tiling microarray probes will not inadvertently target the boundaries between fragments. PICKY was run using both the targets and unintended fragments as input. The benefit of this approach is that we can take full advantage of the probe specificity calculation offered by PICKY while making it design tiling microarrays with evenly distributed probes. The following parameters specific to tiling microarray design were given to PICKY: maximum match length 18, minimum match length 8, minimum sequence similarity 66 %, and minimum melting temperature difference 5 °C. Any probe candidate with the maximum match to any off-targets are automatically ruled out for further consideration. Probe candidates with the minimum match to any off-targets are thermodynamically screened by extending around the matched region up to the minimum sequence similarity level to estimate its melting temperature with potential cross-hybridization off-targets; probe candidates with less than the minimum melting temperature difference between its target and its closest off-target will not be selected. All other PICKY parameters were taken at their default values, including the screening of both strands of each input sequence to ensure probe specificity in either direction. The minimum and maximum match length parameters and the minimum sequence similarity parameter ensure that a wide range of nontarget matches will be screened thermodynamically by PICKY. The minimum melting temperature difference ensures that only probes unique to the target fragments will be selected.
After running PICKY the first time, it turned out that some target fragments did not have matching probes under the stringent design parameters. To increase the number of useful probes, we ran PICKY again with the following new input. The target fragments were separated into two different sets: one containing fragments without probes as a new target set, and the other containing fragments that had probes designed for them during the first PICKY run. The second set was combined with a modified boundary fragment set to form the new unintended fragment set. The modified boundary fragments were shortened to 40 bp centered on the boundaries between the target fragments. The second PICKY run used the same parameters as in the first run, but it produced additional probes because the shortened boundary fragments allow more borderline probes to be selected.
Microarray probes obtained from both PICKY runs were merged to obtain the final design output. When designing tiling microarray for the C58 bacterium, the pTi plasmid of C58 was also added to the design data set to increase the versatility of the C58 tiling microarray. We have chosen the NimbleGen Custom Microarray Service to manufacture the tiling microarrays. The NimbleGen microarray platform has a synthesis cycle limitation of 148 on custom designed microarray probes . Therefore, 19 MG1655 and 20 C58 probes were removed because they exceeded the limit.
Probes for the exogenous gene hygromycin to E. coli and A. tumefaciens were added: 1000 hygromycin probes were added to the C58 probe set and 2000 hygromycin probes were added to the MG1655 probe set. These probes can be used as quality controls if the hygromycin gene is added to each bacteria sample during the microarray hybridization protocol to help detect any technical bias. The manufacturer also added other control probes to the final probe set for proprietary quality control and microarray image alignment. All probes are synthesized in situ on the NimbleGen microarray surface using the 4 × 72 K microarray layout, meaning that there are four independent microarrays per each NimbleGen glass chip and each microarray contains up to 72 K probes. Final results of the microarray design are summarized in Table 3. Complete microarray design information and experiment data were deposited into the NCBI gene expression omnibus (GEO) database with the Series Access Number GSE61738, which is a super series combining both microarray series for MG1655 and C58.
Although NimbleGen has exited the custom microarray manufacturing business, the tiling microarrays can still be manufactured by other manufacturers given the original microarray design information. Naturally, some array-specific protocols such as labeling and image quantification may need to be modified accordingly if different microarray platforms are used.
Results and discussion
E. coli MG1655 and A. tumefaciens C58 cells were grown under 10 different treatments listed in Table 4 (1 standard and 9 stressed conditions). The significantly varied growth conditions help induce large-scale gene expression changes that ideally should cover most of the transcriptome landscape of the two species. Cells were harvested after treatment at the harvest point given in Table 4.
To control noise and bias, samples were pooled, randomized, blocked and replicated. We treated each microarray as a statistical ‘block’ and randomly placed samples onto microarrays to balance variances from batch processes and positional effects . Two biological replicates were produced for each bacterium under each treatment condition. The biological replicates were prepared by pooling samples according Fig. 1; each biological replicate eventually was made from 4 different cell cultures grown at 2 different days .
Two biological replicates were performed for each species under each treatment condition. The Qiagen RNeasy Mini Kit (#74104) was used to purify total RNA after on-column DNase digestion to remove DNA contaminations (#79254). The total RNA were revere-transcribed to cDNA using Life Technologies Superscript double stranded cDNA synthesis kit (#11917-020) with a random primer set (#48190-011). Residue RNA were then removed using RNase H (NEB #M0297). The cDNA samples were labeled using the NimbleGen One-Color Labeling Kit (#06370411001) and quantified using a Nanodrop ND-1000 spectrophotometer.
Microarray hybridizations were carried out on a NimblemGen Hybridization Workstation 4 (#05223652001) after dissolving labeled cDNA samples in the Hybridization Kit (#05583683001) with appropriate Sample Tracking Controls added (#05223512001). After the manufacturer recommended overnight hybridization (about 16 h), microarrays were washed with the NimbleGen Wash Buffer Kit (#05584507001) and scanned using a GenePix 4100A Microarray Scanner at the maximum resolution of 5 µm for the Cy3 channel.
Scanned microarray images, which contained 4 microarrays on each chip, were processed using the NimbleGen DEVA 1.2.1 software . The DEVA software aligned and anchored the microarray images using special alignment probes on the microarray surface and then split the images into 4 subarrays for the 4 × 72 K NimbleGen layout before quantifying them into individual probe values. Although manual alignments can be performed, we found it unnecessary for all the microarray images processed. The DEVA software also provided automatic RMA normalizations (robust multi-array analysis) across each set of microarray data for MG1655 and C58 to reduce outliers and make data comparisons more meaningful. Two microarray samples were removed from the final data sets due to large bubbles on the microarray surface. In all, 38 microarray samples (18 for MG1655 and 20 for C58) were used for the following validation analyses. The data can be obtained from NCBI GEO database using the Super Series Access Number GSE61738.
Probe level consistency validations
For each strain and treatment condition that has two successful biological replicates, we calculated Spearman’s correlation and concordance correlation coefficient (CCC) between them to validate the consistency of the microarrays under biological replicates. Spearman’s correlation was used to measure the reproducibility of two replicates . Concordance correlation coefficient (CCC) provided a better indicator of the accuracy and precision of agreement between two biological replicates . Biological replicates are more variable than technical replicates because the two biological samples were independently grown, harvested and subjected to microarray protocols, thus correlation between biological replicates can be as low as 30 % .
Scatterplots for each pair of replicates with regression line and correlation coefficients are shown in Fig. 2 for E. coli MG1655. Both statistical tests were performed with 95 % confidential level. The oxidative condition shows the highest correlation values. The cold shock condition has the lowest correlation values, but its Spearman’s correlation value 0.7847 and CCC value 0.7676 still indicate high correlations between the replicates. Therefore, we conclude that the biological replicates of E. coli MG1655 under all treatment conditions are consistent enough to indicate that the MG1655 tiling microarray is reliable and reproducible at the individual probe level.
Scatterplots for each pair of biological replicates with regression line and correlation coefficients for A. tumefaciens C58 are similarly shown in Fig. 3. They show that the two biological replicates under each treatment condition are consistent with each other with a high correlation value at the 95 % confident level. The low pH and heat shock condition produced the highest correlation value of 0.9398. The cold shock and oxidative condition produced a lower correlation value of 0.7393, which is still high enough to conclude the replicates are highly related. Therefore, we also conclude that the C58 tiling microarray is also reliable at the probe level.
Gene level consistency validations
Significant analysis of microarray (SAM) [36, 37] and one-way ANOVA  were conducted to detect differentially expressed genes in E. coli MG1655 and A. tumefaciens C58 under the 10 treatment conditions. As stated earlier, most of the conditions are stress conditions that can promote stress gene responses. The two statistical analyses were performed to validate that the tiling microarrays can detect biologically significant gene expression changes. SAM detects differentially expressed genes across all 10 conditions for each bacterium. One-way ANOVA tests were performed for a few sets of probes targeting some known stress related genes to confirm that their means differ across the 10 conditions for each bacterium, which also identifies the stress responses of cells.
SAM was performed with unpaired two class (control and treatment), delta value of 0.06 and fold change of 2. It found 34 differentially expressed genes in E. coli MG1655, including 22 known stress response genes such as dnaX , entF , groL , and copA [42, 43]. For A. tumefaciens C58, SAM was ran with unpaired two class (control and treatment), delta of 0.065 and fold change of 3 to limit the number of differentially expressed genes reported. It detected 46 differentially expressed genes, including 16 known stress genes such as livJ , dadA , and rpoH . The one-way ANOVA tests were performed with small subsets of known stress response genes at the 95 % confidential level. Stress response genes detected by SAM and/or confirmed by one-way ANOVA test are summarized in Table 5.
After ANOVA tests, multiple pairwise comparison tests (Tukey HSD  and Dunnett’s test ) were conducted at 95 % confident level as post hoc tests to find out which pairs of treatment conditions have distinctive stress gene expression differences. Differentially expressed stress genes tested and detected for each pair of conditions are listed in Table 6 for E. coli MG1655 and Table 7 for A. tumefaciens C58. All differentially expressed stress genes confirmed by ANOVA were also found by the post hoc tests except the ‘fepC’ gene in A. tumefaciens C58. The p value of fepC from ANOVA is 0.00207 but Tukey HSD or Dunnett’s test cannot identify it as differentially expressed in all pairs of conditions. It may be inferred that the means for this gene in all conditions are different from each other and there is no pair of conditions that is significantly different to allow detection by the post hoc tests.
It is worth noting that SAM generally detected more differentially expressed stress response genes than ANOVA can confirm. For example, the heat shock response gene hslO was detected by SAM even though ANVOA was not able to confirm that its means are significantly different among the treatment conditions for E. coli MG1655. More interestingly, many stressful conditions triggered uvrA gene expressions, which is the SOS response gene in E. coli MG1655. For A. tumefaciens C58, stress response genes are induced more by combined treatment conditions. For example, many stress response genes are differentially expressed not just by heat shock but by heat shock and low pH combined. We can conclude that the tiling microarrays detected sensible gene expression changes that conform to our expectation with regard to known cell stress response gene behaviors in the two tested bacteria.
Novel transcript discoveries
One of the stated benefits of a tiling microarray is that it can detect unexpected expressions as well as annotated gene expressions. Indeed, the two tiling microarrays for E. coli MG1655 and A. tumefaciens C58 detected significant numbers of RNA expressions from non-gene-coding regions on the two genomes. For example, our tiling microarray detected all 65 non-coding regulatory RNAs (ncRNAs) annotated in GenBank report U00096.3 for E. coli MG1655. Among the 65 ncRNAs, dicF is computationally predicted to target the hslV gene according to the bacterial small regulatory RNA database (BSRD) . Since our tiling microarray covers both the dicF ncRNA and its target gene, we can calculate the correlation value between them is −0.4819401, which agrees with the predictions. Because the whole genome is monitored, when there are new predictions of such regulatory activities, the correlation values can be extracted from our tiling microarray data without having to design new experiments to validate the predictions. We have seen many such evidences of ncRNA and target gene correlations, but it will require more analysis and maybe some independent validation experiments to report their biological functions. It suffices to say here that the two tiling microarrays do allow novel transcript discoveries as we have anticipated.
In this work we have described the design strategies and validation experiments of two whole-genome tiling microarrays for E. coli MG1655 and A. tumefaciens C58 bacteria. The tiling microarrays are thermodynamically optimal for the two genomes based on the rigorous calculations conducted by the PICKY software . This means that all probes selected have maximum specificity toward their target genome regions and no additional probes can be added to the microarrays without jeopardizing its specificity under the given design constraints. In average, there is a unique microarray probe every 100 bp along the genomes to uniquely detect any transcripts coming from that region. Therefore, transcripts longer than 100 bp are likely to be detected by at least one tiling microarray probe. The 100 bp selection window can be adjusted upward or downward depending on the microarray probe count and user preferences, but we do not expect the selected probes to increase significantly when this window is reduced because most thermodynamically optimal probes, if not all, should have been found by PICKY at the current 100 bp window size given the ~40 bp non-overlapping probe length.
The tiling microarray probes can detect transcripts expressed from both strands of the genomes because most of the common cDNA conversion and labeling protocols automatically produced double-stranded DNAs from original RNA transcripts. Given gene annotation information and bioinformatic techniques such as gene predictions, determining the actual strand of expression is not difficult for most genes . One can also use some other methods such as RT-PCR to confirm the expressing strand for a few difficult transcripts.
We believe tiling microarrays are useful for many gene expression studies, especially for novel and non-model species that have not been annotated yet. Actually, tiling microarrays can help identify novel gene expressions and facilitate the annotation of novel species. Microarrays tend to produce data much faster (in just 2 days), can tolerate a few mismatched bases due to polymorphisms or sequencing errors, tend not to be overwhelmed by excessive bacterial rRNAs as RNA-Seq does, and does not usually require sophisticated computing capacity to interpret the data. The software used in this study and the data produced by the experiments are freely available to other researchers who may wish to design their tiling microarrays.
- E. coli :
- A. tumefaciens :
gene expression omnibus
robust multi-array analysis
concordance correlation coefficient
significant analysis of microarray
bacterial small regulatory RNA database
Zhao S, Fung-Leung W-P, Bittner A, Ngo K, Liu X. Comparison of RNA-Seq and microarray in transcriptome profiling of activated T Cells. PLoS One. 2014;9:e78644.
Albert TJ, Molla MN, Muzny DM, Nazareth L, Wheeler D, Song X, Richmond TA, Middle CM, Rodesch MJ, Packard CJ, Weinstock GM, Gibbs RA. Direct selection of human genomic loci by microarray hybridization. Nat Methods. 2007;4:903–5.
NimbleGen. http://www.nimblegene.com. Accessed 13 Mar 2013.
Lee K, Huang X, Yang C, Lee D, Ho V, Nobuta K, Fan J-B, Wang K. A genome-wide survey of highly expressed non-coding RNAs and biological validation of selected candidates in Agrobacterium tumefaciens. PLoS One. 2013;8:e70720.
Huang H-Y, Chang H-Y, Chou C-H, Tseng C-P, Ho S-Y, Yang C-D, Ju Y-W, Huang H-D. sRNAMap: genomic maps for small non-coding RNAs, their regulators and their targets in microbial genomes. Nucleic Acids Res. 2009;37:D150–4 (Database issue).
Gottesman S. Micros for microbes: non-coding regulatory RNAs in bacteria. Trends Genet. 2005;21:399–404.
Seo YS, Sriariyanun M, Wang L, Pfeiff J, Phetsom J, Lin Y, Jung KH, Chou HH, Bogdanove A, Ronald P. A two-genome microarray for the rice pathogens Xanthomonas oryzae pv. oryzae and X. oryzae pv. oryzicola and its use in the discovery of a difference in their regulation of hrp genes. BMC Microbiol. 2008;8:99.
Jung KH, Dardick C, Bartley LE, Cao P, Phetsom J, Canlas P, Seo YS, Shultz M, Ouyang S, Yuan Q, Frank BC, Ly E, Zheng L, Jia Y, Hsia AP, An K, Chou HH, Rocke D, Lee GC, Schnable PS, An G, Buell CR, Ronald PC. Refinement of light-responsive transcript lists using rice oligonucleotide arrays: evaluation of gene-redundancy. PLoS One. 2008;3:e3337.
Chou HH, Hsia AP, Mooney DL, Schnable PS. Picky: oligo microarray design for large genomes. Bioinformatics. 2004;20:2893–902.
SantaLucia JJ, Allawi HT, Seneviratne PA. Improved nearest-neighbor parameters for predicting DNA duplex stability. Biochemistry. 1996;35:3555–62.
Allawi HT, SantaLucia JJ. Thermodynamics and NMR of internal G*T mismatches in DNA. Biochemistry. 1997;36:10581–94.
Allawi HT, SantaLucia JJ. Thermodynamics of internal C*T mismatches in DNA. Nucleic Acids Res. 1998;26:2694–701.
Allawi HT, SantaLucia JJ. Nearest-neighbor thermodynamics of internal A*C mismatches in DNA: sequence dependence and pH effects. Biochemistry. 1998;37:9435–44.
Allawi HT, SantaLucia JJ. Nearest neighbor thermodynamic parameters for internal G*A mismatches in DNA. Biochemistry. 1998;37:2170–9.
SantaLucia J. A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc Natl Acad Sci USA. 1998;95:1460–5.
Bommarito S, Peyret N. John SantaLucia J: Thermodynamic parameters for DNA sequences with dangling ends. Nucleic Acids Res. 2000;28:1929–34.
Chou HH, Trisiriroj A, Park S, Hsing YI, Ronald PC, Schnable PS. Direct calibration of PICKY-designed microarrays. BMC Bioinform. 2009;10:347.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–10.
Kane MD, Jatkoe TA, Stumpf CR, Lu J, Thomas JD, Madore SJ. Assessment of the sensitivity and specificity of oligonucleotide (50mer) microarrays. Nucleic Acids Res. 2000;28:4552–7.
Liebich J, Schadt CW, Chong SC, He Z, Rhee S-K, Zhou J. Improvement of oligonucleotide probe design criteria for functional gene microarrays in environmental applications. Appl Environ Microbiol. 2006;72:1688–91.
Yu W-H, Høvik H, Olsen I, Chen T. Strand-specific transcriptome profiling with directly labeled RNA on genomic tiling microarrays. BMC Mol Biol. 2011;12:3.
Høvik H, Chen T. Dynamic probe selection for studying microbial transcriptome with high-density genomic tiling microarrays. BMC Bioinform. 2010;11:82.
Coli Genetic Stock Center. http://cgsc.biology.yale.edu. Accessed 11 Jul 2010.
Plant Transformation Facility. http://agron-www.agron.iastate.edu/ptf/employee/director.aspx. Accessed 27 Jun 2012.
Zerbino DR, Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008;18:821–9.
Amos Consortium. http://amos.sourceforge.net/wiki/index.php/AMOS. Accessed 20 Apr 2014.
Blattner FR, Plunkett G III, Bloch CA, Perna NT, Burland V, Riley M, Collado-Vides J, Glasner JD, Rode CK, Mayhew GF, Gregor J, Davis NW, Kirkpatrick HA, Goeden MA, Rose DJ, Mau B, Shao Y. The complete genome sequence of Escherichia coli K-12. Science. 1997;277:1453–62.
Chou HH. Shared probe design and existing microarray reanalysis using PICKY. BMC Bioinform. 2010;11:196.
Technical note Roche NimbleGen probe design fundamentals. https://roche-biochem.jp/products/pdf/custom_design/probe_design_2008_06_04.pdf. Accessed 15 Jun 2013.
Hsu JC, Chang J, Wang T, Steingrímsson E, Magnússon MK, Bergsteinsdottir K. Statistically designing microarrays and microarray experiments to enhance sensitivity and specificity. Brief Bioinform. 2007;8:22–31.
Yu X, Lund SP, Scott RA, Greenwald JW, Records AH, Nettleton D, Lindow SE, Gross DC, Beattie GA. Transcriptional responses of Pseudomonas syringae to growth in epiphytic versus apoplastic leaf sites. Proc Natl Acad Sci USA 2013;110:E425–34
Roche NimbleGen|DNA microarray support. http://www.nimblegen.com/support/dna-microarray-support.html. Accessed 17 Aug 2014.
Kim BS, Rha SY, Cho GB, Chung HC. Spearman’s footrule as a measure of cDNA microarray reproducibility. Genomics. 2004;84:441–8.
Miron M, Woody OZ, Marcil A, Murie C, Sladek R, Nadon R. A methodology for global validation of microarray experiments. BMC Bioinform. 2006;7:333.
Churchill GA. Fundamentals of experimental design for cDNA microarrays. Nat Genet. 2002;32(Suppl):490–5.
Zhang S. A comprehensive evaluation of SAM, the SAM R-package and a simple modification to improve its performance. BMC Bioinform. 2007;8:230.
Lin D, Shkedy Z, Burzykowski T, Ion R, Göhlmann HWH, De Bondt A, Perer T, Geerts T, Van den Wyngaert I, Bijnens L. An investigation on performance of significance analysis of microarray (SAM) for the comparisons of several treatments with one control in the presence of small-variance genes. Biom J. 2008;50:801–23.
Hulshizer R, Blalock EM. Post hoc pattern matching: assigning significance to statistically defined expression patterns in single channel microarray data. BMC Bioinform. 2007;8:240.
Blinkova A, Hermandson MJ, Walker JR. Suppression of temperature-sensitive chromosome replication of an Escherichia coli dnaX(Ts) mutant by reduction of initiation efficiency. J Bacteriol. 2003;185:3583–95.
Kumar R, Shimizu K. Transcriptional regulation of main metabolic pathways of cyoA, cydB, fnr, and fur gene knockout Escherichia coli in C-limited and N-limited aerobic continuous cultures. Microb Cell Fact. 2011;10:3.
King T, Lucchini S, Hinton JCD, Gobius K. Transcriptomic analysis of Escherichia coli O157:H7 and K-12 cultures exposed to inorganic and organic acids in stationary phase reveals acidulant- and strain-specific acid tolerance responses. Appl Environ Microbiol. 2010;76:6514–28.
Rensing C, Grass G. Escherichia coli mechanisms of copper homeostasis in a changing environment. FEMS Microbiol Rev. 2003;27:197–213.
Outten FW, Huffman DL, Hale JA, O’Halloran TV. The independent cue and cus systems confer copper tolerance during aerobic and anaerobic growth in Escherichia coli. J Biol Chem. 2001;276:30670–7.
Garmory HS, Titball RW. ATP-binding cassette transporters are targets for the development of antibacterial vaccines and therapies. Infect Immun. 2004;72:6757–63.
Dua A, Sangwan N, Kaur J, Saxena A, Kohli P, Gupta AK, Lal R. Draft genome sequence of Agrobacterium sp. Strain UHFBA-218 isolated from rhizosphere soil of crown gall-infected cherry rootstock colt. Genome Announc. 2013;1:e00302–13.
Rosen R, Büttner K, Becher D, Nakahigashi K, Yura T, Hecker M, Ron EZ. Heat shock proteome of Agrobacterium tumefaciens: evidence for new control systems. J Bacteriol. 2002;184:1772–8.
Li L, Huang D, Cheung MK, Nong W, Huang Q, Kwan HS. BSRD: a repository for bacterial small regulatory RNA. Nucleic Acids Res. 2013;41:D233–8 (Database issue).
Mockler TC, Chan S, Sundaresan A, Chen H, Jacobsen SE, Ecker JR. Applications of DNA tiling arrays for whole-genome analysis. Genomics. 2005;85:1–15.
Neidhardt FC, Bloch PL, Smith DF. Culture medium for enterobacteria. J Bacteriol. 1974;119:736–47.
Yuan Z-C, Liu P, Saenkham P, Kerr K, Nester EW. Transcriptome profiling and functional analysis of Agrobacterium tumefaciens reveals a general conserved response to acidic conditions (pH 5.5) and a complex acid-mediated signaling involved in agrobacterium-plant interactions. J Bacteriol. 2008;190:494–507.
Bhubhanil S, Niamyim P, Sukchawalit R, Mongkolsuk S. Cysteine desulphurase-encoding gene sufS2 is required for the repressor function of RirA and oxidative resistance in Agrobacterium tumefaciens. Microbiology. 2014;160(Pt 1):79–90.
Chuang SE, Blattner FR. Characterization of twenty-six new heat shock genes of Escherichia coli. J Bacteriol. 1993;175:5242–52.
Yamanaka K. Cold shock response in Escherichia coli. J Mol Microbiol Biotechnol. 1999;1:193–202.
Michel B. After 30 years of study, the bacterial SOS response still surprises us. PLoS Biol. 2005;3:e255.
Papp-Szabò E, Sutherland CL, Josephy PD. Superoxide dismutase and the resistance of Escherichia coli to phagocytic killing by human neutrophils. Infect Immun. 1993;61:1442–6.
Strauss B, Kelly K, Ekiert D. Cytochrome oxidase deficiency protects Escherichia coli from cell death but not from filamentation due to thymine deficiency or DNA polymerase inactivation. J Bacteriol. 2005;187:2827–35.
Winkler JD, Garcia C, Olson M, Callaway E, Kao KC. Evolved osmotolerant Escherichia coli mutants frequently exhibit defective N-acetylglucosamine catabolism and point mutations in cell shape-regulating protein MreB. Appl Environ Microbiol. 2014;80:3729–40.
Rosen R, Büttner K, Schmid R, Hecker M, Ron EZ. Stress-induced proteins of Agrobacterium tumefaciens. FEMS Microbiol Ecol. 2001;35:277–85.
Segal G, Ron EZ. The dnaKJ operon of Agrobacterium tumefaciens: transcriptional analysis and evidence for a new heat shock promoter. J Bacteriol. 1995;177:5952–8.
Hungria M, Nicolás MF, Guimarães CT, Jardim SN, Gomes EA, de Vasconcelos ATR. Tolerance to stress and environmental adaptability of Chromobacterium violaceum. Genet Mol Res GMR. 2004;3:102–16.
Kitphati W, Ngok-Ngam P, Suwanmaneerat S, Sukchawalit R, Mongkolsuk S. Agrobacterium tumefaciens fur has important physiological roles in iron and manganese homeostasis, the oxidative stress response, and full virulence. Appl Environ Microbiol. 2007;73:4760–8.
HJC prepared, sequenced and assembled the two bacteria genomes and used that information to design the tiling microarrays. HJC also performed all experiments including bacteria maintenance, treatment, sample collection, purification, microarray hybridization, image quantification and statistical analysis. HHC conceived of this work, developed the Picky software for microarray design, and drafted the manuscript. All authors read and approved the final manuscript.
The authors would like to thank Dr. Kan Wang and Dr. Gwyn A. Beattie for their valuable advices to this work. We also thank Dr. Kan Wang and Dr. Keunsub Lee for providing the A. tumefaciens C58 strain and teaching us how to maintain this bacteria. We thank our colleagues Taryn Anderson and Matthew Hunt for their help in bacteria maintenance and RNA extraction, and Jennifer Chang for discussion about statistical methods.
Availability of data and material
The sequencing datasets supporting the conclusions of this article are available in the (NCBI Sequence Read Archive) repository, (Accession numbers are SRX806374 and SRX806654 and hyperlink to datasets in http://www.ncbi.nlm.nih.gov/sra/SRX806374[accn] and http://www.ncbi.nlm.nih.gov/sra/SRX806654[accn]).
The datasets supporting the conclusions of this article are included within the article and its additional files.
The authors declare that they have no competing interests.
This work is supported by the National Science Foundation grant DBI-0850195 to HHC, and the publication of this manuscript is supported by the Iowa State University Plant Sciences Institute Scholar Award to HHC. Neither funding agency plays any role in the design, collection, analysis and interpretation of the data, in the writing of the manuscript, and in the decision to submit the manuscript for publication.
Additional file 1. Picky analysis results for a previously designed tiling microarray. The tiling microarray created by Yu, et al.  was designed using a pipeline  that depends on BLAST for sequence-level probe specificity screening. This compressed ZIP archive contains the summary (W83.report) and detail analysis (W83.picky) produced by Picky for this microarray.
Additional file 2. Assembled genome sequences for E. coli MG1655 and Agrobacterium tumefaciens C58. Two FASTA files containing the assembled genomic sequences are provided in this compressed ZIP archive.
About this article
Cite this article
Cho, H., Chou, HH. Thermodynamically optimal whole-genome tiling microarray design and validation. BMC Res Notes 9, 305 (2016). https://doi.org/10.1186/s13104-016-2113-4
- Tiling microarray design
- Prokaryote transcriptome
- PICKY software
- Microarray validation