Identification of suitable endogenous control genes for microRNA expression profiling of childhood medulloblastoma and human neural stem cells

Background Medulloblastoma (MB) is the most common type of malignant childhood brain tumour. Although deregulated microRNA (miRNA) expression has been linked to MB pathogenesis, the selection of appropriate candidate endogenous control (EC) reference genes for MB miRNA expression profiling studies has not been systematically addressed. In this study we utilised reverse transcriptase quantitative PCR (RT-qPCR) to identify the most appropriate EC reference genes for the accurate normalisation of miRNA expression data in primary human MB specimens and neural stem cells. Results Expression profiling of 662 miRNAs and six small nuclear/ nucleolar RNAs in primary human MB specimens, two CD133+ neural stem cell (NSC) populations and two CD133- neural progenitor cell (NPC) populations was performed using TaqMan low-density array (TLDA) cards. Minimal intra-card variability for candidate EC reference gene replicates was observed, however significant inter-card variability was identified between replicates present on both TLDA cards A and B. A panel of 18 potentially suitable EC reference genes was identified for the normalisation of miRNA expression on TLDA cards. These candidates were not significantly differentially expressed between CD133+ NSCs/ CD133- NPCs and primary MB specimens. Of the six sn/snoRNA EC reference genes recommended by the manufacturer, only RNU44 was uniformly expressed between primary MB specimens and CD133+ NSC/CD133- NPC populations (P = 0.709; FC = 1.02). The suitability of candidate EC reference genes was assessed using geNorm and NormFinder software, with hsa-miR-301a and hsa-miR-339-5p found to be the most uniformly expressed EC reference genes on TLDA card A and hsa-miR-425* and RNU24 for TLDA card B. Conclusions A panel of 18 potential EC reference genes that were not significantly differentially expressed between CD133+ NSCs/ CD133- NPCs and primary human MB specimens was identified. The top ranked EC reference genes described here should be validated in a larger cohort of specimens to verify their utility as controls for the normalisation of RT-qPCR data generated in MB miRNA expression studies. Importantly, inter-card variability observed between replicates of certain candidate EC reference genes has major implications for the accurate normalisation of miRNA expression data obtained using the miRNA TLDA platform.


Background
Medulloblastoma (MB) is the most common malignant paediatric brain tumour, and a major cause of childhood cancer related morbidity and mortality [1]. Several molecular subtypes of MB have been identified on the basis of specific gene expression signatures [2][3][4][5][6], suggesting that different sub-groups may arise from distinct cells of origin. In human MB, a subpopulation of CD133expressing cells was identified that displayed similar properties to NSCs, including self-renewal and multipotency [7,8]. Additional studies demonstrated that these putative brain tumour stem cells (BTSCs) were capable of initiating tumour formation in immunodeficient mice [9]. These human data combined with evidence from two different murine MB models [10,11] strongly implicate CD133+ NSCs as a cell of origin for a subset of MB.
MicroRNAs (miRNAs) are a class of short, non-coding RNAs that down-regulate gene expression at a posttranscriptional level [12,13]. Functioning as guide molecules for silencing complexes, miRNAs utilise anti-sense complementarity to inhibit the expression of specific messenger RNA (mRNA) targets by either repressing translation and/or inducing deadenylation and subsequent mRNA degradation [13][14][15][16]. miRNAs play key regulatory roles in various cellular processes including cell proliferation [17], apoptosis [18] and differentiation [19]. Numerous studies have demonstrated significantly altered miRNA expression patterns in a wide range of human cancer types compared to normal tissues, suggesting that specific miRNAs might act as tumour suppressor genes or oncogenes (Reviewed in [20] and [21]). In addition to their potential as novel molecules for cancer therapy [22], miRNAs represent an emerging class of diagnostic and prognostic markers [23][24][25][26].
Adaptations of existing technologies for gene expression profiling including RT-qPCR, chip-based microarrays, and next generation sequencing have enabled the highthroughput profiling of miRNA expression [23,[27][28][29][30][31]. The accuracy of these methods is dependent upon correcting for non-biological sample-to-sample variation that could be introduced during the steps from sample preparation to amplification [23]. For miRNA RT-qPCR expression data, several methods have been described to correct for this variation, the most frequent of which is the normalisation to endogenous control (EC) reference genes [32]. An ideal reference gene should be highly expressed, exhibit minimal expression level variation in cells or tissues under investigation, and be of similar length to that of the target gene [33]. Additionally, extraction and quantification efficiency and storage stability of an ideal reference gene should be equivalent to the gene under investigation [33]. However, previous studies have demonstrated that a single universal EC reference gene with these properties for all cell or tissue types is unlikely to exist [34][35][36][37]. Normalisation of RT-qPCR data to unreliable reference genes may lead to incorrect quantification of miRNAs of interest [33,38], and the importance of validating suitable candidate EC reference genes in a cell and/or tissue-specific context has been demonstrated previously [33]. Although several research groups have performed miRNA expression profiling studies in primary MB specimens, the validation strategies for the selection of EC genes for normalisation of miRNA gene expression data were not reported [39][40][41][42][43][44][45]. Additionally, none of these studies profiled CD133+ NSCs and/or CD133-NPCs, instead normalising miRNA expression to EC gene expression levels in human adult and fetal cerebellum. This study describes a robust strategy for the identification of suitable EC genes for the normalisation of miRNA RT-qPCR data in MB profiling studies. Using the main selection criteria of good measurability and uniform expression, candidate EC reference genes for miRNA data normalisation were investigated in nine primary MB specimens, and two populations of CD133+ NSCs and CD133-NPCs. The uniformity of expression of these candidate EC genes was subsequently investigated across all samples, with the relative quantities of miRNAs, hsa-miR-144*, hsa-miR-21* and hsa-miR-923 assessed using five different normalisers, to determine the impact of EC reference gene selection on relative expression of individual miRNAs of interest.

Results
Inter-card variation for candidate small nuclear (sn) and small nucleolar (snoRNA) EC reference genes on the TLDA platform miRNA profiling using the TLDA system consists of two cards, A and B. Card A contains three proposed EC references genes, MammU6, RNU44 and RNU48, and card B contains six, MammU6, RNU44, RNU48, RNU24, RNU43 and RNU6B. To evaluate whether there was inter-plate variation between the sn/snoRNA candidate EC reference genes on TLDA card A and card B, Cq values for technical replicates of three candidate EC reference genes, MammU6, RNU48 and RNU44, present on separate TLDA cards were plotted and one way analysis of variance (ANOVA) was applied to assess interplate variation between the technical replicates ( Figure 1). Significant variation was observed between the technical replicates of MammU6 (P = 0.025) and RNU44 (P = 0.0030) on TLDA cards A and B ( Figure 2). In contrast, Cq values for the technical replicates of RNU48 remained relatively stable (P = 0.97). The observed differences between EC reference gene replicate Cq values on TLDA cards A and B suggests that the normalisation of miRNAs is best performed to EC reference genes on the same TLDA card. Based on these results, we assessed the suitability of EC reference genes separately for TLDA cards A and B. As minimal variation was observed for technical replicates of candidate EC reference genes on the same TLDA card, an average of the technical replicates from each TLDA card was employed for further analysis. Because this approach limited the maximum number of proposed sn/snoRNA candidate EC reference genes for each TLDA card, particularly for TLDA card A, we expanded the panel of potential candidate EC reference genes under investigation to include miRNAs that fulfil the appropriate selection criteria outlined in methods.
Expression of candidate EC reference genes in primary MB specimens, CD133+ NSCs and CD133-NPCs Suitable candidate EC reference genes must display consistent expression between normal and treatment and/or disease groups. Two sample t-tests were employed to assess whether candidate EC reference genes were differentially expressed between primary MB specimens and CD133+ NSC/ CD133-NPC populations. For TLDA card A, 12 genes were identified as potentially suitable EC reference genes (Table 1). Of the proposed sn/snoRNA candidate EC reference genes, only RNU44 was uniformly expressed across primary MB specimens and CD133+ NSC/CD133-NPC populations (P = 0.71; fold change (FC) = 1.02) ( Table 1). In contrast, MammU6 was overexpressed in primary MB specimens compared to NSC/ NPCs (P = 0.049; FC = 3.66), and while the over-expression of RNU48 did not reach statistical significance (P = 0.35), the degree of over-expression observed in primary MB specimens relative to NSCs/NPCs (FC = 2.35) suggested RNU48 was not an ideal EC reference in this context. In addition to RNU44, a number of candidate miRNAs were uniformly expressed in NSCs and NPCs and primary MB specimens (P ≥ 0.05; FC ≤ ±1.5), highlighting their potential as EC reference genes in this study.
Expression analysis identified six candidate EC reference genes of TLDA card B suitable for data normalisation (Table 1). Of the manufacturer-recommended   estimate of their uniformity of expression across CD133 + NSC/ CD133-NPCs and primary MB specimens, Cq values of each candidate EC reference gene were plotted and the coefficient of variation (CV) for each candidate EC reference gene calculated ( Figure 3). The mean, CV and range of Cq values for all candidate EC reference genes are shown in Table 2

Uniformity of candidate reference gene expression in primary MB specimens, CD133+ NSCs and CD133-NPCs
Candidate EC reference genes displaying similar expression levels in CD133+ NSC/ CD133-NPCs and primary MB specimens were further assessed for uniformity of expression across all samples using geNorm [36] and Norm-Finder [46] software. RNU48 was also included in this analysis, as although differential expression of this snoRNA, as measured by FC, was observed between primary MB specimens and CD133+ NSCs/ CD133-NPCs, this did not reach statistical significance. The chromosomal location of candidate EC reference genes was assessed to rule out any co-regulation of clustered miRNA genes transcribed as a single, polycistronic transcript (Table 1), as this would lead to an erroneous choice of an optimum EC gene reference pair using geNorm and NormFinder software. The ranking of the candidate EC reference genes determined by these programs is summarised in Table 3, and consistent results were obtained for both TLDA card A and card B. Normfinder and geNorm identified hsa-miR-301a and hsa-miR-339-5p as the two most uniformly expressed EC reference genes on TLDA card A, closely followed by hsa-miR-210 and RNU48. Normfinder identified the geometric mean of hsa-miR-339-5p and hsa-miR-301a as the most stable pair of EC reference genes, with a lower combined stability value of M = 0.116 when compared to the stability values of the EC reference genes alone (Table 3). This suggests that the combination of hsa-miR-339-5p and hsa-miR-301a should be used for data normalisation in preference to the single EC reference genes. Of the candidate EC reference genes on TLDA card B, hsa-miR-425* and RNU24 were the two most uniformly expressed EC reference genes, followed by RNU48 and hsa-miR-877 (Table 3). The combination of hsa-miR-425* and RNU24 displayed a much lower stability value (M = 0.078) than either of the candidate EC reference genes alone, indicating that the geometric mean of these EC reference genes is the most suitable for miRNA expression normalisation of TLDA card B. Although both programs listed RNU48 in the top four most uniformly expressed candidate EC reference genes of TLDA card A and card B, it was not considered ideal due to its differential expression between primary MB specimens and CD133+ NSCs/CD133-NPCs.   Figure 3 Variation in expression of all candidate EC reference genes of (TLDA) card A and B identified as uniformly expressed between primary medulloblastoma (MB) specimens and CD133+ NSCs/ CD133-NPCs. Quantification cycle (Cq) values for candidate EC reference genes were plotted for each sample. Cq values were also plotted for RNU48, as although differentially expressed, it did not reach statistical significance.

Impact of EC reference genes on the relative quantification of individual miRNAs
To demonstrate the impact of selected EC reference genes on the results, we measured the expression of miR-NAs relative to several candidate EC reference genes identified in this investigation. Specific candidate snoRNA EC reference genes on TLDA card B were highly differentially expressed in primary MB specimens relative to CD133+ NSCs and CD133-NPCs, and therefore, as proof of concept, the normalisation of individual miRNAs focused upon TLDA card B miRNAs and candidate EC reference genes. In primary MB specimens, the over-expression of three TLDA card B miRNAs, hsa-miR-144*, hsa-miR-21* and hsa-miR-923, was previously described [47]. For this analysis, normalisation of these three miRNAs was performed using top ranked candidate EC reference genes identified by geNorm and Normfinder, including hsa-miR-425*, RNU24, hsa-miR-877, and the EC reference pair, RNU24/ hsa-miR-425*. In addition, target miRNA expression was also established relative to the significantly over-expressed (~500 fold) candidate EC reference gene, RNU43. Normalisation using uniformly expressed candidate EC reference genes did not influence the directionality of differential expression of miRNAs, with consistent over-expression of each target miRNA observed in primary MB specimens relative to CD133 + NSCs/CD133-NPCs ( Figure 4). In contrast, down-regulation of all three miRNAs was observed when RNU43 was utilised for data normalisation, due to the significant over-expression of RNU43 in primary MB specimens relative to target miRNA expression. Taken together, these results emphasise that establishing target miRNA expression levels for a particular tissue and/or disease state is strongly influenced by the candidate EC reference gene chosen for data normalisation.

Discussion
The importance of validating suitable candidate EC reference genes in a cell and/or tissue-specific context is well documented, and a single universal reference gene for all tissue types is unlikely to exist [33][34][35][36][37]. Indeed, in the context of cancer, a disease in which the cell of origin is unknown or inadequately characterised in many cases, the selection of appropriate EC reference genes is particularly difficult. Although the cells of origin of the  at least four molecular sub-types of human MB have not been conclusively identified, the available murine data suggest that SHH-dependent MB may have multiple cells of origin including cerebellar granule precursor cells, NSCs [48,49], and cochlear nuclei of the lower rhombic lip [50]. Other mouse models suggest that Group 3 MB arise from CD133+ NSCs [11], and WNTdriven tumours originate from progenitor cells of the dorsal brainstem [51]. The cell of origin of Group 4 tumors has not yet been determined. On this basis, the normalisation of gene expression levels in primary MB specimens to levels in whole foetal or adult cerebellum, which represent heterogeneous tissues at different developmental stages, although generally accepted, is likely to be sub-optimal. Similarly, the normalisation of gene expression levels in primary human MB to those in ESC-derived NSCs as described here, may be more appropriate for the analysis of specific MB subtypes (e.g., SHH or Group 3 MB), or stem cell regulatory pathways in MB pathogenesis. The assessment of miRNA expression levels in a larger cohort of primary MB specimens relative to NSCs, and an improved understanding of MB cells of origin will be necessary to address these issues. In the meantime, the data reported here represent an important complementary resource for future MB miRNA profiling studies.
Normalisation based on predefined invariant EC genes such as sn/snoRNAs is a commonly utilised approach in miRNA RT-qPCR profiling analysis [52]. However, in the context of this study, the majority of pre-defined small nuclear and small nucleolar RNAs were not suitable as EC reference genes for miRNA data normalisation. Small non-coding RNAs other than miRNAs do not mirror the physiochemical properties of miRNA molecules, suggesting that normalisation of miRNA expression data should be performed with reference genes belonging to the same RNA class [33,36]. The selection of invariant miRNAs identified by algorithms specifically tailored to reference gene ranking by stepwise elimination of the least stable gene such as geNorm [36], or through statistical linear mixed-effects modelling such as Normfinder [46] have been previously identified as superior to sn/snoRNAbased normalisation [33,53]. Indeed, we identified various miRNAs that were more uniformly expressed across all samples and therefore represent more suitable EC reference genes than the sn/snoRNAs proposed by the manufacturer, a finding that was consistent with a number of previous studies [38,[54][55][56][57]. Thus, our data provide further evidence to suggest that miRNAs, in comparison to other classes of small non-coding RNAs, may be more suitable EC reference genes for the normalisation of miRNA expression data, providing their expression meets the general consensus of moderate abundance and consistency across all samples.
The overall uniformity of expression is a major determinant for an ideal EC reference gene [33]. Our initial statistical analyses identified a number of candidate EC reference genes uniformly expressed across experimental groups, with both NormFinder and geNorm identifying hsa-miR-339-5p and hsa-miR-301a as the most stable EC reference genes for TLDA card A, and hsa-miR-425* and RNU24 as the most stable for TLDA card B. The practical consequences of miRNA normalisation were then evaluated using TLDA card B candidate EC reference genes and miRNAs as a case study. As evident from our results, inappropriate use of EC reference genes can significantly impact upon target miRNA quantitation. With the use of suitable EC reference genes, including hsa-miR-425*, RNU24, hsa-miR-877 and EC reference pair, RNU24/hsa-miR-425*, over-expression of all three miRNAs (hsa-miR-21*, hsa-miR-144* and hsa-miR-923) was identified, as previously reported [47]. However, when an inappropriate EC reference gene, RNU43, was used for expression data normalisation, the down-regulation of all three miRNAs was observed. Several previously published studies in a range of other tissue types have reported similar misleading results when inappropriate EC reference genes were used [33,38,54,55,58], highlighting the importance of selecting appropriate and validated EC reference genes for miRNA Figure 4 Quantitative differences in miRNA expression in medulloblastoma (MB) normalising to different EC reference genes. The mean fold changes of hsa-miR-923, hsa-miR-144* and hsa-miR-21* expression in nine primary MB specimens in comparison to the mean expression of these miRNAs in CD133+ NSCs and CD133-NPCs. Relative expression was determined using the 2 -ΔCq method and log 2 transformed. Normalisation was carried out either to hsa-miR-425* (dark blue), RNU43 (red), hsa-miR-877 (green), RNU24 (purple) or geometric mean of hsa-miR-425* and RNU24 (light blue). The fold changes of all three miRNAs normalised against RNU43 was different from the fold changes obtained following the normalisation against hsa-miR-425*, hsa-miR-877, RNU24 and the geometric mean of hsa-miR-425* and RNU24. expression data normalisation. Although several previous MB miRNA profiling studies utilised RNU6B and RNU66 as an EC reference gene pair for normalisation, the validation of these reference genes was not reported [39,41,45]. It is important to note that these previous studies normalised miRNA expression to EC reference levels in human normal adult and foetal cerebellum, rather than CD133+ NSCs/CD133-NPCs profiled in this investigation. The findings obtained in this study indicated that RNU6B was significantly differentially expressed in primary MB specimens relative to CD133+ NSCs/CD133-NPCs, and therefore was not a suitable EC reference gene for normalisation. Unfortunately, the suitability of RNU66 could not be assessed in this investigation, as the RNU66 miRNA assay was not included in the V2.0 TLDA cards. An additional MB miRNA profiling study utilised RNU48 as a single reference gene for normalisation of miRNA expression data [44]. Differential expression of RNU48 was observed in this study, however this did not reach statistical significance and again, a direct comparison was not possible due to the different normal control tissues being profiled. Combined, these findings reiterate the importance of validating suitable candidate EC reference genes in the relevant cell and/or tissue under investigation.
The performance of miRNA profiling by the highthroughput TLDA miRNA expression profiling system has been evaluated in several studies, with high reproducibility observed for technical replicates of miRNA assays located on the same TLDA card [53,[59][60][61]. While we also observed minimal intra-card variability for all candidate EC reference genes, significant inter-card variability was apparent between replicates of MammU6 and RNU44 on TLDA cards A and B. Similar findings were obtained in a previous study, where differential expression of MammU6 and RNU44, but not RNU48, was observed between the same samples profiled on TLDA cards A and B [61]. For the TLDA system, two separate Megaplex primer pools are required for reverse transcription (RT) of RNA and pre-amplification of cDNA prior to application to cards A and B. We propose that the inter-card variability for replicates of MammU6 and RNU44 was perhaps a direct consequence of differential RT and pre-amplification efficiencies of these sn/snoR-NAs associated with the separate reactions. Previously, the variable expression of a subset of miRNAs was largely attributed to the altered efficiencies of the Megaplex RT and pre-amplification reactions [53]. Whilst a systematic evaluation of the inter-card reproducibility of the miRNA TLDA platform has not been reported, the findings obtained in this study have important implications for the normalisation of miRNA expression data obtained using this system. Future studies should carefully assess the potential for inter-card/plate variability of the associated platform prior to the assessment of candidate EC reference genes for data normalisation.

Conclusion
A panel of 18 potential EC reference genes that were not significantly differentially expressed between CD133+ NSCs/ CD133-NPCs and primary human MB specimens was identified. Based on our findings, EC reference gene pairs hsa-miR-301a and hsa-miR-339-5p and hsa-miR-425* and RNU24 are recommended for the normalisation of miRNAs on TLDA card A and card B, respectively, in primary MB specimens relative to ESC-derived NSCs/ NPCs. The top ranked EC reference genes described here should be validated in a larger cohort of specimens to verify their utility as controls for the normalisation of RT-qPCR data generated in MB miRNA expression studies. More broadly, inter-card variability observed between replicates of certain candidate EC reference genes has major implications for the accurate normalisation of miRNA expression data obtained using the miRNA TLDA platform.

Patient samples
Nine MB samples were collected from children treated at Princess Margaret Hospital (PMH) in Perth, Western Australia. Tumour tissue was embedded in optimal cutting temperature compound (OCT) and snap-frozen. The age of patients, gender distribution and molecular sub-type of each primary MB specimen have been previously described (Table 4) [47]. Written approval to undertake this study was obtained from the PMH human ethics committee. Written consent to use tumour material for research purposes was obtained from the parents of patients according to PMH ethics committee guidelines. All tumour material was deidentified to ensure patient anonymity.

Neurosphere maintenance and flow cytometry
Human NSCs propagated as neurospheres were derived from human embryonic stem cell (ESC) lines hES3 (WiCell Research Institute, Madison, WI, USA) and MEL1 (StemCore, Melbourne, Australia) using protocols described previously [62,63]. Dissociation of neurospheres and isolation of CD133+ NSCs by flow cytometry was performed as described previously [64]. Enrichment of CD133-NPCs was 100% for both hES3 and MEL1 ESC lines, with the enrichment of CD133+ NSCs being 81.1% and 97% for the hES3 and MEL1 ESC lines, respectively.

Small RNA isolation and enrichment
RNA enriched for small RNAs was isolated from primary specimens, cell lines and NSC/NPCs using the miRNeasy mini kit (Qiagen, Melbourne Australia), as described previously [47]. RNA quantity and purity was estimated by the ratio of absorbance at 260 nm to that at 280 nm (OD 260 :OD 280 ), with ratios of between 1.8 and 2.0 being considered optimal.

Statistical analysis
Statistics used for the filtering of candidate EC reference genes from TLDA cards were calculated in R statistical environment version 2.13.0 [65] and candidates were filtered according to three criteria: (1) candidate EC reference genes must be moderately to highly expressed across all samples, defined as a mean Cq of ≤ 25; (2) candidate EC reference genes must be consistently expressed between the combined CD133+ NSCs and CD133-NPCs, and primary MB specimens, with significantly differentially expressed miRNAs identified using a two sample t-test (P < 0.05) and a absolute differential expression fold change (FC) value of 1.5 ( |FC| ≥ 1.5); (3) candidate EC reference genes must be uniformly expressed across all samples, with a coefficient of variation (CV) < 10. The coefficient of variation (CV) is the ratio of the standard deviation to the mean expressed as a percentage and is a measure of expression variability of candidate EC reference genes across all samples. Candidate EC reference genes that met these criteria (see Table 2) were deemed suitable for subsequent analysis using the geNorm [36] and NormFinder [46] software. The geNorm algorithm calculates the gene expression stability measure (M) for a candidate EC reference gene based upon the average pairwise variation (V) for that gene against all other tested candidate EC reference genes [36]. NormFinder, a Microsoft Excel add-in, is based upon an analysis of variance (ANOVA) model which estimates an intra-and inter-group variation to provide a stability value for each candidate EC reference gene [46]. Normfinder provides the single most stable reference gene, in addition to an EC reference gene pair that has a stability value less than that of the single EC reference gene. Prior to NormFinder analyses, Cq values were converted to relative expression values using the 2 -Cq method [66]. In both programs, lower values indicate increased stability of EC reference genes, and therefore allow for the ranking of genes on this basis. Cq values of the target TLDA card B miRNAs hsa-miR-923, hsa-miR-144* and hsa-miR-21* were further normalised to each selected EC reference gene and relative expression was determined using the 2 -ΔCq method, where ΔCq = (Cq miR -Cq endogenous control gene ). To identify differential expression of individual miRNAs, the normalised means [log 2 (2 -ΔCq )] of expression levels for each miRNA in primary MB specimens were compared to the normalised means (log 2 (2 -ΔCq ) of expression for that miRNA in CD133+ NSCs and CD133-NPCs. For CD133+ NSCs and CD133-NPCs, normalised means (log 2 (2 -ΔCq ) of expression levels were obtained by averaging the expression of individual miRNAs from both hES3 and Mel1, therefore representing a pool of two ESC cell lines.