Computational analysis of expression of human embryonic stem cell-associated signatures in tumors

Background The cancer stem cell model has been proposed based on the linkage between human embryonic stem cells and human cancer cells. However, evidence supporting the cancer stem cell model remains to be collected. In this study, we extensively examined the expression of human embryonic stem cell-associated signatures, including core genes, transcription factors, pathways and microRNAs, in various cancers using a computational biology approach. Results We used class comparison and survival analysis algorithms to identify differentially expressed genes and their associated transcription factors, pathways and microRNAs among normal vs. tumor or good prognosis vs. poor prognosis phenotype classes, based on numerous human cancer gene expression datasets. We found that most of the human embryonic stem cell-associated signatures were frequently identified in the analysis, suggesting a strong linkage between human embryonic stem cells and cancer cells. Conclusions The present study revealed a close linkage between human embryonic stem cell-associated gene expression profiles and cancer-associated gene expression profiles, and therefore offers indirect support for the cancer stem cell theory. However, many issues of interest remain to be addressed.


Background
The development of human embryonic stem cells (hESCs) is controlled by specific signatures, including specific transcription factors (TFs), pathways, microRNAs (miRNAs) and core genes. These signatures determine the self-renewal or differentiation fate of hESCs. Cancer is one of the developmental diseases. The initiation, proliferation and metastasis of cancer are often associated with abnormalities of developmental signatures. Like hESCs, cancer cells are endowed with the ability to self-renew and proliferate indefinitely.
Based on accumulated evidence linking cancer cells to hESCs, some researchers proposed the cancer stem cell (CSC) hypothesis [1]. A CSC is defined as "a cell within a tumor that possesses the capacity to self-renew and to cause the heterogeneous lineages of cancer cells that comprise the tumor" [2]. This hypothesis suggests that a small percentage of hESC-like CSCs are responsible for initiating and replenishing the tumor, and that dormant CSCs may account for cancer metastasis, chemoresistance and recurrence, which makes them potential targets for improved cancer therapies. One type of evidence supporting the CSC model is the identification of surface markers of cancer-initiating cells (CICs; also known as cancer stem cells) in various human tumor types. Dick et al. reported that only a subset of cells were able to transplant AML into recipient mice [3,4]. These tumorigenic cells were defined as CD34+CD38-, indicating the presence of CD34 proteins and the lack of CD38 proteins on their surface [5]. Dirks et al. successfully isolated CSCs (CD133+ cells) from different phenotypes of brain tumors [6,7]. CSCs were also identified in a number of other tumor types, including breast tumors [8], melanoma [9], ovarian cancer [10,11], prostate cancer [12], pancreatic cancer [13,14], sarcoma [15] and colon cancer [16,17]. Although the CSC theory is supported by some experimental evidence, much contention exists over whether this evidence is sufficiently valid or merely artifactual [18][19][20][21].
Some other types of evidence seem to lend support to the CSC theory, although they are not direct or absolutely convincing. For example, hESCs share cellular and molecular phenotypes with tumor cells and cancer cell lines [22]. Human induced pluripotent stem cells (HiPSCs) were first derived with four transcription factors: OCT4, SOX2, MYC and KLF4 [23] or OCT4, SOX2, NANOG and LIN28 [24]. All these transcription factors have been reported to be highly expressed in various types of cancer [25][26][27][28][29]. Furthermore, silencing of the tumor suppressor gene p53 significantly increased the reprogramming efficiency of human somatic cells [30]. Activation of telomerase is in part responsible for the long lifespan of stem cells as well as for the resistance of cancer cells to apoptosis [13,31-34]. Cell cycle regulation plays a critical role in both stem cells and cancer cells [35][36][37][38][39].
The linkage between hESC-specific gene expression profiles and cancer-specific gene expression profiles may provide evidence in support of the CSC model. To this end, many studies have identified hESC-associated gene expression signatures (hESCGESs) [40][41][42][43][44], and several studies have examined the expression of hESCGESs in human cancer [45][46][47][48][49]. In [45], the authors provided the first clinical evidence for the implication of a "glioma stem cell" or "self-renewal" phenotype in treatment resistance of glioblastoma. In [46], the authors found hESCGESs that distinguished primary from metastatic human germ cell tumors. In [47], the authors identified a subset of hESC-associated transcription regulators that were highly expressed in poorly differentiated tumors. In [48], the authors revealed that increased expression of some hESCGESs identified poorly differentiated lung adenocarcinoma. In [49], the authors compared the expression of the pluripotency factors OCT4, SOX2, KLF4 and MYC in 40 human tumor types to that of their normal tissue counterparts using publicly available gene expression data, and found significant overexpression of at least one of them in 18 of the 40 cancer types investigated. Furthermore, they found that these genes were associated with tumor progression or bad prognosis. Altogether, these studies revealed that "stemness" gene expression signatures were associated with tumor malignancies, and therefore might be informative molecular predictors of cancer therapy outcome [50].
In this study, we investigated the linkage between hESCGESs and tumor malignancies by an extensive examination of the expression of hESCGESs in various human tumor types. We used 51 publicly available gene expression datasets, which involve 23 human tumor types [51].

Identification of human stem cell-associated gene expression signatures
The self-renewal and differentiation of hESCs are controlled by hESC-specific signal molecules in a signaling-specific manner. Through a substantial survey of the related literature, we collected four types of hESCGESs: genes, pathways, TFs and miRNAs.
We collected 24 hESC-associated gene sets, which were classified into five groups (Table 1 and Additional file 1, Table S1).
A number of developmental signal pathways, such as Wnt, Notch, Hedgehog and Bmi-1, are necessary for regulation of stem cell self-renewal and differentiation. We identified 54 signal pathways as the hESC-associated pathway signatures (Table 2).
We identified 189 key TFs involved in regulation of hESC self-renewal and differentiation including three core TFs OCT4, SOX2 and NANOG with essential roles in the transcriptional control of the regulatory circuitry underlying pluripotency [43,52]. Table 2 lists 30 "critical" TFs. The complete TF list is presented in Additional file 2, Table S2.
Recent research indicates that miRNAs have an important role in regulating stem cell self-renewal and differentiation [53]. We identified 114 hESC-associated miRNAs. Table 2 lists some of them. The complete miRNA list is presented in Additional file 3, Table S3.

Identification of tumor-associated gene expression signatures
We identified differentially expressed genes among normal vs. tumor or good prognosis vs. poor prognosis phenotype classes using a univariate F-test for unpaired samples or a t-test for paired samples at the 0.05 significance level. This procedure was implemented with the class comparison between groups of arrays tool in BRB-ArrayTools, an integrated package developed by Simon et al. for the visualization and statistical analysis of DNA microarray gene expression data [54]. The software can be freely downloaded from the website: http://linus.nci.nih.gov/BRB-ArrayTools.html.
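The single-gene test for the unpaired, two-class case can be illustrated with a short sketch. For two classes, the F statistic equals the square of the pooled-variance t statistic, so the sketch computes that t statistic for one gene. As an assumption made only to keep the example free of external libraries, the p-value is estimated by permuting class labels rather than from the parametric F distribution; the function names `pooled_t` and `permutation_p` are hypothetical, and this is not the BRB-ArrayTools implementation.

```python
import math
import random
from statistics import mean, variance


def pooled_t(a, b):
    """Pooled-variance two-sample t statistic (each group needs >= 2 values).

    For two classes, the one-way F statistic equals this value squared.
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    diff = mean(a) - mean(b)
    if pooled_var == 0:
        # Both groups are constant: the statistic is 0 or infinite.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(pooled_var * (1 / na + 1 / nb))


def permutation_p(a, b, n_perm=2000, seed=0):
    """Two-sided p-value estimated by shuffling class labels.

    Assumption: a permutation null stands in for the parametric F/t null.
    """
    rng = random.Random(seed)
    observed = abs(pooled_t(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(pooled_t(pooled[:len(a)], pooled[len(a):])) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Toy log-expression values for one gene (invented for illustration).
normal = [5.1, 5.3, 4.9, 5.2, 5.0, 5.4]
tumor = [7.0, 7.2, 6.8, 7.1, 6.9, 7.3]
print(pooled_t(normal, tumor), permutation_p(normal, tumor))
```

A gene would be called differentially expressed when its p-value falls below the chosen significance level (0.05 in this study).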
We identified important pathways, TFs and miRNAs by analyzing gene sets for differential expression among pre-defined classes. The pre-defined phenotype classes in the class comparison algorithm were of two types: normal vs. tumor and good prognosis vs. poor prognosis. The latter concerns tumor subtypes that exhibit different clinical outcomes (such as metastasis or no metastasis, relapse or disease-free status, and sensitivity or resistance to drug or radiation therapy) or different tumor progression grades. The LS or KS permutation test and Efron-Tibshirani's GSA maxmean test were used to determine the significant gene sets at the 0.05 significance level. The pathways (BioCarta) related to the significant gene sets were identified. TFs were identified from gene sets in each of which all genes were experimentally verified to be targets of the same transcription factor. miRNAs were identified from gene sets in each of which all genes were potential targets of the same miRNA. The identification of important pathways, TFs and miRNAs was performed with the gene set expression class comparison tool in BRB-ArrayTools.
In addition, we used the survival analysis tool in BRB-ArrayTools to find genes, pathways, TFs and miRNAs related to survival for the subset of datasets that provided survival data. All parameters were identical to those used in the class comparison.
We compared the identified gene sets, pathways, TFs and miRNAs to those in the hESCGESs, and determined the overlap for each signature type.
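The LS permutation test mentioned above can be sketched as follows. The sketch assumes that the LS statistic of a gene set is the mean of the negative natural logarithms of the single-gene p-values, and that its significance is judged against random gene sets of the same size drawn from all genes on the array. These definitions are assumptions not stated in the text, and the names `ls_statistic` and `ls_permutation_p` are hypothetical.

```python
import math
import random


def ls_statistic(pvalues):
    """Mean of -ln(p) over the genes of a set (assumes every p > 0)."""
    return sum(-math.log(p) for p in pvalues) / len(pvalues)


def ls_permutation_p(gene_p, gene_set, n_perm=5000, seed=0):
    """Compare the LS statistic of gene_set with random sets of equal size.

    gene_p maps gene identifier -> single-gene p-value for the whole array.
    Returns (observed LS statistic, estimated p-value).
    """
    rng = random.Random(seed)
    members = [g for g in gene_set if g in gene_p]
    observed = ls_statistic([gene_p[g] for g in members])
    all_p = list(gene_p.values())
    hits = 0
    for _ in range(n_perm):
        if ls_statistic(rng.sample(all_p, len(members))) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy array of 200 genes (invented): 10 set members with small p-values.
toy_p = {f"bg{i}": 0.5 for i in range(190)}
toy_set = [f"set{i}" for i in range(10)]
toy_p.update({g: 0.001 for g in toy_set})
print(ls_permutation_p(toy_p, toy_set))
```

A gene set whose estimated p-value falls below 0.05 would be reported as significant, and the pathway, TF or miRNA that defines the set would be recorded for the dataset.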

Materials
We analyzed 51 human gene expression datasets involving 23 tumor types (Table 3). For each dataset, we carried out class comparison and/or survival analysis to identify informative genes, pathways, TFs and miRNAs. A total of 75 class comparisons and survival analyses were carried out (Table 4). All the references relevant to Table 1, Table 2, Table 3 and Table 4 are presented in Additional file 4.

Overlaps between hESCGESs genes and tumor-associated genes
Across the 75 class comparisons and survival analyses, we identified 72 sets of differentially expressed genes significant at the 0.05 threshold level (Additional file 5, Table S4). We analyzed the overlap between each of the 72 gene sets and each of the 24 hESC-associated gene sets, and found considerable overlaps. For example, all 379 genes in the hESC exp1 gene set of Table 1 appeared in at least one of the 72 differentially expressed gene sets (DEGSs). Among them, 308 genes appeared in 10 or more DEGSs, and 120 genes appeared in 20 or more DEGSs. The most frequently overlapping gene was MTHFD2 (methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 2, methenyltetrahydrofolate cyclohydrolase), which occurred in half of the 72 DEGSs. The next most frequently overlapping genes were MCM4 and MCM6 (34 overlaps), two members of the gene family encoding the minichromosome maintenance complex. All 40 genes in the hESC exp2 gene set of Table 1 also occurred in at least one DEGS, and 26 genes occurred in no fewer than 10 DEGSs. Among them, MYBL2, a member of the MYB family of transcription factor genes involved in cell cycle progression, occurred most frequently in the DEGSs (31 times). Table 5 gives the number of genes with 10 or more overlaps and the top 10 overlapping genes in each of the 24 hESC-associated gene sets, suggesting that a large proportion of the hESC-associated genes are also related to cancer. Gene function enrichment analysis suggests that a substantial portion of the genes listed in Table 5 are involved in cell cycle regulation, DNA damage repair and replication, apoptosis, development and differentiation, cell adhesion and TF activity (Table 6).
We carried out significance analyses of the overlap between each of the 72 DEGSs and each of the 24 hESC-associated gene sets based on the hypergeometric test. Three heatmaps of hypergeometric p-values are presented in Figures 1, 2 and 3, which visualize the significance of the overlap between the hESC-associated gene sets and the DEGSs for the normal vs. tumor class comparisons, the good prognosis vs. poor prognosis class comparisons, and the survival analyses, respectively (the detailed description of all the datasets related to each figure is provided in Additional file 6). These figures show that the targets of the three core hESC-associated TFs OCT4, SOX2 and NANOG have significant overlaps with most of the DEGSs. Two gene sets targeted by MYC also show significant overlaps with most of the DEGSs. These results suggest that key hESC-associated gene expression signatures have important implications in the pathogenesis of cancer.
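The overlap counts reported above amount to tallying, for each gene of an hESC-associated gene set, the number of DEGSs that contain it. A minimal sketch is given below; the function names and the placeholder gene identifiers are hypothetical and the toy sets are invented, so the output does not reproduce any figure in Table 5.

```python
from collections import Counter


def overlap_counts(hesc_genes, degs_list):
    """Count, for each hESC-associated gene, how many DEGSs contain it."""
    hesc = set(hesc_genes)
    counts = Counter()
    for degs in degs_list:
        counts.update(hesc & set(degs))
    return counts


def genes_with_at_least(counts, k):
    """Genes that appear in k or more DEGSs, sorted alphabetically."""
    return sorted(g for g, c in counts.items() if c >= k)


# Toy input: one hESC gene set and three DEGSs (placeholder identifiers).
hesc_set = {"geneA", "geneB", "geneC"}
degs_sets = [
    {"geneA", "geneB", "geneX"},
    {"geneA"},
    {"geneA", "geneC", "geneY"},
]
counts = overlap_counts(hesc_set, degs_sets)
print(counts.most_common(1), genes_with_at_least(counts, 2))
```

With the real data, `counts.most_common(10)` would give the ten most frequently overlapping genes of a set, and `genes_with_at_least(counts, 10)` the genes with 10 or more overlaps.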
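The hypergeometric test of an overlap can be sketched as an upper-tail probability. In the sketch, `N` is the number of genes in the background (for example, all genes on the array), `K` the number of those genes that belong to the hESC-associated gene set, `n` the size of the DEGS and `k` the observed overlap. The choice of background and the function name `hypergeom_upper_tail` are assumptions, since the text does not specify them.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which belong to the hESC-associated gene set."""
    lo = max(k, 0)
    hi = min(K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(lo, hi + 1))
    return favorable / comb(N, n)


# Toy numbers (invented): 10 background genes, 3 in the hESC set,
# a DEGS of 4 genes, and an observed overlap of at least 1 gene.
print(hypergeom_upper_tail(1, N=10, K=3, n=4))
```

One such p-value per pair of DEGS and hESC-associated gene set would populate a heatmap of the kind described above.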

Overlaps between hESCGESs pathways and tumor-associated pathways
Across the 75 class comparisons and survival analyses, we identified 68 groups of pathways significant at the 0.05 threshold level. Among the 54 hESC-associated signal pathway signatures, 26 pathways appeared in at least eight different groups and the other 28 pathways did not appear in any group. The most frequently identified pathway was the Cell Cycle pathway, which appeared 57 times (84% occurrence rate), followed by the MAPK pathway, which was identified 50 times (74% occurrence rate). Table 7 lists all 26 pathways and their occurrence frequencies in the 68 groups of pathways significant in the cancer datasets. These pathways have been shown to play important roles in both maintenance of hESC function and tumorigenesis.
The Cell Cycle pathway plays an extremely important role in regulating the self-renewal and pluripotency of hESCs [55][56][57][58][59]. Undifferentiated hESCs have a short G1 phase, and therefore cycle rapidly relative to differentiated somatic cells. The unorthodox G1/S phase transition in the hESC cell cycle is associated with the deregulated proliferation and differentiation blockade of tumor cells [39,60-65]. The MAPK (Mitogen-Activated Protein Kinase) pathway regulates both early embryonic development and embryonic stem cell commitment, from the early steps of the process to mature differentiated cells [66]. The role of the MAPK pathway in cancer is prominent, as cancer can be perceived as a disease of communication between and within cells. The statistical significance analysis also shows that both the Cell Cycle pathway and the MAPK pathway are associated with a majority of tumor types (see Additional file 7, Figure S1, Additional file 8, Figure S2 and Additional file 9, Figure S3).
The importance of the IGF signaling pathway for maintenance of hESCs has been demonstrated [67][68][69][70]. This signaling pathway appears to play a crucial role in cancer and may be of interest in cancer therapy [71][72][73][74][75][76][77]. The ERK pathway is active in undifferentiated hESCs, and its activation is critical for maintenance of hESC self-renewal [78][79][80][81]. On the other hand, accumulating evidence for the role of the ERK pathway (RAF-MEK-ERK signaling cascade) in oncogenesis makes it an attractive target for drug development [82].
Interestingly, almost all the widely recognized hESC-associated pathways, such as SHH, WNT, PRC2, Notch, PTEN and TGFβ, have important linkage with cancer (see Table 7). The SHH (Sonic Hedgehog) signaling pathway is one of the key regulators of human embryonic development [83][84][85][86][87]. Activation of the pathway leads to an increased risk of developing cancerous malignancies [87][88][89][90][91][92][93][94]. The WNT signaling pathway is a network of proteins acting as a critical regulator of hESCs [43,56,59,69,79,84,85,95-103]. Deregulation of the pathway has been closely associated with cancer [83,86,90,94,103-114]. The PRC2 (Polycomb Repressive Complex 2) pathway is involved in control of the developmental regulators in hESCs [50,56,115-118]. The expression of PRC2 components is upregulated in various cancers such as melanoma, lymphoma, and breast and prostate cancer. The Notch signaling pathway plays a key role in the normal development of hESCs and many other cell types, depending on the expression level and cellular context of the Notch receptors [84,85,101,119]. Its deregulation potentially contributes to cancer development in several different ways [111,120-126]. PTEN (phosphatase and tensin homolog) acts as a tumor suppressor gene involved in regulation of the cell cycle, preventing cells from growing and dividing too rapidly. This pathway is also critical for stem cell maintenance [59,69,83]. The TGFβ (Transforming Growth Factor β) signaling pathway is of central importance to the self-renewal of hESCs [43,59,69,79,84,85,96,98-102,115,127,128]. This signaling pathway is involved in a wide range of cellular processes in both the adult organism and the developing embryo. It plays a role in both tumor suppression and tumor progression, depending on cellular context [129][130][131][132].
Two additional important pathways involved in both hESC function and tumorigenesis are the p53 and telomerase pathways. They were identified 21 and 22 times, respectively, among our 68 groups of significant pathways (see Table 7). The p53 pathway can maintain the homeostasis of self-renewal and differentiation of hESCs [133][134][135]. Inactivation of this pathway in several cancer types may correlate with hESC-specific signatures [22,136,137]. Telomerase levels and activity have been shown to be high in embryonic stem cells [79]. On the other hand, telomerase is reactivated and serves to maintain telomere length in most advanced cancers [34]. Taken together, the high overlap between hESCGESs pathways and tumor-associated pathways suggests that common mechanisms underlie cancerous malignancies and the "stemness" of hESCs.

Overlaps between hESCGESs TFs and tumor-associated TFs
We identified 73 groups of TF targets significant at the 0.05 threshold level. Among the 189 hESC-associated TF signatures, 42 TFs appeared in at least three different groups and the others did not appear in any group. The most frequently identified TF was MYC with a 56% occurrence rate (41 occurrences), followed by MYB with a 51% occurrence rate (37 occurrences). The complete list of 42 TFs with their occurrence frequencies is presented in Table 8.
Table 8 shows that a number of "stemness" TFs were identified as informative in tumors. MYC is one of the most important TFs in both hESCs and cancer cells [22,23,44,48,49,52,56,116,138-140]. MYC represses differentiation and maintains the self-renewal of mouse and human pluripotent stem cells [138,141]. MYC regulatory networks may account for most of the transcriptional similarity between embryonic stem cells and cancer cells [139]. The statistical significance analysis also shows that MYC plays an important role in most of the tumor types analyzed (see Additional file 10, Figure S4, Additional file 11, Figure S5 and Additional file 12, Figure S6).
Another extremely important TF is POU5F1 (OCT4), which is necessary for induction of pluripotent stem cells from human somatic cells [23,24]. Together with SOX2 and NANOG, OCT4 constitutes the core transcriptional regulatory circuitry in hESCs, which is essentially responsible for the early development and propagation of undifferentiated hESCs [43,44,52,56,58,59,79,84,97,116,117,119,142,143]. OCT4 expression appears to be important in maintaining the undifferentiated state of embryonal carcinoma [86,144], as well as in other cancers [27,145].
Our analysis results suggest that several families of hESC-associated TFs, such as MYB, E2F, PAX, SMAD, STAT, POU, SP and GLI, are related to cancer (Table 8). For example, three members of the MYB TF family, MYB, MYBL1 and MYBL2, appear to be closely associated with cancer (Table 8). In fact, a substantial number of studies have revealed that they have important roles in the regulation of stem cell self-renewal and differentiation [146,147], and in the development of cancer [148,149]. E2F plays a crucial role in controlling cell cycle progression and regulating the expression of genes required for the G1/S transition [150], and is therefore important for stem cell self-renewal and differentiation. The family members E2F1, E2F2, E2F3 and E2F4 have been reported to be associated with cancer [151]. PAX plays an essential role in regulating cell proliferation and self-renewal, resistance to apoptosis, migration of embryonic precursor cells, and the coordination of specific differentiation programs during embryonic development [59], as well as in the development of cancer [152].
SMAD regulates cell proliferation and differentiation by activating downstream TGFβ gene transcription. Its members play important roles in hESC fate determination [98] and cancerous pathogenesis [153]. STAT regulates cell growth, survival and differentiation via activation by JAK (Janus kinase). This pathway is critical for the regulation of stem cell self-renewal and differentiation [101]. Deregulation of this pathway is frequently observed in various tumor types [154]. POU family members mainly regulate the development of an organism, and are also involved in various cancers [155]. SP1 and SP3 are two members of the SP (Specificity Protein) TF family, which binds GC-rich DNA sequences. Their roles in hESCs and cancer cells have been widely recognized [26]. GLI encompasses three members, GLI1, GLI2 and GLI3, all of which mediate the Hedgehog pathway and are therefore involved in hESC fate determination and cancerous pathogenesis [87].
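The occurrence rates quoted for pathways and TFs are the number of groups in which a signature was significant divided by the total number of significant groups of that type. A short worked check using only the counts stated in the text is given below; the helper name `occurrence_rate` is hypothetical.

```python
def occurrence_rate(occurrences, n_groups):
    """Percentage of significant groups in which a signature appears,
    rounded to the nearest whole percent."""
    return round(100 * occurrences / n_groups)


# Counts taken from the text: 68 pathway groups and 73 TF groups.
reported = {
    "Cell Cycle pathway": (57, 68),
    "MAPK pathway": (50, 68),
    "MYC": (41, 73),
    "MYB": (37, 73),
}
for name, (hits, groups) in reported.items():
    print(f"{name}: {hits}/{groups} = {occurrence_rate(hits, groups)}%")
```

The results agree with the rates stated in the text (84%, 74%, 56% and 51%).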
In summary, the substantial overlap between the TFs involved in hESC fate determination and the TFs involved in cancerous pathogenesis suggests that hESCs and cancer cells may share essential regulatory mechanisms.

Overlaps between hESCGESs miRNAs and tumor-associated miRNAs
We identified 67 groups of miRNA targets significant at the 0.05 threshold level. Among the 114 hESC-associated miRNA signatures, 102 miRNAs appeared in at least eight different groups and the other 12 miRNAs did not appear in any group. The most frequently identified miRNA was miR-29c, which occurred 34 times (51% occurrence rate), followed by miR-200b, which occurred 30 times (45% occurrence rate). Table 9 lists the 50 miRNAs whose occurrence frequencies are no less than 20. The complete list of 102 miRNAs with their occurrence frequencies is presented in Additional file 13, Table S5.
Notably, there is a broad overlap between "stemness" miRNAs and oncogenic miRNAs. Most of the important "stemness" miRNAs are presented in Table 9 or Table S5. The miR-302 cluster miRNAs (miR-302a, miR-302a*, miR-302b, miR-302b*, miR-302c, miR-302c*, miR-302d) have been shown to regulate important cellular functions in hESCs, including cell proliferation and chromatin structure, and have been consistently reported to be overexpressed in hESCs [156]. All seven members of this group appear in Table S5, and five of them are also presented in Table 9, indicating their close linkage with cancer. Several studies have reported a relationship between the miRNA-302 family and tumorigenicity [157][158][159][160]. The miR-200 family miRNAs (miR-200a, miR-200b, miR-200c, miR-141 and miR-429) have been shown to be hESC-specific and upregulated in hESCs [156,161,162]. Three of them are presented in Table S5, and miR-200b and miR-200c are also listed in Table 9 with relatively high frequencies (30 and 26, respectively), strongly indicating their association with cancer. In fact, this miRNA family plays an important role in cancerous pathogenesis [163][164][165].
The miRNA-520 cluster on chromosome 19 is highly expressed in undifferentiated hESCs, and may be closely involved in hESC function [156,166]. Its eight members, miRNA-520a to miRNA-520h, appear in Table S5, and six of them, miRNA-520a to miRNA-520f, also appear in Table 9, suggesting that this miRNA family is tightly connected with cancer. Many studies have revealed a relationship between its members and cancer [167][168][169][170]. miR-518b, miR-518c, miR-519b and miR-519c have been consistently reported to be overexpressed in undifferentiated hESCs [156,166,171,172]. Our analysis suggests that they may be closely involved in the development of cancer (Table 9). This finding is supported by some studies [173,174]. In addition, the other miRNA families shown in Table 9, such as miRNA-29, 19, 15, 20 and let-7, have been shown to be involved in both hESC fate determination and cancerous pathogenesis [53,96,161,175].
The statistical significance analysis shows that some "stemness" miRNAs, such as the miR-29 family members miR-29a, miR-29b and miR-29c, are associated with a broad spectrum of tumor types (see Additional file 14, Figure S7, Additional file 15, Figure S8 and Additional file 16, Figure S9).
Taken together, a number of miRNAs play crucial roles in both hESC fate determination and tumorigenicity.

Discussion
Although the evidence supporting the CSC theory remains insufficient, and the fundamental experimental evidence for CSCs based on mouse xenograft models is controversial [21], the CSC model is attractive because it provides a reasonable explanation of the developmental mechanisms underlying cancer, as well as a promise of improved cancer therapies. Therefore, any evidence in favor of the CSC theory is valuable for the biology of cancer.
In this study, we provided indirect evidence for the CSC theory using a computational biology approach. We found a strong linkage between hESCs and cancer cells by examining the similarity between hESC-specific gene expression profiles and cancer-specific gene expression profiles. The hESC-specific gene expression signatures, including genes, pathways, TFs and miRNAs, were generally differentially expressed between normal and tumor phenotypes, or among cancer subtypes with distinct clinical outcomes. Genes important in the regulation of hESC self-renewal and differentiation, such as SOX2 and MYB, were also closely involved in tumorigenicity. Signal pathways involved in hESC fate determination, such as the Cell Cycle, MAPK, SHH, WNT, PRC2, Notch, PTEN and TGFβ pathways, were also strongly associated with cancer genesis, progression and prognosis. Typical hESC-specific TFs, such as OCT4 and c-Myc (also known as MYC), appeared to be important in control of the undifferentiated state of cancer cells. The miRNAs overexpressed in undifferentiated hESCs, such as the miRNA-302, 200 and 520 cluster miRNAs, were closely involved in the development of cancer.
Generally speaking, the cell cycle regulation mechanism mostly underlies the commonality between hESCs and cancer cells. Unlike somatic cells, hESCs have an abbreviated G1 phase in the cell cycle, which is critical for maintenance of hESC self-renewal and pluripotency. The abbreviated G1 phase is also largely responsible for the uncontrolled proliferation of tumor cells, which escape from programmed cell death during the G1 phase [62]. In fact, the hESC-associated signatures most frequently identified in tumors are mainly involved in regulation of the cell cycle (see Table 6, Table 7, Table 8 and Table 9). Among them, the TF c-Myc is the core signature connecting hESCs with cancer cells. c-Myc binds genic and intergenic regions to regulate the expression of thousands of genes and noncoding RNAs throughout the genome [138]. c-Myc is involved in cell cycle regulation by directly regulating cell cycle regulators [44,116,138], or by regulating miRNAs that inhibit cell cycle regulators [96,138]. The role of c-Myc in linking hESCs with cancer has been recognized [138,139].
Here we identified differentially expressed genes at the 0.05 significance level. A more stringent significance threshold of 0.001 would be statistically more appropriate when correction for multiple hypothesis testing is considered. However, at the 0.001 threshold the numbers of significant pathways, TFs and miRNAs identified by the gene set analyses would be small for a majority of datasets, even though the number of differentially expressed genes would often still be substantial. We therefore used the 0.05 significance level for all differential expression analyses for the sake of consistency.
One limitation of this study is that the analyses were mainly based on a computational biology approach, and the findings need experimental validation. In addition, some finer analyses, such as grouping the overlaps of gene signatures between hESCs and tumors by tumor category, or separating the differentially expressed genes into overexpressed and underexpressed genes, may contribute to a better understanding of the similarities between hESCs and tumor cells in gene expression profiles. Another limitation is that we identified tumor-associated gene expression signatures from whole tumor samples, in which the signal may derive from the majority of tumor cells and not necessarily from the minority of CSCs, so that the overlapping signatures identified between hESCs and tumors might not provide strong support for the CSC model. If the tumor-associated gene expression signatures were identified by comparing isolated CSC and non-CSC fractions of the same tumor, the results would provide more reliable support for the CSC model. These issues could be addressed in future research.
A further problem is the intertwined relationship between stem cells, cancer and ageing [176]. Cancer is an age-related disease, as the incidence of cancer grows exponentially with ageing. Meanwhile, ageing is mostly caused by a decline in the replicative function of stem cells [177], and in turn ageing affects the function of stem cells [178]. Thus, an in-depth investigation of the molecular mechanisms that connect stem cells, cancer and ageing will be necessary for postponing ageing and overcoming cancer.
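The trade-off between a fixed threshold and a multiple-testing correction can be illustrated with the Benjamini-Hochberg step-up procedure. This procedure is named here only as one common correction; it was not used in this study, and the function name `benjamini_hochberg` and the toy p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans marking the hypotheses rejected by the
    Benjamini-Hochberg procedure at false discovery rate alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Largest rank whose p-value lies under the line alpha * rank / m.
        if pvalues[idx] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Toy p-values for ten genes (invented for illustration).
toy_pvalues = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
n_raw = sum(p < 0.05 for p in toy_pvalues)
n_bh = sum(benjamini_hochberg(toy_pvalues, alpha=0.05))
print(n_raw, n_bh)
```

In this toy case, five genes pass the unadjusted 0.05 threshold but only two survive the correction, which mirrors the loss of significant signatures described above for a more stringent threshold.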

Conclusions
The present results revealed a close linkage between hESC-specific gene expression profiles and cancer-specific gene expression profiles, and therefore offer indirect support for the CSC theory. However, many issues of interest remain to be addressed.

Availability of supporting data
The 51 human cancer gene expression datasets are available at the following website: http://linus.nci.nih.gov/~brb/DataArchive_New.html. All the other datasets supporting the results of this article are included within the article and its additional files.
Additional file 3: Table S3. 114 hESC-associated microRNAs.
Additional file 7: Figure S1. Significance of overlap between hESC and tumor pathways by normal vs. tumor class comparison. The detailed description of all the datasets is provided in Additional file 6.
Additional file 8: Figure S2. Significance of overlap between hESC and tumor pathways by good vs. poor prognosis class comparison. The detailed description of all the datasets is provided in Additional file 6.
Additional file 9: Figure S3. Significance of overlap between hESC and tumor pathways by survival analysis. The detailed description of all the datasets is provided in Additional file 6.
Additional file 10: Figure S4. Significance of overlap between hESC and tumor TFs by normal vs. tumor class comparison. The detailed description of all the datasets is provided in Additional file 6.
Additional file 11: Figure S5. Significance of overlap between hESC and tumor TFs by good vs. poor prognosis class comparison. The detailed description of all the datasets is provided in Additional file 6.
Additional file 12: Figure S6. Significance of overlap between hESC and tumor TFs by survival analysis. The detailed description of all the datasets is provided in Additional file 6.
Additional file 13: Table S5. 102 miRNAs identified at least in eight different groups.
Additional file 14: Figure S7. Significance of overlap between hESC and tumor miRNAs by normal vs. tumor class comparison. The detailed description of all the datasets is provided in Additional file 6.
Additional file 15: Figure S8. Significance of overlap between hESC and tumor miRNAs by good vs. poor prognosis class comparison. The detailed description of all the datasets is provided in Additional file 6.
Additional file 16: Figure S9. Significance of overlap between hESC and tumor miRNAs by survival analysis. The detailed description of all the datasets is provided in Additional file 6.