Global gene expression profiling of oral cavity cancers suggests molecular heterogeneity within anatomic subsites

Background Oral squamous cell carcinoma (OSCC) is a frequent neoplasm, which is usually aggressive and has unpredictable biological behavior and unfavorable prognosis. The comprehension of the molecular basis of this variability should lead to the development of targeted therapies as well as to improvements in specificity and sensitivity of diagnosis. Results Samples of primary OSCCs and their corresponding surgical margins were obtained from male patients during surgery and their gene expression profiles were screened using whole-genome microarray technology. Hierarchical clustering and Principal Components Analysis were used for data visualization and One-way Analysis of Variance was used to identify differentially expressed genes. Samples clustered mostly according to disease subsite, suggesting molecular heterogeneity within tumor stages. In order to corroborate our results, two publicly available datasets of microarray experiments were assessed. We found significant molecular differences between OSCC anatomic subsites concerning groups of genes presently or potentially important for drug development, including mRNA processing, cytoskeleton organization and biogenesis, metabolic process, cell cycle and apoptosis. Conclusion Our results corroborate literature data on molecular heterogeneity of OSCCs. Differences between disease subsites and among samples belonging to the same TNM class highlight the importance of gene expression-based classification and challenge the development of targeted therapies.


Background
Head and neck squamous cell carcinomas (HNSCC) rank among the top ten most common cancers worldwide, with increasing rates in certain areas of the world [1]. They can occur at different subsites, often invading more than one, each one with their own particular problems regarding management. In this study, we focused on oral squamous cell carcinoma (OSCC), a frequent neoplasm, which is usually aggressive and has unpredictable biological behavior and unfavorable prognosis. The accumulation of genetic alterations during oral tumorigenesis, leading to qualitative and quantitative changes in gene expression, is currently known but still largely unexplored [2,3]. Unlike estrogen receptor or HER2 in breast cancer, no biomarkers are currently available for HNSCC prognosis, which depends largely on the stage at presentation, with the most important prognostic factor being the presence of neck node metastases [4]. Improvements in specificity and sensitivity of diagnosis and in disease monitoring depend on the elucidation of the biological and molecular mechanisms underlying tumor development. In this context, large-scale transcriptome analysis may be used to assess patterns of gene expression, providing an opportunity to investigate the complex cascade of molecular events leading to tumor development and progression [5,6].
Two recent publications have examined the global gene expression profiles of OSCC, both addressing the importance of understanding the molecular complexity of this malignancy [7,8]. Unlike those authors, we did not focus on molecular differences between OSCC and normal mucosa, but we compared gene expression profiles of OSCC samples from different anatomic subsites. The biologic behavior of OSCC cannot be predicted and clinical reports suggest that molecular heterogeneity of anatomic subsites could play an important role in this scenario. For instance, cancer of the tongue seems to grow faster than other oral cavity cancers, with cervical metastases occurring more frequently in such cancers [9]. Moreover, significant differences between floor of the mouth and tongue cancers in response to combined surgery and radiotherapy have been reported [10].
In order to evaluate molecular differences between OSCC anatomic subsites, we investigated samples of primary tongue and floor of the mouth tumors, and their corresponding surgical margins, in respect to their gene expression profile by means of DNA microarrays.

Methods
We selected samples of primary tongue and floor of the mouth tumors, and their corresponding surgical margins, from 9 male patients during surgery for squamous cell carcinomas. Samples consisted of either small tumors but already with evidences of metastasis (T2N+), or larger, non-metastatic samples (T3N0). Patients had a history of tobacco and alcohol abuse and no prior cancer treatment. A full description of the clinical data, including tumor stage is provided in Additional File 1. All participants provided written consent and the research protocol was approved by review boards of all involved institutions. Analysis of haematoxylin and eosin-stained sections indicated that each OSCC sample contained at least 70% tumor cells and the corresponding surgical margins were reported "tumor-free". Total RNA was extracted with TRI-ZOL Reagent (Invitrogen). Tissues adjacent to resected tumors were grouped into two pools of RNA for microarray analysis, corresponding to T2N+ and T3N0 samples, due to the limited amounts of tumor-free margin available in most cases.
Microarray experiments were carried out using CodeLink Whole Genome Bioarrays (GE Healthcare) and arrays were scanned on a GenePix 4000B Array Scanner (Axon Instruments), according to the recommended scanning procedures and settings. The data were treated with Code-Link feature extraction software v.4.0. A normalized signal for each transcript was obtained through quantile normalization [11]. The array design and raw data files are available at the Gene Expression Omnibus database (GEO) under the accession number GSE9792. For global gene expression visualization, we used hierarchical clustering and Principal Components Analysis (PCA). Hierarchical clustering was performed using the Euclidean distance and the average linkage algorithm. One-way ANOVA was used to identify significant differences in the dataset. All the above-mentioned analyses were generated using Partek ® software version 6.3 Copyright © 2007 Partek Inc., St. Louis, MO, USA. Functional annotations of differentially expressed genes were performed using Database for Annotation, Visualization and Integrated Discovery [12] using the parameters Gene Ontology (GO) Molecular Process term level 5 and KEGG Pathways (Kyoto Encyclopedia of Genes and Genomes) [13].
In order to corroborate our findings, two publicly available microarray datasets on OSCC were used in addition to our data (GEO accession number GSE3524 and ArrayExpress accession number TABM302). Both experiments were carried out using Affymetrix HG-U133 GeneChips; GSE3524 corresponds to the profiling of 7 tongue and 9 floor of the mouth samples, while TABM302 corresponds to the profiling of 18 tongue samples and 10 samples reported as oral cavity. Statistical analysis and functional annotation of the differentially expressed genes followed the procedures previously described for our dataset (Additional Files 2 and 3).

Results and Discussion
By analyzing the dendogram resulting from hierarchical clustering and the PCA, we observed that OSCC samples did not group according to their pathological stages (Figures 1 and 2, respectively). In agreement with previous reports [14,15], two major clusters reflected differences in global gene expression between non-tumoral tissues (cluster 1) and tumor samples (cluster 2). Tumor samples clustered mostly, but not exclusively, according to disease anatomic subsites. This result suggests molecular heterogeneity within TNM classes and corroborates previous data published by Ziober et al. [15] and Chung et al [16]. In the later study, there were no consistent differences in gene expression among HNSCC subsites, but samples from oral cavity seemed more heterogeneous than samples from other sites when expression patterns were evaluated. Few studies have aimed to understand the molecular background of OSCC. In the context of molecular target therapies, drugs are designed to interact with specific molecules present in certain types or subtypes of tumors. Due to the heterogeneity within HNSCC, a thoroughly comprehension of the molecular characteristics underlying its subsites is needed before efficient therapy can be achieved.

Differential gene expression profiles
Using hierarchical clustering and PCA, two main groups of tumor samples (2a and 2b) were observed ( Figure 1). Each group contained, mostly, either tongue or floor of the mouth samples. Differentially expressed genes between the two clusters were identified using ANOVA (Additional File 4). These genes (total number 1579) were functionally annotated using Gene Ontology (GO) and KEGG terms in an attempt to understand major molecular differences between the groups (Tables 1 and 2). Due to our small sample number, our results were compared with two additional OSCC microarray datasets, GSE3524 and TABM302. Despite broad differences in experimental Hierarchical clustering Figure 1 Hierarchical clustering. The analysis was performed using the Euclidean distance and the average linkage algorithm. N1: Surgical margins from T2N+ samples; N2: Surgical margins from T3N0 samples. Case1-9: Tumor samples described in Additional File 1.
design, probe sets, and possibly genetic background, some findings were consistent and are pointed out below.
Extensive differences in gene expression between clusters 2a and 2b were observed in respect to translation and mitosis-related GO terms (GO:0006412:translation, GO:0000087:M phase of mitotic cell cycle; GO:0007067:mitosis; GO:0000279:M phase). Translation initiation is regulated in response to mitogenic stimulation, thus coupled with cell cycle progression and cell growth. Several alterations in translational control occur in cancer, but there is still much to be discovered about their role in cancer development and progression [17]. Changes in the expression or availability of components of the translational machinery can lead to global changes, such as an increase in the overall rate of protein synthesis and translational activation of the mRNA molecules involved in cell growth and proliferation. In agreement, differences between the two clusters were enriched in GO terms Ribonucleotide Biosynthetic Process, RNA Process-   -07 C15ORF15, CEBPG, DARS2, DDX1, DENR, EEF1A1, EIF1, EIF1AY,  EIF2B1, EIF2B3, EIF2C4, EIF3S7, EIF4A1, EIF4EBP1, ETF1, GSPT1,  HARSL, HBS1L, IGF2BP2 ing and mRNA Metabolic Process as well as in KEGG Proteasome, Cell Cycle, TGFB Signaling and Ribosome Pathways. Components of the translation machinery and related pathways represent good targets for cancer therapy [18,19]. The differential expression of a number of genes involved in such processes suggests that the efficacy of oral cancer therapy should vary depending on the tumor subsite. Molecular mechanisms leading to differences in gene expression (i.e variations in mRNA sequences, changes in the availability of the translational machinery components, and activation of translation through abnormally activated signal transduction pathways) deserve to be investigated.

Three-dimensional Principal Components Analysis
Differences in the expression of cell cycle progressionrelated genes were also observed in both GEO (GSE3524) and ArrayExpress (TABM302) datasets. In respect to the former, several genes associated with the KEGG MAPK Signaling Pathway presented differential expression when tongue samples were compared with floor of the mouth samples. Similar results were observed for GO terms concerning transcriptional regulation and apoptosis (Additional File 5). In agreement with our results, TABM302 dataset showed differential expression in genes associated with the GO terms Maintenance of Chromatin Architecture and DNA Packaging, both essential in cell cyclerelated processes, and with the KEGG Ribosome Pathway (Additional File 6). Proliferation and cell death are also regulated by genes associated with the GO terms Small GTPase Mediated Signal Transduction and Regulation of Ras Protein Signal Transduction, both represented several times in the list of differentially expressed genes of the TABM302 dataset.
With respect to cell cycle and apoptosis, we also observed the over-expression in cluster 2b of three genes involved in the regulation of p27 phosphorylation during cell cycle progression (CKS1B, RBX1 and SKP1A) and genes from the MCM complex (minichromosomal maintenance genes), including MCM7 (Additional File 4). MCM7 protein is a marker for proliferation and it is upregulated in different tumors including neuroblastoma, prostate, cervical and hypopharyngeal carcinomas [20]. SKP1A, MCM3, MCM6 and MCM7 were also differentially expressed in TABM302 dataset (Additional File 3), however, their overexpression was observed in tongue samples.
Widespread differences were also observed in oxidative phosphorylation pathways (Tables 2 and 3). Genes belonging to KEGG Oxidative Phosphorylation and Citrate Cycle (TCA cycle) Pathways were over-expressed in cluster 2b. Oxidative phosphorylation of ADP to ATP accompanies the oxidation of a metabolite through the operation of the respiratory chain. According to Warburg [21], alterations of respiratory machinery should result in compensatory increase in glycolytic ATP production and lead to carcinogenesis. In fact, malignant cells meet their energy (ATP) needs by glycolysis rather than through oxydative phosphorylation, probably an adaptation to hypoxia that develops as tumor grows [22]. In accordance with our results, some authors have recently found a bioenergetic signature in several human tumors [23]. Our observations suggest that distinct OSCC subsites differ on their glycolytic phenotype, a matter that deserves further investigation and may impact tumor progression management as well as therapy outcome. Genes related to KEGG Oxidative Phosphorylation Pathway also showed differences in expression in GSE3524 dataset when subsites were compared; the same was observed for genes related to the GO term Organelle ATP Synthesis Coupled Electron Transport (Additional File 5).
The expression of several cytochrome oxidases and ATPases was also distinct between subsites in TABM302 (Additional File 3).
KEGG Pathway analysis revealed genes over-expressed in cluster 2a which are associated with gap junctions and adherens junctions, as well as cytoskeleton organization and biogenesis (Table 4), including ACTN, DOCK1, GSN, ITG, LIMK, RAS, SOS, SSH and TIAM1. In agreement with our data, genes related to KEGG Regulation of Actin Cytoskeleton and Adherens Junction Pathways were also found differentially expressed in TABM302 (Additional File 6). Cancer therapy has focused on such pathways. For example, microtubule-targeted drugs interfere with the ability of the cancer cells to divide and multiply by disrupting microtubules [24]. Gene-silencing may reduce expression of proteins involved in regulating the actin cytoskeleton, as LIMK, rendering cells more responsive to anticancer agents [25]. Therefore, molecular differences between OSCC subsites may affect sensitivity to chemotherapy.

Conclusion
Gene expression profiling of lymph node positive and lymph node negative HNSCC has been addressed in the literature [14,15,26,27]. Besides the differences in study designs, the lack of consistency concerning HNSCC gene expression signatures could be due to gene expression variation related to subsites. This is a preliminary study and as such presents results on a small number of samples.  However, the idea of molecular heterogeneity within HNSCC is not new. We corroborate previously reported literature data and we suggest that differences between anatomic OSCC subsites could impact drug response and should be considered during the development of targeted therapies.