CORAZON: a web server for data normalization and unsupervised clustering based on expression profiles

Ramos, Thaís A. R.; Maracaja-Coutinho, Vinicius; Ortega, J. Miguel; do Rêgo, Thaís G.

doi:10.1186/s13104-020-05171-6

Research note
Open access
Published: 14 July 2020

Thaís A. R. Ramos^1,2, Vinicius Maracaja-Coutinho^1,2,3, J. Miguel Ortega^4 & Thaís G. do Rêgo^1,5

BMC Research Notes volume 13, Article number: 338 (2020)

Abstract

Objective

Data normalization and clustering are mandatory steps in gene expression analysis and in downstream analyses, respectively. However, user-friendly implementations of these methodologies are available only under expensive licensing agreements or as stand-alone scripts, which is a major obstacle for users with limited computational skills.

Results

We developed an online tool called CORAZON (Correlations Analyses Zipper Online), which implements three unsupervised learning methods to cluster gene expression datasets in a user-friendly environment. It offers eight gene expression normalization/transformation methodologies and a way to assess the influence of each attribute on the results. The normalizations that require the gene length can only be applied to RNA-seq data, whereas the others can also be used with microarray and/or NanoString data. The performance of the clustering methodologies was evaluated on five models, with accuracies between 92 and 100%. We applied our tool to obtain functional insights into non-coding RNAs (ncRNAs) based on Gene Ontology enrichment of clusters in a dataset generated by the ENCODE project. The clusters in which the majority of transcripts are coding genes were enriched in Cellular, Metabolic, Transport, and Systems Development categories, whereas the clusters dominated by ncRNAs were enriched in the Detection of Stimulus, Sensory Perception, Immunological System, and Digestion categories. CORAZON source code is freely available at https://gitlab.com/integrativebioinformatics/corazon and the web server can be accessed at http://corazon.integrativebioinformatics.me.

Introduction

Gene expression is the process by which information encoded in a particular genomic region is transcribed into a functional gene product. These products can be coding or non-coding RNAs; the latter are transcripts that do not encode a protein but are functionally important players in cellular regulation in organisms from all domains of life [1,2,3,4,5,6]. Microarrays and RNA sequencing (RNA-seq) are large-scale technologies commonly used to measure transcript expression levels [7,8,9,10,11,12]. Both technologies generate a final expression matrix containing the raw values for all biological samples in a study, which is subsequently used to obtain the set of transcripts differentially expressed across the studied samples and conditions.

Gene expression values can be influenced by different variables (e.g. biological conditions, expression technology, sequencing library size, RNA quality), which skew the number of reads or hybridization signals associated with particular samples and distort the measured expression values. For a proper and reliable interpretation of quantitative gene expression measurements, normalization is necessary to correct the expression bias generated by these variables. Different data normalization approaches have been described so far. For instance, in many studies, a single housekeeping gene is used for normalization. However, no unequivocal single reference gene or non-coding RNA (with a proven invariable expression between cells and conditions) has been described yet [13]. As an alternative, the mean expression of multiple genes can be used for normalization [13, 14]. In RNA-seq, gene expression values are usually normalized by the size of the library.

The large quantity of biological data generated in large-scale genomics and transcriptomics projects has created an intense demand for computational techniques from artificial intelligence [15,16,17,18]. Unsupervised learning is the machine learning task of inferring a function that describes the hidden structure of unlabeled data. In gene expression analysis, this inference rests on the observation that genes with the same expression patterns across the same time points and conditions often participate in the same biological processes. Unsupervised methods treat the expression values of each transcript as the coordinates of a point in a given space and group the points according to their similarity, forming clusters. The goal of gene expression clustering is to subdivide sets of expressed transcripts so that those with similar expression patterns fall into the same cluster, while those with different expression patterns fall into different clusters [19]. This allows a deeper exploration of the data. For instance, transcripts co-expressed across a set of different experiments or conditions tend to be part of the same biological pathways and may share gene ontology categories [20,21,22,23,24,25]. Clustering therefore helps to assign putative functions to transcripts without any functional annotation, as well as to identify co-regulated transcripts.

Packages and libraries available for R, Perl or Python provide normalization and clustering methods that can be used for gene expression analysis. However, using these tools requires prior knowledge of these programming languages, which is a major obstacle for users with limited computational or bioinformatics backgrounds. Here, we introduce CORAZON (Correlations Analyses Zipper Online), a user-friendly web server developed to facilitate expression data normalization and clustering in a streamlined way, and apply it to obtain functional insights into ncRNAs based on their expression patterns and gene ontology enrichment.

Main text

Materials and methods

CORAZON implementation and clustering methods validation using simulated data sets

The CORAZON web server was developed with eight normalization/transformation methodologies (https://corazon.integrativebioinformatics.me/documentation.html): Trimmed Mean of M-values (TMM) [26], Median Ratio Normalization (MRN) [27], Fragments Per Kilobase Million (FPKM), Transcripts Per Million (TPM), Counts Per Million (CPM), base-2 log, instance normalization and normalization by the highest attribute value for each instance. For the normalizations that require the transcript size (e.g. FPKM and TPM), we assume that the second column of the input holds this value. In addition, we incorporated three unsupervised machine learning algorithms (Mean Shift, K-Means and Hierarchical), which adopt Euclidean distance as the measure of similarity, and a strategy to observe the influence of the attributes on the results.
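The length-dependent and depth-dependent normalizations listed above can be sketched with their standard textbook definitions. This is a minimal illustration, not CORAZON's own code: the function names, the toy matrix and the pseudocount of 1 before the log transform are assumptions, and the transcript identifier column is assumed to have been dropped already so that the length sits in the first numeric column.

```python
import numpy as np

def cpm(counts):
    """Counts Per Million: scale each sample (column) to a library size of 1e6."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=0, keepdims=True) * 1e6

def fpkm(counts, lengths_bp):
    """Fragments Per Kilobase Million: CPM divided by transcript length in kb."""
    lengths_kb = np.asarray(lengths_bp, dtype=float)[:, None] / 1e3
    return cpm(counts) / lengths_kb

def tpm(counts, lengths_bp):
    """Transcripts Per Million: length-normalize first, then scale each sample to 1e6."""
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float)[:, None] / 1e3
    rate = counts / lengths_kb
    return rate / rate.sum(axis=0, keepdims=True) * 1e6

# Toy matrix (made-up values): rows are transcripts, column 0 is the
# transcript length in bp, the remaining columns are raw counts per sample.
table = np.array([
    [1000, 10, 20],
    [2000, 20, 40],
    [500,  70, 40],
])
lengths, counts = table[:, 0], table[:, 1:]
tpm_values = tpm(counts, lengths)
log_tpm = np.log2(tpm_values + 1.0)  # the pseudocount of 1 is an assumption
```

Both CPM and TPM columns sum to one million per sample; TPM differs from FPKM only in applying the length correction before the depth correction, which makes TPM values comparable across samples.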

The normalizations, the K-Means and Mean Shift clustering algorithms and the web server application were implemented in Python. Hierarchical clustering was implemented in R. MySQL was used to store and query the job results, as well as to handle communication and interaction with the web page. The interface was developed using HTML, CSS, Bootstrap, and JavaScript. The CORAZON source code, with Docker support, is freely available at https://gitlab.com/integrativebioinformatics/corazon and the web server can be accessed at http://corazon.integrativebioinformatics.me.
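As a rough illustration of the three clustering families named above, the sketch below runs them on a toy expression matrix with scikit-learn. It is not the CORAZON code: the text does not say which Python libraries were used, hierarchical clustering was implemented in R by the authors (scikit-learn's `AgglomerativeClustering` is swapped in here for convenience), and the data and parameters are made up.

```python
import numpy as np
from sklearn.cluster import KMeans, MeanShift, AgglomerativeClustering

rng = np.random.default_rng(0)

# Toy expression matrix: 60 transcripts x 4 samples, drawn around three
# well-separated expression profiles (made-up values, illustration only).
centers = np.array([[0, 0, 0, 0], [8, 8, 0, 0], [0, 0, 8, 8]], dtype=float)
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 4)) for c in centers])

# K-means and Ward hierarchical clustering need the number of clusters;
# both work with Euclidean distance.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
ward_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)

# Mean shift infers the number of clusters from the data density; here the
# bandwidth is left to scikit-learn's default estimate.
meanshift_labels = MeanShift().fit_predict(X)
```

The practical difference is that K-means and hierarchical clustering require a cluster count up front, while mean shift derives it from the bandwidth, which is consistent with the different cluster numbers the methods can return on the same data.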

The performance of the implemented algorithms was evaluated on five models commonly used to validate clustering methodologies. The simulated models were built following [28, 29]. For each model, we generated 50 datasets and applied the three implemented algorithms.

Application using expression data of human coding and non-coding transcripts

We used our tool to study an RNA-seq dataset of 13 different tissues extracted from ENCODE [30]. Our goal was to obtain functional insights for ncRNAs, through the exploration of gene ontology functional categories of protein-coding genes co-expressed with ncRNAs. The expression matrix for all 13 tissues was extracted from [30]. Data were normalized using TPM and log₂, and clustered using the three available algorithms.

Results

CORAZON web server overview and usage

CORAZON is a streamlined web server that facilitates data normalization and uses machine learning to cluster transcripts according to their expression patterns. It receives as input an expression matrix, which can be used for different tasks, according to user preference. Briefly, the user can use the tool only to normalize expression data, only to cluster the transcripts according to their expression patterns, or both. Figure 1 shows the workflow of the CORAZON tool.

Algorithms performance evaluation using simulated data

The performance of the implemented clustering algorithms was evaluated on five models commonly used to validate clustering methodologies [28, 29]. The first model consists of 200 points in 10 dimensions; the second, of 3 clusters in 2 dimensions; the third, of 4 clusters in 3 dimensions; the fourth, of 4 clusters in 10 dimensions; and the last, of 2 elongated clusters in 3 dimensions. For each model, we generated 50 datasets and applied the three algorithms implemented in the CORAZON web server. The algorithms achieved accuracies ranging between 92 and 100%.
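The evaluation loop described above can be sketched for one of the models (3 clusters in 2 dimensions). The text does not state how accuracy was computed, so the sketch assumes one common choice: the fraction of points correctly assigned under the best one-to-one matching between predicted clusters and true groups. The cluster centers, spread, sample sizes and function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans

def clustering_accuracy(true_labels, pred_labels):
    """Fraction of points correctly assigned under the best one-to-one
    matching between predicted clusters and true groups."""
    true_ids, true_idx = np.unique(true_labels, return_inverse=True)
    pred_ids, pred_idx = np.unique(pred_labels, return_inverse=True)
    contingency = np.zeros((len(pred_ids), len(true_ids)), dtype=int)
    np.add.at(contingency, (pred_idx, true_idx), 1)
    rows, cols = linear_sum_assignment(-contingency)  # maximize matched points
    return contingency[rows, cols].sum() / len(true_labels)

def simulate_three_clusters_2d(rng, n_per_cluster=50):
    """Three Gaussian clusters in two dimensions (hypothetical centers and spread)."""
    centers = np.array([[0.0, 0.0], [6.0, 0.0], [3.0, 6.0]])
    X = np.vstack([c + rng.normal(size=(n_per_cluster, 2)) for c in centers])
    y = np.repeat(np.arange(3), n_per_cluster)
    return X, y

rng = np.random.default_rng(42)
accuracies = []
for _ in range(50):  # 50 simulated datasets per model, as in the text
    X, y = simulate_three_clusters_2d(rng)
    pred = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
    accuracies.append(clustering_accuracy(y, pred))
mean_accuracy = float(np.mean(accuracies))
```

The matching step is needed because cluster labels are arbitrary: a clustering that recovers the true groups perfectly but numbers them differently should still score 1.0.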

Functional insights of non-coding RNAs based on their expression patterns and gene ontology enrichment

We applied CORAZON to obtain functional information about ncRNAs based on the Gene Ontology (GO) enrichment of protein-coding genes clustered together with ncRNAs, using a dataset composed of 13 RNA-seq assays from different human tissues generated by the ENCODE project. To select the best number of clusters for the K-means and hierarchical algorithms, we used the Bayesian information criterion (BIC) [31], followed by the derivative of the discrete function and the Silhouette method [32]. For the hierarchical method, we tested 8 linkage criteria and adopted Ward’s method [33]. In total, we analyzed 41,283 transcripts (19,912 coding; 21,371 non-coding), which were grouped into 10 clusters by K-means and hierarchical clustering and into 13 clusters by mean shift (Additional file 1: Table S1). All three algorithms identified sets of clusters composed mostly (more than 70%) of non-coding RNAs. In the GO enrichment analysis, the clusters composed mostly of coding genes were usually enriched in cellular, metabolic, detection of stimulus, sensory perception, and systems development categories. The clusters composed mostly of ncRNAs were enriched in coding genes associated with reproduction, development, immunological system, neurological system, localization, and digestion categories. An example of these results for hierarchical clustering can be found in Fig. 2. Results for K-means and mean shift can be found in Additional file 1: Figures S1 and S2, respectively.
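The Silhouette part of the cluster-number selection can be illustrated as follows. This sketch covers only that step: the BIC computation and the derivative-based knee detection used by the authors are omitted, the function name `best_k_by_silhouette` is hypothetical, and the toy data are made up.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_k_by_silhouette(X, k_values):
    """Return the k with the highest mean silhouette width, plus all scores."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores[k] = silhouette_score(X, labels, metric="euclidean")
    return max(scores, key=scores.get), scores

# Toy data: four well-separated groups in three dimensions (made-up values).
rng = np.random.default_rng(1)
centers = np.array([[0, 0, 0], [10, 0, 0], [0, 10, 0], [0, 0, 10]], dtype=float)
X = np.vstack([c + rng.normal(size=(30, 3)) for c in centers])

k, scores = best_k_by_silhouette(X, range(2, 8))
```

The silhouette width compares, for each point, the mean distance to its own cluster with the mean distance to the nearest other cluster, so it peaks when clusters are compact and well separated. On real expression data the peak is usually much flatter than in this toy case, which is one reason to combine it with a second criterion such as BIC.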

To gain further insight into the putative biological relevance of ncRNAs whose expression levels correlate with those of coding genes, we used the three implemented algorithms to generate clusters of highly correlated transcripts (i.e. Spearman > 0.95). The correlation analysis revealed a set of 17,732 correlated transcripts (4829 coding genes and 12,903 non-coding RNAs). The hierarchical and K-means algorithms generated three clusters, whereas mean shift generated four (Additional file 1: Table S2). Each algorithm generated two clusters composed mainly (more than 50%) of non-coding RNAs. The gene ontology enrichment analysis revealed that these clusters were associated with coding genes related to different metabolic processes, localization and inflammatory and defense responses (Fig. 3).
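One plausible reading of the correlation filter is sketched below: a transcript is kept if its Spearman correlation with at least one other transcript exceeds 0.95. The text does not spell out the exact selection rule, so this criterion, the function names and the toy matrix are all assumptions.

```python
import numpy as np
from scipy.stats import rankdata

def spearman_matrix(X):
    """Spearman correlation between rows of X (Pearson correlation of ranks)."""
    ranks = np.apply_along_axis(rankdata, 1, np.asarray(X, dtype=float))
    return np.corrcoef(ranks)

def highly_correlated_rows(X, threshold=0.95):
    """Indices of rows whose correlation with at least one other row
    exceeds the threshold."""
    rho = spearman_matrix(X)
    np.fill_diagonal(rho, -np.inf)  # ignore each row's correlation with itself
    return np.flatnonzero((rho > threshold).any(axis=1))

# Toy matrix: 4 transcripts x 6 tissues (made-up values).
X = np.array([
    [1, 2, 3, 4, 5, 6],
    [2, 4, 5, 8, 9, 20],  # same ranking as row 0
    [6, 5, 4, 3, 2, 1],   # reversed ranking
    [3, 1, 6, 2, 5, 4],   # unrelated ranking
])
selected = highly_correlated_rows(X)
```

In this toy case only rows 0 and 1 are retained, since they share the same ranking across tissues. For tens of thousands of transcripts the full correlation matrix becomes very large, so a real analysis would likely compute it in blocks.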

Discussion

CORAZON implements normalization/transformation methodologies that can be used with RNA-seq, microarray and/or NanoString nCounter data. It is worth noting that microarray and NanoString data can only use the normalization methods that do not require the transcript size. These methodologies can normalize gene expression while taking into account different characteristics of the data (e.g. sequencing depth, transcript length, samples with disproportionate expression values). We successfully applied the tool to characterize the expression patterns of coding and non-coding genes from 13 different tissues generated by the ENCODE project. Co-expressed transcripts are normally part of common biological pathways and functional GO categories, or they can be regulated by similar mechanisms [20,21,22,23,24,25]. First, all 41,283 expressed transcripts (19,912 coding and 21,371 non-coding) were clustered according to their expression values, using the three unsupervised clustering algorithms incorporated in CORAZON. This analysis yielded 10 clusters for the hierarchical and K-means algorithms and 13 clusters for the mean shift algorithm. GO analysis revealed that most of the clusters generated by the three algorithms are enriched in similar biological process categories, associated with key general cellular processes (e.g. metabolic processes, transport, systems development, detection of stimulus, RNA processing, sensory perception, immunological system, digestion, reproduction, synaptic signaling, neurological system and defense response). Thus, the similarity of the cluster enrichment results across methods (from hierarchical to partition-based) strengthens the hypothesis that the transcripts clustered together take part in similar biological processes.

Furthermore, we observed that clusters enriched in coding genes (i.e. composed of more than 80% coding genes) are related to GO terms associated with general metabolic processes, development, and cell adhesion. Clusters enriched in ncRNAs (i.e. more than 70% non-coding genes) are related to coding genes associated with reproduction, immunological system, neurological system, localization, and digestion. These results suggest that the ncRNAs clustered together with coding genes associated with the functional categories listed above could also take part in cellular processes directly linked to these mechanisms. The role of ncRNAs in most of these processes has been widely studied, revealing their involvement in the regulation of proper cell functioning and in disease (e.g. neurological disorders and cancers) [34,35,36,37,38,39,40,41]. For instance, [42] used the enrichment of functional GO annotations of coding genes located in the vicinity of ncRNAs, and noted that the two groups with the highest number of ncRNAs were associated with “synaptic transmission” (47 non-coding RNAs) and “generation of male gametes” (20 ncRNAs). This finding is consistent with previous studies and reinforces that ncRNAs are particularly active in the brain or during embryonic development.

When CORAZON was used to cluster highly correlated transcripts (i.e. Spearman > 0.95), each algorithm generated two clusters composed mostly (more than 50%) of ncRNAs. Those clusters were associated with different metabolic processes, localization, and inflammatory and defense responses. Other clusters showed specificities in cellular, metabolic, localization, transport and response processes. Finally, clusters composed mostly of coding genes (i.e. more than 82%) were related to metabolic processes. We also observed that hierarchical cluster 1 (93.33% coding genes) and K-means cluster 2 (93.69% coding genes) were almost identical.

In summary, CORAZON simplifies gene expression normalization and unsupervised clustering. The results obtained in this study illustrate the potential of the tool and the possibility of obtaining functional insights from clusters through predicted associations between ncRNAs and the functional categories of the coding genes clustered with them. Other methodologies for gene expression data normalization are available in the literature (e.g. quantile and RMA for microarrays; RLE for RNA-seq [43, 44]) that are not yet incorporated in our tool, but we intend to implement them in the near future.

Limitations

The CORAZON architecture works with a process queue, which can result in long waiting times if hundreds of users submit jobs at the same time. We are currently working on the parallelization of the tool to avoid this issue.

Availability of data and materials

Some of the data analysed during this study were obtained from the article: Lin S, Lin Y, Nery JR, Urich MA, Breschi A, Davis CA, Dobin A, Zaleski C, Beer MA, Chapman WC, Gingeras TR, Ecker JR, Snyder MP: Comparison of the transcriptional landscapes between human and mouse tissues. Proceedings of the National Academy of Sciences of the United States of America. 2014, 111(48):17224-17229. https://doi.org/10.1073/pnas.1413624111

Abbreviations

BIC: Bayesian Information Criterion
CORAZON: Correlations Analyses Zipper Online
CPM: Counts Per Million
CSS: Cascading Style Sheets
FPKM: Fragments Per Kilobase Million
HTML: Hypertext Markup Language
ID: Job Identifier Number
MRN: Median Ratio Normalization
mRNA: Messenger RNA
MySQL: My Structured Query Language
ncRNA: Non-coding RNA
RNA: Ribonucleic acid
RNA-Seq: RNA sequencing
TMM: Trimmed Mean of M-values
TPM: Transcripts Per Million

References

1. Mattick JS. The central role of RNA in the genetic programming of complex organisms. An Acad Bras Ciênc. 2010;82:933–9. https://doi.org/10.1590/S0001-37652010000400016.
2. Oliveira KC, et al. Non-coding RNAs in schistosomes: an unexplored world. An Acad Bras Ciênc. 2011;83:673–94. https://doi.org/10.1590/S0001-37652011000200026.
3. Storz G, et al. Regulation by small RNAs in bacteria: expanding frontiers. Mol Cell. 2011;43:880–91. https://doi.org/10.1016/j.molcel.2011.08.022.
4. Gomes-Filho JV, et al. Sense overlapping transcripts in IS1341-type transposase genes are functional non-coding RNAs in archaea. RNA Biol. 2015;12:490–500. https://doi.org/10.1080/15476286.2015.1019998.
5. Tycowski KT, et al. Viral noncoding RNAs: more surprises. Genes Dev. 2015;29:567–84. https://doi.org/10.1101/gad.259077.115.
6. Orell A, et al. A regulatory RNA is involved in RNA duplex formation and biofilm regulation in Sulfolobus acidocaldarius. Nucleic Acids Res. 2018;46:4794–806. https://doi.org/10.1093/nar/gky14.
7. Schena M, et al. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science. 1995;270:467–70. https://doi.org/10.1126/science.270.5235.467.
8. Schena M. Microarray biochip technology. Eaton Publishing: Sunnyvale; 2000. ISBN: 1881299376, 9781881299370.
9. Tarca AL, et al. Analysis of microarray experiments of gene expression profiling. Am J Obstet Gynecol. 2006;195:373–88. https://doi.org/10.1016/j.ajog.2006.07.001.
10. Clark TA, et al. Genomewide analysis of mRNA processing in yeast using splicing-specific microarrays. Science. 2002;296:907–10. https://doi.org/10.1126/science.1069415.
11. Nagalakshmi U, et al. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science. 2008;320:1344–9. https://doi.org/10.1126/science.1158441.
12. Wang Z, et al. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009;10:57–63. https://doi.org/10.1038/nrg2484.
13. de Kok J, et al. Normalization of gene expression measurements in tumor tissues: comparison of 13 endogenous control genes. Lab Invest. 2005;85:154–9. https://doi.org/10.1038/labinvest.3700208.
14. McCarthy DJ, et al. Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R. Bioinformatics. 2017;33:1179–86. https://doi.org/10.1093/bioinformatics/btw777.
15. Aloisio G, et al. Progengrid: A Grid Framework for Bioinformatics. In: Apolloni B, Marinaro M, Tagliaferri R, eds. Biological and Artificial Intelligence Environments. Springer: Dordrecht; 2005. ISBN: 978-1-4020-3432-9.
16. Ezziane Z. Applications of artificial intelligence in bioinformatics: a review. Expert Syst Appl. 2006;30:2–10. https://doi.org/10.1016/j.eswa.2005.09.042.
17. De Brito DM, et al. A novel method to predict genomic islands based on mean shift clustering algorithm. PLoS ONE. 2016. https://doi.org/10.1371/journal.pone.0146352.
18. Chakraborty I, Choudhury A. Artificial intelligence in biological data. J Inform Tech Softw Eng. 2017. https://doi.org/10.4172/2165-7866.1000207.
19. D’haeseleer P. How does gene expression clustering work? Nat Biotechnol. 2005;23:1499–501. https://doi.org/10.1038/nbt1205-1499.
20. Fachel A, et al. Expression analysis and in silico characterization of intronic long noncoding RNAs in renal cell carcinoma: emerging functional associations. Mol Cancer. 2013;12:1–23. https://doi.org/10.1186/1476-4598-12-140.
21. Necsulea A, et al. The evolution of lncRNA repertoires and expression patterns in tetrapods. Nature. 2014;505:635–40. https://doi.org/10.1038/nature12943.
22. Hao Y, et al. Prediction of long noncoding RNA functions with co-expression network in esophageal squamous cell carcinoma. BMC Cancer. 2015;15:1–10. https://doi.org/10.1186/s12885-015-1179-z.
23. Wu W, et al. Tissue-specific Co-expression of Long non-coding and coding RNAs associated with breast cancer. Sci Rep. 2016;6:1–13. https://doi.org/10.1038/srep32731.
24. Li S, et al. Exploring functions of long noncoding RNAs across multiple cancers through co-expression network. Sci Rep. 2017. https://doi.org/10.1038/s41598-017-00856-8.
25. Russo P, et al. CEMiTool: a Bioconductor package for performing comprehensive modular co-expression analyses. BMC Bioinform. 2018;19:1–13. https://doi.org/10.1186/s12859-018-2053-1.
26. Robinson MD, Oshlack A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 2010;11:R25. https://doi.org/10.1186/gb-2010-11-3-r25.
27. Maza E, Frasse P, Senin P, Bouzayen M, Zouine M. Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments: a matter of relative size of studied transcriptomes. Commun Integr Biol. 2013;6:e25849. https://doi.org/10.4161/cib.25849.
28. Tibshirani R, et al. Estimating the Number of Clusters in a Dataset via the Gap Statistic. J Roy Stat Soc. 2001;63:411–23. https://doi.org/10.1111/1467-9868.00293.
29. Dudoit S, Fridlyand J. A prediction-based resampling method for estimating the number of clusters in a dataset. Genome Biol. 2002;3:1–21. https://doi.org/10.1186/gb-2002-3-7-research0036.
30. Lin S, et al. Comparison of the transcriptional landscapes between human and mouse tissues. Proc Natl Acad Sci. 2014;111:17224–9. https://doi.org/10.1073/pnas.1413624111.
31. Zhao Q, et al. Knee Point Detection on Bayesian Information Criterion. In: 2008 20th IEEE international conference on tools with artificial intelligence, Dayton; 2008, p. 431–38. https://doi.org/10.1109/ictai.2008.154.
32. Rousseeuw PJ. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math. 1987;20:53–65. https://doi.org/10.1016/0377-0427(87)90125-7.
33. Murtagh F, Legendre P. Ward’s Hierarchical Agglomerative clustering method: which algorithms implement ward’s criterion? J Classif. 2014;31:274–95. https://doi.org/10.1007/s00357-014-9161-z.
34. Taylor DH, et al. Long non-coding RNA regulation of reproduction and development. Mol Reprod Dev. 2005;82:932–56. https://doi.org/10.1002/mrd.22581.
35. Liu KS, et al. Advances of Long Noncoding RNAs-mediated Regulation in Reproduction. Chin Med J. 2018;131:226–34.
36. Chen YG, et al. Gene regulation in the immune system by long noncoding RNAs. Nat Immunol. 2017;18:962–72. https://doi.org/10.1038/ni.3771.
37. Matamala JM, et al. Genome-wide circulating microRNA expression profiling reveals potential biomarkers for amyotrophic lateral sclerosis. Neurobiol Aging. 2018;64:123–38. https://doi.org/10.1016/j.neurobiolaging.2017.12.020.
38. Roberts TC, et al. The role of long non-coding RNAs in neurodevelopment, brain function and neurological disease. Philos Trans R Soc Lond B Biol Sci. 2014. https://doi.org/10.1098/rstb.2013.0507.
39. Salta E, De Strooper B. Noncoding RNAs in neurodegeneration. Nat Rev Neurosci. 2017;18:627–40. https://doi.org/10.1038/nrn.2017.90.
40. Wang GY, et al. The functional role of long non-coding RNA in digestive system carcinomas. Bull Cancer. 2014;9:E27–31. https://doi.org/10.1684/bdc.2014.2023.
41. Zhou DD, et al. Long non-coding RNA PVT1: emerging biomarker in digestive system cancer. Cell Prolif. 2017. https://doi.org/10.1111/cpr.12398.
42. Liao Q, et al. Large-scale prediction of long non-coding RNA functions in a coding–non-coding gene co-expression network. Nucleic Acids Res. 2011;39:3864–78. https://doi.org/10.1093/nar/gkq1348.
43. Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010;11:R106. https://doi.org/10.1186/gb-2010-11-10-r106.
44. Maza E. In Papyro comparison of TMM (edgeR), RLE (DESeq2), and MRN normalization methods for a simple two-conditions-without-replicates RNA-seq experimental design. Front Genet. 2016;7:164. https://doi.org/10.3389/fgene.2016.00164.

Acknowledgements

The authors would like to thank Dr. Savio Torres de Farias for the helpful discussions during the preparation of this manuscript.

Funding

This work was funded in part by grants from Fondecyt Iniciación, Comisión Nacional de Investigación Científica y Tecnológica (CONICYT), Chile, grant 11161020; Programa Nacional de Inserción de Capital Humano Avanzado en la Academia, PAI-CONICYT, Chile, grant PAI79170021; Fondo de Financiamiento de Centro de Investigación en Áreas Prioritárias (FONDAP), CONICYT, grant 15130011; Programa de Bienes Públicos Estratégicos para la Competitividad, Corporación de Fomento de la Producción (CORFO), Chile, grant 16BPE-62321; Subsidio Semilla de Asignación Flexible (SSAF), CORFO, grant 14-SSAF-27061-9; and Programa Start-Up Chile, CORFO, grant SUP12-13791. TARR received a Master degree fellowship from Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Brazil.

Author information

Authors and Affiliations

Programa de Pós-Graduação em Bioinformática, Bioinformatics Multidisciplinary Environment (BioME), Instituto Metrópole Digital, Universidade Federal do Rio Grande do Norte, Natal, Brazil
Thaís A. R. Ramos, Vinicius Maracaja-Coutinho & Thaís G. do Rêgo
Advanced Center for Chronic Diseases (ACCDiS), Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Santiago, Chile
Thaís A. R. Ramos & Vinicius Maracaja-Coutinho
Instituto Vandique, João Pessoa, Brazil
Vinicius Maracaja-Coutinho
Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
J. Miguel Ortega
Departamento de Informática, Centro de Informática, Universidade Federal da Paraíba, João Pessoa, Brazil
Thaís G. do Rêgo

Contributions

TARR wrote the tool’s scripts and developed the web server. TARR, VMC, JMO and TGR wrote and reviewed the manuscript. VMC, JMO and TGR conceived and supervised the research. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Vinicius Maracaja-Coutinho, J. Miguel Ortega or Thaís G. do Rêgo.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1.

Additional figures and tables.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Ramos, T.A.R., Maracaja-Coutinho, V., Ortega, J.M. et al. CORAZON: a web server for data normalization and unsupervised clustering based on expression profiles. BMC Res Notes 13, 338 (2020). https://doi.org/10.1186/s13104-020-05171-6

Received: 17 March 2020
Accepted: 03 July 2020
Published: 14 July 2020
DOI: https://doi.org/10.1186/s13104-020-05171-6
