No3CoGP: non-conserved and conserved coexpressed gene pairs

Mal, Chittabrata; Aftabuddin, Md; Kundu, Sudip

doi:10.1186/1756-0500-7-886

Technical Note
Open access
Published: 08 December 2014

Chittabrata Mal¹, Md Aftabuddin² & Sudip Kundu¹,³

BMC Research Notes volume 7, Article number: 886 (2014)

Abstract

Background

By analyzing microarray data from different conditions, one can identify conserved and condition-specific genes and gene modules, and thereby infer the underlying cellular activities. The available tools, built on Bioconductor and R packages, differ in how they extract differential coexpression and at what level they study it. There is a need for a user-friendly, flexible tool that can start the analysis from raw or preprocessed microarray data and report useful information at different levels.

Findings

We present a GUI software, No3CoGP (Non-Conserved and Conserved Coexpressed Gene Pairs), which takes Affymetrix microarray data (.CEL files or log2-normalized .txt files) along with an annotation file (.csv file), a Chip Description File (CDF file) and a probe file as inputs, and uses the concept of a network density cut-off and Fisher’s z-test to extract biologically relevant information. It can identify four possible types of gene pairs based on their coexpression relationships: (i) gene pairs coexpressed in one condition but not in the other, (ii) gene pairs positively coexpressed in one condition but negatively coexpressed in the other, (iii) gene pairs positively coexpressed in both conditions and (iv) gene pairs negatively coexpressed in both conditions. Further, it can generate modules of coexpressed genes.

Conclusion

An easy-to-use GUI enables researchers without knowledge of the R language to use No3CoGP. The program uses one or more CPU cores, depending on availability, which speeds up execution. The output files, stored in the respective directories under the user-defined project, allow researchers to unravel condition-specific functionalities of genes, gene sets or modules.

Findings

Background

Analysis of the differential expression of a gene, and of the coexpression of gene pairs and gene sets, in one condition (class) compared to another helps researchers to unravel condition-specific functionalities of genes, gene sets or modules [1]. Important biological phenomena that can be unravelled from this analysis include the loss of function of a protein complex, functional gene modules in a diseased condition [2] and the combinatorial regulation of genes [3]. The algorithms and software available for this purpose vary mainly in how they quantify coexpression [reviewed in [4]] and differential coexpression [5], and in the biological questions they address. While some detect coexpressed gene pairs using a measure of correlation (e.g., the Pearson Correlation Coefficient), others such as ARACNE [6] and CLR [7] use mutual information. Among the tools used to identify differential coexpression, DICER can identify sets of genes showing significantly higher coexpression in one class than in the other [8]. The recently developed C3D can detect both common and condition-specific clusters based on a sophisticated statistical algorithm; however, it depends on MATLAB [9]. Among the R packages, the previous version of DCGL can detect significant changes in coexpression between conditions, while the latest version (DCGL 2.0) can identify differential regulation [3].

Here, we have developed a user-friendly, flexible, standalone software, No3CoGP, which takes Affymetrix microarray data (.CEL files or log2-normalized .txt files) along with an annotation file (.csv file), a Chip Description File (CDF file) and a probe file as inputs and can identify four possible types of gene pairs based on their coexpression relationships. To extract the biologically significant gene pairs, we have implemented several levels of filtering, including the network density cut-off suggested by Aoki et al. [10]. Of the four possible types, two are non-conserved gene pairs: (i) pairs coexpressed in one class but not in the other (differentially coexpressed non-conserved gene pair, DCNCGP) and (ii) pairs positively coexpressed in one class but negatively coexpressed in the other (contra-coexpressed non-conserved gene pair, CCNCGP). The other two are conserved gene pairs: (iii) pairs positively coexpressed in both classes (positively coexpressed conserved gene pair, PCCGP) and (iv) pairs negatively coexpressed in both classes (negatively coexpressed conserved gene pair, NCCGP). Finally, modules of the different types of coexpressed genes can be generated.

Implementation

The program is written in Java, uses one or more of the available CPU cores during extensive calculations, and has been tested in Linux and Windows environments. Run-time depends on the size of the input data. The GUI consists of a top menu bar, project steps, job status, a parameters-used box and a message box (Figure 1). In the console, the user can view various parameters, status and results. Since the computational algorithm of No3CoGP is layered into multiple steps, the user can halt the process after the completion of any intermediate step and resume the job later. The workflow of the software is given in Figure 2.

Results and discussion

The functionality and applicability of No3CoGP have been demonstrated in a case study. We tested and analyzed microarray data (samples from series GSE19326) of three tomato tissues (leaf, peel and flesh) to find conserved and non-conserved gene pairs. Using ‘UniGene’ as the annotation id, ‘GCRMA’ as the normalization method and Pearson correlation as the correlation measure, we found several gene pairs and modules at different significance levels (see Additional file 1: Coexpressed gene pairs and modules). Further analysis is required to identify the significant relationships among the genes present in the modules and to understand the underlying principles of gene regulation.

Any tool analyzing coexpression in large microarray data sets must meet a few challenges, including (i) reduction of the execution time, (ii) extraction of biologically relevant information and (iii) generation of different levels of information using the same tool. To reduce the run-time effectively, we have implemented several measures. To reduce the network size while retaining as much biologically significant information as possible, No3CoGP first determines the network density cut-off value from the initial reference coexpression network. Based on this cut-off value, the coexpressed gene pairs are filtered to construct a new network for further analysis. Flexibility lies in setting the parameters according to the depth of the study. Users can choose two further filtering options: selecting gene pairs with higher r-values, and selecting statistically significant r-values within the two classes. Moreover, No3CoGP can use multiple CPU cores, depending on their availability at the user’s end. Our tool works with Affymetrix annotation files and data files for a wide range of species to identify differentially coexpressed genes and gene modules. Using a biologically and statistically significant r cut-off value for the coexpression study, it can identify different types of conserved and non-conserved gene pairs and gene modules that are functionally significant.

Traditional differential expression analysis may not detect changes in the regulatory pattern of a gene. In contrast, changes in the structure of the coexpression network may have predictive power to identify candidate disease genes [11]. To unravel complex dysregulations, one must examine the differentially coexpressed genes at a systems biology level [reviewed in [12]]. Choi et al. [2] observed a correlation between increased (or decreased) coexpression interactions and enhancement (or inactivation) of functional interactions. However, an increase or decrease in the correlation of a gene pair may be due to the up- or down-regulation of other genes in the same functional category. Modular coexpression analysis can potentially highlight novel disease-causing biomarkers. Here, the gene modules identified by the IPCA algorithm would be useful for finding potential functions of the genes and the genetic dysregulations among the classes. Further, the output files may be used to calculate node topology statistics such as node degree, betweenness, closeness and other network properties using Cytoscape [13], Pajek [14] or NeAT [15]. The user can visualize the graphs, compare them and perform functional analysis.

No3CoGP key functionalities

Data import

Affymetrix microarray data (.CEL files or log2-normalized .txt files) and the annotation files of interest have to be kept in the respective directories of the project. The CDF file and probe file for each project must be pre-installed. A text file containing the class names should be created and given as input (see the online documentation for details).

Probe mapping

Any one of the available annotation ids, ‘UniGene ID’, ‘Gene Symbol’, ‘Ensembl’ or ‘Entrez Gene’, can be used for the analysis. Affymetrix probeset ids are mapped to their annotation ids, and only probesets that map to a single id are retained.
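The retention rule above can be sketched in a few lines of Python. This is only an illustration, not the tool's Java code: the function name `map_probesets`, the probeset ids and the UniGene-style ids are hypothetical, and the sketch assumes the common Affymetrix annotation convention in which multiple ids in one field are separated by `///` and a missing id is written as `---`.

```python
def map_probesets(rows):
    """Return {probeset_id: annotation_id} for unambiguous probesets only.

    rows: iterable of (probeset_id, annotation_field) pairs, where the
    field may hold several ids separated by '///' (assumed convention).
    """
    mapping = {}
    for probeset_id, field in rows:
        # Split the field, dropping blanks and the '---' missing marker.
        ids = {part.strip() for part in field.split("///")}
        ids = {i for i in ids if i and i != "---"}
        # Keep the probeset only if it maps to exactly one annotation id.
        if len(ids) == 1:
            mapping[probeset_id] = ids.pop()
    return mapping


# Hypothetical rows: one clean mapping, one ambiguous, one unannotated.
example_rows = [
    ("probe_A_at", "Les.100"),
    ("probe_B_at", "Les.200 /// Les.300"),
    ("probe_C_at", "---"),
]
example_mapping = map_probesets(example_rows)
```

Here only `probe_A_at` survives, because the other two probesets map to two ids and to no id, respectively.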

Data preprocessing

Users can provide either raw microarray data or log2-normalized data as input. If raw data are used, either RMA or GCRMA from the R packages may be used for normalization; this step is skipped if log2-normalized data are provided. When multiple probes map to the same gene and their expression values differ in a particular condition, we follow the method used by Dozmorov et al. [16] to set the expression value of that gene. Consider three probes (P1, P2 and P3) that map to gene G1, and the expression of G1 in two samples (Sample 1 and Sample 2), where the expression of the three probes varies within each sample. If probe P1 has the highest expression in Sample 1, the expression of G1 in Sample 1 is set to the value of P1. If probe P3 has the highest expression in Sample 2, the expression of G1 in Sample 2 is set to the value of P3 in Sample 2.
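The P1/P2/P3 example can be written out as a short Python sketch. The function name `collapse_probes_to_genes` and the expression values are invented for illustration; the only rule taken from the text is that, in each sample, the gene receives the highest value among its probes.

```python
def collapse_probes_to_genes(probe_expr, probe_to_gene):
    """Collapse probe-level expression to gene level, sample by sample.

    probe_expr: {probe: [value in sample 1, value in sample 2, ...]}
    probe_to_gene: {probe: gene}
    """
    gene_expr = {}
    for probe, values in probe_expr.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # probe without a retained annotation id
        if gene not in gene_expr:
            gene_expr[gene] = list(values)
        else:
            # Per sample, keep the larger of the stored and the new value.
            gene_expr[gene] = [max(a, b) for a, b in zip(gene_expr[gene], values)]
    return gene_expr


# Invented log2 values for two samples.
probe_expr = {"P1": [9.1, 6.0], "P2": [7.4, 6.5], "P3": [8.0, 8.8]}
probe_to_gene = {"P1": "G1", "P2": "G1", "P3": "G1"}
gene_expr = collapse_probes_to_genes(probe_expr, probe_to_gene)
```

With these values, G1 takes 9.1 in Sample 1 (from P1) and 8.8 in Sample 2 (from P3).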

Filtering based on coexpression values

Coexpression values of gene pairs are calculated using either the Pearson Correlation Coefficient (r) or Spearman’s rank Correlation Coefficient (ρ). The Pearson coefficient is

$$r = \frac{n \sum x_{i} y_{i} - \sum x_{i} \sum y_{i}}{\sqrt{n \sum x_{i}^{2} - {(\sum x_{i})}^{2}} \sqrt{n \sum y_{i}^{2} - {(\sum y_{i})}^{2}}}$$

where x_i are the values in the first set of data, y_i are the values in the second set of data and n is the total number of values. Spearman’s coefficient is

$$\rho = 1 - \frac{6 \sum d_{i}^{2}}{n (n^{2} - 1)}$$

where n is the number of paired ranks and d_i is the difference between the paired ranks.

A coexpression network is constructed by treating genes as nodes and linking two genes if |r| > 0.6 [10, 17, 18]. The density (D) of the network for each r value is calculated as

$$D = \frac{2E}{K (K - 1)}$$

where E is the number of actual links and K is the number of non-singleton nodes. To extract useful information from the gene coexpression data, we follow the method described in [10]: only the gene pairs that satisfy |r| > r_bs are considered in the next steps, where r_bs is the r-value at which the density of the coexpression network is minimum. Users can further choose statistically significant gene pairs (any of the top 1%, 5%, 10%, 50% or all, from those with |r| > 0.6) for their analysis. The P-values of the r-values are determined by calculating the t statistic

$$t = \frac{r \sqrt{d - 2}}{\sqrt{1 - r^{2}}}$$

where d = degrees of freedom. Its significance level, P, is given by Student’s Distribution Probability Function [19]: P = A(t|d).
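The Pearson formula, the density formula and the choice of r_bs can be illustrated with a minimal Python sketch. It is not the tool's implementation: the function names, the toy r-values and the grid of candidate thresholds are assumptions made for the example, and how No3CoGP actually scans r values is not stated here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient, term by term as in the formula for r."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    den = math.sqrt(n * sxx - sx * sx) * math.sqrt(n * syy - sy * sy)
    return (n * sxy - sx * sy) / den if den else 0.0


def network_density(edges):
    """D = 2E / (K(K-1)), with K counting only nodes that have a link."""
    nodes = {gene for pair in edges for gene in pair}
    k = len(nodes)
    if k < 2:
        return None  # density is undefined for an empty network
    return 2 * len(edges) / (k * (k - 1))


def density_cutoff(pair_r, thresholds):
    """Return (threshold, density) where the density is lowest."""
    best = None
    for t in thresholds:
        edges = [pair for pair, r in pair_r.items() if abs(r) > t]
        d = network_density(edges)
        if d is not None and (best is None or d < best[1]):
            best = (t, d)
    return best


# Toy r-values for four genes (invented numbers).
pair_r = {
    ("a", "b"): 0.90, ("b", "c"): 0.80, ("c", "d"): 0.80,
    ("a", "c"): 0.65, ("a", "d"): 0.65, ("b", "d"): 0.65,
}
r_bs, d_min = density_cutoff(pair_r, [0.6, 0.7, 0.85])
```

In this toy network the density is 1.0 at a threshold of 0.6 (all six links), 0.5 at 0.7 (a four-node chain) and 1.0 again at 0.85 (a single link), so the sketch selects 0.7 as r_bs.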

Identification of conserved and non-conserved gene pairs

A gene pair is defined as conserved if it is coexpressed in both classes with the same sign. Conserved pairs are of two types: (i) PCCGP (if the r-value is positive) and (ii) NCCGP (if the r-value is negative). In contrast, a gene pair that is coexpressed in only one class, or coexpressed in both classes but with different signs, is defined as a non-conserved gene pair; these are termed DCNCGP and CCNCGP, respectively. When the coexpression values of a gene pair in the two classes are compared, only gene pairs with significantly different r-values in the two classes are considered. Gene pairs are identified at the user-defined significance level (1%, 5% or 100%) by Fisher’s z-test [18]. If there are two correlations r₁ and r₂ with sample sizes n₁ and n₂, both are transformed into Fisher’s Z values,

$$Z_{i} = \frac{1}{2} \ln \left( \frac{1 + r_{i}}{1 - r_{i}} \right), \quad i = 1, 2.$$

Under the null hypothesis that the population correlations are equal, the Z value

$$Z = \frac{| Z_{1} - Z_{2} |}{\sqrt{\frac{1}{n_{1} - 3} + \frac{1}{n_{2} - 3}}}$$

has an approximately normal distribution.
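The four pair types and Fisher’s z-test can be sketched as follows. The function names are hypothetical, a single coexpression cut-off is applied to both classes for simplicity, and the P-value is computed as two-sided from the standard normal distribution; whether No3CoGP uses a one- or two-sided test is an assumption not settled by the text.

```python
import math


def fisher_z(r):
    """Fisher transformation: Z = 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))


def fisher_z_test(r1, n1, r2, n2):
    """Return (Z, two-sided P) for the difference between two correlations."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = abs(fisher_z(r1) - fisher_z(r2)) / se
    # For a standard normal variable, P(|N| > z) = erfc(z / sqrt(2)).
    return z, math.erfc(z / math.sqrt(2))


def classify_pair(r1, r2, cutoff):
    """Label a gene pair from its r-values in class 1 and class 2."""
    co1, co2 = abs(r1) > cutoff, abs(r2) > cutoff
    if co1 and co2:
        if r1 > 0 and r2 > 0:
            return "PCCGP"   # positively coexpressed in both classes
        if r1 < 0 and r2 < 0:
            return "NCCGP"   # negatively coexpressed in both classes
        return "CCNCGP"      # opposite signs in the two classes
    if co1 or co2:
        return "DCNCGP"      # coexpressed in only one class
    return None              # not coexpressed in either class


# Invented example: strong coexpression in class 1, none in class 2.
label = classify_pair(0.9, 0.1, cutoff=0.7)
z_value, p_value = fisher_z_test(0.9, 20, 0.1, 20)
```

In this invented example the pair is labelled DCNCGP, and with 20 samples per class the difference between r = 0.9 and r = 0.1 is significant at the 1% level.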

Identification of modules

The gene modules are identified using the IPCA algorithm [20], which uses a combination of subgraph diameter and subgraph density. Li et al. [20] showed that IPCA identifies complexes better than previously proposed clustering algorithms, including DPClus, CFinder, LCMA, MCODE, RNSC and STM. The IPCA algorithm requires a threshold (T_in, ranging between 0 and 1) and a diameter (d), which is a positive integer.
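To give a feel for how the two parameters might act, the sketch below tests whether a gene may join a growing module. It is a simplified reading of the idea, not the IPCA algorithm as published or as implemented in No3CoGP: it assumes that a candidate is admitted when the fraction of module members it is linked to is at least T_in and the diameter of the enlarged module does not exceed d. The function names and the toy network are hypothetical.

```python
from collections import deque


def subgraph_diameter(nodes, adj):
    """Longest shortest path inside the induced subgraph (inf if disconnected)."""
    nodes = set(nodes)
    longest = 0
    for start in nodes:
        dist = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search restricted to `nodes`
            u = queue.popleft()
            for w in adj.get(u, ()):
                if w in nodes and w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) < len(nodes):
            return float("inf")
        longest = max(longest, max(dist.values()))
    return longest


def can_join(v, module, adj, t_in, d):
    """Assumed admission test: enough links to the module, small diameter."""
    links = sum(1 for u in module if u in adj.get(v, ()))
    interaction_fraction = links / len(module)
    enlarged = set(module) | {v}
    return interaction_fraction >= t_in and subgraph_diameter(enlarged, adj) <= d


# Toy undirected network: triangle a-b-c with a tail c-d-e.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e"}, "e": {"d"},
}
module = {"a", "b", "c"}
```

With this network, gene `d` is linked to one of three module members, so it is rejected at T_in = 0.5 and accepted at T_in = 0.3 with d = 2; gene `e` has no link into the module and is rejected on the diameter test regardless of T_in.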

Results output

By following the dialog windows, users can obtain a full list of the gene pairs coexpressed in a specific class, and of the positively, negatively or contra-coexpressed gene pairs, with their r-values and P-values. The result files are stored in the respective subdirectories of the ‘Output’ directory under the user-defined project. Similarly, information on the gene modules is saved in two different files under the respective ‘Modules’ subdirectories. One contains each module number along with its participating genes; the other contains two-column data (related gene pairs) that can be used in other network analysis software such as Cytoscape [13], Pajek [14] and NeAT [15].

Conclusion

Starting from Affymetrix raw data or log2-normalized data, No3CoGP provides a user-friendly platform to identify both conserved and non-conserved gene pairs and to generate modules of genes. Within the conserved category, the user can obtain positively and negatively coexpressed gene pairs separately. Within the non-conserved category, the tool separates the contra-coexpressed gene pairs from the gene pairs that are coexpressed in only one class. Using up-to-date annotation files provided by the users and a biologically and statistically significant r-value cut-off identified by the tool itself, No3CoGP expedites the extraction of useful information. It can use single or multiple CPU cores, depending on availability. The output files stored in the respective directories can be used for further analysis.

Availability and requirements

Project name: No3CoGP
Project home page: http://www.bioinformatics.org/no3cogp/
Operating system(s): Linux/Unix, Windows (32 bit and 64 bit)
Programming language: Java
Other requirements: Java JRE 1.7 or higher, R, Bioconductor packages (see online documentation)
License: NA
Any restrictions to use by non-academics: None.

Availability of supporting data

Sample data: http://www.bioinformatics.org/no3cogp/downloads/sample_data.zip
Additional file: http://www.bioinformatics.org/no3cogp/downloads/additional_file.xls

Abbreviations

CDF: Chip description file
GUI: Graphical user interface
CPU: Central processing unit

References

1. de la Fuente A: From ‘differential expression’ to ‘differential networking’: identification of dysfunctional regulatory networks in diseases. Trends Genet. 2010, 26 (7): 326-333. 10.1016/j.tig.2010.05.001.
2. Choi JK, Yu U, Yoo OJ, Kim S: Differential coexpression analysis using microarray data and its application to human cancer. Bioinformatics. 2005, 21 (24): 4348-4355. 10.1093/bioinformatics/bti722.
3. Yang J, Yu H, Liu B-H, Zhao Z, Liu L, Ma L-X, Li Y-X, Li Y-Y: DCGL v2.0: An R package for unveiling differential regulation from differential co-expression. PLoS One. 2013, 8 (11): e79729. 10.1371/journal.pone.0079729.
4. Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ: Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ. 2009, 32 (12): 1633-1651. 10.1111/j.1365-3040.2009.02040.x.
5. Amar D: Using differential co-expression for dissecting biological processes and revealing disease specific gene regulation. PhD thesis. Tel-Aviv University. 2012.
6. Nemenman I, Basso K, Wiggins C, Stolovitzky G, Favera RD, Califano A, Margolin AA: ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinf. 2006, 7 (Suppl 1): S7. 10.1186/1471-2105-7-S1-S7.
7. Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS: Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol. 2007, 5 (1): e8. 10.1371/journal.pbio.0050008.
8. Amar D, Safer H, Shamir R: Dissection of regulatory networks that are altered in disease via differential co-expression. PLoS Comput Biol. 2013, 9 (3): e1002955. 10.1371/journal.pcbi.1002955.
9. Xiao X, Moreno-Moral A, Rotival M, Bottolo L, Petretto E: Multi-tissue analysis of co-expression networks by higher-order generalized singular value decomposition identifies functionally coherent transcriptional modules. PLoS Genet. 2014, 10 (1): e1004006. 10.1371/journal.pgen.1004006.
10. Aoki K, Ogata Y, Shibata D: Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol. 2007, 48 (3): 381-390. 10.1093/pcp/pcm013.
11. Zhang B, Gaiteri C, Bodea L-G, Wang Z, McElwee J, Podtelezhnikov AA, Zhang C, Xie T, Tran L, Dobrin R, Fluder E, Clurman B, Melquist S, Narayanan M, Suver C, Shah H, Mahajan M, Gillis T, Mysore J, MacDonald ME, Lamb JR, Bennett DA, Molony C, Stone DJ, Gudnason V, Myers AJ, Schadt EE, Neumann H, Zhu J, Emilsson V: Integrated systems approach identifies genetic nodes and networks in late-onset Alzheimer’s disease. Cell. 2013, 153 (3): 707-720. 10.1016/j.cell.2013.03.030.
12. Gaiteri C, Ding Y, French B, Tseng GC, Sibille E: Beyond modules and hubs: the potential of gene coexpression networks for investigating molecular mechanisms of complex brain disorders. Genes, Brain and Behavior. 2014, 13 (1): 13-24. 10.1111/gbb.12106.
13. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003, 13 (11): 2498-2504. 10.1101/gr.1239303.
14. Batagelj V, Mrvar A: Pajek: program for large network analysis. Connections. 1998, 21 (2): 47-57.
15. Brohée S, Faust K, Lima-Mendez G, Sand O, Janky R, Deville Y, van Helden J, Vanderstocken G: NeAT: a toolbox for the analysis of biological networks, clusters, classes and pathways. Nucleic Acids Res. 2008, 36 (suppl 2): W444-W451.
16. Dozmorov MG, Giles CB, Wren JD: Predicting gene ontology from a global meta-analysis of 1-color microarray experiments. BMC Bioinf. 2011, 12 (Suppl 10): S14. 10.1186/1471-2105-12-S10-S14.
17. Elo LL, Lahesmaa R, Aittokallio T, Järvenpää H: Systematic construction of gene coexpression networks with applications to human T helper cell differentiation process. Bioinformatics. 2007, 23 (16): 2096-2103. 10.1093/bioinformatics/btm309.
18. Fukushima A, Nishizawa T, Hayakumo M, Hikosaka S, Saito K, Goto E, Kusano M: Exploring tomato gene functions based on coexpression modules using graph clustering and differential coexpression approaches. Plant Physiol. 2012, 158 (4): 1487-1502. 10.1104/pp.111.188367.
19. Shannon CE, Weaver W: The Mathematical Theory of Communication. 1949, Urbana, IL: University of Illinois Press.
20. Li M, Hu B, Chen G, Chen J-e: Modifying the DPClus algorithm for identifying protein complexes based on new topological structures. BMC Bioinf. 2008, 9 (1): 398. 10.1186/1471-2105-9-398.

Acknowledgements

The authors are grateful to http://www.bioinformatics.org for hosting the No3CoGP tool and to the Distributed Information Center (DIC), University of Calcutta, for computational facilities. CM acknowledges the Department of Science and Technology - Promotion of University Research and Scientific Excellence (DST-PURSE), Government of India, and SK acknowledges the Center of Excellence in Systems Biology and Biomedical Engineering (TEQIP Phase II) for financial support.

Author information

Authors and Affiliations

Department of Biophysics, Molecular Biology & Bioinformatics, University of Calcutta, 92, A.P.C. Road, Kolkata, 700009, India
Chittabrata Mal & Sudip Kundu
West Bengal University of Technology, BF-142, Salt Lake, Sector I, Kolkata, 700064, India
Md Aftabuddin
Center of Excellence in Systems Biology and Biomedical Engineering (TEQIP Phase II), University of Calcutta, Kolkata, India
Sudip Kundu

Corresponding author

Correspondence to Sudip Kundu.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

CM and SK conceived the study and drafted the manuscript. MA and CM performed the designing, coding and debugging. CM carried out software testing and manuscript preparation. All authors have read and approved the final manuscript.

Electronic supplementary material

13104_2014_3391_MOESM1_ESM.xls

Additional file 1: Coexpressed gene pairs and modules. The .xls file contains the results for the tomato test data. The index sheet describes the other eight sheets. (XLS 10 MB)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Mal, C., Aftabuddin, M. & Kundu, S. No3CoGP: non-conserved and conserved coexpressed gene pairs. BMC Res Notes 7, 886 (2014). https://doi.org/10.1186/1756-0500-7-886

Received: 05 August 2014
Accepted: 17 November 2014
Published: 08 December 2014
DOI: https://doi.org/10.1186/1756-0500-7-886
