Characterization of an Atlantic cod (Gadus morhua) embryonic stem cell cDNA library
© Olsvik et al; licensee BioMed Central Ltd. 2009
Received: 27 November 2008
Accepted: 06 May 2009
Published: 06 May 2009
The Atlantic cod is an ecologically and economically important North Atlantic fish species and an emerging aquaculture species. To study gene expression in Atlantic cod embryonic stem (ES) cells, our goal was to generate and analyze approximately 3,900 expressed sequence tags (ESTs) from an ES cell cDNA library.
We sequenced 3,935 EST clones from a directional cDNA library made from pooled ES cells harvested at the blastula stage. Quality filtering of these ESTs yielded 2,719 high-quality sequences with an average length of 442 bp, which assembled into 368 contigs and 1,276 singletons (1,644 unique sequences). BLASTX searches produced 889 significant hits (E-value < 10⁻³), of which 698 (42.5% of the unique sequences) were annotated with Gene Ontology terms (E-value < 10⁻⁶). The remaining 946 unique sequences (57.5%) were unknown. All the high-quality EST sequences have been deposited in GenBank (2,719 sequences; UniGene library, dbEST ID 22021). Gene discovery and annotations are presented and discussed.
This set of ESTs represents one of the first attempts to describe mRNA in ES cells from a marine cold-water fish species, and provides a basis for gene expression studies of Atlantic cod ES cells.
Embryonic stem (ES) cells in culture, and their development into different lineages, constitute a unique model system that provides a means to identify extracellular factors that influence embryonic cell differentiation and proliferation. ES cells are distinguished by their capacity to self-renew and to differentiate into multiple cell types. During differentiation, specific transcription factors activate the expression of genes that are required for each cell lineage. In addition, epigenetic regulation appears to be a key mechanism for maintaining pluripotency and determining lineage specification. ES cells and embryoid bodies (partly differentiated cells) can be used to identify and characterize factors, such as nutrients, growth factors and hormones, and genes that may affect cell proliferation, lineage differentiation and the expression of specific genes and proteins during development. In fish, the mRNA in oocytes and early embryos is thought to be exclusively of maternal origin; embryonic genes are expressed later, at the late-blastula or gastrula stage, and the exact timing of this transition varies between species [2–5].
The development of gene sequence knowledge (e.g. gene annotation) is of crucial importance in functional genomics. Expressed sequence tag (EST) analysis is one of the most effective means of gene discovery and gene expression profiling, and one of the most efficient ways to identify differentially expressed genes. Sequencing ESTs from cDNA libraries constructed from ES cells, and applying them to fish exposed to dietary undesirables and environmental toxicants, is important for determining how malnutrition and toxicants affect cell differentiation and proliferation. EST sequencing is also a first step towards proteomics, a core element of functional genomics that includes methods for detecting protein expression and protein-protein interactions. Cell-based assays designed to detect nutritionally or chemically induced cellular stress are of particular interest for analyzing differentially expressed genes at different developmental stages of fish. Short-term in vitro assays could also be used to study the mechanistic basis of toxicity and could offer a rapid and inexpensive bioassay for fish.

The Atlantic cod is a key ecological species in the North Atlantic and also an important commercial species. In recent years many North Atlantic stocks have plummeted as a result of overfishing, triggering efforts to establish a cod aquaculture industry. Sequencing of genes, transcription factors and receptors would help us better understand how nutrients such as vitamin A and fatty acids affect cell differentiation in the early life stages of fish, contributing to the nutritional basis of a healthy and successful cod aquaculture industry.
The aim of this study was therefore to generate and analyze a cDNA library for the study of gene expression in cultures of developing Atlantic cod ES cells, and to generate an EST resource for this increasingly important aquaculture species. Here we report the sequencing of 3,935 EST clones and the generation of 2,719 high-quality EST sequences from Atlantic cod ES cells. A brief examination of these EST sequences indicates that most of the annotated sequences are involved in binding (protein, DNA, RNA), catalytic (e.g. oxidoreductase and transferase), structural molecule (ribosomal) and transporter activities.
2.1. Sampling of cells
RNA harvested from ES cells was used for cDNA library construction. Newly fertilized eggs of Atlantic cod were obtained from Marine Harvest Cod, Øygarden. Eggs at the blastula stage (22–36 hours post fertilization, hpf) were carefully crushed and the ES cells harvested. Cells were washed and the cell pellet stored at -80°C until required. To assess changes in EST expression during differentiation by qRT-PCR, cells were harvested from two separate batches of fertilized cod eggs kept at 6°C at 22–24 hpf (day 1), 24–48 hpf (day 2) and 132–144 hpf (day 6). Newly hatched yolk-sac larvae at 1 day post hatching (dph) and larvae at 6 dph were also pooled, homogenized in Trizol and stored at -80°C for qRT-PCR analysis.
2.2. RNA extraction
Total RNA was extracted by phenol-chloroform extraction (TRIZOL, Invitrogen, USA), and residual genomic DNA was removed by DNase treatment using DNA-free (Ambion, Austin TX, USA) according to the manufacturer's instructions. mRNA for construction of the cDNA library was isolated using the Dynabeads mRNA Purification Kit (Dynal Biotech ASA, Oslo, Norway) according to the manufacturer's instructions. The quality and quantity of total RNA and mRNA were determined using a NanoDrop ND-1000 (NanoDrop Technologies, Wilmington, DE). RNA integrity was assessed using the RNA 6000 Nano LabChip kit and an Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, USA).
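The NanoDrop quality and quantity checks reduce to simple Beer-Lambert arithmetic. The sketch below is illustrative only: it assumes the conventional factor of 40 ng/µL of RNA per A260 unit (1 cm path) and an A260/A280 acceptance window of 1.8 to 2.2, a common rule of thumb rather than a criterion reported in this study; the function names are hypothetical.

```python
# Conventional conversion factor (assumption): an A260 of 1.0 at 1 cm path
# corresponds to roughly 40 ng/uL of single-stranded RNA.
RNA_NG_PER_UL_PER_A260 = 40.0


def rna_concentration(a260: float, dilution: float = 1.0) -> float:
    """RNA concentration in ng/uL from a path-length-normalized A260 reading."""
    return a260 * RNA_NG_PER_UL_PER_A260 * dilution


def a260_a280(a260: float, a280: float) -> float:
    """Purity ratio; values near 2.0 are usually taken to indicate protein-free RNA."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


def passes_purity(a260: float, a280: float, low: float = 1.8, high: float = 2.2) -> bool:
    """Illustrative acceptance window (hypothetical, not a threshold from this study)."""
    return low <= a260_a280(a260, a280) <= high


if __name__ == "__main__":
    # Hypothetical readings, not measurements from this study
    print(rna_concentration(0.5, dilution=10))  # 0.5 * 40 * 10 -> 200.0
    print(passes_purity(0.5, 0.25))             # ratio 2.0 -> True
```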
2.3. cDNA library construction, sequencing and data processing
A directional, non-normalized library was constructed using the pBluescript® II XR Library Construction Kit (Stratagene Cloning Systems, La Jolla, USA) and oligo(dT)18 primers. The finished cDNA was inserted into the vector in the sense orientation, and the recombinant plasmids were transformed into E. coli cells. A total of 3,935 EST clones were sequenced from their 3' ends using the T3 primer (5'-AATTAACCCTCACTAAAGGGA-3'). The inserts were sequenced on the MegaBACE 1000 platform using DYEnamic ET dye terminators (GE Healthcare). All ESTs were sequenced at a commercial facility (BGI LifeTech Co. Ltd., Beijing, China) after construction of the cDNA library.
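Basic properties of the T3 sequencing primer quoted above can be checked with two textbook melting-temperature approximations. This is a rough, salt-independent sketch for orientation only: neither formula is taken from the authors' protocol, nearest-neighbour methods give more reliable estimates, and the function names are hypothetical.

```python
T3_PRIMER = "AATTAACCCTCACTAAAGGGA"  # 5'->3', sequencing primer given in the text


def base_counts(seq: str) -> tuple[int, int]:
    """Return (A+T, G+C) counts for a DNA sequence."""
    s = seq.upper()
    return s.count("A") + s.count("T"), s.count("G") + s.count("C")


def gc_fraction(seq: str) -> float:
    """Fraction of G+C among A/C/G/T bases."""
    at, gc = base_counts(seq)
    return gc / (at + gc)


def tm_wallace(seq: str) -> float:
    """Wallace rule: 2 degC per A/T and 4 degC per G/C; crude, meant for short oligos."""
    at, gc = base_counts(seq)
    return 2.0 * at + 4.0 * gc


def tm_basic(seq: str) -> float:
    """Salt-independent 'basic' estimate for oligos longer than 13 nt:
    Tm = 64.9 + 41 * (nGC - 16.4) / N."""
    at, gc = base_counts(seq)
    return 64.9 + 41.0 * (gc - 16.4) / (at + gc)


if __name__ == "__main__":
    print(f"length {len(T3_PRIMER)} nt")
    print(f"GC content {gc_fraction(T3_PRIMER):.1%}")
    print(f"Wallace Tm {tm_wallace(T3_PRIMER):.0f} C, basic Tm {tm_basic(T3_PRIMER):.1f} C")
```

The two estimates differ by roughly 10 °C for this 21-mer, which is why such formulas are only a first sanity check.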
The chromatogram files were processed with Phred for base calling, and vector sequences from UniVec were removed using cross_match. Clustering, assembly and quality filtering of the resulting contigs and singletons were carried out using the assembly pipeline developed at the Computational Biology Unit of BCCS at the University of Bergen. This pipeline includes repeat masking using RBR and assembly using CAP3 [7, 8]. The quality-filtered contig and singleton sequences were then aligned with BLASTX against the GenBank non-redundant protein database and with BLASTN against the non-redundant nucleotide database and dbEST, using default parameters.
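Phred assigns each base call a quality value Q = -10·log10(P), where P is the estimated probability that the call is wrong. The exact quality-filtering rules of the BCCS pipeline are not given here, so the following sketch substitutes a commonly used Mott-style maximal-segment trim followed by a minimum-length filter. The 0.05 error cutoff, the 100-base minimum and all function names are assumptions.

```python
from typing import Optional


def phred_error_probability(q: int) -> float:
    """Phred definition Q = -10*log10(P), hence P = 10**(-Q/10)."""
    return 10 ** (-q / 10)


def mott_trim(quals: list[int], cutoff: float = 0.05) -> tuple[int, int]:
    """Return (start, end), end exclusive, of the best-scoring high-quality segment.

    Each base contributes (cutoff - P_error): positive for good bases, negative
    for poor ones. Kadane's maximum-subarray scan finds the contiguous segment
    with the largest total. Returns (0, 0) if no base beats the cutoff.
    """
    best_total, best = 0.0, (0, 0)
    running, start = 0.0, 0
    for i, q in enumerate(quals):
        running += cutoff - phred_error_probability(q)
        if running <= 0:
            running, start = 0.0, i + 1  # restart the segment after base i
        elif running > best_total:
            best_total, best = running, (start, i + 1)
    return best


def trim_read(seq: str, quals: list[int], min_len: int = 100,
              cutoff: float = 0.05) -> Optional[str]:
    """Quality-trim one read; return None if the kept part is shorter than min_len."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality arrays differ in length")
    start, end = mott_trim(quals, cutoff)
    kept = seq[start:end]
    return kept if len(kept) >= min_len else None


if __name__ == "__main__":
    # Toy read: two poor bases, ten Q30 bases, two poor bases
    quals = [5, 5] + [30] * 10 + [5, 5]
    seq = "AC" + "G" * 10 + "TT"
    print(mott_trim(quals))                  # (2, 12)
    print(trim_read(seq, quals, min_len=5))  # GGGGGGGGGG
```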
2.3.1. Sequence analysis
High-quality sequences deposited in GenBank after base calling and vector trimming as described above were imported into the Vector NTI Advance 10 and Blast2GO software packages for further analysis. Assembly and clustering were done in Vector NTI with default settings to obtain contigs and singletons. Gene Ontology (GO) annotations were assigned using Blast2GO. All 1,644 unique sequences were compared to the GenBank database (as of February 2009) using BLASTX. The E-value cut-off was < 10⁻³ for the BLAST searches and < 10⁻⁶ for the annotation step. From these annotations, pie charts of the sequence distribution by biological process, molecular function and cellular component were drawn from a level-2 GO analysis with a filter cut-off of 30 sequences.
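The two E-value cut-offs and the level-2 chart filter described above can be sketched as follows. This is a hypothetical illustration, not Blast2GO's implementation: it assumes BLAST results in the 12-column tabular layout used by BLAST+ -outfmt 6, treats a sequence whose best hit has E < 10⁻⁶ only as a candidate for annotation (Blast2GO also needs GO terms linked to the hit), and assumes that level-2 terms with fewer than 30 sequences are simply left out of the charts.

```python
import csv
from collections import Counter

HIT_EVALUE = 1e-3          # cut-off for a significant BLASTX hit (from the text)
ANNOTATION_EVALUE = 1e-6   # stricter cut-off for the GO annotation step (from the text)


def best_evalues(tabular_lines):
    """Lowest E-value per query from 12-column BLAST tabular output:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore."""
    best = {}
    for row in csv.reader(tabular_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, evalue = row[0], float(row[10])
        if query not in best or evalue < best[query]:
            best[query] = evalue
    return best


def classify(query_ids, best):
    """Place every unique sequence in one of three mutually exclusive bins."""
    bins = Counter()
    for q in query_ids:
        e = best.get(q)
        if e is not None and e < ANNOTATION_EVALUE:
            bins["annotation candidate"] += 1
        elif e is not None and e < HIT_EVALUE:
            bins["significant hit only"] += 1
        else:
            bins["no significant hit"] += 1
    return bins


def level2_chart_counts(seq_to_level2_terms, min_seqs=30):
    """Sequences per level-2 GO term, keeping only terms with >= min_seqs sequences."""
    counts = Counter(term for terms in seq_to_level2_terms.values() for term in set(terms))
    return {term: n for term, n in counts.items() if n >= min_seqs}
```

Taking the best hit per query before binning keeps each unique sequence in exactly one bin, so the bin totals always sum to the number of unique sequences.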
2.4. Real-time qRT-PCR
PCR primers, amplicon sizes and function of genes selected for qRT-PCR analysis.
Gene; Forward primer (5' – 3'); Reverse primer (5' – 3'); Function (from mammalian homologs)
- BMI1 polycomb ring finger oncogene: Component of the Polycomb group (PcG) multiprotein PRC1 complex, a complex required to maintain the transcriptionally repressive state of many genes, including Hox genes, throughout development. The PcG PRC1 complex acts via chromatin remodeling and modification of histones; it mediates monoubiquitination of histone H2A 'Lys-119', rendering chromatin heritably changed in its expressibility. In the PRC1 complex, it is required to stimulate the E3 ubiquitin-protein ligase activity of RNF2/RING2.
- One-eyed pinhead protein: Involved in the correct establishment of the left-right axis. May play a role in mesoderm and/or neural patterning during gastrulation.
- Ovary-expressed homeobox protein (shows similarity to Nanog): Transcription regulator involved in inner cell mass and embryonic stem (ES) cell proliferation and self-renewal. Imposes pluripotency on ES cells and prevents their differentiation towards extraembryonic endoderm and trophectoderm lineages. Blocks bone morphogenetic protein-induced mesoderm differentiation of ES cells by physically interacting with SMAD1 and interfering with the recruitment of coactivators to the active SMAD transcriptional complexes (by similarity). Acts as a transcriptional activator or repressor (by similarity). Binds optimally to the DNA consensus sequence 5'-[CG][GA][CG]C[GC]ATTAN[GC]-3' (by similarity). When overexpressed, promotes cell entry into S phase and proliferation (by similarity).
- Pre-mRNA-processing factor 19 (PRP19): Plays a role in DNA double-strand break (DSB) repair and in the pre-mRNA splicing reaction. Binds double-stranded DNA in a sequence-nonspecific manner. Acts as a structural component of the nuclear framework. May also serve as a support for spliceosome binding and activity. Essential for spliceosome assembly in an oligomerization-dependent manner and might also be important for spliceosome stability. May have E3 ubiquitin ligase activity. The PSO4 complex is required for the repair of DNA interstrand cross-links (ICLs). Overexpression of PRPF19 might extend cellular life span by increasing resistance to stress or by improving the DNA repair capacity of cells.
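The consensus listed for the Nanog-like protein uses bracketed alternatives ([CG] means C or G) and N for any base. A minimal sketch of scanning a DNA sequence for this motif on both strands follows; the function names are hypothetical, and an exact-consensus scan is a simplification of how binding sites are usually predicted (for example with position weight matrices).

```python
import re

NANOG_CONSENSUS = "[CG][GA][CG]C[GC]ATTAN[GC]"  # 5'->3', as listed in the table
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Bracketed sets such as [CG] are already regex character classes; only the
    ambiguity code N (any base) needs translating. The lookahead lets matches overlap."""
    return re.compile("(?=(" + consensus.replace("N", "[ACGT]") + "))")


def find_sites(seq: str, consensus: str = NANOG_CONSENSUS) -> list[tuple[str, int, str]]:
    """Return (strand, 0-based forward-strand start, site) for matches on both strands.
    The site is reported 5'->3' on the strand where it was found."""
    seq = seq.upper()
    pattern = consensus_to_regex(consensus)
    hits = [("+", m.start(), m.group(1)) for m in pattern.finditer(seq)]
    rc = seq.translate(_COMPLEMENT)[::-1]  # reverse complement
    for m in pattern.finditer(rc):
        site = m.group(1)
        # A match at rc[i:i+L] covers seq[len(seq)-i-L : len(seq)-i]
        hits.append(("-", len(seq) - m.start() - len(site), site))
    return sorted(hits, key=lambda h: (h[1], h[0]))


if __name__ == "__main__":
    print(find_sites("TTTCGCCGATTAAGTTT"))  # one forward-strand site starting at 3
```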
Results and discussion
cDNA library overview
A summary of the EST analysis.
- Total number of sequences: 3,935
- Number of high-quality sequences: 2,719
- Number of contigs: 368
- Number of clones included in the contigs
- Number of singletons: 1,276
- Number of unique sequences: 1,644
- Known unique sequences: 698 (42.5%)
- No BLASTX hits
- Unknown unique sequences: 946 (57.5%)
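The percentages in the Abstract are relative to the 1,644 unique sequences rather than to the 889 BLASTX hits. The short sketch below reproduces that arithmetic from the reported counts. It assumes that the known sequences are the 698 GO-annotated ones and that the bins are mutually exclusive; the derived values at the end follow from that assumption and are not figures reported by the authors.

```python
# Counts as reported in the Abstract of this article
TOTAL_CLONES = 3935
HIGH_QUALITY = 2719
CONTIGS, SINGLETONS = 368, 1276
BLASTX_HITS = 889        # E-value < 1e-3
GO_ANNOTATED = 698       # annotated with GO terms (E-value < 1e-6)
UNKNOWN_UNIQUE = 946


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


unique = CONTIGS + SINGLETONS                    # each unique sequence is a contig or a singleton
assert unique == 1644
assert GO_ANNOTATED + UNKNOWN_UNIQUE == unique   # known + unknown partition the unique set

# Everything below is derived arithmetic, valid only under the stated assumptions
summary = {
    "high-quality sequences (% of clones)": pct(HIGH_QUALITY, TOTAL_CLONES),
    "known unique (% of unique)": pct(GO_ANNOTATED, unique),
    "unknown unique (% of unique)": pct(UNKNOWN_UNIQUE, unique),
    "unique sequences without a significant BLASTX hit": unique - BLASTX_HITS,
    "significant hits without GO annotation": BLASTX_HITS - GO_ANNOTATED,
}

if __name__ == "__main__":
    for label, value in summary.items():
        print(f"{label}: {value}")
```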
The most abundant ESTs detected among the 2,719 high-quality clones in the cDNA library.
EST; # of sequences; % of total
- H3 histone, subunit 3B
- Cold-inducible RNA binding protein
- Histone H2A family member ZA
- Cytochrome c oxidase subunit II
- 28S ribosomal RNA gene
- High-mobility group box 2
- Chromobox protein homolog 3
- Histone H2A family member X
- THAP-domain containing protein 9
- Cytochrome c oxidase subunit I
- Small nuclear ribonucleoprotein polypeptide D1
- Ribonucleoside-diphosphate reductase subunit M2
- Small nuclear ribonucleoprotein polypeptide D3
- H3 histone, subunit 3A
- 18S ribosomal RNA gene
- 18S ribosomal RNA gene
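The relationship between the "# of sequences" and "% of total" columns can be sketched as below, assuming each high-quality sequence is assigned to a single contig or singleton and that percentages are taken over the 2,719 high-quality sequences named in the caption. Counting per contig rather than per annotation is one way two rows can share a name, as with the two 18S rows; whether the authors counted this way is not stated. The input mappings and function name are hypothetical.

```python
from collections import Counter


def abundance_table(read_to_contig: dict[str, str],
                    contig_annotation: dict[str, str],
                    total_hq: int = 2719,
                    top_n: int = 15) -> list[tuple[str, str, int, float]]:
    """Rank contigs by member-sequence count.

    Returns (contig, annotation, number of sequences, % of total) rows, where the
    percentage is relative to total_hq high-quality sequences. Counting is per
    contig, so two contigs with the same best hit appear as separate rows.
    """
    counts = Counter(read_to_contig.values())
    rows = []
    for contig, n in counts.most_common(top_n):
        label = contig_annotation.get(contig, "no annotation")
        rows.append((contig, label, n, round(100 * n / total_hq, 2)))
    return rows
```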
The authors would like to thank Anders Lanzen, BCCS, University of Bergen, for help with large-scale annotation of sequences. Kai K. Lie, Natalie Larsen and Hui-Shan Tung are thanked for technical and analytical help, and David Boyle (all NIFES) is thanked for proofreading the manuscript. We would also like to thank Marine Harvest Cod, Øygarden, for providing the eggs and larvae. This project was financed by Norwegian Research Council grants 165233/S40 and 173534/I30.
- Efroni S, Duttagupta R, Cheng J, Dehghani H, Hoeppner DJ, Dash C, Bazett-Jones DP, Le Grice S, McKay RDG, Buetow KH, et al: Global transcription in pluripotent embryonic stem cells. Cell Stem Cell. 2008, 2 (5): 437-447. 10.1016/j.stem.2008.03.021.
- Hall TE, Smith P, Johnston IA: Stages of embryonic development in the Atlantic Cod Gadus morhua. J Morphol. 2004, 259: 255-270. 10.1002/jmor.10222.
- Aizawa K, Shimada A, Naruse K, Mitani H, Shima A: The medaka midblastula transition as revealed by the expression of the paternal genome. Gene Expression Patterns. 2003, 3 (1): 43-47. 10.1016/S1567-133X(02)00075-3.
- O'Boyle S, Bree RT, McLoughlin S, Grealy M, Byrnes L: Identification of zygotic genes expressed at the midblastula transition in zebrafish. Biochem Biophys Res Comm. 2007, 358 (2): 462-468. 10.1016/j.bbrc.2007.04.116.
- Xu HY, Li MY, Gui JF, Hong YH: Cloning and expression of medaka dazl during embryogenesis and gametogenesis. Gene Expression Patterns. 2007, 7 (3): 332-338. 10.1016/j.modgep.2006.08.001.
- Ewing B, Green P: Base-calling of automated sequencer traces using Phred. II. Error probabilities. Genome Res. 1998, 8 (3): 186-194.
- Huang XQ, Madan A: CAP3: A DNA sequence assembly program. Genome Res. 1999, 9 (9): 868-877. 10.1101/gr.9.9.868.
- Malde K, Schneeberger K, Coward E, Jonassen I: RBR: library-less repeat detection for ESTs. Bioinformatics. 2006, 22 (18): 2232-2236. 10.1093/bioinformatics/btl368.
- Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M: Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005, 21 (18): 3674-3676. 10.1093/bioinformatics/bti610.
- Vandesompele J, De Preter K, Pattyn F, Poppe B, Van Roy N, De Paepe A, Speleman F: Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome Biol. 2002, 3 (7): research0034.1-0034.11. 10.1186/gb-2002-3-7-research0034.
- Andersen CL, Jensen JL, Orntoft TF: Normalization of real-time quantitative reverse transcription-PCR data: A model-based variance estimation approach to identify genes suited for normalization, applied to bladder and colon cancer data sets. Cancer Research. 2004, 64 (15): 5245-5250. 10.1158/0008-5472.CAN-04-0496.
- Olsvik PA, Søfteland L, Lie KK: Selection of reference genes for qRT-PCR examination of wild populations of Atlantic cod Gadus morhua. BMC Res Notes. 2008, 1: 47. 10.1186/1756-0500-1-47.
- UniGene Atlantic cod Embryonic Stem Cell cDNA library. [http://www.ncbi.nlm.nih.gov/UniGene/library.cgi?ORG=Gmr&LID=22021]
- UniGene Atlantic cod cDNA libraries. [http://www.ncbi.nlm.nih.gov/UniGene/lbrowse2.cgi?TAXID=8049&log$=breadcrumbs]
- GeneCards Database. [http://www.genecards.org/index.shtml]
- Zhao XD, Ruan YJ, Wei CL: Tackling the epigenome in the pluripotent stem cells. J Genet Genomics. 2008, 35 (7): 403-412. 10.1016/S1673-8527(08)60058-2.
- Tipsmark CK: Identification of FXYD protein genes in a teleost: tissue-specific expression and response to salinity change. Am J Physiol Regul Integr Comp Physiol. 2008, 294: R1367-R1378.
- Gene Ontology Consortium. [http://www.geneontology.org/]
- Willems E, Mateizel I, Kemp C, Cauffman G, Sermon K, Leyns L: Selection of reference genes in mouse embryos and in differentiating human and mouse ES cells. Int J Develop Biol. 2006, 50 (7): 627-635. 10.1387/ijdb.052130ew.
- Sun Y, Li H, Yang H, Rao MS, Zhan M: Mechanisms controlling embryonic stem cell self-renewal and differentiation. Crit Rev Eukaryot Gene Expr. 2006, 16 (3): 211-231.
- Liang J, Wan M, Zhang Y, Gu PL, Xin HW, Jung SY, Qin J, Wong JM, Cooney AJ, Liu D, et al: Nanog and Oct4 associate with unique transcriptional repression complexes in embryonic stem cells. Nature Cell Biol. 2008, 10 (6): 731-739. 10.1038/ncb1736.