De novo assembly of the Brown trout (Salmo trutta m. fario) brain and muscle transcriptome: transcript annotation, tissue differential expression profile and SNP discovery

Objectives The Brown trout is a salmonid species with a high commercial value in Europe. Its life history and spawning behaviour include resident (Salmo trutta m. fario) and migratory (Salmo trutta m. trutta) ecotypes. The main objective is to apply RNA-seq technology to obtain a reference transcriptome of two key tissues, brain and muscle, of the riverine trout Salmo trutta m. fario. A reference transcriptome of the resident form will complement the genomic resources available for salmonid species. Data description We generated two cDNA libraries from pooled RNA samples, isolated from muscle and brain tissues of adult individuals of Salmo trutta m. fario, which were sequenced by Illumina technology. Raw reads were subjected to de novo transcriptome assembly using Trinity, and coding regions were predicted by TransDecoder. A final set of 35,049 non-redundant ORF unigenes was annotated. Tissue differential expression was evaluated with Cuffdiff. A False Discovery Rate (FDR) ≤ 0.01 was taken as the threshold for significant differential expression, allowing the identification of key differentially expressed unigenes. Finally, we identified SNP variants that will be useful tools for population genomic studies.


Objective
Brown trout (Salmo trutta) has been extensively studied because of its commercial and biological importance. Of the sixty-six species in the family Salmonidae, S. trutta is native to Europe, with a wide distribution area that includes Atlantic and Mediterranean European basins, as well as northern African and western Asian basins [1, 2]. The species has been introduced in North and South America and Australia for commercial exploitation in sport fishing, and is also farmed as a food and game fish, extending its current geographical distribution to discontinuous populations on all continents except Antarctica [3].
Life history traits of Brown trout populations include resident forms, such as the riverine ecotype (S. trutta m. fario), and migratory forms, such as the anadromous ecotype (S. trutta m. trutta) [4, 5]. Anadromous and non-anadromous forms coexist in the same river and are apparently genetically indistinguishable [6, 7]. An extensive literature on Brown trout has been produced, covering physiological, ecological and genetic aspects [8-10]. As a contribution to this global effort, we provide here a comprehensive transcriptome data set derived from brain and muscle tissues of the Salmo trutta m. fario ecotype using RNA-seq technology. We also evaluated differential transcript expression between these two tissues, identifying key differentially expressed unigenes. Finally, we applied an in silico pipeline that allowed us to discover SNP variants useful for population genomic studies. The generated data provide valuable new genomic resources for population genetic and genomic studies that can help to answer open questions about the life history traits of riverine S. trutta m. fario, as well as differences among S. trutta ecotypes.

Data description
Salmo trutta m. fario brain and muscle tissues were collected from 25 wild-type individuals (15 females) captured in the Flamisell river (Lleida, Catalonia). RNA pools from brain (10.2 µg) and muscle (11.4 µg) tissues were prepared with an equimolar concentration from each individual. The TruSeq™ RNA Sample Prep Kit (Illumina, Madrid, Spain) was used to build cDNA libraries according to the manufacturer's instructions (Table 1, Data file 1). FASTQ sequence reads were assembled using Trinity [11], run on the paired-end sequences. Coding regions were predicted by TransDecoder, and we retained the longest ORF predicted for each contig sequence, with a minimum length of 100 amino acids. Transcript redundancy was further reduced by CD-HIT [12], obtaining a final set of 35,049 non-redundant ORF unigenes as best cluster representatives. A similarity search with Blast2GO yielded a total of 28,132 (80%) unigenes with GO annotation. GO terms were then simplified using a generic GOSlim vocabulary [13] (Table 1, Data file 14). The top ten GO terms at level 2 of the Cellular Component (18,071, 64%), Molecular Function (20,691, 74%) and Biological Process (23,954, 85%) ontologies are shown in Table 1 (Data file 4). Mapping unigenes to the reference canonical pathways in the KEGG database yielded a total of 13,957 (39.8%) ORF unigenes assigned to 3,421 KEGG terms (KO) defining a total of 386 pathways (Table 1, Data file 15).
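The ORF selection rule described above (one ORF per contig, at least 100 amino acids long) can be illustrated with a minimal sketch. It assumes the predicted ORFs are already available as (contig identifier, protein sequence) pairs; the function name `longest_orf_per_contig` and the toy identifiers are hypothetical, and the code is not the implementation used by the ORF prediction software.

```python
from typing import Dict, Iterable, Tuple

MIN_AA = 100  # minimum ORF length in amino acids, as stated in the text


def longest_orf_per_contig(
    orfs: Iterable[Tuple[str, str]], min_aa: int = MIN_AA
) -> Dict[str, str]:
    """Keep, for each contig, the longest predicted ORF with at least min_aa residues."""
    best: Dict[str, str] = {}
    for contig_id, protein in orfs:
        length = len(protein.rstrip("*"))  # ignore a trailing stop symbol, if present
        if length < min_aa:
            continue  # ORF too short to be retained
        current = best.get(contig_id)
        if current is None or length > len(current.rstrip("*")):
            best[contig_id] = protein
    return best


# Hypothetical toy input: two contigs, three predicted ORFs
example = [
    ("contig_1", "M" + "A" * 149),  # 150 aa
    ("contig_1", "M" + "A" * 99),   # 100 aa, shorter than the first ORF
    ("contig_2", "M" + "A" * 49),   # 50 aa, below the threshold
]
kept = longest_orf_per_contig(example)
```

In this toy input only `contig_1` survives, represented by its 150-residue ORF; `contig_2` is dropped because its only ORF is shorter than the threshold.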
Tissue-specific transcriptome expression analysis was performed on the raw reads obtained from both tissues, normalized as FPKM (fragments per kilobase of exon per million fragments mapped).
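As a reminder of what the FPKM unit expresses, the following sketch computes it from a fragment count, a transcript length and a library size. The function name `fpkm` and the numbers in the example are assumptions made for illustration; dedicated expression software estimates abundances with a more elaborate statistical model, so this shows only the basic normalization.

```python
def fpkm(fragments: int, length_bp: int, total_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    if length_bp <= 0 or total_fragments <= 0:
        raise ValueError("length_bp and total_fragments must be positive")
    # Dividing by (length / 1e3) and by (total / 1e6) is the same as multiplying by 1e9
    return fragments * 1e9 / (length_bp * total_fragments)


# Hypothetical unigene: 500 fragments, 2,000 bp long, 10 million mapped fragments
value = fpkm(500, 2000, 10_000_000)
```

For the hypothetical unigene above the result is 25.0 FPKM. Normalizing by both length and library size is what makes values comparable between the brain and muscle libraries.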

Limitations
The use of pooled RNA samples does not allow us to detect sex-specific or individual-specific transcript expression profiles, and it limits our ability to detect transcripts expressed at low levels in a specific individual. In addition, pooled samples prevent us from resolving the SNP frequency distribution; this parameter was instead estimated indirectly from the observed SNP sequence coverage in the pooled sample.
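The indirect estimate mentioned above can be sketched as the fraction of reads supporting the alternative allele at a SNP position. The function name `pooled_allele_frequency` and the `min_coverage` threshold of 10 reads are hypothetical choices for illustration, not parameters reported for this data set. In an RNA pool the estimate is also affected by differences in expression among individuals and between alleles, so it should be read as a rough proxy.

```python
from typing import Optional


def pooled_allele_frequency(
    ref_reads: int, alt_reads: int, min_coverage: int = 10
) -> Optional[float]:
    """Estimate the alternative-allele frequency from read counts in a pooled sample.

    Returns None when total coverage is below min_coverage (a hypothetical
    threshold), because the estimate is unreliable at low depth.
    """
    coverage = ref_reads + alt_reads
    if coverage < min_coverage:
        return None
    return alt_reads / coverage


# Hypothetical SNP position: 30 reads with the reference allele, 10 with the alternative
freq = pooled_allele_frequency(30, 10)
```

With 30 reference and 10 alternative reads the estimated frequency is 0.25, whereas a position covered by only five reads returns `None` instead of a frequency.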