Construction of a high-coverage bacterial artificial chromosome library and comprehensive genetic linkage map of yellowtail Seriola quinqueradiata

Background: Japanese amberjack/yellowtail (Seriola quinqueradiata) is a commonly cultured marine fish in Japan. For cost-effective fish production, a breeding program that improves commercially important traits is one of the major solutions. In selective breeding, genetic marker information is useful and sufficient to identify individuals carrying advantageous traits, but if the aim is to determine the genetic basis of a trait, large-insert genomic DNA libraries are essential. In this study, toward an understanding of the genetic basis of several economically important traits, we constructed a high-coverage bacterial artificial chromosome (BAC) library, obtained BAC-end sequences, and constructed comprehensive female and male linkage maps of yellowtail using Simple Sequence Repeat (SSR) markers developed from the BAC-end sequences and a yellowtail genomic library.
Results: The total insert length of the BAC library constructed here was estimated to be approximately 11 Gb, about 16 times the size of the yellowtail genome. Sequencing of the BAC ends showed a low fraction of repetitive sequences, comparable to that in Tetraodon and fugu. A total of 837 SSR markers developed here were distributed among 24 linkage groups spanning 1,026.70 and 1,057.83 cM, with an average interval of 4.96 and 4.32 cM in the female and male maps respectively, without any segregation distortion. Oxford grids suggested conserved synteny between yellowtail and stickleback.
Conclusions: In addition to revealing characteristics of the yellowtail genome such as a low content of repetitive sequences and conserved synteny with stickleback, the genomic and genetic resources constructed here will be powerful tools for the yellowtail breeding program and for studies of the genetic basis of traits.


Background
Species of yellowtail (family Carangidae) are widely distributed in the world's oceans and are major target species for fisheries and aquaculture. The Japanese amberjack/yellowtail (Seriola quinqueradiata) is one of the most popular fish for consumption in Japan, where about 150,000 tons of farmed fish are produced each year. Although there is a huge market demand, seed of this fish mostly relies on wild catch. Artificial seed production is therefore required for stable cultivation and breeding, as well as for reducing the negative effects of large-scale sampling of seed fish on the natural stock. It is well known that using cultured brood fish for seed production reduces the environmental impact and allows selection for commercially important traits. In such a case, marker-assisted selection (MAS) breeding based on quantitative trait locus (QTL) studies is a powerful and cost-effective choice. Indeed, QTL studies have been performed in several fishes to improve production and life-history traits such as disease resistance and enhanced growth rate [1]. QTL studies require linkage maps. In yellowtail, although a female linkage map has been constructed with 180 microsatellite markers [2,3], the number of markers is not sufficient for fine QTL mapping and/or MAS in yellowtail breeding programs. Therefore, a higher-density linkage map is still required.
To isolate simple sequence repeats (SSRs) such as microsatellites and to further investigate the genetic basis of traits, genomic information is essential. Such sequences are isolated from genomic libraries. Among these, libraries based on the bacterial artificial chromosome (BAC) system, called BAC libraries, have been used frequently, for example to generate whole-genome physical maps by DNA fingerprinting [4], to develop sequence-tagged connectors [5], and to sequence the genome itself [6], because of their insert size capacity, reproducibility and stability as DNA samples [7]. By integrating BAC clones into linkage maps using BAC-derived sequences such as BAC-end sequences (BESs), BAC libraries also play important roles in genetic studies and subsequent positional cloning [8]. BAC libraries have been developed for several domestic animals, e.g. cattle [9], pig [10] and sheep [11], and for fishes, e.g. salmon [12], catfish [13], rainbow trout, carp, tilapia [14,15], European sea bass [16,17] and barramundi [18], but a BAC library has not yet been constructed for yellowtails.
In this study, to advance yellowtail genomic and genetic resources and to understand the genetic basis of several traits, we constructed a high-coverage BAC library, obtained BESs for a preliminary survey of the genomic content, and constructed a comprehensive genetic linkage map of yellowtail.

BAC library construction and BAC end sequencing
The yellowtail genomic DNA content, represented as the C-value, was estimated to be 0.7 pg/cell (data not shown) using flow cytometric analysis, and hence the genome size was calculated to be approximately 685 Mb. Of 100 randomly selected BAC clones, 71 (71%) contained inserts, indicating that approximately 78,520 clones (71% of 110,592 clones) had an insert. The insert sizes of the 71 clones ranged from about 20 kb to 220 kb, and the average insert length was 140.7 kb (data not shown). Therefore, the total length of insert DNA in the yellowtail BAC library was estimated to be approximately 11 Gb, about 16 times the size of the yellowtail genome. A minimum of 5-10 × coverage across the entire genome is known to be required for a BAC library to be useful for positional cloning, physical mapping, and genome sequencing [19]. Therefore, the yellowtail BAC library is sufficient for further genomic/genetic analysis, except for studies of W-linked genes, because the DNA source was a ZZ male [20].
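The arithmetic behind these estimates can be sketched in a few lines of Python. This is only an illustration: the conversion factor of roughly 978 Mb per picogram of DNA is a widely used approximation that is not stated in the text, and the function names are hypothetical.

```python
# Approximate conversion between DNA mass and sequence length.
# Assumption: 1 pg of DNA corresponds to about 978 Mb (not given in the text).
MB_PER_PG = 978.0


def genome_size_mb(c_value_pg):
    """Convert a C-value in picograms to an approximate genome size in Mb."""
    return c_value_pg * MB_PER_PG


def library_coverage(n_clones, insert_fraction, mean_insert_kb, genome_mb):
    """Return (clones with insert, total insert length in Mb, fold coverage)."""
    clones_with_insert = n_clones * insert_fraction
    total_insert_mb = clones_with_insert * mean_insert_kb / 1000.0
    return clones_with_insert, total_insert_mb, total_insert_mb / genome_mb


# Values reported in the text: C-value 0.7 pg, 110,592 clones,
# 71 of 100 sampled clones with inserts, mean insert length 140.7 kb.
genome_mb = genome_size_mb(0.7)
clones, total_mb, fold = library_coverage(110_592, 0.71, 140.7, genome_mb)
print(f"genome size: about {genome_mb:.0f} Mb")
print(f"clones with insert: about {clones:.0f}")
print(f"total insert length: about {total_mb / 1000:.1f} Gb")
print(f"coverage: about {fold:.0f}x")
```

Under these assumptions the sketch gives values close to the reported 685 Mb, 78,520 clones, 11 Gb and 16-fold coverage.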
By sequencing both ends of 2,960 randomly selected BAC clones, a total of 5,920 raw reads were obtained, of which 4,956 reads (2,471 from the SP6 side and 2,485 from the T7 side) qualified for subsequent repeat identification and BLAST searches (GA867436-GA872391). The total length of the qualified BESs was 3,074,133 bp with an average size of 620 bp, representing approximately 0.45% of the yellowtail genome. The GC content was estimated to be 41.36%, which is almost the same as other fishes ( abundance of the repetitive sequence in yellowtail genome is lower than the majority of teleost fishes studied so far such as rainbow trout (59.5%) [21], common carp (17.3%) [15], channel catfish (11.9%) [22] and Nile tilapia (14.0%) [23] and comparable to that in Tetraodon (6.2%) and in fugu (4.3%) [24]. A total of 1,845 simple sequence repeats (SSRs) were identified from the BESs (Table 1). Of the SSRs, dinucleotide repeats, particularly AC/GT repeats including CA/TG repeats, were the most abundant (Table 2).

Homology to other teleost genomes
To identify homology between yellowtail and other fishes, the qualified yellowtail BESs were subjected to BLASTx and BLASTn searches against eight teleost proteomes and genomes, respectively. The highest number of top hits, the highest average bit score and the highest % identity were observed for the yellowtail-Nile tilapia comparison in both BLAST results (Table 3). The total length of the queries in BLASTx hits between yellowtail and Nile tilapia was estimated to be 198,090 bp, indicating that 6.4% of the qualified BESs was protein-coding sequence. The high sequence similarity between yellowtail and Nile tilapia can be explained by their phylogenetic positions, as both are assigned to the order Perciformes [25]. In the BLASTn result, the second-highest number of top hits was observed in the yellowtail-stickleback comparison (Table 3). High sequence similarity between stickleback and species in Perciformes such as striped bass and gilthead seabream has been reported, and our data are therefore consistent with the previous observations [26,27].

Genetic linkage map
Out of the 743 primer pairs designed from the qualified BESs, 373 primer pairs (27 mononucleotide repeats, 285 dinucleotide repeats, 31 trinucleotide repeats, 26 tetranucleotide repeats, 3 pentanucleotide repeats and 1 hexanucleotide repeat) produced amplicons. Together with the 464 microsatellite markers derived from the genomic library developed by Ohara et al. [2,3], a total of 837 markers were included in the yellowtail genetic linkage maps (Additional file 1). No segregation distortion was observed for any marker, and hence lethal allele-linked markers were not included in our marker set.
The resultant yellowtail female and male genetic maps consist of 715 and 702 markers, including 232 and 271 framework markers, and span 1,026.65 and 1,057.83 cM (Kosambi) with an average interval of 4.96 and 4.32 cM on 24 linkage groups, respectively (Table 4, Figure 1). The chromosome number of yellowtail has been reported to be 2n = 48 [28]; the 24 linkage groups therefore match the haploid chromosome number, indicating that the SSR markers we developed are distributed throughout the yellowtail genome. The "gaps" observed in Squ21 in the male map and in Squ24 in both maps might be caused by "recombination hot-spots" where recombination occurs frequently (Figure 1). The genome length was estimated to be 1,274.64 (L1) and 1,284.34 (L2) cM for the female map and 1,282.35 (L1) and 1,285.45 (L2) cM for the male map by the two different methods (see Materials and Methods). Using the formula $c = 1 - e^{-2dn/L}$ and the estimated genome length L, the coverage of the female and male maps was estimated to be 83.3 to 83.9% (Table 4). Considering the average interval of less than 10 cM and the genome coverage, we concluded that the yellowtail genetic map is sufficient for further QTL studies [29].
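As a rough check of the coverage estimate, the formula quoted above can be read as c = 1 - e^(-2dn/L) and evaluated with the reported intervals and genome lengths. The sketch below assumes that n is the number of framework markers (232 female, 271 male); the text does not say this explicitly, so it is an inference rather than the authors' stated procedure.

```python
import math


def map_coverage(d, n, genome_length):
    """Expected map coverage c = 1 - exp(-2 * d * n / L).

    d: average marker interval (cM)
    n: number of markers (assumed here to be the framework markers)
    genome_length: estimated genome length L (cM)
    """
    return 1.0 - math.exp(-2.0 * d * n / genome_length)


# Reported values: average interval, framework markers, and L1 / L2 estimates.
estimates = {
    "female, L1": map_coverage(4.96, 232, 1274.64),
    "female, L2": map_coverage(4.96, 232, 1284.34),
    "male, L1": map_coverage(4.32, 271, 1282.35),
    "male, L2": map_coverage(4.32, 271, 1285.45),
}
for label, c in estimates.items():
    print(f"{label}: {100 * c:.1f}%")
```

With these inputs all four values fall between 83% and 84%, in line with the reported range, which supports the reading of the formula and of n used here.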

Identification of orthologous chromosomes with other fishes
In addition to BAC or whole-genome sequences, comparative genome analysis, especially of conserved synteny, would be helpful for fine-scale QTL analyses and/or for understanding the genetic basis of traits [30,31]. BLAST searches of the 818 mapped yellowtail loci against medaka, Tetraodon, stickleback, fugu and zebrafish showed that 25.7, 23.0, 42.2, 24.4 and 9.4% of the loci, respectively, mapped to these genome sequences. Oxford grids showed that eighteen linkage group pairs between yellowtail and stickleback retained a one-to-one relationship, and another three stickleback and six yellowtail linkage groups had a one-to-two relationship, implying that chromosomal fusions or breakages occurred after divergence from the common ancestor of the two species (Figure 2). Nevertheless, the result suggests conserved synteny between yellowtail and stickleback, and hence the stickleback genome data would be useful as a reference for the yellowtail genome.

Conclusions
We constructed a high-coverage BAC library and a comprehensive genetic linkage map, including BES-derived SSR markers, of yellowtail (Seriola quinqueradiata). A survey of BESs showed a low frequency of repetitive sequences, comparable to that of Tetraodon and fugu. BLAST searches and Oxford grids against five fish genomes clearly showed conservation between the yellowtail and stickleback genomes. Generally, a high frequency of repetitive sequences hampers chromosome walking and makes positional cloning difficult [32]. The low frequency of repetitive sequences and the relatively small genome size suggest that yellowtail would be an ideal species in which to study the genetic basis of economically important traits. In addition, the conserved genome architecture with stickleback would be helpful for synteny-based identification of new genetic markers and genes in target genomic segments. We have already started studies of several traits such as sex determination and disease resistance [20,33]. We anticipate that the genomic and genetic resources we constructed will be powerful tools for further studies of these traits.

Ethics statements
Field permits are not required for this species in Japan. Since all fish treatments were performed in Goto Branch of Seikai National Fisheries Research Institute of Fisheries Research Agency, fish handling, husbandry and sampling methods were approved by Institutional Animal Care and Use Committee of National Research Institute of Aquaculture (IACUC-NRIA No. 03).

BAC construction and BAC-end sequencing
The BAC library was constructed according to Katagiri et al. with some modifications [14]. Briefly, approximately 5 × 10^7 frozen sperm cells taken from one male yellowtail were embedded in agarose plugs, digested with proteinase K overnight at 37°C and, following proteinase K inhibitor treatment, stored in 0.5 M EDTA until use. The plugs were dialyzed in 0.5 × TE, partially digested with MboI and size-fractionated by pulsed-field electrophoresis. The fraction containing 150 to 250 kb genomic DNA was excised from the gel and recovered as high molecular weight (HMW) genomic DNA. The HMW genomic DNA was then ligated into the BamHI site of the pBACe3.6 vector, and the ligation products were introduced into E. coli strain DH10B. Finally, a total of 110,592 recombinant BAC clones were picked and stored in 288 384-well microtiter plates. The length of the insert DNAs was estimated by analyzing 100 BAC inserts digested with NotI. The BESs were obtained from eight 384-well plates containing 3,072 clones. BAC DNAs extracted by the conventional alkaline lysis method were sequenced from the SP6 and T7 sides with the BigDye Terminator v3.1 Cycle Sequencing Kit (Life Technologies) following the manufacturer's instructions, and the reactions were electrophoresed on an Applied Biosystems 3730 DNA Analyzer (Life Technologies). All raw reads were processed using PHRED software with default parameters, except that the trimming error probability was set at 0.01 [34,35], and vector and bacterial sequences were masked by CROSS_MATCH implemented in PHRAP software. The masked BESs of more than 100 bp in length, hereafter called "qualified BESs", were extracted using our in-house perl script. The GC content of the extracted BESs was estimated using the geecee program included in the EMBOSS package [36].
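The length filter and GC calculation described above can be illustrated with a short Python sketch. It is not the in-house perl script or the geecee program. It assumes that masked bases are written as "X" (the usual cross_match screening convention) and that the 100 bp threshold applies to the unmasked bases, which is one possible reading of the text; the function names are hypothetical.

```python
def unmasked_length(seq):
    """Count unambiguous bases, ignoring masked (X) and ambiguous (N) positions."""
    return sum(1 for base in seq.upper() if base in "ACGT")


def is_qualified(seq, min_len=100):
    """True if the read keeps more than min_len unmasked bases (assumed criterion)."""
    return unmasked_length(seq) > min_len


def gc_content(seqs):
    """Percent G+C over all unambiguous bases in a collection of sequences."""
    gc = total = 0
    for seq in seqs:
        for base in seq.upper():
            if base in "GC":
                gc += 1
                total += 1
            elif base in "AT":
                total += 1
    if total == 0:
        raise ValueError("no unambiguous bases to analyse")
    return 100.0 * gc / total


# Hypothetical reads: one mostly vector-masked, one long enough to keep.
reads = {"clone1.SP6": "X" * 200 + "ACGT" * 10, "clone1.T7": "ACGGT" * 30}
qualified = {name: seq for name, seq in reads.items() if is_qualified(seq)}
print(sorted(qualified), f"GC = {gc_content(qualified.values()):.1f}%")
```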

Sequence data analysis
Repetitive DNA elements in the qualified BESs, such as transposable elements and SSRs, were identified and masked using Crossmatch search engine (v1.090518), "teleostei" repeat database implemented in Repbase RepeatMasker Edition (20120418) and RepeatMasker program (see http://www.repeatmasker.org/ for details).
BLAST searches were performed with the qualified BESs as queries and a cut-off e-value of e^-9. The top-hit query-subject pairs were extracted using an in-house perl script under the following criterion: when multiple query-subject pairs were observed and overlapped each other, only the most significant pair was retained.
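The top-hit rule can be sketched as a greedy filter: for each query, hits are visited from the most to the least significant, and a hit is kept only if it does not overlap a hit already kept on the same query. This is one interpretation of the criterion and not the authors' perl script. The `Hit` record and its field names are hypothetical, and the e-value threshold is passed in explicitly because the exact cut-off notation in the text is ambiguous.

```python
from collections import defaultdict, namedtuple

# Hypothetical record for one BLAST hit (query coordinates are inclusive).
Hit = namedtuple("Hit", "query q_start q_end subject evalue")


def overlaps(a, b):
    """True if two hits cover at least one common position on the query."""
    return a.q_start <= b.q_end and b.q_start <= a.q_end


def top_hits(hits, max_evalue):
    """Keep, per query, the most significant hit of each overlapping cluster."""
    by_query = defaultdict(list)
    for hit in hits:
        if hit.evalue <= max_evalue:
            by_query[hit.query].append(hit)

    kept = []
    for query_hits in by_query.values():
        selected = []
        # Visit hits from most to least significant (smallest e-value first).
        for hit in sorted(query_hits, key=lambda h: h.evalue):
            if not any(overlaps(hit, s) for s in selected):
                selected.append(hit)
        kept.extend(selected)
    return kept


hits = [
    Hit("bes1", 1, 300, "geneA", 1e-40),
    Hit("bes1", 50, 280, "geneB", 1e-25),   # overlaps geneA, less significant
    Hit("bes1", 400, 600, "geneC", 1e-12),  # separate region, kept
    Hit("bes2", 1, 200, "geneD", 1e-3),     # fails the e-value cut-off
]
# 1e-9 is used here as one reading of the cut-off given in the text.
for hit in top_hits(hits, max_evalue=1e-9):
    print(hit.query, hit.subject, hit.evalue)
```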

Development of SSR markers
The SSR motifs and primer pairs in the qualified BESs were searched with the WebSat online application (http://wsmartins.net/websat/) with default settings except for product size, which was set to 100-200 bp. For each BAC clone, the SP6-side BES was analyzed first, and if no SSR motif or primer-binding site was found in the sequence, the T7-side BES of the same clone was used instead. On both sides, SSR motifs containing over six repeats were considered real SSRs. Where more than two SSR motifs were found in one read, the longest one was used as a representative. Finally, 743 SSRs (89 mono-, 550 di-, 68 tri-, 31 tetra-, 4 penta- and 1 hexa-nucleotide repeats) were selected for primer design.
Figure 2 Oxford grids between yellowtail and five model fish genomes. Numbers in boxes indicate the number of orthologous gene pairs. Boxes containing more than ten, seven and five orthologous gene pairs are highlighted in red, yellow and blue respectively.
In addition to the BES-derived SSRs, we also developed microsatellite markers from the genomic library constructed by Ohara et al. [2,3]. The microsatellites containing CA/GT repeat motifs were isolated according to the protocol of Ohara et al. and primers were designed as described above [2,3].
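The selection rule for BES-derived SSRs (SP6 side first, T7 side as fallback, longest motif as representative) can be illustrated with a small Python sketch. It does not reproduce WebSat: the regular expression is a simplified stand-in, "over six repeats" is read as at least seven repeat units, primer-site checking is omitted, and all names are hypothetical.

```python
import re

# A motif of 1-6 bases followed by at least six more copies (seven or more units).
# The lazy quantifier makes the shortest repeating unit win, so a run of "A"
# is reported as a mononucleotide repeat rather than as "AA" or "AAA".
SSR_PATTERN = re.compile(r"([ACGT]{1,6}?)\1{6,}")


def find_ssrs(seq):
    """Return (start, motif, repeat count) for each SSR found in seq."""
    found = []
    for match in SSR_PATTERN.finditer(seq.upper()):
        motif = match.group(1)
        found.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return found


def choose_ssr(sp6_seq, t7_seq):
    """Pick one representative SSR per clone: SP6 read first, then T7."""
    for side, seq in (("SP6", sp6_seq), ("T7", t7_seq)):
        ssrs = find_ssrs(seq)
        if ssrs:
            # The longest SSR in base pairs is used as the representative.
            _, motif, count = max(ssrs, key=lambda s: len(s[1]) * s[2])
            return side, motif, count
    return None


sp6 = "ACGTACGGTCA"                              # no SSR on this side
t7 = "GG" + "A" * 10 + "C" + "GT" * 9 + "TC"     # A x 10 and GT x 9
print(choose_ssr(sp6, t7))
```

In this hypothetical clone the SP6 read carries no SSR, so the T7 read is used, and the GT repeat is chosen over the shorter mononucleotide run.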

Mapping panel
The mapping panel consisted of ninety progeny produced by artificial fertilization. Parent fish were caught off Goto Island, Nagasaki Prefecture, Japan and reared in a sea cage until they matured at a body weight of approximately 7 kg. Human chorionic gonadotropin (ASKA Pharmaceutical) was administered intramuscularly to the parent fish at 600 IU/kg body weight, and eggs and sperm were taken 45 hours after administration. Fertilized eggs were kept in 500 L of seawater at approximately 19°C with 0.5-1 L/min aeration until hatching. The juvenile fish were reared in 500 L of seawater at 20-25°C until their body length reached 10 cm. The caudal fin of each progeny was partially clipped as the DNA source and kept in absolute ethanol until use. Genomic DNA of each fish was extracted using the DNeasy Blood and Tissue Kit (Qiagen) according to the manufacturer's instructions.

Data acquisition
Genotyping was performed in an 11 μl reaction volume containing 0.5 pmol/μl of unlabelled primer, 0.05 pmol/μl of fluorescence-end-labeled primer with [5'-TET], 1 × buffer, 2.0 mM MgCl2, 0.2 mM dNTP, 1.1 μg of BSA, 0.025 U of EX Taq DNA polymerase (Takara) and 25 ng of template DNA. PCR was performed on a GeneAmp® PCR System 9700 (Applied Biosystems). The program consisted of an initial denaturation at 95°C for 2 min, followed by 30 cycles of 30 sec at 95°C, 1 min at the annealing temperature (52-55°C) and 1 min at 72°C, and a final extension at 72°C for 10 min. Amplification products were mixed with an equal volume of loading buffer (98% formamide, 10 mM EDTA, 0.05% (w/v) bromophenol blue), heated for 10 min at 95°C and then immediately cooled on ice. Then 2 μl of each sample was loaded onto a 6% PAGE-PLUS gel (Amresco) containing 8 M urea and 0.5 × TBE buffer. Electrophoresis was performed in 0.5 × TBE buffer, after which the gel was scanned and imaged using a FLA-9000 image scanner (GE Healthcare).

Linkage map construction
The genotype data obtained above were subjected to linkage analysis for male and female meioses independently. Marker genotypes were analyzed with LINKMFEX ver. 2.3 (http://www.uoguelph.ca/~rdanzman/software.htm). Linkage analysis was performed using genotype data converted to a backcross format. As the grandparent genotypes were unknown, pairwise analyses were performed, and markers were sorted into linkage groups at a minimum LOD score of 4.0. Goodness of fit to Mendelian segregation was tested for all alleles using the chi-square test (p < 0.05, d.f. = 1). Finally, the marker order was determined and double recombination events were checked with MapManagerQTX version 2.0 [37]. The resultant genetic map was visualized using MapChart version 2.2 [38].
The genome length L was estimated using two different methods following Fishman et al. [39]. In the first method (L1), the average marker interval was estimated by dividing the summed length of all linkage groups by the number of intervals, and twice the average marker interval was added to each linkage group. In the second method (L2), the length of each linkage group was multiplied by the factor (m + 1)/(m - 1), where m is the number of framework markers on the linkage group. Finally, the genome coverage c of the linkage map was estimated as $c = 1 - e^{-2dn/L}$, where d is the average interval of markers, n is the number of markers, and L is the genome length estimated above.
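A goodness-of-fit test of this kind can be sketched as follows. The example assumes that each allele is expected to segregate 1:1 among the progeny in the backcross format, and it compares the statistic with the standard chi-square critical value for one degree of freedom at the 5% level (3.841). It is an illustration only, not the procedure implemented in LINKMFEX.

```python
# Critical value of the chi-square distribution for d.f. = 1 and alpha = 0.05.
CHI2_CRITICAL_DF1_P05 = 3.841


def chi_square_1to1(count_a, count_b):
    """Chi-square statistic for an observed two-class count against a 1:1 ratio."""
    total = count_a + count_b
    if total == 0:
        raise ValueError("no genotyped progeny")
    expected = total / 2.0
    return ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected


def is_distorted(count_a, count_b):
    """True if segregation deviates from 1:1 at p < 0.05 (d.f. = 1)."""
    return chi_square_1to1(count_a, count_b) > CHI2_CRITICAL_DF1_P05


# Hypothetical allele counts among ninety progeny.
for counts in [(45, 45), (50, 40), (55, 35)]:
    stat = chi_square_1to1(*counts)
    print(counts, f"chi2 = {stat:.2f}", "distorted" if is_distorted(*counts) else "ok")
```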
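The two genome length estimators can be written compactly. The sketch below uses a made-up map of three linkage groups, not the yellowtail data, and assumes that the number of intervals equals the number of framework markers minus the number of linkage groups, which the text does not state explicitly.

```python
def genome_length_l1(group_lengths, n_intervals):
    """Method 1: add twice the average marker interval to every linkage group."""
    average_interval = sum(group_lengths) / n_intervals
    return sum(length + 2.0 * average_interval for length in group_lengths)


def genome_length_l2(group_lengths, markers_per_group):
    """Method 2: scale each linkage group by (m + 1) / (m - 1)."""
    return sum(
        length * (m + 1) / (m - 1)
        for length, m in zip(group_lengths, markers_per_group)
    )


# Hypothetical map: three linkage groups (cM) and their framework marker counts.
lengths = [40.0, 50.0, 30.0]
markers = [9, 11, 6]
# Assumption: one interval between each pair of adjacent markers in a group.
intervals = sum(m - 1 for m in markers)

l1 = genome_length_l1(lengths, intervals)
l2 = genome_length_l2(lengths, markers)
print(f"L1 = {l1:.2f} cM, L2 = {l2:.2f} cM")
```

Both estimators add a correction for the chromosome ends beyond the terminal markers, which is why they exceed the summed map length of 120 cM in this toy example.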

Identification of orthologous chromosomes with other fishes
The flanking sequences obtained from all SSR markers assigned to the yellowtail linkage groups were used for BLASTn searches against the genomic sequences of medaka, Tetraodon, stickleback, fugu and zebrafish with a cut-off e-value of 0.01. The top-hit query-subject pair was extracted using an in-house perl script. Where multiple hits were obtained, orthology was defined as follows. Only the first, second and third top hits were considered. If the query positions of the first hit and the second or third hit overlapped, and the quotient of the e-value of the first hit divided by that of the second or third hit was greater than 10^-3, the hit was considered an unclear orthologous pair and rejected. The remaining hits were used to construct Oxford grids with Grid Map ver. 3.0a (http://cbr.jic.ac.uk/dicks/software/Grid_Map/).
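The orthology rule can be sketched in Python as below. This illustrates the stated criterion and is not the authors' perl script. The `Hit` record is hypothetical, and the handling of e-values reported as exactly zero (treated as equally significant, hence ambiguous) is an assumption the text does not cover.

```python
from collections import namedtuple

# Hypothetical record for one BLASTn hit of a marker flanking sequence.
Hit = namedtuple("Hit", "q_start q_end subject evalue")


def overlaps(a, b):
    """True if two hits share at least one query position."""
    return a.q_start <= b.q_end and b.q_start <= a.q_end


def is_ambiguous(first, other, max_ratio=1e-3):
    """True if an overlapping competing hit is nearly as significant as the first."""
    if not overlaps(first, other):
        return False
    if other.evalue == 0.0:
        # Assumption: both e-values are zero, so neither hit is clearly better.
        return True
    return first.evalue / other.evalue > max_ratio


def ortholog(hits, cutoff=0.01):
    """Return the top hit if it is unambiguous among the three best hits, else None."""
    ranked = sorted((h for h in hits if h.evalue <= cutoff), key=lambda h: h.evalue)[:3]
    if not ranked:
        return None
    first = ranked[0]
    if any(is_ambiguous(first, other) for other in ranked[1:]):
        return None
    return first


clear = [Hit(1, 200, "groupI", 1e-50), Hit(10, 180, "groupV", 1e-20)]
unclear = [Hit(1, 200, "groupI", 1e-20), Hit(10, 180, "groupV", 1e-19)]
print(ortholog(clear))    # first hit is far more significant, so it is kept
print(ortholog(unclear))  # e-value ratio 0.1 exceeds 1e-3, so it is rejected
```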

Availability of supporting data
All the supporting data are included as additional files.