No evidence for viral sequences in five lepidic adenocarcinomas (former “BAC”) by a high-throughput sequencing approach
BMC Research Notes volume 8, Article number: 782 (2015)
The hypothesis of an infectious etiology of the formerly named bronchiolo-alveolar carcinoma (BAC) has raised controversy. We investigated lung tumor tissues from five patients with former BAC histology using high-throughput sequencing technologies to discover potential viruses present in this type of lung cancer. Around 180 million single reads of 100 bases were generated for each BAC sample.
None of the reads showed a significant similarity to Jaagsiekte sheep retrovirus (JSRV) and no other viruses were found except for endogenous retroviruses.
In conclusion, we have demonstrated the absence of JSRV and other known human viruses in five samples of well-characterized lepidic adenocarcinoma.
The bronchiolar-alveolar cancer (BAC), as formerly defined (WHO classification 1999), is a rare form of lung adenocarcinoma (ADC). The international WHO 2015 classification recommends distinguishing adenocarcinoma in situ (AIS, formerly non-mucinous BAC) from invasive mucinous adenocarcinoma (IMA, formerly mucinous BAC) and non-mucinous lepidic predominant invasive adenocarcinoma of the lung. In many such patients, tumor progression respects the pulmonary architecture and develops mainly in the terminal respiratory unit (lepidic growth).
The etiology of these cancers remains unclear. Interestingly, lung adenocarcinomas with a predominant lepidic pattern can be distinguished from other pulmonary non-small-cell carcinomas by an increased frequency of onset in young subjects, women, non-smokers and Asians. Unlike other lung adenocarcinomas, invasive mucinous adenocarcinomas (IMA), the first and more frequent variant of adenocarcinomas (WHO 2015), are multifocal or diffuse, and death is generally due to bilateral pulmonary spread rather than to the onset of metastases.
Given that these cancers are rarely metastatic, replacement of the cancerous lung by an allograft is possible. Some cases of recurrence of the cancer in a transplanted (initially healthy) lung have led us to question the existence of an infectious agent capable of recolonizing the transplant itself. The hypothesis of an infectious etiology of the formerly named “BACs” has raised controversy, which has been reopened by the observation that ovine pulmonary adenocarcinoma, which shows strong clinical and histological similarities with human “BACs”, is causally associated with infection by the Jaagsiekte sheep retrovirus (JSRV) [2–5]. However, in humans, molecular approaches, mainly based on PCR, aiming to detect the JSRV genome in patients with bronchiolar-alveolar cancer have mostly been negative [6, 7].
The aim of this study was to explore more broadly the hypothesis of a viral etiology of invasive adenocarcinoma with lepidic growth by taking advantage of high-throughput sequencing technologies. During the last decade, these approaches have enabled the discovery and characterization of new viruses associated with chronic and/or acute human diseases [8, 9]. This is exemplified by the identification, in 2008, of the first human oncogenic polyomavirus and its association with Merkel cell carcinoma, a rare but aggressive skin cancer.
Frozen lung tumor tissues were retrospectively collected from five patients with former “BAC” histology treated at the Institut Mutualiste Montsouris (Paris, France) between 2007 and 2011. Patient characteristics are summarized in Table 1. Briefly, three were female, two were never smokers and three were former smokers. As usually observed in this disease, the maximum standardized uptake values (SUVmax) on PET/CT examination were generally low. Two of the cases with former BAC diagnosis showed invasive mucinous histology whereas three were non-mucinous. All cases were pathologically reviewed (EB) according to the recent WHO 2015 classification of lung adenocarcinoma. Two additional cases of squamous cell carcinoma from two female smokers, obtained at the Centre Chirurgical Marie-Lannelongue (CCML, Le Plessis-Robinson, France), were used as negative controls in the bioinformatic analysis. The study was approved by the local ethics committee and written informed consent was obtained from all patients included.
Samples associated with known human oncogenic viruses
The sensitivity of the high-throughput sequencing assay used here was evaluated in a pilot study through the detection of HTLV-1 and HHV-8, two very different known oncogenic viruses causally associated with adult T cell leukemia/lymphoma (ATLL) and primary effusion lymphoma (PEL), respectively. These two oncogenic viruses were chosen because they differ in genome size (around 9 kb for HTLV-1 vs 200 kb for HHV-8), in their genomic form in the cancer cells (integrated into the genomic DNA for HTLV-1 and latent episomal for HHV-8) and, more importantly, in the number of copies present in the tumors. Indeed, in ATLL, one or two copies of the HTLV-1 genome are integrated into each cancer cell, whereas the viral load is much higher in PEL, with around 100–300 HHV-8 copies per cell. Two frozen samples of uncultured ex vivo tumor cells were used. The first originated from an ATLL associated with HTLV-1; the second was a PEL associated with HHV-8. The preparation of the nucleic acids, the sequencing process and the bioinformatics analyses are described in Additional file 1.
Methodology of data analysis
Data generated for the pilot study on ATLL and PEL, on the one hand, and for the five adenocarcinoma samples, on the other hand, were subjected to a series of analyses described in Fig. 1. These analyses are divided between a common process, shown in green and applied to both studies, and control processes, outlined in pink, specific to each study. The purpose of the common process was to assign taxonomies to the reads coming from each sample. The count of these taxonomies should help to identify (1) the expected viruses for the pilot study, (2) an infectious agent present in AIS nucleic acid in the main study. More details about the methodology of analysis are given in Additional file 1.
For each of the nine samples (the ATLL and the PEL for the pilot study and the five AIS), around 180 million single reads of 100 bp were obtained, while around 150 million were obtained for the two RNA samples (the ATLL and the PEL). The quality of these reads was verified and considered very good for all samples. The reads of each sample were filtered against the human genome (hg19).
Pilot study: sensitivity of the detection of two known oncogenic viruses (HHV-8 and HTLV-1) in ex vivo tumor samples
Control specific process
The presence of both HTLV-1 and HHV-8 was searched for in each of the four studied samples (Table 2). This was done either by mapping reads to reference genomes using bowtie2, or by searching for similarity between the contigs/singlets and the reference genomes using blastn (Table 2; Fig. 1a). In the PEL sample, numerous reads originating from the DNA or the RNA corresponded to the HHV-8 reference genome. Indeed, a total of 176,000 (31 %) to 228,000 (45 %) reads were identified as deriving from the HHV-8 genome (Table 2). In contrast, not a single HTLV-1 read was found in this PEL sample (Table 2). The HHV-8 genome is covered at 85–92 % by the contigs stemming from the assembly step (Additional file 1: Figures S1 and S2). In the ATLL sample, the results were very different, as only 215 (0.06 %) and 312 (0.14 %) reads were identified as originating from the HTLV-1 genome in the DNA and RNA samples, respectively. Contigs from these samples cover only 45–60 % of the retroviral genome (Additional file 1: Figures S3 and S4).
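The coverage figures quoted above (85–92 % for HHV-8, 45–60 % for HTLV-1) express the fraction of reference positions overlapped by at least one contig. The following is a minimal sketch of that calculation, assuming contig alignments are available as 1-based inclusive (start, end) intervals on the reference. The function name and the toy intervals are illustrative and are not taken from the authors' pipeline.

```python
def covered_fraction(intervals, genome_length):
    """Fraction of a reference genome covered by at least one alignment.

    intervals: iterable of (start, end) pairs, 1-based and inclusive.
    genome_length: reference length in bases.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    # Normalise orientation (reverse-strand hits may have start > end)
    # and clip each interval to the reference boundaries.
    cleaned = []
    for start, end in intervals:
        lo, hi = min(start, end), max(start, end)
        lo, hi = max(lo, 1), min(hi, genome_length)
        if lo <= hi:
            cleaned.append((lo, hi))
    cleaned.sort()

    covered = 0
    current_lo = current_hi = None
    for lo, hi in cleaned:
        if current_hi is None or lo > current_hi + 1:
            # Gap before this interval: close the previous merged block.
            if current_hi is not None:
                covered += current_hi - current_lo + 1
            current_lo, current_hi = lo, hi
        else:
            # Overlapping or adjacent: extend the current merged block.
            current_hi = max(current_hi, hi)
    if current_hi is not None:
        covered += current_hi - current_lo + 1
    return covered / genome_length


# Toy example (hypothetical intervals on a 9,000-base reference).
toy = [(1, 2000), (1500, 3000), (6000, 5001), (8000, 9500)]
print(f"{covered_fraction(toy, 9000):.1%}")
```

Merging overlapping intervals before summing avoids counting positions twice when several contigs align to the same region.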
The taxonomic assignment performed by BLASTN (against the EMBL database) on the contigs of each sample (the same contigs as those used above in the control process) demonstrated the ability of this process to identify the two viruses. For the PEL sample, the presence of HHV-8 was obvious since it represents 41 % and 80 % of assignments in the RNA and DNA samples respectively, after filtering. The background of the analysis is minimal since the taxonomy “Rhadinovirus” represents 100 % of viral taxonomies (Additional file 1: Figures S5 and S6). For ATLL, HTLV-1 was represented by only a small number of small contigs; however, the analysis process still detected its presence. Although viral taxonomies represent only 0.08–0.7 % of the assignments (Additional file 1: Figures S7 and S8), in this viral branch, HTLV-1 represents 86–98 % of the reads (Additional file 1: Figures S9 and S10). These differences between the HHV-8 and HTLV-1 results were in agreement with the theoretical prediction that takes into account the size of the two genomes and their number of copies per cell.
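The theoretical prediction mentioned above can be approximated with simple arithmetic. If every cell carries c copies of a viral genome of length g, the expected fraction of DNA reads that are viral is roughly c·g / (G + c·g), where G is the amount of host DNA per cell. The sketch below uses the genome sizes and copy numbers quoted in the text. The diploid human genome size of about 6.4 Gb, a sample made only of tumor cells, and uniform random sequencing are our assumptions, so the output is an order-of-magnitude guide only.

```python
HUMAN_DIPLOID_BP = 6.4e9   # assumption: approximate diploid human genome size
TOTAL_READS = 180e6        # "around 180 million" reads per DNA sample (from the text)


def expected_viral_reads(copies_per_cell, viral_genome_bp,
                         total_reads=TOTAL_READS,
                         host_genome_bp=HUMAN_DIPLOID_BP):
    """Expected number of viral reads under uniform random sequencing.

    Assumes every cell carries `copies_per_cell` viral genomes and that
    reads are sampled in proportion to the amount of DNA from each genome.
    """
    viral_bp = copies_per_cell * viral_genome_bp
    fraction = viral_bp / (host_genome_bp + viral_bp)
    return total_reads * fraction


# Values quoted in the text: HTLV-1 ~9 kb at 1-2 copies per cell,
# HHV-8 ~200 kb at 100-300 copies per cell.
scenarios = {
    "HTLV-1, 1 copy": (1, 9_000),
    "HTLV-1, 2 copies": (2, 9_000),
    "HHV-8, 100 copies": (100, 200_000),
    "HHV-8, 300 copies": (300, 200_000),
}
for label, (copies, size) in scenarios.items():
    print(f"{label}: ~{expected_viral_reads(copies, size):,.0f} reads")
```

Under these assumptions HTLV-1 is expected to contribute only a few hundred reads, whereas HHV-8 is expected to contribute several hundred thousand or more. That contrast of roughly three orders of magnitude is comparable to the one reported in Table 2.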
In conclusion, long contigs representing almost all the viral genome can be obtained with the reads generated from the DNA (Additional file 1: Figures S1–S4). The RNA data bring more information on the expression of the viral genes in the tumor but are less “informative” for detection and characterization of the virus, because only small contigs representing small parts of the genome are obtained. Based on these data, the use of genomic DNA in the IMA study was considered a good approach to search for a potential infectious agent present in these tumors, especially if we consider that such an agent could be a DNA virus. Indeed, up to now, almost all known oncogenic viruses are DNA viruses, with the exception of HCV and HTLV-1.
Similarities between a set of known viruses and all reads of each sample were initially searched for using blastn, without any prior filtering, in particular without filtering against hg19. These viruses were the Jaagsiekte sheep retrovirus (JSRV, Acc NC_001494.1), the Human herpesvirus 6B (HHV6B, Acc AB021506), Epstein–Barr virus (EBV, Acc M80517) and the Human endogenous retrovirus K113 (HERV-K113, Acc AY037928). None of the reads in the five samples showed a significant similarity to JSRV, HHV6B or EBV. In contrast, and as expected, HERV was present in all the samples, with more than 50,000 reads per sample overlapping the entire HERV genome.
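A search of this kind yields one alignment record per read/reference hit, and "significant similarity" then has to be defined by explicit thresholds. The sketch below counts the reads with a significant hit to each reference from BLAST tabular output (the default 12-column `-outfmt 6` layout). The thresholds (E-value ≤ 1e-5, alignment length ≥ 50 bases) and the sample records are illustrative assumptions. The article does not state the cut-offs actually used.

```python
from collections import defaultdict

# Column order of BLAST+ tabular output (-outfmt 6) with default fields.
FIELDS = ("qseqid sseqid pident length mismatch gapopen "
          "qstart qend sstart send evalue bitscore").split()


def count_significant_reads(lines, max_evalue=1e-5, min_length=50):
    """Count distinct query reads with a significant hit per subject.

    The thresholds are illustrative defaults, not the authors' values.
    """
    reads_per_subject = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        record = dict(zip(FIELDS, line.split("\t")))
        if (float(record["evalue"]) <= max_evalue
                and int(record["length"]) >= min_length):
            # A set ensures a read with several hits is counted once.
            reads_per_subject[record["sseqid"]].add(record["qseqid"])
    return {subject: len(reads) for subject, reads in reads_per_subject.items()}


# Hypothetical records: two reads match HERV-K113 (read2 twice), and one
# weak hit to JSRV is discarded by the thresholds.
example = [
    "read1\tAY037928\t99.0\t100\t1\t0\t1\t100\t501\t600\t1e-45\t180",
    "read2\tAY037928\t98.0\t100\t2\t0\t1\t100\t900\t999\t3e-44\t176",
    "read2\tAY037928\t97.0\t90\t3\t0\t5\t94\t7000\t7089\t1e-30\t140",
    "read3\tNC_001494.1\t85.0\t28\t4\t0\t10\t37\t120\t147\t2.1\t30.1",
]
print(count_significant_reads(example))
```

Counting distinct reads rather than alignment records matters for repeated elements such as HERV, where a single read can align to several positions.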
For the control process, the filtering of the five tumor samples allowed us to significantly reduce the number of human reads and contaminants (Table 1). Indeed, 62–91 % of the reads remaining after hg19 filtering were removed by this second filter using the two control samples (Table 3). Although the percentage of filtered reads is high, the number of remaining reads varied between samples, ranging from 0.3 × 10⁵ reads for sample AT5 to 2 × 10⁵ for sample AT1 (Table 3).
An overall assembly of all these filtered reads allowed us to obtain 596 contigs (global assembly, Table 3). Similarities between these contigs and the sequences of the EMBL database (http://www.ebi.ac.uk/ena/home) were searched for using blast. A total of 434 significant similarities led to a taxonomic assignment for the same number of contigs. These taxonomies, depicted with the krona software, were overwhelmingly eukaryotic (96 %) and more specifically hominidae (76 %) (Additional file 1: Figure S11). The remaining sequences consist of contigs shorter than 200 bases with small similarities (30–40 bases) to sequences annotated as vectors (in green in Additional file 1: Figure S11). The taxonomic assignments provided by blastx against the Uniprot databases (http://www.uniprot.org/) were almost identical (data not shown).
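The percentages above come from tallying the taxonomic lineage assigned to each contig. A minimal sketch of such a tally is given below, assuming each assigned contig carries a lineage ordered from broad to narrow. The function name, the lineage labels and the toy input are hypothetical and are not the authors' data.

```python
from collections import Counter


def summarize_assignments(lineages, total_contigs):
    """Summarise taxonomic assignments of contigs.

    lineages: dict mapping contig id -> tuple of taxa from broad to narrow,
              e.g. ("Eukaryota", "Hominidae"); only assigned contigs appear.
    total_contigs: number of contigs submitted to the similarity search.
    Returns (unassigned_count, {top-level taxon: percent of assigned contigs}).
    """
    assigned = len(lineages)
    if assigned > total_contigs:
        raise ValueError("more assignments than contigs")
    top_level = Counter(lineage[0] for lineage in lineages.values() if lineage)
    percentages = {
        taxon: 100.0 * count / assigned for taxon, count in top_level.items()
    }
    return total_contigs - assigned, percentages


# Hypothetical toy input: 5 contigs, 4 of them assigned.
toy_lineages = {
    "contig_1": ("Eukaryota", "Hominidae"),
    "contig_2": ("Eukaryota", "Hominidae"),
    "contig_3": ("Eukaryota", "Muridae"),
    "contig_4": ("artificial sequences", "vectors"),
}
unassigned, shares = summarize_assignments(toy_lineages, total_contigs=5)
print(unassigned, shares)
```

Applied to the counts reported here, 596 contigs with 434 assignments leave 162 contigs without any taxonomic assignment, which is the number discussed among the limitations of the study.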
For 16 contigs, all from the AT2 sample, the taxonomic assignment, obtained from similarities close to 100 %, was unexpected. These taxonomies refer to sequences from Mus musculus that have never been described in human sequences. A total of 12 contigs cover the entire CDS and the 3′ untranslated region of a LINE-1 sequence (L1spa Acc AF016099.1, L1Md-tf23 Acc AF081110); three contigs partially overlap the Gag/pol region of an intracisternal A-particle sequence (MIA14, Acc M17551.1); and one contig, the only one with high coverage, is identical to a sequence described as a major satellite repeat sequence (Acc EF028077). To our knowledge, these three sequences have not been interconnected in a database sequence or in a publication, but all three are annotated as mobile elements.
Discussion and conclusion
The goal of our study was to search for the presence of a virus in well-characterized ex vivo samples of human lung adenocarcinoma, especially in the bronchiolar-alveolar subtype. We specifically focused our study on these relatively rare tumors because: (1) based on clinical aspects, the hypothesis of an infectious etiology has been raised in such tumors; (2) the ovine pulmonary adenocarcinoma, which exhibits some similarities with mucinous BAC (now IMA), is considered causally associated with the JSRV retrovirus; (3) the published data concerning the detection, by molecular means and/or by immunohistochemistry, of such a retrovirus in mucinous BAC remain controversial. Indeed, the most recent report focusing specifically on the JSRV candidate in histological sections of different lung cancers, including BAC, found minor evidence of JSRV env-specific immunoreactivity and JSRV-like env and gag sequence amplification. The authors suggest that a JSRV-like virus might infect human lungs and have oncogenic properties in several subtypes of lung cancer. Conversely, other studies have failed to show any association between lung cancer and JSRV [6, 7, 14]. Moreover, a large study that recently investigated 225 cases of lung adenocarcinoma using RNA-Seq found no DNA virus transcript.
Thus, firstly to try to resolve the controversy discussed above concerning the association of JSRV with mucinous BAC, and secondly to uncover any other known or unknown viruses in these specific lung tumors, we used high-throughput sequencing. This assay is based on random sequencing of the nucleic acids present in a given sample. This very broad approach is highly powerful and can detect any known or unknown agent with a sensitivity roughly equivalent to that of current specific PCR methods. Furthermore, with a sufficient depth of sequencing, as performed here, this approach is able to confirm the absence of a suspected infectious agent in targeted pathologies [15, 17, 18]. Lastly, a previous study showed the effectiveness of such a method on samples artificially infected with low levels of known viruses.
The first step of the study was to show the effectiveness of our methodology on two human cancer samples (ATLL and PEL) naturally infected by two different oncogenic viruses (HTLV-1 and HHV-8, respectively). The pilot project as a whole demonstrated the high sensitivity of our approach, which was subsequently applied to the BAC samples. Indeed, our method allowed us not only to detect but also to identify nearly the entire sequence, or large portions, of these two known viral genomes in the specific cancer samples. Furthermore, our data also showed good specificity with, on the one hand, a low background noise at both the viral and bacterial levels and, on the other hand, a lack of contamination between samples. Indeed, no HHV-8 read could be detected by bowtie2 or by BLASTN in the ATLL samples despite the fact that the samples were handled together and sequenced in the same flow cell.
Our study has several limitations:
This study was conducted in only 5 European patients.
There were no AIS (pure lepidic growth pattern) included in this study.
Some interesting microbial sequences, especially of viral origin but wrongly considered as belonging to (or assigned to) Homo sapiens, could have been eliminated through the filtering process.
A relatively high number of contigs (162), representing more than 300,00 bp, remained without any taxonomic assignment at the end of our analyses. It could be hypothesized that one or a few of them correspond to part of a yet unknown infectious agent. Clarifying this situation will require further in-depth studies searching for possible virus-specific structures in such contigs.
Finally, the absence of any viral sequences in the findings could be linked to the possibility of a very low viral load of the searched virus in only a small proportion of the tumor cells. Such an unlikely situation has, however, been raised in some cases of Merkel cell carcinoma.
In conclusion, we have demonstrated the absence of JSRV and other known human viruses, before and after any filtering, in these five samples of well-characterized lepidic adenocarcinomas.
International Association for the Study of Lung Cancer/American Thoracic Society/European Respiratory Society
adenocarcinoma in situ
invasive mucinous adenocarcinoma
non mucinous lepidic invasive adenocarcinoma
epidermal growth factor receptor
v-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog
Jaagsiekte sheep retrovirus
standardized uptake value
positron emission tomography–computed tomography
human T-lymphocyte virus type 1
human herpesvirus type 8
adult T cell leukaemia/lymphoma
primary effusion lymphoma
hepatitis C virus
human endogenous retrovirus
basic local alignment search tool
lepidic lung adenocarcinoma
1. Travis WD, Brambilla E, Noguchi M, Nicholson AG, Geisinger K, Yatabe Y, Powell CA, Beer D, Riely G, Garg K, et al. International Association for the Study of Lung Cancer/American Thoracic Society/European Respiratory Society: international multidisciplinary classification of lung adenocarcinoma: executive summary. Proc Am Thorac Soc. 2011;8(5):381–5.
2. Cousens C, Minguijon E, Garcia M, Ferrer LM, Dalziel RG, Palmarini M, De las Heras M, Sharp JM. PCR-based detection and partial characterization of a retrovirus associated with contagious intranasal tumors of sheep and goats. J Virol. 1996;70(11):7580–3.
3. Demartini JC, Rosadio RH, Lairmore MD. The etiology and pathogenesis of ovine pulmonary carcinoma (sheep pulmonary adenomatosis). Vet Microbiol. 1988;17(3):219–36.
4. Palmarini M, Cousens C, Dalziel RG, Bai J, Stedman K, DeMartini JC, Sharp JM. The exogenous form of Jaagsiekte retrovirus is specifically associated with a contagious lung cancer of sheep. J Virol. 1996;70(3):1618–23.
5. York DF, Vigne R, Verwoerd DW, Querat G. Isolation, identification, and partial cDNA cloning of genomic RNA of Jaagsiekte retrovirus, the etiological agent of sheep pulmonary adenomatosis. J Virol. 1991;65(9):5061–7.
6. Hiatt KM, Highsmith WE. Lack of DNA evidence for Jaagsiekte sheep retrovirus in human bronchiolo-alveolar carcinoma. Hum Pathol. 2002;33(6):680.
7. Yousem SA, Finkelstein SD, Swalsky PA, Bakker A, Ohori NP. Absence of Jaagsiekte sheep retrovirus DNA and RNA in bronchiolo-alveolar and conventional human pulmonary adenocarcinoma by PCR and RT-PCR analysis. Hum Pathol. 2001;32(10):1039–42.
8. Feng H, Shuda M, Chang Y, Moore PS. Clonal integration of a polyomavirus in human Merkel cell carcinoma. Science. 2008;319(5866):1096–100.
9. Palacios G, Druce J, Du L, Tran T, Birch C, Briese T, Conlan S, Quan PL, Hui J, Marshall J, et al. A new arenavirus in a cluster of fatal transplant-associated diseases. N Engl J Med. 2008;358(10):991–8.
10. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9(4):357–9.
11. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25(17):3389–402.
12. Ondov BD, Bergman NH, Phillippy AM. Interactive metagenomic visualization in a web browser. BMC Bioinform. 2011;12:385.
13. Linnerth-Petrik NM, Walsh SR, Bogner PN, Morrison C, Wootton SK. Jaagsiekte sheep retrovirus detected in human lung cancer tissue arrays. BMC Res Notes. 2014;7:160.
14. Hopwood P, Wallace WA, Cousens C, Dewar P, Muldoon M, Norval M, Griffiths DJ. Absence of markers of betaretrovirus infection in human pulmonary adenocarcinoma. Hum Pathol. 2010;41(11):1631–40.
15. Khoury JD, Tannir NM, Williams MD, Chen Y, Yao H, Zhang J, Thompson EJ, Meric-Bernstam F, Medeiros LJ, Weinstein JN, et al. Landscape of DNA virus associations across human malignant cancers: analysis of 3775 cases using RNA-Seq. J Virol. 2013;87(16):8916–26.
16. Cheval J, Sauvage V, Frangeul L, Dacheux L, Guigon G, Dumey N, Pariente K, Rousseaux C, Dorange F, Berthet N, et al. Evaluation of high-throughput sequencing for identifying known and unknown viruses in biological samples. J Clin Microbiol. 2011;49(9):3268–75.
17. Dereure O, Cheval J, Du Thanh A, Pariente K, Sauvage V, Manuguerra JC, Caro V, Foulongne V, Eloit M. No evidence for viral sequences in mycosis fungoides and Sezary syndrome skin lesions: a high-throughput sequencing approach. J Investig Dermatol. 2013;133(3):853–5.
18. Li R, Faden DL, Fakhry C, Langelier C, Jiao Y, Wang Y, Wilkerson MD, Pedamallu CS, Old M, Lang J, et al. Clinical, genomic, and metagenomic characterization of oral tongue squamous cell carcinoma in patients who do not smoke. Head Neck. 2015;37(11):1642–9. doi:10.1002/hed.23807
NB, KO, JCS and AG conceived and designed the experiments. NB and ND performed the experiments of molecular biology. LF performed the bioinformatics analysis. AG and J-CS co-managed this work. CB performed the sequencing. EB, PG, PV and EF performed characterization of adenocarcinoma samples. NB, KO, AG, LF and JCS analyzed the data. NB, LF, KO and AG wrote the manuscript. All authors read and approved the final manuscript.
This program was supported by the Institut Pasteur and Institut Gustave Roussy, sponsorship by the Elisabeth Taub award from the Académie Nationale de Médecine and the Laboratory of Excellence, Integrative Biology of Emerging Infectious Diseases (LABEX). High throughput sequencing has been performed on the Genomics Platform, member of “France Génomique” consortium (ANR10-INBS-09-08). We thank the PIRC (Pôle Intégré de Recherche Clinique) at the Institut Pasteur for their help in the biomedical regulatory aspects of the project. We thank Magali Tichit for his technical help and Heidi Lançon for the English revision. The funders had no role in study design, data analysis or preparation of the manuscript.
The authors declare that they have no competing interests.
Nicolas Berthet, Lionel Frangeul and Ken André Olaussen contributed equally to this work