Skip to main content
  • Research note
  • Open access
  • Published:

Shotgun metagenomic analysis reveals new insights into bacterial community profiles in tempeh



Amplicon sequencing targeting 16S ribosomal RNA (rRNA) has been widely used to profile the microbial community from fermented food samples. However, polymerase chain reaction (PCR) steps on amplicon sequencing analysis and intragenomic heterogeneity within 16S rRNA are believed to contribute to bias in estimating microbial community composition. As potential paraprobiotics sources, a comprehensive profiling study of tempeh microbial ecology could contribute to tempeh product development. This study employed a shotgun metagenomic approach, where metagenome fragments from tempeh samples were sequenced directly for taxonomic and functional profiling analysis.


Taxonomic profiling showed that Proteobacteria, Firmicutes, and Bacteroidetes were the dominant phyla from the shotgun metagenomic analysis in all tempeh samples. In terms of composition, this shotgun metagenomic study revealed that Proteobacteria was the most abundant phylum. Functional profiling showed that iron complex outer-membrane recepter protein (KEGG ID: K02014) was the most transcribed gene based on this metagenomic analysis. The metagenome-assembled genomes (MAGs) results from the binning pipeline could reveal almost complete whole genome sequence of Lactobacillus fermentum, Enterococcus cecorum, Escherichia coli, Klebsiella pneumoniae, and Acinetobacter baumannii.


Tempeh is fermented food originated from Indonesia. The biochemical changes of soybean during microbial fermentation increased nutritional values and health-promoting bioactive compounds in tempeh. Compared to other indigenous soybean-based fermented food such as nato, miso (Japan), kinema (Nepal), and douchi (China), which used Bacillus spp. as inoculum, tempeh used Rhizopus spp. in the production [1]. The nature of tempeh production processes creates consortia of microorganisms not only from tempeh inoculum but also from production materials and environment [2]. Over the past decade, useful tools of next-generation sequencing (NGS) such as metagenomics has been applied to study microbial consortia from fermented food microbial ecology [3]. Previous metagenomic studies of the microbial community during tempeh production were conducted by employing amplicon sequencing targeting the V4 region of the 16S rRNA gene. These studies focused on the dynamic taxonomic profile of the microbial community from tempeh metagenome samples and indicated that Firmicutes was the predominant phylum. Some genera such as Lactobacillus, Streptococcus, and Weisella from Firmicutes phylum known as probiotics are also reported in this study [4,5,6]. Tempeh is generally cooked before consumption and might act as an inactivated probiotics source. Previous tempeh nutrigenomic studies showed that cooked tempeh supplementation could enhance the immune system by increasing IgA production in human intestinal tracts [7]. Recently, studies reported cooked tempeh consumption for 6 months, improving global cognitive function respondents aged 60 years or over with mild cognitive impairment [8]. Inactivated probiotics were previously mentioned in literature with the term paraprobiotics. Recently paraprobiotics are defined as non-viable either intact or broken cells of microbial and crude extract of cells that could positively affect when administered in a sufficient amount [9]. There are around 80,000 tempeh producers categorized as small and medium-scale home industries in Indonesia [10]. The tempeh microbial ecology formed during production may vary among these producers. As potential paraprobiotics sources, a comprehensive profiling study of tempeh microbial ecology could contribute to tempeh product development. Besides for taxonomic profile and composition study, shotgun metagenomics could also be used for the functional study of microbial communities based on more objective analysis through direct whole-genome sequencing [11, 12]. Therefore, this study aims to study tempeh microbial ecology employing shotgun metagenomic analysis.

Main text

Materials and methods


Samples were collected from two local traditional tempeh producers in Bogor, Indonesia, designated as EMP and WJB. The samples from these producers have been used as sources for microbial community analysis on tempeh for many years [5]. EMP and WJB were representatives for the different methods of soybean boiling process on tempeh productions. The EMP employs one boiling process, while the WJB employs two boiling processes. Five tempeh samples from the same production batch were randomly picked from the EMP or WJB producers. The plastic was used as a wrapper in the tempeh production process. On the same day, each of these tempeh samples small diced and pooled in a sterile container for further step, total microbial DNA genome extraction.

Total DNA extraction

The extraction process was adapted from a previous study [13]. One hundred-gram of fresh tempeh sample was homogenized in 300 mL of phosphate buffer saline (PBS) using the Philip HR2061 blender (Koninklijke Philips, Amsterdam, Netherland) for 30 s. The homogenate was centrifuged at 1000×g for 10 min. Supernatants were collected and centrifuged at 10,000×g for 3 min. The pellets were subjected to total microbial DNA extraction employing ZymoBIOMICS DNA/RNA Mini Kit (Zymo Research, California, USA) protocols.

Metagenome sequencing

The whole metagenome library preparation and sequencing process used services from NovogeneAIT Genomics Singapore Pte Ltd. The whole microbial DNA was sheared to produce fragment libraries using restriction enzymes with a minimum of 1 µg of DNA as input. Microbial DNA was precisely quantified using Qubit 2.0 (Thermo Fischer Scientific, United States). The purity and degradation assessment for microbial DNA was done employing NanoDrop (Thermo Fischer Scientific, United States) and gel electrophoresis. The paired-end sequencing library was prepared using the TruSeq DNA PCR-Free Prep Kit (Illumina, United States). The prepared library was sequenced on the NovaSeq 6000 platform (2 × 150 bp chemistry) (Illumina, United States).

Shotgun metagenomic sequencing data analysis

The sequenced reads (raw reads) were filtered from reads containing adapters, reads containing N (the base cannot be determined) > 10%, and reads containing low-quality (Qscore ≤ 5) base, which is over 50% of the total base, by using NovogeneAIT Genomics Singapore Pte Ltd pipeline to produce high-quality paired-end reads (clean reads). Rhizopus spp. reads contamination was removed from the clean reads by employing Read QC module from MetaWRAP pipeline [14] using the mix R oryzae 99892 (PRJNA186020), R. microsporus ATCC 52813 (PRJNA430271, PRJNA205957), R. delemar RA 99-880 (PRJNA13066), R. stolonifer B9770 (PRJNA184886) and R. azygosporus (PRJNA418064) whole-genome sequence as a reference. The SqueezeMeta pipelines [15] were employed for assembly, taxonomic, functional, and bin analyses. The pipelines used the co-assembly mode option where reads from all samples were pooled before the assembly using the Megahit [16] step was performed. The SQMtools package on R version 4.0.3 [17] was used to analyze both taxonomic and functional profiling data generated from SqueezeMeta pipelines [18].


Metagenome sequencing and assembly statistic

The total microbial DNA extracted from tempeh samples collected from two different tempeh producers in Bogor, Indonesia were subjected to Illumina whole metagenome sequencing pipelines. The average effective rate of clean reads from two raw reads of metagenomic data after the quality trimming was 99.93%. A total of 29,030,144 (36.74%) reads from 78,995,980 EMP clean reads and 17,146,676 (15.86%) reads from 108,045,092 WJB clean reads were mapped to the Rhizopus spp. genome reference. The number of contigs resulting from the co-assembly steep data was 293,961. The longest contigs were 485,167 bp, and the N50 value was 1994.

Taxonomic and functional profiling

The taxonomic assignment of contigs is based on individual genes taxonomic assignment. The SqueezeMeta pipeline implements a fast-last common ancestor (LCA) algorithm to analyze each query gene hit results as the Diamond [19] search query against the GenBank nr database. The contigs are annotated to a consensus of the taxon to which most of their genes belong. The selected hits must pass a minimum amino acid identity (AAI) level for assignment to taxonomic ranks. For the phylum and genus, the threshold was 40% and 60% [20]. The metagenome reads will map onto contigs using Bowtie2 [21] to estimate each gene and contig abundance. Contigs with phylum annotation were 263,096 (89.5%). Proteobacteria was a relatively abundant phylum in EMP (74.54%) and WJB (85.38%) metagenome. In the genus level, 211,603 (72%) contig were annotated. Novosphingobium was a relatively abundant genus in the EMP metagenome (27.16%) and the Proteobacteria phylum of EMP (26.81%). Enterobacter was a relatively abundant genus in the WJB metagenome (34.39%) and the Proteobacteria phylum of WJB (33.93%). Firmicutes phylum in the EMP metagenome (10.07%) was relatively more abundant than the WJB metagenome (3.23%). Leuconostoc (3.83%), Enterococcus (2.99%), and Lactobacillus (1.98%) were the top three genera in the Firmicutes phylum of EMP. In the Firmicutes phylum of WJB, Enterococcus (1,17%) was the most abundant genus. The abundance of Bacteroidetes phylum was relatively similar in EMP (1.38%) and WJB (1.84%) metagenome samples (Fig. 1). Functional profiling used the latest publicly available version of the KEGG database for KEGG ID annotation. Iron complex outer membrane recepter protein (KEGG ID: K02014) was the most transcripted expression in the metagenome from tempeh samples (Fig. 2).

Fig. 1
figure 1

The taxonomic abundance of the microbial community of tempeh samples at the rank of phylum and genus

Fig. 2
figure 2

The functional profile of the tempeh metagenome samples using KEGG annotation in TPM (transcripts per kilobase million)

Binning and bin check

The total number of bins obtained from the co-assembly of EMP and WJB metagenome samples results from the DAS tool [22] was 25. According to the CheckM [23] result, eleven bins were categorized as good-quality bins, whose completeness was more than 75% with less than 10% contamination (Table 1). Among good-quality bins, four bins were categorized as high-quality bins, whose completeness was more than 90%.

Table 1 Good-quality bins (> 75% completion, < 10% contamination) obtained by co-assembly mode of EMP and WJB metagenome


The relative abundance of each gene and contigs on the shotgun metagenomic data required normalization because of the genome length differences. The SqueezeMeta pipeline develops a custom script to compute the normalization gene and contig abundance [15]. Study of the microbial community of EMP and WJB tempeh samples using metagenomic16S rRNA sequencing analysis [4,5,6] is consistent with the result of shotgun metagenome sequencing analysis in this study, i.e., Proteobacteria, Firmicutes, and Bacteroidetes were the three most abundant phyla. For the EMP and WJB tempeh sample, Firmicutes were reported as the most abundant phylum in the previous amplicon metagenomic study [4, 5]. In contrast, in this study, the most abundant phylum was Proteobacteria. The previous amplicon metagenomic study on EMP and WJB sample revealed the second-times soybean boiling process on the WJB tempeh production might contribute to tempeh microbial community profile [5]. Firmicutes phylum on the EMP (92%) sample reported relatively more abundant compare to the WJB (88%). This similar trend was also reported in this study. At the genus level from this phylum, both metagenomic analyses revealed Enterococcus relatively most abundant genus in the EMP samples. In the WJB samples, this study also reports Enteroccous was the relatively most abundant genus, while in amplicon metagenomic study was genus Lactobacillus. In terms of taxa prediction and abundance estimation, a recent study reported shotgun metagenomic analysis produced more accurate results compared to 16S rRNA gene-based metagenomic analysis from an artificial skin-associated microbial community. This artificial microbial community contains four species from phylum Proteobacteria, eight species from phylum Firmicutes, and three species from phylum Actinobacteria [24]. Taxonomic profiling solely based on the 16S rRNA sequence could generate bias in bacterial cell count from metagenome samples because many bacteria possess more than one different copy number of this gene [25]. Different PCR primers used on PCR protocols preferentially amplify different taxa sets and generate bias for amplicon metagenomic analysis [26]. A study on the bacterial community in EMP tempeh employing culturable technique also reported that Proteobacteria was the most dominant phylum [5]. The functional profiling of tempeh metagenome samples showed that the most annotated functions were transporter and transposase related genes. It has been reported that the bioavailability of minerals such as iron during tempeh production was significantly elevated. Tempeh has been known to reduce the level of chelating agents, such as phytic acid typically present in soybean [27]. For most bacteria, iron is an essential micronutrient. Gram-negative bacteria such as species in phylum Proteobacteria use the classical iron transport system Iron complex outer-membrane recepter protein (KEGG ID: K02014) is part of it [28]. Previous study demonstrated that transposase was the most abundant gene in environmental metagenome. Transposase catalyzes `cut-and-paste’ or ‘copy-and-paste’ reactions promoting DNA segments mobility of to new sites. Transposase may mobilize or activate genes that enhance their hosts fitness [29]. The binning pipeline was able to produce good quality (> 75% completion and < 10% contamination) the metagenome-assembled genomes (MAGs). Among these MAGs, five were in the species level of taxonomic rank. Some K. pneumoniae isolates from EMP and WJB tempeh fully sequenced and subjected to genomic comparison to pathogenic strain in the previous study [30]. K. pneumoniae is known to produce vitamin B12 during tempeh fermentation [31]. Data analysis from the amplicon sequencing of metagenomic samples of EMP and WJB tempeh in the previous study [4, 5] failed to reveal this important genus. The binning generated draft genome of Lactobacillus fermentum. Previous study showed that lactic acid bacteria (LAB) were common microbial communities found during tempeh production [32]. Genetic diversity employing enterobacterial repetitive intergenic consensus-polymerase chain reaction (ERIC-PCR) of E. coli isolated from tempeh samples were reported [33]. This study showed that E. coli from tempeh samples are genetically different from medical isolates. Partial complete of E. coli genome draft in this study will significantly contribute further comparative genomic study, especially for E. coli isolates derived from tempeh. Beneficials tempeh supplementation as a paraprobiotics source has been published [7, 8], it is reported the whole non-viable might affect human immune and cognitive system. Detail contribution from the tempeh microbial community has not yet been explored. The knowledge from this study might enhance tempeh as a potential paraprobiotics source development.


This study design only employed samples from two producers that could generate bias of taxonomic and functional profiling. While we did not analyze blank controls in this study, we utilized a sterile technique to minimize the effects of laboratory contamination.

Availability of data and materials

The tempeh metagenomic raw reads used for this study were deposed in publicly accessible NCBI’s Sequence Read Archive (SRA) under the accession number: PRJNA605305 (



Ribosomal RNA


Polymerase Chain Reaction


The metagenome-assembled genomes


Next-Generation Sequencing


Phosphate Buffer Saline


Last Common Ancestor


Amino acid identity


Kyoto Encyclopedia of Genes and Genomes


Lactic acid bacteria


Enterobacterial repetitive intergenic consensus-polymerase chain reaction


Transcripts per kilobase million


  1. Astuti M, Meliala A, Dalais FS, Wahlqvist ML. Tempe, a nutritious and healthy food from Indonesia. Asia Pac J Clin Nutr. 2000.

    Article  PubMed  Google Scholar 

  2. Tamang JP, Watanabe K, Holzapfel WH. Review: diversity of microorganisms in global fermented foods and beverages. Front Microbiol. 2016.

    Article  PubMed  PubMed Central  Google Scholar 

  3. De Filippis F, Parente E, Ercolini D. Metagenomics insights into food fermentations. Microb Biotechnol. 2016.

    Article  PubMed  PubMed Central  Google Scholar 

  4. Radita R, Suwanto A, Kurosawa N, Wahyudi AT, Rusmana I. Metagenome analysis of tempeh production: where did the bacterial community in tempeh come from? Malays J Microbiol. 2017.

    Article  Google Scholar 

  5. Radita R, Suwanto A, Kurosawa N, Wahyudi AT, Rusmana I. Firmicutes is the predominant bacteria in tempeh. Int Food Res J. 2018;25(6):2313–20.

    CAS  Google Scholar 

  6. Pangastuti A, Alfisah RK, Istiana NI, Sari SLA, Setyaningsih R, Susilowati A, Purwoko T. Metagenomic analysis of microbial community in over-fermented tempeh. Biodiversitas. 2019.

    Article  Google Scholar 

  7. Stephanie S, Ratih NK, Soka S, Suwanto A. Effect of tempeh supplementation on the profiles of human intestinal immune system and gut microbiota. Microbiology Indonesia. 2017.

    Article  Google Scholar 

  8. Handajani YS, Turana Y, Yogiara Y, Widjaja NT, Sani TP, Christianto GAM, et al. Tempeh consumption and cognitive improvement in mild cognitive impairment. Dement Geriatr Cogn Disord. 2020.

    Article  PubMed  Google Scholar 

  9. Akter S, Park J-H, Jung HK. Potential health-promoting benefits of paraprobiotics, inactivated probiotic cells. J Microbiol Biotechnol. 2020.

    Article  PubMed  Google Scholar 

  10. Wiloso EI, Sinke P, Setiawan AAR, Sari AA, Waluyo J, et al. Hotspot identification in the Indonesian tempeh supply chain using life cycle assessment. Int J Life Cycle Assess. 2019.

    Article  Google Scholar 

  11. Sharpton TJ. An introduction to the analysis of shotgun metagenomic data. Front Plant Sci. 2014.

    Article  PubMed  PubMed Central  Google Scholar 

  12. Quince C, Walker AW, Simpson JT, Loman NJ, Segata N. Shotgun metagenomics, from sampling to analysis. Nat Biotechnol. 2017.

    Article  PubMed  Google Scholar 

  13. Seumahu CA, Suwanto A, Rusmana I, Solihin DD. Comparison of DNA extraction methods for microbial community analysis in Indonesian tempe employing amplified ribosomal intergenic spacer analysis. Hayati. 2012.

    Article  Google Scholar 

  14. Uritskiy GV, DiRuggiero J, Taylor J. MetaWRAP—a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome. 2018.

    Article  PubMed  PubMed Central  Google Scholar 

  15. Tamames J, Puente-Sánchez F. SqueezeMeta, a highly portable, fully automatic metagenomic analysis pipeline. Front Microbiol. 2019.

    Article  PubMed  PubMed Central  Google Scholar 

  16. Li D, Liu C-M, Luo R, Sadakane K, Lam T-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. 2015.

    Article  PubMed  PubMed Central  Google Scholar 

  17. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 2018. Accessed 19 Nov 2020.

  18. Puente-Sánchez F, García-García N, Tamames J. SQMtools: automated processing and visual analysis of’omics data with R and anvi’o. Cold Spring Harbor Laboratory. 2020.

    Article  Google Scholar 

  19. Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nat Methods. 2014.

    Article  PubMed  Google Scholar 

  20. Luo C, Rodriguez-R LM, Konstantinidis KT. MyTaxa: an advanced taxonomic classifier for genomic and metagenomic sequences. Nucleic Acids Res. 2014.

    Article  PubMed  PubMed Central  Google Scholar 

  21. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012.

    Article  PubMed  PubMed Central  Google Scholar 

  22. Sieber CMK, Probst AJ, Sharrar A, Thomas BC, Hess M, Tringe SG, Banfield JF. Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nat Microbiol. 2018.

    Article  PubMed  PubMed Central  Google Scholar 

  23. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015.

    Article  PubMed  PubMed Central  Google Scholar 

  24. Khachatryan L, de Leeuw RH, Kraakman MEM, Pappas N, te Raa M, Mei H, de Knijff P, Laros JFJ. Taxonomic classification and abundance estimation using 16S and WGS—A comparison using controlled reference samples. Forensic Sci Int Genet. 2020.

    Article  PubMed  Google Scholar 

  25. Louca S, Doebeli M, Parfrey LW. Correcting for 16S rRNA gene copy numbers in microbiome surveys remains an unsolved problem. Microbiome. 2018.

    Article  PubMed  PubMed Central  Google Scholar 

  26. McLaren MR, Willis AD, Callahan BJ. Consistent and correctable bias in metagenomic sequencing experiments. eLife. 2019.

    Article  PubMed  PubMed Central  Google Scholar 

  27. Kasaoka S, Astuti M, Uehara M, Suzuki K, Goto S. Effect of indonesian fermented soybean tempeh on iron bioavailability and lipid peroxidation in anemic rats. J Agric Food Chem. 1997.

    Article  Google Scholar 

  28. Clarke T, Tari L, Vogel H. Structural biology of bacterial iron uptake systems. Curr Top Med Chem. 2001.

    Article  PubMed  Google Scholar 

  29. Aziz RK, Breitbart M, Edwards RA. Transposases are the most abundant, most ubiquitous genes in nature. Nucleic Acids Res. 2010.

    Article  PubMed  PubMed Central  Google Scholar 

  30. Cesrany M, Yulandi A, Rusmana I, Suwanto A. Whole genome analysis of Klebsiella: unique genes associated with isolates from Indonesian tempeh. Malays J Microbiol. 2017.

    Article  Google Scholar 

  31. Yulandi A, Sugiokto FG, Febrilina A, Suwanto A. Genomic Sequence of Klebsiella pneumoniae IIEMP-3, a Vitamin B12-Producing Strain from Indonesian Tempeh. Genome Announc. 2016.

    Article  PubMed  PubMed Central  Google Scholar 

  32. Efriwati A, Suwanto A, Rahayu G, Nuraida L. Population dynamics of yeasts and lactic acid bacteria (LAB) during tempeh production. Hayati. 2013.

    Article  Google Scholar 

  33. A’Yun Q, Suwanto A, Barus T. Genetic profiles of Escherichia coli isolated from Indonesian tempeh based on enterobacterial repetitive intergenic consensus-polymerase chain reaction (ERIC-PCR). Microbiol Indones. 2015.

    Article  Google Scholar 

Download references


Not applicable.


The research is financially supported by the Ministry of Research, Technology, and Higher Education and the Lembaga Pengelola Dana Pendidikan (Indonesia Endowment Fund for Education) for funding the present study in the form of the BUDI-DN (Beasiswa Unggulan Dosen Indonesia-Dalam Negeri, The Indonesian Superior Lecturer Scholarship-Domestic) scholarship.

Author information

Authors and Affiliations



All authors contributed to the design of the study. AY performed data collection and data analysis. Writing of the draft manuscript was performed by AY, which under supervised by AS, DEW, and ATW. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Antonius Suwanto.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Yulandi, A., Suwanto, A., Waturangi, D.E. et al. Shotgun metagenomic analysis reveals new insights into bacterial community profiles in tempeh. BMC Res Notes 13, 562 (2020).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: