Phylogenetic trees, conserved motifs and predicted subcellular localization for transcription factor families in pearl millet

Qu, Yingwei; Dudhate, Ambika; Shinde, Harshraj Subhash; Takano, Tetsuo; Tsugama, Daisuke

doi:10.1186/s13104-023-06305-2

Data Note
Open access
Published: 20 March 2023

BMC Research Notes volume 16, Article number: 38 (2023)

Abstract

Objectives

Pearl millet (Pennisetum glaucum) is a cereal crop that tolerates high temperature, drought and nutrient-poor conditions. Characterizing pearl millet proteins can help to improve the productivity of pearl millet and other crops. Transcription factors are proteins that regulate transcription of their target genes and thereby regulate diverse processes. Some transcription factor families in pearl millet have been characterized in previous studies, but most have not. The objective of the data presented was to characterize the amino acid sequences of most transcription factors in pearl millet.

Data description

Sequences of 2395 pearl millet proteins that have transcription factor-associated domains were extracted. Subcellular and suborganellar localization of these proteins was predicted by MULocDeep. Conserved domains in these sequences were confirmed by CD-Search. These proteins were classified into 85 families on the basis of those conserved domains. A phylogenetic tree including pearl millet proteins and their counterparts in Arabidopsis thaliana and rice was constructed for each of these families. Sequence motifs were identified by MEME for each of these families.

Objective

Pearl millet (Pennisetum glaucum) is a staple cereal crop that tolerates high temperature, drought and nutrient-poor conditions and is produced in semi-arid regions [1]. Characterizing pearl millet genes can help to better understand pearl millet stress tolerance and to improve the productivity of pearl millet and other crops. The whole genome sequence of pearl millet was released previously [2]. On the basis of this sequence, pearl millet gene or protein families such as the WRKY transcription factor (TF) family, the NAC (NAM, ATAF and CUC) TF family, the GRAS TF family and the MYB TF family have been identified and characterized [3,4,5,6]. However, most pearl millet protein families remain uncharacterized. TFs generally regulate transcription of multiple genes and thus can act as hubs for diverse processes. TFs can therefore be useful either as transgenes in genetic modification or as targets of genome editing for improving plant performance. The objective of the data presented was to characterize the amino acid sequences of most pearl millet TFs.

Data description

Amino acid sequences for all pearl millet proteins deduced from its whole genome sequence [2] were downloaded from the International Pearl Millet Genome Sequencing Consortium website [7]. Hidden Markov models (HMMs) for protein families in the Pfam database [8] were downloaded from the InterPro website [9]. Matches to these HMMs in the amino acid sequences were detected by the hmmscan program in HMMER (version 3.3) [10, 11]. On the basis of the detected HMMs, 2395 sequences were regarded as sequences of putative pearl millet TFs, and these were classified into 85 families. Conserved domains in these TFs were confirmed by Batch CD-Search [12, 13]. Subcellular and suborganellar localization of these TFs was predicted by MULocDeep [14, 15]. Amino acid sequences of rice (Oryza sativa ssp. japonica) and Arabidopsis thaliana TFs were downloaded from the PlantTFDB website [16,17,18]. For the families that were not available in PlantTFDB, amino acid sequences of all rice (O. sativa ssp. indica) and Arabidopsis proteins were downloaded from the Ensembl Plants website [19, 20] and used for hmmscan as described above to identify proteins in those families. For each of these families except the 13 families that contain fewer than five members, the sequences from pearl millet, rice and Arabidopsis were aligned by ClustalW [21], and a phylogenetic tree file was obtained with the neighbor-joining method in the MEGA X software [22]. The phylogenetic tree was visualized with the Interactive Tree of Life (iTOL) online tool (version 6) [23, 24]. For each of the 85 families identified, motifs in the pearl millet amino acid sequences were identified de novo by the MEME program (version 5.5.0) [25]. Data obtained by these analyses were deposited in the figshare repository (Table 1) [26].
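The family classification step described above can be illustrated with a minimal Python sketch. It assumes that hmmscan was run with its `--tblout` option, so that each non-comment line is whitespace-delimited with the Pfam model name in the first column, the query protein in the third and the full-sequence E-value in the fifth. The domain-to-family mapping, the E-value cutoff, the protein identifiers and the helper names (`classify_tblout`, `large_enough`) are hypothetical and chosen only for illustration; the Data Note does not state the cutoff, the exact domain list or how family members were counted.

```python
from collections import defaultdict

# Hypothetical mapping from Pfam model names to TF families (illustration only;
# the domain list actually used for the 85 families is not given in the text).
DOMAIN_TO_FAMILY = {
    "WRKY": "WRKY",
    "NAM": "NAC",
    "GRAS": "GRAS",
    "Myb_DNA-binding": "MYB",
}

# Toy hmmscan --tblout content with made-up protein names and scores.
EXAMPLE_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias ...
WRKY PF03106.18 protA - 1.2e-25 88.1 0.3 2.0e-25 87.4 0.3 1.0 1 0 0 1 1 1 1 WRKY DNA-binding domain
WRKY PF03106.18 protE - 4.0e-30 99.0 0.1 5.0e-30 98.5 0.1 1.0 1 0 0 1 1 1 1 WRKY DNA-binding domain
NAM PF02365.18 protB - 3.4e-40 135.2 0.0 4.1e-40 134.9 0.0 1.0 1 0 0 1 1 1 1 No apical meristem protein
Myb_DNA-binding PF00249.34 protC - 0.02 15.0 0.0 0.03 14.5 0.0 1.0 1 0 0 1 1 0 0 Myb-like DNA-binding domain
Pkinase PF00069.28 protD - 1.0e-50 170.0 0.0 1.2e-50 169.8 0.0 1.0 1 0 0 1 1 1 1 Protein kinase domain
"""


def classify_tblout(lines, domain_to_family, max_evalue=1e-5):
    """Group query proteins into families from hmmscan --tblout lines.

    max_evalue is an assumed cutoff, not a value taken from the Data Note.
    """
    families = defaultdict(set)
    for line in lines:
        # Skip blank lines and the comment lines that tblout files contain.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        domain, protein, evalue = fields[0], fields[2], float(fields[4])
        family = domain_to_family.get(domain)
        # Keep only hits to TF-associated domains that pass the cutoff.
        if family is not None and evalue <= max_evalue:
            families[family].add(protein)
    return dict(families)


def large_enough(families, min_members=5):
    """Return the families with at least min_members sequences."""
    return {name: members for name, members in families.items()
            if len(members) >= min_members}


families = classify_tblout(EXAMPLE_TBLOUT.splitlines(), DOMAIN_TO_FAMILY)
for name in sorted(families):
    print(name, sorted(families[name]))
```

In this toy input, the MYB hit is dropped by the assumed E-value cutoff and the kinase hit is ignored because its domain is not in the mapping. A filter such as `large_enough` mirrors the exclusion of small families from tree construction, although the sketch counts only the sequences it is given.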

Table 1 Overview of data files/data sets

Limitations

Previous studies on protein family characterization [e.g., 3, 4, 5, 6] were not integrated in the data presented.
Most protein families other than the TF families in pearl millet are still uncharacterized.

Data availability

The data described in this Data Note can be freely and openly accessed on figshare at https://doi.org/10.6084/m9.figshare.21623829. Please see Table 1 and reference [26] for details and links to the data.

Abbreviations

TF: Transcription factor
HMM: Hidden Markov model

References

1. Basavaraj G, Rao PP, Bhagavatula S, Ahmed W. Availability and utilization of pearl millet in India. J SAT Agric Res. 2010;8:1–6.
2. Varshney RK, Shi C, Thudi M, Mariac C, Wallace J, Qi P, et al. Pearl millet genome sequence provides a resource to improve agronomic traits in arid environments. Nat Biotechnol. 2017;35:969–76.
3. Chanwala J, Satpati S, Dixit A, Parida A, Giri MK, Dey N. Genome-wide identification and expression analysis of WRKY transcription factors in pearl millet (Pennisetum glaucum) under dehydration and salinity stress. BMC Genomics. 2020;21:231.
4. Dudhate A, Shinde H, Yu P, Tsugama D, Gupta SK, Liu S, Takano T. Comprehensive analysis of NAC transcription factor family uncovers drought and salinity stress response in pearl millet (Pennisetum glaucum). BMC Genomics. 2021;22:70.
5. Jha DK, Chanwala J, Sandeep IS, Dey N. Comprehensive identification and expression analysis of GRAS gene family under abiotic stress and phytohormone treatments in pearl millet. Funct Plant Biol. 2021;48:1039–52.
6. Chanwala J, Khadanga B, Jha DK, Sandeep IS, Dey N. MYB transcription factor family in pearl millet: genome-wide identification, evolutionary progression and expression analysis under abiotic stress and phytohormone treatments. Plants (Basel). 2023;12:355.
7. Varshney RK, Shi C, Thudi M, Mariac C, Wallace J, Qi P, et al. International Pearl Millet Genome Sequencing Consortium (IPMGSC). https://cegresources.icrisat.org/data_public/PearlMillet_Genome/v1.1/. Accessed 27 Nov 2022.
8. Mistry J, Chuguransky S, Williams L, Qureshi M, Salazar GA, Sonnhammer ELL, et al. Pfam: the protein families database in 2021. Nucleic Acids Res. 2021;49:D412–9.
9. InterPro. https://www.ebi.ac.uk/interpro/download/Pfam/. Accessed 27 Nov 2022.
10. Mistry J, Finn RD, Eddy SR, Bateman A, Punta M. Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions. Nucleic Acids Res. 2013;41:e121.
11. HMMER. http://hmmer.org/. Accessed 27 Nov 2022.
12. Lu S, Wang J, Chitsaz F, Derbyshire MK, Geer RC, Gonzales NR, et al. CDD/SPARCLE: the conserved domain database in 2020. Nucleic Acids Res. 2020;48:D265–8.
13. Batch CD-Search. https://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi. Accessed 27 Nov 2022.
14. Jiang Y, Wang D, Yao Y, Eubel H, Künzler P, Møller IM, Xu D. MULocDeep: a deep-learning framework for protein subcellular and suborganellar localization prediction with residue-level interpretation. Comput Struct Biotechnol J. 2021;19:4825–39.
15. MULocDeep. https://mu-loc.org/. Accessed 27 Nov 2022.
16. Jin J, Tian F, Yang DC, Meng YQ, Kong L, Luo J, Gao G. PlantTFDB 4.0: toward a central hub for transcription factors and regulatory interactions in plants. Nucleic Acids Res. 2017;45(D1):D1040–5.
17. PlantTFDB. http://planttfdb.gao-lab.org/index.php?sp=Osj. Accessed 27 Nov 2022.
18. PlantTFDB. http://planttfdb.gao-lab.org/index.php?sp=Ath. Accessed 27 Nov 2022.
19. Yates AD, Allen J, Amode RM, Azov AG, Barba M, Becerra A, et al. Ensembl Genomes 2022: an expanding genome resource for non-vertebrates. Nucleic Acids Res. 2022;50:D996–D1003.
20. EnsemblPlants. https://plants.ensembl.org/info/data/ftp/index.html. Accessed 27 Nov 2022.
21. Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994;22:4673–80.
22. Kumar S, Stecher G, Li M, Knyaz C, Tamura K. MEGA X: Molecular Evolutionary Genetics Analysis across computing platforms. Mol Biol Evol. 2018;35:1547–9.
23. Letunic I, Bork P. Interactive tree of life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 2021;49:W293–6.
24. Interactive Tree of Life. https://itol.embl.de/. Accessed 27 Nov 2022.
25. Bailey TL, Johnson J, Grant CE, Noble WS. The MEME suite. Nucleic Acids Res. 2015;43:W39–49.
26. Tsugama D, Qu Y, Dudhate A, Shinde HS, Takano T. Pearl millet transcription factor family characterization data. figshare. 2022. https://doi.org/10.6084/m9.figshare.21623829

Acknowledgements

The authors appreciate the data and advice provided by Dr. Shashi Kumar Gupta and his colleagues at the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT).

Funding

This work was supported by JSPS (Japan Society for the Promotion of Science) KAKENHI grants (Grant Numbers 19KK0155 and 19H02928).

Author information

Authors and Affiliations

Asian Research Center for Bioresource and Environmental Sciences (ARC-BRES), Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Midori-cho, Nishi-tokyo-shi, 188-0002, Tokyo, Japan
Yingwei Qu, Tetsuo Takano & Daisuke Tsugama
Stowers Institute for Medical Research, 1000 East 50th Street, 64110, Kansas City, Missouri, USA
Ambika Dudhate
University of Kentucky, 40506, Lexington, Kentucky, USA
Harshraj Subhash Shinde

Authors

Yingwei Qu
Ambika Dudhate
Harshraj Subhash Shinde
Tetsuo Takano
Daisuke Tsugama

Contributions

All authors collected data and wrote the manuscript.

Corresponding author

Correspondence to Daisuke Tsugama.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Qu, Y., Dudhate, A., Shinde, H. et al. Phylogenetic trees, conserved motifs and predicted subcellular localization for transcription factor families in pearl millet. BMC Res Notes 16, 38 (2023). https://doi.org/10.1186/s13104-023-06305-2

Received: 27 November 2022
Accepted: 06 March 2023
Published: 20 March 2023
DOI: https://doi.org/10.1186/s13104-023-06305-2
