- Data Note
- Open Access
Phylogenetic trees, conserved motifs and predicted subcellular localization for transcription factor families in pearl millet
BMC Research Notes volume 16, Article number: 38 (2023)
Pearl millet (Pennisetum glaucum) is a cereal crop that tolerates high temperature, drought and nutrient-poor conditions. Characterizing pearl millet proteins can help to improve the productivity of pearl millet and other crops. Transcription factors are proteins that regulate transcription of their target genes and thereby regulate diverse processes. Some transcription factor families in pearl millet have been characterized in previous studies, but most have not. The objective of the data presented was to characterize the amino acid sequences of most transcription factors in pearl millet.
Sequences of 2395 pearl millet proteins that have transcription factor-associated domains were extracted. Subcellular and suborganellar localization of these proteins was predicted by MULocDeep. Conserved domains in these sequences were confirmed by CD-Search. These proteins were classified into 85 families on the basis of those conserved domains. A phylogenetic tree including pearl millet proteins and their counterparts in Arabidopsis thaliana and rice was constructed for each of these families. Sequence motifs were identified by MEME for each of these families.
Pearl millet (Pennisetum glaucum) is a staple cereal crop that tolerates high temperature, drought and nutrient-poor conditions and that is produced in semi-arid regions. Characterizing pearl millet genes can help to better understand pearl millet stress tolerance and to improve the productivity of pearl millet and other crops. The whole genome sequence of pearl millet was released previously. On the basis of this sequence, pearl millet gene or protein families such as the WRKY transcription factor (TF) family, the NAC (NAM, ATAF and CUC) TF family, the GRAS TF family and the MYB TF family have been identified and characterized [3,4,5,6]. However, most pearl millet protein families remain uncharacterized. TFs generally regulate transcription of multiple genes and can thus act as hubs for diverse processes. TFs can therefore be useful either as transgenes in genetic modification or as targets of genome editing for improving plant performance. The objective of the data presented was to characterize the amino acid sequences of most pearl millet TFs.
Amino acid sequences for all pearl millet proteins deduced from its whole genome sequence were downloaded from the International Pearl Millet Genome Sequencing Consortium website. Hidden Markov models (HMMs) for protein families in the Pfam database were downloaded from the InterPro website. Matches to these HMMs in the amino acid sequences were detected by the hmmscan program in HMMER (version 3.3) [10, 11]. On the basis of the detected HMMs, 2395 sequences were regarded as sequences of putative pearl millet TFs, and these were classified into 85 families. Conserved domains in these TFs were confirmed by Batch CD-Search [12, 13]. Subcellular and suborganellar localization of these TFs was predicted by MULocDeep [14, 15]. Amino acid sequences of rice (Oryza sativa ssp. japonica) and Arabidopsis thaliana TFs were downloaded from the PlantTFDB website [16,17,18]. For the families that were not available in PlantTFDB, amino acid sequences of all rice (O. sativa ssp. indica) and Arabidopsis proteins were downloaded from the Ensembl Plants website [19, 20] and used for hmmscan as described above to identify proteins in those families. For each of these families except the 13 families that contain fewer than five members, the sequences from pearl millet, rice and Arabidopsis were aligned by ClustalW, and a phylogenetic tree file was obtained with the neighbor-joining method in the MEGA X software. The phylogenetic tree was visualized with the Interactive Tree of Life (iTOL) online tool (version 6) [23, 24]. For each of the 85 families identified, motifs in the pearl millet amino acid sequences were identified de novo by the MEME program (version 5.5.0). Data obtained by these analyses were deposited in the figshare repository (Table 1).
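The family assignment and the five-member cutoff described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' script: it assumes hmmscan was run with its --domtblout option to write a per-domain table, and the Pfam-to-family mapping, the E-value cutoff of 1e-5 and all function names are hypothetical, because the text does not state them. It also assumes that the five-member threshold is applied to the proteins found in a single input table, which the text leaves open.

```python
from collections import defaultdict

# Hypothetical, illustrative mapping from Pfam HMM names to TF family labels.
# The mapping actually used for the 85 families is not given in the text.
PFAM_TO_FAMILY = {"WRKY": "WRKY", "NAM": "NAC", "GRAS": "GRAS"}


def parse_domtblout(lines, max_evalue=1e-5):
    """Yield (protein_id, hmm_name) for each hit passing the E-value cutoff.

    In hmmscan --domtblout output the columns are whitespace-delimited:
    column 0 is the HMM (target) name, column 3 is the query protein name
    and column 12 is the independent E-value of the domain.
    The cutoff value is an assumption, not a value from the text.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split()
        hmm_name, protein_id = fields[0], fields[3]
        if float(fields[12]) <= max_evalue:
            yield protein_id, hmm_name


def classify(lines, pfam_to_family=PFAM_TO_FAMILY, max_evalue=1e-5):
    """Group protein IDs by TF family according to their detected HMMs."""
    families = defaultdict(set)
    for protein_id, hmm_name in parse_domtblout(lines, max_evalue):
        family = pfam_to_family.get(hmm_name)
        if family is not None:  # ignore HMMs not associated with a TF family
            families[family].add(protein_id)
    return dict(families)


def families_for_trees(families, min_members=5):
    """Keep only families with at least min_members proteins."""
    return {
        name: members
        for name, members in families.items()
        if len(members) >= min_members
    }
```

With a table written to a hypothetical file such as pearl_millet.domtbl, `classify(open("pearl_millet.domtbl"))` would return a dictionary of family names to sets of protein IDs, and `families_for_trees` would drop the small families before alignment and tree building.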
Previous studies on protein family characterization [e.g., 3, 4, 5, 6] were not integrated in the data presented.
Most protein families other than the TF families in pearl millet are still uncharacterized.
The data described in this Data note can be freely and openly accessed on figshare under https://doi.org/10.6084/m9.figshare.21623829. Please see Table 1 and references for details and links to the data.
HMM: Hidden Markov model
Basavaraj G, Rao PP, Bhagavatula S, Ahmed W. Availability and utilization of pearl millet in India. J SAT Agric Res. 2010;8:1–6.
Varshney RK, Shi C, Thudi M, Mariac C, Wallace J, Qi P, et al. Pearl millet genome sequence provides a resource to improve agronomic traits in arid environments. Nat Biotechnol. 2017;35:969–76.
Chanwala J, Satpati S, Dixit A, Parida A, Giri MK, Dey N. Genome-wide identification and expression analysis of WRKY transcription factors in pearl millet (Pennisetum glaucum) under dehydration and salinity stress. BMC Genomics. 2020;21:231.
Dudhate A, Shinde H, Yu P, Tsugama D, Gupta SK, Liu S, Takano T. Comprehensive analysis of NAC transcription factor family uncovers drought and salinity stress response in pearl millet (Pennisetum glaucum). BMC Genomics. 2021;22:70.
Jha DK, Chanwala J, Sandeep IS, Dey N. Comprehensive identification and expression analysis of GRAS gene family under abiotic stress and phytohormone treatments in pearl millet. Funct Plant Biol. 2021;48:1039–52.
Chanwala J, Khadanga B, Jha DK, Sandeep IS, Dey N. MYB transcription factor family in pearl millet: genome-wide identification, evolutionary progression and expression analysis under abiotic stress and phytohormone treatments. Plants (Basel). 2023;12:355.
Varshney RK, Shi C, Thudi M, Mariac C, Wallace J, Qi P, et al. International Pearl Millet Genome Sequencing Consortium (IPMGSC). https://cegresources.icrisat.org/data_public/PearlMillet_Genome/v1.1/. Accessed 27 Nov 2022.
Mistry J, Chuguransky S, Williams L, Qureshi M, Salazar GA, Sonnhammer ELL, et al. Pfam: the protein families database in 2021. Nucleic Acids Res. 2021;49:D412–9.
InterPro. https://www.ebi.ac.uk/interpro/download/Pfam/. Accessed 27 Nov 2022.
Mistry J, Finn RD, Eddy SR, Bateman A, Punta M. Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions. Nucleic Acids Res. 2013;41:e121.
HMMER. http://hmmer.org/. Accessed 27 Nov 2022.
Lu S, Wang J, Chitsaz F, Derbyshire MK, Geer RC, Gonzales NR, et al. CDD/SPARCLE: the conserved domain database in 2020. Nucleic Acids Res. 2020;48:D265–8.
Batch CD-Search. https://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi. Accessed 27 Nov 2022.
Jiang Y, Wang D, Yao Y, Eubel H, Künzler P, Møller IM, Xu D. MULocDeep: a deep-learning framework for protein subcellular and suborganellar localization prediction with residue-level interpretation. Comput Struct Biotechnol J. 2021;19:4825–39.
MULocDeep. https://mu-loc.org/. Accessed 27 Nov 2022.
Jin J, Tian F, Yang DC, Meng YQ, Kong L, Luo J, Gao G. PlantTFDB 4.0: toward a central hub for transcription factors and regulatory interactions in plants. Nucleic Acids Res. 2017;45(D1):D1040–5.
PlantTFDB. http://planttfdb.gao-lab.org/index.php?sp=Osj. Accessed 27 Nov 2022.
PlantTFDB. http://planttfdb.gao-lab.org/index.php?sp=Ath. Accessed 27 Nov 2022.
Yates AD, Allen J, Amode RM, Azov AG, Barba M, Becerra A, et al. Ensembl Genomes 2022: an expanding genome resource for non-vertebrates. Nucleic Acids Res. 2022;50:D996–D1003.
EnsemblPlants. https://plants.ensembl.org/info/data/ftp/index.html. Accessed 27 Nov 2022.
Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994;22:4673–80.
Kumar S, Stecher G, Li M, Knyaz C, Tamura K. MEGA X: Molecular Evolutionary Genetics Analysis across computing platforms. Mol Biol Evol. 2018;35:1547–9.
Letunic I, Bork P. Interactive tree of life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 2021;49:W293–6.
Interactive Tree of Life. https://itol.embl.de/. Accessed 27 Nov 2022.
Bailey TL, Johnson J, Grant CE, Noble WS. The MEME suite. Nucleic Acids Res. 2015;43:W39–49.
Tsugama D, Qu Y, Dudhate A, Shinde HS, Takano T. Pearl millet transcription factor family characterization data. figshare. 2022. https://doi.org/10.6084/m9.figshare.21623829
The authors appreciate data and advice from Dr. Shashi Kumar Gupta and his colleagues at the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT).
This work was supported by JSPS (Japan Society for the Promotion of Science) Kakenhi Grant (Grant Number: 19KK0155 and 19H02928).
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Qu, Y., Dudhate, A., Shinde, H. et al. Phylogenetic trees, conserved motifs and predicted subcellular localization for transcription factor families in pearl millet. BMC Res Notes 16, 38 (2023). https://doi.org/10.1186/s13104-023-06305-2
- Pearl millet
- Transcription factor
- Phylogenetic analysis
- Protein family
- Subcellular localization
- Protein domain