- Research note
- Open Access
A study of genomic diversity in populations of Maharashtra, India, inferred from 20 autosomal STR markers
BMC Research Notes volume 14, Article number: 69 (2021)
This study was planned to evaluate the genetic diversity in the admixed and Teli (a Hindu caste) populations of Maharashtra, India using 20 autosomal Short Tandem Repeat (STR) genetic markers. We further investigated the genetic relatedness of the studied populations with other Indian populations.
The studied populations showed a wide range of observed heterozygosity, viz. 0.690 to 0.918 for the admixed population and 0.696 to 0.942 for the Teli population. This might be due to multi-directional gene flow. The admixed and Teli populations also showed a high degree of polymorphism, which ranged from 0.652 to 0.903 and 0.644 to 0.902, respectively. Their combined matching probability over all the studied loci was 4.29 × 10⁻²⁵ and 5.01 × 10⁻²⁴, respectively. The neighbor-joining tree and the Principal Component Analysis showed that the studied populations clustered with the general populations of Jharkhand, Uttar Pradesh, Rajasthan and the Central Indian States, as well as with specific populations of Maharashtra (Konkanastha Brahmins) and Tamil Nadu (Kurumans). Overall, the obtained data showed a high degree of forensic efficacy and would be useful for forensic applications as well as genealogical studies.
The state of Maharashtra is located in the western peninsular region of India. It is the third-largest state by area and the second-most populous state in the country. It shares its geographical boundaries with the states of Karnataka and Goa in the South, Telangana in the South-east, Chhattisgarh in the East, Gujarat and Madhya Pradesh in the North, Dadra-Nagar Haveli in the North-west, and the Arabian Sea in the West (Fig. 1). As per the 2011 census, Maharashtra has a population of 112,374,333, which accounts for 9.28% of the total Indian population. Although 'Marathi' is the native and official language of the state, several regional languages and their dialects are also spoken across Maharashtra, because people from different regions, such as Biharis, Gujaratis, Sindhis, Punjabis, Parsis, Marwaris, Kannadas and Tamilians, have settled across the state. The population of Maharashtra is so diverse because the state served as a geographical margin between Ancestral North India (ANI) and Ancestral South India (ASI) [3, 4], and has witnessed several migration waves over the centuries. Interaction between these populations over innumerable generations has subsequently influenced the genomic diversity of the state [3, 5, 6, 7]. Besides people from varied regions, people of different castes (Hindu hierarchical groups), such as the Teli caste, also reside in Maharashtra. The 'Teli' community derives its name from the Sanskrit word 'talika' or 'taila', which means oil, and reflects the traditional occupation of the community, which was to extract oil from sesame and mustard seeds. One of the Hindu mythological references to the Teli caste indicates that the first Teli individual was created by 'Lord Shiva' to rub him with oil.
The overall social, cultural, and linguistic diversity of the state of Maharashtra led us to evaluate the genomic diversity of the admixed and Teli populations of this state. The genomic data of the selected populations were evaluated using in-silico (computational) techniques through population data software and servers such as GeneMapper™ ID-X, Arlequin v3.5, POPTREE2 and PAST 3.02a. In-silico techniques serve as an efficient approach for the evaluation of very large genomic data sets such as STR, SNP, large-sequence and NGS data [9, 10, 11, 12] because they can analyze such data sets quickly, with high throughput and accuracy.
To investigate the genetic diversity of the admixed and Teli populations of Maharashtra, we randomly selected 158 and 69 unrelated healthy adults, respectively. The subjects in the admixed group belonged to almost all the population groups residing in the state of Maharashtra and hence represented its diverse population. In contrast, the subjects in the Teli group were recruited only from the Teli community. An online randomization tool, the randomizer (www.random.org), was used to randomly allocate subjects to each group prior to sample collection.
First, an interview was conducted to confirm that each participant’s ancestors had been residing within the geographical boundaries of Maharashtra for more than three generations. Next, blood samples were collected from each participant following the ethical guidelines and the Declaration of Helsinki. The collected blood samples were subjected to the Phenol–Chloroform Isoamyl Alcohol (PCIA) organic extraction method for DNA extraction. The extracted DNA was quantified using the PowerQuant® DNA Quantification kit (Promega, Madison, USA) in a Real-Time Polymerase Chain Reaction machine (RT-PCR-7500) (Thermo Fisher Scientific, CA, USA) as recommended by the manufacturer (except for the half-reaction volume). A 500 pg DNA template was used to amplify the 21 loci of the PowerPlex® 21 System (Promega), i.e. 20 autosomal STR loci and Amelogenin, on a Veriti™ 96-Well Fast Thermal Cycler (Thermo Fisher Scientific, CA, USA) as per the manufacturer’s recommendations (except for the half-reaction volume). The amplified DNA fragments were separated by capillary electrophoresis using POP™-4 polymer, a 36 cm capillary array and a Genetic Analyzer 3500XL (Thermo Fisher Scientific, CA, USA) as recommended by the manufacturer. The allelic ladder provided with the kit was used to assign allele numbers at each locus. The DNA profiles were evaluated using the GeneMapper™ ID-X v1.5 software (Thermo Fisher Scientific, CA, USA). Positive and negative controls were used in the experiment for quality control. Additionally, the authors conducting this study have passed the proficiency test conducted by GITAD, Spain (http://gitad.ugr.es/principal.htm).
The obtained genetic data were analyzed using statistical software. The GenAlEx 6.5 software was used to calculate the allele frequencies, and the PowerStats v1.2 spreadsheet program was used to calculate various forensic parameters, namely polymorphic information content (PIC), power of discrimination (PD), power of exclusion (PE), matching probability (PM) and paternity index (PI). The observed heterozygosity (Hobs), expected heterozygosity (Hexp) and Hardy–Weinberg equilibrium (HWE) were calculated using the Arlequin v3.5 software. The POPTREE2 program was used to compute Nei's genetic distances among the compared populations and to draw the neighbor-joining (NJ) tree. The PAST 3.02a software was used for the graphical representation of genetic distances among the compared populations, based on principal component analysis (PCA). The maximum likelihood (ML) phylogenetic tree was reconstructed as described earlier.
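The forensic parameters named above can be illustrated for a single locus with a minimal Python sketch. It assumes the formulas commonly attributed to the PowerStats spreadsheet (PM as the sum of squared observed genotype frequencies, PD = 1 − PM, PE = h²(1 − 2hH²) with h and H the observed proportions of heterozygotes and homozygotes, and a typical PI of 1/(2H)) together with the standard PIC formula. The function name and the toy genotypes are hypothetical and are not data from this study.

```python
from collections import Counter
from itertools import combinations


def forensic_parameters(genotypes):
    """Return PIC, PM, PD, PE and PI for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual.
    """
    n = len(genotypes)
    # Treat (11, 12) and (12, 11) as the same genotype.
    genotype_counts = Counter(tuple(sorted(g)) for g in genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    freqs = [count / (2 * n) for count in allele_counts.values()]

    # PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    sum_p2 = sum(p * p for p in freqs)
    pic = 1 - sum_p2 - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))

    # Matching probability from observed genotype frequencies.
    pm = sum((count / n) ** 2 for count in genotype_counts.values())
    pd = 1 - pm

    # Observed heterozygote (h) and homozygote (H) proportions.
    het = sum(c for (a, b), c in genotype_counts.items() if a != b) / n
    hom = 1 - het
    pe = het ** 2 * (1 - 2 * het * hom ** 2)
    pi = 1 / (2 * hom) if hom > 0 else float("inf")

    return {"PIC": pic, "PM": pm, "PD": pd, "PE": pe, "PI": pi}


# Hypothetical toy locus with four individuals (not study data).
toy = [(11, 11), (11, 12), (12, 11), (12, 12)]
print(forensic_parameters(toy))
```

With only two equally frequent alleles the toy locus is far less informative than any locus reported here; the sketch is meant only to make the definitions concrete.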
A total of 228 alleles, with an average of 11.4 alleles per locus, were observed in the admixed population group, while a total of 194 alleles, with an average of 9.7 alleles per locus, were observed in the Teli population group. In the admixed population group, the locus D3S1358 showed the minimum number of alleles (5), and the loci Penta E and D21S11 showed the maximum number of alleles (19). In the Teli population group, the loci D3S1358 and TPOX showed the minimum number of alleles (5), and the locus Penta E showed the maximum number of alleles (17). The allele frequencies for the admixed and Teli population groups ranged from 0.003 to 0.427 and from 0.007 to 0.435, respectively. Allele 11 of locus TPOX was the most frequent allele in both the admixed (Table 1) and Teli (Additional file 1: Table S1) population groups. All the studied loci in both population groups were in Hardy–Weinberg equilibrium after Bonferroni correction (corrected threshold P = 0.05/20 = 0.0025, i.e. a 5% significance level across the 20 loci).
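The Bonferroni step can be sketched in a few lines of Python. The helper names and the p-values in the example are hypothetical; only the 0.05 significance level and the 20 loci come from the text.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def loci_deviating_from_hwe(p_values, alpha=0.05):
    """Return the loci whose HWE p-value falls below the corrected threshold.

    p_values: dict mapping locus name -> p-value of the HWE test.
    """
    threshold = bonferroni_threshold(alpha, len(p_values))
    return sorted(locus for locus, p in p_values.items() if p < threshold)


# With 20 loci the corrected threshold is 0.05 / 20 = 0.0025.
print(bonferroni_threshold(0.05, 20))

# Hypothetical p-values for 20 placeholder loci, all above the threshold.
example = {f"locus_{i}": 0.01 + 0.04 * i for i in range(20)}
print(loci_deviating_from_hwe(example))
```

A locus with an uncorrected p-value of, say, 0.01 would look significant at the 0.05 level but not at the corrected 0.0025 level, which is why the correction matters when 20 loci are tested together.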
The obtained forensic efficacy parameters for the admixed and Teli populations of Maharashtra are shown in Table 1 and Additional file 1: Table S1, respectively. The locus Penta E was the most polymorphic locus in both population groups, with a value of 0.903 in the admixed and 0.902 in the Teli population group. In contrast, the locus TPOX was the least polymorphic among all the studied loci, with a value of 0.652 in the admixed and 0.644 in the Teli population group. The wide range of observed heterozygosity (Hobs) values in the admixed (0.690 to 0.918) and Teli (0.696 to 0.942) groups might have resulted from the inflow of genes into the studied populations from various directions.
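As a rough illustration of how observed and expected heterozygosity relate, the following Python sketch computes both for one locus. It assumes that the expected value is Nei's unbiased gene diversity, (2n/(2n − 1))(1 − Σp²); the source does not state the exact estimator used by the software, and the genotypes shown are hypothetical.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual.
    Expected heterozygosity uses the unbiased estimator
    (2n / (2n - 1)) * (1 - sum(p_i^2)), which is an assumption here.
    """
    n = len(genotypes)
    h_obs = sum(1 for a, b in genotypes if a != b) / n

    allele_counts = Counter(allele for g in genotypes for allele in g)
    sum_p2 = sum((count / (2 * n)) ** 2 for count in allele_counts.values())
    h_exp = (2 * n / (2 * n - 1)) * (1 - sum_p2)
    return h_obs, h_exp


# Hypothetical toy locus (not study data).
toy = [(8, 8), (8, 11), (11, 8), (11, 11)]
print(heterozygosity(toy))
```

A locus with many alleles at similar frequencies pushes both values towards 1, which is consistent with Penta E sitting at the top of the polymorphism range and TPOX at the bottom.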
The power of discrimination (PD) for the admixed population group ranged from 0.857 (TPOX) to 0.980 (Penta E), and the PD for the Teli population group ranged from 0.849 (TPOX) to 0.974 (Penta E), with a combined value of 1 over all the studied loci for both groups. In the admixed group, the power of exclusion (PE) ranged from 0.413 (CSF1PO) to 0.832 (D1S1656), with a combined value of 0.999999998666 over all the studied loci, whereas in the Teli group, the PE ranged from 0.422 (D5S818) to 0.882 (D1S1656), with a combined value of 0.999999999652. The combined matching probability over all 20 studied autosomal STR loci was found to be 4.29 × 10⁻²⁵ (Table 1) for the admixed group and 5.01 × 10⁻²⁴ (Additional file 1: Table S1) for the Teli group.
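The combined values follow from the per-locus values if the loci are treated as independent, which is the usual assumption for such panels. A minimal Python sketch, with hypothetical function names:

```python
import math


def combined_matching_probability(pm_values):
    """Combined PM: product of the per-locus matching probabilities."""
    return math.prod(pm_values)


def combined_power(values):
    """Combined PD or PE: 1 - product of (1 - per-locus value)."""
    return 1 - math.prod(1 - v for v in values)


# Lower bound for the Teli group: every locus at the lowest reported PD.
print(combined_power([0.849] * 20))

# Reading a combined PM as "one in N" random matches.
print(1 / 4.29e-25)
```

Even if every locus had only the lowest reported PD of 0.849, the combined value would differ from 1 by less than 10⁻¹⁶, which is why the combined PD is reported as 1. A combined matching probability of 4.29 × 10⁻²⁵ corresponds to roughly one chance match in 2.3 × 10²⁴ unrelated individuals.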
A neighbor-joining (NJ) tree (Fig. 2a) based on Nei's genetic distance, constructed using the POPTREE2 software, was used to investigate the genetic affinity between the studied (admixed and Teli) populations and the reported Indian populations, namely the Konkanastha Brahmins (Maharashtra); the Mahadev Kolis (Maharashtra); the Iyengars (Tamil Nadu); the Kurumans (Tamil Nadu); the Yerukulas (Andhra Pradesh); the Koras (West Bengal); the Baniyas (Punjab); the population of Jharkhand; the population of Uttar Pradesh; the population of Rajasthan; the populations of Central India; and the pooled populations belonging to the geographical boundaries of India. The NJ tree revealed that the studied admixed and Teli populations of Maharashtra pooled into one cluster with the Konkanastha Brahmins of Maharashtra and the Kurumans of Tamil Nadu. The populations of Rajasthan, Uttar Pradesh, Madhya Pradesh and Jharkhand also pooled with the studied populations, which might be the result of ancestral relatedness [3, 4]. The Koras of West Bengal, the Baniyas of Punjab and the pooled populations of the Indian geographical region were observed to be outliers in the NJ tree, which could be attributed to isolation by distance. The Mahadev Koli population of Maharashtra, despite being geographically close to the studied populations, was genetically distinct, which might be the result of a small effective sample size, the founder effect and drift. The maximum likelihood (ML) phylogram (Fig. 2b) was consistent with the NJ tree with respect to the scattering pattern of the studied and compared populations. In the NJ tree, three of the eleven nodes had bootstrap values above 50 percent and three nodes had bootstrap values above 25 percent. In the ML phylogram, two of the eleven nodes had bootstrap values above 50 percent and four nodes had bootstrap values above 25 percent.
Similar patterns in the bootstrap values were observed in the NJ tree and the ML phylogram, suggesting a low level of confidence.
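The genetic distance underlying such a tree can be sketched in Python. The text does not say which of Nei's distances was computed, so the sketch below assumes Nei's standard genetic distance (1972), D = −ln(Jxy / √(Jx·Jy)); the function name, locus labels and allele frequencies are hypothetical.

```python
import math


def nei_standard_distance(pop_x, pop_y):
    """Nei's standard genetic distance between two populations.

    pop_x, pop_y: dict mapping locus -> {allele: frequency}.
    Only loci present in both populations are used.
    """
    loci = sorted(set(pop_x) & set(pop_y))
    if not loci:
        raise ValueError("the two populations share no loci")

    # Sums over loci are used instead of means; the number of loci
    # cancels in the ratio Jxy / sqrt(Jx * Jy).
    jx = jy = jxy = 0.0
    for locus in loci:
        fx, fy = pop_x[locus], pop_y[locus]
        jx += sum(p * p for p in fx.values())
        jy += sum(q * q for q in fy.values())
        jxy += sum(p * fy.get(allele, 0.0) for allele, p in fx.items())

    if jxy == 0.0:
        return math.inf  # no shared alleles at any locus
    return -math.log(jxy / math.sqrt(jx * jy))


# Hypothetical allele frequencies at two placeholder loci.
pop_a = {"L1": {"8": 0.6, "11": 0.4}, "L2": {"12": 0.5, "13": 0.5}}
pop_b = {"L1": {"8": 0.5, "11": 0.5}, "L2": {"12": 0.3, "13": 0.7}}
print(nei_standard_distance(pop_a, pop_b))
```

A matrix of such pairwise distances is the input that a neighbor-joining algorithm clusters into a tree; bootstrap support is then obtained by resampling loci and rebuilding the tree, so a panel of only 15 shared loci can be expected to give modest support values.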
To validate the genetic relatedness observed in the NJ tree and the ML phylogram, given their low bootstrap values, principal component analysis (PCA) and locus-wise Fst distance calculation between the studied and compared populations were undertaken. In the PCA plot (Fig. 2c), both the studied populations clustered together and formed patterns similar to those observed in the NJ tree and the ML phylogram. In the pair-wise Fst analysis over 15 loci, the admixed population of Maharashtra showed significant variation at ten loci with the Yerukulas (Andhra Pradesh), at nine loci with the Koras (Bengal), at seven loci with the Mahadev Kolis (Maharashtra), at four loci with the Konkanastha Brahmins (Maharashtra) and the Baniyas (Punjab), at three loci with the pooled Indian populations and the population of Rajasthan, at two loci with the Kurumans (Tamil Nadu), the Central Indian population and the population of Jharkhand, and at one locus with the Iyengars (Tamil Nadu) and the population of Uttar Pradesh. Similarly, the Teli population group showed significant variation at ten loci with the Koras (Bengal), at six loci with the Mahadev Kolis (Maharashtra), at four loci with the Konkanastha Brahmins (Maharashtra) and the Yerukulas (Andhra Pradesh), at two loci with the Baniyas (Punjab), the population of Jharkhand, the population of Uttar Pradesh and the pooled Indian populations, and at one locus with the Iyengars (Tamil Nadu), the Kurumans (Tamil Nadu), the population of Rajasthan and the Central Indian population. No significant variation was observed between the two studied populations at any of the 15 compared loci (Additional file 2: Table S2); likewise, the Teli population group was similar to the admixed population group at all 15 compared loci (Additional file 3: Table S3).
Interestingly, both the studied populations showed a similar pattern of Fst distances with the compared populations. The mean Fst values of the studied and compared populations, irrespective of their geographical locations, are shown in Fig. 1a–c.
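To make the locus-wise comparison concrete, the following Python sketch computes a simple Fst for one locus from the allele frequencies of two populations, using Fst = (HT − HS) / HT. This is a simplified, Nei-style estimator with equal population weights, not the estimator or the significance test used by the population genetics software cited in the methods; the function name and frequencies are hypothetical.

```python
def fst_single_locus(freq_x, freq_y):
    """Simple two-population Fst at one locus: (HT - HS) / HT.

    freq_x, freq_y: dict mapping allele -> frequency in each population.
    HS is the mean within-population expected heterozygosity and HT is
    the expected heterozygosity of the averaged allele frequencies.
    """
    alleles = set(freq_x) | set(freq_y)
    hs_x = 1 - sum(p * p for p in freq_x.values())
    hs_y = 1 - sum(q * q for q in freq_y.values())
    hs = (hs_x + hs_y) / 2

    mean_freqs = [(freq_x.get(a, 0.0) + freq_y.get(a, 0.0)) / 2 for a in alleles]
    ht = 1 - sum(p * p for p in mean_freqs)

    if ht == 0.0:
        return 0.0  # both populations fixed for the same allele
    return (ht - hs) / ht


# Hypothetical frequencies: similar populations give a value near 0.
print(fst_single_locus({"8": 0.55, "11": 0.45}, {"8": 0.50, "11": 0.50}))
```

Values near 0 indicate that the two populations share nearly the same allele frequencies at that locus, while values near 1 indicate near-complete differentiation. Whether a given value counts as significant depends on a permutation or similar test, which this sketch does not perform.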
Overall, the results of the Principal Component Analysis and the Fst distance study were consistent with each other and support the genetic relatedness observed in the neighbor-joining tree and the maximum likelihood phylogram.
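The idea behind placing populations on a PCA plot can be sketched in plain Python. The sketch below extracts only the first principal component of a populations × allele-frequency matrix by power iteration on the covariance matrix; dedicated statistical software computes all components with a full eigendecomposition, so this is an illustration of the principle and not the procedure used in the study. The function name and the input matrix are hypothetical.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return (loading vector, per-row scores) for the first component.

    rows: list of equal-length lists, one row per population and one
    column per allele frequency. Needs at least two rows.
    """
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(m)]
           for a in range(m)]

    # Start from the centered row with the largest norm. A uniform start
    # vector would fail here because frequencies that sum to 1 make the
    # all-ones direction an exact null direction of the covariance.
    vec = max(centered, key=lambda c: sum(v * v for v in c))
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            break  # no variance in the data
        vec = [v / norm for v in nxt]

    scores = [sum(c[j] * vec[j] for j in range(m)) for c in centered]
    return vec, scores


# Hypothetical frequencies of two alleles in four populations.
data = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
loadings, scores = first_principal_component(data)
print(scores)
```

Populations with similar allele frequencies receive similar scores and therefore plot close together, which is the sense in which a PCA plot can corroborate clusters seen in a distance-based tree. The sign of a component is arbitrary, so only relative positions are meaningful.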
Since the obtained genetic data showed a high degree of polymorphism and forensic efficacy, they might be useful for forensic DNA applications and for genetic and genealogical studies, and may enrich the national autosomal STR database.
The small sample size was the main limitation of this study. However, the analyzed samples adequately explain the polymorphic nature of the studied genetic markers and the genetic affinity of the studied populations with the previously reported populations. We further propose the use of a larger sample size and Next Generation Sequencing (NGS) studies.
Availability of data and materials
Data sets generated during this study are available from the corresponding author on reasonable request.
PCIA: Phenol–chloroform isoamyl alcohol
RT-PCR: Real-time polymerase chain reaction
PIC: Polymorphic information content
PD: Power of discrimination
PE: Power of exclusion
PCA: Principal component analysis
1. Chandramouli C, General R. Census of India 2011. Provisional Population Totals. New Delhi: Government of India; 2011.
2. Navaneetham K, Dharmalingam A. Demography and development: preliminary interpretations of the 2011 Census. Econ Polit Wkly. 2011;13–7.
3. Debortoli G, Abbatangelo C, Ceballos F, Fortes-Lima C, Norton HL, Ozarkar S, et al. Novel insights on demographic history of tribal and caste groups from West Maharashtra (India) using genome-wide data. Sci Rep. 2020;10(1):1–10.
4. Reich D, Thangaraj K, Patterson N, Price AL, Singh L. Reconstructing Indian population history. Nature. 2009;461(7263):489.
5. Jonnalagadda M, Ozarkar S, Ashma R, Kulkarni S. Skin pigmentation variation among populations of West Maharashtra, India. Am J Hum Biol. 2016;28(1):36–43.
6. Malhotra KC. Population structure among the Dhangar caste-cluster of Maharashtra, India. In: The People of South Asia. Springer; 1984. p. 295–324.
7. Reddy BM, Tripathy V, Kumar V, Alla N. Molecular genetic perspectives on the Indian social structure. Am J Hum Biol. 2010;22(3):410–7.
8. Russell RV. The Tribes and Castes of the Central Provinces of India, Volume III. Macmillan and Co., Limited, St. Martin's Street, London; 1916; 2007. p. 215–224. http://www.gutenberg.org/files/22010/22010-h/22010-h.htm
9. Yilmaz A, Çetin İ. In silico prediction of the effects of nonsynonymous single nucleotide polymorphisms in the human catechol-O-methyltransferase (COMT) gene. Cell Biochem Biophys. 2020;1–13.
10. Gopalakrishnan C, Jethi S, Kalsi N, Purohit R. Biophysical aspect of huntingtin protein during polyQ: an in silico insight. Cell Biochem Biophys. 2016;74(2):129–39.
11. Doss CGP, Rajith B, Rajasekaran R, Srajan J, Nagasundaram N, Debajyoti C. In silico analysis of prion protein mutants: a comparative study by molecular dynamics approach. Cell Biochem Biophys. 2013;67(3):1307–18.
12. Nagasundaram N, Doss CGP. Predicting the impact of single-nucleotide polymorphisms in CDK2–flavopiridol complex by molecular dynamics analysis. Cell Biochem Biophys. 2013;66(3):681–95.
13. Rickham PP. Human experimentation. Code of ethics of the World Medical Association. Declaration of Helsinki. Br Med J. 1964;2(5402):177.
14. Sambrook J, Fritsch EF, Maniatis T. Molecular cloning: a laboratory manual. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press; 1989. xxxviii + 1546 pp. https://www.cabdirect.org/cabdirect/abstract/19901616061
15. Peakall R, Smouse PE. GENALEX 6: genetic analysis in Excel. Population genetic software for teaching and research. Mol Ecol Notes. 2006;6(1):288–95.
16. Tereba A. Tools for analysis of population statistics. Profiles DNA. 1999;2:14–6.
17. Excoffier L, Lischer HEL. Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol Ecol Resour. 2010;10(3):564–7.
18. Takezaki N, Nei M, Tamura K. POPTREE2: software for constructing population trees from allele frequency data and computing other population statistics with Windows interface. Mol Biol Evol. 2009;27(4):747–52.
19. Nei M. Genetic distance between populations. Am Nat. 1972;106(949):283–92.
20. Hammer Ø, Harper DAT, Ryan PD. PAST: paleontological statistics software package for education and data analysis. Palaeontol Electron. 2001;4:1–9.
21. Agrawal S, Khan F. Reconstructing recent human phylogenies with forensic STR loci: a statistical approach. BMC Genet. 2005;6(1):47.
22. Ghosh T, Kalpana D, Mukerjee S, Mukherjee M, Sharma AK, Nath S, et al. Genetic diversity of autosomal STRs in eleven populations of India. Forensic Sci Int Genet. 2011;5(3):259–61.
23. HimaBindu G, Trivedi R, Kashyap VK. Genotypic polymorphisms at fifteen tetranucleotides and two pentanucleotide repeat loci in four tribal populations of Andhra Pradesh, southern India. J Forensic Sci. 2005;50:978–83.
24. Singh A, Trivedi R, Kashyap VK. Genetic polymorphism at 15 tetrameric short tandem repeat loci in four aboriginal tribal populations of Bengal. J Forensic Sci. 2006;51(1):183–7.
25. Giroti R, Talwar I. Diversity and differentiation in Khatris, Banias and Jat Sikhs of Punjab: a study with forensic microsatellites. Ind J Phys Anthr Hum Genet. 2013;32(2):309–28.
26. Imam J, Reyaz R, Singh RS, Bapuly AK, Shrivastava P. Genomic portrait of population of Jharkhand, India, drawn with 15 autosomal STRs and 17 Y-STRs. Int J Legal Med. 2018;132(1):139–40.
27. Shrivastava P, Kaitholia K, Kumawat RK, Dixit S, Dash HR, Srivastava A, et al. Forensic effectiveness and genetic distribution of 23 autosomal STRs included in Verifiler Plus™ multiplex in a population sample from Madhya Pradesh, India. Int J Legal Med. 2019;12:1–2.
28. Kumawat RK, Shrivastava P, Shrivastava D, Mathur GK, Dixit S. Genomic blueprint of population of Rajasthan based on autosomal STR markers. Ann Hum Biol. 2020;15:1–6.
29. Shrivastava P, Jain T, Trivedi VB. Genetic polymorphism study at 15 autosomal locus in central Indian population. Springerplus. 2015;4(1):566.
30. Singh M, Nandineni MR. Population genetic analyses and evaluation of 22 autosomal STRs in Indian populations. Int J Legal Med. 2017;131(4):971–3.
31. Chaubey G, Metspalu M, Kivisild T, Villems R. Peopling of South Asia: investigating the caste-tribe continuum in India. BioEssays. 2007;29(1):91–100.
32. Palo JU, Ulmanen I, Lukka M, Ellonen P, Sajantila A. Genetic markers and population history: Finland revisited. Eur J Hum Genet. 2009;17(10):1336–46.
We would like to thank the donors who provided samples for this study. We are also thankful to Shri D.C. Sagar, IPS, Addl. DGP Technical Service, Govt of Madhya Pradesh, India, and to the Director, State Forensic Science Laboratory, Sagar, M.P., for extending the laboratory facility to accomplish the study. The authors acknowledge Promega (India) for providing the PP 21 multiplex kit and the PowerQuant® System for this study.
No funding was received for this study.
Ethics approval and consent to participate
The study was conducted in compliance with ethical standards and approved by the ethics committee of Banaras Hindu University, Varanasi, India (Ref. No. I.Sc./ECM-XII/2018–19/06). Written informed consent was obtained from the volunteer donors following the Declaration of Helsinki. No minors were included in this study.
Consent for publication
The authors declare that they have no competing interests.
Allele frequencies and forensic parameters for the 20 autosomal STR loci in the Teli population of Maharashtra, India (n=69).
Fst pairwise genetic distances between the admixed population of Maharashtra and the compared populations with their corresponding p-value.
Fst pairwise genetic distances between the Teli population of Maharashtra and the compared populations with their corresponding p-value.
Cite this article
Badiye, A., Kapoor, N., Kumawat, R.K. et al. A study of genomic diversity in populations of Maharashtra, India, inferred from 20 autosomal STR markers. BMC Res Notes 14, 69 (2021). https://doi.org/10.1186/s13104-021-05485-z