

  • Correspondence
  • Open Access

The need for genetic variant naming standards in published abstracts of human genetic association studies

BMC Research Notes 2009, 2:56

  • Received: 01 December 2008
  • Accepted: 14 April 2009
  • Published:


We analyzed the use of RefSNP (rs) numbers to identify genetic variants in abstracts of human genetic association studies published from 2001 through 2007. The proportion of abstracts reporting rs numbers increased rapidly but was still only 15% in 2007. We developed a web-based tool called Variant Name Mapper to assist in mapping historical genetic variant names to rs numbers. The consistent use of rs numbers in abstracts that report genetic associations would enhance knowledge synthesis and translation in this field.


  • Genetic Association Study
  • Report Odds Ratio
  • Knowledge Synthesis
  • PubMed Abstract
  • Candidate Gene Association Study


By identifying millions of single nucleotide polymorphisms (SNPs), high-throughput genotyping technology has dramatically boosted the yield of genetic association studies [1]. Translating these data into useful health information depends on systematic review and knowledge synthesis [2]. However, the inconsistent description of key data elements – such as gene names, gene variant names, and measures of association – makes retrieval of published information challenging. Names for genes and polymorphisms are particularly problematic because historical or common names have often been used instead of standard nomenclature [3, 4], particularly in candidate gene association studies.

The National Library of Medicine (NLM) provides free access via PubMed [5] to the most comprehensive repository of biomedical literature abstracts in the world. Thus, the efficiency and sensitivity of scientific literature searches, as well as the robustness of computerized processes for data and text mining, depend closely on the way that information is presented in PubMed abstracts. By using standard names for genes and genetic variants in published abstracts, authors can increase the accessibility, utility, and influence of their findings.

The Human Genome Epidemiology (HuGE) Navigator is an integrated and searchable knowledge base of human genetic associations that have been extracted from PubMed weekly since 2001 by a combination of automatic and manual processes [6]. The curator indexes each new abstract with the relevant HUGO gene symbol(s) [4], so that users can perform gene-specific queries that can also accommodate gene aliases or protein names. For systematic review and synthesis of gene-disease associations, more specific data – at the level of the genetic variant – are required. The National Center for Biotechnology Information (NCBI) has developed the SNP database (dbSNP) [7] as a central repository for SNPs and other genetic variants, each of which is identified by a unique reference cluster number (rs number).

Using the HuGE Navigator, we examined trends in the reporting of gene variants and odds ratios in PubMed abstracts published from 2001 through 2007 (N = 27,132). Overall, 6.3% of abstracts reported rs numbers; 27% reported odds ratios. The proportion of abstracts reporting rs numbers increased substantially (from 1% to 17%) during this period, while the proportion reporting odds ratios remained fairly steady (Fig. 1). Abstracts for genome-wide association studies were more likely than other genetic association studies to include rs numbers (42%) and odds ratios (40%). In contrast, when we hand-searched a random 2% sample of the extracted PubMed abstracts, we found that almost all (91%) included common or historical genetic variant names. Matching these common names to the corresponding rs numbers would greatly aid retrieval and synthesis of genetic association data.
Figure 1

Trends in the percentage of abstracts reporting odds ratios and rs numbers for gene variants, HuGE Navigator database, 2001–2007.
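The kind of tabulation summarized in Fig. 1 can be illustrated with a minimal sketch. It is not the procedure used by the HuGE Navigator: the regular expressions, the function names, and the three sample abstracts are assumptions made up for illustration, and real abstracts would need more careful pattern matching and manual review.

```python
import re
from collections import defaultdict

# Assumed pattern: "rs" followed by digits, e.g. "rs1801133".
RS_PATTERN = re.compile(r"\brs\d+\b", re.IGNORECASE)
# Assumed patterns: the phrase "odds ratio(s)", or "OR"/"ORs" followed by "=" or "(".
OR_PHRASE = re.compile(r"\bodds\s+ratios?\b", re.IGNORECASE)
OR_ABBREV = re.compile(r"\bORs?\s*[=(]")


def mentions_rs_number(text):
    """Return True if the abstract text contains at least one rs number."""
    return RS_PATTERN.search(text) is not None


def mentions_odds_ratio(text):
    """Return True if the abstract text appears to report an odds ratio."""
    return bool(OR_PHRASE.search(text) or OR_ABBREV.search(text))


def yearly_percentages(abstracts):
    """Map each year to (% abstracts with rs numbers, % with odds ratios).

    `abstracts` is an iterable of (year, text) pairs.
    """
    totals = defaultdict(int)
    rs_hits = defaultdict(int)
    or_hits = defaultdict(int)
    for year, text in abstracts:
        totals[year] += 1
        if mentions_rs_number(text):
            rs_hits[year] += 1
        if mentions_odds_ratio(text):
            or_hits[year] += 1
    return {
        year: (100 * rs_hits[year] / n, 100 * or_hits[year] / n)
        for year, n in sorted(totals.items())
    }


# Made-up sample abstracts, for illustration only.
SAMPLE_ABSTRACTS = [
    (2001, "The MTHFR C677T polymorphism was associated with risk (odds ratio 1.4)."),
    (2007, "rs1801133 was associated with risk (OR = 1.4)."),
    (2007, "The C677T variant showed no association."),
]

percentages = yearly_percentages(SAMPLE_ABSTRACTS)
```

In this toy sample, only the abstract that names rs1801133 is counted as reporting an rs number; the two abstracts that use the common name C677T would be missed by any search keyed on rs numbers.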

To facilitate the mapping of historical names for genetic variants to their rs numbers, we developed a searchable, web-based database called Variant Name Mapper [8]. This database contains historical names matched with their corresponding rs numbers. These data have been extracted from multiple open-access databases, including SNP500Cancer [9], SNPedia [10], pharmGKB [11], ALFRED [12], AlzGene [13], PDGene [14], SZgene [15], and LSDBs [16], as well as from our own curated data from the HuGE Navigator. User submissions are also welcome. In the Variant Name Mapper, the user can search by the historical (common) name of the polymorphism, by rs number, or by gene information (including gene symbol, gene name, and gene alias). The display includes the rs number, common or historical polymorphism names, gene-centered information, and a listing of the data sources (Fig. 2). We evaluated the tool's mapping capacity by entering the common names for genetic variants included in the 2% sample of abstracts described above. Overall, 62% of common names could be mapped to an rs number by using the Variant Name Mapper. This low return may be due to the heterogeneous nature of the common names and limitations of the data sources. The content of the database will be continually improved and expanded as new data sources become available.
Figure 2

A screenshot of the Variant Name Mapper.
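The three search modes and the mapping-rate evaluation described above can be sketched as a small in-memory lookup. This is not the Variant Name Mapper's actual implementation: the record layout, the class and function names, the name normalization, and the placeholder source label are all hypothetical, and the three records are well-known examples chosen only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantRecord:
    rs_number: str
    gene_symbol: str
    common_names: tuple
    sources: tuple  # placeholder labels, not real provenance


def normalize(name):
    """Lowercase and drop spaces and punctuation (keeping '>') so that
    'C677T', 'c677t', and 'C 677 T' compare equal. Hypothetical rule."""
    return re.sub(r"[^a-z0-9>]", "", name.lower())


class VariantNameIndex:
    """Hypothetical index supporting search by common name, rs number, or gene."""

    def __init__(self, records):
        self.by_name = {}
        self.by_rs = {}
        self.by_gene = {}
        for rec in records:
            self.by_rs[rec.rs_number.lower()] = rec
            self.by_gene.setdefault(rec.gene_symbol.lower(), []).append(rec)
            for name in rec.common_names:
                # A common name may be shared by variants in different genes,
                # so each name maps to a list of records.
                self.by_name.setdefault(normalize(name), []).append(rec)

    def search_name(self, name):
        return self.by_name.get(normalize(name), [])

    def search_rs(self, rs_number):
        return self.by_rs.get(rs_number.strip().lower())

    def search_gene(self, symbol):
        return self.by_gene.get(symbol.strip().lower(), [])


def mapping_rate(index, common_names):
    """Fraction of the given common names that map to at least one rs number."""
    if not common_names:
        return 0.0
    mapped = sum(1 for name in common_names if index.search_name(name))
    return mapped / len(common_names)


# Illustrative records only; the source label is a placeholder.
EXAMPLE_RECORDS = [
    VariantRecord("rs1801133", "MTHFR", ("C677T", "677C>T", "Ala222Val"), ("example",)),
    VariantRecord("rs1801131", "MTHFR", ("A1298C", "Glu429Ala"), ("example",)),
    VariantRecord("rs6025", "F5", ("Factor V Leiden", "R506Q"), ("example",)),
]

index = VariantNameIndex(EXAMPLE_RECORDS)
rate = mapping_rate(index, ["C677T", "Factor V Leiden", "unknown-1"])
```

Returning a list from the name search reflects one reason common names are hard to map: the same short label can refer to different variants, so a query by name alone may need the gene symbol to disambiguate.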

Genome-wide bioinformatics tools, such as HapMap [17] and the UCSC Genome Browser [18], are most useful to researchers for mining genomic information when data can be linked at the variant level. The Human Genome Variation Society (HGVS) has proposed a comprehensive and systematic nomenclature for the description of genetic variants [19]. The combination of dbSNP accession identifiers (rs numbers) with HGVS nomenclature will be beneficial for standardization. The use of standard nomenclatures (e.g., HUGO for genes, dbSNP for gene variants) and systematic reporting of statistics (e.g., odds ratios) in published abstracts would represent an evolutionary advance in information integration and retrieval, which are the first steps in translating genomic research.



We appreciate valuable comments from Donna Maglott. Disclaimer: The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of CDC.

Authors’ Affiliations

Office of Public Health Genomics, Centers for Disease Control and Prevention, Atlanta, Georgia 30341, USA


  1. Kim S, Misra A: SNP genotyping: technologies and biomedical applications. Annu Rev Biomed Eng. 2007, 9: 289-320. 10.1146/annurev.bioeng.9.060906.152037.
  2. Khoury MJ, Gwinn M, Yoon PW, Dowling N, Moore CA, Bradley L: The continuum of translation research in genomic medicine: how can we accelerate the appropriate integration of human genome discoveries into health care and disease prevention?. Genet Med. 2007, 9: 665-674.
  3. Smigielski EM, Sirotkin K, Ward M, Sherry ST: dbSNP: a database of single nucleotide polymorphisms. Nucleic Acids Res. 2000, 28: 352-355. 10.1093/nar/28.1.352.
  4. HUGO Gene Nomenclature. []
  5. PubMed. []
  6. Yu W, Gwinn M, Clyne M, Yesupriya A, Khoury MJ: A navigator for human genome epidemiology. Nat Genet. 2008, 40: 124-125. 10.1038/ng0208-124.
  7. dbSNP. []
  8. Variant Name Mapper. []
  9. SNP500Cancer. []
  10. SNPedia. []
  11. pharmGKB. []
  12. ALFRED. []
  13. AlzGene. []
  14. PDGene. []
  15. SZgene. []
  16. LSDBs. []
  17. The International HapMap Project. Nature. 2003, 426: 789-796. 10.1038/nature02168.
  18. Hinrichs AS, Karolchik D, Baertsch R, Barber GP, Bejerano G, Clawson H, et al: The UCSC Genome Browser Database: update 2006. Nucleic Acids Res. 2006, 34: D590-D598. 10.1093/nar/gkj144.
  19. den Dunnen JT, Antonarakis SE: Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion. Hum Mutat. 2000, 15: 7-12. 10.1002/(SICI)1098-1004(200001)15:1<7::AID-HUMU4>3.0.CO;2-N.


© Yu et al; licensee BioMed Central Ltd. 2009

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.