The need for genetic variant naming standards in published abstracts of human genetic association studies
© Yu et al; licensee BioMed Central Ltd. 2009
Received: 01 December 2008
Accepted: 14 April 2009
Published: 14 April 2009
We analyzed the use of RefSNP (rs) numbers to identify genetic variants in abstracts of human genetic association studies published from 2001 through 2007. The proportion of abstracts reporting rs numbers increased rapidly but was still only 15% in 2007. We developed a web-based tool called Variant Name Mapper to assist in mapping historical genetic variant names to rs numbers. The consistent use of rs numbers in abstracts that report genetic associations would enhance knowledge synthesis and translation in this field.
By identifying millions of single nucleotide polymorphisms (SNPs), high-throughput genotyping technology has dramatically boosted the yield of genetic association studies. Translating these data into useful health information depends on systematic review and knowledge synthesis. However, the inconsistent description of key data elements – such as gene names, gene variant names, and measures of association – makes retrieval of published information challenging. Names for genes and polymorphisms are especially problematic because historical or common names have often been used instead of standard nomenclature [3, 4], particularly in candidate gene association studies.
The National Library of Medicine (NLM) provides free access via PubMed to the most comprehensive repository of biomedical literature abstracts in the world. Thus, the efficiency and sensitivity of scientific literature searches, as well as the robustness of computerized processes for data and text mining, depend closely on the way that information is presented in PubMed abstracts. By using standard names for genes and genetic variants in published abstracts, authors can increase the accessibility, utility, and influence of their findings.
The Human Genome Epidemiology (HuGE) Navigator is an integrated and searchable knowledge base of human genetic associations that have been extracted from PubMed weekly since 2001 by a combination of automatic and manual processes. The curator indexes each new abstract with the relevant HUGO gene symbol(s), so that users can perform gene-specific queries that can also accommodate gene aliases or protein names. For systematic review and synthesis of gene-disease associations, more specific data – at the level of the genetic variant – are required. The National Center for Biotechnology Information (NCBI) has developed the SNP database (dbSNP) as a central repository for SNPs and other genetic variants, each of which is identified by a unique reference cluster number (rs number).
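Because an rs number has a fixed textual form ("rs" followed by digits), its presence in an abstract can be checked with a simple pattern match. The following is a minimal sketch of that idea, not the procedure used for this analysis; the function names `find_rs_numbers` and `proportion_with_rs` are hypothetical and introduced only for illustration.

```python
import re

# An rs number is written as "rs" followed by digits, e.g. rs1801133.
# Word boundaries prevent matches inside longer tokens such as "hrs12".
RS_PATTERN = re.compile(r"\brs\d+\b", re.IGNORECASE)


def find_rs_numbers(abstract):
    """Return the distinct rs numbers in the text, lower-cased,
    in order of first appearance."""
    seen = []
    for match in RS_PATTERN.finditer(abstract):
        rs = match.group(0).lower()
        if rs not in seen:
            seen.append(rs)
    return seen


def proportion_with_rs(abstracts):
    """Return the fraction of abstracts that mention at least one rs number."""
    if not abstracts:
        return 0.0
    hits = sum(1 for text in abstracts if find_rs_numbers(text))
    return hits / len(abstracts)
```

Applied to a set of abstracts grouped by publication year, `proportion_with_rs` would give the kind of yearly proportion reported above, although a pattern match alone cannot tell whether an rs number refers to the variant actually under study.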
Genome-wide bioinformatics tools, such as HapMap and the UCSC Genome Browser, are most useful to researchers for mining genomic information when data can be linked at the variant level. The Human Genome Variation Society (HGVS) has proposed a comprehensive and systematic nomenclature for the description of genetic variants. The combination of dbSNP accession identifiers (rs numbers) with HGVS nomenclature will be beneficial for standardization. The use of standard nomenclatures (e.g., HUGO for genes, dbSNP for gene variants) and systematic reporting of statistics (e.g., odds ratios) in published abstracts would represent an evolutionary advance in information integration and retrieval, which are the first steps in translating genomic research.
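To illustrate why linking at the variant level matters, the sketch below groups study records that report the same variant under different names once each name is resolved to an rs number. It is an assumption-laden example rather than the logic of any existing tool: the lookup table `HISTORICAL_TO_RS`, the record fields, and both function names are hypothetical, and the two rs assignments shown should be verified against dbSNP before any real use.

```python
from collections import defaultdict

# Illustrative lookup from (gene symbol, historical variant name) to rs number.
# Assumption: a real table would be derived from dbSNP or a curated mapping tool.
HISTORICAL_TO_RS = {
    ("MTHFR", "C677T"): "rs1801133",
    ("MTHFR", "677C>T"): "rs1801133",
    ("F5", "Leiden"): "rs6025",
}


def to_rs_number(gene, variant):
    """Return the rs number for a reported variant, or None if it is unmapped."""
    # A variant already reported as an rs number is passed through, lower-cased.
    if variant.lower().startswith("rs") and variant[2:].isdigit():
        return variant.lower()
    return HISTORICAL_TO_RS.get((gene.upper(), variant))


def group_by_rs(records):
    """Group study labels by rs number; unmapped variants fall under None."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[to_rs_number(rec["gene"], rec["variant"])].append(rec["study"])
    return dict(grouped)
```

In this sketch, three studies reporting "C677T", "677C>T", and "rs1801133" for MTHFR end up in a single group, whereas a name missing from the table stays unlinked. That unlinked case is the retrieval gap that consistent use of rs numbers in abstracts would close.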
We appreciate valuable comments from Donna Maglott. Disclaimer: The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of CDC.
- Kim S, Misra A: SNP genotyping: technologies and biomedical applications. Annu Rev Biomed Eng. 2007, 9: 289-320. 10.1146/annurev.bioeng.9.060906.152037.
- Khoury MJ, Gwinn M, Yoon PW, Dowling N, Moore CA, Bradley L: The continuum of translation research in genomic medicine: how can we accelerate the appropriate integration of human genome discoveries into health care and disease prevention?. Genet Med. 2007, 9: 665-674.
- Smigielski EM, Sirotkin K, Ward M, Sherry ST: dbSNP: a database of single nucleotide polymorphisms. Nucleic Acids Res. 2000, 28: 352-355. 10.1093/nar/28.1.352.
- HUGO Gene Nomenclature. [http://www.gene.ucl.ac.uk/nomenclature]
- PubMed. [http://www.ncbi.nlm.nih.gov/entrez]
- Yu W, Gwinn M, Clyne M, Yesupriya A, Khoury MJ: A navigator for human genome epidemiology. Nat Genet. 2008, 40: 124-125. 10.1038/ng0208-124.
- dbSNP. [http://www.ncbi.nlm.nih.gov/projects/SNP/]
- Variant Name Mapper. [http://www.hugenavigator.net/HuGENavigator/startPageMapper.do]
- SNP500Cancer. [http://snp500cancer.nci.nih.gov/home_1.cfm]
- SNPedia. [http://www.snpedia.com/index.php/SNPedia]
- pharmGKB. [http://www.pharmgkb.org/]
- ALFRED. [http://alfred.med.yale.edu/alfred/]
- AlzGene. [http://www.alzforum.org/res/com/gen/alzgene/default.asp]
- PDGene. [http://www.pdgene.org/]
- SZgene. [http://www.schizophreniaforum.org/res/sczgene/default.asp]
- LSDBs. [http://www.hgvs.org/dblist/glsdb.html]
- The International HapMap Project. Nature. 2003, 426: 789-796. 10.1038/nature02168.
- Hinrichs AS, Karolchik D, Baertsch R, Barber GP, Bejerano G, Clawson H, et al: The UCSC Genome Browser Database: update 2006. Nucleic Acids Res. 2006, 34: D590-D598. 10.1093/nar/gkj144.
- den Dunnen JT, Antonarakis SE: Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion. Hum Mutat. 2000, 15: 7-12. 10.1002/(SICI)1098-1004(200001)15:1<7::AID-HUMU4>3.0.CO;2-N.