The first Malay database toward the ethnic-specific target molecular variation

Background The Malaysian Node of the Human Variome Project (MyHVP) is one of the eighteen official Human Variome Project (HVP) country-specific nodes. Since its inception in 9th October 2010, MyHVP has attracted the significant number of Malaysian clinicians and researchers to participate and contribute their data to this project. MyHVP also act as the center of coordination for genotypic and phenotypic variation studies of the Malaysian population. A specialized database was developed to store and manage the data based on genetic variations which also associated with health and disease of Malaysian ethnic groups. This ethnic-specific database is called the Malaysian Node of the Human Variome Project database (MyHVPDb). Findings Currently, MyHVPDb provides only information about the genetic variations and mutations found in the Malays. In the near future, it will expand for the other Malaysian ethnics as well. The data sets are specified based on diseases or genetic mutation types which have three main subcategories: Single Nucleotide Polymorphism (SNP), Copy Number Variation (CNV) followed by the mutations which code for the common diseases among Malaysians. MyHVPDb has been open to the local researchers, academicians and students through the registration at the portal of MyHVP (http://hvpmalaysia.kk.usm.my/mhgvc/index.php?id=register). Conclusions This database would be useful for clinicians and researchers who are interested in doing a study on genomics population and genetic diseases in order to obtain up-to-date and accurate information regarding the population-specific variations and also useful for those in countries with similar ethnic background.

Studying the variations in the human genome represents new horizons in genetic research, which would help to reduce health problems and develop new strategies towards designing better diagnostic and preventive approaches. Many general and locus specific mutation disease databases have been established [1][2][3][4]. It is also well known that different ethnic backgrounds may have different disease-causing mutation(s) and variation(s). Therefore, population specific databases are beneficial not only for future surveys, but also for those conducting studies in the etiology of genetic disorders and distribution of the mutations.
With the advent of the SNPs-based microarray, a huge amount of data was added to the growing body of knowledge for each specific disease. The role of SNPs in the etiology of the genetic disorders, particularly in complex traits has been well accepted [5][6][7]. The SNPs may contribute to the genetic disorder or they may be in linkage disequilibrium with other casual variants and mutations. Different genetic backgrounds may have different susceptibility to the haploid in the sufficiency of variants or mutations. In addition, the variable prevalence of the same genetic disorder among various populations suggests the contribution of different individual genetic variants for each ethnic background. Therefore, establishment of a country-based molecular variation database will facilitate the accessibility of researchers to the genetic differences between various ethnic groups and ancestries. Such database will help to trace the population diversity, history and disease susceptibility for each sub-population [8]. MyHVP is a continuation of the Human Variome Project (HVP) which was detained in Beijing Meeting report at 2011 also, could be considered as an initial report in the updating of the Malaysian Node as an ongoing process. Therefore, MyHVPDb does not create the new databases; however, it will be continuously updating the database to be more comprehensive and reliable. In consequence, the MyHVPDb attempts to address the lack of coordinated effort in collecting and compiling genomic variations and common Mendelian disorders that might be associated with the multi-ethnic Malaysian population.

Ethnic background and common disorders in Malaysia
Malaysia is located in Southeast Asia and separated into two regions by the South China Sea, namely Peninsular Malaysia and East Malaysia (the latter is composed of Sabah and Sarawak). It has land borders with Thailand, Indonesia and Brunei also, maritime borders with Singapore, Vietnam and the Philippines. Malaysia is a multi-ethnic country with three major ethnic groups (Malay, Chinese and Indian); aborigines (Orang Asli which consist of Proto Malays, Negrito and Senoi); Sabahans and Sarawakian (Major sub-ethnic groups of Sabahan are Kadazan/Dusun, Bajau and Murut; followed by urban, Bidayuh and Melanau respectively as Major sub-ethnic groups of Sarawakian). Each ethnic group is further divided into sub-ethnic groups, representing the existing diversity of Malaysian population ( Figure 1). The total population is 28.3 million, where Malays comprised 63% of the total population, followed by Chinese (28%), Indians (8%), and other ethnic groups (1%) [9].
According to the Ministry of Health Malaysia, heart related disorders were the most common causes of death (16.09%), followed by Septicaemia (13.82%) and Malignant Neoplasms (10.85%) [10]. Regardless of ethnic groups, breast, colorectal, lung, cervix and nasopharynx cancers were the five most common malignancies among the population of Peninsular Malaysia. Leukemia is the most common cancer among children below 14 years old. Mendelian genetic disorders such as thalassemia [11], Duchenne muscular atrophy [12], spinal muscular atrophy [13], retinoblastoma [14], G6PD deficiency [15] and orofacial clefts [16] are also relatively common in the country.
The different ethnic group is known to have risks of certain diseases, for example thalassemia in Southeast Asia, sickle cell anemia in Negroid population, and hemochromatosis in Jews [2]. Genetically, spectrum of mutations differ according to different ethnic for the same respected gene and disorder. Therefore, the MyHVPDb intentionally includes mutational data for communal gene disease among Malaysians ethnics.

Database construction and implementation
In order to ensure that the stored data sets in Malaysian Node of the Human Variome Project Database (MyHVPDb) can be effectively shared, core elements of the data, followed by the standard nomenclature similarly adopted by international databases and the standard HVP database architecture were recruited ( Figure 2). Data sharing was done automatically using standardized descriptors and controlled vocabularies via the HVP Data Aggregator. Data Aggregator will allow HVP Country Nodes to automatically share existing data for possible with every gene/disease specific database in the Human Variome Project. All of this sharing was accomplished through a single data link to the aggregator, thus HVP Country Nodes do not need to establish and maintain the links to every database.
Additionally, the content and structure of MyHVPDb follows guidelines given by NCBI SNPs Database (http:// www.ncbi.nlm.nih.gov/projects/SNP/). The overall design of MyHVPDb was based on three-tier architecture model (client, web and database) as shown in Figure 3. The database was developed using Hypertext Markup Language (HTML), Cascading Style Sheet (CSS), Java Script, Hypertext Preprocessor (PHP) and Structured Query Language (SQL). PHP scripts were used as a Common Gateway Interface (CGI) for sending and receiving data between the front end user/client and the database server. To query data, users need to enter or choose a keyword such as SNP ID or chromosome number. The Apache web server will then transform the query into SQL for it to be requested from MySQL database.

Database access
The MyHVPDb can be accessed through http://hvpmalaysia. kk.usm.my which consists of the database administrator and user portal. The user portal has been divided into two status of privilege access: advanced and basic, whereby basic users will have some limitations in the scope of available data for viewing (Table 1).
A new user must register before being allowed to access the databases. Should there be any doubt regarding the information given in the registration fields, the database administrator will contact the user through email to request further information. Upon approval by the database administrator, the new user will be allowed to access the database as preferably requested.

Querying the database
For the SNP database, there are two methods of searching the SNPs datasets, either by 'SNP ID' or 'Chromosome  Number'. It is easier to search the database using 'Chromosome Number' for the advanced level user because it provides information on the sub-ethnicity group in addition to the chromosome numbers. By using the specific search, it is easier to retrieve accurate results of the SNP associated with a particular sub-ethnic group. However, for the basic level user, no information about the sub -ethnicity group is made available.
All outputs from the search result either by 'SNP ID' or 'Chromosome Number' will be linked to the NCBI SNP database based on SNP ID in order to assist users in obtaining further information pertaining to the DNA sequences. The purpose of connecting to the NCBI SNP database is to ensure a high quality of the stored data and its reliability.

Data submission
Submissions of new data could be carried out using the submission form provided on the website. The completed form should be sent to: 1mhgvc.hvp.secretariat@gmail.com.
We applied for the ethical approval through the

Database content
MyHVPDb is categorized into three main components:  (2)  Currently, the data from other ethnic groups are undergoing the analysis and will be soon available in the database. Search based on SNP ID will display information including sub-ethnicity by combining in a single page ( Figure 4). Therefore, it would be easier for users to compare the SNP information of each sub-ethnicity. Comparing the allele frequency of the genome wide SNP and genetic relationships among the population can be reconstructed by providing important clues about how humans adapted to changing climatic and nutritional environment. Moreover, some of these adaptations have important medical relevance, as they can be linked to differential disease susceptibility. ii. CNV database This component of MyHVPDb provides the copy number variation (CNV) and disease gene(s). This database currently consists of CNV of the SMN2 gene associated with spinal muscular atrophy in Malaysia. We are analysing the CNVs based on healthy individuals who belong to different ethnicities and sub-ethnicities. iii. Mutational database This database stores the data regarding to the specific gene(s) and mutation(s) that are related to Malaysian common diseases. It now consists of 143 mutations from 16 genes and 13 diseases. Some of the disease group are cancer, enzyme deficiency, infectious diseases, metabolic disease and hematological diseases. These diseases have been extensively studied with regards to its susceptibility and association with Malaysian population's genetic makeup in order to provide a robust platform for diagnosis to enhance research opportunities and treatment development.
These Malaysian mutational data were extracted from published scientific articles of studies conducted on the Malaysian population. The available information in the database is Gene Name, Disease, Position of Mutation, Nomenclature, Type of Mutation, Locus of Mutation, Description of Mutation and Effect of Mutation.

Conclusion
To date, the members of the MyHVP consist of 68 individuals from 12 Malaysian Universities and academic institutions. MyHVP also received support from professional Societies including the Genetics Society of Malaysia, the Malaysian Society of Human Genetics, the Medical Genetics Society of Malaysia, and the Malaysian Society of Bioinformatics and Computational Biology. Since its launch 3 years ago, MyHVP has attracted an increasing number of researchers and scientists to participate in this project across Malaysia. We have succeeded in developing an online database for the mutation of genes related to diseases discovered in Malaysia and also the Malay whole genome SNP database. MyHVP marks a new horizon of genetics activities in Malaysia and provides the specific database that relates to the Malaysian population. This database will be a useful resource for others countries with a similar ethnic groups.
The aim of this project would be the development of Southeast Asia (SEA) node of HVP, (HVP SEA node) which is expected to communicate the individuals by proper collaboration on the diagnostics and clinical care issues within Malaysia and SEA. The HVP SEA node will be established by taking Malaysia as the regional role model in genomic research and diagnostic services especially for the developing countries in SEA. MyHVP is able to assist these countries through human capital development by providing the proper trainings and educations also, increase the public awareness about the importance of genetics and genomics, as main health care contributors.
The Human Variome Project addresses global data sharing on its vision. This vision can be achieved by emphasizing to the four areas of activities for the Human Variome Project, which are set normative functions, behaving ethically, sharing knowledge and building capacity. MyHVP was established on these four foundations, and focused on sharing all information on genetic variations. This will ultimately lead to speedier, better and cheaper diagnosis followed by treatment of genetic disorders as well as better insight into the causes, severity and effect of common disease.
MyHVP Database is not only useful for those who are interested in general/specific mutations or disease specific database such as colorectal cancer, breast cancer or thalassemia; however, it would be useful as the future reference for the other researchers globally.