This study assessed the concordance, sensitivity, specificity and accuracy of PM and GS in retrieving articles published between 2002 and 2009 on risk factors for sarcoma. The starting year was chosen because in late 2002 the World Health Organization published a new classification of sarcoma that gained widespread acceptance [5].
The papers returned by both PM and GS, when the same keywords and limits were adopted, were used to calculate a measure of concordance. Sensitivity, specificity and accuracy were assessed using two different search strategies. The first was designed to be more sensitive and less specific: it was broader and could include both descriptive (ecological or clinical case series) and analytical (case-control or cohort) epidemiological studies. The second strategy was instead more specific but less sensitive, focusing mainly on analytical epidemiological studies.
With two search engines (PM and GS) and two search strategies, there were four scenarios: GS1; GS2; PM1; PM2.
• GS1 started by selecting all articles reporting the words "sarcoma", "incidence", and "case" (see string 1 in Additional file 1: Box). Studies published before 2002 were excluded using a dedicated GS tool. The list of papers was printed. In order to eliminate experimental animal studies and clinical studies on therapy and prognosis, retrieved articles containing words such as "prognosis", "treat", "surg", "therapy", "efficacy", "survival", "chemotherapy", "mussel", "bivalve", "dog", "veterin", "cat", "feline", "bird", "avian", "fish", "mice", "rat", "mouse", "guinea", "rabbit", "ocean", "Kaposi's", "osteosarcoma", "Rous" (see string 2 in Additional file 1: Box) were removed, but only after the title had been read. Among the remaining articles, those not reporting "sarcoma" in the title or abstract, and those that were citations of books, conference abstracts, letters to the editor, or editorials, were discarded. The interface language was English, but pages written in any language were searched. Papers in languages other than English were subsequently excluded.
• GS2 used the same search method as GS1, except that the search terms were "sarcoma", "incidence", "case", and "risk" (see strings 3 and 4 in Additional file 1: Box).
• PM1 started by selecting articles reporting the words "sarcoma", "incidence", and "case" (see string 5 in Additional file 1: Box). Furthermore, the following "Limits" were chosen: "Humans"; "English language"; "Classical article" as type of article; and "January 1st 2002 - April 22nd 2009" as specific date range. After the list had been printed, the papers were scrutinized for the words "prognosis", "treat", "surg", "therapy", "efficacy", "survival", "chemotherapy", "Kaposi's", "osteosarcoma", "Rous" (see string 6 in Additional file 1: Box). Again, the same "Limits" as above were applied. The remaining articles were inspected for the word "sarcoma", and those not reporting it in either the title or the abstract were discarded. Clinical studies on diagnosis, therapy or prognosis were subsequently discarded.
• PM2 used the same search method as PM1, except that the search terms were "sarcoma", "incidence", "case", and "risk" (see strings 7 and 8 in Additional file 1: Box).
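The title-screening step shared by the four scenarios can be sketched in Python as follows. This is only an illustration, not the procedure actually followed, which relied on printed lists and manual reading of titles. The function names `build_pattern` and `flag_titles` and the example titles are hypothetical, and the stems are the PM subset of the exclusion words quoted above. Matching is anchored at the start of a word, so that "surg" catches both "surgery" and "surgical"; a flagged title would still need to be read before the paper is removed.

```python
import re

# Exclusion stems quoted for the PM scenarios (the GS list is longer).
EXCLUSION_STEMS = [
    "prognosis", "treat", "surg", "therapy", "efficacy", "survival",
    "chemotherapy", "Kaposi's", "osteosarcoma", "Rous",
]


def build_pattern(stems):
    """Compile one case-insensitive pattern matching any stem at a word start."""
    alternatives = "|".join(re.escape(stem) for stem in stems)
    return re.compile(r"\b(?:" + alternatives + r")", re.IGNORECASE)


def flag_titles(titles, stems):
    """Split titles into (kept, flagged); flagged titles contain an exclusion stem."""
    pattern = build_pattern(stems)
    kept, flagged = [], []
    for title in titles:
        if pattern.search(title):
            flagged.append(title)
        else:
            kept.append(title)
    return kept, flagged


# Hypothetical titles, for illustration only.
example_titles = [
    "Occupational exposure and incidence of soft tissue sarcoma: a case-control study",
    "Surgical management of retroperitoneal sarcoma",
    "Kaposi's sarcoma incidence in a transplant cohort",
]
kept, flagged = flag_titles(example_titles, EXCLUSION_STEMS)
print("kept:", kept)
print("flagged for manual review:", flagged)
```

Short stems in the GS list such as "rat" or "cat" would also match the start of unrelated words (for example "rate" or "category"), which is one reason an automatic flag of this kind cannot replace reading the title.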
After abstract selections were agreed upon, the filtered papers were evaluated by two independent reviewers in order to establish whether or not risk factors for sarcoma were investigated. In the latter case, the papers were mainly case reports or descriptions of new diagnostic devices (e.g. molecular biology or imaging techniques). Disagreements between reviewers concerning the classification of articles were resolved by discussion and input from a third reviewer. Finally, the total of 168 studies collected (63 + 42 + 46 + 17) was reduced to 111 (the common list) by removing the duplicate entries of studies shared by the parent lists GS1, GS2, PM1, and PM2.
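Merging the four parent lists into a common list amounts to taking their union while remembering which source returned each paper. A minimal Python sketch follows, assuming each paper carries some unique identifier. The identifiers below are invented for illustration and `build_common_list` is a hypothetical helper, not part of the original workflow.

```python
def build_common_list(parent_lists):
    """Merge parent lists into a mapping: article id -> set of sources returning it."""
    common = {}
    for source, article_ids in parent_lists.items():
        for article_id in article_ids:
            common.setdefault(article_id, set()).add(source)
    return common


# Hypothetical identifiers, for illustration only.
parent_lists = {
    "GS1": ["a1", "a2", "a3"],
    "GS2": ["a2", "a3"],
    "PM1": ["a3", "a4"],
    "PM2": ["a4"],
}

common = build_common_list(parent_lists)
total_retrieved = sum(len(ids) for ids in parent_lists.values())
print("retrieved in total:", total_retrieved)
print("unique papers in common list:", len(common))
for article_id, sources in sorted(common.items()):
    print(article_id, sorted(sources))
```

Keeping the set of sources for each paper, rather than a bare list of unique identifiers, preserves the information needed later to tabulate each source against the others and to count the papers shared by pairs of sources.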
Statistical analysis
The common list was broken down into a series of two-by-two tables, in which the columns were headed "Yes RF" and "No RF", depending on whether the risk factors (RF) for sarcoma were or were not investigated, and the rows were GS1 (or, in turn, GS2, PM1, PM2) and the remaining sources combined. From each table we calculated sensitivity, specificity, precision and accuracy.
The sensitivity of a given strategy is defined as the proportion of high-quality articles (scientifically sound and clinically relevant) that are retrieved; specificity is the proportion of lower-quality articles (those that did not meet criteria) that are not retrieved; precision is the proportion of retrieved articles that meet criteria (equivalent to positive predictive value in diagnostic test terminology); and accuracy is the proportion of all articles that are correctly dealt with by the strategy (articles that met criteria and were retrieved, plus articles that did not meet criteria and were not retrieved, divided by all articles in the database) [6].
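These four measures can be computed directly from the cells of one two-by-two table. The Python sketch below uses the standard diagnostic-test reading of the definitions, with "retrieved" meaning returned by the strategy under evaluation and "not retrieved" meaning returned only by the remaining sources. The function name `retrieval_metrics`, the cell labels a to d, and the counts in the example are hypothetical and are not the study data.

```python
def retrieval_metrics(a, b, c, d):
    """Compute retrieval measures from a two-by-two table.

    a: retrieved by the strategy, risk factors investigated ("Yes RF")
    b: retrieved by the strategy, risk factors not investigated ("No RF")
    c: not retrieved by the strategy, "Yes RF"
    d: not retrieved by the strategy, "No RF"
    Assumes every row and column total is greater than zero.
    """
    total = a + b + c + d
    return {
        "sensitivity": a / (a + c),    # relevant articles that were retrieved
        "specificity": d / (b + d),    # non-relevant articles that were not retrieved
        "precision": a / (a + b),      # retrieved articles that were relevant
        "accuracy": (a + d) / total,   # articles correctly dealt with
    }


# Hypothetical counts, for illustration only.
metrics = retrieval_metrics(a=40, b=10, c=20, d=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The same function would be applied four times, once for each of GS1, GS2, PM1 and PM2 as the row of interest.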
The common list was also broken down to show the papers shared by pairs of bibliographic sources (GS1 and GS2; PM1 and PM2; GS and PM) that used the same keywords and limits. The agreement between pairs was calculated using Dunn's method [7].