Situational judgment test as an additional tool in a medical admission test: an observational investigation
© Luschin-Ebengreuth et al.; licensee BioMed Central. 2015
Received: 4 November 2014
Accepted: 24 February 2015
Published: 14 March 2015
In the framework of medical university admission procedures, the assessment of non-cognitive abilities is increasingly demanded. As a tool for assessing personal qualities or the ability to handle theoretical social constructs in complex situations, the Situational Judgment Test (SJT) is one of the measurement instruments discussed in the literature. This study focuses on the development and the results of the SJT as part of the admission test for the study of human medicine and dentistry at one medical university in Austria.
This observational investigation focuses on the results of the SJT. A total of 4741 applicants were included in the study. To yield comparable results for the different test parts, “relative scores” for each test part were calculated. Performance differences between women and men in the various test parts were analyzed using effect sizes based on comparison of mean values (Cohen’s d). The associations between the relative scores achieved in the various test parts were assessed by computing pairwise linear correlation coefficients between all test parts and visualized by bivariate scatterplots.
Among successful candidates, men consistently outperform women. Men perform better in physics and mathematics. Women perform better in the SJT part. The least discriminatory test part was the SJT. A strong correlation between biology and chemistry and moderate correlations between the other test parts, except the SJT, are evident. The relative scores are not symmetrically distributed.
The cognitive loading of the administered SJTs points to the low correlation between the SJTs and cognitive abilities. Adding the SJT part to the admission test, in order to cover more than just the applicants' knowledge and understanding of the natural sciences, has been quite successful.
letters of recommendation as well as personal and autobiographical statements, whose reliability or predictive validity has not yet been confirmed.
A further assessment instrument is the Situational Judgment Test (SJT) [13,14]. As McDaniel et al. summarize in their meta-analysis, the SJT assesses a plurality of constructs [13,15]. Following this result, O’Connell et al. recommend interpreting SJTs as measurement methods rather than as measures of a single construct. In any case, the SJT has been shown to be a valid predictor of future job performance and, provided that relevant work-related situations are described, to have face and content validity [17,18].
The Medical University of Graz is the only one of the three Austrian medical universities to have amended its admission process (cognitive testing with the subsections biology, chemistry, physics and mathematics, as well as the testing of text comprehension) by including a written Situational Judgment Test (SJT), which it introduced in 2010 [19-21].
Table 1 Distributions of applicants as well as of successful applicants according to sex and nationality in three consecutive academic years (2010 - 2012)
Admission examination measures: cognitive test & situational judgment test
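Table 2 Mean relative scores showing the performance of women and men in the various test parts
95% confidence intervals of Cohen’s d, by test part and academic year:
Test part                    2010/11         2011/12         2012/13
Biology                      (.11 – .32)     (.05 – .24)     (.10 – .29)
Chemistry                    (.11 – .33)     (.15 – .34)     (.23 – .43)
Physics                      (.30 – .51)     (.37 – .57)     (.36 – .55)
Mathematics                  (.16 – .38)     (.36 – .56)     (.38 – .58)
Text comprehension           (−.02 – .19)    (.05 – .25)     (.08 – .28)
Situational judgment test    (−.25 – −.04)   (−.28 – −.09)   (−.32 – −.12)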
Situational judgment test
the comprehensible context/the possible reference to basic statements of the bio-psycho-social model (information regarding the bio-psycho-social model was made available to all college applicants with a notice regarding its relevance for the test),
the degree of difficulty (no medical (pre)-knowledge is necessary for responding) and
Phase 2: Critical evaluation and extension of possible courses of action of the situational descriptions – included in the further process – by professors and lecturers.
Phase 3: Evaluation of the courses of action by the steering committee (professors/lecturers/psychologists) and discussion about or determination of the sequence of potential courses of action by the steering committee together with the core team.
Phase 4: Performance of a pre-test, followed by further modification of the SJT items, taking into account the results of the pre-test. Final revision and approval.
Perceptions of the admission examination by the examinees
In 2010, after completing the admission test, the applicants were invited to evaluate certain aspects of the procedure. For each part of the admission test, they were asked, among other questions, for their subjective judgment of its difficulty, its importance within the admission test, and its importance for their prospective career in medicine. The candidates gave their ratings on a 6-point scale (1 = not difficult at all, 6 = very difficult; 1 = not meaningful at all, 6 = very meaningful). All data were anonymized so that responses could not be traced back to individual applicants.
For each test item, the index of discrimination, which describes the correlation of that item with the total test, is computed. These indices of discrimination are then aggregated for the knowledge test (combined results on biology, chemistry, physics and mathematics), the text comprehension test and the SJT, separately for each year.
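The item-total correlation described above can be sketched as follows. This is a minimal illustration, not the authors' Stata procedure: it assumes dichotomous (0/1) item scores and an uncorrected total that includes the item itself, neither of which the text specifies, and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_discrimination(responses):
    """Correlate each item with the total score.

    responses: one row per applicant, each row a list of 0/1 item scores.
    Assumption: the total includes the item itself (uncorrected index).
    """
    totals = [sum(row) for row in responses]
    n_items = len(responses[0])
    return [
        pearson([row[j] for row in responses], totals)
        for j in range(n_items)
    ]


def mean_discrimination(responses):
    """Aggregate the item indices of one test part into a mean value."""
    indices = item_discrimination(responses)
    return sum(indices) / len(indices)
```

Calling `mean_discrimination` once per test part and year would give one aggregated value per cell, in the spirit of the aggregation described in the text.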
Computing relative scores this way ensures that they can range from 0.0 (all items of a test part answered incorrectly) to 1.0 (all items of a test part answered correctly). Other normalizing schemes, such as z-scoring, would have been possible; the qualitative aspects of the results and conclusions would probably remain essentially unchanged.
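A relative score with the stated range can be sketched as below. The formula is an assumption: it takes the relative score to be the share of correctly answered items in a test part, which is consistent with the 0.0 to 1.0 range described but is not confirmed by the text. The item counts in the usage example are hypothetical.

```python
def relative_score(n_correct, n_items):
    """Share of correctly answered items in one test part (assumed definition).

    Returns 0.0 if no item was answered correctly and 1.0 if all were.
    """
    if n_items <= 0:
        raise ValueError("a test part needs at least one item")
    if not 0 <= n_correct <= n_items:
        raise ValueError("n_correct must lie between 0 and n_items")
    return n_correct / n_items


# Hypothetical applicant: (correct answers, number of items) per test part.
applicant = {"biology": (30, 40), "physics": (9, 20), "sjt": (12, 15)}
relative = {part: relative_score(c, n) for part, (c, n) in applicant.items()}
```

Scaling by the number of items makes test parts of different lengths directly comparable, which is the purpose the text gives for the relative scores.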
Basic statistical analyses of these relative scores are performed using the usual descriptive statistical techniques as well as correlation analysis. Performance differences between women and men in the various test parts are analyzed using effect sizes based on comparison of mean values (Cohen’s d) because, owing to the large number of observations, even very small differences in mean values become statistically significant in terms of the commonly used P-values. The absolute values of Cohen’s d are interpreted here as follows: |d| ≤ 0.2 indicates a weak effect, 0.2 < |d| ≤ 0.5 a moderate effect, and |d| > 0.5 a strong effect.
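The effect-size calculation can be illustrated with a short sketch. It assumes the pooled-standard-deviation form of Cohen's d and a large-sample normal approximation for the 95% confidence interval; the text does not say which interval method was used in Stata, so the interval function is a stand-in. The thresholds in `interpret` are the ones stated above, and all function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation.

    A positive value means group_a has the higher mean.
    """
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * variance(group_a) + (nb - 1) * variance(group_b)
    ) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def approx_ci95(d, na, nb):
    """Approximate 95% CI for d (normal approximation, an assumption here)."""
    se = sqrt((na + nb) / (na * nb) + d * d / (2 * (na + nb)))
    return d - 1.96 * se, d + 1.96 * se


def interpret(d):
    """Label an effect size using the thresholds given in the text."""
    size = abs(d)
    if size <= 0.2:
        return "weak"
    if size <= 0.5:
        return "moderate"
    return "strong"
```

With men passed as `group_a` and women as `group_b`, a negative d corresponds to better performance by women. An interval that contains zero indicates an effect not significantly different from zero.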
The associations between the relative scores achieved in the various test parts were assessed by computing pairwise linear correlation coefficients between all test parts and visualized by bivariate scatterplots.
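The pairwise correlation step can be sketched as follows, assuming Pearson's coefficient as the "linear correlation coefficient" and one list of relative scores per test part, with applicants in the same order in every list. The data layout and function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pairwise_correlations(scores):
    """Correlate every pair of test parts.

    scores: dict mapping a test part name to its list of relative scores.
    Returns a dict keyed by (part_a, part_b) tuples.
    """
    return {
        (a, b): pearson(scores[a], scores[b])
        for a, b in combinations(scores, 2)
    }
```

For the six test parts of the admission test this yields 15 coefficients per year, one for each pair of parts.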
All statistical analyses are performed using STATA 13 software (StataCorp. LP, College Station, TX, USA).
The authors gathered anonymized data from a data set that is routinely collected about medical students’ admission, dropout, and graduation dates and examination history, as required by the Austrian Federal Ministry of Science and Research. Because the data were anonymous and no data beyond those required by law were collected for this study, the Medical University of Graz’s ethical approval committee did not require approval for this study.
Results and discussion
For the academic years 2010/11 to 2012/13, Table 1 shows basic data on the admission tests at the Medical University of Graz. As already described in an earlier publication, there are consistently more women than men among the applicants. This largely corresponds with the data reported on admission processes in Europe. Tiffin et al., for example, describe that in the UK women are over-represented in medical school intakes relative to the UK population. In contrast, the data from North America indicate a decrease in female applicants.
Table 2 shows the relative scores obtained by women and men in the different test parts as well as the effect size of sex. As can be seen from the mean values of the relative scores, among the natural science parts, physics is the most difficult test part (with the smallest relative scores), while biology, chemistry and mathematics present similar difficulties to the applicants. Men perform considerably better in physics and mathematics, a result that is confirmed by all public medical universities in Austria [27,28] and discussed internationally, e.g., for physics and biology [2,25,29]. In the literature, stereotyping, different risk behavior in men and women, the time factor, and test anxiety, among other things, are listed as reasons for the gender gap in high-stakes tests [24,29]. While in text comprehension men still perform slightly better than women, the reverse is true in the SJT; here the negative values of Cohen’s d indicate consistently better performance of women, with weak to moderate effect sizes. The 95% confidence intervals of Cohen’s d show that the observed effect sizes are significantly different from zero in all cases, with the single exception of text comprehension in 2010/11; here, the confidence interval contains zero.
Indices of discrimination of the test parts
Mean item discrimination indices of the test parts, grouped per year of admission test
Pairwise linear correlation coefficients between relative scores on the various test parts, sorted by year of admission test
a) Admission test 2010 (N = 1353)
b) Admission test 2011 (N = 1702)
c) Admission test 2012 (N = 1686)
Perceptions of the admission examination
Including the SJT in an admission procedure for medical studies that was previously based almost exclusively on scientific knowledge proved to be organizationally feasible in the manner presented. Moreover, the subjective responses of the applicants were quite positive, probably because of the perceived relevance of the SJT for their future studies and profession. The lack of significant correlations between the other test parts and the SJT indicated that the spectrum of competencies tested was indeed broadened by inclusion of the SJT, a result that seemed highly desirable in view of the overwhelming contribution of natural science knowledge to the admission test in the past.
1. Emery JL, Bell JF, Vidal Rodeiro CL. The BioMedical admissions test for medical student selection: issues of fairness and bias. Med Teach. 2011;33(1):62–71.
2. Cuddy MM, Swanson DB, Clauser BE. A multilevel analysis of examinee gender and USMLE step 1 performance. Acad Med. 2008;83(10 Suppl):S58–62.
3. Hurwitz S, Kelly B, Powis D, Smyth R, Lewin T. The desirable qualities of future doctors-A study of medical student perceptions. Med Teacher. 2013;(0):e1-8.
4. Lumsden MA, Bore M, Millar K, Jack R, Powis D. Assessment of personal qualities in relation to admission to medical school. Med Educ. 2005;39(3):258–65.
5. Albanese MA, Snow MH, Skochelak SE, Huggett KN, Farrell PM. Assessing personal qualities in medical school admissions. Acad Med. 2003;78(3):313–21.
6. Shulruf B, Poole P, Wang GY, Rudland J, Wilkinson T. How well do selection tools predict performance later in a medical programme? Adv Health Sci Educ Theory Pract. 2012;17(5):615–26. doi:10.1007/s10459-011-9324-1.
7. McGaghie WC. Assessing readiness for medical education: evolution of the medical college admission test. JAMA. 2002;288(9):1085–90. http://dx.doi.org/10.1001/jama.288.9.1085.
8. Wilson IG, Roberts C, Flynn EM, Griffin B. Only the best: medical student selection in Australia. Med J Aust. 2012;196(5):357.
9. Lievens F. Adjusting medical school admission: assessing interpersonal skills using situational judgement tests. Med Educ. 2013;47(2):182–9. doi:10.1111/medu.12089.
10. Oates K, Goulston K. How to select the doctors of the future. Intern Med J. 2012;42(4):364–9. doi:10.1111/j.1445-5994.2012.02729.x.
11. Siu E, Reiter HI. Overview: what’s worked and what hasn’t as a guide towards predictive admissions tool development. Adv Health Sci Educ. 2009;14(5):759–75.
12. Prideaux D, Roberts C, Eva K, Centeno A, McCrorie P, McManus C, et al. Assessment for selection for the health care professions and specialty training: consensus statement and recommendations from the Ottawa 2010 conference. Med Teach. 2011;33(3):215–23. http://informahealthcare.com/doi/abs/10.3109/0142159X.2011.551560.
13. McDaniel MA, Morgeson FP, Finnegan EB, Campion MA, Braverman EP. Use of situational judgment tests to predict job performance: a clarification of the literature. J Appl Psychol. 2001;86(4):730.
14. Cabrera MAM, Nguyen NT. Situational judgment tests: a review of practice and constructs assessed. Int J Select Assess. 2001;9(1–2):103–13. doi:10.1111/1468-2389.00167.
15. McDaniel MA, Hartman NS, Whetzel DL, Grubb WL. Situational judgment tests, response instructions, and validity: a meta-analysis. Pers Psychol. 2007;60(1):63–91. doi:10.1111/j.1744-6570.2007.00065.x.
16. O’Connell MS, Hartman NS, McDaniel MA, Grubb WL, Lawrence A. Incremental validity of situational judgment tests for task and contextual job performance. Int J Sel Assess. 2007;15(1):19–29.
17. Whetzel DL, McDaniel MA, Nguyen NT. Subgroup differences in situational judgment test performance: a meta-analysis. Hum Perform. 2008;21(3):291–309.
18. Cleland J, Dowell J, McLachlan J, Nicholson S, Patterson F. Research report identifying best practice in the selection of medical students (literature review and interview survey). 2012.
19. Reibnegger G, Caluba HC, Ithaler D, Manhal S, Neges HM, Smolle J. Progress of medical students after open admission or admission based on knowledge tests. Med Educ. 2010;44(2):205–14.
20. Sinha R, Oswald F, Imus A, Schmitt N. Criterion-focused approach to reducing adverse impact in college admissions. Appl Meas Educ. 2011;24(2):137–61.
21. Lievens F, Sackett PR. The validity of interpersonal skills assessment via situational judgment tests for predicting academic success and job performance. J Appl Psychol. 2012;97(2):460–8.
22. Bergman ME, Drasgow F, Donovan MA, Henning JB, Juraska SE. Scoring situational judgment tests: once you get the data, your troubles begin. Int J Sel Assess. 2006;14(3):223–35.
23. Lievens F, Sackett PR. Situational judgment tests in high-stakes settings: issues and strategies with generating alternate forms. J Appl Psychol. 2007;92(4):1043–55. doi:10.1037/0021-9010.92.4.1043.
24. Habersack M, Dimai HP, Ithaler D, Reibnegger G. Time: an underestimated variable in minimizing the gender gap in medical college admission scores. Wiener klinische Wochenschrift. 2014. doi:10.1007/s00508-014-0649-7.
25. Tiffin PA, Dowell JS, McLachlan JC. Widening access to UK medical education for under-represented socioeconomic groups: modelling the impact of the UKCAT in the 2009 cohort. BMJ. 2012;344:e1805. http://dx.doi.org/10.1136/bmj.e1805.
26. Grbic D, Brewer RL. Which factors predict the likelihood of reapplying to medical school? An analysis by gender. Acad Med. 2012;87(4):449–57.
27. Kraft HG, Lamina C, Kluckner T, Wild C, Prodinger WM. Paradise lost or paradise regained? Changes in admission system affect academic performance and drop-out rates of medical students. Med Teacher. 2012;e1-7.
28. Statistische Berichte zum EMS in Innsbruck und Wien [database on the Internet]. Medizinische Universität Wien. 2011. Available from: http://www.unifr.ch/ztd/ems/doc/Bericht_EMSAT11.pdf. Accessed.
29. Fields HW, Fields AM, Beck FM. The impact of gender on high-stakes dental evaluations. J Dent Educ. 2003;67(6):654–60.
30. Hänsgen K, Spicher B. EMS. 2006.
31. Marentette BJ, Meyers LS, Hurtz GM, Kuang DC. Order effects on situational judgment test items: a case of construct-irrelevant difficulty. Int J Sel Assess. 2012;20(3):319–32. doi:10.1111/j.1468-2389.2012.00603.x.
32. Oswald FL, Schmitt N, Kim BH, Ramsay LJ, Gillespie MA. Developing a biodata measure and situational judgment inventory as predictors of college student performance. J Appl Psychol. 2004;89(2):187.
33. Lievens F, Sackett PR. Video-based versus written situational judgment tests: a comparison in terms of predictive validity. J Appl Psychol. 2006;91(5):1181.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.