Methods
Study participants
Study participants were recruited from a group of University of Ottawa medical students who volunteered to participate in a mock OSCE (described below). Fourth-year medical students were recruited as examiners, third-year students as examinees, and first- and second-year students as standardized patients (SPs). The same examiners remained throughout all three iterations of the mock OSCE. The examinees and SPs each took part in only one iteration. The study was approved by the Ottawa Health Science Network Research Ethics Board. All study participants provided informed consent. Subject identification numbers were assigned to anonymize the data. Data collected from non-consenting students were discarded and not included in the analysis.
Study OSCE
The mock OSCE was held at the University of Ottawa medical school and consisted of 5 stations which tested history-taking, physical examination, counselling, and management skills. Cases were based on the specialties represented in the Medical Council of Canada Qualifying Examination (MCCQE) Part II, a high-stakes licensure examination. Each station provided 1 min for students to read the prompt, 7 min to complete the station, and 2 min of feedback from the examiner, totaling 10 min. Cases were written by several medical students and were revised by a faculty member (KK). Peer examiners attended a training session prior to the mock OSCE.
Measures
Peer-assessment
Fourth-year examiners rated examinees using a station-specific score sheet consisting of a checklist and a 6-point Likert-type global rating scale (GRS), where 1 = inferior and 6 = excellent. The latter was used as a measure of peer-assessment (PA).
Self-assessment
To measure self-assessment (SA), examinees were prompted to rate their own performance on a GRS prior to receiving feedback in each station.
Data analysis
Two mixed-measures analyses of variance (ANOVAs) were used to examine the influence of gender (2 levels: male vs. female), assessment format (2 levels: self vs. peer assessment), and station (5 levels: 5 OSCE stations). Gender served as a between-subjects factor, while assessment format and station were within-subjects factors. The dependent measures were the mean GRS score and the mean checklist score. Post-hoc analyses involved t-tests, all corrected for multiple comparisons using Bonferroni corrections.
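The post-hoc step can be illustrated with a minimal sketch, assuming paired comparisons of SA and PA scores within a group and a plain Bonferroni adjustment (each p-value multiplied by the number of comparisons and capped at 1). The function names and the score lists are hypothetical and are not the study's data or analysis code; the sketch computes only the t statistic and leaves the p-value lookup to a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(a, b):
    """Return (t, df) for a paired t-test on two equal-length score lists."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # t = mean difference divided by its standard error
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


def bonferroni(p_values):
    """Bonferroni-adjust p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical GRS ratings (1-6 scale) for six examinees, for illustration only.
sa_scores = [4, 3, 4, 5, 3, 4]  # self-assessment
pa_scores = [5, 4, 5, 5, 4, 5]  # peer-assessment

t_stat, df = paired_t(sa_scores, pa_scores)

# Hypothetical unadjusted p-values from three post-hoc comparisons.
adjusted = bonferroni([0.01, 0.04, 0.5])
```

A negative `t_stat` here means that self-ratings were lower than peer ratings in the illustrative data. Bonferroni is conservative: a comparison stays significant only if its unadjusted p-value is below the nominal alpha divided by the number of comparisons.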
Results
Thirty-three (15 males, 18 females) third-year students were included in the analysis. Participants scored themselves lower than their peer examiners scored them [F (1, 31) = 21.04, p < 0.001, \( \eta_{P}^{2} \) = 0.404]. Furthermore, females marked themselves lower than males did [F (1, 31) = 9.24, p = 0.005, \( \eta_{P}^{2} \) = 0.230]. The linear model did not show any significant differences in SA-GRS and PA-GRS between stations [F (1, 31) = 0.24, p = 0.887, \( \eta_{P}^{2} \) = 0.001] and did not show any combined interactions between gender, station type, and SA and PA [F (1, 31) = 0.24, p = 0.887, \( \eta_{P}^{2} \) = 0.001].
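For readers who want to relate the reported effect sizes to the F statistics, partial eta squared can be recovered from F and its degrees of freedom using the standard identity \( \eta_{P}^{2} = (F \cdot df_{effect}) / (F \cdot df_{effect} + df_{error}) \). The sketch below applies it to the two main effects reported above; the function name is illustrative, and this is a check of the arithmetic rather than the authors' analysis code.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of assessment format: F(1, 31) = 21.04
format_effect = round(partial_eta_squared(21.04, 1, 31), 3)  # 0.404

# Main effect of gender: F(1, 31) = 9.24
gender_effect = round(partial_eta_squared(9.24, 1, 31), 3)  # 0.23
```

Both values agree with the effect sizes reported for these two main effects (0.404 and 0.230).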
As outlined in Fig. 1, females had significantly lower SA-GRS scores compared to PA-GRS scores (3.88 vs. 4.67; p < 0.001, d = 1.18), whereas no significant difference was found between SA-GRS and PA-GRS scores for male examinees (4.64 vs. 4.80; p = 0.228, d = 0.32). No significant difference existed between male and female students in the achieved checklist scores (60.32 vs. 56.27; p = 0.828) or GRS scores (4.80 vs. 4.67; p = 0.452).
Discussion
Our study demonstrates that underestimation among females is observable even in a low-stakes setting. Notably, despite the disparity in self-assessment between genders, their overall achievement in the mock OSCE did not differ, corroborating the data in the current literature [6]. Our findings, in conjunction with previous research, are noteworthy for several reasons. Firstly, the presence of female underestimation in a low-stakes setting suggests the potential existence of systemic phenomena within medical school that affect mediators such as self-confidence and anxiety among female students. Colbert-Getz et al. [7] found that high anxiety in a high-stakes OSCE contributed to underestimation in performance among female medical students. Even within this low-stakes setting, anxiety may persist due to the pressure from being assessed by fellow medical students [8] or the perceived novelty of the stations. Secondly, these results suggest that similar performance outcomes between male and female students may not necessarily equate to similar perceptions of performance due to variations in anxiety, confidence, and/or self-efficacy [5,6,7]. Thirdly, socialization within the medical profession may affect male and female trainees differently, potentially contributing to the observed difference in self-assessment [9]. Prior research suggests that female medical professionals are more likely to have personal values that are incongruent with institutional values of academic medicine compared to their male counterparts, leading to a reduction in self-confidence and self-efficacy [10]. Whether differences in self-assessment are inherent or acquired upon entry into medical school would be an interesting area of future research.
Curricula should thus move towards recognizing and addressing differences in performance perceptions between genders and promote a more equitable learning experience. A combination of vicarious and personal learning experiences that facilitate the identification of knowledge gaps could help students more accurately appraise their own performance [7].