We invited all 97 of this year's graduates to participate in the interns' award. These potential participants had completed medical school and begun a year-long internship rotation at different hospitals. All interns who had finished their medical programme at RCSI Bahrain and started their first-year internship rotation were contacted by email and informed of the award competition, its purpose, requirements, and selection criteria, together with a guide for implementing the MSF process. Each intern also received a nomination form consisting of three tables, in which eight colleagues were to be nominated from each of three categories: interns, chief residents/consultants, and co-workers/nurses. In addition to the evaluation forms completed by these three categories of raters, each intern was expected to complete a self-evaluation form.
The nomination form collected details about the raters, including position, job title, department, and email address. Only sixteen interns applied for the award, and each submitted a list of nominees to an independent administrative team at RCSI Bahrain. The administrative team sent the evaluation forms to the raters and asked them to complete and return them. Each evaluator was given one month to complete the forms and return them by email or in person to the administrative team at RCSI Bahrain. After this period, raters who had not submitted their forms were sent a second email by the administrative team as a reminder. The independent administrative team was responsible for distributing the instruments electronically, collecting them, anonymizing the forms using a number code for each intern, and entering all data into a Microsoft Excel worksheet.
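For illustration, the coding step can be sketched as follows; the column names, data, and output file are hypothetical, and the study itself entered data into a Microsoft Excel worksheet rather than using a script:

```python
import pandas as pd

# Hypothetical returned evaluation forms; names and columns are placeholders.
forms = pd.DataFrame({
    "intern_name": ["Intern A", "Intern B", "Intern A"],
    "rater_category": ["nurse", "physician", "intern"],
    "item_1": [4, 5, 3],
})

# Assign a number code to each intern and drop identifying information,
# mirroring the coding performed by the administrative team.
codes = {name: i + 1 for i, name in enumerate(sorted(forms["intern_name"].unique()))}
forms["intern_code"] = forms["intern_name"].map(codes)
anonymized = forms.drop(columns=["intern_name"])

# A CSV file stands in here for the Excel worksheet used in the study.
anonymized.to_csv("bdf_pcc_anonymized.csv", index=False)
```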
Three groups of people rated the candidates: nurses, physicians, and student colleagues. Each candidate selected eight individuals from each of these categories, and the researchers randomly selected five of the eight, so that each candidate was rated by five members of each of the three groups.
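The random selection of raters can be illustrated with a minimal sketch; the nominee names below are placeholders:

```python
import random

# Hypothetical nominee lists for one candidate (eight per category).
nominees = {
    "interns": [f"intern_{i}" for i in range(1, 9)],
    "physicians": [f"physician_{i}" for i in range(1, 9)],
    "nurses": [f"nurse_{i}" for i in range(1, 9)],
}

# Randomly select five of the eight nominees in each category,
# as was done for every candidate.
selected = {category: random.sample(names, 5) for category, names in nominees.items()}
print(selected)
```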
Instrument
This study used the Bahrain Defence Force instrument for professionalism, communication, and collaboration (BDF/PCC). It was developed by drawing on several sources: the physician achievement review instrument (PAR) [12, 13], the Maastricht history-taking and advice scoring list (MAAS-Global) [14], the Calgary-Cambridge guide for assessing communication skills [15], the Sheffield peer review assessment tool (SPRAT) [1], the assessment of interprofessional team collaboration scale (AITCS) [16], and the opinions of specialists. The instrument focuses on the evaluation of professionalism, collaboration, and communication skills.
The face and content validity of the BDF/PCC instrument were established in previous studies [11]. It included 39 items: 15 measuring professionalism, 13 measuring communication skills, and 11 measuring collaboration. It was designed so that different groups of raters, such as interns, consultants, senior medical colleagues, and co-workers, could all use it. Items were rated on a 5-point scale: (1) “among the worst”; (2) “bottom half”; (3) “average”; (4) “top half”; and (5) “among the best”. There was also an “unable to assess” (UA) response option.
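The structure described above can be summarised in a short sketch; item wording is not reproduced, and only the counts and response options reported in the text are encoded:

```python
# Minimal representation of the BDF/PCC structure (counts follow the text).
SUBSCALES = {
    "professionalism": 15,
    "communication": 13,
    "collaboration": 11,
}
assert sum(SUBSCALES.values()) == 39

RESPONSE_SCALE = {
    1: "among the worst",
    2: "bottom half",
    3: "average",
    4: "top half",
    5: "among the best",
    "UA": "unable to assess",
}

def score_item(response):
    """Return a numeric score, or None for 'unable to assess' responses."""
    return response if isinstance(response, int) else None
```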
Statistical analysis
Several statistical analyses were used to answer the research questions. The mean and standard deviation of the total responses were calculated for each participant to determine who scored highest. The feasibility of the BDF/PCC instrument was assessed using the response rate and the number of responders needed to obtain reliable results [1, 13].
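As an illustration, the per-intern summary and the response rate might be computed as follows; all data and counts below are hypothetical:

```python
import pandas as pd

# Hypothetical ratings: each row is one completed form, with the intern's
# code and the mean item score given on that form.
ratings = pd.DataFrame({
    "intern_code": [1, 1, 1, 2, 2],
    "form_score": [4.2, 4.6, 3.9, 4.8, 4.5],
})

# Mean and standard deviation of the responses received by each intern.
summary = ratings.groupby("intern_code")["form_score"].agg(["mean", "std"])
print(summary.sort_values("mean", ascending=False))

# Response rate: completed forms out of forms distributed (hypothetical counts).
forms_sent, forms_returned = 240, 212
print(f"response rate = {forms_returned / forms_sent:.1%}")
```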
Exploratory factor analysis was used to identify appropriate groupings of items on the survey. Each item was assigned to a factor on which it had a loading of 0.40 or greater; if an item was cross-loaded (that is, loaded on two or more factors), it was assigned to the factor on which it loaded most highly. The number of factors to extract was determined by the Kaiser rule (eigenvalues > 1.0) [17].
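A minimal sketch of this assignment rule is given below; the data are simulated, the factor solution is unrotated, and the study's actual extraction settings may differ:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
# Simulated item responses: 80 completed forms x 39 items on a 1-5 scale.
X = rng.integers(1, 6, size=(80, 39)).astype(float)

# Kaiser rule: retain factors whose eigenvalues of the correlation matrix exceed 1.0.
eigenvalues = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))
n_factors = int(np.sum(eigenvalues > 1.0))

# Fit an (unrotated) factor analysis and take the item-by-factor loading matrix.
loadings = FactorAnalysis(n_components=n_factors, random_state=0).fit(X).components_.T

# Assign each item to the factor with the highest absolute loading,
# provided that loading is at least 0.40.
assignments = {}
for item, row in enumerate(loadings):
    best = int(np.argmax(np.abs(row)))
    if abs(row[best]) >= 0.40:
        assignments[item] = best
print(n_factors, assignments)
```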
To assess the homogeneity of each composite scale, we calculated item-total correlations, corrected for overlap [18]. An item was considered to measure the same construct as the other items in its composite scale if its corrected item-total correlation was 0.3 or higher. Pearson's correlation coefficient was used to estimate inter-scale correlations and quantify the overlap between scales [19].
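The corrected item-total correlation and the inter-scale correlation can be sketched as follows, using simulated responses:

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulated responses to one composite scale (e.g. 15 professionalism items).
scale = rng.integers(1, 6, size=(80, 15)).astype(float)

# Corrected item-total correlation: correlate each item with the total of the
# remaining items, so the item does not inflate its own correlation.
for j in range(scale.shape[1]):
    rest_total = scale.sum(axis=1) - scale[:, j]
    r = np.corrcoef(scale[:, j], rest_total)[0, 1]
    print(f"item {j + 1}: corrected item-total r = {r:.2f}")  # flag items below 0.3

# Inter-scale correlation: Pearson's r between two composite scale totals
# (the second scale is again simulated).
other_scale = rng.integers(1, 6, size=(80, 13)).astype(float)
r_scales = np.corrcoef(scale.sum(axis=1), other_scale.sum(axis=1))[0, 1]
print(f"inter-scale Pearson r = {r_scales:.2f}")
```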
Internal consistency reliability was evaluated using Cronbach's alpha coefficient, a common measure of internal consistency, calculated for each factor and for each scale individually [18]. A generalizability analysis was then conducted to estimate Ep2 and to confirm that enough items and enough evaluators were included to yield stable and accurate data for every intern. Previous studies indicated that an Ep2 of 0.70 or higher reflects stable data; otherwise, more items or more responders are needed to achieve adequate stability [11, 20].
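For reference, Cronbach's alpha and a single-facet (persons x raters) generalizability coefficient can be computed as in the sketch below; the matrices are simulated, and the fully crossed design is an assumption made here for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
# Simulated data: one composite scale (80 forms x 15 items) for alpha,
# and a fully crossed interns x raters matrix (16 x 5) for the G study.
items = rng.normal(loc=4.0, scale=0.6, size=(80, 15))
ratings = rng.normal(loc=4.0, scale=0.4, size=(16, 5))

def cronbach_alpha(x):
    """Cronbach's alpha for an (observations x items) matrix."""
    k = x.shape[1]
    return k / (k - 1) * (1 - x.var(axis=0, ddof=1).sum() / x.sum(axis=1).var(ddof=1))

def g_coefficient(x):
    """Relative generalizability coefficient (Ep2) for a single-facet
    persons x raters crossed design, via ANOVA variance components."""
    n_p, n_r = x.shape
    grand = x.mean()
    ss_p = n_r * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_r = n_p * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_res = ((x - grand) ** 2).sum() - ss_p - ss_r
    ms_p = ss_p / (n_p - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))
    var_p = max((ms_p - ms_res) / n_r, 0.0)  # person (intern) variance component
    return var_p / (var_p + ms_res / n_r)

print(f"alpha = {cronbach_alpha(items):.2f}")
print(f"Ep2   = {g_coefficient(ratings):.2f}")
```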
Responders
The responders for this study were organized into three groups: nurses, physicians, and fellow students. To be eligible as a responder, an individual needed to have worked alongside the graduate for at least one to two months. Participants were asked to select eight individuals from each category, and the investigators randomly chose five of these eight, so that five individuals from each of the three groups rated each intern. The number of observers varied across interns, and this variation determined how many rater responses were available for each intern.