Participants and procedures
The data for the current study were obtained from previous research on the relationships among students’ engagement, burnout, and other related variables. A convenience sampling method was used. Seven Japanese universities and five colleges were invited to participate. Of the 3401 students contacted, 3280 returned the questionnaires. The participating university students were majoring in medical science, nursing, and natural science. One university and two colleges were national institutions; six universities and three colleges were private. Students were informed orally and in writing, on the front page of the questionnaire, that participation was voluntary and that refusing to participate would not negatively affect them. All students were assured that their responses were anonymous. Passive consent was used: participants were informed that by submitting their questionnaire, they were consenting to participate in the study. Participants were instructed to complete the questionnaire before or after class; it included a cover sheet asking for their age, sex, grade, and other relevant information. Cases containing no answers were excluded, leaving a total of 3214 participants. This research was reviewed and approved by the ethics committee of Chubu University.
Measures
The UWES-S [11] was used in the current study (Additional file 1: Appendix). The 14-item version of the UWES-S was selected over the newer 17-item version [7] because it has been more widely used internationally. Moreover, the preliminary manual of the UWES-S [7] states that the default three-factor model of the 17-item version did not fit the data well (N = 572, Chi square = 59.99, CFI = 0.85, RMSEA = 0.08), and the scale was ultimately reduced to an 11-item version (N = 572, Chi square = 92.75, CFI = 0.95, RMSEA = 0.07). The 11-item version is similar to the 14-item version used herein. The original 14-item scale has three factors: vigor (5 items; e.g., “When I’m studying, I feel mentally strong”), dedication (4 items; e.g., “I find my studies to be full of meaning and purpose”), and absorption (5 items; e.g., “Time flies when I’m studying”) [8]. All items are scored on a 7-point Likert scale ranging from 0 = “never” to 6 = “always”.
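As a purely illustrative sketch (the item-to-subscale assignment below is hypothetical, and the response columns are assumed to be named uwes1–uwes14), the three subscale scores could be computed in R as follows:

# Hypothetical item-to-subscale mapping; replace with the actual key for the
# 14-item UWES-S before use.
vigor_items      <- paste0("uwes", 1:5)     # 5 vigor items (assumed positions)
dedication_items <- paste0("uwes", 6:9)     # 4 dedication items (assumed positions)
absorption_items <- paste0("uwes", 10:14)   # 5 absorption items (assumed positions)

score_uwes <- function(dat) {
  data.frame(
    vigor      = rowMeans(dat[, vigor_items],      na.rm = TRUE),
    dedication = rowMeans(dat[, dedication_items], na.rm = TRUE),
    absorption = rowMeans(dat[, absorption_items], na.rm = TRUE)
  )
}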
The Japanese version of the UWES-S used in this study was constructed using the back-translation technique.
Analysis
A computer program randomly divided the sample into two groups. Sample 1 included 1607 participants (male = 922, female = 618, gender unknown for the remainder; mean age = 19.85 years, SD = 2.12); sample 2 included 1607 participants (male = 930, female = 671, gender unknown for the remainder; mean age = 19.79 years, SD = 2.11).
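A minimal sketch of such a random split in R, assuming the pooled responses are stored in a data frame named uwes_data (the seed is arbitrary), is:

set.seed(2016)                                  # arbitrary seed for reproducibility
n       <- nrow(uwes_data)                      # total number of included participants (3214 here)
idx     <- sample(seq_len(n), size = n %/% 2)   # indices for the first half
sample1 <- uwes_data[idx, ]                     # used for the preliminary (first-stage) analyses
sample2 <- uwes_data[-idx, ]                    # used to re-estimate the parameters at the second stage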
First, sample 1 was used for a preliminary analysis to examine whether item response theory analysis could be applied to the UWES-S. At this stage, polyserial correlation coefficients were calculated, and any item with a coefficient below 0.20 was removed. Next, confirmatory factor analyses were conducted for the default three-factor structure and for a one-factor structure. The default three-factor model was the same as the model presented by Schaufeli et al. [8]. In the one-factor model, a single latent factor influenced all observed variables. If the one-factor model fit the data well, item response theory analysis could be conducted directly, because such a structure is conventionally regarded as unidimensional, without the need to model additional local factors. However, in a study by Wefald and Downey, the one-factor, two-factor, and default three-factor models did not demonstrate satisfactory fit to the data, and the authors suggested that the one-factor model was the most parsimonious of the three [12]. Therefore, we implemented a bi-factor exploratory factor analysis as the second-best option. In this model, one additional latent factor, called a general factor, was specified, and the local factors were explored from the data. The general factor influences all observed variables but is not correlated with the local factors that also influence the observed variables. If a factor structure including the general factor was confirmed, item response theory analysis could be conducted. Item response theory analysis was conducted using Samejima’s graded response model [13], a two-parameter model. At the end of this stage, items showing outlying values on either parameter were removed from subsequent analyses.
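A condensed, hedged sketch of this first stage in R is given below. The item column names (uwes1–uwes14), the item-to-factor assignments in the lavaan syntax, and the use of the item–total polyserial correlation as the screening criterion are assumptions for illustration, and psych::omega is used here as a stand-in for the bi-factor exploratory factor analysis:

library(polycor)  # polyserial correlations
library(lavaan)   # confirmatory factor analyses
library(psych)    # general-plus-local-factor (bi-factor type) solution
library(ltm)      # graded response model

items <- sample1[, paste0("uwes", 1:14)]               # assumed item columns
total <- rowSums(items)

# Item-total polyserial correlations; items below .20 would be dropped
ps <- sapply(items, function(x) polycor::polyserial(total, x))
items <- items[, ps >= 0.20]

# Confirmatory factor analyses: default three-factor vs. one-factor model
# (item-to-factor assignments below are placeholders, not the published key)
model3 <- "vigor      =~ uwes1 + uwes2 + uwes3 + uwes4 + uwes5
           dedication =~ uwes6 + uwes7 + uwes8 + uwes9
           absorption =~ uwes10 + uwes11 + uwes12 + uwes13 + uwes14"
model1 <- paste("general =~", paste(colnames(items), collapse = " + "))
fit3 <- lavaan::cfa(model3, data = sample1)
fit1 <- lavaan::cfa(model1, data = sample1)

# General factor plus local (group) factors, explored from the data
bifactor_fit <- psych::omega(items, nfactors = 3)

# Samejima's graded response model; discrimination and difficulty parameters
grm_fit <- ltm::grm(items)
coef(grm_fit)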
At the second analytical stage, the items retained from the first stage were examined using sample 2. The parameters were estimated again, and the test information curve was used to examine the amount of information the scale provides across the latent trait, in order to characterize the scale. The test information function, I(θ), is defined as follows:
$$I(\theta) = D^{2} \sum_{j = 1}^{n} a_{j}^{2} P_{j}(\theta) Q_{j}(\theta)$$
where θ is the latent trait measured by the scale (i.e., the ability parameter), a_j is the discrimination parameter of item j, P_j(θ) is the item characteristic function, Q_j(θ) = 1 − P_j(θ), and D is the scaling constant, set to 1.7. P_j(θ) is calculated as follows:
$$P_{j}(\theta) = \left\{ 1 + \exp\left[ -1.7 a_{j} (\theta - b_{j}) \right] \right\}^{-1}$$
where b_j is the difficulty parameter of item j.
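For illustration only, the two functions above can be implemented directly in R; the discrimination and difficulty values below are arbitrary placeholders, not estimates from this study:

# Direct implementation of the formulas above (dichotomous two-parameter form);
# parameter values are hypothetical, not estimates from the study.
D <- 1.7
p_j <- function(theta, a, b) 1 / (1 + exp(-D * a * (theta - b)))   # item characteristic function
test_information <- function(theta, a, b) {
  # I(theta) = D^2 * sum_j a_j^2 * P_j(theta) * Q_j(theta)
  sapply(theta, function(t) {
    p <- p_j(t, a, b)
    sum(D^2 * a^2 * p * (1 - p))
  })
}

theta <- seq(-4, 4, by = 0.1)
a <- c(1.2, 0.9, 1.5)      # hypothetical discrimination parameters
b <- c(-0.5, 0.0, 0.8)     # hypothetical difficulty parameters
plot(theta, test_information(theta, a, b), type = "l",
     xlab = expression(theta), ylab = "Test information")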
All estimation was performed using maximum likelihood estimation. To evaluate the fit of each model to the data, we adopted the following indices: (1) the Chi square statistic, (2) the comparative fit index (CFI) [14], (3) the Tucker–Lewis index (TLI) [15], (4) the root mean square error of approximation (RMSEA) [16], and (5) the Akaike information criterion (AIC) [17]. Previous studies indicate that CFI and TLI values greater than 0.90 reflect acceptable model-data fit [8, 9]. For the RMSEA, values less than 0.08 indicate a satisfactory fit, whereas values greater than 0.10 signify that the model should be rejected [8]. Internal reliability was evaluated with McDonald’s omega coefficient [18].
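Under the assumed object names from the earlier sketch, these fit indices can be extracted from a fitted lavaan model and McDonald’s omega obtained from the psych package; this is a sketch, not the exact code used in the study:

# Chi square, CFI, TLI, RMSEA, and AIC for a fitted lavaan model (fit3 assumed above)
lavaan::fitMeasures(fit3, c("chisq", "df", "cfi", "tli", "rmsea", "aic"))

# McDonald's omega total for internal reliability (psych package)
psych::omega(items, nfactors = 3)$omega.tot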
Difficulty parameters with absolute values greater than 6.8 and discrimination parameters outside the range of 0.34–3.4 were regarded as outlying values. Analyses were run using the statistical software R, version 3.3.0. The packages used for the analyses were “psych,” “lavaan,” “ltm,” and “polycor,” together with their related packages. Note that the logistic model in the “ltm” package defines the scaling constant D as 1.0.
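Because the formulas above set D to 1.7 while the “ltm” logistic model sets it to 1.0, discrimination estimates from “ltm” can be divided by 1.7 to place them on the metric of the formulas above. A small, hedged sketch of this rescaling and of the outlier screening, using hypothetical values in place of the estimates (which in practice would be taken from coef() on the fitted graded response model), is:

# Hypothetical parameter values for illustration; in practice these would be the
# estimates from the fitted graded response model (e.g., coef(grm_fit)).
a_logistic <- c(2.1, 0.5, 4.0)     # discrimination on the D = 1.0 (logistic) metric
b_values   <- c(-1.3, 7.2, 0.4)    # difficulty (threshold) values

# Rescale discrimination to the D = 1.7 metric used in the formulas above
a_rescaled <- a_logistic / 1.7

# Flag values outside the ranges described in the text
flag_difficulty     <- abs(b_values) > 6.8
flag_discrimination <- a_rescaled < 0.34 | a_rescaled > 3.4
which(flag_difficulty | flag_discrimination)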