Reliability and minimal detectable change of the ‘Imperial Spine’ marker set for the evaluation of spinal and lower limb kinematics in adults

Objectives As a step towards the comprehensive evaluation of movement in patients with low back pain, the aim of this study is to design a marker set (three rigid segment spine, pelvic and lower limb model) and evaluate the reliability and minimal detectable change (MDC) of this marker set in healthy adults during gait and sit to stand (STS) tasks using three dimensional motion capture.
Results The ‘Imperial Spine’ marker set was used to assess relative peak angles during gait and STS tasks using the minimum recommended sample size (n = 10) for reliability studies with a minimum Intraclass Correlation Coefficient (ICC) of 0.70, an optimum ICC of 0.90 and 9 trials replicated per subject per task. Intra- and inter-tester reliability between an experienced and an inexperienced user was examined. ICC, mean, standard error of measurement (SEM), Bland Altman 95% limits of agreement (LOA) and MDC were computed. ICC values demonstrated excellent intra- and inter-tester reliability in both tasks, particularly in the sagittal plane (majority of ICCs > 0.80). SEM values were lower in gait (0.8–5.5°) than in STS tasks (1–12.6°), as were MDC values. LOA demonstrated good agreement. The ‘Imperial Spine’ marker set is reliable for use in healthy adults during functional tasks. Future evaluation in patients is required.


Introduction
The 'Imperial Spine' marker set was developed to assess spinal and lower limb movement (kinematics) during functional tasks using three dimensional motion capture (3DMC). To date, spinal movement has been examined in both healthy and patient populations using regional lumbar [1][2][3] or multiple spine segments [4][5][6]. Although some studies consider the contribution of the spine, pelvis and lower limbs when assessing spinal movement, few have analysed the absolute measures of measurement error and minimal detectable change (MDC). MDC describes the amount of change that is greater than the measurement error for each joint and plane of movement [7]. This permits kinematic data to be interpreted in a clinically meaningful manner, enabling the assessment of true differences.
Low back pain (LBP) is an extremely common symptom [8] associated with difficulties walking and sitting to standing (STS) [9]. Since current LBP management is, at best, moderately effective [10], it is necessary to consider alternative therapeutic targets. Steps have been taken towards this through the examination of spine and lower limb segment motion in healthy adults using one and two rigid segment spinal models [7,11]. However, spinal models with more than two rigid segments will be required in order to reliably characterise and interpret movement during activities that are important to LBP patients [12].
This preliminary study builds upon previous research through the development of a three rigid segment spine, pelvic and bilateral lower limb marker set, the 'Imperial Spine'. The objective of this study is to establish the reliability and MDC values relating to the 'Imperial Spine' in adults during gait and STS tasks using 3DMC as a step towards evaluation in LBP patients.

Methods
The sample size was determined and study design optimised using recommendations previously described (α = 0.05, β = 0.20) [13]. Healthy adults (4 males, 6 females) were recruited from University staff (mean age 30.8 (25.8-35.8) years, mean body mass index 23.4 (19.0-27.6) kg/m²). Strict criteria ensured that participants had no current or past history of LBP, spine or lower limb extremity trauma, neurological or musculoskeletal history that would affect task performance. Each participant provided written informed consent (REC Ref. 15IC2985).

Reliability testing
Reliability of the 'Imperial Spine' marker set was evaluated using testers with and without prior clinical knowledge: tester 1 (JD) (physiotherapist, 16 years clinical experience) and tester 2 (EP) (biomechanist, no prior clinical experience). Prior to subject testing, each tester completed training including marker set familiarisation (30 min) and practical training (60 min) using standardised written instruction to reduce tester bias.
Each session comprised 5 gait and 5 STS trials, 2 of which involved participant familiarisation. The gait task required unshod participants to walk over a level 6 m walkway at a comfortable, self-selected pace. The STS task required participants to stand up from a backless chair with arms crossed, knees initially flexed to 90° and both feet assuming a 'natural stance position'. Participants followed standardised verbal instruction.
Prior to the first session tester 1 (JD1) applied the marker set to participants using double-sided tape. On completion of the tasks, the marker set was systematically removed by tester 1 using alcohol swabs to remove signs of adhesive. An interval of 45 min was observed to ensure participant rest and to engage tester 1 in unrelated activities to minimise memory bias. During the second session the marker set was then re-applied and removed by tester 2 (EP) as described. Following the same interval, tester 1 repeated this sequence (JD2).
Testers were not permitted to observe each other or communicate during testing and were blinded to all kinematic outputs.

The 'Imperial Spine' marker set and data processing
The 'Imperial Spine' was modelled in three segments according to easily identifiable anatomical landmarks: upper thoracic (T1-T6), lower thoracic (T7-T12) and lumbar (L1-L5). The upper thoracic (UT) segment was defined with its origin in T6, vertical axis from T6 to T1 (+ y) and horizontal axis through T6 (+ z to the right). The lower thoracic (LT) segment was defined with its origin in T12, vertical axis from T12 to T7 (+ y) and horizontal axis through T12 (+ z to the right). The lumbar (L) segment was defined with its origin in L5, vertical axis from L5 to L1 (+ y) and horizontal axis through L5 (+ z to the right) (Fig. 1). Pelvic, hip, thigh, shank and foot local co-ordinate systems were also defined and reconstructed from joint centres and easily identifiable anatomical landmarks on the pelvis and lower limb [14][15][16].
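As an illustration of how a local frame of this kind can be assembled from marker positions, the following minimal sketch builds a right-handed frame from an origin marker, a cephalad marker and a rightward reference point. The function names and the `right_ref` marker are hypothetical: the text above does not state which marker fixes the + z direction, so that input is an assumption made only for this example and is not the Bodybuilder model used in the study.

```python
import numpy as np

def unit(v):
    """Return v scaled to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def segment_frame(origin, superior, right_ref):
    """Build a right-handed local frame (x, y, z) for one spine segment.

    origin    : marker at the segment origin (e.g. T6 for the UT segment)
    superior  : marker at the cephalad end (e.g. T1 for the UT segment)
    right_ref : ASSUMED point lying to the right of the origin; the source
                text does not specify how the +z direction is fixed
    """
    origin = np.asarray(origin, dtype=float)
    y = unit(np.asarray(superior, dtype=float) - origin)         # +y: cephalad
    z_temp = unit(np.asarray(right_ref, dtype=float) - origin)   # roughly rightwards
    x = unit(np.cross(y, z_temp))   # +x: perpendicular to both y and z
    z = np.cross(x, y)              # +z: rightwards, exactly orthogonal to x and y
    return np.column_stack((x, y, z))  # columns are the axes in lab coordinates

# Toy marker positions (arbitrary units), chosen so the frame is easy to check.
R = segment_frame(origin=[0, 0, 0], superior=[0, 1, 0], right_ref=[0, 0.1, 1])
print(R)
```

Recomputing z from x and y keeps the three axes mutually orthogonal even when the rightward reference point is not exactly perpendicular to the vertical axis.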
Anatomical frames of the pelvis, thigh and shank were referenced to the corresponding technical frames (constructed from technical clusters of markers) in the static calibration trial such that anatomical markers (ASIS, PSIS, MFC, LFC, LMAL, MMAL) (Fig. 1; Additional file 1: Table S1) could be removed prior to the dynamic trials, permitting freedom of movement. All trials were recorded at 100 Hz using a 10-camera 3DMC system (Vicon Nexus (T160), Oxford Metrics Ltd., Oxford, UK) [17].
The onset and cessation of each task were determined using kinematics from each gait [18] and STS motion cycle [12,19]. Each cycle was extracted (Vicon Nexus (T160), Oxford Metrics Ltd., Oxford, UK) and filtered using a Woltring cross-validity quintic spline routine [20]. The data were then normalised to 100% of each motion cycle (MATLAB, Mathworks, Natick, MA., U.S.A.). 3D kinematics of each segment and joint were calculated using the Joint Coordinate System (JCS) convention [21] and computed using Bodybuilder and Vicon Nexus software (Oxford Metrics Ltd., Oxford U.K.). The average relative peak angles were then extracted.
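The normalisation and peak extraction steps can be sketched as follows. This is a minimal illustration, not the MATLAB routine used in the study: it assumes linear interpolation onto 101 points (0-100% of the cycle) and takes the peak to be the maximum of each normalised curve, and the function names and toy sine-shaped trials are hypothetical.

```python
import numpy as np

def normalise_cycle(angle, n_points=101):
    """Resample one motion cycle onto 0-100% of the cycle (101 samples)."""
    angle = np.asarray(angle, dtype=float)
    original = np.linspace(0.0, 100.0, num=angle.size)   # % cycle of raw frames
    target = np.linspace(0.0, 100.0, num=n_points)       # 0, 1, ..., 100 %
    return np.interp(target, original, angle)            # linear interpolation

def mean_peak_angle(trials):
    """Average the peak (here: maximum) angle across normalised trials."""
    peaks = [normalise_cycle(t).max() for t in trials]
    return float(np.mean(peaks))

# Two toy trials of different duration (121 and 141 frames at 100 Hz).
trial_a = 30.0 * np.sin(np.linspace(0.0, np.pi, 121))
trial_b = 34.0 * np.sin(np.linspace(0.0, np.pi, 141))
peak = mean_peak_angle([trial_a, trial_b])
print(round(peak, 2))
```

Resampling every trial to the same number of points allows cycles of different duration to be averaged and compared sample by sample.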

Statistical analysis
The normality of the data was confirmed using Q-Q plots and the Shapiro-Wilk test (significance level p ≥ 0.05). Inter-tester and intra-tester ICC(3,k) values (2-way mixed model) and their 95% confidence intervals were derived. ICC values ≥ 0.70 were considered acceptable; values of 0.75-1.00 were classed as excellent, 0.40-0.74 as fair to good and < 0.40 as poor [22].
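For readers who wish to reproduce an ICC(3,k) outside SPSS, the following minimal sketch computes it from the two-way ANOVA mean squares as (BMS − EMS) / BMS, where BMS is the between-subjects mean square and EMS the residual mean square. It assumes the consistency definition of ICC(3,k); the function name and the toy scores are hypothetical and are not study data.

```python
import numpy as np

def icc_3k(scores):
    """ICC(3,k): two-way mixed model, consistency, average of k measurements.

    scores : array of shape (n_subjects, k), one column per session or tester
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_total = ((x - grand) ** 2).sum()
    ss_subjects = k * ((x.mean(axis=1) - grand) ** 2).sum()   # between subjects
    ss_raters = n * ((x.mean(axis=0) - grand) ** 2).sum()     # between sessions/testers
    ss_error = ss_total - ss_subjects - ss_raters             # residual
    bms = ss_subjects / (n - 1)
    ems = ss_error / ((n - 1) * (k - 1))
    return (bms - ems) / bms

# Toy peak angles (degrees): a constant offset between the two columns does
# not lower a consistency ICC, so this returns a value of 1 (up to rounding).
offset_only = [[10.0, 12.0], [20.0, 22.0], [30.0, 32.0]]
print(icc_3k(offset_only))
```

Because the systematic difference between columns is removed as a separate variance component, this form of the ICC reflects how consistently subjects are ranked rather than absolute agreement between testers.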
The mean peak joint angles (mean of session one and two measurements), mean of the differences between measurements at session one and two (Mean Diff), the respective 95% confidence intervals (95% CI) for these differences, the standard deviation of the differences (SD Diff) and the 95% limits of agreement (95% LOA) were determined [23] in frontal, sagittal and transverse planes. The standard error of measurement (SEM) was calculated as SEM = SD Diff ÷ √2 [24]. The minimal detectable change (MDC), which expresses the amount of joint angle change that exceeds the measurement error, was also calculated as MDC = 1.96 × √2 × SEM [25]. ICC statistical analysis was conducted using SPSS software (SPSS Statistics Version 22, IBM, Chicago, IL., U.S.A.). A critical level of p < 0.05 was defined as significant. Mean, Mean Diff, 95% CI, SD Diff, 95% LOA, SEM and MDC calculations were computed using Microsoft Excel (Excel 2010, Microsoft Corporation, Redmond, WA., U.S.A.).
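The agreement statistics described above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the spreadsheet used in the study: the function name and the toy angles are hypothetical, the sample standard deviation (n − 1 denominator) is assumed for SD Diff, and the limits of agreement are taken as Mean Diff ± 1.96 × SD Diff.

```python
import numpy as np

def agreement_stats(session_a, session_b):
    """Mean Diff, SD Diff, 95% LOA, SEM and MDC for paired peak angles."""
    a = np.asarray(session_a, dtype=float)
    b = np.asarray(session_b, dtype=float)
    diff = a - b
    mean_diff = diff.mean()
    sd_diff = diff.std(ddof=1)                 # ASSUMED: sample SD (n - 1)
    loa = (mean_diff - 1.96 * sd_diff,         # Bland Altman 95% limits
           mean_diff + 1.96 * sd_diff)
    sem = sd_diff / np.sqrt(2)                 # SEM = SD Diff / sqrt(2)
    mdc = 1.96 * np.sqrt(2) * sem              # MDC = 1.96 * sqrt(2) * SEM
    return {"mean_diff": mean_diff, "sd_diff": sd_diff,
            "loa": loa, "sem": sem, "mdc": mdc}

# Toy peak angles (degrees) for five participants in two sessions.
stats = agreement_stats([40, 42, 45, 39, 41], [38, 43, 44, 40, 39])
for name, value in stats.items():
    print(name, value)
```

With these definitions the two √2 factors cancel, so the MDC equals 1.96 × SD Diff, which is the half-width of the 95% limits of agreement.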

Gait task
Analysis of the mean peak joint angles for the spine and lower limbs demonstrated that 70% of intra-tester and 76% of inter-tester ICC scores were excellent (0.75-0.99). The remainder ranged between 0.60-0.74 (intra-tester) and 0.50-0.56 (inter-tester). Overall, ICC values were higher in the sagittal plane (for both intra- and inter-tester reliability), whilst those in the frontal and transverse planes were lower (Table 1). Kinematic waveforms reflect this agreement (Additional file 2: Figure S1, Additional file 3: Figure S2 and Additional file 4: Figure S3).
The SEM values were ≤ 5.3° and ≤ 5.5° for all intra- and inter-tester trials respectively, with 91% of values falling below 5°. The mean differences between sessions for all parameters were lower for intra-tester trials (≤ 0.9°, except 1.3° for peak lumbar abduction/adduction) than inter-tester trials (≤ 1.4°, except 3.4° for peak hip internal/external rotation). The MDC values ranged between 2.4 and 4.7° (intra-tester) and 2.4°-15.3° (inter-tester). Bland Altman 95% limits of agreement for both intra-tester and inter-tester trials are outlined in Table 1.
Fig. 1 The 'Imperial Spine' marker set, segments and local anatomical frames. For all local anatomical frames, the + y axis (cephalad) is indicated in green, the + z axis (towards the right) in blue and the + x axis (perpendicular to both + y and + z axes) in red
Table 1 legend: ICC (intraclass correlation coefficient), 95% CI (the 95% confidence interval for the ICC), Mean (average angle measured between tester 1 on 2 occasions (intra-tester) and tester 1 and 2 (inter-tester)), Mean Diff (represents the average of the differences between two measurements made by tester 1 on two occasions (intra-tester) and between tester 1 and 2 (inter-tester)) and the 95% CI for Mean Diff, SD Diff (the standard deviation of the differences), 95% LOA (Bland and Altman 95% limits of agreement), SEM (standard error of measurement) and MDC (absolute minimal detectable change)

STS task
The mean peak joint angles for the spine and lower limbs demonstrated ICC ranges of −0.82 to 0.98 (intra-tester) and −0.52 to 0.97 (inter-tester). 76% of intra-tester and 52% of inter-tester ICC scores indicated excellent reliability (0.75-0.99). ICC values were higher in the sagittal plane, 0.83-0.98 (intra-tester) and 0.89-0.97 (inter-tester, except 0.52 at the ankle), and lower in the transverse and frontal planes (−0.2 to 0.89) (Table 2). Kinematic waveforms reflect this agreement (Additional file 2: Figure S1, Additional file 3: Figure S2 and Additional file 4: Figure S3). SEM values were ≤ 5° for both intra- and inter-tester trials, with the exception of pelvic tilt, pelvic rotation, hip flexion/extension, hip ab/adduction and lumbar flexion/extension (SEM range: 5.1-12.6°, with the largest error in pelvic rotation). Similar to the gait task, mean differences for all parameters between sessions were lower for intra-tester trials (≤ 3.9°) than inter-tester trials (≤ 5.3°). The range of MDC values was wider in the STS task (2.9-34.9° (intra-tester) and 3.6-25.6° (inter-tester)) compared to the gait task, with the highest values relating to pelvic rotation in both cases.
Bland Altman 95% limits of agreement for both intra-tester and inter-tester STS trials are outlined in Table 2.

Discussion
To our knowledge, reliability has not previously been examined amongst experienced and inexperienced testers during both gait and STS tasks using a three rigid segment spine, pelvic and lower limb model in adults. Similar gait studies, which focussed on a two rigid spine segment model with lower limbs but without pelvic outputs [11,26], also found small mean intra-tester differences (Mean Diff Intra-tester). The 'Imperial Spine' (3 rigid spine segment model including pelvic and lower limb outputs) builds upon this; inter-tester kinematic differences (Mean Diff Inter-tester) were low within both gait and STS tasks.
Systematic reviews of the reliability of 3DMC kinematic measurements have demonstrated that reliability varies between studies due to methodological variation [1,27], which makes direct comparison difficult. Overall, ICCs are reported to be above 0.7 for most range of movement parameters [1] and are highest within the sagittal plane [27]. These findings concur with this current study (median ICC for gait and STS tasks > 0.89 for intra- and inter-tester data) and that of more recent work [28,29].
Transverse plane measurements are typically less reliable (median ICC < 0.72) [27]. However, using the 'Imperial Spine', the median values are increased in both transverse and frontal planes (median ICC > 0.80 for gait and STS tasks for intra- and inter-tester data) with the exception of transverse and frontal plane inter-tester ICCs for the STS tasks (median 0.60 and 0.62 respectively). To our knowledge, this has not been investigated until now in healthy adults using a multi-segmental spine and bilateral lower limb model.
In agreement with this current study, higher intra-tester than inter-tester reliability is reported [27] and may represent a difference in tester experience [7]. It is proposed that errors between 2 and 5° are acceptable [27]. In this current study the SEM for STS tasks (intra- and inter-tester) was higher than this, as one would expect for a task requiring through-range movement (SEM range 1.0-7.8°, except for peak pelvic rotation), and was lower in gait trials (SEM range 0.8-5.5°). Although the corresponding MDC ranges approximate values recently cited during gait and STS tasks [28,29], the MDC range in our study was wider during STS.
Despite unavoidable and well documented errors implicated in 3DMC, these findings indicate that it should be possible to reliably establish kinematic differences using the 'Imperial Spine'. In order to identify potential therapeutic targets, further testing will be required in LBP patients.

Limitations
Although a pragmatic sample size was used in this study [13], reflecting that of previous reliability trials [1,5,30], the authors recognise that an increased sample size would have further enhanced reliability and MDC outcomes. Participants were examined by each tester following a 45 min rest period, which could also be considered a limitation. This was necessary so that measurements were made at the same time of day, ensuring that diurnal changes of the spine (disc hydration) in this cohort or changes in movement over time did not account for the changes observed.
Table 2 STS task. ICC (intraclass correlation coefficient), 95% CI (the 95% confidence interval for the ICC), Mean (average angle measured between tester 1 on 2 occasions (intra-tester) and tester 1 and 2 (inter-tester)), Mean Diff (represents the average of the differences between two measurements made by tester 1 on two occasions (intra-tester) and between tester 1 and 2 (inter-tester)) and the 95% CI for Mean Diff, SD Diff (the standard deviation of the differences), 95% LOA (Bland and Altman 95% limits of agreement), SEM (standard error of measurement) and MDC (absolute minimal detectable change)
It is important to note that the reported error in the 'Imperial Spine' relates to healthy participants and