Automated PGP9.5 immunofluorescence staining: a valuable tool in the assessment of small fiber neuropathy?

Background In this study we explored the possibility of automating the PGP9.5 immunofluorescence staining assay for the diagnosis of small fiber neuropathy using skin punch biopsies. The laboratory developed test (LDT) was subjected to a validation strategy as required by good laboratory practice guidelines and compared to the well-established gold standard method approved by the European Federation of Neurological Societies (EFNS). To facilitate automation, the use of thinner sections (16 µm) was evaluated. Biopsies from previously published studies were used. The aim was to evaluate the diagnostic performance of the LDT compared to the gold standard. We focused on technical aspects to reach high-quality standardization of the PGP9.5 assay and finally evaluate its potential for use in large scale batch testing.
Results We first studied linear nerve fiber densities in skin of healthy volunteers to establish reference ranges, and compared our LDT using the modifications to the EFNS counting rule to the gold standard in visualizing and quantifying the epidermal nerve fiber network. As the LDT requires the use of 16 µm tissue sections, a higher incidence of intra-epidermal nerve fiber fragments and a lower incidence of secondary branches were detected. Nevertheless, the LDT showed excellent concordance with the gold standard method. Next, the diagnostic performance and yield of the LDT were explored and challenged against the gold standard using skin punch biopsies of capsaicin treated subjects, and patients with diabetic polyneuropathy. The LDT reached good agreement with the gold standard in identifying small fiber neuropathy. The reduction of section thickness from 50 to 16 µm resulted in a significantly lower visualization of the three-dimensional epidermal nerve fiber network, as expected. However, the diagnostic performance of the LDT was adequate as characterized by a sensitivity and specificity of 80 and 64 %, respectively.
Conclusions This study, designed as a proof of principle, indicated that the LDT is an accurate, robust and automated assay, which adequately and reliably identifies patients presenting with small fiber neuropathy, and therefore has potential for use in large scale clinical studies.


Background
PGP9.5 immunostaining of intra-epidermal nerve fibers in 50 µm sections is widely accepted as the gold standard for the diagnosis of small fiber neuropathy.
The assay is used to visualize the number and morphology of the somatic, small caliber, intra-epidermal nerve fibers, and is supported by the European Federation of Neurological Societies (EFNS) [1]. The use of skin biopsies as a diagnostic tool in peripheral neuropathies has increased in the last decades, and is regarded as a reliable and standardized tool [1][2][3][4][5][6][7]. The Polyneuropathy Task Force [3] concluded that the gold standard was diagnostically efficient at distinguishing polyneuropathy patients [including small fiber neuropathy (SFN)] from normal subjects as controls. Skin biopsy immunostaining can help to detect aberrations in somatic nerves in neuropathies that were formerly classified as autonomic, such as Ross syndrome [8,9]. It was concluded [4] that the intra-epidermal nerve fiber density (IENF) assessment has proven its value as a measure for treatment success and for follow-up in clinical trials [10][11][12][13]. The EFNS Task Force developed guidelines for the diagnostic use of skin biopsies in peripheral neuropathies, published in 2005 [1]. The recommendations include the use of a 3-mm punch skin biopsy from the distal leg, fixed in Zamboni solution and quantified for linear nerve fiber density in at least three consecutive 50 µm sections, after PGP9.5 immunohistochemical or immunofluorescence staining. The counting rules emphasize including only IENF crossing the dermal-epidermal junction, while excluding secondary/tertiary branching from the quantification [1]. From the perspective of use of skin biopsies in clinical trials, the EFNS-advised method is time-consuming and challenging for large scale batch testing in a standardized manner. One way of approaching this challenge is to fully automate the staining procedure.
For each laboratory developed test (LDT), which aims to identify disease and is intended for accreditation and use in clinical trial settings, assay performance characteristics must be established as required by CAP/CLIA [14] and local accreditation boards [15]. Good laboratory practice requires that accuracy, precision, analytical sensitivity, analytical specificity, reportable range and reference intervals are established [16,17]. Biopsies from subjects clearly showing intra-epidermal nerve fiber reduction, as confirmed by conventional diagnostic tools, were selected from two studies conducted earlier by Ragé and colleagues [18,19]. In the first study, the experimental model of reversible capsaicin-induced SFN [10,11,20] was examined in healthy subjects using laser evoked potentials (LEPs) in comparison to the linear nerve fiber density in skin. The second study aimed at investigating the diagnostic performance of LEPs in the assessment of small nerve fiber loss in asymptomatic diabetic neuropathy. From this study, diabetic subjects presenting with distal polyneuropathy were included. In addition, skin biopsies of healthy subjects were studied to indicate achievable reference ranges, and reveal possible discrepancies and quantitative caveats between the LDT and the gold standard method.
In this study we developed an automated PGP9.5 LDT, to complement the gold standard, and validated its potential for use in large scale clinical studies.

Human samples
Fourteen skin punch biopsies were obtained from healthy volunteers (n = 14, age group 33-52 years). Five biopsy specimens from diabetic subjects presenting with polyneuropathy (n = 5), obtained from the study described by Ragé and coworkers [19], were included in the study. In addition, two biopsy specimens obtained from healthy volunteers who received topical capsaicin application on three consecutive 24-h cycles were examined as well [18]. Studies were approved by the Local Ethics Committee; informed consent was obtained from all participants.

Skin punch biopsy processing
Skin punch biopsies (diameter 2-4 mm) were performed under local anesthesia at the lateral aspect of the distal leg in a clinical unit. Skin biopsies were fixed in cold Zamboni fixative solution (60 min at room temperature), cryopreserved in sucrose 30 % and frozen in OCT compound (Sakura TissueTek Europe, The Netherlands) as recommended by the EFNS guidelines [1,21]. After freezing, tissue-blocks were stored at −80 °C prior to sectioning.

Accuracy confirmation by western blot analysis and double immunofluorescence histochemistry
Western blot analysis was performed by SDS-PAGE on cell lysates from A549 (lung carcinoma) and U87 (glioblastoma) cell lines exhibiting high PGP9.5 expression, to verify the accuracy of the rabbit polyclonal anti-human PGP9.5 antibody (RA95101, UltraClone Ltd., UK). Subsequently, western transfer of proteins was performed onto a transfer membrane (Immobilon FL, Li-Cor, Germany). After incubation with the rabbit polyclonal anti-human PGP9.5 antibody (1/300), visualization was performed using IRDye® 800CW-conjugated goat (polyclonal) anti-rabbit IgG antibody on the Odyssey® Infrared Imaging System (Li-Cor).

Automation of PGP9.5 immunofluorescence staining on the Ventana Discovery XT®
Six consecutive 16 µm cryosections were produced and collected on dry ice. The staining protocol was custom-programmed using the RESEARCH IHC QD Map XT software of the Discovery XT® (Ventana, USA). After loading the slides into the instrument, incubation was performed using the rabbit polyclonal anti-human PGP9.5 primary antibody for 2 h. Subsequently, the visualization was established using a Cy3-labeled secondary goat anti-rabbit antibody and counterstained with Hoechst 33342 as described above.

Quality control
For all specimens, internal controls, i.e. autonomic nerve fibers innervating sweat glands and m. arrector pili, were evaluated for positive staining and subsequent acceptance of each individual section. As a negative control, rabbit immunoglobulins (confirm negative control rabbit, Ventana) were used to replace the primary antibody.

Imaging and quantification
The quantification of the linear density of nerve fibers was performed using a conventional fluorescence microscope (20×-40×; Zeiss, Germany) by two readers blinded to treatment. Virtual images were generated using the Axiovision Mozaik Imaging Software (Axiovision Rel 4.8®) on an Axioplan 2 imaging microscope equipped with a motorized stage and z-stack features (Zeiss, Germany) and used to perform length measurements. Individual counts were divided by the length of the epidermis and expressed as mean numbers/mm (± SD).
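The linear density calculation described above can be illustrated with a minimal sketch. The counts and epidermal lengths below are hypothetical, the function name is ours, and the use of the sample SD across sections is an assumption, since the text does not state which SD estimator was used.

```python
from statistics import mean, stdev


def linear_density(fiber_count, epidermis_length_mm):
    """Return nerve fibers per mm of epidermis for one section."""
    if epidermis_length_mm <= 0:
        raise ValueError("epidermal length must be positive")
    return fiber_count / epidermis_length_mm


# Hypothetical (count, epidermal length in mm) pairs for three serial sections.
sections = [(30, 3.0), (27, 3.0), (33, 3.0)]
densities = [linear_density(count, length) for count, length in sections]

# Mean and sample SD across sections (assumed estimator).
summary = (mean(densities), stdev(densities))
print(f"{summary[0]:.1f} ± {summary[1]:.1f} fibers/mm")
```

Dividing each count by the measured epidermal length before averaging keeps sections of different length comparable.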

Gold standard counting rule
The EFNS guidelines [1,3,23] prescribe counting each nerve fiber crossing the dermal-epidermal junction as a single unit. Nerve fibers that approach the junction without crossing it and nerve fiber fragments lying free in the epidermis are not enumerated, nor are branches.

LDT modifications to the gold standard counting rule
To critically evaluate the LDT, and the effect of reducing the section thickness, the following additions to the gold standard counting rule were implemented on both the LDT and gold standard stained slides. The epidermal nerve fibers, traditionally named IENF, include all nerve fibers crossing the dermal-epidermal junction (Fig. 1) as per the gold standard counting rule. In order to accurately define existing discrepancies in the LDT's ability to visualize the 3D-IENF network, the incidence of secondary branching of IENF was taken into account. Therefore, IENF crossing the dermal-epidermal junction as a single fiber (IENF_Si; Fig. 1: 1A) and IENF crossing the dermal-epidermal junction as a fiber showing branching (IENF_Br; Fig. 1: 1B) are reported separately. Since the LDT uses thinner tissue sections, and a higher incidence of IENF fragments is to be expected, free intra-epidermal nerve fiber fragments that do not cross the dermal-epidermal junction (IENF_F; Fig. 1: 2A, 2B) were included. These fragments were not included in the EFNS guidelines but have been proven to be valuable [24]. Fragments are considered as such based on morphological characteristics and an approximate minimal length of 5 µm. For IENF_F, the incidence of secondary branching was taken into account as well, resulting in IENF_FSi (Fig. 1: 2A) and IENF_FBr (Fig. 1: 2B). The actual number of branches on each nerve fiber was recorded separately. The total number of epidermal nerve fibers (total ENF, Fig. 1) represented the sum of IENF and IENF_F.
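As a rough illustration of how the counting categories relate to each other, the sketch below stores the four per-section counts and derives IENF, IENF_F and total ENF from them. The class and field names and the example counts are hypothetical and serve only to make the bookkeeping explicit.

```python
from dataclasses import dataclass


@dataclass
class SectionCounts:
    """Per-section counts for the four categories of the modified rule."""
    ienf_si: int   # fibers crossing the junction, unbranched (IENF_Si)
    ienf_br: int   # fibers crossing the junction, branched (IENF_Br)
    ienf_fsi: int  # free fragments, unbranched (IENF_FSi)
    ienf_fbr: int  # free fragments, branched (IENF_FBr)

    @property
    def ienf(self):
        # Gold standard count: every crossing fiber is one unit.
        return self.ienf_si + self.ienf_br

    @property
    def ienf_f(self):
        # Free fragments, not counted under the EFNS rule.
        return self.ienf_fsi + self.ienf_fbr

    @property
    def total_enf(self):
        # Total ENF is the sum of IENF and IENF_F.
        return self.ienf + self.ienf_f


# Hypothetical counts for one section.
counts = SectionCounts(ienf_si=12, ienf_br=5, ienf_fsi=6, ienf_fbr=1)
print(counts.ienf, counts.ienf_f, counts.total_enf)
```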

LDT method validation
Besides the confirmation that the anti-human PGP9.5 antibody accurately detects its target, we established reference intervals and defined the LDT's discrepancies in reference to the GS-EFNS (LDT modifications to the GS-EFNS counting rule) on biopsies obtained from healthy subjects. Diagnostic yield and diagnostic performance (sensitivity and specificity) were explored by plotting the true positive rate (sensitivity) against the false positive rate (1 − specificity) in a receiver operating characteristic (ROC) curve. The closer the ROC curve lies to the true positive axis, the better the performance of the LDT (see "Statistics" section). The examination of the inter-slide stability of the GS-EFNS and LDT was included since this could highlight the need for a minimum number of serial slides to be examined. For each subject and each staining method three serial slide measurements were performed by one observer (M1, M2, and M3). In order to estimate the reliability of results obtained by independent observers, individual counts for the different parameters were compared after automated staining of 12 randomly selected samples.
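To make the ROC idea concrete, the following sketch computes the area under the ROC curve from its rank-based (Mann-Whitney) interpretation: the probability that a randomly chosen SFN subject has a lower fiber density than a randomly chosen healthy subject. The values are hypothetical, and this is not the DeLong procedure used for the confidence intervals in this study.

```python
def roc_auc_low_is_positive(diseased, healthy):
    """AUC when LOWER values indicate disease; ties count as one half."""
    wins = 0.0
    for d in diseased:
        for h in healthy:
            if d < h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(diseased) * len(healthy))


# Hypothetical total ENF/mm values.
sfn_values = [4.0, 6.0, 12.0]
healthy_values = [10.0, 14.0, 16.0, 18.0]

auc = roc_auc_low_is_positive(sfn_values, healthy_values)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.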

Statistics
Method comparison of the LDT and GS-EFNS was performed using Bland-Altman analysis [25]. To demonstrate good agreement between the two techniques, the values should cluster near the 0-difference line. Statistical analysis was performed using a rank sum test for paired samples (Wilcoxon) for which p values below 0.05 were considered significant. Statistical analysis for assessing the diagnostic yield of the LDT compared to conventional diagnostic tools was performed as described before [19]. To explore whether data obtained using the EFNS advised method can serve as the gold standard to define the diagnostic performance of the LDT on the selected biopsies, a one-way analysis of variance (ANOVA) was performed. This allowed confirming that mean values were significantly different between control and SFN groups. The diagnostic performance of the LDT was estimated by the area under the receiver operating characteristic (ROC) curve with 95 % confidence interval for sensitivity and specificity using DeLong's test. Interobserver agreement was evaluated by determining the intraclass correlation (ICC) for all parameters using the same raters for all measurements and consistency as type. All analyses were performed using MedCalc® v12.3.0.0 statistical software. Finally, a power calculation was carried out to determine the statistical power of this study and the optimal sample size for a future study using software package R, version 3.1.2 [26]. We performed this analysis using data for the total linear density of epidermal nerve fibers.
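A Bland-Altman comparison reduces to the mean of the paired differences (the bias) and its limits of agreement. The sketch below uses hypothetical paired densities and the conventional bias ± 1.96 SD limits; whether MedCalc was configured exactly this way in the study is an assumption.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(method_a) != len(method_b):
        raise ValueError("paired measurements must have equal length")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired densities (fibers/mm) for the same four biopsies.
gs_efns = [12.0, 15.0, 10.0, 18.0]
ldt = [11.0, 16.0, 9.0, 17.0]

bias, lower, upper = bland_altman(gs_efns, ldt)
print(f"bias {bias:.2f}, limits of agreement {lower:.2f} to {upper:.2f}")
```

A bias close to zero with narrow limits of agreement corresponds to points clustering near the 0-difference line.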

Accuracy of the anti-human PGP9.5 antibody
The accuracy of the rabbit polyclonal anti-human PGP9.5 antibody was confirmed by western blot analysis on cell lysates from U87 and A549 cell lines. For both cell lines the antibody showed a band at a molecular weight of approximately 25 kDa (Fig. 2a), confirming the recognition of PGP9.5 (27 kDa). In addition, βIII-tubulin, a nerve and Langerhans cell-specific marker, co-localized in all PGP9.5 immunoreactive structures of the epidermis (Fig. 2b), confirming the accuracy of the antibody.

Assessment of IENF and nerve fiber branching density in skin biopsies of healthy volunteers
The LDT detected significantly fewer IENF than the gold standard (p < 0.001, mean difference 5.8 IENF/mm), especially with regard to IENF_Br (p < 0.001, mean difference 5.7 IENF_Br/mm) (Table 1). For IENF_Si comparable results were obtained for the GS-EFNS and LDT (p = 0.95, mean difference 0.1 IENF_Si/mm). Enumeration of nerve fiber fragments lying free in the epidermis showed a significantly higher number of IENF_F for the LDT compared to GS-EFNS (p = 0.01, mean difference −2.8 IENF_F/mm). The IENF_F were mainly present as single nerve fiber fragments (IENF_FSi; p = 0.002, mean difference −2.5 IENF_FSi/mm). Overall, for the total population of epidermal nerve fibers (IENF and IENF_F) a good agreement was reached between the GS-EFNS and LDT (p = 0.24, mean difference 3.0 Total ENF/mm) (Fig. 3; Table 1). Based on the mean difference and SD of both measurements, we determined that the difference between the groups equals 0.68 SD units. A power calculation showed that, to detect a difference in means of 0.68 SD units, with a power of 80 %, at an alpha level of 0.05, the sample size should be at least 35 in both groups.
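The reported sample size can be approximated by hand. The sketch below assumes a two-sided, two-sample comparison of means and uses the normal approximation, with an optional small-sample correction (adding z²/4) that is commonly applied to approximate the t-test result. The exact R call used in the study is not given in the text, so both the test type and the correction are assumptions.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80, t_correction=False):
    """Approximate per-group sample size for a two-sided two-sample test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2
    if t_correction:
        # Small-sample correction approximating the t-test (assumed).
        n += z_alpha ** 2 / 4
    return ceil(n)


n_normal = n_per_group(0.68)
n_corrected = n_per_group(0.68, t_correction=True)
print(n_normal, n_corrected)
```

Under these assumptions the normal approximation gives 34 per group and the corrected value gives 35, in line with the minimum of 35 per group reported above.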

Diagnostic performance of the LDT for SFN
The LDT was able to distinguish between healthy and SFN groups (p < 0.001) when assessing IENF linear density (Table 2). Importantly, the LDT reached good concordance with the gold standard as determined by Bland-Altman after assessing IENF (p > 0.42, mean difference 0.23 IENF/mm); IENF_F (p > 0.25, mean difference −0.29 IENF/mm) and thus total ENF (p > 0.42, mean difference −0.06 ENF/mm) in skin biopsies of SFN subjects. This clearly indicates that the LDT and GS-EFNS are equivalent in detecting SFN (Fig. 4). A lower number of nerve fiber fragments lying free in the epidermis (IENF_F) was observed when the gold standard was applied (Table 2).
When secondary branching of nerve fibers was taken into account, a good agreement was found for all parameters evaluated, showing mean differences of less than 0.5/mm. Nevertheless, on the one hand a lower score for IENF without secondary branching (mean difference of −0.16 IENF_Si/mm; p = 0.42) and on the other hand a higher score for IENF showing secondary branching (mean difference of 0.39 IENF_Br/mm; p = 0.09) were seen using the gold standard, as was the case in the biopsies from the healthy volunteers (Table 2).

Diagnostic yield of the LDT for SFN
ANOVA analysis confirmed that GS-EFNS can serve as gold standard for the diagnosis of SFN when performed in our lab. Data were grouped using the knowledge of the disease states of the different subjects. A significant difference was observed for IENF between healthy and SFN groups (p < 0.001, F-ratio 87.5) (Fig. 5a). Therefore, GS-EFNS results could be used to determine the expected diagnosis to be obtained using the LDT (ROC, Fig. 5b).
As IENF density, determined using the LDT, is significantly lower compared to the GS-EFNS, the total population of epidermal nerve fibers counted (Total ENF/mm) was used as cut-off. The average linear density in the tested control group was 15.45 ± 4.43 Total ENF/mm (Table 1); the lower cut-off was therefore set at the mean minus one SD, i.e. 11.02 Total ENF/mm. Using this cut-off to classify data obtained with the LDT, a sensitivity of 80 % and specificity of 64 % was reached with an area under the ROC curve of 0.72 and p = 0.031.
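The cut-off arithmetic and the resulting classification can be sketched as follows. Only the control mean and SD come from the text; the individual subject values are hypothetical, and treating values strictly below the cut-off as SFN-positive is an assumption.

```python
def lower_cutoff(control_mean, control_sd):
    """Lower cut-off defined as the control mean minus one SD."""
    return control_mean - control_sd


def sensitivity_specificity(sfn_values, healthy_values, cutoff):
    """Classify values below the cut-off as SFN (assumed rule)."""
    true_pos = sum(v < cutoff for v in sfn_values)
    true_neg = sum(v >= cutoff for v in healthy_values)
    return true_pos / len(sfn_values), true_neg / len(healthy_values)


cutoff = lower_cutoff(15.45, 4.43)  # control values reported in Table 1

# Hypothetical total ENF/mm values, for illustration only.
sfn_example = [3.0, 5.5, 8.0, 10.0, 12.5]
healthy_example = [9.0, 12.0, 15.0, 18.0]

sens, spec = sensitivity_specificity(sfn_example, healthy_example, cutoff)
print(f"cut-off {cutoff:.2f}, sensitivity {sens:.2f}, specificity {spec:.2f}")
```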

Inter-slide stability and robustness
For all parameters and both staining methods, the mean differences observed between repeated measures (M1, M2 and M3) were very small, not exceeding 1 nerve fiber/mm (Table 3). Both techniques showed a success rate exceeding 95 % with no failure and thus no exclusion of samples when the LDT is applied.

Inter-observer agreement
Overall, the inter-reader agreement can be considered excellent with ICC values ranging from 0.93 to 0.99 (Table 4) for all linear density values enumerated on LDT stained samples.
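For readers unfamiliar with the consistency-type ICC, the sketch below computes it from a two-way layout (samples by raters) as (MS_rows − MS_error) / (MS_rows + (k − 1) MS_error). The choice of the single-measures form is an assumption, since the text specifies only "consistency as type", and the ratings are hypothetical.

```python
def icc_consistency_single(ratings):
    """Two-way, consistency, single-measures ICC.

    ratings: one row per sample, one score per rater in each row.
    """
    n = len(ratings)       # number of samples
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical densities (fibers/mm) scored by two readers on four samples.
reader_scores = [[10.0, 11.0], [14.0, 13.0], [8.0, 9.0], [20.0, 19.0]]
icc = icc_consistency_single(reader_scores)
print(f"ICC = {icc:.3f}")
```

Because the consistency type ignores systematic offsets between raters, two readers who differ by a constant amount would still reach an ICC of 1.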

Discussion
As proven by multiple investigators, skin biopsies are excellent tools to investigate the nerve fiber endings in the epidermis [1][2][3][4][5][6][7]. Since the early nineties, the neuronal biomarker PGP9.5 has been regarded as the most accurate for the visualization of epidermal nerves [22,27]. As the use of this technique grew for the diagnosis of SFN, the need for guidelines and standardization grew accordingly. In 2005, the members of the EFNS indicated the advised procedure for the assessment of epidermal nerve fiber density [1]. The gold standard was described in detail [23], and implemented successfully in specialized laboratories over the world. Other investigators used the IENF density as a reference in experimental disease progression studies [10,11], and early detection of SFN in diabetics [12,13], and compared its accuracy to that of conventional diagnostic tools. Nevertheless, this method is a labor-intensive, manual staining procedure, requiring high methodological skills and training, and is therefore prone to human error when applied in conventional diagnostic laboratories. The use of uniform outcome measures in peripheral neuropathies has been commended recently, to enable comparison between studies [28].
In this study, we explored whether a gold standard-related LDT could be developed, with decreased labor and increased standardization as primary goals, by automating the staining method. The use of automated slide staining implied a significant decrease in section thickness of the skin biopsies. The choice of using 16 µm-thick sections was driven by the work of Torres and colleagues [29] and Hedreen [30], both describing that thicker sections mounted on glass slides can present suboptimal penetration of immuno-reagents into the tissue, leading to a lost-cap phenomenon. The use of thinner sections for subsequent IENF assessment has been proven valuable by different groups, not only for SFN detection, but also for the study of multiple nerve markers including ion channels and receptors in the same biopsies and the progression and/or follow-up of disease [6,22,[31][32][33][34][35][36].
The design of the LDT and its subsequent validation was based on requirements of CAP, CLIA and a local accreditation board (BELAC) to be able to introduce the LDT in large scale batch testing required for clinical trial testing. We explored whether the LDT could meet the needs for both an accredited laboratory and the pharmaceutical industry, while maintaining a good concordance with the gold standard. In this study, fluorescence detection was preferred to allow future advanced imaging as described recently [37,38]. Published data showing good concordance between fluorescent and bright-field assessment of IENF in the diagnosis of SFN justifies this choice [39]. Implementation of fluorescence in IENF assessment additionally helps with the ease of studying multiple targets in a single tissue section.
Once good accuracy was confirmed, the automated user-defined assay performed on the Discovery XT® was challenged against the gold standard using skin biopsies of healthy volunteers. We first demonstrated that performance of the assay in our accredited laboratory according to the EFNS advised gold standard method meets all criteria. A good concordance was reached with published normative values (9.8-12.4 IENF/mm for female subjects in the age group of 33-52) as described by Lauria and co-workers [23] when assessing IENF in skin biopsies of predominantly female subjects included in this study. The gold standard could distinguish between healthy and SFN subjects with high significance and served as a reference.
When exploring the IENF density as assessed by the LDT, results clearly indicated that the LDT's ability to visualize IENF in skin of healthy subjects is considerably lower compared to the gold standard, whereas a good concordance was reached when both IENF and IENF fragments were considered (total ENF). A higher number of IENF and visualization of their branches using the gold standard is in line with the increased occurrence of nerve fiber fragments observed in the thinner sections of the LDT. Considering the threefold decrease in section thickness, one would expect to have a threefold decrease in linear densities of the evaluated parameters. As we found an approximate twofold decrease for healthy subjects and 1.25-fold decrease in SFN subjects compared to the GS, one must consider that nerve fibers might be included more than once in the counting strategy when the LDT is applied. Nevertheless, the diagnostic performance of the LDT was equal to that of the gold standard in discriminating healthy from SFN subjects with high significance when IENF was assessed.

Table 3 Assessment of the inter-slide stability for scoring PGP9.5
using Bland-Altman analysis for the comparison of GS-EFNS and LDT
Diff difference, ENF epidermal nerve fiber, GS-EFNS gold standard method according to EFNS, IENF intra-epidermal nerve fiber, IENF_Si intra-epidermal nerve fiber crossing dermal-epidermal junction as single fiber, IENF_Br intra-epidermal nerve fiber crossing dermal-epidermal junction as fiber showing branching, IENF_F nerve fiber fragment lying free in epidermis, IENF_FSi nerve fiber fragment lying free in epidermis as single fragment, IENF_FBr nerve fiber fragment lying free in epidermis showing branching, LDT laboratory developed test
Additionally, an excellent agreement between both staining methods was found for all parameters quantified in biopsies from subjects with (experimental) SFN. Both the number of IENF, IENF fragments and their secondary branching could be equally visualized. In addition, previously published work documented concordance of the LDT's outcome with TRPV1 staining performed in a different laboratory. Results describing a functional recovery, prior to morphological recovery in a time course study after capsaicin was applied, were consistent with findings of other groups [18].
When the diagnostic yield of the LDT was measured, we concluded that both IENF and IENF fragments must be included in the enumeration when the LDT is applied. IENF assessment alone lacked concordance with the gold standard method. When the total number of nerve fibers (IENF and IENF fragments) was considered, the LDT reached relatively good concordance with the gold standard, with a sensitivity of 80 % and a specificity of 64 %. Finally, the LDT showed excellent robustness and excellent inter-observer concordance.

Conclusions
To conclude, in this proof of principle study we evaluated a standardized, on-slide, automated staining procedure for the assessment of the linear density of epidermal nerve fibers. This method demonstrated an equal performance to the gold standard in distinguishing SFN from healthy subjects. By automation and on-slide application of staining thinner sections, labor, variation of results induced by human manipulation and differential penetration of immunoreagents are drastically reduced. The decreased sensitivity, however, for detecting the complex three-dimensional branching network of epidermal nerve fibers is important to consider, as depletion of more distal axons is likely to be an early feature of dying back neuropathies. In this perspective, the appearance of collateral sprouting, typical for regenerating nerve fibers [44], and the LDT's capability of discriminating that from normal epidermal nerve fibers, need further investigation. The same applies for the risk of missing the SFN status in older patients. The effectiveness of the LDT in detecting important cutaneous nerve abnormalities, such as axonal swellings, crawlers and sprouts, needs to be examined as well [45]. Formation of a larger cohort, minimally 35 per group as determined by power calculation, and inclusion of a solid age and gender distribution are required to determine reference values and to determine whether the dynamic range of the LDT is acceptable. Power calculation showed that the current study design only offers 29 % power to detect a difference in means of the calculated SD units. Since epidermal nerve fiber fragments need to be included in the now more laborious counting strategy, compatibility of this technique with the semi-automated analysis recently presented [38] should be investigated as well. This technique could lead to high reproducibility of counting within and between neuropathological institutes. Introducing proficiency testing and dermal nerve quantification [46] could benefit standardization.
Full automation of the staining procedure could be valuable and lead to accessible, stable and reliable testing in clinical trials and diagnostics for clear SFN detection. A number of limitations however exist for this LDT which require expert-evaluation in a larger cohort.

Authors' contributions
The study was designed and conducted by NVA. Samples were obtained from two studies originally designed and conducted by LP and MR in collaboration with TM and MT. NVA coordinated the experiments, performed by MDB, and digitized the images. Analysis was performed by NVA and SDS. Statistical analysis was supported by ES, reviewed and updated by EF. CD oversaw the quality assurance according to GLP guidelines. PA, PC and MK provided scientific support. All authors read and approved the final manuscript.

Authors' information
This proof of principle study was conducted as part of a thesis (NVA, PC). Most of the authors (NVA, ES, MK, MDB, CD, SDS) work in a laboratory specialized in developing next generation assays and making these procedures available to the scientific community after extensive validation. The scope of this manuscript was to assess the feasibility of automation on a small number of samples available through collaboration (MR, MT, EF, PA, TM, and LP). Now that we have established that the automation is indeed possible, the validation in a large cohort is the subject of a new study and a new manuscript. At this early stage we are of the opinion that this work is of high value for the scientific community.
1 Faculty of Medicine and Health Sciences, University of Antwerp, Antwerp, Belgium.
2 HistoGeneX NV, Pr J Charlottelaan 10, Berchem, 2600 Antwerp,