The algorithm used to extract the BCC tumours consisted of four steps: pre-processing, segmentation, morphological operations, and feature extraction (see Figure 1). The algorithm is available as an ImageJ macro in Additional file 1, which also lists the specific parameters used. During pre-processing, colour deconvolution was used to isolate the haematoxylin stain component of each image. The resulting image was then segmented based on pixel intensities. Morphological operations were subsequently performed to connect the discontinuous regions produced by the segmentation step. Finally, area-based particle analysis was used to extract and quantify the regions of interest (ROIs) in the image, and these extracted regions were used to evaluate the performance of the algorithm.
Case selection/image acquisition
Cases were selected from a convenience sample of basal cell carcinomas (BCCs) reported by the senior author as part of his clinical sign-out practice. Digital images of 30 H&E-stained BCC histology slides were obtained using a commercial Aperio CS-O slide scanner at 80× magnification. The image sections containing BCC were saved in JPEG format (1072 × 902 pixels).
Software
The open source image processing and analysis program ImageJ was used in this study. First released in 1997 by software developer Wayne Rasband, ImageJ is based on NIH Image, developed at the National Institutes of Health. Its features include numerous image processing and analysis operations, such as image segmentation and extraction, noise reduction, image transformations, and particle analysis. These features are further extended by an active user community, which currently provides hundreds of downloadable plugins and macros [24]. Additional benefits of the software include support for numerous file formats and platform independence [25]: ImageJ runs on multiple operating systems, including Microsoft Windows, Mac OS, and Linux. The algorithm described below was used with ImageJ version 1.44. With the exception of the colour deconvolution plugin, all of the processes performed are available as default ImageJ commands.
Digital image processing and analysis
Colour deconvolution
The colour deconvolution plugin by Gabriel Landini [26] was used with its built-in H&E vectors to split each BCC image into separate images containing the haematoxylin and eosin stain components. The plugin also creates a third image corresponding to the complement of the haematoxylin and eosin stains. Because the chromatin-rich basophilic (nuclear) regions were of interest, only the 8-bit haematoxylin images were retained. Colour deconvolution was followed by contrast enhancement to facilitate segmentation.
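The colour deconvolution plugin implements the Ruifrok and Johnston method, which can be illustrated with a minimal per-pixel sketch in plain Python. It converts RGB intensities to optical densities (Beer-Lambert law), unmixes them with the inverse of a 3 × 3 stain matrix, and re-renders the haematoxylin concentration alone as an 8-bit intensity (dark = strongly stained). The stain vectors below are the published Ruifrok and Johnston H&E values and are an assumption: the plugin's built-in H&E vectors may differ slightly, and the third (complementary) vector is taken here as the normalised cross product of the first two. The function name `haematoxylin_channel` is hypothetical.

```python
import math

# Assumed stain optical-density vectors (Ruifrok and Johnston, 2001); the
# built-in H&E vectors of the ImageJ plugin may differ slightly.
HAEMATOXYLIN_OD = (0.65, 0.70, 0.29)
EOSIN_OD = (0.07, 0.99, 0.11)


def _normalise(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _inverse3(m):
    """Inverse of a 3x3 matrix via the adjugate and determinant."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]


H = _normalise(HAEMATOXYLIN_OD)
E = _normalise(EOSIN_OD)
RESIDUAL = _normalise(_cross(H, E))  # assumed third (complementary) vector
# Rows of the stain matrix are the stain vectors: OD = concentrations x M.
M_INV = _inverse3([H, E, RESIDUAL])


def haematoxylin_channel(rgb_pixels):
    """Return 8-bit haematoxylin-only intensities for (R, G, B) pixels."""
    out = []
    for pixel in rgb_pixels:
        # Beer-Lambert optical density per channel (clamped to avoid log(0)).
        od = [-math.log10(max(v, 1) / 255.0) for v in pixel]
        # Haematoxylin concentration = OD row vector times column 0 of M^-1.
        c_h = sum(od[k] * M_INV[k][0] for k in range(3))
        # Re-render the haematoxylin component alone as transmitted light.
        value = 255.0 * 10 ** (-max(c_h, 0.0))
        out.append(int(round(value)))
    return out
```

A white (unstained) pixel maps to 255, and a pixel whose colour is pure eosin maps close to 255 as well, so only haematoxylin-stained structures appear dark in the output.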
Segmentation
Thresholding was then used to segment the pixels darker than the threshold value, which was selected automatically using the ImageJ IsoData algorithm [27]. This process produced a binary image containing only black and white pixels, in which the black pixels corresponded to the segmented regions, i.e. those darker than the threshold.
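ImageJ's IsoData option is based on the iterative isodata procedure of Ridler and Calvard. The sketch below implements the classic scheme on a 256-bin histogram: start from the mean grey level, split the histogram at the current threshold, and move the threshold to the midpoint of the two class means until it stops changing. ImageJ's own implementation may differ in its starting value, tie handling and stopping rule, so this is an approximation rather than a reproduction. The function names are hypothetical; pixels at or below the threshold are treated as the dark (black, foreground) class, matching the segmentation of pixels darker than the threshold.

```python
def isodata_threshold(histogram, max_iter=100):
    """Ridler-Calvard iterative intermeans threshold for a grey-level histogram."""
    total = sum(histogram)
    if total == 0:
        raise ValueError("empty histogram")
    # Start from the global mean grey level.
    t = sum(i * n for i, n in enumerate(histogram)) / total
    for _ in range(max_iter):
        below = [(i, n) for i, n in enumerate(histogram) if i <= t]
        above = [(i, n) for i, n in enumerate(histogram) if i > t]
        n_below = sum(n for _, n in below)
        n_above = sum(n for _, n in above)
        if n_below == 0 or n_above == 0:
            break  # all pixels fall on one side: keep the current estimate
        mean_below = sum(i * n for i, n in below) / n_below
        mean_above = sum(i * n for i, n in above) / n_above
        new_t = (mean_below + mean_above) / 2
        converged = abs(new_t - t) < 0.5
        t = new_t
        if converged:
            break
    return int(t)


def segment_dark(pixels, threshold):
    """Binary result: 1 (black, foreground) for pixels at or below the threshold."""
    return [1 if p <= threshold else 0 for p in pixels]
```

For a histogram with two equal peaks at grey levels 50 and 200, the threshold settles at their midpoint, 125; unequal peak sizes shift the starting mean but the iteration still converges to the midpoint of the two class means.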
Morphological operations
Because the non-basaloid cell regions lack intense haematoxylin staining, the binary images produced during segmentation frequently contained holes and disconnected regions within the tumour nests. Morphological operations were therefore performed on the segmented images. Holes were filled using a combination of median filtering and binary closing operations: first, the ImageJ Remove Outliers command was used to apply a median filter to the bright outliers; this was followed by a binary closing operation and median filtering of the dark outliers.
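The three-step clean-up can be sketched on a binary mask as follows, assuming black (tumour) pixels are encoded as 1 and white background as 0. Remove Outliers replaces a pixel by the neighbourhood median only when it deviates from it; on a binary image any deviation is maximal, so here a bright outlier is simply a 0 pixel whose neighbourhood median is 1 (a hole), and a dark outlier is a 1 pixel whose median is 0 (a speck). The square window, the 3 × 3 structuring element for closing, and all function names are assumptions for illustration; ImageJ's rank filters may use a circular kernel and its binary operations have their own iteration and count settings.

```python
def _window(img, y, x, r):
    """Values in the (2r+1) x (2r+1) square around (y, x), clipped to the image."""
    h, w = len(img), len(img[0])
    return [img[j][i]
            for j in range(max(0, y - r), min(h, y + r + 1))
            for i in range(max(0, x - r), min(w, x + r + 1))]


def remove_outliers(img, radius=1, bright=True):
    """Binary median-based outlier removal (1 = black foreground)."""
    out = [row[:] for row in img]
    for y in range(len(img)):
        for x in range(len(img[0])):
            window = _window(img, y, x, radius)
            median_is_fg = 2 * sum(window) > len(window)
            if bright and img[y][x] == 0 and median_is_fg:
                out[y][x] = 1  # fill a white hole inside a black region
            elif not bright and img[y][x] == 1 and not median_is_fg:
                out[y][x] = 0  # delete an isolated black speck
    return out


def dilate(img):
    return [[max(_window(img, y, x, 1)) for x in range(len(img[0]))]
            for y in range(len(img))]


def erode(img):
    return [[min(_window(img, y, x, 1)) for x in range(len(img[0]))]
            for y in range(len(img))]


def close(img):
    """Binary closing: dilation followed by erosion."""
    return erode(dilate(img))


def hole_fill_and_clean(img, radius=1):
    img = remove_outliers(img, radius, bright=True)   # bright outliers
    img = close(img)                                  # binary closing
    return remove_outliers(img, radius, bright=False) # dark outliers
```

Note that the final median step also trims single pixels at sharp corners of a region, which is expected behaviour for median filtering.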
Feature extraction
Because other basaloid and chromatin-rich features (e.g. single lymphocytes, haematoxylin stain precipitates, and microcalcifications) could produce false positive results, we attempted to remove them with a filtering step using the ImageJ particle analyzer. A minimum particle size of 750 pixels was used to exclude particles that were not tumour nests. The tumour was then extracted by removing all image content outside the retained ROIs.
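The size filter amounts to labelling connected particles and discarding those below the minimum area. The sketch below assumes 8-connectivity and an inclusive lower bound of 750 pixels (the value given above); `filter_particles` and `extract_regions` are hypothetical names, and ImageJ's particle analyzer has further options (e.g. circularity limits, edge exclusion) not modelled here.

```python
from collections import deque


def filter_particles(mask, min_size=750):
    """Keep only 8-connected foreground particles with at least min_size pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    kept = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill collects one particle.
            particle = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                particle.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(particle) >= min_size:
                for py, px in particle:
                    kept[py][px] = 1
    return kept


def extract_regions(image, roi_mask, background=255):
    """Set every pixel outside the ROI mask to the background value."""
    return [[p if m else background for p, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, roi_mask)]
```

Because of 8-connectivity, two pixels touching only at a corner count as one particle; a 4-connected rule would split them.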
Analysis
Evaluating an image extraction algorithm is inherently subjective and may be biased towards the author's preferences, because standard evaluation methods do not exist [28]. The main challenges are determining the true dataset (ground truth) and choosing appropriate performance metrics [29, 30]. For the purpose of this analysis, a manual evaluation of the tumour nests was used as the ground truth dataset.
To establish this ground truth, one of us (CN) manually evaluated printed photomicrographs of the 30 basal cell carcinoma images: 10 each of the nodular, infiltrative and superficial subtypes. In each image, all tumour nests present were manually delineated with a black marker; the annotated prints were then scanned and analyzed manually to produce binary ground-truth images.
A further challenge is the lack of standardized image extraction algorithms, because most existing algorithms are optimized for a specific task. This makes it difficult to benchmark the proposed algorithm, and the colour deconvolution approach in particular, against existing methods. To assess the effect of using colour deconvolution, the same set of histology images was therefore also analyzed using grayscale-based thresholding. In this comparison algorithm, each image was converted to an 8-bit grayscale image in place of the colour deconvolution step, and the remaining steps were carried out as described for the proposed algorithm.
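The grayscale conversion used by the comparison algorithm can be sketched as a per-pixel weighted sum. We assume ImageJ's default RGB-to-8-bit conversion is the unweighted mean of the three channels, with an optional weighted mode (luma weights 0.299, 0.587, 0.114) selectable in its conversion options; which mode was used is not stated above, so both are shown. `rgb_to_grey` is a hypothetical name.

```python
def rgb_to_grey(rgb_pixels, weighted=False):
    """Convert (R, G, B) pixels to 8-bit grey levels.

    weighted=False: unweighted mean of the three channels (assumed default).
    weighted=True: luma weights 0.299, 0.587, 0.114 (assumed optional mode).
    """
    wr, wg, wb = (0.299, 0.587, 0.114) if weighted else (1 / 3, 1 / 3, 1 / 3)
    # Adding 0.5 before int() rounds non-negative values to the nearest integer.
    return [int(r * wr + g * wg + b * wb + 0.5) for r, g, b in rgb_pixels]
```

Unlike colour deconvolution, this mixes haematoxylin and eosin contributions into a single intensity, so eosin-rich stroma can also fall below the threshold.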
The binary images of the algorithmically extracted tumour nests were subtracted from the binary images obtained by manual evaluation. The resulting image, containing the tumour areas not extracted by the algorithm, was considered to contain only false negative (FN) pixels. Similarly, the binary images of the manually extracted tumours were subtracted from the algorithmically extracted ones; the resulting image quantified the false positive (FP) pixels. The number of true positive (TP) pixels was then calculated by subtracting the number of false positives from the total number of pixels identified by the algorithm. Finally, the number of true negative (TN) pixels was calculated by subtracting both the number of pixels identified by the algorithm and the number of false negatives from the total number of pixels in the image.
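The pixel bookkeeping described above can be sketched with two binary masks, assuming 1 = tumour and 0 = background. Subtracting one binary image from another (with negative results clipped to zero) is equivalent to the set difference used here. `confusion_counts` is a hypothetical name.

```python
def confusion_counts(algorithm_mask, manual_mask):
    """Pixel counts (TP, FP, TN, FN) from two equally sized binary masks.

    Masks are lists of rows with 1 = tumour and 0 = background.
    """
    algo = [v for row in algorithm_mask for v in row]
    manual = [v for row in manual_mask for v in row]
    if len(algo) != len(manual):
        raise ValueError("masks must have the same size")
    total = len(algo)
    # Manual minus algorithm: tumour pixels the algorithm missed.
    fn = sum(1 for a, m in zip(algo, manual) if m and not a)
    # Algorithm minus manual: pixels wrongly labelled as tumour.
    fp = sum(1 for a, m in zip(algo, manual) if a and not m)
    positives = sum(1 for a in algo if a)  # all pixels identified by the algorithm
    tp = positives - fp
    tn = total - positives - fn
    return tp, fp, tn, fn
```

Deriving TP and TN by subtraction, as in the text, gives the same result as counting the agreeing pixels directly, since every pixel falls into exactly one of the four classes.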
Four metrics were calculated to assess the performance of the algorithm. The sensitivity evaluates the ability of the algorithm to identify pixels belonging to the tumour nests, and was calculated as follows:
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
The specificity evaluates the ability of the algorithm to correctly identify pixels not belonging to the tumour nests, and was calculated as follows:
$$\text{Specificity} = \frac{TN}{TN + FP}$$
The proportion of the histology slide occupied by the BCC may vary considerably between slides; in general, superficial tumours occupy a smaller fraction of the slide than the nodular and infiltrative subtypes. For this reason, the positive and negative predictive values were also calculated. The positive predictive value (PPV) indicates the probability that a pixel identified as positive belongs to an actual tumour; consequently, images with a lower tumour to non-tumour ratio tend to yield lower PPVs. Conversely, the negative predictive value (NPV) indicates the probability that a pixel identified as negative actually belongs to non-tumour tissue. The PPV and NPV were calculated as follows:
$$\text{PPV} = \frac{TP}{TP + FP}, \qquad \text{NPV} = \frac{TN}{TN + FN}$$
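The four metrics follow directly from the pixel counts TP, FP, TN and FN. The sketch below returns them together; returning NaN when a denominator is zero (e.g. an image in which no pixel is labelled positive) is our own choice, not something specified in the text. `performance_metrics` is a hypothetical name.

```python
def performance_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from pixel counts."""
    def ratio(num, den):
        # NaN signals an undefined metric instead of raising ZeroDivisionError.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```

As a hypothetical illustration of the dependence of PPV on tumour fraction: with 100 tumour pixels detected at 80% sensitivity and 98% specificity, the PPV is 80/98 ≈ 0.82 when the image contains 900 non-tumour pixels, but only 80/278 ≈ 0.29 when it contains 9,900, even though sensitivity and specificity are unchanged.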