| Abstract|| |
Introduction: The Pap stained cervical smear is a screening tool for cervical cancer. Commercial systems are used for automated screening of liquid based cervical smears. However, there is no image analysis software used for conventional cervical smears. The aim of this study was to develop software for analysis of conventional smears and to test its diagnostic accuracy. Materials and Methods: The software was developed using the Python programming language and open source libraries. It was standardized with images from the Bethesda Interobserver Reproducibility Project. One hundred and thirty images from smears which were reported Negative for Intraepithelial Lesion or Malignancy (NILM), and 45 images where some abnormality had been reported, were collected from the archives of the hospital. The software was then tested on the images. Results: The software was able to segregate images based on overall nuclear:cytoplasmic ratio, coefficient of variation (CV) in nuclear size, nuclear membrane irregularity, and clustering. 68.88% of abnormal images were flagged by the software, as well as 19.23% of NILM images. The major difficulties faced were segmentation of overlapping cell clusters and separation of neutrophils. Conclusion: The software shows potential as a screening tool for conventional cervical smears; however, further refinement in technique is required.
Keywords: Automated screening, cervical cytology, image analysis, pap smear
|How to cite this article:|
Sanyal P, Ganguli P, Barui S, Deb P. Pilot study of an open-source image analysis software for automated screening of conventional cervical smears. J Cytol 2018;35:71-4
|How to cite this URL:|
Sanyal P, Ganguli P, Barui S, Deb P. Pilot study of an open-source image analysis software for automated screening of conventional cervical smears. J Cytol [serial online] 2018 [cited 2018 Aug 19];35:71-4. Available from: http://www.jcytol.org/text.asp?2018/35/2/71/228211
| Introduction|| |
Cervical cancer is the second most common cancer in Indian women. The Papanicolaou stained cervical smear is the investigation of choice for screening of cervical cancer. Because it is a screening test, the sheer number of smears is often taxing for the pathologist to examine. Thus, there has been interest in automated screening of cervical smears. Presently, the Focal Point Slide Profiler (FSPS)® (BD Tripath Imaging) and the ThinPrep Imaging System® have been approved by the FDA as primary screening tools for cervical cytology smears. However, both of these systems are closed source and use liquid based cytology preparations made by a proprietary method. Neither of these devices operates on conventional cervical smears.
Analysis of conventional cervical smears has proved to be a difficult problem because of overlapping cell clusters. A study by Saieg et al. on automated image analysis of conventional smears reported a sensitivity and specificity of 100% and 70.3% respectively, concluding that automated analysis could analyze the majority of conventional smears. Bacus et al. have patented an apparatus for automated image analysis of conventional cervical smears. From India, a group of computer scientists from Bengaluru have reported 100% sensitivity and 90% specificity on automated analysis of conventional smears. Effort has also been made by Dey et al. to implement Artificial Neural Networks in categorization of benign and malignant cells, with good results. Considering the large number of laboratories still working with conventional smears, there is a dearth of literature regarding image analysis of conventional smears. In addition, the majority of these studies have been carried out using closed source proprietary software.
The aim of this pilot project was to develop an Open Source Image Analysis software for conventional cervical cytology smears, and to test whether it can distinguish between smears which fall in the broad category of “Negative for Intraepithelial Lesion or Malignancy” (“NILM” group) and those which have been reported to have some degree of abnormality (“Abnormal” group).
| Materials and Methods|| |
Initial development of software framework
Images were selected from a standard database (Bethesda Interobserver Reproducibility Project) for the purpose of development and standardization of the software. The software was developed using the Python programming language, the Numpy mathematical and statistical library, OpenCV, and the Scikit-Image image analysis library. Image analysis techniques such as thresholding, Gaussian blur, and difference of Hessian were used for segmentation of cell clusters and identification of nuclei.
The initial step is to segment the image into several connected regions which share at least one common pixel. The software then separates cellular area and nuclear area through successive filters [Figure 1]. The first parameter that is calculated is the N:C ratio of each region, from the overall cellular and nuclear area. It then applies a second level of filters, such as eccentricity of nuclear areas and minimum nuclear size, to arrive at the number of definitive nuclei. The software then calculates statistical parameters such as the coefficient of variation (CV) of nuclear areas. In addition, convexity defects are calculated for each nucleus identified. A convexity defect is defined as any irregularity in the nuclear membrane which is at least 5000 pixels deep. In this study, convexity defect was used as a measure of nuclear membrane irregularity.
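The two summary statistics described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: it assumes the N:C ratio is nuclear area divided by total cellular area and that the CV is the population standard deviation divided by the mean, neither of which is spelled out in the text. The function names and the example areas are hypothetical.

```python
from statistics import mean, pstdev


def nc_ratio(nuclear_area, cellular_area):
    """N:C ratio, taken here as nuclear area / total cellular area (an assumption)."""
    if cellular_area <= 0:
        raise ValueError("cellular_area must be positive")
    return nuclear_area / cellular_area


def coefficient_of_variation(nuclear_areas):
    """CV of nuclear areas as a fraction: population SD / mean (an assumption)."""
    if len(nuclear_areas) < 2:
        return 0.0
    m = mean(nuclear_areas)
    return pstdev(nuclear_areas) / m if m > 0 else 0.0


# Hypothetical region: areas in pixels, made up purely for illustration.
region_cell_area = 12000
region_nuclear_areas = [300, 320, 280, 1500]

ratio = nc_ratio(sum(region_nuclear_areas), region_cell_area)
cv = coefficient_of_variation(region_nuclear_areas)
print(f"N:C ratio = {ratio:.2f}, CV = {cv:.2f}")
```

A single very large nucleus among small ones raises the CV sharply, which is the kind of size variability the CV cutoff is meant to capture.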
|Figure 1: Steps of image analysis. (a) Original image (b) Segmented by regions (c) Single region (d) Cellular area (e) Nuclei by thresholding (f) Actual nuclei with clusters|
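The convexity-defect measure can be illustrated without any imaging library. The sketch below assumes a nucleus outline is available as an ordered list of distinct (x, y) points, computes its convex hull, and reports how far the outline dips below each hull edge. The helper names are hypothetical and this is not the study's implementation, which used OpenCV. Note that OpenCV's `convexityDefects` returns depth as a fixed-point integer (distance multiplied by 256), so a raw value of 5000 would correspond to about 19.5 pixels; whether the 5000 threshold quoted in the text is such a raw value is an assumption the source does not confirm.

```python
import math


def _cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull by Andrew's monotone chain (collinear points dropped)."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def convexity_defects(contour, min_depth):
    """Return (start, end, farthest_point, depth) for each defect >= min_depth.

    Depth is the perpendicular distance in pixels from the farthest contour
    point to the hull edge that bridges the concavity.
    """
    hull = set(convex_hull(contour))
    on_hull = [i for i, p in enumerate(contour) if p in hull]
    defects = []
    for k, i in enumerate(on_hull):
        j = on_hull[(k + 1) % len(on_hull)]
        a, b = contour[i], contour[j]
        edge = math.dist(a, b)
        # Contour points strictly between the two hull vertices (with wrap-around).
        span = (j - i) % len(contour)
        between = [contour[(i + s) % len(contour)] for s in range(1, span)]
        if not between or edge == 0:
            continue
        far = max(between, key=lambda p: abs(_cross(a, b, p)))
        depth = abs(_cross(a, b, far)) / edge
        if depth >= min_depth:
            defects.append((a, b, far, depth))
    return defects


# Hypothetical outline: a 10 x 10 square with one notch cut into its top edge.
notched = [(0, 0), (10, 0), (10, 10), (5, 4), (0, 10)]
for start, end, far, depth in convexity_defects(notched, min_depth=2.0):
    print(f"defect at {far}, depth {depth:.1f} px")
```

Counting the defects returned for every nucleus in an image gives the "number of significant convexity defects" used by the flagging criteria.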
A “cluster” is defined as a set of any two nuclei which are less than or equal to 10 pixels apart. The values were standardized for images at 40× magnification. Identification of a cluster serves both as a marker of cellular crowding and as a marker of high N:C ratio of the cells. This is because the nuclei of two cells can come close enough to form a cluster only when they are large relative to their cytoplasm.
The software then flags the image according to set criteria. The criteria which were found to be most specific in separating normal from abnormal images are as follows [Figure 2]:
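The cluster definition can be sketched as a pairwise distance check. The text does not say how the distance between two nuclei is measured, so this sketch assumes each nucleus is approximated by a circle (centroid and radius) and measures the gap between the two boundaries. The function name and the sample coordinates are hypothetical.

```python
import math
from itertools import combinations


def count_clusters(nuclei, max_gap=10.0):
    """Count pairs of nuclei whose boundaries are at most `max_gap` pixels apart.

    Each nucleus is (x, y, radius). The circular approximation and the
    boundary-to-boundary distance are assumptions of this sketch.
    """
    clusters = 0
    for (x1, y1, r1), (x2, y2, r2) in combinations(nuclei, 2):
        gap = max(0.0, math.hypot(x2 - x1, y2 - y1) - r1 - r2)
        if gap <= max_gap:
            clusters += 1
    return clusters


# Hypothetical nuclei: the first two sit 4 pixels apart, the third is far away.
nuclei = [(100, 100, 8), (120, 100, 8), (300, 250, 9)]
print(count_clusters(nuclei))
```

Because the 10-pixel value was standardized for 40× images, `max_gap` would need rescaling for images taken at another magnification or resolution.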
- N:C ratio in any region >0.5
- N:C ratio in entire image >0.5 AND more than 6 significant (i.e., 5000 pixels deep) convexity defects in the image
- N:C ratio in entire image >0.5 AND overall CV in image >1
- N:C ratio in entire image >0.5 AND number of nuclear clusters in image >6 AND overall CV in image >1.
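The four criteria above can be written as a small rule function. This is a sketch under stated assumptions: the `ImageFeatures` container, its field names, and `flag_reasons` are hypothetical, and the features are taken as already computed. Only the cutoffs (0.5, 1, and 6) come from the text.

```python
from dataclasses import dataclass


@dataclass
class ImageFeatures:
    max_region_nc: float    # highest N:C ratio among the segmented regions
    overall_nc: float       # N:C ratio of the entire image
    overall_cv: float       # CV of nuclear areas over the image
    convexity_defects: int  # number of significant convexity defects
    clusters: int           # number of nuclear clusters


def flag_reasons(f, nc_cutoff=0.5, cv_cutoff=1.0, defect_cutoff=6, cluster_cutoff=6):
    """Return the list of criteria that an image satisfies (empty if none)."""
    reasons = []
    if f.max_region_nc > nc_cutoff:
        reasons.append("regional N:C")
    if f.overall_nc > nc_cutoff:
        if f.convexity_defects > defect_cutoff:
            reasons.append("overall N:C + convexity defects")
        if f.overall_cv > cv_cutoff:
            reasons.append("overall N:C + CV")
            if f.clusters > cluster_cutoff:
                reasons.append("overall N:C + clusters + CV")
    return reasons


def is_flagged(f):
    """An image is flagged when at least one criterion is met."""
    return bool(flag_reasons(f))


# Hypothetical image: raised overall N:C and many clusters, but a low CV.
example = ImageFeatures(max_region_nc=0.4, overall_nc=0.6, overall_cv=0.5,
                        convexity_defects=0, clusters=10)
print(is_flagged(example), flag_reasons(example))
```

As the criteria are written, any image meeting the fourth also meets the third, so the fourth changes the reason recorded rather than the decision. Exposing the cutoffs as parameters shows how the same rules could be tuned toward higher sensitivity or higher specificity.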
The parameters for image segmentation were set to classify microscopic foci from a set of cervical smear images with known diagnosis, collected from the archives of this hospital. Random foci with a minimum cellularity of 10 cells in one high power field were photographed from each smear. The microscope used was a Dewinter Ultima FL with an attached microphotography system. All images were photographed at 40× magnification, after correction of white balance, under the same conditions of illumination. The initial resolution of the raw images was 2560 × 1920 pixels. After digital archiving, the images were resized to 640 × 480 pixels, preserving the aspect ratio. A set of 130 images belonging to the “NILM” group and 45 in the “abnormal” group were chosen for analysis.
Using standardization data from the previous run on images from Bethesda database, the parameters for analysis such as minimum nuclear size and cluster diameter were set. The software was then run on the set of images collected from the hospital archives.
| Results|| |
Of the 130 images analyzed in the NILM category, the software produced a flag in 25 images (19.23%). Seventeen of these images had a significantly high N:C ratio (N:C > 0.5) in one of the regions of the image; the rest showed a high degree of nuclear size variability (CV > 1) and an overall high N:C ratio. Clusters were seen in 10 images, but significant convexity defects were not detected in any of the images [Table 1].
Of the 45 abnormal images scanned, 14 (31.11%) were not flagged by the software [Table 1]. The rest of the images had one or more of the following: a high nuclear CV, a significantly high N:C ratio, nuclear convexity defects, or presence of clusters (>6). Significant convexity defects were found in 11 images.
A clear degree of separation was obtained using the overall N:C ratio of a smear, which was >0.5 in the majority of the abnormal images (73.33%), and <0.5 in the majority of NILM images (93.07%). It was thus found to be an important parameter in segregating normal foci from abnormal ones. However, the count of nuclei given by the software often did not correlate with the manual nuclei count, possibly because of interference from debris and neutrophils in the smear. Significant convexity defects were reported in no NILM images and in 11 abnormal images.
The sensitivity of the software in flagging abnormal foci was found to be 68.88%, and the specificity was found to be 80.76%. The corresponding false-positive rate among NILM images was 19.23%.
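The reported rates follow directly from the counts given in the Results. The short calculation below reconstructs the 2 × 2 table from those counts only (45 abnormal images with 14 not flagged, 130 NILM images with 25 flagged); the variable names are illustrative. The computed values agree with the reported figures to within rounding in the second decimal place.

```python
# Counts taken from the Results section.
abnormal_total, abnormal_missed = 45, 14
nilm_total, nilm_flagged = 130, 25

true_pos = abnormal_total - abnormal_missed  # abnormal images that were flagged
false_neg = abnormal_missed                  # abnormal images that were missed
false_pos = nilm_flagged                     # NILM images that were flagged
true_neg = nilm_total - nilm_flagged         # NILM images correctly not flagged

sensitivity = true_pos / (true_pos + false_neg)
specificity = true_neg / (true_neg + false_pos)
false_positive_rate = false_pos / (false_pos + true_neg)

print(f"sensitivity = {sensitivity:.2%}")
print(f"specificity = {specificity:.2%}")
print(f"false-positive rate = {false_positive_rate:.2%}")
```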
| Discussion|| |
Automated screening of liquid based cervical smears has been developing since the early 2000s. Good accuracy and reproducibility have been achieved in automated screening of liquid based cytology. A study from Shenzhen University, China, demonstrated on 21 cervical cell images that its segmentation method achieved 93% accuracy for cytoplasm and an 87.3% F-measure for nuclei.
Analysis and classification of single cells has been achieved by intensive image analysis modalities. In India, a study in 2016 used an ensemble classifier for Pap smear images, resulting in an accuracy of 98.11% and precision of 98.38% in whole smears and 99.01% in single cells. Using the Herlev database of single cell images, it achieved an accuracy of 96.51%. However, there have been concerns about false negatives and increasing cost of investigation, as most of the screening systems are closed source. An FDA approved system, AutoPap 300, is used in rescreening of normal Pap smears for quality control. When compared with biopsy results, the AutoPap 300 detected 77% of HSIL and 86% of cancer. An important limitation of AutoPap 300 for primary screening is its high false-positive rate, about 40% when reviewing normal slides.
All the aforementioned studies have been carried out on liquid based cytology smears. Only a few studies have been conducted on conventional smears, notably the study by Saieg et al. using the proprietary FocalPoint GS Imaging System (Becton Dickinson). However, in another study, the rate of flagging in this system was found to be high: only 16–17% of slides are archived without review by a pathologist.
Thus, there is a need for a robust, accurate and open source screening method for conventional smears. The major problems with conventional smears are segmentation of overlapping cells, and separation of cells from neutrophils and background debris. A variety of segmentation methods for cervical cells have been proposed earlier: K means, edge detection, and thresholding. The present software uses standard techniques like thresholding and Gaussian blur, as well as two additional parameters. The first, convexity defects in nuclei, identifies nuclear membrane irregularity, which is a hallmark of high grade lesions. There is no existing data on use of convexity defects as a screening technique. In this study, significant convexity defects were detected in 24.44% of abnormal images and in no NILM images. The validity of this parameter thus needs to be tested with a larger sample size.
The other technique, detection of clusters, was not effective in separating overlapping nuclei or in separating epithelial cells from neutrophils, and overall did not correlate well with the number of clusters counted manually in the image. There is scope for refinement of this technique to overcome these difficulties.
Instead of individual cell segmentation, the present software assesses a focus based on overall N:C ratio, which has shown good correlation with the final diagnosis. The CV of nuclear size also correlates well with the diagnosis, as only 6.1% of NILM images showed a raised CV and high N:C ratio.
While designing the decision tree for the software [Figure 2], a balance between sensitivity and specificity was kept in mind. An image is flagged primarily based on its N:C ratio plus other parameters, but a raised N:C ratio in the overall image alone is not sufficient to generate a flag. A flag is generated, and the focus sent for review, only in combination with other parameters. With a different decision tree, the software can be made either more sensitive or more specific, depending on specific requirements.
The present study showed that the software is able to distinguish between benign and malignant foci with approximately 69% sensitivity using a combination of parameters, although the number of false positives is significant (specificity approximately 81%). The diagnostic accuracy is much lower than that of closed source and commercial screening software. However, the software has shown the capability to operate on conventional cytological images. The major difficulties faced during the analysis of images were the overlapping nuclear clusters, segmentation of individual cells, and separation from neutrophils and debris.
| Conclusion|| |
The pilot study shows promise for further development of the software for screening of conventional cervical cytology images. However, further refinement of the algorithm to reduce the number of false positives, as well as better cell and nuclear segmentation, is required. A study with a larger sample size will be required to evaluate the software.
Financial support and sponsorship
None.
Conflicts of interest
There are no conflicts of interest.
| References|| |
Swaminathan S, Katoch VM, et al. Consolidated Report of Hospital Based Cancer Registries 2012-2014. Bengaluru: Indian Council of Medical Research; 2016 Mar.
Saieg MA, Motta TH, Fodra ME, Scapulatempo C, Longatto-Filho A, Stiepcich MM. Automated screening of conventional gynecological cytology smears: Feasible and reliable. Acta Cytol 2014;58:378-82.
Bacus, James W. Dual resolution method and apparatus for use in automated classification of pap smear and other samples. US patent 4,175,860 A. November 27, 1979.
Sreedevi MT, Usha BS, Sandya S. Papsmear Image based Detection of Cervical Cancer. Int J Comput Appl 2012;45.
Dey P, Banerjee N, Kaur R. Digital image classification with the help of artificial neural network by simple histogram. J Cytol 2016;33:63-5.
Nayar R, Solomon D. Second edition of 'The Bethesda System for reporting cervical cytology'- atlas, website, and Bethesda interobserver reproducibility project. Cytojournal 2004;1:4.
Zhang L, Kong H, Ting Chin C, Liu S, Fan X, Wang T, et al. Automation-assisted cervical cancer screening in manual liquid-based cytology with hematoxylin and eosin staining. Cytometry 2014;85:214-30.
Mariarputham EJ, Stephen A. Nominated Texture Based Cervical Cancer Classification. Computational and Mathematical Methods in Medicine 2015, Article ID 586928.
Bora K, Chowdhury M, Mahanta L, Kundu MK, Das AK. Automated classification of Pap smear images to detect cervical dysplasia. Comput Methods Programs Biomed 2017;138:31-47.
Birdsong GG. Automated screening of cervical cytology specimens. Hum Pathol 1996;27:468-81.
Wilbur DC, Bonfiglio TA, Rutkowski MA, Atkison KM, Richart RM, Lee JS, et al. Sensitivity of the AutoPap 300® system for cervical cytological abnormalities. Biopsy data confirmation. Acta Cytol 1996;40:127-32.
Wilbur DC, Prey MU, Miller WM, Pawlick GF, Colgan TJ. The AutoPap system for primary screening in cervical cytology. Comparing the results of the prospective intended-use study with routine manual practice. Acta Cytol 1998;42:214-22.
Tsai MH, Chan YK, Lin ZZ, Yang-Mao SF, Huang PC. Nucleus and cytoplast contour detector of cervical smear image. Pattern Recognit Lett 2008;29:1441-53.
Yang-Mao SF, Chan YK, Chu YP. Edge enhancement nucleus and cytoplast contour detector of cervical smear images. IEEE Trans Syst Man Cybern B Cybern 2008;38:353-66.
Bergmeir C, García Silvente M, Benítez JM. Segmentation of cervical cell nuclei in high-resolution microscopic images: A new algorithm and a web-based software framework. Comput Methods Programs Biomed 2012;107:497-512.
Dr. Prosenjit Ganguli
Department of Pathology, Command Hospital (EC), Alipore, Kolkata, West Bengal
Source of Support: None, Conflict of Interest: None