

Cervical Cancer Diagnosis based on Random Forest

Volume 13, Number 4, July 2017 - Paper 12 - pp. 446-457
DOI: 10.23940/ijpe.17.04.p12.446457

Guanglu Sun (a, b), Shaobo Li (a), Yanzhen Cao (a), Fei Lang (b)

(a) School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China
(b) Research Center of Information Security & Intelligent Technology, Harbin University of Science and Technology, Harbin 150080, China

(Submitted on January 29, 2017; Revised on April 12, 2017; Accepted on June 23, 2017)


Cervical cancer, whose incidence rate is increasing annually, is becoming the leading cause of death among women in China. Studies have shown, however, that early detection and accurate diagnosis of cervical cancer contribute to longer survival of patients. Because machine learning methods classify Pap smear cervical cell images effectively and accurately, they are a good substitute for manual diagnosis. In the present study, a framework for cervical cancer diagnosis is presented based on a random forest (RF) classifier with ReliefF feature selection. Through preprocessing, segmentation, and feature extraction, 20 features were obtained. In the feature selection phase, the 20 features were ranked by weight using ReliefF. In the classification phase, RF was used as the classifier and was trained on feature subsets of different dimensions. To examine the efficacy of the proposed method, the Herlev data set collected at Herlev University Hospital was used, in which 917 Pap smear images are categorized into two classes: normal and abnormal. Under 10-fold cross-validation, the RF classifier achieved its best classification performance with the top 13 features and outperformed Naive Bayes, C4.5, and Logistic Regression. The accuracy was 94.44%, and the AUC value was 0.9804. The results also confirmed the effectiveness of cytoplasm features in the classification.
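The feature-ranking step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a simplified, two-class ReliefF-style scorer written in plain Python, which assumes every sample is visited once, Manhattan distance on range-normalized features, and k = 3 neighbors. The function names `relieff_weights` and `top_features` and the toy data are hypothetical and stand in for the 20 extracted features; the RF classifier and the cross-validation are omitted.

```python
import random


def relieff_weights(X, y, k=3):
    """Score features with a simplified ReliefF-style update (two classes).

    X is a list of feature vectors, y a list of 0/1 labels.
    Assumption: each sample is used once and distances are Manhattan
    distances over range-normalized features.
    """
    n, d = len(X), len(X[0])
    # Per-feature value range, used to scale differences into [0, 1].
    spans = []
    for f in range(d):
        col = [row[f] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(f, a, b):
        return abs(a[f] - b[f]) / spans[f]

    def dist(a, b):
        return sum(diff(f, a, b) for f in range(d))

    weights = [0.0] * d
    for i in range(n):
        # All other samples, nearest first.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: dist(X[i], X[j]))
        hits = [j for j in others if y[j] == y[i]][:k]
        misses = [j for j in others if y[j] != y[i]][:k]
        for f in range(d):
            # Penalize features that differ among same-class neighbors,
            # reward features that differ across classes.
            if hits:
                weights[f] -= sum(diff(f, X[i], X[j]) for j in hits) / (n * len(hits))
            if misses:
                weights[f] += sum(diff(f, X[i], X[j]) for j in misses) / (n * len(misses))
    return weights


def top_features(weights, m):
    """Return the indices of the m highest-weighted features, best first."""
    return sorted(range(len(weights)), key=lambda f: weights[f], reverse=True)[:m]


# Toy data (not the Herlev set): feature 0 separates the classes,
# feature 1 is uniform noise.
rng = random.Random(0)
y = [i % 2 for i in range(40)]
X = [[label + rng.uniform(-0.2, 0.2), rng.uniform(0.0, 1.0)] for label in y]
weights = relieff_weights(X, y)
ranking = top_features(weights, 2)
print(ranking)
```

In a pipeline like the one the abstract outlines, a call such as `top_features(weights, 13)` would pick the feature subset on which the classifier is then trained and evaluated.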



