Int J Performability Eng, 2020, Vol. 16, Issue (8): 1183-1192. doi: 10.23940/ijpe.20.08.p5.11831192


A Hybrid Model of Predicting Breast Cancer Survivability based on Specific Stages

Huaiguang Wu^a, Pengjie Xie^a, Ming Cheng^b, and Hongwei Tao^a,*

  ^a School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou, 450001, China;
  ^b The First Affiliated Hospital, Zhengzhou University, Zhengzhou, 450001, China
  • Contact: *E-mail address: tthhww_811@163.com
  • About the authors:
    Huaiguang Wu received his Ph.D. in computing from Wuhan University in 2011. He currently works in the School of Computer and Communication Engineering at Zhengzhou University of Light Industry. He was previously a postdoctoral fellow at Peking University and a research visitor at the University of Edinburgh from 2017 to 2018. His research interests include formal methods, software engineering, and algorithms. (Email: hgwu@zzuli.edu.cn)
    Pengjie Xie is a postgraduate student at Zhengzhou University of Light Industry, currently researching medical big data. Her research interests include artificial intelligence, big data analysis, and machine learning. (Email: pengjx_0526@163.com)
    Ming Cheng received his Ph.D. from Wuhan University in 2016. He is currently a postdoctoral researcher at the First Affiliated Hospital of Zhengzhou University. His research interests include biomedical natural language processing (NLP) and healthcare data mining. (Email: fccchengm@zzu.edu.cn)
    Hongwei Tao received his Ph.D. from East China Normal University in 2011. He is currently an associate professor at Zhengzhou University of Light Industry. His research interests include formal methods and big data analysis. (Email: tthhww_811@163.com)

Abstract: Breast cancer is the most common cancer among women worldwide, and predicting breast cancer survivability is a complex and challenging task. Most previous efforts have used statistical or supervised learning approaches to predict patients' survival prospects. However, complex correlations among features and missing feature values both make survivability prediction more difficult. In this paper, we propose a novel hybrid model to improve patient survivability analysis. First, we apply principal component analysis (PCA) and K-means to form clusters of patients with similar characteristics, based on the different stages of breast cancer spread. Then, these clusters are used to train classification models for patient survivability prediction. Experimental results show that, compared with training on the original historical data, training on the identified patient cohorts further improves the survivability prediction accuracy of specific models.
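The abstract describes the hybrid pipeline only at a high level, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes each patient record carries a known stage label, that PCA and K-means are run separately within each stage, and that each resulting cluster gets its own classifier. The nearest-centroid classifier, the settings `n_components=2` and `k=2`, the synthetic data, and every function name (`fit_per_stage`, `predict_per_stage`, `fit_hybrid`, and so on) are hypothetical stand-ins chosen to keep the example NumPy-only; the paper's actual classifier, parameters, and data are not given here.

```python
import numpy as np

# Minimal sketch (assumptions, not the paper's method): PCA + K-means within each
# stage, then one nearest-centroid classifier per cluster. All names are hypothetical.


def nearest(points, centres):
    """Index of the closest centre (Euclidean distance) for every row of points."""
    return np.linalg.norm(points[:, None] - centres[None], axis=2).argmin(axis=1)


def pca(X, n_components):
    """Project X onto its top principal components via SVD of the centered data."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    comps = vt[:n_components]  # rows are principal directions, largest variance first
    return (X - mean) @ comps.T, mean, comps


def kmeans(Z, k, n_iter=100, seed=0):
    """Plain Lloyd's K-means; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest(Z, centroids)
        # Move each centroid to its cluster mean (keep it if the cluster is empty).
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


def fit_hybrid(X, y, n_components=2, k=2, seed=0):
    """Cluster one stage's patients in PCA space; fit a class-mean classifier per cluster."""
    Z, mean, comps = pca(X, n_components)
    labels, centroids = kmeans(Z, k, seed=seed)
    models = {}
    for j in range(k):
        Xj, yj = X[labels == j], y[labels == j]
        if len(yj):
            classes = np.unique(yj)
            models[j] = (classes, np.array([Xj[yj == c].mean(axis=0) for c in classes]))
    return mean, comps, centroids, models


def predict_hybrid(model, X):
    mean, comps, centroids, models = model
    valid = np.array(sorted(models))  # clusters that received training patients
    assign = valid[nearest((X - mean) @ comps.T, centroids[valid])]
    out = np.empty(len(X), dtype=int)  # assumes integer survival labels (e.g. 0/1)
    for j in valid:
        mask = assign == j
        if mask.any():
            classes, class_means = models[j]
            out[mask] = classes[nearest(X[mask], class_means)]
    return out


def fit_per_stage(X, y, stage, **kw):
    """One hybrid model per breast cancer stage (stage labels assumed to be given)."""
    return {s: fit_hybrid(X[stage == s], y[stage == s], **kw) for s in np.unique(stage)}


def predict_per_stage(models, X, stage):
    out = np.full(len(X), -1)  # -1 marks patients whose stage was unseen in training
    for s, m in models.items():
        mask = stage == s
        if mask.any():
            out[mask] = predict_hybrid(m, X[mask])
    return out


# Usage on synthetic data (hypothetical): two stages, each with two patient groups;
# survival (y) shifts feature 3 by +/-2 within each group.
def make_data(seed, n=50):
    rng = np.random.default_rng(seed)
    X, y, stage = [], [], []
    for s, offset in ((1, 0.0), (2, 50.0)):
        for centre in (0.0, 10.0):
            for label, shift in ((0, -2.0), (1, 2.0)):
                pts = rng.normal(0.0, 0.3, size=(n, 4)) + centre + offset
                pts[:, 3] += shift
                X.append(pts)
                y.append(np.full(n, label))
                stage.append(np.full(n, s))
    return np.vstack(X), np.concatenate(y), np.concatenate(stage)


X, y, stage = make_data(seed=0)
model = fit_per_stage(X, y, stage, n_components=2, k=2)
X_new, y_new, stage_new = make_data(seed=1)
train_acc = np.mean(predict_per_stage(model, X, stage) == y)
test_acc = np.mean(predict_per_stage(model, X_new, stage_new) == y_new)
print(f"train accuracy {train_acc:.2f}, held-out accuracy {test_acc:.2f}")
```

In this sketch a new patient is routed first by stage, then to the nearest cluster centroid in that stage's PCA space, and finally classified by that cluster's own model. Substituting a different per-cluster classifier would only require changing the class-mean fitting in `fit_hybrid` and the matching lookup in `predict_hybrid`.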

Key words: breast cancer survivability prediction, PCA, K-means, hybrid model