Enhancing Subcellular Localization Prediction of Apoptosis Proteins by Ensemble SVMs with Random Under-Sampling

Volume 14, Number 7, July 2018, pp. 1635-1640
DOI: 10.23940/ijpe.18.07.p28.16351640

Xiao Wang, Xiaohe Li, Hui Li, Hongwei Tao, Rong Wang, and Yinghui Meng

School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou, 450002, China

(Submitted on April 11, 2018; Revised on May 21, 2018; Accepted on June 16, 2018)


The subcellular locations of apoptosis proteins determine their biological functions, so identifying these locations is a necessary first step. In recent years, researchers have proposed many prediction methods designed specifically for apoptosis proteins. However, most of these methods share two problems: (1) they represent features with sequence-based rather than annotation-based methods; (2) they ignore the negative impact of an imbalanced training dataset. In this work, a predictor named GOIL-Apo is proposed to address both issues and to yield balanced predictions of apoptosis protein locations. First, apoptosis proteins are represented as feature vectors using gene ontology (GO) based methods. Then, an ensemble classifier that fuses multiple SVMs, each trained with random under-sampling, is used to handle the data imbalance. Rigorous cross-validation shows that GOIL-Apo achieves much higher accuracy than existing state-of-the-art predictors.
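
The under-sampling ensemble described in the abstract can be illustrated with a minimal sketch. It is not the GOIL-Apo implementation: the toy binary vectors only stand in for GO feature vectors, the class labels and all function names are hypothetical, and a nearest-centroid learner replaces the SVM so that the example needs no third-party library. An SVM could presumably be plugged in through the `make_base` argument. Each ensemble member is trained on a balanced subset drawn by randomly discarding samples from the larger classes, and the members' predictions are fused by majority vote.

```python
import random
from collections import Counter, defaultdict


def undersample(X, y, rng):
    """Randomly keep as many samples per class as the smallest class has."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    n_min = min(len(rows) for rows in by_class.values())
    Xb, yb = [], []
    for label, rows in by_class.items():
        for features in rng.sample(rows, n_min):
            Xb.append(features)
            yb.append(label)
    return Xb, yb


class CentroidClassifier:
    """Stand-in base learner (NOT an SVM): predicts the nearest class centroid."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for features, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(features))
            for i, value in enumerate(features):
                acc[i] += value
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict_one(self, features):
        def squared_distance(c):
            return sum((a - b) ** 2 for a, b in zip(features, self.centroids[c]))
        return min(self.centroids, key=squared_distance)


def fit_ensemble(X, y, n_members=5, seed=0, make_base=CentroidClassifier):
    """Train each member on its own randomly under-sampled, balanced subset."""
    rng = random.Random(seed)
    return [make_base().fit(*undersample(X, y, rng)) for _ in range(n_members)]


def predict_ensemble(members, features):
    """Fuse the members' predictions by majority vote."""
    votes = Counter(m.predict_one(features) for m in members)
    return votes.most_common(1)[0][0]


# Hypothetical imbalanced toy data: 8 "cytoplasmic" vs. 2 "mitochondrial" samples.
X = [[1, 0, 0]] * 6 + [[1, 1, 0]] * 2 + [[0, 0, 1]] * 2
y = ["cytoplasmic"] * 8 + ["mitochondrial"] * 2

members = fit_ensemble(X, y)
print(predict_ensemble(members, [0, 0, 1]))
print(predict_ensemble(members, [1, 0, 0]))
```

Because every member sees the same number of samples from each class, no single member is dominated by the majority class, while drawing a different random subset per member limits the information lost by discarding majority-class samples.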


