

Clustering-Based Feature Selection Framework for Microarray Data 

Volume 13, Number 4, July 2017 - Paper 5 - pp. 383-389
DOI: 10.23940/ijpe.17.04.p5.383389

Smita Chormunge (a) and Sudarson Jena (b)

(a) Research Scholar, Department of Computer Science and Engineering, GITAM University, Hyderabad, INDIA
(b) Department of Information Technology, GITAM University, Hyderabad, INDIA

(Submitted on December 4, 2016; Revised on May 7, 2017; Accepted on June 18, 2017)


Gene expression data contain hundreds to thousands of features. It is challenging for machine learning algorithms to extract the relevant information from such large and correlated data. Irrelevant and redundant features are computationally costly and decrease the accuracy of machine learning algorithms. Feature selection plays an important role in addressing this dimensionality problem, but most traditional feature selection algorithms fail to scale to high-dimensional problems. In this paper, a Clustering-based Feature Selection Framework (CFSF) is proposed. CFSF produces an optimal feature subset by eliminating irrelevant features using a clustering algorithm and redundant features by applying a filter measure to each cluster. Extensive experiments on microarray datasets compare the proposed framework with other representative methods using two classifiers, Naive Bayes and Instance Based. The empirical study demonstrates that the proposed framework is efficient and effective at producing an optimal feature subset and that it improves classifier performance.
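The abstract does not name the clustering algorithm or the filter measure, so the following is only a minimal sketch of the general idea (group features into clusters, then keep one feature per cluster according to a filter score), not the authors' CFSF implementation. It assumes k-means over feature columns and absolute Pearson correlation with the class label as a stand-in filter measure. All function names and the toy data are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def kmeans_features(columns, k, iters=20, seed=0):
    """Assign each feature column to one of k clusters (plain k-means, squared Euclidean distance)."""
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(columns, k)]
    assign = [0] * len(columns)
    for _ in range(iters):
        # Assignment step: nearest centroid for every feature column.
        for i, col in enumerate(columns):
            assign[i] = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(col, centroids[j])),
            )
        # Update step: mean of member columns; an empty cluster keeps its old centroid.
        for j in range(k):
            members = [columns[i] for i in range(len(columns)) if assign[i] == j]
            if members:
                centroids[j] = [sum(v) / len(members) for v in zip(*members)]
    return assign


def select_features(X, y, k):
    """Return sorted indices of one feature per non-empty cluster.

    Assumption: the filter measure is |Pearson correlation| with the label;
    the paper may use a different measure.
    """
    columns = [list(col) for col in zip(*X)]
    assign = kmeans_features(columns, k)
    selected = []
    for j in range(k):
        members = [i for i, a in enumerate(assign) if a == j]
        if members:
            selected.append(max(members, key=lambda i: abs(pearson(columns[i], y))))
    return sorted(selected)


# Toy data (made up): rows are samples, columns are features.
X = [
    [0.0, 0.1, 5.0, 5.1],
    [0.0, 0.0, 1.0, 1.2],
    [0.0, 0.2, 4.0, 3.9],
    [1.0, 0.9, 2.0, 2.1],
    [1.0, 1.1, 5.0, 4.8],
    [1.0, 0.8, 1.0, 1.1],
]
y = [0, 0, 0, 1, 1, 1]
print(select_features(X, y, k=2))
```

The number of clusters `k` bounds the size of the selected subset, so in this sketch it acts as the main tuning knob. A classifier such as Naive Bayes could then be trained on only the selected columns.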



