

An Information Flow-based Feature Selection Method for Cross-Project Defect Prediction

Volume 14, Number 6, June 2018, pp. 1263-1274
DOI: 10.23940/ijpe.18.06.p17.12631274

Yaning Wu, Song Huang, and Haijin Ji

Research Center of Software Engineering, Army Engineering University of PLA, Nanjing, 210001, China

(Submitted on March 12, 2018; Revised on April 17, 2018; Accepted on May 8, 2018)


Software defect prediction (SDP) plays a significant role in identifying the most defect-prone modules before software testing and in allocating limited testing resources. Classification is one of the most commonly used approaches in SDP. To achieve accurate predictions, classification models must first be trained appropriately. Training data are typically obtained from historical software repositories, and their quality can strongly affect classification performance. To improve data quality, we propose a novel software feature selection method that uses information flow to perform causality analysis on the features of the training datasets. More specifically, we conduct causality analysis between each feature metric and the class label (the bug metric); then, based on the resulting feature ranking list, we select the top-k features to control redundancy. Finally, we choose the most suitable feature subset based on the F-measure. To demonstrate the effectiveness and practicability of the method, we use the nearest neighbor approach to construct a homogeneous training dataset and use three commonly used classification models to conduct comparison experiments. The experimental results verify the usability and validity of the proposed feature selection method.
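The abstract does not state the information-flow formula. As a minimal sketch, assuming the method follows Liang's 2014 maximum-likelihood estimator, the flow from a feature X2 to the label X1 is T(2→1) = (C11·C12·C2,d1 − C12²·C1,d1) / (C11²·C22 − C11·C12²), where Cij is the sample covariance of Xi and Xj, and Ci,d1 is the covariance of Xi with the forward difference of X1. The code below ranks features by |T|. Treating module rows as an ordered sequence, ranking by absolute value, and the helper names `information_flow` and `rank_features` are all assumptions for illustration, not the authors' implementation.

```python
"""Rank features by an estimate of Liang's information flow toward the bug label.

Assumption: module rows are treated as an ordered sequence so that a forward
difference of the label can be taken; the paper's exact adaptation may differ.
"""
from typing import List, Sequence, Tuple


def _cov(a: Sequence[float], b: Sequence[float]) -> float:
    """Sample covariance with an (n - 1) denominator."""
    n = len(a)
    ma = sum(a) / n
    mb = sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def information_flow(cause: Sequence[float], effect: Sequence[float], dt: float = 1.0) -> float:
    """Estimate T(cause -> effect) with Liang's (2014) maximum-likelihood formula.

    X1 is the effect (e.g. the bug label), X2 the cause (a feature metric).
    dX1 is the Euler forward difference of X1, aligned with X1[:-1] and X2[:-1].
    """
    x1 = list(effect[:-1])
    x2 = list(cause[:-1])
    dx1 = [(effect[i + 1] - effect[i]) / dt for i in range(len(effect) - 1)]
    c11 = _cov(x1, x1)
    c22 = _cov(x2, x2)
    c12 = _cov(x1, x2)
    c1d1 = _cov(x1, dx1)
    c2d1 = _cov(x2, dx1)
    denom = c11 ** 2 * c22 - c11 * c12 ** 2
    if denom == 0:
        # Constant or perfectly collinear series: the estimator is undefined.
        return 0.0
    return (c11 * c12 * c2d1 - c12 ** 2 * c1d1) / denom


def rank_features(features: List[List[float]], labels: Sequence[float]) -> List[Tuple[int, float]]:
    """Return (feature index, |T(feature -> label)|) pairs, highest flow first."""
    n_features = len(features[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in features]
        scores.append((j, abs(information_flow(column, labels))))
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

A constant feature carries no information about the label, so the estimator's denominator vanishes and the sketch assigns it a score of zero, which places it at the bottom of the ranking.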
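The last step described in the abstract, choosing the top-k subset by F-measure, can be read as a wrapper search over prefixes of the feature ranking. The sketch below assumes that reading. `select_top_k` and its `evaluate` callback are hypothetical: in practice the callback would train one of the classifiers on the candidate subset and return its F-measure on held-out data. F-measure here is the usual harmonic mean of precision and recall for the defective class (label 1).

```python
from typing import Callable, List, Sequence, Tuple


def f_measure(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2PR / (P + R) for the defective class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_top_k(
    ranked: Sequence[int],
    evaluate: Callable[[List[int]], float],
) -> Tuple[List[int], float]:
    """Evaluate every prefix of the ranking and keep the best-scoring one.

    `evaluate` (hypothetical) maps a list of feature indices to an F-measure.
    """
    best_subset: List[int] = []
    best_score = -1.0
    for k in range(1, len(ranked) + 1):
        subset = list(ranked[:k])
        score = evaluate(subset)
        if score > best_score:  # strict '>' keeps the smaller k on ties
            best_subset, best_score = subset, score
    return best_subset, best_score
```

Keeping the smallest k on ties is an illustrative choice that favours fewer features, in line with the stated goal of controlling redundancy.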
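The abstract does not specify how the nearest neighbor approach builds the homogeneous training dataset. One common choice in cross-project defect prediction is the NN filter of Turhan et al., which keeps, for every target-project module, its k nearest source-project modules by Euclidean distance (they used k = 10). The sketch below assumes that filter; `nn_filter` is a hypothetical name, and the feature scaling often applied beforehand is omitted.

```python
import math
from typing import List, Sequence


def nn_filter(
    source: Sequence[Sequence[float]],
    target: Sequence[Sequence[float]],
    k: int = 10,
) -> List[int]:
    """Return sorted indices of source rows that are among the k nearest
    neighbours (Euclidean distance) of at least one target row."""
    selected = set()
    for t in target:
        # (distance, index) pairs; ties are broken by the lower index.
        distances = sorted((math.dist(s, t), i) for i, s in enumerate(source))
        selected.update(i for _, i in distances[:k])
    return sorted(selected)
```

The union of neighbours removes duplicates, so the filtered training set contains each selected source module once; the feature selection steps would then operate on these rows.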



