

Data Complexity Analysis for Software Defect Detection

Volume 14, Number 8, August 2018, pp. 1695-1704
DOI: 10.23940/ijpe.18.08.p5.16951704

Ying Ma (a,b,*), Yichang Li (a), Junwen Lu (a), Peng Sun (c), Yu Sun (d), and Xiatian Zhu (e)

(a) Xiamen University of Technology, Xiamen, 361024, China
(b) Engineering Research Center for Software Testing and Evaluation of Fujian Province, Xiamen, 361024, China
(c) University of Electronic Science and Technology of China, Chengdu, 610054, China
(d) Xiamen Institute of Software Technology, Xiamen, 361000, China
(e) Queen Mary, University of London, London, E1 4NS, UK

(Submitted on May 11, 2018; Revised on June 20, 2018; Accepted on July 26, 2018)


Most defect detection research assumes that the training data and the future test data share the same feature space and the same distribution. In practical applications, however, data sets come from different domains and follow different distributions. Local data in the target project are sometimes limited, and the data are usually affected by noise. In these cases, the performance of a software defect detection model is uncertain. First, we introduce the concept of data complexity from the data mining field into software engineering. Second, we investigate data complexity measures on public software data sets to determine which complexity metrics are appropriate for defect detection. Finally, we analyze the relationship between complexity metrics and model performance to gain insight into the effects of data complexity on defect detection. We are optimistic that our method can provide decision-making support for the management and design of detection models.
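The abstract does not say which complexity measures the study uses. As an illustration only, the sketch below computes one widely used data complexity measure, the maximum Fisher's discriminant ratio over all features, for a two-class data set. The function name `fisher_ratio` and the toy module metrics (lines of code, cyclomatic complexity) are assumptions made for this example and are not taken from the paper.

```python
from statistics import mean, pvariance


def fisher_ratio(defective, clean):
    """Maximum Fisher's discriminant ratio over all features (two classes).

    For each feature, the ratio is (mean_a - mean_b)^2 / (var_a + var_b).
    A larger value suggests that at least one feature separates the
    classes well, i.e. the classification problem is less complex.
    """
    n_features = len(defective[0])
    best = 0.0
    for j in range(n_features):
        a = [row[j] for row in defective]
        b = [row[j] for row in clean]
        gap = (mean(a) - mean(b)) ** 2
        spread = pvariance(a) + pvariance(b)
        if spread == 0:
            # Constant feature in both classes: perfectly separable
            # if the means differ, uninformative otherwise.
            ratio = float("inf") if gap > 0 else 0.0
        else:
            ratio = gap / spread
        best = max(best, ratio)
    return best


# Hypothetical toy data: (lines of code, cyclomatic complexity) per module.
defective = [(120, 14), (150, 18), (130, 16)]
clean = [(40, 3), (60, 5), (50, 4)]

print(fisher_ratio(defective, clean))
```

Computing such a value for each data set and comparing it with a detector's measured performance is one way to explore the kind of relationship the abstract describes. Population variance is used here for simplicity; a sample-variance version would differ only by a scaling factor.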



