
 

Impact of Hyper Parameter Optimization for Cross-Project Software Defect Prediction

Volume 14, Number 6, June 2018, pp. 1291-1299
DOI: 10.23940/ijpe.18.06.p20.12911299

Yubin Qu^a, Xiang Chen^b, Yingquan Zhao^b, and Xiaolin Ju^b

^a School of Mechanical and Electrical Engineering, Jiangsu College of Engineering and Technology, Nantong, 212003, China
^b School of Computer Science and Technology, Nantong University, Nantong, 226000, China

(Submitted on March 2, 2018; Revised on April 16, 2018; Accepted on May 19, 2018)

Abstract:

Most studies to date have used the default hyper parameter values of the classification methods underlying cross-project defect prediction (CPDP). However, previous studies on within-project defect prediction (WPDP) found that hyper parameter optimization helps to improve the performance of defect prediction models. Moreover, the default values of some hyper parameters are not consistent across machine learning libraries (such as Weka and Scikit-learn). To the best of our knowledge, we are the first to conduct an in-depth analysis of how hyper parameter optimization influences CPDP performance. Based on different classification methods, we consider 5 instance selection based CPDP methods in total. In our empirical studies, we choose 8 projects from the AEEEM and Relink datasets as evaluation subjects and use AUC as the model performance measure. The final results show that the influence of hyper parameter optimization is non-negligible for 4 of these methods. Of the 11 hyper parameters used by the 5 classification methods, 8 have a non-negligible influence, and these are mainly found in the support vector machine and k-nearest neighbor classifiers. Meanwhile, an analysis of the actual computational cost shows that the time spent on hyper parameter optimization is within an acceptable range. These empirical results indicate that future CPDP research should take hyper parameter optimization into account in its experimental design.
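
To make the setup concrete, the following is a minimal sketch, not the authors' exact pipeline, of hyper parameter optimization for one of the classifier families studied here (a support vector machine), using Scikit-learn's GridSearchCV with AUC as the selection criterion. The random feature matrices and the train-on-source, test-on-target split are hypothetical stand-ins for the paper's instance selection based CPDP methods and the AEEEM/Relink projects.

import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical stand-ins for CPDP data: rows are modules, columns are metrics.
rng = np.random.RandomState(0)
X_source, y_source = rng.rand(200, 20), rng.randint(0, 2, 200)  # source project
X_target, y_target = rng.rand(100, 20), rng.randint(0, 2, 100)  # target project

# Tune the SVM hyper parameters C and gamma on the source project, selecting
# by cross-validated AUC. Note that defaults differ across libraries: e.g.,
# Scikit-learn's SVC defaults to an RBF kernel while Weka's SMO defaults to a
# polynomial kernel, which is the inconsistency the abstract points out.
param_grid = {"svc__C": [0.1, 1, 10, 100], "svc__gamma": [0.01, 0.1, 1]}
pipeline = make_pipeline(StandardScaler(), SVC(probability=True))
search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=5)
search.fit(X_source, y_source)

# Evaluate the tuned model on the (different) target project, as in CPDP.
probs = search.predict_proba(X_target)[:, 1]
print("Best hyper parameters:", search.best_params_)
print("Target-project AUC: %.3f" % roc_auc_score(y_target, probs))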

 

References: 32

        1. M. D’Ambros, M. Lanza, and R. Robbes, “An extensive comparison of bug prediction approaches,” in Proceedings of International Conference on Mining Software Repositories, 2010, pp. 31–41.
        2. X. Chen, Y. Zhao, Q. Wang, and Z. Yuan, “MULTI: Multi-objective effort-aware just-in-time software defect prediction,” Information and Software Technology, vol. 93, pp. 1–13, 2018.
        3. S. R. Chidamber and C. F. Kemerer, “A metrics suite for object oriented design,” IEEE Transactions on Software Engineering, vol. 20, no. 6, pp. 476–493, 1994.
        4. J. Cohen, Statistical Power Analysis for the Behavioral Sciences, 2nd ed. Hillsdale, NJ, USA: Lawrence Erlbaum Associates, 1988.
        5. D. A. da Costa, S. McIntosh, W. Shang, U. Kulesza, R. Coelho, and A. E. Hassan, “A framework for evaluating the results of the SZZ approach for identifying bug-introducing changes,” IEEE Transactions on Software Engineering, vol. PP, no. 99, pp. 1–1, 2016.
        6. T. Hall, S. Beecham, D. Bowes, D. Gray, and S. Counsell, “A systematic literature review on fault prediction performance in software engineering,” IEEE Transactions on Software Engineering, vol. 38, no. 6, pp. 1276–1304, 2012.
        7. M. H. Halstead, Elements of Software Science (Operating and Programming Systems Series). New York, NY, USA: Elsevier Science Inc., 1977.
        8. S. Herbold, A. Trautsch, and J. Grabowski, “A comparative study to benchmark cross-project defect prediction approaches,” IEEE Transactions on Software Engineering, vol. PP, no. 99, pp. 1–1, 2017.
        9. S. Hosseini, B. Turhan, and D. Gunarathna, “A systematic literature review and meta-analysis on cross project defect prediction,” IEEE Transactions on Software Engineering, vol. PP, no. 99, pp. 1–1, 2017.
        10. X. Jing, F. Wu, X. Dong, F. Qi, and B. Xu, “Heterogeneous cross-company defect prediction by unified metric representation and CCA-based transfer learning,” in Proceedings of the joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, 2015, pp. 496–507.
        11. Y. Jiang, B. Cukic, and T. Menzies, “Can data transformation help in the detection of fault-prone modules?” in Proceedings of The Workshop on Defects in Large Software Systems, 2008, pp. 16–20.
        12. Y. Kamei and E. Shihab, “Defect prediction: Accomplishments and future challenges,” in Proceedings of International Conference on Software Analysis, Evolution, and Reengineering, 2016, pp. 33–45.
        13. T. Lee, J. Nam, D. Han, S. Kim, and H. P. In, “Developer micro interaction metrics for software defect prediction,” IEEE Transactions on Software Engineering, vol. 42, no. 11, pp. 1015–1035, 2016.
        14. S. Liu, X. Chen, W. Liu, J. Chen, Q. Gu, and D. Chen, “FECAR: A feature selection framework for software defect prediction,” in Proceedings of Annual Computer Software and Applications Conference, 2014, pp. 426–435.
        15. W. Liu, S. Liu, Q. Gu, J. Chen, X. Chen, and D. Chen, “Empirical studies of a two-stage data preprocessing approach for software fault prediction,” IEEE Transactions on Reliability, vol. 65, no. 1, pp. 38–53, 2016.
        16. W. Liu, S. Liu, Q. Gu, X. Chen, and D. Chen, “FECS: A cluster based feature selection method for software fault prediction with noises,” in Proceedings of Annual Computer Software and Applications Conference, 2015, pp. 276–281.
        17. Y. Ma, G. Luo, X. Zeng, and A. Chen, “Transfer learning for cross-company software defect prediction,” Information and Software Technology, vol. 54, no. 3, pp. 248–256, 2012.
        18. T. J. McCabe, “A complexity measure,” IEEE Transactions on Software Engineering, vol. 2, no. 4, pp. 308–320, 1976.
        19. T. Mende, “Replication of defect prediction studies: problems, pitfalls and recommendations,” in Proceedings of International Conference on Predictive Models in Software Engineering, 2010, p. 5.
        20. J. Nam, S. J. Pan, and S. Kim, “Transfer defect learning,” in Proceedings of the International Conference on Software Engineering, 2013, pp. 382–391.
        21. J. Nam and S. Kim, “Heterogeneous defect prediction,” in Proceedings of the joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, 2015, pp. 508–519.
        22. J. Nam and S. Kim, “CLAMI: Defect prediction on unlabeled datasets,” in Proceedings of International Conference on Automated Software Engineering, 2015, pp. 452–463.
        23. C. Ni, W. S. Liu, X. Chen, Q. Gu, D. X. Chen, and Q. G. Huang, “A cluster based feature selection method for cross-project software defect prediction,” Journal of Computer Science and Technology, vol. 32, no. 6, pp. 1090–1107, 2017.
        24. C. Tantithamthavorn, “Towards a better understanding of the impact of experimental components on defect prediction modelling,” in Proceedings of International Conference on Software Engineering, 2017, pp. 867–870.
        25. C. Tantithamthavorn, S. McIntosh, A. E. Hassan, and K. Matsumoto, “Automated parameter optimization of classification techniques for defect prediction models,” in Proceedings of the International Conference on Software Engineering, 2016, pp. 321–332.
        26. A. Tosun and A. Bener, “Reducing false alarms in software defect prediction by decision threshold optimization,” in Proceedings of International Symposium on Empirical Software Engineering and Measurement, 2009, pp. 477–480.
        27. B. Turhan, T. Menzies, A. B. Bener, and J. D. Stefano, “On the relative value of cross-company and within-company data for defect prediction,” Empirical Software Engineering, vol. 14, no. 5, pp. 540–578, 2009.
        28. R. Wu, H. Zhang, S. Kim, and S.-C. Cheung, “Relink: Recovering links between bugs and changes,” in Proceedings of the joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, 2011, pp. 15–25.
        29. X. Xia, D. Lo, S. J. Pan, N. Nagappan, and X. Wang, “HYDRA: Massively compositional model for cross-project defect prediction,” IEEE Transactions on Software Engineering, vol. 42, no. 10, pp. 977–998, 2016.
        30. H. Zhang, “An investigation of the relationships between lines of code and defects,” in Proceedings of International Conference on Software Maintenance, 2009, pp. 274–283.
        31. F. Zhang, Q. Zheng, Y. Zou, and A. E. Hassan, “Cross-project defect prediction using a connectivity-based unsupervised classifier,” in Proceedings of the International Conference on Software Engineering, 2016, pp. 309–320.
        32. T. Zimmermann, N. Nagappan, H. Gall, E. Giger, and B. Murphy, “Cross-project defect prediction: a large scale experiment on data vs. domain vs. process,” in Proceedings of Joint Meeting of the European Software Engineering Conference and the International Symposium on Foundations of Software Engineering, 2009, pp. 91–100.

               


Attachments: IJPE-2018-06-20.pdf (548 KB)
               