

Challenges of Testing Machine Learning Applications

Volume 14, Number 6, June 2018, pp. 1275-1282
DOI: 10.23940/ijpe.18.06.p18.12751282

Song Huang (a), Er-Hu Liu (b), Zhan-Wei Hui (a), Shi-Qi Tang (b), and Suo-Juan Zhang (b)

(a) Software Testing and Evaluation Centre, Army Engineering University of PLA, Nanjing, 210001, China
(b) Command & Control Engineering College, Army Engineering University of PLA, Nanjing, 210001, China

(Submitted on March 21, 2018; Revised on April 20, 2018; Accepted on May 16, 2018)


Machine learning applications have achieved impressive results in many areas and provide effective solutions to problems such as image recognition, autonomous driving, and speech processing. As these applications are adopted in more and more critical areas, their reliability and robustness become increasingly important. Software testing is a standard way to ensure the quality of applications, so approaches for testing machine learning applications are needed. This paper analyzes the characteristics of several machine learning algorithms and summarizes the main challenges of testing machine learning applications. It then presents several preliminary techniques that address these challenges and demonstrates how they can be used to solve problems that arise when testing machine learning applications.
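The abstract does not spell out the techniques. As an illustration of the kind of approach used against the oracle problem in machine learning testing, here is a minimal sketch of metamorphic testing. It assumes a hypothetical toy classifier under test (`knn_predict`, a plain Euclidean k-nearest-neighbour vote) and two assumed metamorphic relations that hold for it: permuting the order of the training samples, and shifting every feature of the training data and the query by the same constant, should both leave the prediction unchanged. This is not presented as the paper's own method; the function names and sample data are illustrative.

```python
import random
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Hypothetical classifier under test: a plain k-nearest-neighbour vote."""
    # Sort training points by (squared Euclidean distance, label); the label
    # breaks distance ties, so the result does not depend on input order.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    votes = Counter(y for _, y in dists[:k])
    # Break vote ties deterministically by choosing the smallest label.
    best = max(votes.values())
    return min(label for label, n in votes.items() if n == best)


def check_metamorphic_relations(train_x, train_y, tests, shift=5, seed=0):
    """Return (relation, query) pairs for every assumed relation that is violated."""
    rng = random.Random(seed)
    violations = []
    for q in tests:
        source = knn_predict(train_x, train_y, q)

        # MR1 (assumed): permuting the training samples must not change the output.
        idx = list(range(len(train_x)))
        rng.shuffle(idx)
        permuted = knn_predict([train_x[i] for i in idx],
                               [train_y[i] for i in idx], q)
        if permuted != source:
            violations.append(("permutation", q))

        # MR2 (assumed): adding the same constant to every feature of the training
        # data and the query preserves all distances, so the output must not change.
        shifted = knn_predict([[v + shift for v in x] for x in train_x],
                              train_y, [v + shift for v in q])
        if shifted != source:
            violations.append(("translation", q))
    return violations


# Illustrative integer data (integers keep the shifted distances exact).
train_x = [[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]]
train_y = [0, 0, 0, 1, 1, 1]
tests = [[0, 0], [1, 1], [5, 5], [3, 3], [6, 6]]
print(check_metamorphic_relations(train_x, train_y, tests))  # prints []
```

No expected label is needed for any query: the relations compare the classifier only with itself, which is what makes metamorphic testing useful when a test oracle is unavailable. A faulty implementation (for example, one that breaks distance ties by training order) could show up as a reported violation.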



