
A Novel Ensemble Classification for Data Streams with Class Imbalance and Concept Drift

Volume 13, Number 6, October 2017 - Paper 15  - pp. 945-955
DOI: 10.23940/ijpe.17.06.p15.945955

Yange Sun (a, b), Zhihai Wang (a, *), Hongtao Li (a), Yao Li (a)

(a) School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
(b) School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China

(Submitted on July 25, 2017; Revised on August 30, 2017; Accepted on September 15, 2017)

(This paper was presented at the Third International Symposium on System and Software Reliability.)


Processing streaming data imposes new requirements: restricted processing time, a limited amount of memory, and a single scan of incoming instances. One of the biggest challenges in data stream learning is dealing with concept drift, i.e., the underlying distribution of the data may evolve over time. Most approaches in the literature assume that the class distribution is balanced. Unfortunately, class imbalance is common in the real world, and it further increases the difficulty of handling concept drift. Motivated by this challenge, a novel ensemble classification method for mining imbalanced streaming data is proposed to address both issues simultaneously. The algorithm uses under-sampling and over-sampling techniques to balance the positive and negative instances. Moreover, a dynamic weighting strategy is adopted to deal with concept drift. Experimental results on synthetic and real datasets demonstrate that the proposed method performs better than competing algorithms, especially when both concept drift and class imbalance are present.
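
The abstract does not state the exact sampling or weighting rules, so the following is only a minimal sketch of the two ideas it names, not the authors' algorithm. It assumes the stream arrives in chunks of (features, label) pairs with labels 1 (positive) and 0 (negative), that balancing is done by random under-sampling and random duplication, and that each ensemble member is weighted inversely to its error on the newest chunk. The function names `balance_chunk`, `reweight`, and `weighted_vote` are hypothetical.

```python
import random


def balance_chunk(chunk, rng):
    """Return a class-balanced copy of one chunk of (features, label) pairs.

    Assumption: plain random sampling; the paper may use a different scheme.
    """
    pos = [ex for ex in chunk if ex[1] == 1]
    neg = [ex for ex in chunk if ex[1] == 0]
    if not pos or not neg:
        return list(chunk)  # only one class present, nothing to balance
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Meet in the middle: shrink the majority and grow the minority.
    target = (len(minority) + len(majority)) // 2
    # Under-sample the majority class without replacement.
    kept = rng.sample(majority, target)
    # Over-sample the minority class by drawing duplicates with replacement.
    extra = [rng.choice(minority) for _ in range(target - len(minority))]
    return kept + minority + extra


def reweight(errors, eps=1e-6):
    """Weight each ensemble member inversely to its error on the newest chunk.

    Assumption: inverse-error weights normalized to sum to 1.
    """
    raw = [1.0 / (e + eps) for e in errors]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_vote(predictions, weights):
    """Combine binary member predictions using normalized weights."""
    score = sum(w for p, w in zip(predictions, weights) if p == 1)
    return 1 if score >= 0.5 else 0
```

Under these assumptions, a member whose error rises after a drift loses influence on the next chunk without being retrained, while balancing each chunk keeps new members from ignoring the minority class.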



