

Mixed Weighted KNN for Imbalanced Datasets

Volume 14, Number 7, July 2018, pp. 1391-1400
DOI: 10.23940/ijpe.18.07.p2.13911400

Qimin Cao (a), Lei La (b), Hongxia Liu (a), and Si Han (a)

(a) Library, China University of Political Science and Law, Beijing, 100088, China
(b) School of Information Technology & Management, University of International Business and Economics, Beijing, 100029, China

(Submitted on April 13, 2018; Revised on May 25, 2018; Accepted on June 25, 2018)


It is well known that imbalanced datasets are common in practice and can reduce classification accuracy. To address the class imbalance problem, this paper proposes the mixed weighted KNN (MW-KNN) algorithm. Based on the imbalance between classes, the algorithm assigns each training sample a weight inversely proportional to the size of its class, and then combines this weight with a distance weight so that training samples closer to the test sample receive greater weight. To improve efficiency and make the method practical for massive datasets, we implemented a parallel version of MW-KNN on the Hadoop framework. Experimental results show that the proposed algorithm is simple and effective.
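The abstract does not state the exact weighting formulas, so the following is only a minimal Python sketch of the general idea under loudly stated assumptions: the class weight is taken as 1/n_c (one over the number of training samples in class c), the distance weight as 1/(d + eps), and the two are multiplied. To mimic the Hadoop parallelization, the training set is split into partitions; each partition returns its local k nearest neighbors (a map-style step) and these candidates are merged into the global k nearest (a reduce-style step). Plain Python loops stand in for Hadoop here, and every function name (`class_weights`, `local_top_k`, `mw_vote`, `mw_knn_predict`, `plain_knn_predict`) is hypothetical rather than taken from the paper.

```python
import math
from collections import Counter, defaultdict


def class_weights(labels):
    """Assumed inverse-proportion class weight: w_c = 1 / n_c."""
    counts = Counter(labels)
    return {c: 1.0 / n for c, n in counts.items()}


def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_top_k(train_x, train_y, query, k):
    """Map-style step: the k nearest (distance, label) pairs in one split."""
    pairs = [(euclidean(x, query), y) for x, y in zip(train_x, train_y)]
    pairs.sort(key=lambda p: p[0])
    return pairs[:k]


def mw_vote(neighbors, cw, eps=1e-9):
    """Vote with class weight times an assumed 1/(d + eps) distance weight."""
    scores = defaultdict(float)
    for dist, label in neighbors:
        scores[label] += cw[label] / (dist + eps)
    return max(scores, key=scores.get)


def mw_knn_predict(train_x, train_y, query, k=3, n_splits=1):
    """Hypothetical MW-KNN prediction with the training set split n_splits ways."""
    # Class weights come from the whole training set (e.g. a prior counting job).
    cw = class_weights(train_y)
    candidates = []
    for s in range(n_splits):
        # Each split plays the role of one map task's input.
        candidates.extend(
            local_top_k(train_x[s::n_splits], train_y[s::n_splits], query, k)
        )
    # Reduce-style step: merge local candidates into the global k nearest.
    candidates.sort(key=lambda p: p[0])
    return mw_vote(candidates[:k], cw)


def plain_knn_predict(train_x, train_y, query, k=3):
    """Unweighted majority-vote KNN, for comparison."""
    neighbors = local_top_k(train_x, train_y, query, k)
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


# Toy data: 8 samples of majority class "A", 2 of minority class "B".
X = [(0.5, 0.0), (0.0, 0.6), (5, 5), (5, 6), (6, 5),
     (6, 6), (7, 7), (7, 6), (-0.8, 0.0), (-5, -5)]
y = ["A"] * 8 + ["B"] * 2

print(plain_knn_predict(X, y, (0.0, 0.0)))           # A
print(mw_knn_predict(X, y, (0.0, 0.0)))              # B
print(mw_knn_predict(X, y, (0.0, 0.0), n_splits=3))  # B
```

On this toy data, two of the three nearest neighbors of the origin belong to the majority class, so an unweighted vote returns A, while the larger weight of the rare class lets the nearby minority sample tip the vote to B. Because every split keeps its own k candidates, merging them always recovers the global k nearest neighbors, so changing `n_splits` does not change the prediction.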



