Integrating Deep Learning Architectures for Enhanced Human Action Recognition: An Ensemble Approach

doi:10.23940/ijpe.24.04.p7.253262

Abstract

Recent developments in deep learning have transformed the recognition of activities performed by humans and other entities. These advances allow models to learn complex, hierarchical representations directly from raw data, which increases recognition accuracy. Traditional machine learning approaches, such as support vector machines and histogram of oriented gradients features with k-nearest neighbor classifiers, have lost part of their appeal because of the extensive feature engineering and data preparation they require. The objective of this research is to construct an efficient and robust ensemble model that combines the strengths of a convolutional neural network (CNN), the Visual Geometry Group network (VGG16), the Inception model, and the 50-layer Residual Network (ResNet50) in order to improve the robustness and predictive accuracy of human activity recognition from raw visual data. The work uses a large and diverse dataset of over 12,000 annotated photos illustrating various human activities. The methodology yields promising results while removing the need for sophisticated feature manipulation. Furthermore, the results show that the ensemble model performs better at Human Action Recognition (HAR).
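
The abstract does not state how the four networks are combined, so the following is only a minimal sketch of one common option, soft voting, in which the class-probability vectors of the individual models are averaged and the highest-scoring class is chosen. The function names, the activity labels, and the probability values are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def soft_vote(per_model_probs: Dict[str, List[float]]) -> List[float]:
    """Average the class-probability vectors produced by several models."""
    vectors = list(per_model_probs.values())
    if not vectors:
        raise ValueError("need at least one model")
    n_classes = len(vectors[0])
    if any(len(v) != n_classes for v in vectors):
        raise ValueError("all models must score the same set of classes")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(n_classes)]


def predict(per_model_probs: Dict[str, List[float]], labels: List[str]) -> str:
    """Return the label whose averaged probability is highest."""
    avg = soft_vote(per_model_probs)
    best = max(range(len(avg)), key=avg.__getitem__)
    return labels[best]


# Hypothetical labels and made-up per-model outputs for a single image.
labels = ["running", "sitting", "clapping"]
probs = {
    "cnn": [0.5, 0.3, 0.2],
    "vgg16": [0.2, 0.6, 0.2],
    "inception": [0.3, 0.5, 0.2],
    "resnet50": [0.4, 0.4, 0.2],
}

print(predict(probs, labels))
```

Averaging probabilities is only one assumption about the combination rule; majority voting over predicted labels or a learned meta-classifier would fit the same description.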

Key words: deep learning, VGG16, ResNet50, inception, HAR

Ujjwal Deep, Sushant Kumar, and Kanika Singla. Integrating Deep Learning Architectures for Enhanced Human Action Recognition: An Ensemble Approach [J]. Int J Performability Eng, 2024, 20(4): 253-262.

