Leveraging Deep Learning Approach for the Detection of Human Activities from Video Sequences

PP. 77-89


Author(s)

Preethi Salian K. 1, Karthik K. 2,*

1. Department of Information Science and Engineering, NMAM Institute of Technology, NITTE (Deemed to be University), Nitte, Karnataka, India

2. School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijigsp.2025.06.05

Received: 9 Oct. 2024 / Revised: 26 Mar. 2025 / Accepted: 26 Jul. 2025 / Published: 8 Dec. 2025

Index Terms

Categorization, CNN-LSTM, Deep Learning, Detection, Human Activities, Stochastic Gradient Descent, Video Sequence

Abstract

Deep learning approaches that automatically derive significant representations from unprocessed video have demonstrated effective results for recognizing human actions in video sequences. Artificial intelligence (AI) systems for monitoring, automation, and human-computer interaction have become crucial for security and human behaviour analysis. For the visual representation of video clips during the training phase, existing action-identification algorithms mostly rely on pre-trained weights of various AI architectures, which affects feature discrepancy and persistence, including the separation between visual and temporal cues. To overcome this problem, this research proposes a 3-dimensional Convolutional Neural Network with Long Short-Term Memory (3D-CNN-LSTM) network that strategically concentrates on useful information in the input frames to recognize various human behaviours in video. The model is trained with stochastic gradient descent (SGD) optimization to identify the parameters that best match the expected and observed outcomes. The proposed framework is trained, validated, and tested on the publicly accessible UCF11 benchmark dataset. Experimental results show an accuracy of 93.72%, which is 2.42% higher than the previous state-of-the-art best result. Compared with several other relevant techniques in use, the suggested approach achieved outstanding performance in terms of accuracy.
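The SGD optimization mentioned above can be illustrated with a minimal, self-contained sketch. This is not the paper's model: it fits a toy linear model (hypothetical data and parameter names) by repeatedly taking one sample at a time and stepping each parameter against the gradient of the squared error between expected and observed outcomes, which is the same update rule SGD applies to the 3D-CNN-LSTM weights at much larger scale.

```python
import numpy as np

# Illustrative toy data (not from the paper): y = 3.0*x + 0.5 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = 3.0 * x + 0.5 + rng.normal(0, 0.05, size=200)

w, b = 0.0, 0.0   # parameters to be learned
lr = 0.1          # learning rate

for epoch in range(100):
    for i in rng.permutation(len(x)):   # "stochastic": one sample per step
        pred = w * x[i] + b             # observed model output
        err = pred - y[i]               # mismatch with the expected outcome
        w -= lr * err * x[i]            # gradient of 0.5*err**2 w.r.t. w
        b -= lr * err                   # gradient of 0.5*err**2 w.r.t. b

# After training, w and b should be close to the true 3.0 and 0.5.
print(w, b)
```

In the proposed framework the same principle applies, only the gradients flow through the 3D convolutional and LSTM layers via backpropagation instead of a closed-form expression.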

Cite This Paper

Preethi Salian K., Karthik K., "Leveraging Deep Learning Approach for the Detection of Human Activities from Video Sequences", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol. 17, No. 6, pp. 77-89, 2025. DOI: 10.5815/ijigsp.2025.06.05
