Anomaly Detection in Crowd Video Using Different Versions of YOLOv8

Pages: 100-114

Author(s)

Punith Kumar M. B. (1), Shrikanth C. R. (2)

1. Professor, Department of ECE, PES College of Engineering, Mandya, Karnataka, India

2. Department of ECE, PES College of Engineering, Mandya, Karnataka, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijem.2026.03.07

Received: 4 Jan. 2026 / Revised: 26 Feb. 2026 / Accepted: 12 Apr. 2026 / Published: 8 Jun. 2026

Index Terms

YOLOv8, Anomaly Detection, Crowd Video Analysis, 3D CNN (3D Convolutional Neural Network), Surveillance Systems, Real-Time Object Detection, Deep SORT Tracking, and Spatio-Temporal Modelling

Abstract

This paper addresses real-time anomaly detection in surveillance video using YOLOv8, a recent model in the YOLO object detection series, integrated with spatio-temporal analysis. The system detects abnormal behavior in crowded environments by combining spatial object detection with temporal activity analysis: YOLOv8 is used to detect and track individuals in video frames, while a 3D Convolutional Neural Network (3D CNN) processes sequences of frames to identify behavioral anomalies from movement patterns. Three variants of YOLOv8, Nano (n), Small (s), and Medium (m), are evaluated for trade-offs among accuracy, processing speed in frames per second (FPS), and latency. The results show that YOLOv8n offers the best real-time performance, while YOLOv8m provides higher accuracy at the cost of increased latency. The system is trained and tested on the UCF-Crime dataset. The modular pipeline supports scalability and real-time deployment, and its visual outputs aid interpretation. By integrating object detection with spatio-temporal modeling, the system identifies anomalies such as loitering or sudden movements. Future work includes refining detection accuracy using labelled anomalies and exploring advanced models such as Transformers for improved temporal understanding. The significance of this research lies in combining lightweight real-time object detection with temporal behavior modeling in a scalable, modular architecture. The proposed framework aims to improve anomaly detection reliability while maintaining computational efficiency suitable for deployment in smart cities, public safety monitoring, and edge-based surveillance applications.
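
The speed metrics named in the abstract, per-frame latency and FPS, can be illustrated with a minimal timing sketch. This is not the authors' evaluation code: `benchmark`, `process_frame`, and the injectable `clock` are hypothetical names introduced here, and the callable stands in for whichever detector variant (for example YOLOv8n, s, or m) is being timed. It assumes FPS is reported as the reciprocal of mean per-frame latency, which the abstract does not state.

```python
import time
from statistics import mean
from typing import Any, Callable, Dict, Iterable


def benchmark(
    process_frame: Callable[[Any], Any],
    frames: Iterable[Any],
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, float]:
    """Time process_frame on each frame; report mean latency (ms) and FPS.

    process_frame is a hypothetical stand-in for one detector forward pass.
    FPS is taken as 1 / mean latency (an assumption, see lead-in).
    """
    latencies = []
    for frame in frames:
        start = clock()
        process_frame(frame)  # result is discarded; only timing matters here
        latencies.append(clock() - start)

    if not latencies:
        raise ValueError("no frames to benchmark")

    mean_latency = mean(latencies)  # seconds per frame
    return {
        "frames": len(latencies),
        "mean_latency_ms": mean_latency * 1000.0,
        "fps": 1.0 / mean_latency if mean_latency > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Dummy workload in place of a real model, so the sketch runs anywhere.
    def dummy_detector(frame: Any) -> list:
        time.sleep(0.01)
        return []

    print(benchmark(dummy_detector, range(20)))
```

Running the same function once per variant, on the same frames and hardware, would give comparable latency and FPS figures; in practice a few warm-up frames are usually discarded before timing.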

Cite This Paper

Punith Kumar M. B., Shrikanth C. R., "Anomaly Detection in Crowd Video Using Different Versions of YOLOv8", International Journal of Engineering and Manufacturing (IJEM), Vol.16, No.3, pp.100-114, 2026. DOI:10.5815/ijem.2026.03.07
