IJISA Vol. 17, No. 4, 8 Aug. 2025
Image Recognition, Computer Vision
This study investigates enhancements to the YOLOv5 model for price tag detection in retail environments, aiming to improve both accuracy and robustness. The research uses the "Price Tag Detection" dataset from SOVAR, which contains 1,073 annotated images covering four classes (price tags, labels, prices, and products). The dataset is split into training, validation, and test sets and undergoes extensive preprocessing and augmentation, including resizing, rotation, color adjustments, blur, noise, and bounding box transformations. Three enhancements to the YOLOv5 pipeline are proposed: advanced image augmentation to simulate real-world variations in lighting and noise; anchor box optimization through K-means clustering on the dataset annotations to better fit typical price tag shapes; and integration of the Convolutional Block Attention Module (CBAM), which lets the model focus selectively on relevant spatial and channel-wise features. Combined, these enhancements raise mean Average Precision (mAP) at IoU 0.5 from the baseline YOLOv5's 92.5% to 96.8%, a gain of 4.3 percentage points. The attention mechanism and optimized anchor boxes notably improve detection of small, partially occluded, and diverse price tags, highlighting the effectiveness of combining data-driven augmentation, architectural tuning, and attention mechanisms to address the challenges posed by cluttered and dynamic retail scenes.
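The anchor box optimization mentioned above can be illustrated with a minimal sketch of K-means clustering over annotated box sizes. The abstract does not state which distance metric or initialization was used, so the `1 - IoU` distance (the one used in YOLO9000), the area-based initialization, the function names `wh_iou` and `kmeans_anchors`, and the sample box sizes are all assumptions made for illustration, not the authors' implementation.

```python
def wh_iou(box, anchor):
    """IoU of two boxes given only (width, height), as if they shared a corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=100):
    """Cluster (w, h) pairs into k anchors, using 1 - IoU as the distance."""
    # Assumed initialization: evenly spaced picks from the boxes sorted by area,
    # which keeps the sketch deterministic.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    n = len(ordered)
    anchors = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]

    for _ in range(iters):
        # Assign each box to the anchor it overlaps most (smallest 1 - IoU).
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: wh_iou(box, anchors[i]))
            clusters[best].append(box)

        # Move each anchor to the mean width and height of its cluster.
        new_anchors = []
        for anchor, members in zip(anchors, clusters):
            if members:
                w = sum(m[0] for m in members) / len(members)
                h = sum(m[1] for m in members) / len(members)
                new_anchors.append((w, h))
            else:
                new_anchors.append(anchor)  # leave an empty cluster's anchor as is

        if new_anchors == anchors:  # assignments stopped changing
            break
        anchors = new_anchors

    return sorted(anchors, key=lambda a: a[0] * a[1])


# Hypothetical annotation sizes in pixels (not taken from the SOVAR dataset).
boxes = [(30, 12), (32, 14), (28, 11),
         (80, 40), (84, 38), (78, 42),
         (150, 90), (160, 95), (155, 88)]
anchors = kmeans_anchors(boxes, k=3)
for w, h in anchors:
    print(f"anchor: {w:.1f} x {h:.1f}")
```

In practice the box sizes would be read from the dataset's label files and rescaled to the training resolution, and the resulting anchors would replace the defaults in the model configuration.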
Anatolii Ivanov, Viktoriia Onyshchenko, "Robust Price Tag Recognition Using Optimized Detection Pipelines", International Journal of Intelligent Systems and Applications (IJISA), Vol. 17, No. 4, pp. 40-49, 2025. DOI: 10.5815/ijisa.2025.04.04