Combining Multi-Feature Regions for Fine-Grained Image Recognition

Full Text (PDF, 828KB), PP.15-25



Sun Fayou 1,*, Hea Choon Ngo 1, Yong Wee Sek 1

1. Faculty of Information and Communication Technology, Universiti Teknikal Malaysia Melaka, Melaka, Malaysia

* Corresponding author.


Received: 4 Oct. 2021 / Revised: 6 Nov. 2021 / Accepted: 12 Dec. 2021 / Published: 8 Feb. 2022

Index Terms

MRA-CNN, reinforce significant features, feature scale dependent, multi-feature regions.


Fine-grained visual classification (FGVC) is a challenging task due to the subtle discriminative features involved. Recently, RA-CNN was proposed to select a single feature region of the image and recursively learn its discriminative features. However, RA-CNN abandons most feature regions, which is both inefficient and ineffective. To address these issues, we design a novel fine-grained visual recognition model, MRA-CNN, which associates multiple feature regions. To improve the feature representation, attention blocks are integrated into the backbone to reinforce significant features; to improve the classification accuracy, we design the feature scale dependent (FSD) algorithm to select the optimal outputs as the classifier inputs; to avoid missing features, we adopt the k-means algorithm to select multiple feature regions. We demonstrate the value of MRA-CNN through extensive experiments on three popular fine-grained benchmarks, CUB-200-2011, Cars196 and Aircrafts100, where we achieve state-of-the-art performance. Our code can be found at
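
The abstract states only that k-means is used to select multiple feature regions; it does not give the procedure. The following is a minimal sketch of one way such a step could look, not the authors' MRA-CNN implementation. It assumes a 2-D activation map (for example, a channel-averaged feature map), keeps the locations above a threshold, clusters them with plain Lloyd's k-means, and returns one bounding box per cluster. The names `kmeans`, `select_regions`, `activation` and `threshold` are hypothetical and chosen for illustration.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on 2-D points; returns (centers, labels)."""
    if k > len(points):
        raise ValueError("k must not exceed the number of points")
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise from k distinct points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2
                + (p[1] - centers[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    (
                        sum(m[0] for m in members) / len(members),
                        sum(m[1] for m in members) / len(members),
                    )
                )
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: assignments can no longer change
        centers = new_centers
    return centers, labels


def select_regions(activation, k, threshold):
    """Return up to k boxes (row_min, col_min, row_max, col_max), sorted."""
    # Assumption: "salient" simply means activation above a fixed threshold.
    points = [
        (r, c)
        for r, row in enumerate(activation)
        for c, value in enumerate(row)
        if value > threshold
    ]
    _, labels = kmeans(points, k)
    boxes = []
    for cluster in range(k):
        members = [p for p, lab in zip(points, labels) if lab == cluster]
        if not members:
            continue
        rows = [m[0] for m in members]
        cols = [m[1] for m in members]
        boxes.append((min(rows), min(cols), max(rows), max(cols)))
    return sorted(boxes)


# Toy 6x6 activation map with two separated high-response blobs.
demo = [[0.0] * 6 for _ in range(6)]
for r, c in [(0, 0), (0, 1), (1, 0), (1, 1), (4, 4), (4, 5), (5, 4), (5, 5)]:
    demo[r][c] = 1.0
boxes = select_regions(demo, k=2, threshold=0.5)
```

On the toy map, `boxes` holds one box per blob. In a real pipeline each box would be cropped from the input image and passed to a further classification branch; how MRA-CNN does this is described in the full paper, not in the abstract.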

Cite This Paper

Sun Fayou, Hea Choon Ngo, Yong Wee Sek, "Combining Multi-Feature Regions for Fine-Grained Image Recognition", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol.14, No.1, pp. 15-25, 2022. DOI: 10.5815/ijigsp.2022.01.02

