Spoof-formerNet: The Face Anti Spoofing Identifier with a Two Stage High Resolution Vision Transformer (HR-ViT) Network

PP. 87-104

Author(s)

Mudunuru Suneel 1,* Tummala Ranga Babu 2

1. Department of Electronics and Communication Engineering, University College of Engineering, Acharya Nagarjuna University (ANU), Guntur, AP, India

2. Department of Electronics and Communication Engineering, RVR & JC College of Engineering, Guntur, AP, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijigsp.2025.04.06

Received: 15 Mar. 2024 / Revised: 12 Jun. 2024 / Accepted: 20 May 2025 / Published: 8 Aug. 2025

Index Terms

Face Anti Spoofing (FAS), High Resolution Vision Transformer (HR-ViT), Token Embedding, Multi Head Self-Attention

Abstract

Face anti-spoofing (FAS) is essential for ensuring the security and reliability of facial identification systems. This study introduces Spoof-formerNet, a new approach that uses a high-resolution vision transformer (HR-ViT) network to detect face spoofing. The Vision Transformer (ViT) architecture has demonstrated remarkable performance in numerous computer vision applications, and we apply it here to the challenging problem of spoof detection. To distinguish real faces from spoofing attempts, Spoof-formerNet is designed to capture the fine details and subtle cues present in facial images. In our experiments, two HR-ViT streams are trained in parallel, one on color (RGB) data and one on depth data, and the features from the two streams are concatenated before being passed to a classification head. Spoof-formerNet is trained and tested on CelebA-Spoof, CASIA-SURF, WMCA, and MSU-MFSD, benchmark datasets commonly used in face anti-spoofing research. The proposed model achieves state-of-the-art performance in distinguishing genuine faces from spoofing attacks. We assess the model using Area Under the Curve (AUC), Attack Presentation Classification Error Rate (APCER), Bona Fide Presentation Classification Error Rate (BPCER), Equal Error Rate (EER), and Average Classification Error Rate (ACER). The results show how cascaded high-resolution vision transformer networks can improve the security of facial recognition in real-world applications while also advancing face anti-spoofing technology. Spoof-formerNet achieves an average AUC of 99.22 and average APCER, BPCER, and ACER of 0.95, 0.66, and 0.81, respectively.
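
The error rates named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function name `pad_error_rates`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the sketch pools every attack type into one APCER rather than reporting it per attack type. ACER is taken as the mean of APCER and BPCER, which agrees with the reported averages: (0.95 + 0.66) / 2 = 0.805, or about 0.81.

```python
from typing import Sequence, Tuple


def pad_error_rates(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float, float]:
    """Return (APCER, BPCER, ACER) as fractions in [0, 1].

    labels: 1 for bona fide (real) samples, 0 for attack (spoof) samples.
    scores: higher means "more likely bona fide"; a sample is accepted as
            bona fide when score >= threshold (assumed convention).
    """
    attacks = [s for y, s in zip(labels, scores) if y == 0]
    bona_fide = [s for y, s in zip(labels, scores) if y == 1]
    if not attacks or not bona_fide:
        raise ValueError("need at least one attack and one bona fide sample")

    # APCER: fraction of attack presentations wrongly accepted as bona fide.
    apcer = sum(s >= threshold for s in attacks) / len(attacks)
    # BPCER: fraction of bona fide presentations wrongly rejected as attacks.
    bpcer = sum(s < threshold for s in bona_fide) / len(bona_fide)
    # ACER: mean of the two error rates.
    acer = (apcer + bpcer) / 2
    return apcer, bpcer, acer


# Toy data (hypothetical): four bona fide and four attack samples.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.1, 0.6, 0.2, 0.3]
apcer, bpcer, acer = pad_error_rates(labels, scores)
print(f"APCER={apcer:.2f} BPCER={bpcer:.2f} ACER={acer:.2f}")
# prints "APCER=0.25 BPCER=0.25 ACER=0.25"
```

In the toy data, one of four attacks scores above the threshold and one of four bona fide samples scores below it, so both error rates are 0.25. The EER would be found by sweeping `threshold` until APCER and BPCER are equal.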

Cite This Paper

Mudunuru Suneel, Tummala Ranga Babu, "Spoof-formerNet: The Face Anti Spoofing Identifier with a Two Stage High Resolution Vision Transformer (HR-ViT) Network", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol. 17, No. 4, pp. 87-104, 2025. DOI: 10.5815/ijigsp.2025.04.06
