Depth-guided Hybrid Attention Swin Transformer for Physics-guided Self-supervised Image Dehazing

PP. 75-89


Author(s)

Rahul Vishnoi 1, Alka Verma 1, Vibhor Kumar Bhardwaj 1,*

1. Department of Electronics & Communication Engineering, Teerthanker Mahaveer University, Moradabad, Uttar Pradesh, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2026.01.06

Received: 24 Jul. 2025 / Revised: 20 Sep. 2025 / Accepted: 26 Dec. 2025 / Published: 8 Feb. 2026

Index Terms

Image Dehazing, Self-supervised, Depth Guidance, Transformer, Hybrid Attention

Abstract

Image dehazing is a critical preprocessing step in computer vision, restoring visibility under degraded conditions. Conventional supervised methods often struggle with generalization and computational efficiency. This paper introduces a self-supervised image dehazing framework built on a depth-guided Swin Transformer with hybrid attention. The proposed hybrid attention explicitly combines CNN-style channel and spatial attention with the Swin Transformer's window-based self-attention, enabling simultaneous local feature recalibration and global context aggregation. By integrating a pre-trained monocular depth estimation model with a Swin Transformer architecture using shifted-window attention, our method efficiently models global context while preserving fine details. Depth is used as a relative structural prior rather than a metric quantity, providing robust guidance without requiring haze-invariant depth estimation. Experimental results on synthetic and real-world benchmarks demonstrate superior performance, with a PSNR of 23.01 dB and an SSIM of 0.879 on the RESIDE SOTS-Indoor dataset, outperforming classical physics-based dehazing (DCP) by 6.39 dB and the recent self-supervised SLAD by 2.52 dB in PSNR. Our approach also significantly improves object detection accuracy under hazy conditions, by 0.15 mAP@0.5 (+32.6%), and achieves near real-time inference (≈35 FPS at 256×256 resolution on a single GPU), confirming the practical utility of depth-guided features. The SSIM of 0.879 indicates strong structural and color fidelity for a self-supervised dehazing framework.
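Physics-guided dehazing of this kind typically rests on the atmospheric scattering (Koschmieder) model, I(x) = J(x)·t(x) + A·(1 − t(x)) with transmission t(x) = exp(−β·d(x)), where depth d supplies exactly the structural prior the abstract describes. The sketch below illustrates that model in NumPy; the function names and the constants A, β, and the transmission floor are illustrative choices, not values from the paper.

```python
import numpy as np

def synthesize_haze(J, depth, A=0.8, beta=1.0):
    """Apply the Koschmieder model: I = J*t + A*(1 - t), t = exp(-beta*depth).

    J:     clear image, shape (H, W, 3), values in [0, 1]
    depth: relative scene depth, shape (H, W)
    A:     global atmospheric light; beta: scattering coefficient
    """
    t = np.exp(-beta * depth)[..., None]   # broadcast t over channels
    return J * t + A * (1.0 - t)

def dehaze(I, depth, A=0.8, beta=1.0, t_min=0.1):
    """Invert the model: J = (I - A)/t + A, flooring t to avoid blow-up
    in far regions where transmission approaches zero."""
    t = np.maximum(np.exp(-beta * depth), t_min)[..., None]
    return (I - A) / t + A

rng = np.random.default_rng(0)
J = rng.uniform(0.0, 1.0, (4, 4, 3))   # toy clear image
d = rng.uniform(0.0, 2.0, (4, 4))      # toy relative depth map
I = synthesize_haze(J, d)
J_hat = dehaze(I, d)
```

Because the relative depth here stays below 2 and β = 1, the transmission never reaches the floor t_min, so the inversion is exact on this toy example; on real images, t is unknown and a learned network approximates this mapping instead.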

Cite This Paper

Rahul Vishnoi, Alka Verma, Vibhor Kumar Bhardwaj, "Depth-guided Hybrid Attention Swin Transformer for Physics-guided Self-supervised Image Dehazing", International Journal of Intelligent Systems and Applications (IJISA), Vol. 18, No. 1, pp. 75-89, 2026. DOI: 10.5815/ijisa.2026.01.06

Reference

[1]K. He, J. Sun, and X. Tang. Single image haze removal using dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell., 33(12):2341–2353, 2011.
[2]D. Berman, T. Treibitz, and S. Avidan. Non-local image dehazing. In CVPR, pages 1674–1682, 2016.
[3]W. Ren, S. Liu, H. Zhang, J. Pan, X. Cao, and M.H. Yang. Single image dehazing via multi-scale convolutional neural networks. In ECCV, pages 154–169, 2016.
[4]B. Li, X. Peng, Z. Wang, J. Xu, and D. Feng. AOD-Net: All-in-one dehazing network. In ICCV, pages 4780–4788, 2017.
[5]H. Zhang and V. M. Patel. Densely connected pyramid dehazing network. In CVPR, pages 3194–3203, 2018.
[6]B. Cai, X. Xu, K. Jia, C. Qing, and D. Tao. DehazeNet: An end-to-end system for single image haze removal. IEEE Trans. Image Process., 25(11):5187–5198, 2016.
[7]B. Li, Y. Gou, S. Gu, J.Z. Liu, J.T. Zhou, and X. Peng. You only look yourself: Unsupervised and untrained single image dehazing neural network. Int. J. Comput. Vis., 129(5):1754–1767, 2021.
[8]Z. Jiang, X. Cheng, Z. Niu, and L. Li. Enhancing bottleneck concept learning in image classification. Sensors, 25(8):2398, April 2025.
[9]Y. Leng and E.J. Lee. SLRBC: Self-supervised learning for red blood cell segmentation and classification with a biased contrastive loss. In Proc. SPIE Med. Imaging, 2025.
[10]I. Fatnassi, S. Khamekhem Jemni, and S. Ammar. St-Wid: Self-supervised transformer for writer identification in Arabic handwritten scripts. SSRN, April 2025.
[11]A. Bell-Navas, M. Villalba-Orero, and E. Lara-Pezzi. Heart failure prediction using modal decomposition and masked autoencoders for scarce echocardiography databases. arXiv preprint arXiv:2504.07606, 2025.
[12]J. Jiang, D. Sun, T. Wang, and Y. Pei. SCCS: Deep neural spectral clustering for self-supervised subcellular structure segmentation. In Proc. AAAI Conf. Artif. Intell., 39(4):32419–34574, 2025.
[13]S. Liang, J. Tang, G. Wang, and J. Ma. STEAMBOAT: Attention-based multiscale delineation of cellular interactions in tissues. bioRxiv, April 2025.
[14]H. Hu, Y. Xie, D. Lian, and K. Han. Modality-disentangled feature extraction via knowledge distillation in multimodal recommendation systems. IEEE Trans. Neural Netw. Learn. Syst., April 2025.
[15]K. Prakash and K. Nasir. Physics-guided machine learning is unlocking new capabilities in modeling complex systems. EngrXiv, April 2025.
[16]J. Zhuang et al. On the practice of deep hierarchical ensemble network for ad conversion rate prediction. arXiv preprint arXiv:2504.08169, 2025.
[17]X. Wang, G. Yang, T. Ye, and Y. Liu. Dehaze-RetinexGAN: Real-world image dehazing via Retinex-based generative adversarial network. In Proc. AAAI Conf. Artif. Intell., 2025.
[18]C. Chen et al. Contrastive self-supervised learning for unpaired image dehazing. IEEE Trans. Image Process., 32:5110–5123, 2023.
[19]M. Liu and L. Zhang. Self-supervised domain adaptive image dehazing via cycle-consistent GANs. In Proc. CVPR Workshops, 2024.
[20]Y. Zhou et al. D-Former: Self-supervised vision transformer for image dehazing. arXiv preprint arXiv:2403.11245, 2025.
[21]L. Ren et al. Joint dehazing and depth estimation via multi-task self-supervised learning. IEEE Access, 12:12133–12145, 2024.
[22]K. Zhang et al. Self-supervised knowledge distillation for low-light and hazy image enhancement. In Proc. ECCV, 2024.
[23]T. Hu et al. Frequency-aware self-supervised learning for robust image dehazing. Pattern Recognit., 139:109488, 2023.
[24]Y. Liang, S. Li, D. Cheng, W. Wang, D. Li, and J. Liang. Image dehazing via self-supervised depth guidance. Pattern Recognit., 145, 2025.
[25]K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Trans. Image Process., 26(7):3142–3155, 2017.
[26]O. Kupyn, V. Budzan, M. Mykhailych, D. Mishkin, and J. Matas. DeblurGAN: Blind motion deblurring using conditional adversarial networks. In Proc. CVPR, 2018.
[27]A. Krull, T. Buchholz, and F. Jug. Noise2Void - Learning denoising from single noisy images. In Proc. CVPR, 2019.
[28]D. Ulyanov, A. Vedaldi, and V. Lempitsky. Deep image prior. In Proc. CVPR, 2018.
[29]A. Dosovitskiy et al. An image is worth 16x16 words: Transformers for image recognition at scale. In Proc. ICLR, 2021.
[30]S. Zhou, D. Chen, J. Pan, and J. Shi. Adapt or perish: Adaptive sparse transformer with attentive feature refinement for image restoration. In Proc. CVPR, 2024.
[31]S. Zhang, Q. Dong, and W. Mao. A unified accelerator for all-in-one image restoration based on prompt degradation learning. IEEE Trans. Pattern Anal. Mach. Intell., 2025.
[32]A. Kulkarni, S.S. Phutke, and S.K. Vipparthi. Multi-medium image enhancement with attentive deformable transformers. IEEE Trans. Image Process., 2024.
[33]N. Malothu and R. Jatoth. Opti-transforming visibility: Dehazing images with transformer based generative adversarial networks. SSRN Electron. J., 2023.
[34]C. Gao et al. Uformer: A transformer-based restoration framework for image degradation tasks. arXiv preprint arXiv:2106.03106, 2023.
[35]L. Wang, J. Zhang, and X. Tang. Dehamer: Dehazing via high-order feature aggregation in transformer encoder-decoder. In Proc. CVPR, 2023.
[36]Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In Proc. ICCV, pages 10012–10022, 2021.
[37]Y. Song, L. Wang, and X. He. Dehazing with shifted window transformers. IEEE Trans. Image Process., 32:4567–4578, 2023.
[38]J. Zhang, W. Xu, and H. Zhang. SwinDehazing: Transformer-based image dehazing via hierarchical feature learning. In Proc. CVPR, 2024.
[39]B. Li, W. Ren, D. Fu, D. Tao, D. Feng, W. Zeng, and Z. Wang. Benchmarking single-image dehazing and beyond. IEEE Trans. Image Process., 28(1):492–505, 2019.
[40]D. Berman and S. Avidan. Air-light field estimation for hazy images. In Proc. ICCP, 2021.
[41]C. Godard, O.M. Aodha, and G.J. Brostow. Unsupervised monocular depth estimation with left-right consistency. In Proc. CVPR, pages 6602–6611, 2017.
[42]R. Ranftl, A. Bochkovskiy, and V. Koltun. Vision transformers for dense prediction. In Proc. ICCV, pages 12179–12188, 2021.
[43]S. Woo, J. Park, J.-Y. Lee, and I.S. Kweon. CBAM: Convolutional block attention module. In Proc. ECCV, pages 3–19, 2018.
[44]J. Hu, L. Shen, and G. Sun. Squeeze-and-excitation networks. In Proc. CVPR, pages 7132–7141, 2018.
[45]C.R. Qi, H. Su, K. Mo, and L.J. Guibas. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proc. CVPR, pages 652–660, 2017.
[46]T. Zhou, M. Brown, N. Snavely, and D.G. Lowe. Unsupervised learning of depth and ego-motion from video. In Proc. CVPR, pages 6612–6619, 2017.
[47]Y. Liang, B. Wang, W. Zuo, J. Liu, and W. Ren. Self-supervised learning and adaptation for single image dehazing. In Proc. IJCAI, pages 1137–1143, 2022.
[48]Y. Shao, L. Li, W. Ren, C. Gao, and N. Sang. Domain adaptation for image dehazing. In Proc. CVPR, pages 2808–2817, 2020.
[49]Y. Liu, L. Zhu, S. Pei, H. Fu, J. Qin, Q. Zhang, L. Wan, and W. Feng. From synthetic to real: Image dehazing collaborating with unlabeled real data. In Proc. ACM Int. Conf. Multimedia, pages 50–58, 2021.
[50]C.O. Ancuti, C. Ancuti, R. Timofte, and C. De Vleeschouwer. O-haze: a dehazing benchmark with real hazy and haze-free outdoor images. In Proc. CVPR Workshops, pages 754–762, 2018.
[51]S. Anwar, C. Li, and F. Porikli. Dense Haze: A benchmark for image dehazing with dense-haze and haze-free images. In Proc. ICIP, pages 1014–1018, 2019.
[52]C. Li, J. Guo, S. Anwar, and F. Porikli. Self-supervised learning and adaptation for single image dehazing. In Proc. IJCAI, pages 1115–1121, 2022.
[53]B. Li, Y. Gou, J.Z. Liu, H. Zhu, J.T. Zhou, and X. Peng. Zero-shot image dehazing. IEEE Trans. Image Process., 29:8457–8466, 2020.