MambaResp-KAN: A State Space Model with Kolmogorov–Arnold Networks and Diffusion-Based Augmentation for Explainable Respiratory Disease Classification

PP. 189-208

Author(s)

Mohammed Tawfik 1,*

1. Department of Cybersecurity, Faculty of Information Technology, Ajloun National University, P.O. 43, Ajloun-26810, Jordan

* Corresponding author.

DOI: https://doi.org/10.5815/ijigsp.2026.03.10

Received: 26 Feb. 2026 / Revised: 31 Mar. 2026 / Accepted: 13 Apr. 2026 / Published: 8 Jun. 2026

Index Terms

Respiratory disease classification, State space models, Mamba, Kolmogorov–Arnold Networks, B-spline activations, Diffusion augmentation, Explainable AI, Multi-modal fusion, WavLM, BEATs

Abstract

Automated respiratory disease classification from auscultation sounds holds transformative potential for early clinical screening, yet existing approaches remain constrained by the quadratic complexity of Transformer-based sequence encoders, the limited expressiveness of conventional multi-layer perceptron classifiers, and the persistent challenge of scarce annotated medical audio data. This paper presents MambaResp-KAN, a novel architecture that unifies Bidirectional Mamba state space models, Kolmogorov–Arnold Network classifiers with learnable B-spline activation functions, multi-modal gated cross-attention fusion of WavLM, BEATs, and handcrafted spectral features, and class-conditional denoising diffusion probabilistic model augmentation into a single end-to-end framework for explainable respiratory sound analysis. The Bidirectional Mamba encoder achieves linear-time sequence modeling through input-dependent selective state space discretization, processing forward and reverse temporal streams with gated aggregation to capture both causal and anti-causal dependencies in respiratory waveforms. The Kolmogorov–Arnold Network classifier replaces fixed-activation neurons with learnable univariate B-spline functions on each network edge, directly grounded in the Kolmogorov–Arnold representation theorem, yielding a classifier that is both more parameter-efficient and intrinsically interpretable than standard multi-layer perceptrons. A gated cross-modal attention mechanism fuses embeddings from the self-supervised WavLM and BEATs audio encoders with handcrafted MFCC and spectral features, while a class-conditional denoising diffusion probabilistic model synthesizes high-fidelity respiratory audio to alleviate class imbalance. Integrated Gradients attribution and KAN concept bottleneck analysis provide clinician-interpretable explanations of model decisions.
Evaluated on two benchmark datasets, Asthma Detection V2 (five classes, 1,211 samples) and KAUH (four classes, 940 samples), MambaResp-KAN achieves classification accuracies of 99.6% and 99.4%, respectively, surpassing the prior state-of-the-art E-RespiNet by 0.7 and 0.6 percentage points while using 62% fewer parameters and reducing inference latency by 56.3%. Cross-dataset evaluation yields an average accuracy of 84.0% with a generalization gap of 15.8%, compared to 23.3% for E-RespiNet, confirming improved transferability across clinical institutions.
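
The abstract describes the Kolmogorov–Arnold Network classifier as placing a learnable univariate B-spline function on each network edge. The following is a minimal NumPy sketch of that idea, not the paper's implementation. It assumes the edge parameterization from the original KAN paper (a SiLU base term plus a B-spline term). The names `bspline_basis` and `KANLayer`, the grid size, the spline degree, the input range, and the random initialization are all illustrative choices that the abstract does not specify.

```python
import numpy as np


def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions at x via the Cox-de Boor recursion.

    Returns an array of shape (len(x), len(knots) - degree - 1).
    Assumes strictly increasing knots, so no denominator is zero.
    """
    x = np.asarray(x, dtype=float)[:, None]        # (N, 1)
    t = np.asarray(knots, dtype=float)[None, :]    # (1, K)
    # Degree 0: indicator of the half-open knot interval [t_i, t_{i+1}).
    basis = ((x >= t[:, :-1]) & (x < t[:, 1:])).astype(float)
    for d in range(1, degree + 1):
        left = (x - t[:, :-(d + 1)]) / (t[:, d:-1] - t[:, :-(d + 1)])
        right = (t[:, d + 1:] - x) / (t[:, d + 1:] - t[:, 1:-d])
        basis = left * basis[:, :-1] + right * basis[:, 1:]
    return basis


class KANLayer:
    """Hypothetical KAN layer: y_j = sum_i phi_ji(x_i), one spline per edge."""

    def __init__(self, in_dim, out_dim, grid_size=5, degree=3,
                 x_range=(-1.0, 1.0), seed=0):
        rng = np.random.default_rng(seed)
        lo, hi = x_range
        h = (hi - lo) / grid_size
        self.degree = degree
        # Uniform grid extended by `degree` knots on each side (an assumption).
        self.knots = lo + h * np.arange(-degree, grid_size + degree + 1)
        n_basis = grid_size + degree
        # Learnable parameters; here only randomly initialized, not trained.
        self.coef = 0.1 * rng.standard_normal((out_dim, in_dim, n_basis))
        self.w_base = rng.standard_normal((out_dim, in_dim)) / np.sqrt(in_dim)

    def edge_functions(self, x):
        """Return phi_ji(x_i) for every edge, shape (batch, out_dim, in_dim)."""
        batch, in_dim = x.shape
        basis = bspline_basis(x.reshape(-1), self.knots, self.degree)
        basis = basis.reshape(batch, in_dim, -1)
        spline = np.einsum("bik,oik->boi", basis, self.coef)
        silu = x / (1.0 + np.exp(-x))
        return spline + self.w_base[None] * silu[:, None, :]

    def __call__(self, x):
        # Each output is a plain sum of univariate edge functions.
        return self.edge_functions(x).sum(axis=-1)


layer = KANLayer(in_dim=4, out_dim=3)
features = np.random.default_rng(1).uniform(-1.0, 1.0, size=(8, 4))
logits = layer(features)                     # shape (8, 3)
per_edge = layer.edge_functions(features)    # shape (8, 3, 4)
```

Because every output is a sum of one-dimensional functions, each edge function can be plotted or inspected on its own, which is the property the abstract refers to as intrinsic interpretability. In this sketch, inputs outside `x_range` receive only the SiLU base term, since the spline basis vanishes there.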

Cite This Paper

Mohammed Tawfik, "MambaResp-KAN: A State Space Model with Kolmogorov–Arnold Networks and Diffusion-Based Augmentation for Explainable Respiratory Disease Classification", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol. 18, No. 3, pp. 189-208, 2026. DOI: 10.5815/ijigsp.2026.03.10
