Work place: School of Electronics Engineering, VIT-AP University, Amaravati-522237, Andhra Pradesh, India
E-mail: areefabegum.24phd7023@vitap.ac.in
Website: https://orcid.org/0009-0009-0541-7132
Biography
Shaik Areefa Begam was born in Jaggayyapet, Andhra Pradesh, India. She received her B.Tech. degree in Electronics and Communication Engineering from JNTU Kakinada, India, in 2012, and her M.Tech. degree in VLSI Design from JNTU Kakinada, India, in 2014. Her research interests include speech enhancement, statistical signal processing, blind source separation, and machine learning.
By Shaik Areefa Begam and Sunnydayal Vanambathina
DOI: https://doi.org/10.5815/ijigsp.2026.02.07, Pub. Date: 8 Apr. 2026
Speech enhancement plays a vital role in improving the perceptual quality and intelligibility of speech signals degraded by environmental noise, particularly in modern network-based signal processing systems. Traditional U-Net architectures capture local spectral details effectively but struggle to model long-range dependencies and may propagate residual noise through skip connections. Transformer-based models provide strong global context modeling but often fail to retain fine-grained spectral cues. To overcome these limitations, this paper presents a Nested U-Net–based speech enhancement framework that incorporates Multi-Scale Feature Extraction, Feature Calibration, and Dual-Path Higher-Order Information Interaction with Time-Frequency Attention modules. The Multi-Scale Feature Extraction blocks in both the encoder and decoder extract multi-resolution spectral patterns, while the nested topology strengthens hierarchical feature reuse. At the bottleneck, a stack of four Dual-Path Higher-Order Information Interaction with Time-Frequency Attention modules captures long-range temporal and spectral dependencies, and Feature Calibration adaptively filters encoder features to reduce noise transfer through the skip connections. Extensive experiments on the Common Voice and LibriSpeech datasets demonstrate that the proposed model achieves superior perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and signal-to-distortion ratio (SDR) scores, particularly under moderate (0 dB) signal-to-noise ratio conditions. The results confirm that the framework provides robust enhancement performance and consistently outperforms several recent state-of-the-art methods in terms of speech quality, intelligibility, and noise suppression.
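The core idea of the time-frequency attention described above can be illustrated with a minimal sketch: self-attention applied independently along the time axis and the frequency axis of a magnitude spectrogram, with the two refined views averaged. This is a simplified illustration only, assuming plain dot-product attention in NumPy; the paper's actual module additionally involves dual-path higher-order information interaction, learned projections, and multi-head structure, all of which are omitted here.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def tf_attention(spec):
    """Toy dual-axis attention over a (time, freq) magnitude spectrogram.

    Hypothetical sketch, not the paper's implementation: each time frame
    attends to all frames (capturing long-range temporal dependencies),
    each frequency bin attends to all bins (capturing spectral
    dependencies), and the two attended views are averaged.
    """
    T, F = spec.shape
    # Time attention: (T, T) scores from frame-to-frame dot products.
    t_scores = softmax(spec @ spec.T / np.sqrt(F), axis=-1)
    t_out = t_scores @ spec                              # (T, F)
    # Frequency attention: (F, F) scores from bin-to-bin dot products.
    f_scores = softmax(spec.T @ spec / np.sqrt(T), axis=-1)
    f_out = (f_scores @ spec.T).T                        # (T, F)
    return 0.5 * (t_out + f_out)
```

Because each attention output is a convex combination of the input frames or bins, the refined spectrogram keeps the same shape as the input and its values stay within the input's range, which makes this kind of module easy to stack (the paper stacks four such modules at the bottleneck).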