Hanna Deepa Mallolu

Work place: School of Electronics Engineering, VIT-AP University, Amaravati, India

E-mail: deepa.24phd7028@vitap.ac.in

Biography

Hanna Deepa Mallolu was born in Guntur, Andhra Pradesh, India. She received her B.Tech degree in Electrical and Electronics Engineering from JNTU Kakinada, India, in 2016 and her Master of Technology in Power System Control and Automation from JNTU Kakinada, India, in 2021. Her current research interests are speech enhancement, statistical signal processing, blind source separation, and machine learning techniques.

Author Articles
Speech Enhancement Based on a Two-Branch Nested U-Net Architecture Using TS-Conformer

By Hanna Deepa Mallolu, Sunnydayal Vanambathina

DOI: https://doi.org/10.5815/ijigsp.2026.03.07, Pub. Date: 8 Jun. 2026

Transformers, while powerful in capturing long-range dependencies through self-attention, face several limitations in speech processing tasks. In particular, they can lack the inherent inductive biases needed to efficiently model the local, fine-grained temporal and spectral structures critical for speech perception, resulting in suboptimal handling of fine details. To address this issue, this paper introduces a speech enhancement (SE) network that builds on a two-branch nested U-Net framework integrated with a two-stage conformer (TS-Conformer). The nested U-Net employs dual decoding branches for simultaneous spectral mapping and mask estimation, enabling complementary learning of speech characteristics. The TS-Conformer sequentially models temporal and frequency dependencies to improve contextual representation while maintaining local continuity. In addition, a complex feature extraction unit (CFEU-i) is incorporated to enhance multi-scale feature learning in the complex domain. By combining hierarchical feature extraction with sequential spectro-temporal modeling, the proposed method suppresses noise while preserving speech quality. Experimental results demonstrate that the proposed NUNet-Conformer outperforms recent SE approaches in terms of Signal-to-Distortion Ratio (SDR), Short-Time Objective Intelligibility (STOI), and Perceptual Evaluation of Speech Quality (PESQ).

Other Articles