Sunnydayal Vanambathina

Work place: School of Electronics Engineering, VIT-AP University, Amaravati-522237, Andhra Pradesh, India

E-mail: sunny.dayal@vitap.ac.in

Website: https://orcid.org/0000-0002-2668-1727

Research Interests: Speech Enhancement, Statistical Signal Processing, Blind Source Separation, Machine Learning

Biography

Sunnydayal Vanambathina was born in Vijayawada, Andhra Pradesh, India. He received the B.Tech. degree in Electronics and Communication Engineering from JNTU Hyderabad, India, in 2007, and the M.Tech. degree in Signal Processing from the National Institute of Technology Calicut, India, in 2010. He received the Ph.D. degree from the National Institute of Technology, Warangal, India. He was a visiting researcher at the University of Seville, Spain, under the Erasmus Mundus Ph.D. Exchange Programme (2013–2014). His research interests include speech enhancement, statistical signal processing, blind source separation, and machine learning.

Author Articles

Speech Enhancement Based on a Two-Branch Nested U-Net Architecture Using TS-Conformer

By Hanna Deepa Mallolu, Sunnydayal Vanambathina

DOI: https://doi.org/10.5815/ijigsp.2026.03.07, Pub. Date: 8 Jun. 2026

Transformers, while powerful in capturing long-range dependencies with self-attention mechanisms, face several limitations in speech processing tasks. In particular, they can lack the inductive biases needed to efficiently model the local, fine-grained temporal and spectral structures critical for speech perception, resulting in suboptimal handling of fine details. To address this issue, this paper introduces a speech enhancement (SE) network that builds on a two-branch nested U-Net framework integrated with a two-stage conformer (TS-Conformer). The nested U-Net employs dual decoding branches for simultaneous spectral mapping and mask estimation, enabling complementary learning of speech characteristics. The TS-Conformer sequentially models temporal and frequency dependencies to improve contextual representation while maintaining local continuity. In addition, a complex feature extraction unit (CFEU-i) is incorporated to enhance multi-scale feature learning in the complex domain. By combining hierarchical feature extraction with sequential spectro-temporal modeling, the proposed method suppresses noise while preserving speech quality. Experimental results demonstrate that the proposed NUNet-Conformer outperforms recent SE approaches in terms of Signal-to-Distortion Ratio (SDR), Short-Time Objective Intelligibility (STOI), and Perceptual Evaluation of Speech Quality (PESQ).


Nested U-Net-Based Speech Enhancement with Multi-Scale Feature Extraction and Dual-Path Time-Frequency Feature Modeling

By Shaik AreefaBegam, Sunnydayal Vanambathina

DOI: https://doi.org/10.5815/ijigsp.2026.02.07, Pub. Date: 8 Apr. 2026

Speech enhancement plays a vital role in improving the perceptual quality and intelligibility of speech signals degraded by environmental noise, particularly in modern network-based and signal processing systems. Traditional U-Net architectures capture local spectral details effectively but struggle to model long-range dependencies and may propagate residual noise through skip connections. Transformer-based models provide strong global context modeling but often fail to retain fine-grained spectral cues. To overcome these limitations, this paper presents a Nested U-Net–based network-oriented speech enhancement framework that incorporates Multi-Scale Feature Extraction, Feature Calibration, and a Dual-Path Higher-Order Information Interaction with Time-Frequency Attention module. The Multi-Scale Feature Extraction blocks in both encoder and decoder extract multi-resolution spectral patterns, while the nested topology strengthens hierarchical feature reuse. At the bottleneck, a stack of four Dual-Path Higher-Order Information Interaction with Time-Frequency Attention modules captures long-range temporal and spectral dependencies, and feature calibration adaptively filters encoder features to reduce noise transfer. Extensive experiments on Common Voice and LibriSpeech datasets demonstrate that the proposed model achieves superior perceptual evaluation of speech quality, short-time objective intelligibility, and signal-to-distortion ratio scores, particularly under moderate (0 dB) signal-to-noise ratio conditions. The results confirm that the framework provides robust enhancement performance and consistently outperforms several recent state-of-the-art methods in terms of speech quality, intelligibility, and noise suppression.

