International Journal of Image, Graphics and Signal Processing (IJIGSP)

IJIGSP Vol. 18, No. 3, Jun. 2026

Table Of Contents

REGULAR PAPERS

A Feature-Enhanced Hybrid CNN-BiLSTM Framework for Multi-Label Classification of Pathological High-Frequency Oscillations in Intracranial EEG Signals

By Rahma Maalej Abir Hadriche Mohamed Amine Ben Msarra Nawel Jmail

DOI: https://doi.org/10.5815/ijigsp.2026.03.01, Pub. Date: 8 Jun. 2026

Interictal high-frequency oscillations, including ripples in the frequency range of 80-250 hertz and fast ripples between 250-500 hertz, are increasingly recognized as reliable electrophysiological biomarkers for delineating the epileptogenic zone in patients with drug-resistant epilepsy. However, their routine clinical exploitation remains limited due to pronounced morphological variability, low signal-to-noise ratios, and the difficulty of identifying overlapping events in which ripples and fast ripples occur simultaneously.
This paper presents an automated deep learning framework designed for the multi-label classification of pathological high-frequency oscillations in intracranial electroencephalographic signals. The proposed approach integrates advanced nonlinear statistical descriptors, including entropy- and complexity-based measures, in order to enhance the discriminative representation of the signals. These features are processed using a hybrid deep learning architecture that combines convolutional neural networks for local morphological feature extraction with bidirectional long short-term memory networks to capture long-range temporal dependencies in non-stationary neural signals.
The proposed framework was evaluated using the publicly available multi-patient intracranial electroencephalography dataset provided by the Collaborative Research in Computational Neuroscience initiative. Experimental results demonstrate a classification accuracy of 98.3%, along with high precision and balanced performance across all pathological classes. These findings indicate that the proposed method offers a robust and objective solution for the automated identification of high-frequency oscillations, with strong potential for improving presurgical evaluation and decision-making in epilepsy surgery.

Methods and Tools for Identifying Human Resource Lesions in Emergency Based on Multimodal Analysis and Deep Learning

By Yurii Ushenko Dmytro Uhryn Victoria Vysotska Lyubomyr Chyrun Zhengbing Hu Tetiana Rekunenko

DOI: https://doi.org/10.5815/ijigsp.2026.03.02, Pub. Date: 8 Jun. 2026

Emergencies of natural, technological, and military origin require rapid and accurate assessment of victims' conditions to support effective rescue and medical response. Traditional visual examination methods are often limited by stress, time pressure, and incomplete information, leading to delayed or inaccurate decisions. This study proposes a multimodal deep learning approach for automated identification of human resource lesions in emergency scenarios. The developed framework integrates visual, audio, and text/sensory data using convolutional neural networks, Transformer-based models, and a Transformer Cross-Attention fusion mechanism. The proposed architecture enables effective extraction and integration of heterogeneous features for lesion classification, severity estimation, and automated medical triage. Experimental evaluation was conducted on multimodal datasets containing injury images, audio recordings, and symptom descriptions. The model was trained using a combined loss function and evaluated with classification, regression, and triage metrics. The results demonstrate high system performance, achieving a macro-F1 score of 0.87, validation accuracy of 86–87%, and triage accuracy above 90%, including 95% for the RED category. The regression model for severity prediction achieved an R² value of 0.92, while modality importance analysis confirmed the dominant contribution of visual information. The experiments also showed stable model convergence and strong generalisation ability without significant overfitting. The proposed multimodal framework confirms the effectiveness of deep learning and cross-attention mechanisms for automated lesion identification and emergency medical triage. The developed approach can be applied in decision-support systems for rescue operations, emergency medicine, and intelligent VR/AR training simulators.

Efficient Road Cracks Segmentation Using Physics Informed Neural Network Approach

By Omar Knnou Rachid Benoudi Mourad Haddioui Said Agoujil Youssef Qaraai

DOI: https://doi.org/10.5815/ijigsp.2026.03.03, Pub. Date: 8 Jun. 2026

Herein, we propose a mathematical model for road crack segmentation in images, focusing on the difficulties of real-world road conditions, such as lighting and color changes and complex crack shapes. The proposed model belongs to the family of nonlinear partial differential equations (PDEs) and involves edge-aware anisotropic diffusion, curvature-driven contour evolution, high-order biharmonic regularization, and a feature-driven attraction force for capturing the crack regions. A theoretical analysis is conducted to show the well-posedness of the model. In addition, a physics-informed neural network (PINN) version of the model is presented, which allows us to discretize the PDEs in a mesh-free fashion and to approximate high-order derivatives through deep neural networks. Various numerical experiments on the EdmCrack600 data are carried out to validate the proposed method. The experimental results show that the proposed model outperforms the other segmentation models compared and performs well on the reported metrics, i.e., Dice similarity, intersection over union, sensitivity, and specificity.
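The four metrics named in this abstract can be computed from a predicted mask and a reference mask. The following is a minimal sketch using the standard confusion-matrix definitions. The function name `segmentation_metrics` and the flat 0/1 mask layout are illustrative assumptions, not taken from the paper.

```python
def segmentation_metrics(pred, truth):
    """Return Dice, IoU, sensitivity and specificity for two binary masks.

    `pred` and `truth` are equal-length flat sequences of 0/1 pixel labels
    (1 = crack). This layout is an assumption made for illustration.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")

    # Confusion-matrix counts over all pixels.
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)

    def ratio(num, den):
        # Convention chosen here: an empty denominator yields 0.0.
        return num / den if den else 0.0

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "iou": ratio(tp, tp + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


if __name__ == "__main__":
    predicted = [1, 1, 0, 0, 1, 0]
    reference = [1, 0, 0, 0, 1, 1]
    print(segmentation_metrics(predicted, reference))
```

Dice and IoU both ignore true negatives, which matters for crack images where background pixels usually dominate; specificity alone can look high even for a poor segmentation.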

Optimized Classification of Steel Surface Defects via Hybrid Features and Neighborhood Component Analysis

By Ritu Juneja Anil Dudy

DOI: https://doi.org/10.5815/ijigsp.2026.03.04, Pub. Date: 8 Jun. 2026

This paper presents a new process for detecting surface defects in steel products that integrates image processing and vision capabilities. The approach uses Multi-Scale Local Binary Pattern (MSLBP), Dual-Tree Complex Wavelet Transform (DTCWT), and Gabor wavelet features for feature extraction, and Neighborhood Component Analysis (NCA) for feature selection. Ensemble AdaBoost is employed as a comparative baseline classifier, and the final defect detection is performed by the Enhanced Snake Optimized Support Vector Machine (ESO-SVM) model. Experiments show that the proposed approach outperforms classical methods, reaching 98.8% accuracy and a 98.5% F1-score when detecting fine and irregular defects under different production conditions. The system improves the reliability and scalability of automatic defect detection, thereby increasing the quality of steel products and reducing waste.

Precision Agriculture through Multispectral Imaging and Machine Learning for Paddy Field Health Assessment

By G. Ravi Kumar C. Sushama

DOI: https://doi.org/10.5815/ijigsp.2026.03.05, Pub. Date: 8 Jun. 2026

Monitoring the health of paddy crops is crucial for maintaining agricultural productivity, especially in areas where crop losses due to disease spread and weed infestation are common. Traditional manual field inspection is labor-intensive and costly, and it remains too slow and cumbersome for large-scale monitoring. This paper presents a machine learning framework for computer-assisted detection of both weeds and diseases from multispectral satellite images. The method starts with extensive preprocessing, comprising radiometric correction, geometric alignment, and noise reduction. Following preprocessing, several vegetation indices, such as the Normalized Difference Vegetation Index (NDVI), the Soil Adjusted Vegetation Index (SAVI), and the Green NDVI (GNDVI), are computed as features that capture plant vigor and early stress symptoms. These indices serve as inputs to a set of classification models. Multiple machine learning classifiers (Support Vector Machine (SVM), Random Forest (RF), k-Nearest Neighbors (k-NN), Naïve Bayes (NB), Gradient Boosting (GB), and Logistic Regression (LR)) are evaluated for distinguishing healthy crops from weed-infested and disease-infested regions. The system is trained and tested on a dataset of Sentinel-2 multispectral imagery supplemented by labeled ground-truth maps from varied paddy croplands. Performance is evaluated using Accuracy, Precision, Recall, F1-Score, ROC-AUC, and Cohen’s Kappa. SVM performed best among all classifiers, with a reported accuracy of 91.3%, an average ROC-AUC of 0.94, and a Matthews correlation coefficient (MCC) of 0.85. These results support the use of machine learning for scalable, cost-effective, and dependable precision crop monitoring and timely decision-making.
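The three vegetation indices named above have standard closed-form definitions over per-pixel reflectance. The sketch below computes them for a single pixel. The function name, the soil adjustment factor of 0.5 (the commonly used default for SAVI), and the mapping to Sentinel-2 bands B8 (near-infrared), B4 (red), and B3 (green) are assumptions for illustration; the abstract does not state which bands or parameters the authors used.

```python
def vegetation_indices(nir, red, green, soil_factor=0.5):
    """Compute NDVI, SAVI and GNDVI from surface reflectance values.

    nir, red, green: reflectances in [0, 1], e.g. Sentinel-2 bands
    B8, B4 and B3 (band choice is an assumption, not from the paper).
    soil_factor: the L term of SAVI; 0.5 is a common default.
    """
    def safe_div(num, den):
        # No-data pixels can have zero reflectance in both bands.
        return num / den if den else 0.0

    ndvi = safe_div(nir - red, nir + red)
    savi = safe_div((1 + soil_factor) * (nir - red), nir + red + soil_factor)
    gndvi = safe_div(nir - green, nir + green)
    return {"ndvi": ndvi, "savi": savi, "gndvi": gndvi}


if __name__ == "__main__":
    # Hypothetical reflectances for one vegetated pixel.
    print(vegetation_indices(nir=0.5, red=0.1, green=0.2))
```

A feature vector for a classifier such as an SVM could then be assembled per pixel from these three values, possibly alongside the raw band reflectances.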

DS-MelNet: An Enhanced Dual Stream Semi-Supervised Mechanism for Melanoma Classification

By Apurva S. Shinde Sangita S. Chaudhari

DOI: https://doi.org/10.5815/ijigsp.2026.03.06, Pub. Date: 8 Jun. 2026

Melanoma is a major cause of skin cancer-related deaths worldwide. Early diagnosis and detection are crucial for improving patient outcomes. However, existing detection methods often result in false alarms, highlighting the need for more accurate and reliable approaches. This paper proposes a Dual-Stream Semi-Supervised Melanoma Network (DS-MelNet) for melanoma detection. The DS-MelNet utilizes a semi-supervised learning framework to incorporate both labeled and unlabeled data, enhancing detection accuracy. The model's performance is evaluated on the SIIM-ISIC Melanoma Classification Challenge dataset. Hair is detected and removed from the skin lesion images using three algorithms proposed in the literature, viz. Modified Dull Razor, Modified E-shaver, and Adaptive principle curvature with Modified dull razor fusion. Performance of the proposed models is assessed through commonly used metrics: Accuracy, Recall, Precision, and F1-score. The DS-MelNet is compared against two benchmarks: a Simple Convolutional Neural Network (SCNN) and a Fine-tuned VGG-16 model proposed in this paper. The results indicate that the DS-MelNet performs best, achieving an accuracy of 86% and outperforming both the SCNN (76%) and VGG-16 (82%) models. This performance underscores the potential of the DS-MelNet for effective melanoma classification. The study highlights the promise of semi-supervised learning frameworks and sophisticated neural networks in enhancing melanoma diagnostics. The ability of the proposed model to learn from a small set of labeled data makes it highly suitable for real-world applications where annotated datasets are limited.

Speech Enhancement Based on a Two-Branch Nested U-Net Architecture Using TS-Conformer

By Hanna Deepa Mallolu Sunnydayal Vanambathina

DOI: https://doi.org/10.5815/ijigsp.2026.03.07, Pub. Date: 8 Jun. 2026

Transformers, while powerful in capturing long-range dependencies with self-attention mechanisms, face several limitations in speech processing tasks. In particular, transformers can lack the inductive biases needed to efficiently model the local, fine-grained temporal and spectral structures critical for speech perception, resulting in suboptimal handling of fine details. To address this issue, this paper introduces a speech enhancement (SE) network that builds on a two-branch nested U-Net framework integrated with a two-stage conformer (TS-Conformer) for robust speech enhancement. The nested U-Net employs dual decoding branches for simultaneous spectral mapping and mask estimation, enabling complementary learning of speech characteristics. The TS-Conformer sequentially models temporal and frequency dependencies to improve contextual representation while maintaining local continuity. In addition, a complex feature extraction unit (CFEU-i) is incorporated to enhance multi-scale feature learning in the complex domain. By combining hierarchical feature extraction with sequential spectro-temporal modeling, the proposed method effectively suppresses noise while preserving speech quality. Experimental results demonstrate that the proposed NUNet-Conformer achieves superior performance compared to recent SE approaches in terms of Signal-to-Distortion Ratio (SDR), Short-Time Objective Intelligibility (STOI), and Perceptual Evaluation of Speech Quality (PESQ).

PARSeq-GeoAware: Explicit Geometric Modeling for Robust Scene Text Recognition in the Wild

By Shilpi Goyal Deepak Motwani

DOI: https://doi.org/10.5815/ijigsp.2026.03.08, Pub. Date: 8 Jun. 2026

Scene text recognition in unconstrained environments remains challenging due to geometric distortions including arbitrary orientations, curved baselines, and perspective deformations. Transformer-based methods achieve strong performance on regular benchmarks through implicit spatial learning but suffer accuracy drops of 8–12% on heavily curved text, where attention weights become diffuse and fail to capture explicit geometric structure. No prior work quantifies the isolated contribution of explicit geometric modeling within transformer architectures. To address this, we propose PARSeq-GeoAware, a dual-branch scene text recognition framework integrating an Enhanced Geometric Feature Extractor (GFE), adaptive coarse-to-fine rectification (affine + TPS), and a cross-attention fusion module combining explicit geometric representations with ViT-based visual features decoded by a CTC head. Trained on 176,630 image-label pairs across three progressive stages and evaluated on six standard benchmarks, PARSeq-GeoAware achieves 89.87% on IIIT5K, 82.07% on SVT, 84.55% on ICDAR13, 68.90% on ICDAR15, 71.26% on ArT, and 81.27% on Total-Text. On irregular and curved text benchmarks — the primary target of this work — our ±1 character accuracy reaches 84.10% on ArT and 90.05% on Total-Text, exceeding PARSeq's published word accuracy of 79.3% and 87.1% respectively by +4.8pp and +2.95pp, without a language model. Ablation studies confirm that disabling all geometric components reduces ArT word accuracy from 71.26% to 42.89% (−28.37pp), establishing the GFE as the primary driver of irregular text performance. The adaptive rectification module achieves full-pipeline inference in 11.9 ± 1.4ms on Tesla T4, which is 6.5× faster than DAN (78ms). A three-stage progressive training curriculum prevents catastrophic forgetting, retaining 89.87% regular accuracy after irregular specialization versus 80.6% with joint training (+14.8pp). 
These results demonstrate that explicit geometric modeling enables a single architecture to handle synthetic, regular, and irregular scene text without specialized language model post-processing. The code is available at https://github.com/Arni-123/PARSeq-GeoAware.

Comparative Analysis and Ensemble Optimization of CNN Architectures for MRI-Based Brain Tumor Diagnosis

By Md. Tariqul Islam Pintu Chandra Shill Md Sadiq Iqbal

DOI: https://doi.org/10.5815/ijigsp.2026.03.09, Pub. Date: 8 Jun. 2026

Brain tumor detection and classification from MRI images is a challenging task. Early and accurate diagnosis is essential for selecting appropriate treatment plans and improving patient outcomes. Despite significant advances in deep learning for medical image recognition, comprehensive comparative analyses of brain tumor classification models, particularly regarding ensemble optimization, remain limited. This paper uses four state-of-the-art deep learning architectures, namely EfficientNetB4, MobileNetV3, MobileNetV2, and EfficientNetB0, to classify brain MRI images into four categories: Glioma, Meningioma, Pituitary tumor, and Normal. A unified experimental framework is applied to 875 MRI images, incorporating a two-phase transfer learning approach, consistent preprocessing, and a rigorous evaluation protocol with 5-fold cross-validation and an independent test set to prevent data leakage. Both full and selective ensemble strategies are examined to improve robustness and stability. The models are evaluated using accuracy, precision, recall, F1-score, confusion matrices, and accuracy curves, with statistical validation using McNemar’s test. MobileNetV3 achieves the highest test accuracy of 98.76%, followed by EfficientNetB4 (97.89%) and EfficientNetB0 (93.48%). MobileNetV2 performs significantly worse, with an accuracy of less than 80%. The selective ensemble technique (which uses the best models) attains an accuracy of 92.97%, higher than the full ensemble (84.40%); it improves prediction robustness but does not surpass the best individual model in peak accuracy. Overall, MobileNetV3 is the most suitable architecture for brain tumor classification, delivering high accuracy with minimal computational complexity.
The selective ensemble approach also enhances performance while maintaining computational efficiency, underscoring the importance of informed model selection in neuro-oncological image analysis and clinical decision-support systems.
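McNemar's test, mentioned above for statistical validation, compares two classifiers on the same test samples using only the cases where they disagree. The sketch below implements the common continuity-corrected chi-square form. Whether the authors used this variant or the exact binomial form is not stated in the abstract, and the function name `mcnemar_test` is an illustrative assumption.

```python
import math


def mcnemar_test(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar test for two classifiers.

    Returns (statistic, p_value). Only discordant samples count:
    b = A correct and B wrong, c = A wrong and B correct.
    """
    b = sum(1 for y, pa, pb in zip(y_true, pred_a, pred_b)
            if pa == y and pb != y)
    c = sum(1 for y, pa, pb in zip(y_true, pred_a, pred_b)
            if pa != y and pb == y)

    if b + c == 0:
        # The classifiers never disagree in correctness: no evidence.
        return 0.0, 1.0

    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical labels: model A is right on 10 samples where B is
    # wrong, and B is right on 2 samples where A is wrong.
    labels = [1] * 12
    model_a = [1] * 10 + [0] * 2
    model_b = [0] * 10 + [1] * 2
    print(mcnemar_test(labels, model_a, model_b))
```

Because the test uses paired predictions on identical samples, it suits the setting described here, where all architectures are evaluated on the same independent test set.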

MambaResp-KAN: A State Space Model with Kolmogorov–Arnold Networks and Diffusion-Based Augmentation for Explainable Respiratory Disease Classification

By Mohammed Tawfik

DOI: https://doi.org/10.5815/ijigsp.2026.03.10, Pub. Date: 8 Jun. 2026

Automated respiratory disease classification from auscultation sounds holds transformative potential for early clinical screening, yet existing approaches remain constrained by the quadratic complexity of Transformer-based sequence encoders, the limited expressiveness of conventional multi-layer perceptron classifiers, and the persistent challenge of scarce annotated medical audio data. This paper presents MambaResp-KAN, a novel architecture that unifies Bidirectional Mamba state space models, Kolmogorov–Arnold Network classifiers with learnable B-spline activation functions, multi-modal gated cross-attention fusion of WavLM, BEATs, and handcrafted spectral features, and class-conditional denoising diffusion probabilistic model augmentation into a single end-to-end framework for explainable respiratory sound analysis. The Bidirectional Mamba encoder achieves linear-time sequence modeling through input-dependent selective state space discretization, processing forward and reverse temporal streams with gated aggregation to capture both causal and anti-causal dependencies in respiratory waveforms. The Kolmogorov–Arnold Network classifier replaces fixed-activation neurons with learnable univariate B-spline functions on each network edge, directly grounded in the Kolmogorov–Arnold representation theorem, yielding a classifier that is both more parameter-efficient and intrinsically interpretable than standard multi-layer perceptrons. A gated cross-modal attention mechanism fuses embeddings from the self-supervised WavLM and BEATs audio encoders with handcrafted MFCC and spectral features, while a class-conditional denoising diffusion probabilistic model synthesizes high-fidelity respiratory audio to alleviate class imbalance. Integrated Gradients attribution and KAN concept bottleneck analysis provide clinician-interpretable explanations of model decisions.
Evaluated on two benchmark datasets, Asthma Detection V2 (five classes, 1,211 samples) and KAUH (four classes, 940 samples), MambaResp-KAN achieves classification accuracies of 99.6% and 99.4%, respectively, surpassing the prior state-of-the-art E-RespiNet by 0.7 and 0.6 percentage points while using 62% fewer parameters and reducing inference latency by 56.3%. Cross-dataset evaluation yields an average accuracy of 84.0% with a generalization gap of 15.8%, compared to 23.3% for E-RespiNet, confirming improved transferability across clinical institutions.

SWT-PnP-DnCNN: Medical Image Fusion Using Stationary Wavelet Transform and Plug-and-Play Deep Denoising Model

By Amit Pandey Prabhishek Singh Akansha Singh Achyut Shankar Manoj Diwakar

DOI: https://doi.org/10.5815/ijigsp.2026.03.11, Pub. Date: 8 Jun. 2026

This paper presents a hybrid medical image fusion (MIF) technique (SWT-PnP-DnCNN) that combines multiscale decomposition, spatial-frequency-driven fusion, and deep denoising priors to efficiently fuse medical images. The SWT-PnP-DnCNN begins with the Stationary Wavelet Transform (SWT) to decompose input medical images into low-frequency (LFSBs) and high-frequency (HFSBs) subbands. The LFSBs are fused using spatial frequency-based weighted averaging, effectively integrating overall intensity and contrast information. For the HFSBs, a local energy and max-selection strategy is adopted to retain salient edge features from the source images. Following the initial fusion, a Plug-and-Play (PnP) optimization strategy is applied to refine the fused image. This step uses a pretrained DnCNN model as a deep denoiser, serving as an implicit image prior in a model-driven iterative framework. Each iteration alternates between a data consistency step and a denoising step, significantly reducing artifacts and enhancing structural fidelity in the result. The effectiveness of SWT-PnP-DnCNN is demonstrated on benchmark CT-MRI, MRI-PET, and MRI-PET datasets. Extensive evaluation against classical hybrid strategies and recent CNN-based fusion methods shows that SWT-PnP-DnCNN achieves the best performance across standard metrics. We further include mean±std reporting and paired t-tests, confirming statistically significant improvements (p < 0.05). Ablation studies validate each design choice by comparing SWT-only vs. SWT+PnP and evaluating denoiser alternatives, with sensitivity analyses for PnP iterations, regularization strength, and SWT levels. The runtime analysis clarifies feasible deployment, particularly in offline or cloud-based environments. Overall, SWT-PnP-DnCNN emerges as a robust, interpretable, and clinically valuable solution for enhancing MIF in medical imaging applications.
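The low-frequency fusion rule described above can be illustrated with the classical spatial frequency measure, which combines the root-mean-square of horizontal and vertical pixel differences. The sketch below weights each subband by its share of the total spatial frequency. It assumes a single global weight per subband and plain nested lists as the image format; the paper may use a local, windowed weighting instead, and the function names are hypothetical.

```python
import math


def spatial_frequency(band):
    """Classical spatial frequency of a 2-D array given as nested lists."""
    rows, cols = len(band), len(band[0])
    # Sum of squared differences between horizontal neighbours.
    row_sq = sum((band[i][j] - band[i][j - 1]) ** 2
                 for i in range(rows) for j in range(1, cols))
    # Sum of squared differences between vertical neighbours.
    col_sq = sum((band[i][j] - band[i - 1][j]) ** 2
                 for i in range(1, rows) for j in range(cols))
    # SF = sqrt(RF^2 + CF^2), each term normalised by the pixel count.
    return math.sqrt((row_sq + col_sq) / (rows * cols))


def fuse_low_frequency(band_a, band_b):
    """Weighted average of two subbands, weights proportional to SF."""
    sf_a = spatial_frequency(band_a)
    sf_b = spatial_frequency(band_b)
    total = sf_a + sf_b
    # Fall back to equal weights when both subbands are flat.
    w_a = sf_a / total if total else 0.5
    return [[w_a * a + (1 - w_a) * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(band_a, band_b)]


if __name__ == "__main__":
    textured = [[0.0, 1.0], [1.0, 0.0]]
    flat = [[1.0, 1.0], [1.0, 1.0]]
    print(fuse_low_frequency(textured, flat))
```

With this rule, the subband carrying more spatial activity contributes more to the fused low-frequency band, while two equally flat subbands are simply averaged.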

Multi-channel Prediction Residue Modeling (MPRM) Using Second Order Residual Statistics for Enhanced CFA Artifact Based Forgery Detection

By Somendra Kumar Soni Mohammad Rafique Khan Vinay Kumar Singh

DOI: https://doi.org/10.5815/ijigsp.2026.03.12, Pub. Date: 8 Jun. 2026

In recent years, advanced image editing tools have become readily available, making it easy to tamper with images in visually undetectable ways. This has created an urgent need for reliable and robust techniques to authenticate image integrity. A digital camera produces a full-color image by interpolating the remaining color channels, which creates periodic artifacts known as Color Filter Array Artifacts (CFAA). In a forged image, the consistency of these artifacts is disturbed, a property that is often used to detect and localize forgery in tampered images. Existing CFAA-based splicing detection methods often rely on a single channel, exhibit high computational complexity, and show degraded performance under JPEG compression. Although some works have explored multi-channel CFA-based approaches, their ability to capture cross-channel dependencies effectively and to remain robust under heavy JPEG compression is limited. To address these gaps, we propose a splicing detection framework that performs second-order statistical analysis on residuals extracted from all color channels. Unlike existing multi-channel CFAA techniques, this work explicitly models inter-channel relationships through the Error Variance Ratio (EVR) and introduces a novel Inter Block Mean Square Error (IBMSE) metric. This formulation enhances the characterization of CFAA periodicity and improves discrimination between authentic and tampered regions. The proposed technique is evaluated on the CUISDE, RTD, and IMD datasets and compared with existing CFA-based localization methods using ROC, precision-recall, and AUC metrics. Experimental results demonstrate that the proposed method improves localization performance and is robust against varying levels of JPEG compression.
