International Journal of Image, Graphics and Signal Processing (IJIGSP)

ISSN: 2074-9074 (Print)

ISSN: 2074-9082 (Online)

DOI: https://doi.org/10.5815/ijigsp

Website: https://www.mecs-press.org/ijigsp

Published By: MECS Press

Frequency: 6 issues per year

Number(s) Available: 143

(IJIGSP) in Google Scholar Citations / h5-index

IJIGSP is committed to bridging the theory and practice of images, graphics, and signal processing. From innovative ideas to specific algorithms and full system implementations, IJIGSP publishes original, peer-reviewed, high-quality articles in the areas of images, graphics, and signal processing. IJIGSP is a well-indexed scholarly journal and is indispensable reading and reference material for people working at the cutting edge of images, graphics, and signal processing applications.

 

IJIGSP has been abstracted or indexed by several world-class databases: Scopus, Google Scholar, Microsoft Academic Search, CrossRef, Baidu Wenku, IndexCopernicus, IET Inspec, EBSCO, JournalSeek, ULRICH's Periodicals Directory, WorldCat, Scirus, Academic Journals Database, Stanford University Libraries, Cornell University Library, UniSA Library, CNKI Scholar, ProQuest, J-Gate, ZDB, BASE, OhioLINK, iThenticate, Open Access Articles, Open Science Directory, National Science Library of Chinese Academy of Sciences, The HKU Scholars Hub, etc.

Latest Issue

IJIGSP Vol. 18, No. 3, Jun. 2026

REGULAR PAPERS

A Feature-Enhanced Hybrid CNN-BiLSTM Framework for Multi-Label Classification of Pathological High-Frequency Oscillations in Intracranial EEG Signals

By Rahma Maalej Abir Hadriche Mohamed Amine Ben Msarra Nawel Jmail

DOI: https://doi.org/10.5815/ijigsp.2026.03.01, Pub. Date: 8 Jun. 2026

Interictal high-frequency oscillations, including ripples in the frequency range of 80-250 hertz and fast ripples between 250-500 hertz, are increasingly recognized as reliable electrophysiological biomarkers for delineating the epileptogenic zone in patients with drug-resistant epilepsy. However, their routine clinical exploitation remains limited due to pronounced morphological variability, low signal-to-noise ratios, and the difficulty of identifying overlapping events in which ripples and fast ripples occur simultaneously.
This paper presents an automated deep learning framework designed for the multi-label classification of pathological high-frequency oscillations in intracranial electroencephalographic signals. The proposed approach integrates advanced nonlinear statistical descriptors, including entropy- and complexity-based measures, in order to enhance the discriminative representation of the signals. These features are processed using a hybrid deep learning architecture that combines convolutional neural networks for local morphological feature extraction with bidirectional long short-term memory networks to capture long-range temporal dependencies in non-stationary neural signals.
The proposed framework was evaluated using the publicly available multi-patient intracranial electroencephalography dataset provided by the Collaborative Research in Computational Neuroscience initiative. Experimental results demonstrate a classification accuracy of 98.3 %, along with high precision and balanced performance across all pathological classes. These findings indicate that the proposed method offers a robust and objective solution for the automated identification of high-frequency oscillations, with strong potential for improving presurgical evaluation and decision-making in epilepsy surgery.

Methods and Tools for Identifying Human Resource Lesions in Emergency Based on Multimodal Analysis and Deep Learning

By Yurii Ushenko Dmytro Uhryn Victoria Vysotska Lyubomyr Chyrun Zhengbing Hu Tetiana Rekunenko

DOI: https://doi.org/10.5815/ijigsp.2026.03.02, Pub. Date: 8 Jun. 2026

Emergencies of natural, technological, and military origin require rapid and accurate assessment of victims' conditions to support effective rescue and medical response. Traditional visual examination methods are often limited by stress, time pressure, and incomplete information, leading to delayed or inaccurate decisions. This study proposes a multimodal deep learning approach for automated identification of human resource lesions in emergency scenarios. The developed framework integrates visual, audio, and text/sensory data using convolutional neural networks, Transformer-based models, and a Transformer Cross-Attention fusion mechanism. The proposed architecture enables effective extraction and integration of heterogeneous features for lesion classification, severity estimation, and automated medical triage. Experimental evaluation was conducted on multimodal datasets containing injury images, audio recordings, and symptom descriptions. The model was trained using a combined loss function and evaluated with classification, regression, and triage metrics. The results demonstrate high system performance, achieving a macro-F1 score of 0.87, validation accuracy of 86–87%, and triage accuracy above 90%, including 95% for the RED category. The regression model for severity prediction achieved an R² value of 0.92, while modality importance analysis confirmed the dominant contribution of visual information. The experiments also showed stable model convergence and strong generalisation ability without significant overfitting. The proposed multimodal framework confirms the effectiveness of deep learning and cross-attention mechanisms for automated lesion identification and emergency medical triage. The developed approach can be applied in decision-support systems for rescue operations, emergency medicine, and intelligent VR/AR training simulators.

Efficient Road Cracks Segmentation Using Physics Informed Neural Network Approach

By Omar Knnou Rachid Benoudi Mourad Haddioui Said Agoujil Youssef Qaraai

DOI: https://doi.org/10.5815/ijigsp.2026.03.03, Pub. Date: 8 Jun. 2026

Herein, we propose a mathematical model for road crack segmentation in images, focusing on the difficulties of real-world road conditions, such as lighting and color changes and complex crack shapes. The proposed model belongs to the family of nonlinear partial differential equations (PDEs), involving edge-aware anisotropic diffusion, curvature-driven contour evolution, high-order biharmonic regularization, and a feature-driven attraction force for capturing the crack regions. A theoretical analysis is conducted to show the well-posedness of the model. In addition, a physics-informed neural network (PINN) version of the model is presented, which allows us to discretize the PDEs in a mesh-free fashion and to approximate high-order derivatives through deep neural networks. Various numerical experiments on the EdmCrack600 data are implemented to validate the proposed method. The experimental results show that the proposed model is superior to the other segmentation models and that it achieves excellent performance in terms of the metrics, i.e., Dice similarity, intersection over union, sensitivity, and specificity.

Optimized Classification of Steel Surface Defects via Hybrid Features and Neighborhood Component Analysis

By Ritu Juneja Anil Dudy

DOI: https://doi.org/10.5815/ijigsp.2026.03.04, Pub. Date: 8 Jun. 2026

This paper establishes a new process for surface defect detection in steel products that integrates image processing and vision capabilities. The approach incorporates Multi-Scale Local Binary Pattern (MSLBP), Dual-Tree Complex Wavelet Transform (DTCWT), and Gabor wavelet for feature extraction, while Neighborhood Component Analysis (NCA) is used for feature selection. Ensemble AdaBoost is employed as a comparative baseline classifier, and the final defect detection performance is reported for the Enhanced Snake Optimized Support Vector Machine (ESO-SVM) model. The suggested approach is superior to the classical methods: the experiments show 98.8% accuracy and a 98.5% F1-score in detecting fine and irregular defects under different production conditions. The system improves the reliability and scalability of automatic defect detection, thus increasing the quality of steel products and decreasing waste.

Precision Agriculture through Multispectral Imaging and Machine Learning for Paddy Field Health Assessment

By G. Ravi Kumar C. Sushama

DOI: https://doi.org/10.5815/ijigsp.2026.03.05, Pub. Date: 8 Jun. 2026

Monitoring the health of the paddy crop is crucial for maintaining agricultural productivity, especially in areas where crop losses due to the spread of diseases and weed infestation are common. The traditional method of manual field inspection involves extensive labor and overhead and remains too slow and cumbersome for large-scale monitoring. This paper presents a machine learning framework for computer-assisted detection of both weeds and diseases from multispectral satellite images. The method starts by applying extensive preprocessing steps, encompassing radiometric correction, geometric alignment, and noise reduction, before the images are analysed. Following preprocessing, several vegetation indices, such as the Normalized Difference Vegetation Index (NDVI), the Soil Adjusted Vegetation Index (SAVI), and the Green NDVI (GNDVI), are used as features that capture plant vigor and early stress symptoms. These indices act as inputs for a set of classification models. Multiple machine learning classifiers, namely Support Vector Machines (SVM), Random Forest (RF), k-Nearest Neighbors (k-NN), Naïve Bayes (NB), Gradient Boosting (GB), and Logistic Regression (LR), are evaluated for distinguishing healthy crops from weed-infested and disease-infested regions. The system is trained and tested on a dataset consisting of Sentinel-2 multispectral imagery supplemented by labeled ground-truth map data from varied paddy cropland. Performance is evaluated using Accuracy, Precision, Recall, F1-Score, ROC-AUC, and Cohen’s Kappa. SVM proved the best among all the classifiers, with a reported accuracy of 91.3%, an average ROC-AUC of 0.94, and an MCC of 0.85. These observations testify to the success of machine learning in formulating scalable, cost-effective, and dependable methodologies for precision crop monitoring and timely decision-making.
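The three vegetation indices named in this abstract have standard textbook definitions, which can be sketched as below. This is an illustrative sketch only, not the authors' code: the function names, the sample reflectance values, and the SAVI soil factor of 0.5 (a commonly used default) are assumptions, and the abstract does not state which Sentinel-2 bands were used (B8, B4, and B3 are the usual choices for near-infrared, red, and green).

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)


def gndvi(nir, green):
    """Green NDVI: the red band of NDVI is replaced by the green band."""
    return (nir - green) / (nir + green)


def savi(nir, red, soil_factor=0.5):
    """Soil Adjusted Vegetation Index with soil brightness factor L.

    soil_factor=0.5 is a common default, assumed here; the abstract does
    not state the value used in the paper.
    """
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


# Hypothetical surface reflectances for a single pixel (not from the paper).
nir, red, green = 0.5, 0.1, 0.2

# Feature vector that a classifier such as an SVM could take as input.
features = (ndvi(nir, red), savi(nir, red), gndvi(nir, green))
print(features)
```

Because the functions use only arithmetic operators, they also work element-wise on NumPy arrays holding whole bands. Pixels where the denominator is zero (for example, masked no-data pixels) would need to be handled before calling them.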

DS-MelNet: An Enhanced Dual Stream Semi-Supervised Mechanism for Melanoma Classification

By Apurva S. Shinde Sangita S. Chaudhari

DOI: https://doi.org/10.5815/ijigsp.2026.03.06, Pub. Date: 8 Jun. 2026

Melanoma skin disease is a major concern for skin cancer-related deaths worldwide. Early diagnosis and detection are crucial for improving patient outcomes. However, existing detection methods often result in false alarms, highlighting the need for more accurate and reliable approaches. This paper proposes a Dual-Stream Semi-Supervised Melanoma Network (DS-MelNet) for melanoma detection. The DS-MelNet utilizes a semi-supervised learning framework to incorporate both labeled and unlabeled data, enhancing detection accuracy. The model's performance is evaluated on the SIIM-ISIC Melanoma Classification Challenge dataset. The dataset undergoes hair detection and removal from skin lesion images using three algorithms proposed in literature viz. Modified Dull Razor, Modified E-shaver and Adaptive principle curvature with Modified dull razor fusion. Performance of the proposed models is assessed through commonly used metrics that include Accuracy, Recall, Precision, and F1-score. Comparative analysis of the DS-MelNet is performed against two benchmarks: Simple Convolutional Neural Network (SCNN) and a Fine-tuned VGG-16 model proposed in this paper. The results clearly indicate that the DS-MelNet demonstrates superior performance, achieving an accuracy of 86% and outperforming both SCNN (76%) and VGG-16 (82%) models. This exceptional performance underscores the potential of the DS-MelNet for effective melanoma classification. The study highlights the promise of semi-supervised learning frameworks and sophisticated neural networks in enhancing melanoma diagnostics. The ability of the proposed model to learn from a small set of labeled data makes it highly suitable for real-world applications where annotated datasets are limited.

Speech Enhancement Based on a Two-Branch Nested U-Net Architecture Using TS-Conformer

By Hanna Deepa Mallolu Sunnydayal Vanambathina

DOI: https://doi.org/10.5815/ijigsp.2026.03.07, Pub. Date: 8 Jun. 2026

Transformers, while powerful in capturing long-range dependencies with self-attention mechanisms, face several limitations in speech processing tasks. Moreover, transformers can lack inherent inductive biases to efficiently model local and fine-grained temporal and spectral structures critical for speech perception, resulting in suboptimal handling of fine details. To address this issue, this paper introduces a speech enhancement (SE) network that builds on a two-branch nested U-Net framework integrated with a two-stage conformer (TS-Conformer) for robust speech enhancement. The nested U-Net employs dual decoding branches for simultaneous spectral mapping and mask estimation, enabling complementary learning of speech characteristics. The TS-Conformer sequentially models temporal and frequency dependencies to improve contextual representation while maintaining local continuity. In addition, a complex feature extraction unit (CFEU-i) is incorporated to enhance multi-scale feature learning in the complex domain. By combining hierarchical feature extraction with sequential spectro-temporal modeling, the proposed method effectively suppresses noise while preserving speech quality. Experimental results demonstrate that the proposed NUNet-Conformer achieves superior performance compared to recent SE approaches in terms of Signal-to-Distortion Ratio (SDR), Short-Time Objective Intelligibility (STOI), and Perceptual Evaluation of Speech Quality (PESQ).

PARSeq-GeoAware: Explicit Geometric Modeling for Robust Scene Text Recognition in the Wild

By Shilpi Goyal Deepak Motwani

DOI: https://doi.org/10.5815/ijigsp.2026.03.08, Pub. Date: 8 Jun. 2026

Scene text recognition in unconstrained environments remains challenging due to geometric distortions including arbitrary orientations, curved baselines, and perspective deformations. Transformer-based methods achieve strong performance on regular benchmarks through implicit spatial learning but suffer accuracy drops of 8–12% on heavily curved text, where attention weights become diffuse and fail to capture explicit geometric structure. No prior work quantifies the isolated contribution of explicit geometric modeling within transformer architectures. To address this, we propose PARSeq-GeoAware, a dual-branch scene text recognition framework integrating an Enhanced Geometric Feature Extractor (GFE), adaptive coarse-to-fine rectification (affine + TPS), and a cross-attention fusion module combining explicit geometric representations with ViT-based visual features decoded by a CTC head. Trained on 176,630 image-label pairs across three progressive stages and evaluated on six standard benchmarks, PARSeq-GeoAware achieves 89.87% on IIIT5K, 82.07% on SVT, 84.55% on ICDAR13, 68.90% on ICDAR15, 71.26% on ArT, and 81.27% on Total-Text. On irregular and curved text benchmarks — the primary target of this work — our ±1 character accuracy reaches 84.10% on ArT and 90.05% on Total-Text, exceeding PARSeq's published word accuracy of 79.3% and 87.1% respectively by +4.8pp and +2.95pp, without a language model. Ablation studies confirm that disabling all geometric components reduces ArT word accuracy from 71.26% to 42.89% (−28.37pp), establishing the GFE as the primary driver of irregular text performance. The adaptive rectification module achieves full-pipeline inference in 11.9 ± 1.4ms on Tesla T4, which is 6.5× faster than DAN (78ms). A three-stage progressive training curriculum prevents catastrophic forgetting, retaining 89.87% regular accuracy after irregular specialization versus 80.6% with joint training (+14.8pp). 
These results demonstrate that explicit geometric modeling enables a single architecture to handle synthetic, regular, and irregular scene text without specialized language model post-processing. The code is available at https://github.com/Arni-123/PARSeq-GeoAware.

Comparative Analysis and Ensemble Optimization of CNN Architectures for MRI-Based Brain Tumor Diagnosis

By Md. Tariqul Islam Pintu Chandra Shill Md Sadiq Iqbal

DOI: https://doi.org/10.5815/ijigsp.2026.03.09, Pub. Date: 8 Jun. 2026

Brain tumor detection and classification from MRI images is a challenging task. Early and accurate diagnosis is essential for selecting appropriate treatment plans and improving patient outcomes. Despite significant advances in deep learning for medical image recognition, comprehensive comparative analyses of brain tumor classification models, particularly regarding ensemble optimization, remain limited. This paper uses four state-of-the-art deep learning frameworks, namely EfficientNetB4, MobileNetV3, MobileNetV2, and EfficientNetB0, to classify brain MRI images into four categories: Glioma, Meningioma, Pituitary tumor, and Normal. It employs a two-phase transfer learning approach, followed by 5-fold cross-validation on 875 MRI images. A unified experimental framework is employed, incorporating a two-phase transfer learning approach, consistent preprocessing, and a rigorous evaluation protocol with 5-fold cross-validation and an independent test set to prevent data leakage. Both full and selective ensemble strategies are examined to improve robustness and stability. The models are evaluated using accuracy, precision, recall, F1-score, confusion matrices, and accuracy curves, with statistical validation using McNemar’s test. MobileNetV3 achieves the highest test accuracy of 98.76%, followed by EfficientNetB4 (97.89%) and EfficientNetB0 (93.48%). MobileNetV2 performs significantly worse, with an accuracy of less than 80%. The selective ensemble technique (which uses the best models) attains the highest accuracy of 92.97%, compared to the full ensemble (84.40%), which improves prediction robustness but does not surpass the best individual model in peak accuracy. Overall, it can be concluded that MobileNetV3 is the most suitable architecture for brain tumor classification, delivering high accuracy with minimal computational complexity.
The selective ensemble approach also enhances performance, maintaining computational efficiency, emphasizing the importance of informed model selection in neuro-oncological image analysis and clinical decision-support systems.
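The abstract mentions statistical validation with McNemar's test, which compares two classifiers on the same test set using only the samples on which they disagree. The sketch below shows the exact (binomial) two-sided form of the test. It is illustrative only: the abstract does not say whether the exact or the chi-square variant was used, and the function name and example counts are assumptions.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant pair counts.

    b: test samples model A classified correctly and model B misclassified
    c: test samples model B classified correctly and model A misclassified
    Under the null hypothesis, each discordant sample is equally likely
    to fall into either group, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no disagreements: no evidence of a difference
    k = min(b, c)
    # Probability of a split at least as lopsided as the one observed.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical counts (not from the paper): the models disagree on 10
# samples and model A is correct on all of them.
p_value = mcnemar_exact(10, 0)
print(p_value)
```

A small p-value suggests that the two models' error rates differ by more than chance. Samples that both models classify correctly, or both misclassify, do not enter the statistic.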

MambaResp-KAN: A State Space Model with Kolmogorov–Arnold Networks and Diffusion-Based Augmentation for Explainable Respiratory Disease Classification

By Mohammed Tawfik

DOI: https://doi.org/10.5815/ijigsp.2026.03.10, Pub. Date: 8 Jun. 2026

Automated respiratory disease classification from auscultation sounds holds transformative potential for early clinical screening, yet existing approaches remain constrained by the quadratic complexity of Transformer-based sequence encoders, the limited expressiveness of conventional multi-layer perceptron classifiers, and the persistent challenge of scarce annotated medical audio data. This paper presents MambaResp-KAN, a novel architecture that unifies Bidirectional Mamba state space models, Kolmogorov–Arnold Network classifiers with learnable B-spline activation functions, multi-modal gated cross-attention fusion of WavLM, BEATs, and handcrafted spectral features, and class-conditional denoising diffusion probabilistic model augmentation into a single end-to-end framework for explainable respiratory sound analysis. The Bidirectional Mamba encoder achieves linear-time sequence modeling through input-dependent selective state space discretization, processing forward and reverse temporal streams with gated aggregation to capture both causal and anti-causal dependencies in respiratory waveforms. The Kolmogorov–Arnold Network classifier replaces fixed-activation neurons with learnable univariate B-spline functions on each network edge, directly grounded in the Kolmogorov–Arnold representation theorem, yielding a classifier that is both more parameter-efficient and intrinsically interpretable than standard multi-layer perceptrons. A gated cross-modal attention mechanism fuses embeddings from the self-supervised WavLM and BEATs audio encoders with handcrafted MFCC and spectral features, while a class-conditional denoising diffusion probabilistic model synthesizes high-fidelity respiratory audio to alleviate class imbalance. Integrated Gradients attribution and KAN concept bottleneck analysis provide clinician-interpretable explanations of model decisions.
Evaluated on two benchmark datasets, Asthma Detection V2 (five classes, 1,211 samples) and KAUH (four classes, 940 samples), MambaResp-KAN achieves classification accuracies of 99.6% and 99.4%, respectively, surpassing the prior state-of-the-art E-RespiNet by 0.7 and 0.6 percentage points while using 62% fewer parameters and reducing inference latency by 56.3%. Cross-dataset evaluation yields an average accuracy of 84.0% with a generalization gap of 15.8%, compared to 23.3% for E-RespiNet, confirming improved transferability across clinical institutions.

SWT-PnP-DnCNN: Medical Image Fusion Using Stationary Wavelet Transform and Plug-and-Play Deep Denoising Model

By Amit Pandey Prabhishek Singh Akansha Singh Achyut Shankar Manoj Diwakar

DOI: https://doi.org/10.5815/ijigsp.2026.03.11, Pub. Date: 8 Jun. 2026

This paper presents a hybrid medical image fusion (MIF) technique (SWT-PnP-DnCNN) that combines multiscale decomposition, spatial-frequency-driven fusion, and deep denoising priors to efficiently integrate MIF images. The SWT-PnP-DnCNN begins with the Stationary Wavelet Transform (SWT) to decompose input medical images into low-frequency (LFSBs) and high-frequency (HFSBs) subbands. The LFSBs are fused using spatial frequency-based weighted averaging, effectively integrating overall intensity and contrast information. For the HFSBs, a local energy and max-selection strategy is adopted to retain salient edge features from the source images. Following the initial fusion, a Plug-and-Play (PnP) optimization strategy is applied to improve this fused image. This step uses a pretrained DnCNN model as a deep denoiser, serving as an implicit image prior in a model-driven iterative framework. Each iteration alternates between a data consistency step and a denoising step, significantly reducing artifacts and enhancing structural fidelity in the result. The effectiveness of SWT-PnP-DnCNN is demonstrated on benchmark CT-MRI, MRI-PET, and MRI-PET datasets. Extensive evaluation against classical hybrid strategies and recent CNN-based fusion methods shows that SWT-PnP-DnCNN achieves the best performance across standard metrics. We further include mean±std reporting and paired t-tests, confirming statistically significant improvements (p < 0.05). Ablation studies validate each design choice by comparing SWT-only vs. SWT+PnP and evaluating denoiser alternatives, with sensitivity to PnP iterations, regularization strength, and SWT levels. The runtime analysis clarifies feasible deployment, particularly in offline or cloud-based environments. Overall, SWT-PnP-DnCNN emerges as a robust, interpretable, and clinically valuable solution for enhancing MIF in medical imaging applications.

Multi-channel Prediction Residue Modeling (MPRM) Using Second Order Residual Statistics for Enhanced CFA Artifact Based Forgery Detection

By Somendra Kumar Soni Mohammad Rafique Khan Vinay Kumar Singh

DOI: https://doi.org/10.5815/ijigsp.2026.03.12, Pub. Date: 8 Jun. 2026

In recent years, advanced image editing tools have become easily available, making it possible to tamper with images in visually undetectable ways. This has created an urgent need for reliable and robust techniques to authenticate image integrity. A digital camera produces a full-color image by interpolating the remaining channels, which creates periodic artifacts known as Color Filter Array Artifacts (CFAA). In a forged image, the consistency of these artifacts is disturbed, which is often used to detect and localize the forgery in tampered images. Existing CFAA-based splicing detection methods often rely on a single channel, exhibit high computational complexity, and show degraded performance under JPEG compression. Although some works have explored multi-channel CFA-based approaches, their ability to effectively capture cross-channel dependencies and maintain robustness under heavy JPEG compression remains limited. To address these gaps, we propose a splicing detection framework that performs second-order statistical analysis on residuals extracted from all color channels. Unlike existing multi-channel CFAA techniques, this work explicitly models inter-channel relationships through the Error Variance Ratio (EVR) and introduces a novel Inter Block Mean Square Error (IBMSE) metric. This formulation enhances the characterization of CFAA periodicity and improves discrimination between authentic and tampered regions. The proposed technique is evaluated on the CUISDE, RTD, and IMD datasets and compared with existing CFA-based localization methods using ROC, precision-recall, and AUC metrics. Experimental results demonstrate that the proposed method improves localization performance and shows robustness against varying levels of JPEG compression.

Edibility Detection of Mushroom Using Ensemble Methods

By Nusrat Jahan Pinky S.M. Mohidul Islam Rafia Sharmin Alice

DOI: https://doi.org/10.5815/ijigsp.2019.04.05, Pub. Date: 8 Apr. 2019

Mushrooms are a familiar and delicious food that is cholesterol-free and rich in vitamins and minerals. Although nearly 45,000 species of mushrooms are known throughout the world, most of them are poisonous and a few are lethally poisonous. Identifying edible or poisonous mushrooms with the naked eye is quite difficult. Moreover, there is no simple rule for edibility identification using machine learning methods that works for all types of data. Our aim is to find a robust method for identifying mushroom edibility with better performance than existing works. In this paper, three ensemble methods are used to detect the edibility of mushrooms: bagging, boosting, and random forest. Using the most significant features, five feature sets are formed to build five base models for each ensemble method. The accuracy of the ensemble methods is measured using both five fixed feature set-based models and five randomly selected feature set-based models, for two types of test sets. The results show that better performance is obtained by methods built from fixed feature set-based models than by those built from randomly selected feature set-based models. The highest accuracy is obtained by the proposed model-based random forest for both test sets.

Mobile-Based Skin Disease Diagnosis System Using Convolutional Neural Networks (CNN)

By M.W.P Maduranga Dilshan Nandasena

DOI: https://doi.org/10.5815/ijigsp.2022.03.05, Pub. Date: 8 Jun. 2022

This paper presents the design and development of an Artificial Intelligence (AI) based mobile application to detect the type of skin disease. Skin diseases are a serious hazard to people throughout the world, yet accurate diagnosis of skin diseases is difficult. In this work, a deep learning algorithm, the Convolutional Neural Network (CNN), is proposed to classify skin diseases on the HAM10000 dataset. An extensive review of research articles on object identification methods and a comparison of their relative qualities were carried out to find a method that would work well for detecting skin diseases. The CNN-based technique was recognized as the best method for identifying skin diseases. A mobile application is built for quick and accurate action. By examining an image of the afflicted area at the onset of a skin illness, it assists patients and dermatologists in determining the kind of disease present. It detects the affected region considerably faster, with nearly 2x fewer computations than the standard MobileNet model, resulting in low computing effort. This study revealed that MobileNet with transfer learning, yielding an accuracy of about 85%, is the most suitable model for automatic skin disease identification. According to these findings, the suggested approach can assist general practitioners in quickly and accurately diagnosing skin diseases using a smartphone.

Evolutionary Image Enhancement Using Multi-Objective Genetic Algorithm

By Dhirendra Pal Singh Ashish Khare

DOI: https://doi.org/10.5815/ijigsp.2014.01.09, Pub. Date: 8 Nov. 2013

Image processing is the art of examining, identifying, and judging the significance of images. Image enhancement refers to the attenuation or sharpening of image features such as edgels, boundaries, or contrast to make the processed image more useful for analysis. Image enhancement procedures use computers to provide improved images for study by human interpreters. In this paper, we propose a novel method that uses a Genetic Algorithm with multi-objective criteria to find more enhanced versions of images. The proposed method has been verified with benchmark images in image enhancement. A simple Genetic Algorithm may not explore enough to find a more enhanced image. In the proposed method, three objectives are taken into consideration: intensity, entropy, and number of edgels. The proposed algorithm achieves automatic image enhancement by incorporating these objectives. We review some existing image enhancement techniques and compare the results of our algorithm with other Genetic Algorithm based techniques. We expect that further improvements can be achieved by incorporating a linear relationship between some other techniques.

A Review of Self-supervised Learning Methods in the Field of Medical Image Analysis

By Jiashu Xu

DOI: https://doi.org/10.5815/ijigsp.2021.04.03, Pub. Date: 8 Aug. 2021

In the field of medical image analysis, supervised deep learning strategies have achieved significant development, but these methods rely on large labeled datasets. Self-supervised learning (SSL) provides a new strategy to pre-train a neural network with unlabeled data. This is a new unsupervised learning paradigm that has achieved significant breakthroughs in recent years. As a result, more and more researchers are trying to utilize SSL methods for medical image analysis to meet the challenge of assembling large medical datasets. To our knowledge, there is still a shortage of reviews of self-supervised learning methods in the field of medical image analysis; this article aims to fill this gap and comprehensively review the application of self-supervised learning in the medical field. It provides the latest and most detailed overview of self-supervised learning in the medical field and promotes the development of unsupervised learning in medical imaging. The methods are divided into three categories: context-based, generation-based, and contrast-based. We show the pros and cons of each category and evaluate their performance in downstream tasks. Finally, we conclude with the limitations of current methods and discuss future directions.

Text Region Extraction: A Morphological Based Image Analysis Using Genetic Algorithm

By Dhirendra Pal Singh Ashish Khare

DOI: https://doi.org/10.5815/ijigsp.2015.02.06, Pub. Date: 8 Jan. 2015

Image analysis belongs to the areas of computer vision and pattern recognition. These areas are also part of digital image processing, where researchers pay great attention to retrieving content from various types of images with complex, low-contrast, or multi-spectral backgrounds. This content may take any form, such as texture data, shapes, and objects. Extracting text regions as content from an image is a class of problems in digital image processing that aims to provide information widely used in many fields, including medical imaging, pattern recognition, robotics, and artificially intelligent transport systems. Extracting text data has become a challenging task. Because text extraction is very useful for identifying and analyzing the information in an image, in this paper we propose a unified framework that combines morphological operations and Genetic Algorithms to extract and analyze text regions embedded in an image, where the text may vary in font, size, skew angle, distortion by slant and tilt, shape of the object on which the text appears, etc. We have evaluated the proposed method on gray-level image sets, made qualitative and quantitative comparisons with other existing methods, and concluded that the proposed method is better than the others.

Improving Retinal Image Quality Using the Contrast Stretching, Histogram Equalization, and CLAHE Methods with Median Filters

By Erwin Dwi Ratna Ningsih

DOI: https://doi.org/10.5815/ijigsp.2020.02.04, Pub. Date: 8 Apr. 2020

This paper tests three contrast enhancement methods, namely contrast stretching, histogram equalization (HE), and CLAHE, each combined with a median filter. Poor-quality images are corrected and then passed through a median filter to remove noise. The images come from the STARE dataset and have different contrast values. The results are evaluated with three parameters: MSE, PSNR, and SSIM. On gray-level images, contrast stretching, which stretches the pixel values using the stretchlim technique, gives an MSE of 9.15, a PSNR of 42.14 dB, and an SSIM of 0.88. The HE method with a median filter gives an average MSE of 18.67, a PSNR of 41.33 dB, and an SSIM of 0.77, whereas CLAHE with a median filter gives an average MSE of 28.42, a PSNR of 35.30 dB, and an SSIM of 0.86. Of the three, contrast stretching has the lowest MSE and the highest PSNR and SSIM values.
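To make the evaluation in this abstract concrete, here is a minimal sketch of linear contrast stretching together with the MSE and PSNR metrics. It is an illustration under stated assumptions, not the paper's pipeline: the stretch maps the plain minimum and maximum of the image to 0 and 255 (the stretchlim technique mentioned above typically saturates a small percentage of pixels first, which is omitted here), the function names are hypothetical, and the median filter and SSIM are left out for brevity.

```python
import math

def contrast_stretch(img, out_min=0, out_max=255):
    """Linearly map the [min, max] range of img onto [out_min, out_max]."""
    pixels = [p for row in img for p in row]
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # A constant image has no range to stretch; return a copy.
        return [row[:] for row in img]
    scale = (out_max - out_min) / (hi - lo)
    return [[round(out_min + (p - lo) * scale) for p in row] for row in img]

def mse(a, b):
    """Mean squared error between two images of equal size."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)

def psnr(a, b, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(a, b)
    return math.inf if err == 0 else 10 * math.log10(peak ** 2 / err)

# Toy 2x2 gray-level image with a narrow intensity range.
original = [[50, 100], [150, 200]]
stretched = contrast_stretch(original)
print(stretched)
print(mse(original, stretched), psnr(original, stretched))
```

Note that these metrics compare the enhanced image with the original, so a larger change produces a higher MSE and a lower PSNR regardless of whether the change looks better.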

An Efficient Brain Tumor Detection Algorithm Using Watershed & Thresholding Based Segmentation

By Anam Mustaqeem Engr Ali Javed Tehseen Fatima

DOI: https://doi.org/10.5815/ijigsp.2012.10.05, Pub. Date: 28 Sep. 2012

During the past few years, brain tumor segmentation in magnetic resonance imaging (MRI) has become an emergent research area in the field of medical imaging systems. Brain tumor detection helps in finding the exact size and location of a tumor. This paper proposes an efficient algorithm for tumor detection based on segmentation and morphological operators. First, the quality of the scanned image is enhanced; then morphological operators are applied to detect the tumor in the scanned image.

Image Denoising based on Enhanced Wavelet Global Thresholding Using Intelligent Signal Processing Algorithm

By Joseph Isabona Agbotiname Lucky Imoize Stephen Ojo

DOI: https://doi.org/10.5815/ijigsp.2023.05.01, Pub. Date: 8 Oct. 2023

Denoising is a vital aspect of image preprocessing, often explored to eliminate noise in an image and restore its proper characteristics and clarity. Unfortunately, noise often degrades the quality of valuable images, making them meaningless for practical applications. Several methods have been deployed to address this problem, but the quality of the recovered images still requires enhancement for efficient practical use. This paper proposes a wavelet-based universal thresholding technique that can denoise highly degraded noisy images with both uniform and non-uniform variations in illumination and contrast. The proposed method, referred to as modified wavelet-based universal thresholding (MWUT), was used to denoise five noisy images and compared with three state-of-the-art denoising techniques. To appraise the quality of the resulting images, eight performance indicators were employed: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Structural Content (SC), Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM), Signal-to-Reconstruction-Error Ratio (SRER), Naturalness Image Quality Evaluator (NIQE), and Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE). The first six (RMSE, MAE, SC, PSNR, SSIM, and SRER) are reference indicators, while the remaining two (NIQE and BRISQUE) are referenceless. Higher SC, PSNR, SSIM, and SRER values and lower NIQE, BRISQUE, RMSE, and MAE values indicate better performance. In the final results, the higher PSNR, SSIM, and SRER values and the lower NIQE, BRISQUE, RMSE, and MAE values show that the proposed MWUT technique yields better image quality than the existing schemes. The technique would find practical applications in digital image processing and enhancement.
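For readers unfamiliar with universal thresholding, the sketch below shows the classical scheme on a 1D signal: a one-level Haar transform, a noise estimate from the median absolute detail coefficient, the universal threshold sigma * sqrt(2 ln N), and soft thresholding. This is the textbook baseline, not the modified MWUT method of the paper, whose details the abstract does not give. The function names are hypothetical, and the single-level 1D Haar transform is a simplification chosen to keep the example short.

```python
import math
import statistics

SQRT2 = math.sqrt(2)

def haar_step(signal):
    """One level of the orthonormal Haar transform (even-length input)."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail

def inverse_haar_step(approx, detail):
    """Invert haar_step, interleaving the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out

def universal_threshold(detail, n):
    """Classical universal threshold: sigma * sqrt(2 ln n), with sigma
    estimated as median(|detail|) / 0.6745."""
    sigma = statistics.median(abs(d) for d in detail) / 0.6745
    return sigma * math.sqrt(2 * math.log(n))

def soft(x, t):
    """Soft thresholding: shrink x toward zero by t."""
    return math.copysign(max(abs(x) - t, 0.0), x)

def denoise(signal):
    """Threshold the detail coefficients and reconstruct the signal."""
    approx, detail = haar_step(signal)
    t = universal_threshold(detail, len(signal))
    return inverse_haar_step(approx, [soft(d, t) for d in detail])

# A constant level of 10 with small alternating noise.
noisy = [10.1, 9.9, 10.1, 9.9, 10.1, 9.9, 10.1, 9.9]
print(denoise(noisy))
```

For images, the same idea is applied to the detail sub-bands of a 2D wavelet decomposition, usually over several levels.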

A Review on Graph Based Segmentation

By K. Santle Camilus V.K. Govindan

DOI: https://doi.org/10.5815/ijigsp.2012.05.01, Pub. Date: 8 Jun. 2012

Image segmentation plays a crucial role in the effective understanding of digital images. The past few decades have seen hundreds of research contributions in this field. However, research into a general-purpose segmentation algorithm that suits a variety of applications is still very much active. Among the many approaches to image segmentation, the graph-based approach is gaining popularity, primarily because of its ability to reflect global image properties. This paper critically reviews important existing graph-based segmentation methods. The review is organized by a classification of segmentation algorithms within the framework of graph-based approaches. The four major categories we use for the review are graph-cut-based methods, interactive methods, minimum-spanning-tree-based methods, and pyramid-based methods. The review not only reveals the strengths of each method and category but also explores their limitations. In addition, it highlights the need for a database for benchmarking intensity-based algorithms and the need for further research in graph-based segmentation for automated real-time applications.

A Review on Image Reconstruction through MRI k-Space Data

By Tanuj Kumar Jhamb Vinith Rejathalal V.K. Govindan

DOI: https://doi.org/10.5815/ijigsp.2015.07.06, Pub. Date: 8 Jun. 2015

Image reconstruction is the process of generating an image of an object from the signals captured by the scanning machine. Medical imaging is an interdisciplinary field combining physics, biology, mathematics, and computational sciences. This paper provides a complete overview of the image reconstruction process in Magnetic Resonance Imaging (MRI), one of the commonly used medical imaging techniques, and reviews the computational aspects of medical image reconstruction. The data collected by an MRI scanner for image reconstruction is called k-space data. Various algorithms exist for reconstructing an image from k-space data, such as the Homodyne algorithm, the Zero Filling method, Dictionary Learning, and the Projection onto Convex Sets method. The characteristics of k-space data and MRI data collection techniques are reviewed in detail. The algorithms used for image reconstruction are discussed in detail along with their pros and cons. Modern magnetic resonance imaging techniques such as functional MRI and diffusion MRI are also introduced. The concepts of classical techniques such as Expectation Maximization, Sensitivity Encoding, and the Level Set Method, and of recent techniques such as Alternating Minimization, Signal Modeling, and the Sphere Shaped Support Vector Machine, are also reviewed. It is observed that most of these techniques enhance the gradient encoding and reduce the scanning time. Classical algorithms produce an undesirable blurring effect when the degree of phase variation in partial k-space is high. Modern reconstruction algorithms such as Dictionary Learning work well even with high phase variation because they are iterative procedures.

Edibility Detection of Mushroom Using Ensemble Methods

By Nusrat Jahan Pinky S.M. Mohidul Islam Rafia Sharmin Alice

DOI: https://doi.org/10.5815/ijigsp.2019.04.05, Pub. Date: 8 Apr. 2019

Mushrooms are a familiar, delicious food that is cholesterol-free and rich in vitamins and minerals. Though nearly 45,000 species of mushrooms are known throughout the world, most of them are poisonous and a few are lethally poisonous. Identifying edible or poisonous mushrooms with the naked eye is quite difficult. Even with machine learning methods, there is no easy rule for edibility identification that works for all types of data. Our aim is to find a robust method for identifying mushroom edibility with better performance than existing works. In this paper, three ensemble methods are used to detect the edibility of mushrooms: Bagging, Boosting, and Random Forest. Using the most significant features, five feature sets are created to build five base models for each ensemble method. Accuracy is measured for ensembles built from both fixed feature-set-based models and randomly selected feature-set-based models, on two types of test sets. The results show that ensembles of fixed feature-set-based models perform better than those of randomly selected feature-set-based models. The highest accuracy on both test sets is obtained by the proposed model-based Random Forest.

Mobile-Based Skin Disease Diagnosis System Using Convolutional Neural Networks (CNN)

By M.W.P Maduranga Dilshan Nandasena

DOI: https://doi.org/10.5815/ijigsp.2022.03.05, Pub. Date: 8 Jun. 2022

This paper presents the design and development of an Artificial Intelligence (AI) based mobile application for detecting the type of skin disease. Skin diseases are a serious hazard to people throughout the world, yet accurate diagnosis is difficult. In this work, a deep learning algorithm, the Convolutional Neural Network (CNN), is proposed to classify skin diseases on the HAM10000 dataset. An extensive review of research articles on object identification methods, with a comparison of their relative qualities, was carried out to find a method that would work well for detecting skin diseases. The CNN-based technique was identified as the best method for this purpose. A mobile application was built for quick and accurate action: by examining an image of the afflicted area at the onset of a skin illness, it helps patients and dermatologists determine the kind of disease present. It detects the affected region considerably faster, with nearly 2x fewer computations than the standard MobileNet model, which results in low computing effort. The study found that MobileNet with transfer learning, yielding an accuracy of about 85%, is the most suitable model for automatic skin disease identification. According to these findings, the suggested approach can assist general practitioners in diagnosing skin diseases quickly and accurately using a smartphone.

Breast Cancer Classification from Ultrasound Images using VGG16 Model based Transfer Learning

By A. B. M. Aowlad Hossain Jannatul Kamrun Nisha Fatematuj Johora

DOI: https://doi.org/10.5815/ijigsp.2023.01.02, Pub. Date: 8 Feb. 2023

Ultrasound-based breast screening has recently been gaining attention, especially for dense breasts. Technological advancement, cancer awareness, and cost, safety, and availability benefits have led to a rapid rise in the breast ultrasound market. The irregular shape, intensity variation, and additional blood vessels of malignant cancer are distinguishable in ultrasound images from the benign phase. However, classifying breast cancer from ultrasound images is difficult owing to speckle noise and the complex textures of the breast. This paper presents a breast cancer classification method using a VGG16 model-based transfer learning approach. A median filter is used to despeckle the images. The convolutional layers of the pretrained VGG16 model, along with the max-pooling layers, are used as the feature extractor, and a proposed two-layer fully connected deep neural network is designed as the classifier. The Adam optimizer is used with a learning rate of 0.001, and binary cross-entropy is chosen as the loss function for model optimization. Dropout on the hidden layers is used to avoid overfitting. Breast ultrasound images from two databases (897 images in total) were combined to train, validate, and test the performance and generalization strength of the classifier. Experimental results showed a training accuracy of 98.2% and a testing accuracy of 91% on blind testing data, with reduced computational complexity. The gradient-weighted class activation mapping (Grad-CAM) technique was used to visualize and check the localization of the targeted regions at the final convolutional layer, and the results were found to be noteworthy. The outcomes of this work might be useful for clinical applications of breast cancer diagnosis.

Retinal Image Segmentation for Diabetic Retinopathy Detection using U-Net Architecture

By Swapnil V. Deshmukh Apash Roy Pratik Agrawal

DOI: https://doi.org/10.5815/ijigsp.2023.01.07, Pub. Date: 8 Feb. 2023

Diabetic retinopathy is one of the most serious eye diseases and can lead to permanent blindness if not diagnosed early. Its main cause is diabetes. Not every diabetic will develop diabetic retinopathy, but the risk of developing it is undeniable, so early diagnosis is required. Segmentation is one approach that is useful for detecting the blood vessels in a retinal image. This paper proposes three models based on a deep learning approach for recognizing blood vessels in retinal images using region-based segmentation techniques. The proposed approach consists of four steps: preprocessing, augmentation, model training, and performance measurement. The augmented retinal images are fed to the three models for training to obtain the segmented image. The three models are applied to the publicly available DRIVE, STARE, and HRF datasets. It is observed that more thin blood vessels are segmented in the retinal images of the HRF dataset using model-3. The performance of the three proposed models is compared with other state-of-the-art methods of blood vessel segmentation on the DRIVE, STARE, and HRF datasets.

Real-Time Video based Human Suspicious Activity Recognition with Transfer Learning for Deep Learning

By Indhumathi .J Balasubramanian .M Balasaigayathri .B

DOI: https://doi.org/10.5815/ijigsp.2023.01.05, Pub. Date: 8 Feb. 2023

Nowadays, a primary concern of any society is providing safety to individuals. It is very hard to recognize human behaviour and identify whether it is suspicious or normal. Deep learning approaches have paved the way for developments across machine learning and artificial intelligence. The proposed system detects real-time human activity using a convolutional neural network. The objective of the study is to develop a real-time application for activity recognition both with and without transfer learning. The proposed system considers criminal, suspicious, and normal categories of activity and is used to detect the suspicious activities of a person. Videos of different suspicious behaviours are collected from different people (men and women). The novel 2D-CNN, pre-trained VGG16, and ResNet50 are trained on video frames of human activities such as normal and suspicious behaviour. Similarly, VGG16 and ResNet50 with transfer learning are trained using human suspicious activity datasets. The results show that the novel 2D-CNN, VGG16, and ResNet50 without transfer learning achieve accuracies of 98.96%, 97.84%, and 99.03%, respectively. On Kaggle and real-time video, the proposed system employing the 2D-CNN outperforms the pre-trained VGG16 model. The trained model is used to classify the activity in real-time captured video. ResNet50 with transfer learning achieves an accuracy of 99.18%, higher than the 98.36% of VGG16 with transfer learning.
