International Journal of Image, Graphics and Signal Processing (IJIGSP)

ISSN: 2074-9074 (Print)

ISSN: 2074-9082 (Online)

DOI: https://doi.org/10.5815/ijigsp

Website: https://www.mecs-press.org/ijigsp

Published By: MECS Press

Frequency: 6 issues per year

Number(s) Available: 142


IJIGSP is committed to bridging the theory and practice of images, graphics, and signal processing. From innovative ideas to specific algorithms and full system implementations, IJIGSP publishes original, peer-reviewed, high-quality articles in the areas of images, graphics, and signal processing. IJIGSP is a well-indexed scholarly journal and an indispensable reference for those working at the cutting edge of images, graphics, and signal processing applications.

IJIGSP has been abstracted or indexed by several world-class databases: Scopus, Google Scholar, Microsoft Academic Search, CrossRef, Baidu Wenku, IndexCopernicus, IET Inspec, EBSCO, JournalSeek, ULRICH's Periodicals Directory, WorldCat, Scirus, Academic Journals Database, Stanford University Libraries, Cornell University Library, UniSA Library, CNKI Scholar, ProQuest, J-Gate, ZDB, BASE, OhioLINK, iThenticate, Open Access Articles, Open Science Directory, National Science Library of Chinese Academy of Sciences, The HKU Scholars Hub, etc.

Latest Issue
Most Viewed
Most Downloaded

IJIGSP Vol. 18, No. 2, Apr. 2026

REGULAR PAPERS

Employing Counterfactual Methods to Interpret Convolutional Network Findings in X-Ray Image Detection

By Maider Abad, Eusebio Garcia, Ferran Prados, Jordi Casas-Roma

DOI: https://doi.org/10.5815/ijigsp.2026.02.01, Pub. Date: 8 Apr. 2026

In the rapidly evolving landscape of medical diagnostics, efficient and accurate tools for disease identification are crucial. This study analyzes three convolutional neural network (CNN) architectures—IRV2, ResNet50, and DenseNet121—pre-trained on ImageNet and RadImageNet datasets for respiratory disease diagnosis using chest radiographs. We used over 10,000 chest X-ray images, including COVID-19, pneumonia, and control cases, to train and evaluate these models. RadImageNet-trained models, particularly ResNet50, achieved superior performance with 94.49% accuracy, 93.92% sensitivity, and 95.59% precision compared to ImageNet-trained counterparts, though the improvement was not statistically significant in most cases. To enhance interpretability, we developed a counterfactual-based method generating visual explanations of critical areas influencing diagnostic outcomes. This approach, not requiring access to training data or model internals, identifies image parts that could change the predicted diagnosis if altered. It aids in understanding model reasoning and can correct misclassifications, successfully reclassifying up to 40.91% of previously misclassified images through our masking method. By providing clear, independent visual explanations, our method aims to foster trust in AI-assisted diagnoses among medical professionals. While preliminary results are promising, further validation with medical experts will help confirm the clinical relevance of the highlighted regions. This will strengthen the transparency and interpretability of AI decision-making in healthcare. The visual nature of these explanations offers a valuable tool for interpreting complex medical image classification models and may enhance the synergy between AI systems and human expertise in diagnostic processes.

[...] Read more.
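The occlusion idea behind the counterfactual explanations described above can be sketched in a few lines: slide a mask over the image and report the first patch whose occlusion flips the classifier's prediction. This is an illustrative toy only; the paper's actual masking method, classifier, and data are more sophisticated, and `toy_classify` is an invented stand-in.

```python
# Occlusion-style counterfactual search (illustrative sketch only).

def counterfactual_patch(image, classify, patch=2, fill=0.0):
    """Return (row, col) of the first patch whose occlusion changes the
    classifier's label, or None if no single patch flips the prediction."""
    h, w = len(image), len(image[0])
    original = classify(image)
    for r in range(h - patch + 1):
        for c in range(w - patch + 1):
            masked = [row[:] for row in image]       # copy the image
            for i in range(r, r + patch):
                for j in range(c, c + patch):
                    masked[i][j] = fill              # occlude the patch
            if classify(masked) != original:
                return (r, c)                        # counterfactual found
    return None

# Toy "classifier": label 1 if any bright pixel exists (hypothetical).
def toy_classify(img):
    return int(any(v > 0.5 for row in img for v in row))

image = [[0.0] * 4 for _ in range(4)]
image[1][1] = image[1][2] = 0.9                      # bright region
print(counterfactual_patch(image, toy_classify))     # (0, 1)
```

Note that the search needs only the model's predictions, mirroring the abstract's point that no access to training data or model internals is required.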
Semantic Segmentation of Multispectral Satellite Images Using Residual Convolutional Networks

By Abhinav Chandra, Anuradha Chetan Phadke, Vaidehi Deshmukh

DOI: https://doi.org/10.5815/ijigsp.2026.02.02, Pub. Date: 8 Apr. 2026

Satellite imagery is widely used to study spatial geographies and to locate water, residential, farmland, and forest areas, which can then inform township development and planning, landscape detection, etc. Semantic segmentation and image classification are the two crucial procedures in determining these spatial geographies. To improve the generalization ability of semantic segmentation algorithms, a combined UNet_ResNet model is used in this paper. The engineered model is a convolutional neural network augmented with GeoGANs that detects semantic patches of small size and regional character within a certain spatial and pixel scale. However, it faces the semantic segmentation challenge of identifying roadways in metropolitan areas. The model achieves an accuracy score of 93% to 97.3% for image classification and segmentation, which compares favorably with various existing architectures.

[...] Read more.
Scaling of Digital Images by Adaptive and Combined Application of Interpolation Algorithms

By Serhiy Balovsyak, Mariana Borcha, Yurii Hnatiuk, Khrystyna Odaiska, Ihor Fodchuk

DOI: https://doi.org/10.5815/ijigsp.2026.02.03, Pub. Date: 8 Apr. 2026

The article describes the theoretical foundations and software tools for scaling digital images by the adaptive and combined application of bilinear and bicubic interpolation algorithms. An analysis of modern algorithms and image scaling tools has been performed, and the theoretical foundations of image scaling using interpolation algorithms are described. The root mean square error between the pixel values of the original and scaled images was used as the scaling error. Image scaling was performed by a pipeline of two interpolation algorithms: the first algorithm reduces the image scale, after which the second algorithm increases it. Such processing is performed, in particular, in telecommunication systems that transmit images at reduced scales. A correlation was found between the average spatial period of an image and the relative scaling error, defined as the ratio of the scaling errors for different interpolation algorithms. The spatial period of an image was calculated from its energy spectrum. A regression analysis was performed to determine the dependence of the relative scaling error on the spatial period of the images. It is found that in most cases bicubic interpolation provides a smaller scaling error, but for some images with a small spatial period, bilinear interpolation provides a smaller error. It is proposed to increase scaling accuracy by adaptively selecting the interpolation algorithm depending on the image's spatial period. A combined application of interpolation algorithms was also performed, which consists of reducing the scale using bilinear interpolation and increasing the scale using bicubic interpolation. A statistical analysis of the scaling results shows that the combined application of the algorithms in most cases yields a smaller error than the separate application of the bicubic or bilinear interpolation algorithm.

[...] Read more.
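The down-then-up transmission scheme and the RMSE scaling error described above can be sketched as follows. For brevity only bilinear resampling is shown (the bicubic half of the combined pipeline is omitted); the image and sizes are illustrative.

```python
# Down-then-up scaling with bilinear interpolation, plus the RMSE
# scaling error used in the article (sketch; bicubic omitted).

def bilinear_resize(img, new_h, new_w):
    """Resize a 2-D list of floats with bilinear interpolation."""
    h, w = len(img), len(img[0])
    out = [[0.0] * new_w for _ in range(new_h)]
    for i in range(new_h):
        y = i * (h - 1) / (new_h - 1) if new_h > 1 else 0.0
        y0 = min(int(y), h - 2)          # top row of the 2x2 neighbourhood
        dy = y - y0
        for j in range(new_w):
            x = j * (w - 1) / (new_w - 1) if new_w > 1 else 0.0
            x0 = min(int(x), w - 2)
            dx = x - x0
            out[i][j] = ((1 - dy) * (1 - dx) * img[y0][x0]
                         + (1 - dy) * dx * img[y0][x0 + 1]
                         + dy * (1 - dx) * img[y0 + 1][x0]
                         + dy * dx * img[y0 + 1][x0 + 1])
    return out

def rmse(a, b):
    """Root mean square error between two equally sized 2-D images."""
    n = len(a) * len(a[0])
    return (sum((x - y) ** 2 for ra, rb in zip(a, b)
                for x, y in zip(ra, rb)) / n) ** 0.5

ramp = [[float(i + j) for j in range(8)] for i in range(8)]
small = bilinear_resize(ramp, 4, 4)      # transmit at reduced scale
restored = bilinear_resize(small, 8, 8)  # restore original size
print(rmse(ramp, restored))              # ~0: bilinear is exact on a ramp
```

A linear ramp is reconstructed exactly because bilinear interpolation reproduces linear functions; textured images with a small spatial period incur a larger error, which is what motivates the adaptive algorithm selection proposed in the article.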
A Federated Learning Framework with Metaheuristic Optimization for Heart Disease Prediction

By Bhaskar Adepu, T. Archana

DOI: https://doi.org/10.5815/ijigsp.2026.02.04, Pub. Date: 8 Apr. 2026

Due to lifestyle changes and the daily behavioural routines of people across the globe, cardiovascular diseases (CVD) are increasing in the modern world. Accurate prediction of CVD is a significant requirement in the treatment process. Incorporating machine learning algorithms into CVD prediction can provide advantages such as reduced time consumption in the diagnostic process and improved decision-making. Hence, this research implements a novel Lion-based Federated Learning for Disease Prediction (LbFLDP) technique to predict CVD. The approach includes three local hospital models and one centralized global model. The local models are trained using a CVD dataset obtained from the Kaggle website. After the training phase, the local models are used to predict CVD. Their prediction features are then propagated to the global model to enhance its prediction features, and the global model is then used to predict CVD. The performance of the suggested technique is evaluated in terms of accuracy, F-score, precision, recall, and error rate. The proposed approach achieves 98.41% recall, 99.6% accuracy, a 98.57% F-score, 98.57% precision, and a 0.4% error rate.

[...] Read more.
Image Denoising in the Contourlet Domain Using RBF Network and Kalman Filter

By Rachid Benoudi, Youssef Qaraai, Mohamed Ouhda

DOI: https://doi.org/10.5815/ijigsp.2026.02.05, Pub. Date: 8 Apr. 2026

Image denoising remains a fundamental challenge in image processing, particularly when dealing with additive white Gaussian noise (AWGN) that degrades visual quality and information content. This paper introduces a novel multi-stage denoising framework that uniquely combines the contourlet transform, radial basis function neural networks (RBFNN), and Kalman filtering to effectively preserve important image features while removing noise. The contourlet transform first decomposes images into multi-resolution, directional subbands, providing a sparse representation that better captures geometric structures compared to traditional wavelet approaches. We then employ an RBFNN trained through back-propagation to adaptively threshold the contourlet coefficients based on local image characteristics and noise levels. Finally, Kalman filtering is applied as a post-processing step to further suppress residual noise artifacts. Comprehensive experiments conducted on standard benchmark datasets demonstrate that our approach outperforms several state-of-the-art methods, including BM3D and recent deep learning-based techniques, particularly at moderate to high noise levels (σ ≥ 15). Quantitative evaluations show our method achieves superior PSNR improvements of up to 2.4 dB and SSIM improvements of 0.12 compared to recent competing approaches, while qualitative results confirm better preservation of edges and textural details. The proposed framework offers an effective balance between computational efficiency and denoising performance, making it suitable for various practical applications.

[...] Read more.
KanAVNet: A CNN-BiLSTM-CTC-Based Audio-Visual Speech Recognition System for Kannada to Assist the Hearing Impaired

By Divya Suresha D.

DOI: https://doi.org/10.5815/ijigsp.2026.02.06, Pub. Date: 8 Apr. 2026

This research outlines a comprehensive dual-modality speech recognition system designed specifically to support hearing-impaired students in understanding spoken Kannada through synchronized processing of auditory signals and visual articulatory cues. The approach capitalizes on deep learning to extract speech-related features from spectrograms and Mel-Frequency Cepstral Coefficients (MFCC) for audio, and discriminative lip-movement features via CNNs and Temporal Convolutional Networks (TCNs) for visual input. A hybrid architecture, KanAVNet (Kannada Audio-Visual Network), based on a CNN–BiLSTM framework is integrated with a Connectionist Temporal Classification (CTC) loss function to enable robust sequence-to-sequence mapping while addressing temporal alignment challenges in audio-visual speech recognition. The system is trained on a custom-developed Kannada audiovisual dataset, addressing the scarcity of regional-language AVSR resources. Empirical evidence shows that the model achieves a high accuracy of 93.2%, a Word Error Rate (WER) of 9.8%, and an F1 score of 91.2%, outperforming baseline unimodal and existing multimodal models. This research highlights the effectiveness of multimodal fusion strategies in noisy environments and showcases the potential of AI-driven tools in promoting accessible and inclusive education for students with auditory impairments.

[...] Read more.
Nested U-Net-Based Speech Enhancement with Multi-Scale Feature Extraction and Dual-Path Time-Frequency Feature Modeling

By Shaik Areefa Begam, Sunnydayal Vanambathina

DOI: https://doi.org/10.5815/ijigsp.2026.02.07, Pub. Date: 8 Apr. 2026

Speech enhancement plays a vital role in improving the perceptual quality and intelligibility of speech signals degraded by environmental noise, particularly in modern network-based and signal processing systems. Traditional U-Net architectures capture local spectral details effectively but struggle to model long-range dependencies and may propagate residual noise through skip connections. Transformer-based models provide strong global context modeling but often fail to retain fine-grained spectral cues. To overcome these limitations, this paper presents a Nested U-Net–based network-oriented speech enhancement framework that incorporates Multi-Scale Feature Extraction, Feature Calibration, and a Dual-Path Higher-Order Information Interaction with Time-Frequency Attention module. The Multi-Scale Feature Extraction blocks in both encoder and decoder extract multi-resolution spectral patterns, while the nested topology strengthens hierarchical feature reuse. At the bottleneck, a stack of four Dual-Path Higher-Order Information Interaction with Time-Frequency Attention modules captures long-range temporal and spectral dependencies, and feature calibration adaptively filters encoder features to reduce noise transfer. Extensive experiments on Common Voice and LibriSpeech datasets demonstrate that the proposed model achieves superior perceptual evaluation of speech quality, short-time objective intelligibility, and signal-to-distortion ratio scores, particularly under moderate (0 dB) signal-to-noise ratio conditions. The results confirm that the framework provides robust enhancement performance and consistently outperforms several recent state-of-the-art methods in terms of speech quality, intelligibility, and noise suppression.

[...] Read more.
A Robust Hybrid Deep Learning Model for Multiclass Depression Classification from Speech Audio

By Neny Sulistianingsih, Galih Hendro Martono

DOI: https://doi.org/10.5815/ijigsp.2026.02.08, Pub. Date: 8 Apr. 2026

Depression remains one of the most prevalent and underdiagnosed mental health disorders globally, necessitating scalable, objective, and non-invasive diagnostic tools. Speech, as a rich biomarker of emotional and psychological states, offers a promising avenue for automated depression detection. This study proposes a robust hybrid deep learning framework that integrates Convolutional Neural Networks (CNN), Gated Recurrent Units (GRU), Bidirectional Long Short-Term Memory (BiLSTM), and Transformer architectures to classify depression severity into three levels: normal, mild, and severe. Using a curated multimodal dataset comprising 400 labeled audio recordings, we extract comprehensive acoustic features, including MFCC, Chroma, Spectrogram, Contrast, and Tonnetz representations. Models are evaluated using precision, recall, F1-score, and accuracy. Experimental results show that the proposed hybrid models outperform traditional architectures, achieving up to 99% accuracy and strong generalization across all classes. This study demonstrates the potential of attention-enhanced hybrid architectures in mental health assessment and provides a foundation for future deployment in clinical and real-world settings. Future work includes multimodal fusion with EEG data and the implementation of explainable AI for clinical interpretability.

[...] Read more.
Crowd Behaviour Analysis for Enhanced Event Safety and Management

By Evangeline D., Parkavi A., Jatin B., Manoj S., Pannaga N., Sanjeev G.

DOI: https://doi.org/10.5815/ijigsp.2026.02.09, Pub. Date: 8 Apr. 2026

Addressing crowd control and safety at large-scale events is the central focus of this study. The proposed methodology is tested on the ShanghaiTechA, ShanghaiTechB, and UCF CC 50 datasets. Apart from VGG-16, referred to as the baseline model, the study utilizes a Convolutional Neural Network (CNN) model, namely VGG with dilated layers and Atrous Spatial Pyramid Pooling (ASPP) layers, on these datasets to identify every individual in the crowd by their heads. Furthermore, optical flow analysis identifies fast-moving pixels, facilitating the detection of rapid movements within the crowd. YOLO tracking is additionally employed to monitor the direction of object movement within the crowd. By integrating these methodologies, the study aims to enhance the overall safety and security of individuals in the crowd. VGG with dilated layers gives the lowest Mean Absolute Error for the ShanghaiTechA and ShanghaiTechB datasets. The ASPP approach demonstrates approximately 15% higher accuracy on average compared to the baseline model for the ShanghaiTechA and UCF CC 50 datasets.

[...] Read more.
Information Technology for VR Training Evaluation with First Aid Skills Improvement to Detecting Human Resource Damage in Emergencies based on Behavioural Methods

By Sofia Chyrun, Victoria Vysotska, Lyubomyr Chyrun, Dmytro Uhryn, Zhengbing Hu, Yurii Ushenko

DOI: https://doi.org/10.5815/ijigsp.2026.02.10, Pub. Date: 8 Apr. 2026

Traditional first aid preparation methods often fail to reproduce realistic stress levels and to simulate the visual difficulty of identifying lesions in critical situations. In emergencies, delays in recognising injuries or errors in protocols result in critical losses of human resources. The use of computer graphics and virtual reality technologies makes it possible to create a safe yet highly realistic environment for rescuers to test and improve their skills. The article presents an integrated methodological framework for assessing the effectiveness of VR first-aid training in conditions of damage to civilian infrastructure. The main focus is on developing mathematical models and algorithms to identify and evaluate the quality of rescuers' actions by analysing digital interaction signals in a virtual environment. A composite efficiency indicator is proposed that combines normalised parameters for reaction time, manipulation accuracy, stress level, and immersion. The work aims to formalise a mathematical model to assess the effectiveness of VR training in developing skills for lesion identification and first aid provision, using quantitative metrics. The study aims to identify statistically significant differences in learning speed and skill retention between groups using VR simulations and traditional methods. The project also aims to validate innovative content creation methods, including mobile photogrammetry, to visualise damaged infrastructure and victim models. The study used a comprehensive approach that includes mobile photogrammetry and generative neural networks to create a library of 3D assets with varying degrees of detail. The performance score is based on a composite indicator that integrates normalised data on reaction time, manipulation accuracy, error count, and immersion rate. Linear mixed models, exponential approximations, and bootstrap estimation of effect stability were used to analyse hierarchical data and individual learning trajectories.
The experimental part includes the use of mobile photogrammetry and generative neural networks to create realistic 3D models of affected environments and identify types of injuries (bleeding, burns, unconsciousness). To analyse the dynamics of learning and maintaining skills, models with mixed effects and exponential forgetting curves are used. The results confirm that the use of VR technologies provides a statistically significant acceleration in the development of automated skills for lesion identification and assistance compared to traditional methods. The proposed approach is a scalable tool for preparing civil and rescue services to act in critical situations. Experimental data showed that the integral performance score in the VR group increased from 0.42 ± 0.10 to 0.76 ± 0.08, while in the control group it increased from only 0.40 ± 0.09 to 0.55 ± 0.10 (p < 0.001). The largest effect was observed in the bleeding arrest scenario, where the effect size (Cohen's d) reached 2.3. The analysis of forgetting curves confirmed the superiority of VR: the skill loss rate in the VR group was 0.25, providing knowledge retention 1.8 times longer than in the control group (0.45). The study confirmed that VR simulations significantly accelerate the formation of automated behaviour patterns and reduce reaction time in extreme conditions. The proposed mathematical assessment model provides objective feedback and standardisation of the rescue training process. The results indicate the high practical value of introducing such tools into training programs for civilian and military structures to minimise losses in real emergencies.

[...] Read more.
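The composite efficiency indicator described in this abstract can be sketched as a weighted sum of min-max-normalised metrics, inverting those where lower is better. The metric names, bounds, and weights below are illustrative assumptions, not the paper's calibrated values.

```python
# Sketch of a composite performance indicator: normalise each metric to
# [0, 1], invert "lower is better" ones, and combine with weights
# (weights and bounds here are hypothetical).

def normalise(value, lo, hi, lower_is_better=False):
    x = (value - lo) / (hi - lo)
    return 1.0 - x if lower_is_better else x

def composite_score(metrics, bounds, weights):
    score = 0.0
    for name, w in weights.items():
        lo, hi, lower_better = bounds[name]
        score += w * normalise(metrics[name], lo, hi, lower_better)
    return score

trainee = {"reaction_s": 4.0, "accuracy": 0.9, "stress": 0.3, "immersion": 0.8}
bounds  = {"reaction_s": (0.0, 10.0, True),   # faster reaction is better
           "accuracy":   (0.0, 1.0, False),
           "stress":     (0.0, 1.0, True),    # lower stress is better
           "immersion":  (0.0, 1.0, False)}
weights = {"reaction_s": 0.3, "accuracy": 0.4,
           "stress": 0.15, "immersion": 0.15}
print(round(composite_score(trainee, bounds, weights), 3))   # 0.765
```

Such a score stays in [0, 1] as long as the weights sum to one, which makes pre/post-training comparisons like the 0.42-to-0.76 improvement reported above directly interpretable.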
A Data-Driven Temporal Framework for Water Consumption Monitoring with Spatial Visualization Using K-Means and STL-LSTM

By Salsabila Septi Sukmayanti, Sudianto Sudianto, Aminatus Saadah

DOI: https://doi.org/10.5815/ijigsp.2026.02.11, Pub. Date: 8 Apr. 2026

The water distribution sector in Indonesia still faces challenges in detecting leaks early due to manual data checks that are time-consuming and labor-intensive. PDAM (Regional Water Company) Tirta Wijaya Cilacap, Indonesia, faces similar problems. This study aims to implement a spatial customer prediction model to detect customer water usage and support data-driven operational decision-making. K-Means clustering groups customers by consumption patterns and geographic location, achieving a Silhouette Score of 0.4473 and a Davies–Bouldin Index of 0.7658, which indicates reasonably well-separated clusters in real-world data. In addition, water consumption forecasting was carried out with Seasonal–Trend Decomposition using Loess–Long Short-Term Memory (STL–LSTM) to predict trends and seasonality of water usage for each Customer Connection ID (CCID). The forecasting performance varies across CCIDs; the best case achieves an R2 of up to 0.95, while low-performing cases are discussed to clarify conditions where STL–LSTM is less reliable. The forecasting and clustering outputs are presented through a spatial visualization (map) of water-consumption categories and model results to support identifying areas that may require closer inspection for potential leakage and waste. This research contributes to strengthening technology-based public infrastructure, in line with SDG 9: Industry, Innovation, and Infrastructure, to promote sustainable water management.

[...] Read more.
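The clustering step above can be illustrated with a minimal K-Means over (consumption, location) features. The data and initial centroids are invented for the sketch; a production run would use a library such as scikit-learn and validate with the Silhouette and Davies-Bouldin scores reported in the abstract.

```python
# Minimal K-Means sketch: assignment / update steps on toy 2-D
# customer features (illustrative data, deterministic initialisation).

def kmeans(points, centroids, iters=20):
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment: nearest centroid by squared Euclidean distance
        labels = [min(range(len(centroids)),
                      key=lambda k: sum((p - c) ** 2
                                        for p, c in zip(pt, centroids[k])))
                  for pt in points]
        # update: move each centroid to the mean of its members
        for k in range(len(centroids)):
            members = [pt for pt, l in zip(points, labels) if l == k]
            if members:
                centroids[k] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids

# two obvious groups: low-usage vs. high-usage customers
points = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],
          [8.0, 7.5], [7.8, 8.2], [8.3, 7.9]]
labels, centers = kmeans(points, [[0.0, 0.0], [10.0, 10.0]])
print(labels)   # [0, 0, 0, 1, 1, 1]
```

Each resulting cluster label can then be joined with customer coordinates for the map-based visualization the study describes.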
Classification and Shelf Life Prediction of Bananas Using Thermal Imaging with Vision Transformer and Random Forest

By Amey Kulkarni, Sejal Pathrabe, Hans Gupta, Gajanan K. Birajdar, Sangita Chaudhari

DOI: https://doi.org/10.5815/ijigsp.2026.02.12, Pub. Date: 8 Apr. 2026

Classifying and predicting banana shelf life is vital for optimizing storage and distribution in agriculture. Traditional methods, relying on subjective visual inspection, are inconsistent and time-intensive. This study presents a new, non-destructive approach combining thermal imaging and machine learning to classify naturally ripened and artificially ripened bananas and forecast their shelf life. Preprocessed thermal images are flattened, segmented into fixed-size patches, and then linearly projected into feature tokens. Position embeddings are incorporated to retain spatial information, and the sequence is processed by a Vision Transformer (ViT) encoder, which leverages self-attention mechanisms to model relationships between patches. The [CLS] token output is subsequently processed through fully connected layers for final classification, achieving 97.59% accuracy. Validation using t-SNE visualization demonstrated clear class separability, and receiver operating characteristic (ROC) curves confirmed robust performance. With an MSE of 0.10, an MAE of 0.18, and an R2 score of 0.85, the Random Forest algorithm performed exceptionally well at predicting the shelf life of artificially ripened bananas. This approach offers significant advantages, including improved accuracy, reduced subjectivity, and efficiency in data processing. By integrating thermal imaging with advanced models, the proposed method enhances agricultural supply chain management and promotes precision in ripening classification and shelf life prediction.

[...] Read more.
Edibility Detection of Mushroom Using Ensemble Methods

By Nusrat Jahan Pinky, S.M. Mohidul Islam, Rafia Sharmin Alice

DOI: https://doi.org/10.5815/ijigsp.2019.04.05, Pub. Date: 8 Apr. 2019

Mushrooms are a familiar and delicious food, cholesterol-free and rich in vitamins and minerals. Though nearly 45,000 species of mushrooms are known throughout the world, most are poisonous and a few are lethally poisonous. Identifying edible or poisonous mushrooms with the naked eye is quite difficult, and there is no easy edibility rule using machine learning methods that works for all types of data. Our aim is to find a robust method for identifying mushroom edibility with better performance than existing works. In this paper, three ensemble methods are used to detect the edibility of mushrooms: Bagging, Boosting, and Random Forest. Using the most significant features, five feature sets are constructed to build five base models for each ensemble method. Accuracy is measured for the ensemble methods using both fixed feature-set-based models and randomly selected feature-set-based models, on two types of test sets. The results show that better performance is obtained for methods built from fixed feature-set-based models than from randomly selected feature-set-based models. The highest accuracy on both test sets is obtained by the proposed Random Forest-based model.

[...] Read more.
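The core ensembling idea, base models trained on different feature subsets whose predictions are combined by majority vote, can be shown in miniature. The three rule-based "models" and the toy features below are invented for illustration; the paper builds its base models from significant feature subsets of real mushroom data.

```python
# Majority-vote ensembling in miniature: three weak rule-based models
# vote on edibility from toy mushroom features (hypothetical rules).

from collections import Counter

def model_odor(m):
    return "poisonous" if m["odor"] in {"foul", "pungent"} else "edible"

def model_spore(m):
    return "poisonous" if m["spore_print"] == "green" else "edible"

def model_surface(m):
    return "poisonous" if m["surface"] == "silky" else "edible"

def ensemble_predict(mushroom, models):
    votes = [model(mushroom) for model in models]
    return Counter(votes).most_common(1)[0][0]   # majority label wins

models = [model_odor, model_spore, model_surface]
sample = {"odor": "foul", "spore_print": "white", "surface": "silky"}
print(ensemble_predict(sample, models))          # "poisonous" (2 of 3 votes)
```

Even though one base model votes "edible" here, the majority vote recovers the correct label, which is exactly the robustness property that motivates Bagging and Random Forest.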
Mobile-Based Skin Disease Diagnosis System Using Convolutional Neural Networks (CNN)

By M.W.P. Maduranga, Dilshan Nandasena

DOI: https://doi.org/10.5815/ijigsp.2022.03.05, Pub. Date: 8 Jun. 2022

This paper presents the design and development of an Artificial Intelligence (AI) based mobile application to detect the type of skin disease. Skin diseases are a serious hazard to everyone throughout the world, yet accurate diagnosis is difficult. In this work, a deep learning algorithm, Convolutional Neural Networks (CNN), is proposed to classify skin diseases on the HAM10000 dataset. An extensive review of research articles on object identification methods, and a comparison of their relative qualities, was carried out to find a method well suited to detecting skin diseases; the CNN-based technique was recognized as the best. A mobile application is then built for quick and accurate action: by analyzing an image of the afflicted area at the onset of a skin illness, it assists patients and dermatologists in determining the kind of disease present. Its ability to detect the affected region considerably faster, with nearly 2x fewer computations than the standard MobileNet model, results in low computing effort. This study revealed that MobileNet with transfer learning, yielding an accuracy of about 85%, is the most suitable model for automatic skin disease identification. According to these findings, the suggested approach can assist general practitioners in quickly and accurately diagnosing skin diseases using a smartphone.

[...] Read more.
Evolutionary Image Enhancement Using Multi-Objective Genetic Algorithm

By Dhirendra Pal Singh, Ashish Khare

DOI: https://doi.org/10.5815/ijigsp.2014.01.09, Pub. Date: 8 Nov. 2013

Image processing is the art of examining, identifying, and judging the significance of images. Image enhancement refers to the attenuation or sharpening of image features such as edgels, boundaries, or contrast to make the processed image more useful for analysis. Image enhancement procedures utilize computers to provide good, improved images for study by human interpreters. In this paper we propose a novel method that uses a Genetic Algorithm with multi-objective criteria to find a more enhanced version of an image. The proposed method has been verified on benchmark images in image enhancement. A simple Genetic Algorithm may not explore the search space enough to find a sufficiently enhanced image. In the proposed method, three objectives are taken into consideration: intensity, entropy, and the number of edgels. The proposed algorithm achieves automatic image enhancement by incorporating these objectives. We review some existing image enhancement techniques and compare the results of our algorithm with other Genetic Algorithm-based techniques. We expect that further improvements can be achieved by incorporating linear relationships between some other techniques.

[...] Read more.
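The three objectives named above (intensity, entropy, and edgel count) can be sketched as a combined fitness function of the kind a GA would maximize. The equal weights and the simple horizontal-difference edge test are illustrative assumptions; the paper's exact combination scheme may differ.

```python
# Sketch of a three-part GA fitness for image enhancement:
# mean intensity + histogram entropy + edge count (hypothetical weights).

from math import log2

def fitness(img, w=(1.0, 1.0, 1.0)):
    pixels = [v for row in img for v in row]
    n = len(pixels)
    intensity = sum(pixels) / n                  # objective 1: brightness
    hist = {}
    for v in pixels:
        hist[v] = hist.get(v, 0) + 1
    entropy = -sum((c / n) * log2(c / n)         # objective 2: entropy
                   for c in hist.values())
    edges = sum(1 for row in img                 # objective 3: horizontal
                for a, b in zip(row, row[1:])    # intensity jumps
                if abs(a - b) > 32)
    return w[0] * intensity + w[1] * entropy + w[2] * edges

flat  = [[128] * 4 for _ in range(4)]            # uniform grey image
sharp = [[0, 255, 0, 255] for _ in range(4)]     # high-contrast image
print(fitness(sharp) > fitness(flat))            # True
```

A GA would evolve enhancement parameters (for example, transfer-function coefficients) and rank candidate outputs by this fitness, preferring images that are bright, information-rich, and edge-dense.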
A Review of Self-supervised Learning Methods in the Field of Medical Image Analysis

By Jiashu Xu

DOI: https://doi.org/10.5815/ijigsp.2021.04.03, Pub. Date: 8 Aug. 2021

In the field of medical image analysis, supervised deep learning strategies have achieved significant development, but these methods rely on large labeled datasets. Self-supervised learning (SSL) provides a new strategy to pre-train a neural network with unlabeled data; it is a new unsupervised learning paradigm that has achieved significant breakthroughs in recent years. Accordingly, more and more researchers are trying to utilize SSL methods for medical image analysis to meet the challenge of assembling large medical datasets. To our knowledge, there is still a shortage of reviews of self-supervised learning methods in the field of medical image analysis; this article aims to fill that gap and comprehensively review the application of self-supervised learning in the medical field. The article provides the latest and most detailed overview of self-supervised learning in the medical field and promotes the development of unsupervised learning in medical imaging. The reviewed methods are divided into three categories: context-based, generation-based, and contrast-based. We then discuss the pros and cons of each category and evaluate their performance in downstream tasks. Finally, we conclude with the limitations of current methods and discuss future directions.

[...] Read more.
Text Region Extraction: A Morphological Based Image Analysis Using Genetic Algorithm

By Dhirendra Pal Singh, Ashish Khare

DOI: https://doi.org/10.5815/ijigsp.2015.02.06, Pub. Date: 8 Jan. 2015

Image analysis belongs to the area of computer vision and pattern recognition. These areas are also part of digital image processing, where researchers pay great attention to content-based information retrieval from various types of images with complex, low-contrast, or multi-spectral backgrounds. Such content may take any form, such as texture data, shapes, and objects. Text region extraction from an image is a class of problems in digital image processing that aims to provide necessary information widely used in many fields: medical imaging, pattern recognition, robotics, intelligent transport systems, etc. Extracting text data from images has become a challenging task. Since text extraction is very useful for identifying and analyzing the whole information content of an image, in this paper we propose a unified framework combining morphological operations and Genetic Algorithms for extracting and analyzing the text region that may be embedded in an image under a variety of text properties: font, size, skew angle, distortion by slant and tilt, the shape of the object the text lies on, etc. We evaluate the proposed method on gray-level image sets and make qualitative and quantitative comparisons with other existing methods, concluding that the proposed method outperforms them.

[...] Read more.
Image Denoising based on Enhanced Wavelet Global Thresholding Using Intelligent Signal Processing Algorithm

By Joseph Isabona Agbotiname Lucky Imoize Stephen Ojo

DOI: https://doi.org/10.5815/ijigsp.2023.05.01, Pub. Date: 8 Oct. 2023

Denoising is a vital aspect of image preprocessing, often used to remove noise from an image and restore its proper structure and clarity. Noise frequently degrades the quality of valuable images, rendering them useless for practical applications. Several methods have been proposed to address this problem, but the quality of the recovered images still needs improvement for efficient practical application. In this paper, a wavelet-based universal thresholding technique is proposed that can optimally denoise highly degraded images with both uniform and non-uniform variations in illumination and contrast. The proposed method, referred to as modified wavelet-based universal thresholding (MWUT), was employed to denoise five noisy images and compared against three state-of-the-art denoising techniques. To appraise the quality of the resulting images, eight performance indicators were employed: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Structural Content (SC), Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM), Signal-to-Reconstruction-Error Ratio (SRER), Natural Image Quality Evaluator (NIQE), and Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE). The first six (RMSE, MAE, SC, PSNR, SSIM, and SRER) are reference-based indicators, while the remaining two (NIQE and BRISQUE) are referenceless. Superior performance corresponds to higher SC, PSNR, SSIM, and SRER values and lower NIQE, BRISQUE, RMSE, and MAE values. The higher PSNR, SSIM, and SRER values and lower NIQE, BRISQUE, RMSE, and MAE values obtained in the final results show the superior performance of the proposed MWUT denoising technique over the existing schemes. The modified wavelet-based universal thresholding technique should find practical application in digital image processing and enhancement.

[...] Read more.
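The exact MWUT modification is not given in the abstract, so as a hedged illustration here is the baseline universal (VisuShrink) thresholding that such methods build on, using a hand-rolled one-level Haar transform so no wavelet library is required. The signal and noise level are made up for the demo:

```python
import numpy as np

def haar_dwt(x):
    """One-level 1-D Haar transform: approximation and detail coefficients."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return a, d

def haar_idwt(a, d):
    x = np.empty(a.size * 2)
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def denoise_universal(x, sigma):
    """Soft-threshold the Haar detail coefficients at the universal
    (VisuShrink) threshold sigma * sqrt(2 log N)."""
    a, d = haar_dwt(x)
    t = sigma * np.sqrt(2 * np.log(x.size))
    d = np.sign(d) * np.maximum(np.abs(d) - t, 0.0)
    return haar_idwt(a, d)

rng = np.random.default_rng(0)
clean = np.repeat([0.0, 4.0, 0.0, 4.0], 64)        # piecewise-constant signal
noisy = clean + rng.normal(0, 0.5, clean.size)
denoised = denoise_universal(noisy, sigma=0.5)
print(np.mean((noisy - clean) ** 2) > np.mean((denoised - clean) ** 2))  # True
```

A 2-D image version applies the same thresholding to multi-level row/column transforms; the paper's modification would replace the threshold formula itself.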
An Efficient Brain Tumor Detection Algorithm Using Watershed & Thresholding Based Segmentation

By Anam Mustaqeem Engr Ali Javed Tehseen Fatima

DOI: https://doi.org/10.5815/ijigsp.2012.10.05, Pub. Date: 28 Sep. 2012

During the past few years, brain tumor segmentation in magnetic resonance imaging (MRI) has become an emergent research area in the field of medical imaging. Brain tumor detection helps in finding the exact size and location of a tumor. This paper proposes an efficient algorithm for tumor detection based on segmentation and morphological operators. First, the quality of the scanned image is enhanced; then morphological operators are applied to detect the tumor in the scanned image.

[...] Read more.
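The abstract names thresholding-based segmentation as one of its steps. A minimal sketch of one standard choice, Otsu's threshold, is shown below on a synthetic scan; the watershed step and real MRI data are omitted, and the toy image is an assumption:

```python
import numpy as np

def otsu_threshold(img):
    """Otsu's method: choose the grey level that maximises the
    between-class variance of the foreground/background split."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()
        if w0 == 0 or w1 == 0:
            continue
        m0 = (np.arange(t) * p[:t]).sum() / w0
        m1 = (np.arange(t, 256) * p[t:]).sum() / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

# Synthetic scan: dark background (~30) with a bright "tumour" patch (~200).
img = np.full((64, 64), 30, dtype=np.uint8)
img[20:30, 20:30] = 200
t = otsu_threshold(img)
mask = img >= t
print(int(mask.sum()))  # 100: the 10x10 bright region
```

In the paper's pipeline, a watershed transform would then refine this binary mask into separated regions.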
Improving Retinal Image Quality Using the Contrast Stretching, Histogram Equalization, and CLAHE Methods with Median Filters

By Erwin Dwi Ratna Ningsih

DOI: https://doi.org/10.5815/ijigsp.2020.02.04, Pub. Date: 8 Apr. 2020

This paper evaluates three contrast-enhancement methods, namely contrast stretching, histogram equalization (HE), and CLAHE, each combined with a median filter. Poor-quality images are corrected, with the median filter applied for noise removal. Images from the STARE dataset are used, each with a different contrast value. The results are evaluated with three parameters: MSE, PSNR, and SSIM. On grey-level images, contrast stretching, which stretches pixel values using the stretchlim technique, yields an average MSE of 9.15, a PSNR of 42.14 dB, and an SSIM of 0.88. The HE method with a median filter yields an average MSE of 18.67, a PSNR of 41.33 dB, and an SSIM of 0.77, whereas CLAHE with a median filter yields an average MSE of 28.42, a PSNR of 35.30 dB, and an SSIM of 0.86. The test results show that, of the methods compared, contrast stretching achieves the lowest MSE and the highest PSNR and SSIM values.

[...] Read more.
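As a rough illustration of the contrast-stretching step and the reference metrics used for evaluation, here is a minimal NumPy sketch. The percentile limits and the synthetic low-contrast image are assumptions, not the paper's STARE setup:

```python
import numpy as np

def contrast_stretch(img, low_pct=1, high_pct=99):
    """Linearly stretch intensities between two percentiles to [0, 255],
    similar in spirit to MATLAB's stretchlim + imadjust."""
    lo, hi = np.percentile(img, [low_pct, high_pct])
    out = np.clip((img - lo) / (hi - lo), 0, 1)
    return (out * 255).astype(np.uint8)

def psnr(ref, test):
    """Peak signal-to-noise ratio in dB for 8-bit images."""
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

rng = np.random.default_rng(1)
img = rng.integers(100, 156, size=(32, 32)).astype(np.uint8)  # low-contrast image
stretched = contrast_stretch(img)
print(stretched.max() - stretched.min() > img.max() - img.min())  # True
```

MSE and SSIM would be computed against the original image in the same way the paper tabulates them.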
Restoration of Degraded Gray Images Using Genetic Algorithm

By Dhirendra Pal Singh Ashish Khare

DOI: https://doi.org/10.5815/ijigsp.2016.03.04, Pub. Date: 8 Mar. 2016

Image deblurring aims to eliminate or reduce the degradations that occurred while the image was being acquired. In this paper, we propose a unified framework for the restoration process that produces enhanced, better-quantified deblurred images with the help of a genetic algorithm. The developed method uses an iterative procedure with evolutionary criteria and produces better images with most of their frequency content restored. We have compared the proposed method with the Lucy-Richardson restoration method, the method proposed by W. Dong [34], and the inverse-filter restoration method, and demonstrated that the proposed method is more accurate, achieving high-quality restored images in terms of various statistical quality measures.

[...] Read more.
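The paper's GA configuration (encoding, operators, fitness) is not described in the abstract, so the following is only a generic sketch of the iterative evolutionary loop it refers to, applied to a toy parameter-recovery objective rather than real deblurring:

```python
import random

def genetic_search(fitness, n_genes, pop_size=30, generations=60,
                   mut_rate=0.2, seed=42):
    """Minimal elitist GA: tournament selection, uniform crossover,
    Gaussian mutation; `fitness` is maximised."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        def tourney():
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        nxt = [pop[0]]                       # elitism: keep the best individual
        while len(nxt) < pop_size:
            p1, p2 = tourney(), tourney()
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            nxt.append([g + rng.gauss(0, 0.1) if rng.random() < mut_rate else g
                        for g in child])
        pop = nxt
    return max(pop, key=fitness)

# Toy restoration-style objective: recover a known parameter vector.
target = [0.5, -0.3, 0.8]
err = lambda g: sum((a - b) ** 2 for a, b in zip(g, target))
best = genetic_search(lambda g: -err(g), n_genes=3)
print(err(best) < 0.2)  # True
```

In a restoration setting, the genes would encode deconvolution parameters and the fitness would score the restored image's quality.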
A Review on Graph Based Segmentation

By K. Santle Camilus V.K. Govindan

DOI: https://doi.org/10.5815/ijigsp.2012.05.01, Pub. Date: 8 Jun. 2012

Image segmentation plays a crucial role in the effective understanding of digital images. The past few decades have seen hundreds of research contributions in this field. However, research into a general-purpose segmentation algorithm that suits a variety of applications is still very active. Among the many approaches to image segmentation, the graph-based approach is gaining popularity primarily due to its ability to reflect global image properties. This paper critically reviews important existing graph-based segmentation methods. The review is organised around a classification of segmentation algorithms within the framework of graph-based approaches. The four major categories employed for the purpose of this review are: graph-cut-based methods, interactive methods, minimum-spanning-tree-based methods, and pyramid-based methods. The review reveals not only the strengths of each method and category but also their limitations. In addition, it highlights the need for a database for benchmarking intensity-based algorithms and for further research on graph-based segmentation for automated real-time applications.

[...] Read more.
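A minimum-spanning-tree-style segmentation, one of the four categories reviewed, can be sketched with Kruskal-style merging over 4-neighbour edges. The fixed merge threshold below is a deliberate simplification of adaptive criteria such as Felzenszwalb-Huttenlocher's:

```python
class DSU:
    """Disjoint-set union (union-find) over pixel indices."""
    def __init__(self, n):
        self.p = list(range(n))
    def find(self, x):
        while self.p[x] != x:
            self.p[x] = self.p[self.p[x]]   # path halving
            x = self.p[x]
        return x
    def union(self, a, b):
        self.p[self.find(a)] = self.find(b)

def mst_segment(img, tau):
    """Sort 4-neighbour edges by intensity difference and merge the
    components joined by any edge of weight <= tau."""
    h, w = len(img), len(img[0])
    idx = lambda y, x: y * w + x
    edges = []
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                edges.append((abs(img[y][x] - img[y][x+1]), idx(y, x), idx(y, x+1)))
            if y + 1 < h:
                edges.append((abs(img[y][x] - img[y+1][x]), idx(y, x), idx(y+1, x)))
    dsu = DSU(h * w)
    for wgt, a, b in sorted(edges):
        if wgt <= tau and dsu.find(a) != dsu.find(b):
            dsu.union(a, b)
    return {dsu.find(i) for i in range(h * w)}

# Two flat regions (10s and 200s) separated by a sharp boundary.
img = [[10, 10, 200, 200],
       [10, 10, 200, 200]]
print(len(mst_segment(img, tau=5)))  # 2 segments
```

The global nature of the merge order is what gives graph-based methods their sensitivity to image-wide structure.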
Edibility Detection of Mushroom Using Ensemble Methods

By Nusrat Jahan Pinky S.M. Mohidul Islam Rafia Sharmin Alice

DOI: https://doi.org/10.5815/ijigsp.2019.04.05, Pub. Date: 8 Apr. 2019

Mushrooms are a familiar, delicious food that is cholesterol-free as well as rich in vitamins and minerals. Although nearly 45,000 species of mushrooms are known throughout the world, most of them are poisonous and a few are lethally so. Identifying an edible or poisonous mushroom with the naked eye is quite difficult, and there is no easy rule for edibility identification using machine learning methods that works for all types of data. Our aim is to find a robust method for identifying mushroom edibility that performs better than existing works. In this paper, three ensemble methods are used to detect the edibility of mushrooms: bagging, boosting, and random forest. Using the most significant features, five feature sets are constructed to build five base models for each ensemble method. Accuracy is measured for each ensemble method using both fixed-feature-set-based models and randomly-selected-feature-set-based models, on two types of test sets. The results show that better performance is obtained with the fixed-feature-set-based models than with the randomly-selected-feature-set-based models, and the highest accuracy on both test sets is obtained by the proposed random-forest-based model.

[...] Read more.
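The bagging-with-majority-vote mechanism described above can be sketched in plain Python. The mushroom data is replaced here by a made-up one-dimensional toy dataset, and the threshold stump is an assumed weak learner, not the paper's base model:

```python
import random

def train_stump(data):
    """A weak learner: threshold on the single feature at the midpoint
    of the two class means."""
    m0 = sum(x for x, y in data if y == 0) / max(1, sum(1 for _, y in data if y == 0))
    m1 = sum(x for x, y in data if y == 1) / max(1, sum(1 for _, y in data if y == 1))
    t = (m0 + m1) / 2
    return (lambda x: int(x > t)) if m1 > m0 else (lambda x: int(x <= t))

def bagging_predict(models, x):
    """Majority vote over bootstrap-trained base models."""
    votes = sum(m(x) for m in models)
    return int(votes * 2 >= len(models))

rng = random.Random(0)
data = [(rng.gauss(0, 1), 0) for _ in range(50)] + \
       [(rng.gauss(3, 1), 1) for _ in range(50)]
# Bagging: each stump is trained on a bootstrap resample of the data.
models = [train_stump([rng.choice(data) for _ in data]) for _ in range(15)]
acc = sum(bagging_predict(models, x) == y for x, y in data) / len(data)
print(acc > 0.8)  # True
```

A random forest adds random feature-subset selection on top of this resampling, which is where the paper's fixed vs. random feature sets come in.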
Evolutionary Image Enhancement Using Multi-Objective Genetic Algorithm

By Dhirendra Pal Singh Ashish Khare

DOI: https://doi.org/10.5815/ijigsp.2014.01.09, Pub. Date: 8 Nov. 2013

Image processing is the art of examining, identifying, and judging the significance of images. Image enhancement refers to the attenuation or sharpening of image features such as edgels, boundaries, or contrast to make the processed image more useful for analysis. Image enhancement procedures use computers to provide good, improved images for study by human interpreters. In this paper we propose a novel method that uses a genetic algorithm with multi-objective criteria to find a more enhanced version of an image; a simple genetic algorithm may not explore the search space thoroughly enough to do so. The proposed method takes three objectives into consideration: intensity, entropy, and number of edgels. By incorporating these objectives, the algorithm achieves automatic image enhancement. We review some existing image enhancement techniques and compare the results of our algorithm with other genetic-algorithm-based techniques. We expect that further improvements can be achieved by incorporating linear relationships with some other techniques.

[...] Read more.
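The three objectives named above (intensity, entropy, and number of edgels) can be folded into a single fitness score for the GA to maximise. The weights, edge threshold, and scalarisation below are illustrative assumptions, since the abstract does not give the exact formulation:

```python
import numpy as np

def enhancement_fitness(img, w=(1.0, 1.0, 1.0)):
    """Score an 8-bit image by mean intensity, histogram entropy, and
    edgel count (gradient magnitude above a threshold), each normalised
    to [0, 1] and combined with weights — one common scalarisation of a
    multi-objective fitness."""
    p = np.bincount(img.ravel(), minlength=256) / img.size
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    gy, gx = np.gradient(img.astype(float))
    edgels = np.count_nonzero(np.hypot(gx, gy) > 20)
    return w[0] * img.mean() / 255 + w[1] * entropy / 8 + w[2] * edgels / img.size

flat = np.full((32, 32), 128, dtype=np.uint8)               # no detail at all
rng = np.random.default_rng(2)
detailed = rng.integers(0, 256, (32, 32)).astype(np.uint8)  # rich histogram/edges
print(enhancement_fitness(detailed) > enhancement_fitness(flat))  # True
```

A true multi-objective GA (e.g. NSGA-II) would instead keep the three scores separate and evolve a Pareto front.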
Mobile-Based Skin Disease Diagnosis System Using Convolutional Neural Networks (CNN)

By M.W.P Maduranga Dilshan Nandasena

DOI: https://doi.org/10.5815/ijigsp.2022.03.05, Pub. Date: 8 Jun. 2022

This paper presents the design and development of an artificial intelligence (AI)-based mobile application to detect the type of a skin disease. Skin diseases are a serious hazard to people throughout the world, yet accurate diagnosis is difficult. In this work, a deep learning approach using convolutional neural networks (CNNs) is proposed to classify skin diseases on the HAM10000 dataset. An extensive review of research articles on object identification methods, and a comparison of their relative qualities, was carried out to find a method that would work well for detecting skin diseases; the CNN-based technique was identified as the best. A mobile application was then built for quick and accurate action: by analysing an image of the afflicted area at the onset of a skin illness, it assists patients and dermatologists in determining the kind of disease present. The model detects the affected region considerably faster, with nearly 2x fewer computations than the standard MobileNet model, resulting in low computing effort. This study revealed that MobileNet with transfer learning, yielding an accuracy of about 85%, is the most suitable model for automatic skin disease identification. According to these findings, the suggested approach can assist general practitioners in quickly and accurately diagnosing skin diseases using a smartphone.

[...] Read more.
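The computational saving attributed to the MobileNet-style model comes from depthwise separable convolutions, whose per-layer multiplication counts can be compared directly. The layer size below is an arbitrary example; the abstract's "nearly 2x" figure refers to its whole modified network, not this per-layer ratio:

```python
def conv_mults(h, w, cin, cout, k=3):
    """Multiplications in a standard k x k convolution layer."""
    return h * w * cin * cout * k * k

def sep_conv_mults(h, w, cin, cout, k=3):
    """Depthwise separable convolution (MobileNet's building block):
    a depthwise k x k pass plus a 1 x 1 pointwise pass."""
    return h * w * cin * k * k + h * w * cin * cout

# One representative layer: 56x56 feature map, 128 -> 128 channels.
std = conv_mults(56, 56, 128, 128)
sep = sep_conv_mults(56, 56, 128, 128)
print(round(std / sep, 1))  # 8.4: per-layer saving of the separable block
```

The ratio simplifies to k²·cout / (k² + cout), so the saving grows with the output channel count.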
A Review of Self-supervised Learning Methods in the Field of Medical Image Analysis

By Jiashu Xu

DOI: https://doi.org/10.5815/ijigsp.2021.04.03, Pub. Date: 8 Aug. 2021

In the field of medical image analysis, supervised deep learning strategies have achieved significant progress, but these methods rely on large labeled datasets. Self-supervised learning (SSL) provides a new strategy to pre-train a neural network with unlabeled data; it is an unsupervised learning paradigm that has achieved significant breakthroughs in recent years. Consequently, more and more researchers are trying to utilize SSL methods for medical image analysis to meet the challenge of assembling large medical datasets. To our knowledge, there is still a shortage of reviews of self-supervised learning methods in the field of medical image analysis; this article aims to fill that gap with a comprehensive review of the application of self-supervised learning in the medical field. It provides an up-to-date and detailed overview of self-supervised learning in the medical field and promotes the development of unsupervised learning in medical imaging. The methods are divided into three categories: context-based, generation-based, and contrast-based; we then discuss the pros and cons of each category and evaluate their performance on downstream tasks. Finally, we conclude with the limitations of the current methods and discuss future directions.

[...] Read more.
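Of the three SSL categories mentioned, the contrast-based one can be illustrated with an InfoNCE-style loss: each embedding should match its own augmented view and mismatch every other sample in the batch. The embeddings and temperature below are synthetic stand-ins:

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """Contrastive (InfoNCE-style) loss for paired embeddings: row i of
    z1 should match row i of z2 and mismatch all other rows."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature              # scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))           # cross-entropy on the diagonal

rng = np.random.default_rng(3)
z = rng.normal(size=(8, 16))
aligned = info_nce(z, z + 0.01 * rng.normal(size=(8, 16)))  # positives match
shuffled = info_nce(z, np.roll(z, 1, axis=0))               # positives mismatch
print(aligned < shuffled)  # True
```

Context-based and generation-based methods replace this objective with pretext prediction (e.g. rotation, jigsaw) and reconstruction losses respectively.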
Breast Cancer Classification from Ultrasound Images using VGG16 Model based Transfer Learning

By A. B. M. Aowlad Hossain Jannatul Kamrun Nisha Fatematuj Johora

DOI: https://doi.org/10.5815/ijigsp.2023.01.02, Pub. Date: 8 Feb. 2023

Ultrasound-based breast screening has been gaining attention recently, especially for dense breasts. Technological advancement, cancer awareness, and cost, safety, and availability benefits have led to a rapid rise in the breast ultrasound market. The irregular shape, intensity variation, and additional blood vessels of malignant cancer distinguish it from the benign phase in ultrasound images. However, classifying breast cancer from ultrasound images is a difficult process owing to speckle noise and the complex textures of the breast. In this paper, a breast cancer classification method is presented using a VGG16-based transfer learning approach. We used a median filter to despeckle the images. The convolutional layers of the pretrained VGG16 model, along with its max-pooling layers, were used as the feature extractor, and a proposed fully connected two-layer deep neural network was designed as the classifier. The Adam optimizer is used with a learning rate of 0.001, and binary cross-entropy is chosen as the loss function for model optimization. Dropout in the hidden layers is used to avoid overfitting. Breast ultrasound images from two databases (897 images in total) were combined to train, validate, and test the performance and generalization strength of the classifier. Experimental results showed a training accuracy of 98.2% and a testing accuracy of 91% on blind test data, with reduced computational complexity. The gradient class activation mapping (Grad-CAM) technique was used to visualize and check the localization of targeted regions at the final convolutional layer and was found to be noteworthy. The outcomes of this work may be useful for clinical applications of breast cancer diagnosis.

[...] Read more.
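The despeckling step described above is a standard median filter; a dependency-free 3x3 version is sketched below on a made-up image containing a single speckle-like outlier:

```python
import numpy as np

def median_filter3(img):
    """3x3 median filter via stacked shifted views (edge-padded),
    the classic despeckling step for ultrasound images."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    stack = np.stack([p[dy:dy + h, dx:dx + w]
                      for dy in range(3) for dx in range(3)])
    return np.median(stack, axis=0)

img = np.full((16, 16), 100.0)
img[8, 8] = 255.0            # one speckle-like outlier
out = median_filter3(img)
print(out[8, 8])  # 100.0: the outlier is replaced by the neighbourhood median
```

Unlike mean filtering, the median rejects the outlier entirely rather than smearing it into neighbouring pixels, which is why it suits impulsive speckle.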
Retinal Image Segmentation for Diabetic Retinopathy Detection using U-Net Architecture

By Swapnil V. Deshmukh Apash Roy Pratik Agrawal

DOI: https://doi.org/10.5815/ijigsp.2023.01.07, Pub. Date: 8 Feb. 2023

Diabetic retinopathy is one of the most serious eye diseases and can lead to permanent blindness if not diagnosed early; its main cause is diabetes. Not every diabetic will develop diabetic retinopathy, but the risk is undeniable, which makes early diagnosis essential. Segmentation is one approach that is useful for detecting the blood vessels in a retinal image. This paper proposes three deep learning models for recognizing blood vessels in retinal images using region-based segmentation techniques. The proposed pipeline consists of four steps: preprocessing, augmentation, model training, and performance measurement. The augmented retinal images are fed to the three models for training, finally yielding the segmented image. The three models are applied to the publicly available DRIVE, STARE, and HRF datasets. It is observed that more thin blood vessels are segmented in the retinal images of the HRF dataset using model 3. The performance of the proposed three models is compared with other state-of-the-art blood vessel segmentation methods on the DRIVE, STARE, and HRF datasets.

[...] Read more.
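Segmentation outputs like these are commonly scored with the Dice coefficient; a minimal version is sketched below on toy masks. The abstract does not state which overlap metric the authors used, so this is purely illustrative:

```python
import numpy as np

def dice(pred, gt, eps=1e-7):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|), the standard
    overlap metric for a predicted vessel mask against ground truth."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2 * inter + eps) / (pred.sum() + gt.sum() + eps)

gt = np.zeros((8, 8), dtype=bool); gt[2:6, 2:6] = True      # 16 true pixels
pred = np.zeros((8, 8), dtype=bool); pred[2:6, 2:4] = True  # half recovered
print(round(dice(pred, gt), 3))  # 0.667 = 2*8 / (8 + 16)
```

Dice is preferred over plain pixel accuracy for thin vessels, where the background class dominates the image.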
A Review on Image Reconstruction through MRI k-Space Data

By Tanuj Kumar Jhamb Vinith Rejathalal V.K. Govindan

DOI: https://doi.org/10.5815/ijigsp.2015.07.06, Pub. Date: 8 Jun. 2015

Image reconstruction is the process of generating an image of an object from the signals captured by the scanning machine. Medical imaging is an interdisciplinary field combining physics, biology, mathematics, and computational sciences. This paper provides a complete overview of the image reconstruction process in MRI (Magnetic Resonance Imaging), reviewing the computational aspects of medical image reconstruction. MRI is one of the most commonly used medical imaging techniques. The data collected by an MRI scanner for image reconstruction is called k-space data. Various algorithms exist for reconstructing an image from k-space data, such as the homodyne algorithm, the zero-filling method, dictionary learning, and the projections-onto-convex-sets method. The characteristics of k-space data and the MRI data collection technique are reviewed in detail, and the reconstruction algorithms are discussed along with their pros and cons. Modern magnetic resonance imaging techniques such as functional MRI and diffusion MRI are also introduced. The concepts of classical techniques such as expectation maximization, sensitivity encoding, and the level-set method, and of recent techniques such as alternating minimization, signal modeling, and the sphere-shaped support vector machine, are reviewed as well. It is observed that most of these techniques enhance gradient encoding and reduce scanning time. Classical algorithms produce an undesirable blurring effect when the degree of phase variation in partial k-space is high, whereas modern reconstruction algorithms such as dictionary learning work well even with high phase variation, as they are iterative procedures.

[...] Read more.
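The zero-filling method mentioned above can be sketched directly with NumPy's FFT: keep a central band of k-space, zero the rest, and inverse-transform. The phantom image and sampling pattern below are toy assumptions:

```python
import numpy as np

def zero_filled_recon(kspace, keep_fraction=0.5):
    """Zero-filling reconstruction: keep only the central (low-frequency)
    rows of k-space, zero the rest, and inverse-FFT back to image space."""
    k = np.fft.fftshift(kspace)                 # move DC to the centre
    h = k.shape[0]
    lo = int(h * (0.5 - keep_fraction / 2))
    hi = int(h * (0.5 + keep_fraction / 2))
    mask = np.zeros_like(k)
    mask[lo:hi, :] = k[lo:hi, :]                # retained central band
    return np.abs(np.fft.ifft2(np.fft.ifftshift(mask)))

# Simulate acquisition: image -> k-space, then reconstruct from half the rows.
img = np.zeros((32, 32)); img[12:20, 12:20] = 1.0
kspace = np.fft.fft2(img)
recon = zero_filled_recon(kspace, keep_fraction=0.5)
full = zero_filled_recon(kspace, keep_fraction=1.0)
print(np.mean((full - img) ** 2) < np.mean((recon - img) ** 2))  # True
```

The error in the half-sampled reconstruction is exactly the blurring the review attributes to discarded high-frequency k-space content.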
Real-Time Video based Human Suspicious Activity Recognition with Transfer Learning for Deep Learning

By Indhumathi .J Balasubramanian .M Balasaigayathri .B

DOI: https://doi.org/10.5815/ijigsp.2023.01.05, Pub. Date: 8 Feb. 2023

Nowadays, the primary concern of any society is providing safety to individuals. It is very hard to recognize human behaviour and identify whether it is suspicious or normal. Deep learning approaches have paved the way for various advances in machine learning and artificial intelligence. The proposed system detects real-time human activity using a convolutional neural network. The objective of the study is to develop a real-time application for activity recognition with and without transfer learning. The proposed system considers criminal, suspicious, and normal categories of activity. Videos of different suspicious behaviours were collected from different people (men and women), and the system is used to detect a person's suspicious activities. A novel 2D-CNN, a pre-trained VGG16, and a ResNet50 are trained on video frames of human activities labelled as normal or suspicious; similarly, VGG16 and ResNet50 with transfer learning are trained on the human suspicious activity datasets. The results show that the novel 2D-CNN, VGG16, and ResNet50 without transfer learning achieve accuracies of 98.96%, 97.84%, and 99.03%, respectively. On Kaggle/real-time video, the proposed 2D-CNN outperforms the pre-trained VGG16 model. The trained model is used to classify the activity in real-time captured video. The accuracy of 99.18% obtained with ResNet50 with transfer learning is higher than the 98.36% obtained with VGG16 with transfer learning.

[...] Read more.
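Since the models above classify individual video frames, a video-level decision is commonly obtained by a majority vote across frames. The labels below are made-up examples, as the abstract does not specify the aggregation rule actually used:

```python
from collections import Counter

def video_label(frame_preds):
    """Aggregate per-frame classifier outputs into one video-level
    activity label by majority vote — a simple way to turn a frame
    classifier into a real-time video decision."""
    return Counter(frame_preds).most_common(1)[0][0]

# A clip where most frames score "suspicious" despite a few
# "normal" misclassifications.
frames = ["suspicious"] * 7 + ["normal"] * 3
print(video_label(frames))  # suspicious
```

Voting over a sliding window of recent frames also smooths out transient misclassifications in a live stream.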