International Journal of Image, Graphics and Signal Processing (IJIGSP)

IJIGSP Vol. 17, No. 6, Dec. 2025

Cover page and Table of Contents: PDF (size: 762KB)

Table Of Contents

REGULAR PAPERS

Web-Based Waste Detection Using YOLOv8 and Classification Performance Comparison: MobileNet and EfficientNet

By Apriandy Angdresey, Indah Yessi Kairupan, Andre Gabriel Mongkareng

DOI: https://doi.org/10.5815/ijigsp.2025.06.01, Pub. Date: 8 Dec. 2025

Environmental pollution resulting from waste is a critical global challenge that significantly affects both the environment and public health, especially in countries like Indonesia. Effective waste management and recycling depend on accurately detecting and classifying different waste types. This study tackles this challenge by evaluating the YOLOv8s algorithm for object detection and conducting a comparative analysis of two mobile-optimized convolutional neural networks (CNNs), MobileNetV2 and EfficientNet, for waste classification. The YOLOv8s model established a promising baseline for detection, achieving a mean Average Precision (mAP@50) of 0.621 on the hold-out test set. MobileNetV2 proved to be the superior architecture in the classification task, attaining a higher accuracy of 94.4% compared to EfficientNet’s 87.8%. Additionally, MobileNetV2 demonstrated significantly greater computational efficiency, with a processing time of 229 ms per step, in contrast to EfficientNet’s 606 ms per step. These findings confirm that combining YOLOv8s for detection and MobileNetV2 for classification provides a robust and efficient pathway for developing automated waste management systems.

Semantic Segmentation of Tuberculosis Bacilli from Microscopic Sputum Smear Images Using TransUNet

By Ashutosh Satapathy, Praneeth Vallabhaneni, Manisha Indugula

DOI: https://doi.org/10.5815/ijigsp.2025.06.02, Pub. Date: 8 Dec. 2025

According to World Health Organization (WHO) figures for 2022, tuberculosis is the second deadliest infectious disease after COVID-19. Around one-fourth of the global population is estimated to be infected with tuberculosis. Timely detection and prevention of tuberculosis are essential to overcome its harmful effects. The method most often used to ascertain whether a patient has tuberculosis is examination of a sputum sample. In this process, the isolation of the bacilli is done manually and is hence prone to error. Segmentation delineates objects or particles within an image, thus extracting the Region of Interest (ROI). The present study uses the TransUNet architecture to segment tuberculosis bacilli from sputum images to increase diagnostic accuracy and performance. The attention mechanism in the TransUNet model helps to identify the spatial hierarchies present in an image. Naive or traditional segmentation algorithms find it extremely difficult to deal with the inherent complexity of sputum images. Hence, this study introduces an approach that captures the intrinsic features and dependencies needed to segment Mycobacterium (TB) bacilli by leveraging the TransUNet model. The model achieved an average Dice score of 92.795%, a mean Intersection over Union (IoU) of 88.845%, and a segmentation accuracy of 99.19% on the Mosaic and Ziehl-Neelsen datasets. These results surpass several existing state-of-the-art methods such as UNet, clustering, and thresholding, demonstrating the superior capability of TransUNet in segmenting TB bacilli. These findings underscore the potential of transformer-based CNN models, especially TransUNet, for improving the diagnosis of tuberculosis and supporting disease management.
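For reference, the Dice score and IoU reported above are standard overlap metrics for binary segmentation masks. A minimal sketch (a generic illustration, not the authors' code; the function name is our own):

```python
import numpy as np

def dice_iou(pred, gt):
    """Dice score and IoU for two binary masks of equal shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum())  # 2*overlap / (size A + size B)
    iou = inter / union                           # overlap / union
    return dice, iou

# Example: 2x2 masks with one overlapping pixel -> dice = 2/3, iou = 0.5
d, i = dice_iou(np.array([[1, 1], [0, 0]]), np.array([[1, 0], [0, 0]]))
```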

Application of Large Language Models for Data-Driven Analytics in Oncology: Insights and Evidence Generation from Real-World Imaging Data

By Shobhit Shrotriya, Nizar Banu P. K., Avi Kulkarni, Vinod G. Kumar

DOI: https://doi.org/10.5815/ijigsp.2025.06.03, Pub. Date: 8 Dec. 2025

Breast cancer is one of the most common and serious types of cancer. It can affect people of all ages and genders around the world. The increasing incidence of breast cancer, coupled with its complexity, has placed a significant burden on healthcare systems and patients alike. Traditional diagnostic methods, while effective, often face limitations in early detection and accurate prognosis, which are critical for improving patient outcomes. In recent years, artificial intelligence (AI) and machine learning (ML) have been changing the way problems are solved and decisions are made in medical diagnostics, enhancing the ability to detect, diagnose, and predict breast cancer. However, challenges remain, such as the need for large and diverse datasets to train these models, integrating AI tools smoothly into hospital workflows, and addressing ethical concerns in healthcare. This paper examines how AI and ML are used in breast cancer care, especially in analyzing real-world medical data such as images, histopathology, and other datasets such as doctor notes and discharge summaries, to identify patterns that may be unnoticeable to medical experts. Large Language Models (LLMs) using embeddings are highlighted for their capacity to improve the accuracy of image-related interpretations, potentially detect early-stage tumours, and predict disease progression and treatment responses. Real-world medical datasets were collected and analysed using different models: a publicly available Convolutional Neural Network (CNN) and a custom-built LLM with embeddings were tested. The generative AI model achieved 98.44% accuracy, significantly higher than the traditional AI model's 61.72%. Future research can explore how generative AI can help classify patients based on risk levels, which could lead to personalized treatment plans, reducing unnecessary treatments and improving patients' quality of life.
Since the research focuses primarily on breast cancer, it aims to show that, by harnessing the power of AI and ML, there is potential to significantly reduce the global burden of breast cancer, offering new avenues for early detection, accurate diagnosis, and tailored therapeutic strategies. Continued research and collaboration among oncologists, data scientists, and policymakers are essential to fully realize the benefits of AI in the fight against breast cancer, ultimately leading to better patient outcomes and a decrease in breast cancer-related mortality.

Noninvasive Hemoglobin Monitoring Device for Disease Detection

By Md. Altab Hossain, Sheikh Md. Rabiul Islam

DOI: https://doi.org/10.5815/ijigsp.2025.06.04, Pub. Date: 8 Dec. 2025

A noninvasive blood hemoglobin monitoring device was designed specifically for monitoring anemia and polycythemia. Invasive techniques, which are painful and expensive, are commonly used to estimate blood hemoglobin concentrations. This paper presents a noninvasive method for monitoring blood hemoglobin values. A photodiode and a near-infrared (NIR) LED with a wavelength of 940 nm were used to construct a finger probe. At 940 nm, absorbance shows a distinct variation between oxygenated and deoxygenated hemoglobin, and a single-wavelength system significantly reduces hardware complexity, cost, power consumption, and size. Continuous-wave NIR LED light transmitted through the finger is used to assess sensitivity to different hemoglobin concentrations. A total of 100 patients participated in the evaluation of the proposed device, and both invasive and noninvasive hemoglobin concentration values were collected from each participant. The correlation coefficient between the predicted (noninvasive) and reference (invasive) hemoglobin values was 0.9496, with a normalized root mean squared error (NRMSE) of 0.6504 and a mean absolute percentage error (MAPE) of 0.0505. The noninvasive blood hemoglobin level was classified using the k-nearest neighbour (kNN) classifier, and the proposed device's accuracy was calculated at 90%. The Bland-Altman methodology was used to evaluate differences between invasive and noninvasive blood hemoglobin concentrations. The absolute mean difference was 0.1124 (95% confidence interval [CI] -0.01535 to 0.2401), with an upper agreement limit of 1.374 (95% CI 1.153 to 1.595) and a lower agreement limit of -1.149 (95% CI -1.371 to -0.9282).
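The Bland-Altman limits of agreement quoted above follow a standard formula: mean of the paired differences plus or minus 1.96 times their standard deviation. A minimal sketch with made-up illustrative values (not the study's data):

```python
import numpy as np

def bland_altman(reference, predicted):
    """Mean difference and 95% limits of agreement between two methods."""
    diff = np.asarray(predicted, float) - np.asarray(reference, float)
    mean_diff = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)  # 1.96 * sample SD of differences
    return mean_diff, mean_diff - half_width, mean_diff + half_width

# Illustrative values only (g/dL), not the study's measurements
invasive = [10.0, 12.0, 14.0, 13.0]
noninvasive = [10.2, 11.8, 14.1, 13.3]
mean_diff, lower, upper = bland_altman(invasive, noninvasive)  # mean_diff = 0.1
```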

Leveraging Deep Learning Approach for the Detection of Human Activities from Video Sequences

By Preethi Salian K., Karthik K.

DOI: https://doi.org/10.5815/ijigsp.2025.06.05, Pub. Date: 8 Dec. 2025

Deep learning approaches that automatically derive significant representations from unprocessed video have demonstrated effective results in recognizing human actions from video sequences. Artificial intelligence (AI) applications, including monitoring, automation, and human-computer interfaces, have become crucial for security and human behaviour analysis. For the visual representation of video clips during the training phase, existing action identification algorithms mostly use pre-trained weights of various AI architectures, which affects feature consistency and persistence, including the separation between visual and temporal cues. To overcome this problem, this research proposes a 3-dimensional Convolutional Neural Network and Long Short-Term Memory (3D-CNN-LSTM) network that strategically concentrates on useful information in the input frames to recognize various human behaviours in video. The process utilizes stochastic gradient descent (SGD) optimization to identify the model parameters that best match the expected and observed outcomes. The proposed framework is trained, validated, and tested using the publicly accessible UCF11 benchmark dataset. According to the experimental findings, the accuracy rate was 93.72%, which is 2.42% higher than the previous state-of-the-art best result. Compared to several other relevant techniques already in use, the suggested approach achieved outstanding performance in terms of accuracy.
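SGD, mentioned above as the optimizer, iteratively moves parameters against the gradient of the loss. A minimal one-parameter sketch (a generic illustration, not the paper's training loop):

```python
# Minimise f(w) = (w - 3)^2 with plain gradient steps (SGD with one sample)
w, lr = 0.0, 0.1
for _ in range(100):
    grad = 2.0 * (w - 3.0)  # df/dw
    w -= lr * grad          # SGD update: step against the gradient
# w has converged close to the minimiser 3.0
```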

Dual Attention Fusion-Net with Edge Attention Guidance Network based Segmentation for an Automatic Size Detection of Onions

By M. Mythili, P. Vasanthi Kumari

DOI: https://doi.org/10.5815/ijigsp.2025.06.06, Pub. Date: 8 Dec. 2025

Onion size is a crucial physiological characteristic that can be described by a number of measures, including diameter, weight, volume, and length. Determining the size of onions is frequently necessary for sorting them for a variety of reasons, including processing machine specifications, legal requirements for sorting standards, and consumer preferences. In the process of phenotyping onions, size is another crucial quantitative feature to consider. Traditionally, algorithms based on morphology, colour, thresholding, and geometric approaches have been used to estimate the shape and size of onions. However, research that relies on these geometric or colour-based functions is limited to approximations and frequently produces erroneous results unless conducted at precisely controlled heights. Images of healthy onions are collected and used as the input dataset for this paper. The gathered images are pre-processed to reduce noise and improve contrast by applying a circular adaptive median filter and homomorphic filtering with Elk-herd optimization. Next, object detection is performed on the pre-processed images using a dilated and deformable feature pyramid network. To segment the onion from the image and remove unwanted portions, an edge-based segmentation algorithm, the edge-attention guidance network, is used. The dual attention fusion-net then classifies the data into labelled groups and measures onion size. Accuracy, confusion metrics, FDR, hit rate, and other performance metrics are assessed for both the existing and proposed models. Consequently, the suggested onion size detection approach outperforms the current algorithm, producing 97.6% accuracy, 2.9% FDR, 96% hit rate, 98.5% selectivity, and 97.3% NPV. Thus, this proposed approach is a strong choice for detecting the size of onions.

Stacking Based Ensemble Learning with Deer Hunting Optimization for Automatic Identification of Malvani Dialects

By Madhavi S. Pednekar, Kaustubh Bhattacharyya

DOI: https://doi.org/10.5815/ijigsp.2025.06.07, Pub. Date: 8 Dec. 2025

Dialect identification is a sub-task of Language Identification (LID) that addresses specific challenges related to the linguistic similarity between dialects. Various current approaches are used for dialect identification, but automated prediction is difficult because voice clarity is often imperfect and feature selection can be inaccurate. It is essential to utilize an appropriate feature subset that contains sufficient signal information for the learning model to correctly recognize language dialects. To address these issues, an optimized stacking-based ensemble learning approach is developed. The identification process begins with pre-processing using an adaptive least mean square filter and a fractional bandpass filter. Features are extracted from the pre-processed audio signal using Gammatone Frequency Cepstral Coefficients (GFCC) and Shifted Delta Cepstral Coefficients (SDCC). The extracted features are then reduced with the help of Independent Component Analysis (ICA). The selected features are then passed to a Recurrent Neural Network (RNN), which acts as a meta-classifier and additionally receives information from a pair of distinct base classifiers, a Radial Basis Function Neural Network (RBFNN) and a Deep Belief Network (DBN). The hyperparameters of the RNN classifier were tuned using the Deer Hunting Optimization Algorithm (DHOA). The proposed approach achieves an accuracy of 97%, a precision of 96%, and an F1-score of 97%, making it a strong option for automatic dialect identification.

E-Chars74k: An Extended Scene Character Dataset with Augmentation Insights and Benchmarks

By Payel Sengupta, Tauseef Khan, Ayatullah Faruk Mollah

DOI: https://doi.org/10.5815/ijigsp.2025.06.08, Pub. Date: 8 Dec. 2025

Semantic understanding of camera-captured scene text images is an important problem in computer vision. Scene character recognition is the pivotal task in this problem, and deep learning is nowadays the most promising approach. However, the limited sample size of scene character datasets appears to be a major hindrance to training deep networks. In this paper, we present (i) various augmentation techniques for increasing the sample size of such datasets, along with associated insights, (ii) an extended version of the popular Chars74k dataset (herein referred to as E-Chars74k), and (iii) benchmark performance on the developed E-Chars74k dataset. Experiments on various sets of data, such as digits, alphabets, and their combination, belonging to both usual and wild scenarios, clearly reflect a significant performance gain (a 20%-30% increase in scene character recognition accuracy). It is noteworthy that in all these experiments, a deep convolutional neural network with two conv-pool pairs is trained with a uniform training-test partition to enable comparison on an equal footing.
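The paper's specific augmentation techniques are not detailed here; as a generic illustration, label-preserving transforms such as small shifts, brightness changes, and additive noise can be sketched as follows (a hypothetical helper, not the authors' pipeline):

```python
import numpy as np

def augment(img, rng):
    """Return simple label-preserving variants of a grayscale character image."""
    shifted = np.roll(img, shift=2, axis=1)                  # small horizontal shift
    brighter = np.clip(img * 1.2, 0, 255).astype(img.dtype)  # brightness scaling
    noisy = np.clip(img + rng.normal(0, 10, img.shape), 0, 255).astype(img.dtype)
    return [shifted, brighter, noisy]

rng = np.random.default_rng(0)
variants = augment(np.full((32, 32), 128, dtype=np.uint8), rng)
```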

A Robust Digital Image Watermarking Using Gorilla Troop Optimization Algorithm in Hybrid Frequency Domain

By Viswanathasarma Ch., Danish Ali Khan, Chandramouli Pvssr

DOI: https://doi.org/10.5815/ijigsp.2025.06.09, Pub. Date: 8 Dec. 2025

Because of the open nature of the Internet and the growing number of people using digital media, copyright protection is becoming more important. One of the most common ways to provide this protection is digital image watermarking, which safeguards the image from unauthorized access. We propose a powerful watermarking technique based on the Gorilla Troop Optimization Algorithm (GTO), a recent evolutionary algorithm. Initially, we applied the Discrete Wavelet Transform (DWT) to the cover image, followed by Singular Value Decomposition (SVD) for enhanced security, and finally we applied SVD to the watermark image for its embedding into the cover image. In this process, we optimize the multiple scaling factors (MSFs) by applying the GTO algorithm and test the proposed algorithm in the MATLAB environment using standard images. We then evaluated the experiment using performance metrics such as Normalized Cross-Correlation (NCC), the Structural Similarity Index (SSIM), and the Peak Signal-to-Noise Ratio (PSNR). These metrics demonstrated the imperceptibility of the watermark and the proposed algorithm's robustness under different attacks.
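The embedding chain described above (DWT on the cover, SVD, then adding scaled singular values of the watermark) can be sketched in a simplified single-subband form. This is a toy illustration under stated assumptions: a one-level Haar averaging step stands in for the paper's DWT, and a single scaling factor `alpha` replaces the GTO-optimized MSFs:

```python
import numpy as np

def haar_ll(x):
    """LL (approximation) subband of a one-level 2-D Haar DWT."""
    return (x[0::2, 0::2] + x[0::2, 1::2] + x[1::2, 0::2] + x[1::2, 1::2]) / 4.0

def embed(cover, watermark, alpha=0.05):
    """Embed the watermark's singular values into the cover's LL subband."""
    ll = haar_ll(cover.astype(float))
    u, s, vt = np.linalg.svd(ll)
    sw = np.linalg.svd(watermark.astype(float), compute_uv=False)
    s_marked = s + alpha * sw          # additive embedding of singular values
    return u @ np.diag(s_marked) @ vt  # watermarked LL subband

rng = np.random.default_rng(1)
cover = rng.integers(0, 256, (8, 8))
wm = rng.integers(0, 2, (4, 4))
ll_marked = embed(cover, wm)  # 4x4 watermarked approximation subband
```

Extraction would invert the steps with the same (optimized) scaling factors; the GTO search tunes those factors to balance imperceptibility against robustness.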

A Fast Output Generating Set Partitioning in Hierarchical Trees Coding for Medical Image Compression

By Narayana Prakash S., Airani Mohammad Khan

DOI: https://doi.org/10.5815/ijigsp.2025.06.10, Pub. Date: 8 Dec. 2025

In this paper, we present a Discrete Wavelet Transform (DWT) based Fast Output Generating Set Partitioning in Hierarchical Trees (FOGSPIHT) algorithm for MRI brain image compression. FOGSPIHT is a scalable, fast, and robust algorithm. Image compression is an important technique that enables fast and high-throughput imaging applications by reducing storage space or transmission bandwidth. The DWT transforms the image into a set of coefficients that are used for efficient compression. The Set Partitioning In Hierarchical Trees (SPIHT) algorithm is an efficient algorithm for DWT-based image compression, but its limitations are complexity and memory requirements. To reduce the complexity, we propose the FOGSPIHT algorithm, which works on the basic principles of SPIHT. The FOGSPIHT algorithm operates on coefficients that are converted to bit planes, and it eliminates the comparison operations in the SPIHT compression process by replacing them with simple logical operations on bits. The values of Mean Square Error (MSE), Peak Signal to Noise Ratio (PSNR), and Structural Similarity Index Measure (SSIM) are calculated and plotted against the Compression Ratio (CR). The result obtained with the FOGSPIHT algorithm is equal to or better than that of the SPIHT algorithm. The FOGSPIHT algorithm is also faster, with reduced encoding and decoding times. An FPGA implementation of the FOGSPIHT algorithm with 8x8 blocks of DWT coefficients requires fewer resources and less power than the SPIHT algorithm.
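The bit-plane idea mentioned above can be illustrated generically: whether a coefficient is significant at threshold 2^n can be decided with a bit shift instead of a magnitude comparison (an illustrative sketch, not the FOGSPIHT implementation):

```python
def significant_compare(c, n):
    """Magnitude comparison as used in classic SPIHT significance tests."""
    return abs(c) >= (1 << n)

def significant_bitwise(c, n):
    """Equivalent test on the bit planes: is any bit at position >= n set?"""
    return (abs(c) >> n) != 0

# The two tests agree for all integer coefficients
ok = all(significant_compare(c, 3) == significant_bitwise(c, 3)
         for c in range(-64, 65))
```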
