IJIGSP Vol. 18, No. 2, Apr. 2026
REGULAR PAPERS
In the rapidly evolving landscape of medical diagnostics, efficient and accurate tools for disease identification are crucial. This study analyzes three convolutional neural network (CNN) architectures—IRV2, ResNet50, and DenseNet121—pre-trained on the ImageNet and RadImageNet datasets for respiratory disease diagnosis from chest radiographs. We used over 10,000 chest X-ray images, including COVID-19, pneumonia, and control cases, to train and evaluate these models. RadImageNet-trained models, particularly ResNet50, achieved superior performance with 94.49% accuracy, 93.92% sensitivity, and 95.59% precision compared to their ImageNet-trained counterparts, though the improvement was not statistically significant in most cases. To enhance interpretability, we developed a counterfactual-based method that generates visual explanations of the critical areas influencing diagnostic outcomes. This approach, which requires no access to training data or model internals, identifies image regions whose alteration would change the predicted diagnosis. It aids in understanding model reasoning and can correct misclassifications, reclassifying up to 40.91% of previously misclassified images through our masking method. By providing clear, independent visual explanations, our method aims to foster trust in AI-assisted diagnoses among medical professionals. While preliminary results are promising, further validation with medical experts will help confirm the clinical relevance of the highlighted regions and will strengthen the transparency and interpretability of AI decision-making in healthcare. The visual nature of these explanations offers a valuable tool for interpreting complex medical image classification models and may enhance the synergy between AI systems and human expertise in diagnostic processes.
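The counterfactual masking idea described in this abstract can be sketched with a small model-agnostic search: occlude candidate regions and keep those whose occlusion flips the prediction. The `predict` function below is a toy stand-in (a brightness threshold), not the authors' CNN; only the search logic illustrates the technique.

```python
# Minimal sketch of counterfactual region search for a black-box classifier.
# `predict` is a hypothetical surrogate model, NOT the paper's network.

def predict(image):
    """Toy black-box classifier: 'abnormal' if any pixel exceeds 0.8."""
    return "abnormal" if any(p > 0.8 for row in image for p in row) else "normal"

def mask_region(image, top, left, size, fill=0.0):
    """Return a copy of `image` with a size x size square set to `fill`."""
    out = [row[:] for row in image]
    for r in range(top, min(top + size, len(out))):
        for c in range(left, min(left + size, len(out[0]))):
            out[r][c] = fill
    return out

def counterfactual_regions(image, size=2):
    """Slide a mask over the image; keep regions whose occlusion flips the label."""
    base = predict(image)
    flips = []
    for top in range(0, len(image), size):
        for left in range(0, len(image[0]), size):
            if predict(mask_region(image, top, left, size)) != base:
                flips.append((top, left))
    return base, flips

# 4x4 toy "radiograph" with one bright lesion-like patch at row 0, column 2
img = [[0.1, 0.1, 0.9, 0.9],
       [0.1, 0.1, 0.9, 0.9],
       [0.1, 0.1, 0.1, 0.1],
       [0.1, 0.1, 0.1, 0.1]]
label, regions = counterfactual_regions(img)
```

Because only `predict` is queried, swapping in any trained model requires no other changes, which is the model-independence property the abstract emphasizes.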
Satellite imagery is widely used to study spatial geographies, identifying water, residential, farmland, and forest areas that can inform township development and planning, landscape detection, and similar applications. Semantic segmentation and image classification are the two crucial procedures for determining these spatial geographies. To improve the generalization ability of semantic segmentation algorithms, this paper uses a combined UNet_ResNet model. The engineered model is a convolutional neural network that uses GeoGANs to detect semantic patches of small size and regional character within a certain spatial and pixel scale. However, it still faces the semantic segmentation challenge of identifying roadways in metropolitan areas. The model achieves accuracy scores from 93% to 97.3% for image classification and segmentation, outperforming various existing architectures.
The article describes the theoretical foundations and software tools for scaling digital images through adaptive and combined application of bilinear and bicubic interpolation algorithms. An analysis of modern image scaling algorithms and tools is performed, and the theoretical foundations of interpolation-based scaling are described. The root mean square error between the pixel values of the original and scaled images is used as the scaling error. Scaling is performed by a complex of two interpolation algorithms: the first reduces the image scale, after which the second increases it. Such processing is performed, in particular, in telecommunication systems that transmit images at reduced scales. A correlation was found between the average spatial period of the image and the relative scaling error, defined as the ratio of the scaling errors of different interpolation algorithms. The spatial period of the image was calculated from its energy spectrum, and a regression analysis was performed to determine the dependence of the relative scaling error on the spatial period. In most cases bicubic interpolation provides a smaller scaling error, but for some images with a small spatial period, bilinear interpolation performs better. It is therefore proposed to increase scaling accuracy by adaptively selecting the interpolation algorithm according to the image's spatial period. A combined application of the algorithms was also evaluated, in which the scale is reduced using bilinear interpolation and increased using bicubic interpolation. A statistical analysis of the scaling results shows that this combined application in most cases yields a smaller error than the separate application of the bicubic and bilinear interpolation algorithms.
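The reduce-then-enlarge error measurement this abstract describes can be sketched in one dimension, where bilinear interpolation reduces to linear interpolation. This is a simplified illustration under assumptions, not the paper's implementation: bicubic resampling is omitted for brevity, and the test signals are arbitrary sinusoids standing in for images of long and short spatial period.

```python
import math

def resample_linear(signal, new_len):
    """1-D linear-interpolation resampling (the 1-D analogue of bilinear)."""
    if new_len == 1:
        return [signal[0]]
    scale = (len(signal) - 1) / (new_len - 1)
    out = []
    for i in range(new_len):
        x = i * scale
        j = min(int(x), len(signal) - 2)   # left neighbour index
        t = x - j                          # fractional position in [0, 1]
        out.append(signal[j] * (1 - t) + signal[j + 1] * t)
    return out

def scaling_rmse(signal, reduced_len):
    """Reduce then restore the signal; return RMSE against the original."""
    restored = resample_linear(resample_linear(signal, reduced_len), len(signal))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(signal, restored)) / len(signal))

# A slowly varying signal (long spatial period) survives the round trip well
smooth = [math.sin(i / 8) for i in range(32)]
# A rapidly varying signal (short spatial period) aliases badly
rough = [math.sin(i * 2.5) for i in range(32)]
```

As the abstract's correlation suggests, the short-period signal incurs a much larger round-trip RMSE than the long-period one, which is what motivates selecting the interpolation algorithm adaptively from the spatial period.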
Due to lifestyle changes and the daily behavioural routines of people across the globe, cardiovascular disease (CVD) is increasingly prevalent in the modern world, and accurate prediction of CVD is essential to the treatment process. Incorporating machine learning algorithms into CVD prediction offers advantages such as reduced diagnostic time and improved decision-making. Hence, this research implements a novel Lion-based Federated Learning for Disease Prediction (LbFLDP) technique to predict CVD. The approach comprises three local hospital models and one centralized global model. The local models are trained on a CVD dataset obtained from Kaggle and are then used to predict CVD. Their prediction features are propagated from the local models to the global model to enhance its predictive capability, after which the global model is used for CVD prediction. The performance of the suggested technique is evaluated in terms of accuracy, F-score, precision, recall, and error rate: the proposed approach achieves 98.41% recall, 99.6% accuracy, a 98.57% F-score, 98.57% precision, and a 0.4% error rate.
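The local-to-global update pattern this abstract describes can be illustrated with plain federated averaging: each hospital trains on its own data and only parameters are aggregated centrally. This is a generic sketch; the paper's Lion-based optimisation is not reproduced, and the weight vectors, gradients, and dataset sizes below are hypothetical.

```python
# Illustrative federated-averaging sketch: three local hospital models,
# one centralized global model. All numbers are toy values.

def local_update(weights, gradients, lr=0.1):
    """One local training step: plain gradient descent on a weight vector."""
    return [w - lr * g for w, g in zip(weights, gradients)]

def federated_average(local_models, sizes):
    """Aggregate local weight vectors, weighted by local dataset size."""
    total = sum(sizes)
    dim = len(local_models[0])
    return [sum(m[i] * n for m, n in zip(local_models, sizes)) / total
            for i in range(dim)]

# Three hospitals start from the same global weights and diverge locally
global_w = [0.5, -0.2]
hospitals = [
    local_update(global_w, [0.3, -0.1]),
    local_update(global_w, [0.1, 0.2]),
    local_update(global_w, [-0.2, 0.4]),
]
new_global = federated_average(hospitals, sizes=[300, 500, 200])
```

Only model parameters leave each hospital, never patient records, which is the privacy property that motivates the federated design.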
Image denoising remains a fundamental challenge in image processing, particularly when dealing with additive white Gaussian noise (AWGN), which degrades visual quality and information content. This paper introduces a novel multi-stage denoising framework that combines the contourlet transform, radial basis function neural networks (RBFNN), and Kalman filtering to remove noise while preserving important image features. The contourlet transform first decomposes images into multi-resolution, directional subbands, providing a sparse representation that captures geometric structures better than traditional wavelet approaches. We then employ an RBFNN, trained through back-propagation, to adaptively threshold the contourlet coefficients based on local image characteristics and noise levels. Finally, Kalman filtering is applied as a post-processing step to suppress residual noise artifacts. Comprehensive experiments on standard benchmark datasets demonstrate that our approach outperforms several state-of-the-art methods, including BM3D and recent deep learning-based techniques, particularly at moderate to high noise levels (σ ≥ 15). Quantitative evaluations show PSNR improvements of up to 2.4 dB and SSIM improvements of 0.12 over recent competing approaches, while qualitative results confirm better preservation of edges and textural details. The proposed framework offers an effective balance between computational efficiency and denoising performance, making it suitable for various practical applications.
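The Kalman-filtering post-processing stage can be sketched in its simplest scalar form: a constant-signal state model smoothing residual noise in a 1-D trace. This is a minimal illustration of the filter family the abstract names, not the paper's implementation (the contourlet and RBFNN stages are not reproduced, and the noise variances `q` and `r` are assumed values).

```python
import random

def kalman_smooth(measurements, q=1e-3, r=0.5):
    """Scalar Kalman filter with a constant-signal model x_k = x_{k-1} + w.
    q is the process-noise variance, r the measurement-noise variance."""
    x, p = measurements[0], 1.0
    out = []
    for z in measurements:
        p += q                      # predict: uncertainty grows
        k = p / (p + r)             # Kalman gain
        x += k * (z - x)            # update toward measurement z
        p *= (1 - k)                # uncertainty shrinks after update
        out.append(x)
    return out

random.seed(0)
clean = [1.0] * 200
noisy = [c + random.gauss(0, 0.5) for c in clean]
smoothed = kalman_smooth(noisy)

mse = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
```

With a small `q` relative to `r`, the filter trusts its prediction over each noisy measurement and strongly suppresses the residual noise, at the cost of slower tracking of genuine signal changes.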
This research outlines a comprehensive dual-modality speech recognition system designed to support hearing-impaired students in understanding spoken Kannada through synchronized processing of auditory signals and visual articulatory cues. The approach leverages deep learning to extract speech-related features from spectrograms and Mel-Frequency Cepstral Coefficients (MFCCs) for audio, and discriminative lip-movement features via CNNs and Temporal Convolutional Networks (TCNs) for visual input. A hybrid architecture, KanAVNet (Kannada Audio-Visual Network), based on a CNN–BiLSTM framework, is integrated with a Connectionist Temporal Classification (CTC) loss function to enable robust sequence-to-sequence mapping while addressing temporal alignment challenges in audio-visual speech recognition. The system is trained on a custom-developed Kannada audiovisual dataset, addressing the scarcity of regional-language AVSR resources. Empirical results show that the model achieves 93.2% accuracy, a Word Error Rate (WER) of 9.8%, and an F1 score of 91.2%, outperforming baseline unimodal and existing multimodal models. This research highlights the effectiveness of multimodal fusion strategies in noisy environments and showcases the potential of AI-driven tools in promoting accessible, inclusive education for students with auditory impairments.
Speech enhancement plays a vital role in improving the perceptual quality and intelligibility of speech signals degraded by environmental noise, particularly in modern network-based and signal processing systems. Traditional U-Net architectures capture local spectral details effectively but struggle to model long-range dependencies and may propagate residual noise through skip connections. Transformer-based models provide strong global context modeling but often fail to retain fine-grained spectral cues. To overcome these limitations, this paper presents a Nested U-Net–based network-oriented speech enhancement framework that incorporates Multi-Scale Feature Extraction, Feature Calibration, and a Dual-Path Higher-Order Information Interaction with Time-Frequency Attention module. The Multi-Scale Feature Extraction blocks in both encoder and decoder extract multi-resolution spectral patterns, while the nested topology strengthens hierarchical feature reuse. At the bottleneck, a stack of four Dual-Path Higher-Order Information Interaction with Time-Frequency Attention modules captures long-range temporal and spectral dependencies, and feature calibration adaptively filters encoder features to reduce noise transfer. Extensive experiments on the Common Voice and LibriSpeech datasets demonstrate that the proposed model achieves superior perceptual evaluation of speech quality, short-time objective intelligibility, and signal-to-distortion ratio scores, particularly under moderate (0 dB) signal-to-noise ratio conditions. The results confirm that the framework provides robust enhancement performance and consistently outperforms several recent state-of-the-art methods in terms of speech quality, intelligibility, and noise suppression.
Depression remains one of the most prevalent and underdiagnosed mental health disorders globally, necessitating scalable, objective, and non-invasive diagnostic tools. Speech, as a rich biomarker of emotional and psychological states, offers a promising avenue for automated depression detection. This study proposes a robust hybrid deep learning framework that integrates Convolutional Neural Networks (CNN), Gated Recurrent Units (GRU), Bidirectional Long Short-Term Memory (BiLSTM), and Transformer architectures to classify depression severity into three levels: normal, mild, and severe. Using a curated multimodal dataset comprising 400 labeled audio recordings, we extract comprehensive acoustic features, including MFCC, Chroma, Spectrogram, Contrast, and Tonnetz representations. Models are evaluated using precision, recall, F1-score, and accuracy. Experimental results show that the proposed hybrid models outperform traditional architectures, achieving up to 99% accuracy and strong generalization across all classes. This study demonstrates the potential of attention-enhanced hybrid architectures in mental health assessment and provides a foundation for future deployment in clinical and real-world settings. Future work includes multimodal fusion with EEG data and the implementation of explainable AI for clinical interpretability.
Addressing crowd control and safety at large-scale events is the central focus of this study. The proposed methodology is tested on the ShanghaiTechA, ShanghaiTechB, and UCF CC 50 datasets. Apart from VGG-16, referred to as the baseline model, the study utilizes VGG-based Convolutional Neural Network (CNN) models with dilated layers and Atrous Spatial Pyramid Pooling (ASPP) layers on these datasets to identify every individual in the crowd by their heads. Furthermore, optical flow analysis identifies fast-moving pixels, facilitating the detection of rapid movements within the crowd, and YOLO tracking is employed to monitor the direction of object movement. By integrating these methodologies, the study aims to enhance the overall safety and security of individuals in the crowd. VGG with dilated layers yields the lowest Mean Absolute Error for the ShanghaiTechA and ShanghaiTechB datasets, while the ASPP approach demonstrates approximately 15% higher accuracy on average than the baseline model for the ShanghaiTechA and UCF CC 50 datasets.
Traditional first-aid preparation methods often fail to reproduce realistic stress levels or to simulate the visual difficulty of identifying lesions in critical situations; in emergencies, delays in recognising injuries or errors in protocols result in critical human losses. The use of computer graphics and virtual reality technologies enables the creation of a safe yet highly realistic environment in which rescuers can test and improve their skills. The article presents an integrated methodological framework for assessing the effectiveness of VR first-aid training under conditions of damage to civilian infrastructure. The main focus is on developing mathematical models and algorithms to identify and evaluate the quality of rescuers' actions by analysing digital interaction signals in a virtual environment. A composite efficiency indicator is proposed that combines normalised parameters for reaction time, manipulation accuracy, stress level, and immersion. The work aims to formalise a mathematical model for assessing the effectiveness of VR training in developing skills for lesion identification and first-aid provision, using quantitative metrics; to identify statistically significant differences in learning speed and skill retention between groups using VR simulations and traditional methods; and to validate innovative content-creation methods, including mobile photogrammetry, for visualising damaged infrastructure and victim models. The study used a comprehensive approach that includes mobile photogrammetry and generative neural networks to create a library of 3D assets with varying degrees of detail. The performance score is based on a composite indicator that integrates normalised data on reaction time, manipulation accuracy, error count, and immersion rate. Linear mixed models, exponential approximations, and bootstrap estimation of effect stability were used to analyse hierarchical data and individual learning trajectories.
The experimental part includes the use of mobile photogrammetry and generative neural networks to create realistic 3D models of affected environments and to identify types of injuries (bleeding, burns, unconsciousness). Mixed-effects models and exponential forgetting curves are used to analyse the dynamics of learning and skill retention. The results confirm that VR technologies provide a statistically significant acceleration in the development of automated skills for lesion identification and assistance compared to traditional methods, and the proposed approach is a scalable tool for preparing civil and rescue services to act in critical situations. Experimental data showed that the integral performance score in the VR group increased from 0.42 ± 0.10 to 0.76 ± 0.08, while in the control group it increased only from 0.40 ± 0.09 to 0.55 ± 0.10 (p < 0.001). The largest effect was observed in the bleeding-arrest scenario, where the effect size (Cohen's d) reached 2.3. The analysis of forgetting curves confirmed the superiority of VR: the skill-loss rate in the VR group was 0.25, providing knowledge retention 1.8 times longer than in the control group (0.45). The study confirmed that VR simulations significantly accelerate the formation of automated behaviour patterns and reduce reaction time in extreme conditions. The proposed mathematical assessment model provides objective feedback and standardisation of the rescue training process. The results indicate the high practical value of introducing such tools into training programs for civilian and military structures to minimise losses in real emergencies.
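The composite efficiency indicator and the forgetting-curve comparison this abstract describes can be sketched as a weighted sum of normalised metrics plus an exponential retention model. The weights, metric ranges, and retention threshold below are illustrative assumptions, not the paper's calibration; only the two decay rates (0.25 and 0.45) come from the abstract.

```python
import math

def normalise(value, lo, hi, lower_is_better=False):
    """Min-max normalisation to [0, 1]; invert metrics where lower is better."""
    score = (value - lo) / (hi - lo)
    return 1 - score if lower_is_better else score

def composite_score(reaction_s, accuracy, errors, immersion,
                    weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of normalised performance components (equal weights assumed)."""
    parts = [
        normalise(reaction_s, 0, 60, lower_is_better=True),  # assumed 0-60 s range
        accuracy,                                            # already in [0, 1]
        normalise(errors, 0, 10, lower_is_better=True),      # assumed 0-10 errors
        immersion,                                           # already in [0, 1]
    ]
    return sum(w * p for w, p in zip(weights, parts))

def retention_time(decay_rate, threshold=0.5):
    """Time until exp(-decay_rate * t) skill retention falls to `threshold`."""
    return math.log(1 / threshold) / decay_rate

score = composite_score(reaction_s=12, accuracy=0.9, errors=2, immersion=0.8)
ratio = retention_time(0.25) / retention_time(0.45)  # VR vs. control decay rates
```

With exponential forgetting, retention time scales inversely with the decay rate, so decay rates of 0.25 and 0.45 yield exactly the 1.8-times-longer retention the abstract reports, independent of the threshold chosen.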
The water distribution sector in Indonesia still faces challenges in detecting leaks early due to manual data checks that are time-consuming and labor-intensive. PDAM (Regional Water Company) Tirta Wijaya Cilacap, Indonesia, faces similar problems. This study aims to implement a spatial customer prediction model to detect customer water usage and support data-driven operational decision-making. K-Means clustering groups customers by consumption patterns and geographic location, achieving a Silhouette Score of 0.4473 and a Davies–Bouldin Index of 0.7658, which indicates reasonably well-separated clusters in real-world data. In addition, water consumption forecasting was carried out with Seasonal–Trend Decomposition using Loess–Long Short-Term Memory (STL–LSTM) to predict trends and seasonality of water usage for each Customer Connection ID (CCID). The forecasting performance varies across CCIDs; the best case achieves an R2 of up to 0.95, while low-performing cases are discussed to clarify conditions where STL–LSTM is less reliable. The forecasting and clustering outputs are presented through a spatial visualization (map) of water-consumption categories and model results to support identifying areas that may require closer inspection for potential leakage and waste. This research contributes to strengthening technology-based public infrastructure, in line with SDG 9: Industry, Innovation, and Infrastructure, to promote sustainable water management.
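The clustering step this abstract describes can be illustrated with a plain k-means sketch over toy (consumption, location) records. The feature pair, values, and choice of k below are assumptions for illustration, not PDAM data, and the seeding is deliberately simple (first k points).

```python
# Minimal k-means on 2-D points: (monthly consumption m3, distance km).
# Toy records standing in for customer consumption/location features.

def kmeans(points, k, iters=20):
    """Plain k-means; centroids seeded from the first k points."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest centroid by squared Euclidean distance
        for i, p in enumerate(points):
            assign[i] = min(range(k),
                            key=lambda j: (p[0] - centroids[j][0]) ** 2
                                        + (p[1] - centroids[j][1]) ** 2)
        # update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids

pts = [(10, 1), (12, 1.5), (11, 1.2), (40, 8), (42, 7.5), (39, 8.2)]
labels, cents = kmeans(pts, k=2)
```

The two visibly separated groups of records end up in distinct clusters, mirroring how well-separated consumption/location patterns produce the moderate Silhouette and Davies–Bouldin scores reported.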
Classifying and predicting banana shelf life is vital for optimizing storage and distribution in agriculture. Traditional methods, which rely on subjective visual inspection, are inconsistent and time-intensive. This study presents a new, non-destructive approach combining thermal imaging and machine learning to classify naturally ripened and artificially ripened bananas and to forecast their shelf life. Preprocessed thermal images are segmented into fixed-size patches, flattened, and then linearly projected into feature tokens. Position embeddings are incorporated to retain spatial information, and the sequence is processed by a Vision Transformer (ViT) encoder, which leverages self-attention mechanisms to model relationships between patches. The [CLS] token output is then passed through fully connected layers for final classification, achieving 97.59% accuracy. Validation using t-SNE visualization demonstrated clear class separability, and receiver operating characteristic (ROC) curves confirmed robust performance. With an MSE of 0.10, an MAE of 0.18, and an R2 score of 0.85, the random forest algorithm performed exceptionally well at predicting the shelf life of artificially ripened bananas. This approach offers significant advantages, including improved accuracy, reduced subjectivity, and efficient data processing. By integrating thermal imaging with advanced models, the proposed method enhances agricultural supply-chain management and promotes precision in ripening classification and shelf-life prediction.
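The patch-tokenisation step this abstract describes (patches, flattening, linear projection, position embeddings, a prepended [CLS] token) can be sketched without any deep-learning library. The projection weights, embeddings, and 4x4 image below are toy values chosen so the arithmetic is easy to follow, not parameters from the paper's model.

```python
# Sketch of ViT-style tokenisation: image -> flattened patches -> projected
# tokens, with a [CLS] token prepended and position embeddings added.

def extract_patches(image, patch=2):
    """Split an H x W image (list of rows) into flattened patch vectors."""
    h, w = len(image), len(image[0])
    patches = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            patches.append([image[r + dr][c + dc]
                            for dr in range(patch) for dc in range(patch)])
    return patches

def project(vec, weight):
    """Linear projection: `weight` is a (d_model x len(vec)) matrix."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]

def tokenise(image, weight, cls_token, pos_embed):
    """Project every patch, prepend [CLS], add position embeddings."""
    tokens = [cls_token] + [project(p, weight) for p in extract_patches(image)]
    return [[t + e for t, e in zip(tok, emb)]
            for tok, emb in zip(tokens, pos_embed)]

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
W = [[0.25, 0.25, 0.25, 0.25]]   # d_model = 1: each token is the patch mean
cls = [0.0]
pos = [[0.0]] * 5                # zero position embeddings, for clarity
tokens = tokenise(img, W, cls, pos)
```

The resulting token sequence (one [CLS] token plus four patch tokens) is what a self-attention encoder would then consume; in a real ViT, `W`, the [CLS] token, and the position embeddings are all learned.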