Huynh Vo-Thuy

Work place: Can Tho General Hospital, Can Tho, Vietnam

E-mail: vthuynh2310@gmail.com


Biography

Huynh Vo-Thuy is an IT manager at Can Tho City General Hospital, Vietnam. She received a Master’s Degree in Information Systems at Can Tho University, Vietnam. She is passionate about applying information systems to address real-world challenges, particularly in data management and decision support systems. With over 10 years of experience at Can Tho City General Hospital, she has played a key role in implementing IT solutions to enhance medical examination, treatment management, and patient data storage.

Author Articles
A ViT-based Model for Detecting Kidney Stones in Coronal CT Images

By A. Cong Tran, Huynh Vo-Thuy

DOI: https://doi.org/10.5815/ijitcs.2025.05.01, Pub. Date: 8 Oct. 2025

Detecting kidney stones in coronal CT images remains challenging due to the small size of stones, anatomical complexity, and noise from surrounding objects. To address these challenges, we propose a deep learning architecture that augments a Vision Transformer (ViT) with a pre-processing module. This module integrates CSPDarknet for efficient feature extraction, a Feature Pyramid Network (FPN) and a Path Aggregation Network (PANet) for multi-scale context aggregation, and convolutional layers for spatial refinement. Together, these trained components filter irrelevant background regions and highlight kidney-specific features before classification by the ViT, thereby improving accuracy and efficiency. This design leverages ViT’s global context modeling while mitigating its sensitivity to irrelevant regions and limited data. The proposed model was evaluated on two coronal CT datasets (one public and one private) comprising 6,532 images under six experimental scenarios with varying training and testing conditions. It achieved 99.3% accuracy, 98.7% F1-score, and 99.4% mAP@0.5, higher than both YOLOv10 and the baseline ViT. The model contains 61.2 million parameters and has a computational cost of 37.3 GFLOPs, striking a balance between ViT (86.0M, 17.6 GFLOPs) and YOLOv10 (22.4M, 92.0 GFLOPs). Despite having more parameters, the model achieved a lower inference time than YOLOv10: approximately 0.06 seconds per image on an NVIDIA RTX 3060 GPU. These findings suggest the potential of our approach as a foundation for clinical decision-support tools, pending further validation on heterogeneous clinical datasets, including challenging cases such as small (<2 mm) or low-contrast stones.
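The pipeline the abstract describes (backbone features → FPN top-down merge → PANet bottom-up merge → convolutional refinement → ViT classification) can be illustrated with a toy NumPy sketch. All dimensions, the random weights, and the single-head attention below are illustrative assumptions for showing the data flow only; they are not the authors' implementation or its hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_proj(x, out_ch):
    # Stand-in for a learned 1x1 convolution: per-pixel linear projection.
    w = rng.standard_normal((x.shape[-1], out_ch)) * 0.01
    return x @ w

def upsample2x(x):
    return x.repeat(2, axis=0).repeat(2, axis=1)

def downsample2x(x):
    return x[::2, ::2]

# 1) Backbone (CSPDarknet stand-in): multi-scale feature maps from the image.
img = rng.standard_normal((64, 64, 3))
c3 = conv_proj(downsample2x(downsample2x(downsample2x(img))), 64)   # 8x8x64
c4 = conv_proj(downsample2x(c3), 128)                               # 4x4x128
c5 = conv_proj(downsample2x(c4), 256)                               # 2x2x256

# 2) FPN top-down path: project to a common width, merge coarse context down.
p5 = conv_proj(c5, 64)
p4 = conv_proj(c4, 64) + upsample2x(p5)
p3 = conv_proj(c3, 64) + upsample2x(p4)

# 3) PANet bottom-up path: push refined fine-scale detail back up.
n3 = p3
n4 = p4 + downsample2x(n3)
n5 = p5 + downsample2x(n4)   # coarser levels would feed other heads; unused here

# 4) Convolutional refinement, then ViT-style classification of the fused map.
fused = conv_proj(n3, 48)                                   # 8x8x48
# Split into 2x2 patches -> 16 tokens of 2*2*48 = 192 features each.
patches = fused.reshape(4, 2, 4, 2, 48).transpose(0, 2, 1, 3, 4).reshape(16, 192)
tokens = patches @ (rng.standard_normal((192, 32)) * 0.01)  # patch embedding

# One self-attention step (a real ViT stacks many multi-head blocks).
scores = tokens @ tokens.T
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)
tokens = attn @ tokens

# Mean-pool tokens and classify (e.g., stone vs. no stone).
logits = tokens.mean(axis=0) @ (rng.standard_normal((32, 2)) * 0.01)
print(logits.shape)  # (2,)
```

The sketch shows why the pre-processing module helps: the FPN/PANet fusion lets the tokens handed to the ViT already carry multi-scale, background-suppressed features rather than raw pixels.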
