Work place: School of Computer Science Engineering and Information Systems (SCORE), Vellore Institute of Technology (VIT), Vellore, 632 014, India
E-mail: sarathi.g2021@vitstudent.ac.in
Research Interests: Artificial Intelligence, Cloud Computing
Biography
Sarathi G. holds a Bachelor's degree in Computer Applications (BCA) from Vellore Institute of Technology (VIT), Vellore, India. He is currently pursuing a Master of Computer Applications (MCA) at VIT Vellore; his research interests include gesture recognition systems, artificial intelligence in visual technologies, and cloud computing. He completed an internship at CodeClause, where he worked on Python programming and software development projects.
By Vijayan R., Mareeswari V., Sarathi G., Sathya Nikethan R. V.
DOI: https://doi.org/10.5815/ijitcs.2025.05.02, Pub. Date: 8 Oct. 2025
The rise of virtual instruments has revolutionized music production, opening new avenues for creating music without physical instruments. However, these systems typically rely on costly hardware, such as MIDI controllers, which limits accessibility. As an alternative, 3D gesture-based virtual instruments have been explored to emulate the immersive experience of MIDI controllers, yet they introduce accessibility challenges of their own by requiring specialized hardware such as depth-sensing cameras and motion sensors. In contrast, 2D gesture systems using ordinary RGB cameras are more affordable but often lack extended functionality. To address these challenges, this study presents a 2D virtual piano system based on hand gesture recognition. The system enables accurate gesture-based control, real-time volume adjustment, control over multiple octaves and instruments, and automatic sheet music generation. OpenCV, an open-source computer vision library, and Google's MediaPipe are employed for real-time hand tracking. The extracted hand landmark coordinates are normalized relative to the wrist and scaled for consistent performance across various RGB camera setups. A bidirectional long short-term memory (Bi-LSTM) network classifies the gesture sequences. Experimental results show 95% accuracy on a public Kaggle dynamic gesture dataset and 97% on a custom-designed dataset of virtual piano gestures. Future work will focus on integrating the system with Digital Audio Workstations (DAWs), adding advanced musical features, and improving scalability for multi-player use.
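The wrist-relative normalization step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes 21 MediaPipe-style 2D hand landmarks with the wrist at index 0, and uses a maximum-magnitude scale factor (the paper's exact scaling rule is not specified here).

```python
import numpy as np

def normalize_landmarks(landmarks):
    """Make hand landmarks camera-setup independent.

    Assumes `landmarks` is a sequence of 21 (x, y) points in the
    MediaPipe Hands convention, with the wrist at index 0.
    """
    pts = np.asarray(landmarks, dtype=np.float32)  # shape (21, 2)
    pts = pts - pts[0]            # translate so the wrist is the origin
    scale = np.abs(pts).max()     # largest coordinate magnitude
    if scale > 0:
        pts = pts / scale         # scale into roughly [-1, 1]
    return pts
```

Sequences of such normalized frames would then form the input to a sequence classifier such as the Bi-LSTM evaluated in the paper.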