IJITCS Vol. 17, No. 5, 8 Oct. 2025
Keywords: 2D Virtual Piano, Hand Gesture Recognition, MediaPipe, OpenCV, Coordinate Normalization, Bi-LSTM
The rise of virtual instruments has revolutionized music production, providing new avenues for creating music without physical instruments. However, these systems typically rely on costly hardware, such as MIDI controllers, which limits accessibility. As an alternative, 3D gesture-based virtual instruments have been explored to emulate the immersive experience of MIDI controllers, yet they introduce accessibility challenges of their own by requiring specialized hardware such as depth-sensing cameras and motion sensors. In contrast, 2D gesture systems using standard RGB cameras are more affordable but often lack extended functionality. To address these challenges, this study presents a 2D virtual piano system based on hand gesture recognition. The system provides accurate gesture-based control, real-time volume adjustment, control over multiple octaves and instruments, and automatic sheet music generation. OpenCV, an open-source computer vision library, and Google’s MediaPipe are employed for real-time hand tracking. The extracted hand landmark coordinates are normalized relative to the wrist and scaled for consistent performance across different RGB camera setups. A bidirectional long short-term memory (Bi-LSTM) network classifies the resulting gesture sequences. Experimental results show 95% accuracy on a public Kaggle dynamic gesture dataset and 97% on a custom dataset designed for virtual piano gestures. Future work will focus on integrating the system with Digital Audio Workstations (DAWs), adding advanced musical features, and improving scalability for multi-player use.
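The wrist-based normalization described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name and the choice of scaling by the wrist-to-middle-MCP distance are assumptions; only the landmark indices (0 = wrist, 9 = middle-finger MCP, 21 points per hand) follow MediaPipe's documented hand model.

```python
# Sketch of wrist-based landmark normalization (illustrative assumption,
# not the paper's exact method). MediaPipe Hands emits 21 (x, y) landmarks
# per hand; index 0 is the wrist, index 9 the middle-finger MCP joint.
import math

WRIST = 0
MIDDLE_MCP = 9

def normalize_landmarks(landmarks):
    """Translate landmarks so the wrist becomes the origin, then scale by
    the wrist-to-middle-MCP distance so hands of different sizes and
    camera distances map into a comparable coordinate range."""
    wx, wy = landmarks[WRIST]
    shifted = [(x - wx, y - wy) for x, y in landmarks]
    mx, my = shifted[MIDDLE_MCP]
    scale = math.hypot(mx, my) or 1.0  # guard against degenerate input
    return [(x / scale, y / scale) for x, y in shifted]

# Example: 21 dummy landmarks standing in for one detected hand.
hand = [(0.5 + 0.01 * i, 0.6 - 0.02 * i) for i in range(21)]
norm = normalize_landmarks(hand)
print(norm[WRIST])  # wrist maps to the origin: (0.0, 0.0)
```

After this step the coordinates are camera-independent, which is what allows the same trained Bi-LSTM to work across different RGB camera setups.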
Vijayan R., Mareeswari V., Sarathi G., Sathya Nikethan R. V., "Hand Gesture-controlled 2D Virtual Piano with Volume Control", International Journal of Information Technology and Computer Science (IJITCS), Vol.17, No.5, pp.12-24, 2025. DOI:10.5815/ijitcs.2025.05.02