Mareeswari V.

Work place: School of Computer Science Engineering and Information Systems (SCORE), Vellore Institute of Technology (VIT), Vellore, 632 014, India

E-mail: vmareeswari@vit.ac.in

ORCID: https://orcid.org/0000-0002-2768-943X

Research Interests:

Biography

Dr. Mareeswari V. is working as an Associate Professor at the School of Computer Science Engineering and Information Systems (SCORE), Vellore Institute of Technology (VIT), Vellore, Tamil Nadu, India - 632014. She has over 21 years of academic and research experience. She received her Ph.D. in Information Technology and Engineering in the web services domain. She has published several national and international articles in reputed journals, conferences, and book chapters. Her areas of interest include medical image processing using machine and deep learning, data analytics on blockchain, prediction systems for web service ranking, the Internet of Things (IoT), routing in mobile ad-hoc networks, and e-commerce systems. She is a life member of the Computer Society of India (CSI) and the Soft Computing and Research Society (SCRS).

Author Articles
Developing Audio-to-Text Converters with Natural Language Processing for Smart Assistants

By Pratistha Tulsyan, Mareeswari V., Vijayan Ramaraj, Suji R.

DOI: https://doi.org/10.5815/ijmecs.2025.04.05, Pub. Date: 8 Aug. 2025

In recent years, smart assistants have transformed human interaction with technology, offering voice-controlled features such as music playback and information retrieval. However, existing systems often struggle to interpret natural language input accurately. To address this, the proposed work develops an audio-to-text converter integrated with natural language processing (NLP) capabilities to enhance interactions with smart assistants. Additionally, the system incorporates intent recognition to discern user intentions and generate relevant responses. The work commenced with a literature survey to gather insights into existing smart assistant systems. Based on the findings, a comprehensive architecture was designed, integrating NLP techniques such as tokenization and lemmatization. The implementation phase involved developing and training a Feedforward Neural Network (FNN) model tailored for NLP tasks, leveraging Python and libraries such as TensorFlow and NLTK. Testing evaluated the system's performance using standard metrics, including Word Error Rate (WER) and Character Error Rate (CER), across various audio input conditions. The system exhibited higher WER and CER on accented speech (15.3% and 7.9%, respectively), while the clean audio dataset produced a WER of 4.7% and a CER of 2.55%. The work also involved training the FNN model while monitoring training loss and accuracy; ultimately, the model achieved an accuracy of 97.62% with the training loss reduced to 1.45%. Insights from the training phase inform further optimization efforts to improve system performance. The system uses the Google Web Speech API and compares it with other speech-to-text models. In conclusion, the proposed work represents a significant step towards seamless voice-controlled interactions with smart assistants, enhancing user experience and productivity.
Future work includes refining the system architecture, optimizing model performance, and expanding the smart assistant's capabilities to broader application domains.
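The WER and CER figures quoted above both follow from a Levenshtein (edit-distance) computation, applied at the word level for WER and the character level for CER. The following is a minimal illustrative sketch of those two metrics, not the authors' evaluation code:

```python
def edit_distance(ref, hyp):
    # Levenshtein distance via dynamic programming:
    # d[i][j] = cost of transforming ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)]

def wer(reference, hypothesis):
    # Word Error Rate: edit distance over word sequences,
    # normalized by the reference length
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    # Character Error Rate: same computation at character level
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, `wer("the cat sat", "the cat sit")` gives 1/3, since one of the three reference words is substituted.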

Traffic Sign Detection and Recognition Using YOLO Models

By Mareeswari V., Vijayan R., Shajith Nisthar, Rahul Bala Krishnan

DOI: https://doi.org/10.5815/ijitcs.2025.03.02, Pub. Date: 8 Jun. 2025

With the proliferation of advanced driver assistance systems and continued advances in autonomous vehicle technology, there is a need for accurate, real-time methods of identifying and interpreting traffic signs. The importance of traffic sign detection cannot be overstated, as it plays a pivotal role in improving road safety and traffic management. This work proposes a real-time traffic sign detection and recognition approach using the YOLOv8 algorithm. Utilizing the integrated webcams of personal computers and laptops, we capture live traffic scenes and train our model on a meticulously curated dataset from Roboflow. Through extensive training, our YOLOv8 model achieves an accuracy of 94%, compared with 90.1% for YOLOv7 and 81.3% for YOLOv5, ensuring reliable detection and recognition across various environmental conditions. Additionally, the proposed work introduces an auditory alert feature that notifies the driver with a voice prompt upon detecting a traffic sign, enhancing driver awareness and safety. Through rigorous experimentation and evaluation, we validate the effectiveness of our approach, highlighting the importance of utilizing available hardware resources to deploy traffic sign detection systems with minimal infrastructure requirements. Our findings underscore the robustness of YOLOv8 in handling challenging traffic sign recognition tasks, paving the way for wider adoption of intelligent transportation technologies and safer, more efficient road networks. In this paper, we compare YOLOv5, YOLOv7, and YOLOv8 and find that YOLOv8 outperforms its predecessors in traffic sign detection with an overall mean average precision of 0.945.
Notably, it demonstrates strong precision and recall in essential sign classes such as "No overtaking" and "Stop," making it the preferred choice for accurate and dependable traffic sign detection tasks.
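The mean average precision figure above is conventionally computed at an intersection-over-union (IoU) threshold of 0.5: a predicted box counts as a true positive only if it overlaps a ground-truth box at IoU >= 0.5. The sketch below illustrates that matching criterion; it is an illustrative assumption about the standard metric, not the paper's evaluation code:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    xa, ya = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xb, yb = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, xb - xa) * max(0.0, yb - ya)  # overlap area, 0 if disjoint
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def is_true_positive(pred_box, gt_boxes, threshold=0.5):
    """A detection is a hit if it overlaps some ground-truth box at IoU >= threshold."""
    return any(iou(pred_box, gt) >= threshold for gt in gt_boxes)
```

Per-class precision and recall, and hence the mAP of 0.945 reported above, are built on exactly this per-box decision, accumulated over the test set at varying confidence thresholds.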

Other Articles