Spatial-Temporal Shape and Motion Features for Dynamic Hand Gesture Recognition in Depth Video




Vo Hoai Viet 1,*, Nguyen Thanh Thien Phuc 1, Pham Minh Hoang 1, Liu Kim Nghia 1

1. University of Science, Ho Chi Minh City, 700000, Viet Nam

* Corresponding author.


Received: 2 Mar. 2018 / Revised: 27 Apr. 2018 / Accepted: 22 Jun. 2018 / Published: 8 Sep. 2018

Index Terms

HCI, dynamic hand gesture, depth sequences, HOG2, HOF2


Abstract

Human-Computer Interaction (HCI) is one of the most interesting and challenging research topics in the computer vision community. Among HCI modalities, hand gestures are a natural way of interacting with computers and have attracted many researchers, since they let users operate a machine through hand movements easily and conveniently. With the advent of depth sensors, many new techniques have been developed and have achieved notable results. In this work, we propose a set of features extracted from depth maps for dynamic hand gesture recognition. We extract HOG2 to represent the shape and appearance of the hand, and, to capture hand movement, we propose a new feature named HOF2, computed from optical flow. These spatial-temporal descriptors are easy to understand and implement, yet perform very well in multi-class classification. Their low computational cost also makes them suitable for real-time recognition systems. Furthermore, we apply Robust PCA to reduce the feature dimensionality and build robust, compact gesture descriptors. The results are evaluated under a cross-validation scheme with an SVM classifier, achieving 95.51% and 55.95% accuracy on the challenging MSR Hand Gestures Dataset and the VIVA Challenge Dataset, respectively.
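The two-level descriptors described above can be sketched as follows. This is a minimal numpy-only illustration of the HOG2/HOF2 idea, not the authors' implementation: the simplified global orientation histogram stands in for the cell-based HOG of Dalal and Triggs [26], and a crude normal-flow estimate stands in for the Farnebäck optical flow [27] used in the paper. All function names and parameters here are illustrative assumptions.

```python
import numpy as np

def hog(img, bins=8):
    # Simplified *global* histogram of oriented gradients (the paper uses
    # the cell-based HOG of Dalal and Triggs [26]).
    gy, gx = np.gradient(np.asarray(img, dtype=float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % np.pi           # unsigned orientation in [0, pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0.0, np.pi), weights=mag)
    return hist / (np.linalg.norm(hist) + 1e-9)

def hof(prev, curr, bins=8):
    # Crude per-pixel normal-flow estimate, a stand-in for Farneback flow [27].
    prev, curr = np.asarray(prev, dtype=float), np.asarray(curr, dtype=float)
    gy, gx = np.gradient(curr)
    it = curr - prev                           # temporal derivative
    denom = gx ** 2 + gy ** 2 + 1e-6
    vx, vy = -it * gx / denom, -it * gy / denom
    mag = np.hypot(vx, vy)
    ang = np.arctan2(vy, vx) % (2 * np.pi)     # signed flow direction
    hist, _ = np.histogram(ang, bins=bins, range=(0.0, 2 * np.pi), weights=mag)
    return hist / (np.linalg.norm(hist) + 1e-9)

def hog2(frames, bins=8):
    # Stack per-frame HOGs into a (time x bins) array, then take the HOG of
    # that array: the second pass encodes how hand shape evolves over time.
    per_frame = np.stack([hog(f, bins) for f in frames])
    return np.concatenate([per_frame.mean(axis=0), hog(per_frame, bins)])

def hof2(frames, bins=8):
    # The same two-level trick applied to flow histograms of consecutive pairs.
    per_pair = np.stack([hof(a, b, bins) for a, b in zip(frames, frames[1:])])
    return np.concatenate([per_pair.mean(axis=0), hog(per_pair, bins)])
```

A gesture descriptor would then be `np.concatenate([hog2(frames), hof2(frames)])`, optionally projected to a lower dimension with Robust PCA before being fed to the multi-class SVM.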

Cite This Paper

Vo Hoai Viet, Nguyen Thanh Thien Phuc, Pham Minh Hoang, Liu Kim Nghia, "Spatial-Temporal Shape and Motion Features for Dynamic Hand Gesture Recognition in Depth Video", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol.10, No.9, pp. 17-26, 2018. DOI: 10.5815/ijigsp.2018.09.03


References

[1]Arjunlal and Minu Lalitha Madhavu, “A survey on hand gesture recognition and hand tracking”, International Journal of Scientific Engineering and Applied Science, 2016.

[2]Alexander Cardona López, “Hand Recognition Using Depth Cameras”, TECCIENCIA, 2015.

[3]Arpita Ray Sarkar, G. Sanyal, and S. Majumder, “Hand Gesture Recognition Systems: A survey”, International Journal of Computer Applications, vol 71, no 15, 2013.

[4]Chong Wang, Zhong Liu, and Shing-Chow Chan, “Superpixel-Based Hand Gesture Recognition with Kinect Depth Camera”, IEEE Transactions on Multimedia, vol 17, pp 29-39, 2015.

[5]Zhou Ren, Junsong Yuan, Jingjing Meng, and Zhengyou Zhang, “Robust Part-Based Hand Gesture Recognition Using Kinect Sensor”, IEEE Transactions on Multimedia, vol 15, no 5, pp 1110-1120, 2013.

[6]Minoo Hamissi, and Karim Faez, “Real-time Hand Gesture Recognition Based on the Depth Map for Human Robot Interaction”, International Journal of Electrical and Computer Engineering (IJECE), vol 3, no 6, 2013.

[7]F. Dominio, M. Donadeo, G. Marin, P. Zanuttigh, and G. M. Cortelazzo, “Hand gesture recognition with depth data”, 4th IEEE international workshop on Analysis and retrieval of tracked events and motion in imagery stream, pp 9-16, 2013.

[8]Hasan Mahmud, Md. Kamrul Hasan, and Abdullah-Al-Tariq, “Hand Gesture Recognition Using SIFT Features on Depth Image”, The 9th International Conference on Advances in Computer-Human Interactions (ACHI), pp 59-64, 2016.

[9]Cliff Chan, Seyed Sepehr Mirfakharaei, “Hand Gesture Recognition using Kinect”, Boston University, 2013.

[10]Yi Li, “Hand gesture recognition using Kinect”, Thesis, University of Louisville, 2012.

[11]De Gu, “Fingertip Tracking and Hand Gesture Recognition by 3D Vision”, International Journal of Computer Science and Information Technologies (IJCSIT), vol. 6 (6), pp 5421-5424, 2015.

[12]S. Poularakis, and I. Katsavounidis, “Fingertip Detection and Hand Posture Recognition based on Depth Information”, IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP), 2014.

[13]Zhou Ren, Jingjing Meng, and Junsong Yuan, “Depth Camera Based Hand Gesture Recognition and its Applications in Human-Computer-Interaction”, 8th International Conference on Information, Communications and Signal Processing (ICICS), 2011.

[14]Zahid Halim, and Ghulam Abbas, “A Kinect-Based Sign Language Hand Gesture Recognition System for Hearing and Speech Impaired: A Pilot Study of Pakistani Sign Language”, Assistive Technology, the Official Journal of RESNA, vol 27, no 1, pp 34-43, 2014.

[15]Hui Liang, Junsong Yuan, and Daniel Thalmann, “3D Fingertip and Palm Tracking in Depth Image Sequences”, 20th ACM International Conference on Multimedia, pp 785-788, 2012.

[16]A. Kurakin, Z. Zhang, and Z. Liu, “A Realtime System for Dynamic Hand Gesture Recognition with a Depth Sensor”, 20th European Signal Processing Conference (EUSIPCO), 2012.

[17]Hironori Takimoto, Jaemin Lee, and Akihiro Kanagawa, “A Robust Gesture Recognition Using Depth Data”, International Journal of Machine Learning and Computing, 2013.

[18]Quentin D. Smedt, Hazem Wannous, and J. P. Vandeborre, “Skeleton-based Dynamic hand gesture recognition”, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2016.

[19]Diego G. Santos, Bruno J. T. Fernandes, and Byron L. D. Bezerra, “HAGR-D: A Novel Approach for Gesture Recognition with Depth Maps”, Sensors, vol 15, no 11, 2015.

[20]Daniel James Ryan, “Finger and gesture recognition with Microsoft Kinect”, Thesis, University of Stavanger, 2012.

[21]Toyin Osunkoya, John-Chern Chern, “Gesture-Based Human-Computer-Interaction Using Kinect for Windows Mouse Control and PowerPoint Presentation”, Chicago State University, 2013.

[22]Tomoya Murata, Jungpil Shin, “Hand Gesture and Character Recognition Based on Kinect Sensor”, International Journal of Distributed Sensor Network, 2014.

[23]Eshed Ohn-Bar, and Mohan M. Trivedi, “Joint Angles Similarities and HOG2 for Action Recognition”, IEEE Conference on Computer Vision and Pattern Recognition Workshops: Human Activity Understanding from 3D Data, 2013.

[24]Eshed Ohn-Bar, and Mohan M. Trivedi, “Hand Gesture Recognition in Real-Time for Automotive Interfaces: A Multimodal Vision-based Approach and Evaluations”, IEEE Transactions on Intelligent Transportation Systems, vol 15, no 6, pp 2368-2377, 2014.

[25]Emmanuel J. Candès, Xiaodong Li, Yi Ma, and John Wright, “Robust Principal Component Analysis?”, Technical report, Stanford University, 2009.

[26]Navneet Dalal, and Bill Triggs, “Histograms of Oriented Gradients for Human Detection”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 886-893, 2005.

[27]Gunnar Farnebäck, “Two-Frame Motion Estimation Based on Polynomial Expansion”, Scandinavian Conference on Image Analysis (SCIA), pp 363-370, 2003.

[28]Koby Crammer, Yoram Singer, “On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines”, Journal of Machine Learning Research, vol 2, pp 265-292, 2001.

[29]Chih-Wei Hsu, and Chih-Jen Lin, “A Comparison of Methods for Multi-class Support Vector Machines”, IEEE Transactions on Neural Networks, vol 13, no 2, pp 415-425, 2002.

[30]CVRR-HANDS 3D Dataset:

[31]MSR 3D Dataset:

[32]Zhouchen Lin, Arvind Ganesh, John Wright, Leqin Wu, Minming Chen, and Yi Ma, “Fast convex optimization algorithms for exact recovery of a corrupted low-rank matrix”, IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), Aruba, Dutch Antilles, 2009.

[33]Pavlo Molchanov, Shalini Gupta, Kihwan Kim, and Jan Kautz, “Hand Gesture Recognition with 3D Convolutional Neural Networks”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2015.