Action Recognition Based on the Modified Two-stream CNN

Full Text (PDF, 196KB), pp. 15-23



Dan Zheng 1, Hang Li 1,*, Shoulin Yin 1,*

1. Software College, Shenyang Normal University, Shenyang 110034, China

* Corresponding author.


Received: 20 Oct. 2020 / Revised: 27 Oct. 2020 / Accepted: 3 Nov. 2020 / Published: 8 Dec. 2020

Index Terms

Action recognition, dual-channel, convolutional neural network.


Human action recognition is an important research direction in computer vision. Its main task is to emulate the human brain's ability to analyze and recognize human actions in video, covering individual actions, interactions between people, and interactions with the external environment. A space-time two-channel neural network represents video features from both spatial and temporal perspectives and, compared with other neural network models, has clear advantages for human action recognition. In this paper, an action recognition method based on an improved space-time two-channel convolutional neural network is proposed. First, the video is divided into several equal-length, non-overlapping segments, and from each segment a frame image representing the static appearance of the video and a stacked optical-flow image representing its motion are sampled at random positions. These two kinds of images are then fed into the spatial-domain and temporal-domain convolutional neural networks, respectively, for feature extraction; the segment features of each video are fused within each channel to obtain category prediction features for the spatial and temporal domains. Finally, the video action recognition result is obtained by integrating the predictive features of the two channels. Through experiments, various data augmentation methods and transfer learning schemes are explored to alleviate the over-fitting caused by insufficient training samples, and the effects of the number of segments, the pre-trained network, the segment feature fusion scheme, and the two-channel integration strategy on recognition performance are analyzed. The experimental results show that the proposed model learns human action features in complex videos better and recognizes actions more accurately.
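The segmented sampling and two-channel fusion pipeline described in the abstract can be sketched as follows. This is a minimal illustration only: the CNN feature extractors are elided (the arrays stand in for per-segment class scores each channel would produce), and the fusion weights `w_spatial`/`w_temporal` and averaging consensus are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def sample_segments(num_frames, k):
    """Split a video of num_frames into k equal-length, non-overlapping
    segments and draw one random frame index from each segment."""
    bounds = np.linspace(0, num_frames, k + 1, dtype=int)
    return [np.random.randint(lo, hi) for lo, hi in zip(bounds[:-1], bounds[1:])]

def fuse_segments(segment_scores):
    """Segmental consensus: average the per-segment class scores
    produced by one channel (spatial or temporal)."""
    return np.mean(segment_scores, axis=0)

def two_stream_predict(spatial_scores, temporal_scores,
                       w_spatial=0.4, w_temporal=0.6):
    """Weighted integration of the two channels' fused predictions,
    returning the index of the predicted action class."""
    fused = (w_spatial * fuse_segments(spatial_scores)
             + w_temporal * fuse_segments(temporal_scores))
    return int(np.argmax(fused))
```

In practice the temporal channel is often weighted more heavily than the spatial one, since stacked optical flow carries most of the motion information; the weights here simply reflect that common choice.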

Cite This Paper

Dan Zheng, Hang Li, and Shoulin Yin. "Action Recognition Based on the Modified Two-stream CNN", International Journal of Mathematical Sciences and Computing (IJMSC), Vol.6, No.6, pp.15-23, 2020. DOI: 10.5815/IJMSC.2020.06.03

