Amirreza Shirani; Ahmad Reza Naghsh Nilchi

Speech Emotion Recognition based on SVM as Both Feature Selector and Classifier

Author(s)

Amirreza Shirani ¹,*, Ahmad Reza Naghsh Nilchi ¹

1. Department of Computer Engineering, Faculty of Computer Engineering, University of Isfahan, Isfahan, Iran

* Corresponding author.

DOI: https://doi.org/10.5815/ijigsp.2016.04.05

Received: 17 Dec. 2015 / Revised: 21 Jan. 2016 / Accepted: 26 Feb. 2016 / Published: 8 Apr. 2016

Index Terms

Emotion recognition, speech analysis, feature selection, support vector machine

Abstract

This paper uses the Support Vector Machine (SVM) as both a feature selection and a classification technique for identifying human emotional states from audio signals. A major bottleneck of common speech emotion recognition techniques is their use of a very large number of features per utterance, which can significantly slow down learning and may lead to the problem known as "the curse of dimensionality". To ease this challenge, the paper aims to build a high-accuracy system with a minimal set of features. The proposed model uses two methods for feature dimensionality reduction: "SVM feature selection" and the common "Correlation-based Feature Subset Selection (CFS)". In addition, two different classifiers, a Support Vector Machine and a Neural Network, are separately adopted to identify the six emotional states of anger, disgust, fear, happiness, sadness and neutral. The method has been verified on Persian (Persian ESD) and German (EMO-DB) emotional speech databases, yielding high recognition rates on both. The results show that the SVM feature selection method provides better speech emotion recognition performance than CFS and the baseline feature set. Moreover, the new system achieves a recognition rate of 99.44% on the Persian ESD and 87.21% on the Berlin Emotion Database for speaker-dependent classification. A promising result of 76.12% is also obtained for the speaker-independent case, which is among the best accuracies reported on the mentioned database relative to the small number of features used.
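The idea of using an SVM as both feature selector and classifier can be sketched in a few lines. The abstract does not spell out the selection algorithm, so the sketch below assumes recursive feature elimination with a linear SVM (rank features by squared weight, drop the weakest, retrain). The toy data, the two-class setup, the helper names (`train_linear_svm`, `svm_rfe`, `predict`) and all hyperparameters are illustrative assumptions, not the paper's configuration, which covers six emotions and real acoustic features.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularised hinge loss (no bias term)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]  # gradient of the regulariser
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1.0:  # this sample violates the margin
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def svm_rfe(X, y, n_keep):
    """Assumed selection scheme: repeatedly drop the feature with the smallest w_j^2."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y)
        scores = [wj * wj for wj in w]
        worst = scores.index(min(scores))  # ties resolve to the first index
        del remaining[worst]
    return remaining


def predict(w, X):
    """Classify each row by the sign of the linear decision function."""
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) >= 0 else -1 for xi in X]


# Hypothetical toy data: two classes standing in for two emotions.
# Columns: strong cue, noise, weak cue, noise (noise is uncorrelated with y).
y = [1, 1, 1, 1, -1, -1, -1, -1]
noise_a = [0.5, -0.5] * 4
noise_b = [0.5, 0.5, -0.5, -0.5] * 2
X = [[2.0 * yi, na, 1.0 * yi, nb] for yi, na, nb in zip(y, noise_a, noise_b)]

# Step 1: the SVM acts as feature selector.
selected = svm_rfe(X, y, n_keep=2)

# Step 2: the SVM acts as classifier on the reduced feature set.
X_reduced = [[row[j] for j in selected] for row in X]
w_final = train_linear_svm(X_reduced, y)
predictions = predict(w_final, X_reduced)

print(selected)  # [0, 2]
```

On this toy set the two noise columns receive zero weight and are eliminated first, leaving the strong and weak cues. In practice the reduced feature set would be evaluated on held-out utterances rather than on the training data.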

Cite This Paper

Amirreza Shirani, Ahmad Reza Naghsh Nilchi, "Speech Emotion Recognition based on SVM as Both Feature Selector and Classifier", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol.8, No.4, pp.39-45, 2016. DOI: 10.5815/ijigsp.2016.04.05

International Journal of Image, Graphics and Signal Processing (IJIGSP)