A Speaker Recognition System Using Gaussian Mixture Model, EM Algorithm and K-Means Clustering

Full Text (PDF, 1170KB), PP.19-28



Ajinkya N. Jadhav 1,*, Nagaraj V. Dharwadkar 1

1. Department of Computer Science and Engineering, Rajarambapu Institute of Technology, Islampur 415414, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijmecs.2018.11.03

Received: 31 Jul. 2018 / Revised: 3 Sep. 2018 / Accepted: 7 Oct. 2018 / Published: 8 Nov. 2018

Index Terms

Speaker Identification, MFCC, GMM, End-pointing


The automated speaker endorsement technique is used to recognize a person from his or her voice data. Speaker identification is a form of biometric recognition and is used in applications such as government services, banking services, building security, and intelligence services. The accuracy of such a system depends on the pre-processing techniques used to select features from the voice, and on the speech modeling methods and classifiers used to identify the speaker. Here, the edges and continuous quality points are eliminated in the normalization process. The Mel-Scale Frequency Cepstral Coefficient (MFCC) is one of the methods used to extract features from a wave file of spoken sentences. The Gaussian Mixture Model (GMM) technique is used, and experiments are carried out on the MARF (Modular Audio Recognition Framework) to improve the outcome estimation. We present end-point elimination in the Gaussian selection medium for MFCC.
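The modeling step named in the title (a GMM trained with the EM algorithm, initialized by K-means clustering) can be illustrated with a minimal Python sketch. This is not the authors' MARF-based implementation: the diagonal covariances, the variance floor, the function names, and the synthetic 2-D features standing in for MFCC frames are all assumptions made for illustration.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    # Recompute labels so they match the final centroids.
    dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, dist.argmin(axis=1)

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def log_gauss_diag(X, means, variances):
    """Log density of every row of X under every diagonal Gaussian, shape (n, k)."""
    diff = X[:, None, :] - means[None, :, :]
    log_norm = np.log(2 * np.pi * variances).sum(axis=1)[None, :]
    return -0.5 * (log_norm + (diff ** 2 / variances[None, :, :]).sum(axis=2))

def fit_gmm(X, k, iters=50, var_floor=1e-3, seed=0):
    """Fit a diagonal-covariance GMM with EM, initialized from k-means."""
    n = X.shape[0]
    means, labels = kmeans(X, k, seed=seed)
    weights = np.maximum(np.array([(labels == j).mean() for j in range(k)]), 1e-6)
    weights /= weights.sum()
    variances = np.array([
        X[labels == j].var(axis=0) if np.any(labels == j) else X.var(axis=0)
        for j in range(k)
    ]) + var_floor
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each frame.
        log_p = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
        resp = np.exp(log_p - logsumexp(log_p, axis=1)[:, None])
        # M-step: re-estimate weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X ** 2) / nk[:, None] - means ** 2, var_floor)
    return weights, means, variances

def avg_log_likelihood(X, gmm):
    """Average per-frame log-likelihood of X under a fitted GMM."""
    weights, means, variances = gmm
    log_p = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
    return logsumexp(log_p, axis=1).mean()

# Synthetic 2-D "features" for two hypothetical speakers (stand-ins for MFCC frames).
rng = np.random.default_rng(1)
speaker_a = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(400, 2))
speaker_b = rng.normal(loc=[4.0, 4.0], scale=1.0, size=(400, 2))
models = {"a": fit_gmm(speaker_a, k=2), "b": fit_gmm(speaker_b, k=2)}

# Identification: pick the speaker whose model best explains the test frames.
test_frames = rng.normal(loc=[4.0, 4.0], scale=1.0, size=(100, 2))
best = max(models, key=lambda s: avg_log_likelihood(test_frames, models[s]))
print(best)
```

In a real system, each speaker's training frames would be MFCC vectors extracted after silence and end-point removal, and the number of mixture components would be chosen experimentally.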

Cite This Paper

Ajinkya N. Jadhav, Nagaraj V. Dharwadkar, "A Speaker Recognition System Using Gaussian Mixture Model, EM Algorithm and K-Means Clustering", International Journal of Modern Education and Computer Science (IJMECS), Vol. 10, No. 11, pp. 19-28, 2018. DOI: 10.5815/ijmecs.2018.11.03

