Determination of Representativity for Bangla Vowel Perceptual Space

pp. 63-70


Md. Mahbub Hasan 1,*, Sathi Rani Mitra 1

1. Department of Electrical and Electronic Engineering, Khulna University of Engineering & Technology, Khulna-9203, Bangladesh

* Corresponding author.


Received: 6 Nov. 2017 / Revised: 1 Dec. 2017 / Accepted: 7 Dec. 2017 / Published: 8 Mar. 2018

Index Terms

Representativity, Average directional vector, Formant frequency, Variance-covariance matrices, Mahalanobis distance, Eigenvalue and eigenvector


In this article, the representativity between two multidimensional acoustical spaces of vowels is formulated as a geometric mean of three comparisons of the two spaces: the correlation of their average directional vectors, the correlation of their variance-covariance matrices, and the Mahalanobis distance between them. Multidimensional spaces formed by different combinations of the acoustical features of vowels are generally regarded as vowel perceptual spaces. Accordingly, ten Bangla vowel sounds (/অ/ [/a/], /আ/ [/ã/], /ই/ [/i/], /ঈ/ [/ĩ/], /উ/ [/u/], /ঊ/ [/ũ/], /এ/ [/e/], /ঐ/ [/ai/], /ও/ [/o/] and /ঔ/ [/au/]) are collected from each native Bengali speaker, and the acoustical features of these vowels are used to build that speaker's perceptual space. In this way, nine perceptual spaces are constructed from nine speakers and used to evaluate representativity. Using the proposed method, the representativities of differently constructed perceptual spaces are evaluated and compared numerically. Furthermore, the dominant and representative acoustical features are identified from the principal components of the perceptual spaces.
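The abstract describes the representativity measure only at a high level. The following Python sketch illustrates one way such a score could be assembled, and every component is an assumption rather than the authors' exact formulation: the average directional vector is taken as the eigenvalue-weighted sum of the absolute eigenvectors of each space's covariance matrix; the two variance-covariance matrices are compared by their Frobenius cosine; the Mahalanobis distance D between the two centroids (using a pooled covariance) is mapped to a similarity 1/(1 + D); and the three terms are combined by a geometric mean. The function names (`representativity`, `dominant_feature`, and so on) are hypothetical, and NumPy is assumed to be available.

```python
from typing import Sequence

import numpy as np

# Hypothetical sketch: each perceptual space is an (n_tokens x n_features)
# array, e.g. columns F1, F2, F3 in Hz. The three component measures and
# their combination are illustrative assumptions, not the paper's formulas.


def _cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity; inputs here are non-negative, so clipping only removes round-off."""
    value = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.clip(value, 0.0, 1.0))


def average_direction(X: np.ndarray) -> np.ndarray:
    """Assumed form: eigenvalue-weighted sum of absolute covariance eigenvectors."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative round-off
    # np.abs removes the arbitrary sign of each eigenvector (one per column).
    return np.abs(eigvecs) @ eigvals


def direction_similarity(A: np.ndarray, B: np.ndarray) -> float:
    return _cosine(average_direction(A), average_direction(B))


def covariance_similarity(A: np.ndarray, B: np.ndarray) -> float:
    """Frobenius cosine of the two variance-covariance matrices (assumed measure)."""
    SA = np.cov(A, rowvar=False)
    SB = np.cov(B, rowvar=False)
    # trace(SA @ SB) >= 0 for positive semidefinite matrices, so this lies in [0, 1].
    return _cosine(SA.ravel(), SB.ravel())


def location_similarity(A: np.ndarray, B: np.ndarray) -> float:
    """Maps the Mahalanobis distance D between centroids to 1 / (1 + D) (assumed mapping)."""
    nA, nB = len(A), len(B)
    pooled = ((nA - 1) * np.cov(A, rowvar=False)
              + (nB - 1) * np.cov(B, rowvar=False)) / (nA + nB - 2)
    diff = A.mean(axis=0) - B.mean(axis=0)
    d_squared = float(diff @ np.linalg.solve(pooled, diff))
    return 1.0 / (1.0 + np.sqrt(max(d_squared, 0.0)))


def representativity(A: np.ndarray, B: np.ndarray) -> float:
    """Geometric mean of the three similarity terms; 1 means identical structure."""
    parts = (direction_similarity(A, B),
             covariance_similarity(A, B),
             location_similarity(A, B))
    return float(np.prod(parts) ** (1.0 / 3.0))


def dominant_feature(X: np.ndarray, names: Sequence[str]) -> str:
    """Feature with the largest absolute loading on the first principal component."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
    first_pc = eigvecs[:, np.argmax(eigvals)]
    return names[int(np.argmax(np.abs(first_pc)))]
```

As a usage sketch, `representativity(A, A)` equals 1 up to floating-point rounding for any feature matrix `A` with an invertible covariance, and `dominant_feature(A, ["F1", "F2", "F3"])` names the feature that loads most heavily on the first principal component. Because this sketch works on raw covariances, features measured on larger numeric scales (such as F2 in Hz) will tend to dominate; standardizing the columns first is one possible alternative.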

Cite This Paper

Md. Mahbub Hasan, Sathi Rani Mitra, "Determination of Representativity for Bangla Vowel Perceptual Space", International Journal of Information Technology and Computer Science (IJITCS), Vol.10, No.3, pp.63-70, 2018. DOI:10.5815/ijitcs.2018.03.07
