Optimizing Parameters of Automatic Speech Segmentation into Syllable Units

Pages: 9-17



Riksa Meidy Karim 1,*, Suyanto 1

1. School of Computing, Telkom University, Bandung, West Java 40257, Indonesia

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2019.05.02

Received: 24 Oct. 2018 / Revised: 11 Dec. 2018 / Accepted: 5 Jan. 2019 / Published: 8 May 2019

Index Terms

Boundary detection, genetic algorithm, iterative-splitting, iterative-assimilation, parameter optimization, syllable segmentation


Automatic segmentation of speech into syllables is an important task in modern syllable-based speech recognition. It is generally implemented using a time-domain, energy-based feature and a static threshold to detect syllable boundaries. The main problem is that the fixed threshold must be tuned exhaustively to achieve high generalization accuracy. In this paper, an optimization method is proposed to adaptively find the best threshold. It optimizes the parameters of syllable speech segmentation and exploits two post-processing methods: iterative-splitting and iterative-assimilation. The optimization is carried out using three independent genetic algorithms (GAs) for three processes: boundary detection, iterative-splitting, and iterative-assimilation. Tests on an Indonesian speech dataset of 110 utterances show that the proposed iterative-splitting with optimized parameters reduces deletion errors more than the commonly used non-iterative-splitting. The optimized iterative-assimilation removes more insertion errors, without over-merging, than the common non-iterative-assimilation. The sequential combination of optimized iterative-splitting and optimized iterative-assimilation gives the highest accuracy with the lowest deletion and insertion errors.
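The abstract describes a three-stage pipeline (threshold-based boundary detection on an energy contour, iterative-splitting, iterative-assimilation) whose parameters are tuned by three independent GAs. The Python sketch that follows is a minimal, hypothetical illustration of that idea, not the authors' implementation. Everything in it is an assumption: the boundary rule (interior local minima of the energy contour below a threshold), the splitting rule (cut any segment longer than `max_len` frames at its lowest-energy frame), the assimilation rule (for any segment shorter than `min_len` frames, delete its higher-energy boundary), the error measure (insertions plus deletions with a ±`tol`-frame match), the GA operators, the parameter ranges, and the toy half-sine contour used as data. The three GAs run one after another, each tuning one parameter while the earlier ones stay fixed.

```python
import math
import random

def detect_boundaries(energy, threshold):
    """Boundary = interior local minimum of the energy contour below `threshold`."""
    return [i for i in range(1, len(energy) - 1)
            if energy[i] < threshold
            and energy[i] <= energy[i - 1] and energy[i] < energy[i + 1]]

def iterative_split(energy, cuts, max_len):
    """Cut over-long segments at their lowest-energy interior frame until none remain."""
    cuts = sorted(cuts)
    while True:
        edges = [0] + cuts + [len(energy)]
        too_long = [(a, b) for a, b in zip(edges, edges[1:])
                    if b - a > max_len and b - a >= 2]
        if not too_long:
            return cuts
        a, b = too_long[0]
        cuts = sorted(cuts + [min(range(a + 1, b), key=lambda i: energy[i])])

def iterative_assimilate(energy, cuts, min_len):
    """Merge over-short segments by deleting their weaker (higher-energy) boundary."""
    cuts = sorted(cuts)
    while cuts:
        edges = [0] + cuts + [len(energy)]
        short = [k for k in range(len(edges) - 1) if edges[k + 1] - edges[k] < min_len]
        if not short:
            break
        k = short[0]
        options = []
        if k > 0:
            options.append(edges[k])      # left boundary of segment k
        if k < len(cuts):
            options.append(edges[k + 1])  # right boundary of segment k
        cuts.remove(max(options, key=lambda i: energy[i]))
    return cuts

def boundary_errors(cuts, ref, tol=2):
    """Insertions + deletions after greedy matching within `tol` frames."""
    free, hits = list(ref), 0
    for c in cuts:
        near = [r for r in free if abs(r - c) <= tol]
        if near:
            free.remove(min(near, key=lambda r: abs(r - c)))
            hits += 1
    return (len(cuts) - hits) + (len(ref) - hits)

def ga_minimize(cost, low, high, pop_size=16, generations=25, seed=0):
    """Tiny real-valued GA: tournament selection, blend crossover, Gaussian mutation, elitism."""
    rng = random.Random(seed)
    pop = [rng.uniform(low, high) for _ in range(pop_size)]
    best = min(pop, key=cost)
    for _ in range(generations):
        scored = [(cost(x), x) for x in pop]
        def pick():
            return min(rng.sample(scored, 2))[1]  # binary tournament
        children = [best]  # elitism: the best individual always survives
        while len(children) < pop_size:
            w = rng.random()
            child = w * pick() + (1 - w) * pick() + rng.gauss(0, 0.05 * (high - low))
            children.append(min(max(child, low), high))  # clip to the search range
        pop = children
        best = min(pop, key=cost)
    return best

def tune(data, tol=2):
    """Three GA runs in sequence: threshold, then max_len, then min_len."""
    def total(th, max_len=10**9, min_len=0):  # defaults disable split/assimilation
        err = 0
        for energy, ref in data:
            cuts = detect_boundaries(energy, th)
            cuts = iterative_split(energy, cuts, max_len)
            cuts = iterative_assimilate(energy, cuts, min_len)
            err += boundary_errors(cuts, ref, tol)
        return err
    th = ga_minimize(total, 0.0, 1.0)
    mx = round(ga_minimize(lambda m: total(th, round(m)), 2, 40))
    mn = round(ga_minimize(lambda m: total(th, mx, round(m)), 0, 10))
    return th, mx, mn

def toy_contour(lengths):
    """Hypothetical toy data: one half-sine energy bump per 'syllable'."""
    energy = [math.sin(math.pi * (t + 0.5) / n) for n in lengths for t in range(n)]
    ref = [sum(lengths[:k]) for k in range(1, len(lengths))]
    return energy, ref

data = [toy_contour([8, 10, 6, 12]), toy_contour([7, 9, 11])]
print(tune(data))
```

Tuning the stages in sequence loosely mirrors the abstract's "three independent GAs for three processes": with the threshold fixed, splitting can only add boundaries the threshold missed (reducing deletions), and assimilation can only remove surplus ones (reducing insertions). A real system would instead compute the contour from audio frames and score it against hand-labelled syllable boundaries.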

Cite This Paper

Riksa Meidy Karim, Suyanto, "Optimizing Parameters of Automatic Speech Segmentation into Syllable Units", International Journal of Intelligent Systems and Applications (IJISA), Vol.11, No.5, pp.9-17, 2019. DOI:10.5815/ijisa.2019.05.02

