Moner N. M. Arafa

Work place: Department of Preparing Computer Teacher, Faculty of Specific Education, Damietta University, Egypt



Research Interests: Speech Synthesis, Speech Recognition, Image Processing, Image and Sound Processing, Computer Architecture and Organization, Pattern Recognition


Moner N. M. Arafa received the B.Sc. degree from the Department of Preparing Computer Teacher, Faculty of Specific Education, Mansoura University, Egypt, in 2011. His research interests include pattern recognition, speech processing, and advanced machine learning.

Author Articles
A Dataset for Speech Recognition to Support Arabic Phoneme Pronunciation

By Moner N. M. Arafa, Reda Elbarougy, A. A. Ewees, G. M. Behery

DOI:, Pub. Date: 8 Apr. 2018

Some children find it difficult to pronounce certain phonemes, such as vowels. A teacher or parent can help them improve their pronunciation. However, it is difficult to detect pronunciation errors without talking with each student individually, and with the large class sizes common today, teachers can rarely work with students one at a time. Therefore, this study proposes an automatic speech recognition system that can detect incorrect phoneme pronunciation. The system supports children in improving their pronunciation by asking them to pronounce a phoneme and telling them whether it is correct. In the future, the system could provide the correct pronunciation and let children practise until they get it right. To construct this system, an experiment was conducted to collect a speech database. In this experiment, 89 elementary school children were asked to produce each of 28 Arabic phonemes 10 times, so the collected database contains 890 utterances for each phoneme. For each utterance, the fundamental frequency (f0), the first 4 formants, and 13 MFCC coefficients were extracted for each frame of the speech signal. Then 7 statistics (maximum, minimum, range, mean, median, variance, and standard deviation) were computed for each signal, giving 91 features per utterance. The second step is to have human subjects evaluate whether each phoneme is pronounced correctly. In addition, six classifiers were applied to detect whether the phoneme is pronounced correctly, using the extracted acoustic features. The experimental results reveal that the proposed method is effective for detecting the mispronounced phoneme ("أ").
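The feature-vector size quoted in the abstract can be illustrated with a minimal sketch. It assumes the 91 features come from applying the 7 statistics to each of the 13 MFCC tracks (13 × 7 = 91); the abstract does not say how f0 and the formants enter the final vector, so they are left out here. The function names, the toy input values, and the use of population variance and standard deviation are all assumptions made for illustration, not details taken from the study.

```python
import statistics


def summarize_track(values):
    """Return 7 summary statistics for one per-frame feature track."""
    vmax = max(values)
    vmin = min(values)
    return [
        vmax,
        vmin,
        vmax - vmin,                   # range
        statistics.fmean(values),      # mean
        statistics.median(values),     # median
        statistics.pvariance(values),  # population variance (assumption)
        statistics.pstdev(values),     # population std. deviation (assumption)
    ]


def utterance_features(frames):
    """Flatten per-frame coefficients into one fixed-length vector.

    frames: list of per-frame coefficient lists, all of the same length.
    """
    tracks = zip(*frames)  # one sequence per coefficient, across all frames
    features = []
    for track in tracks:
        features.extend(summarize_track(track))
    return features


# Toy utterance: 5 frames x 13 placeholder values (not real MFCCs).
toy_frames = [[float(f + c) for c in range(13)] for f in range(5)]
vector = utterance_features(toy_frames)
print(len(vector))  # 13 coefficients x 7 statistics = 91
```

Because each track is summarized independently of the number of frames, utterances of different durations all map to a vector of the same length, which is what a conventional classifier needs as input.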

Other Articles