M. Rajasekar

Work place: Hindustan Institute of Technology and Science, Chennai, India

E-mail: sekarca07@gmail.com

Website: https://orcid.org/0000-0003-1084-7881

Research Interests: Computational Learning Theory, Natural Language Processing, Image Processing


M. Rajasekar is an Assistant Professor in the Department of Computer Applications at Hindustan Institute of Technology and Science, Padur, Chennai, India. He received his Master of Computer Applications from Anna University, Chennai, Tamil Nadu, in 2008. He has submitted his Ph.D. thesis report at Hindustan Institute of Technology and Science, Chennai, Tamil Nadu. His research interests are Natural Language Processing and Machine Learning methods. Email: sekarca07@gmail.com

Author Articles
Comparison of Machine Learning Algorithms in Domain Specific Information Extraction

By M. Rajasekar, Angelina Geetha

DOI: https://doi.org/10.5815/ijmsc.2023.01.02, Pub. Date: 8 Feb. 2023

Information extraction is an essential task in Natural Language Processing (NLP). It is the process of extracting useful information from unstructured text. Information extraction supports many NLP applications, such as sentiment analysis, named entity recognition, medical data extraction, and feature extraction from research articles and agricultural texts. Most information extraction applications are built on machine learning models. Much research has been carried out on machine learning based information extraction from English texts in various domains, such as biomedicine, the share market, weather, business, social media, agriculture, engineering, and tourism. However, domain-specific information extraction for a particular regional language is still a challenge. There are many types of classification algorithms, and selecting the appropriate one for a given domain is very difficult. In this paper, three well-known classification algorithms are selected to perform information extraction by classifying gynecological domain data in the Tamil language. The main objective of this research work is to analyze which machine learning methods are suitable for Tamil domain-specific text documents. A total of 1635 documents are used in the classification task to extract features with the three selected algorithms. Evaluation of the classification task for each model shows that the Naïve Bayes classification model provides the highest accuracy (84%) for the gynecological domain data. The F1-score, error rate, and execution time are also evaluated for the selected machine learning models. The performance evaluation indicates that the Naïve Bayes classification model gives the best results. It is concluded that the Naïve Bayes classification model is the best model for classifying gynecological domain text in the Tamil language.

Other Articles