Part-of-speech Tagging for Marathi using Maximum Entropy Markov Model

pp. 13-27


Author(s)

Swati Prakash Sonawane 1,*, Kavita Tukaram Patil 2, R. P. Bhavsar 1, B. V. Pawar 3

1. School of Computer Science, KBC North Maharashtra University, Jalgaon, Maharashtra 425001, India

2. SVKM’s Institute of Technology, Dhule, Maharashtra 424001, India

3. K. C. E. Society’s Institute of Management and Research, Jalgaon, Maharashtra 425001, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2026.03.02

Received: 16 Dec. 2025 / Revised: 1 Feb. 2026 / Accepted: 10 Apr. 2026 / Published: 8 Jun. 2026

Index Terms

Bureau of Indian Standard Tagset, Maximum Entropy Markov Model, Morphologically Rich Languages, Natural Language Processing, Part of Speech Tagging

Abstract

Part-of-Speech (POS) tagging is an essential pre-processing step for many Natural Language Processing (NLP) applications, particularly for morphologically rich languages such as Marathi. This research investigates POS tagging for Marathi using the Maximum Entropy Markov Model (MEMM). MEMM combines the strengths of conditional probability modelling and sequence prediction, allowing the integration of rich contextual features. The features used include word forms, suffixes, prefixes, and neighboring tags, which help address the challenges posed by inflectional variation and ambiguity in Marathi. Experimental results show that the MEMM-based POS tagger achieves an accuracy of 83.72%. This performance marks a notable advancement in Marathi POS tagging, given the linguistic diversity and the scarcity of annotated data. Error analysis highlights issues such as homonym ambiguity and out-of-vocabulary words, and suggests directions for further improvement through enriched datasets and more sophisticated modelling techniques. This study supports NLP applications such as machine translation, spell checking, and sentiment analysis for Indian languages and offers a solid foundation for future research in Marathi POS tagging.
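
The feature types named in the abstract (word forms, suffixes, prefixes, and neighboring tags) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the affix length of up to 3 characters, the boundary markers, and the left-to-right greedy decoding are all assumptions made for illustration. The `toy_predict` function is a hypothetical stand-in for a trained maximum entropy classifier, and the tags `N_NN` and `V_VM` are used only as example labels in the style of the BIS tagset. Affixes are taken by Unicode code point, which is a simplification for Devanagari text.

```python
from typing import Callable, Dict, List


def extract_features(words: List[str], i: int, prev_tag: str) -> Dict[str, str]:
    """Build MEMM-style features for the word at position i.

    Covers the feature types listed in the abstract: the word form,
    prefixes, suffixes, and the neighboring (previous) tag. The
    neighboring words are an extra assumption, not from the paper.
    """
    word = words[i]
    feats = {
        "word": word,
        "prev_tag": prev_tag,
        "prev_word": words[i - 1] if i > 0 else "<BOS>",
        "next_word": words[i + 1] if i < len(words) - 1 else "<EOS>",
    }
    # Affixes of 1 to 3 code points (the maximum length is an assumption).
    for k in range(1, min(3, len(word)) + 1):
        feats[f"prefix{k}"] = word[:k]
        feats[f"suffix{k}"] = word[-k:]
    return feats


def greedy_tag(words: List[str],
               predict: Callable[[Dict[str, str]], str]) -> List[str]:
    """Tag left to right, feeding each predicted tag into the next step."""
    tags: List[str] = []
    prev_tag = "<START>"
    for i in range(len(words)):
        tag = predict(extract_features(words, i, prev_tag))
        tags.append(tag)
        prev_tag = tag
    return tags


def toy_predict(feats: Dict[str, str]) -> str:
    """Hypothetical stub in place of a trained classifier."""
    # One hand-written rule for the demo sentence only.
    return "V_VM" if feats.get("suffix2") == "तो" else "N_NN"


if __name__ == "__main__":
    sentence = ["मुलगा", "शाळेत", "जातो"]  # "The boy goes to school"
    print(list(zip(sentence, greedy_tag(sentence, toy_predict))))
```

In a real tagger, `predict` would return the most probable tag under a trained conditional model of the current tag given the previous tag and the observed features, and Viterbi or beam search could replace the greedy loop.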

Cite This Paper

Swati Prakash Sonawane, Kavita Tukaram Patil, R. P. Bhavsar, B. V. Pawar, "Part-of-speech Tagging for Marathi using Maximum Entropy Markov Model", International Journal of Information Technology and Computer Science (IJITCS), Vol.18, No.3, pp.13-27, 2026. DOI:10.5815/ijitcs.2026.03.02
