Integrated Topic Modeling and Feature Engineering for High-accuracy Sentiment Classification in Consumer Reviews

Author(s)

Vijay Gupta 1, Punam Rattan 2, Mukesh Kumar 3,4,*

1. School of Computer Application, Lovely Professional University, Phagwara-144411, Punjab, India

2. Department of Computer Science & Technology, Manav Rachna University, Faridabad-121004, Haryana, India

3. Advanced Centre of Research & Innovation (ACRI), School of Advance Computing, CGC University Mohali-140307, Punjab, India

4. Faculty of Law, Sohar University, Sohar 311, Sultanate of Oman

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2026.02.02

Received: 14 Oct. 2025 / Revised: 22 Dec. 2025 / Accepted: 26 Feb. 2026 / Published: 8 Apr. 2026

Index Terms

Latent Dirichlet Allocation, Term Frequency-Inverse Document Frequency, XGBoost Classifier, Topic Modeling, N-gram Features, Sentiment Analysis

Abstract

The rapid rise of mobile technology, paired with the steady growth of the internet, has led to a massive increase in user-generated content, such as online consumer reviews, accessible through the browser. As this volume continues to grow, it becomes increasingly important to develop sophisticated methods for sentiment analysis of user-written text, especially reviews of restaurants and similar service establishments. In this paper, we present a new approach to sentiment analysis that incorporates Latent Dirichlet Allocation (LDA) topic models, Term Frequency-Inverse Document Frequency (TF-IDF) vector representations, and an XGBoost classifier into a unified framework. Unlike conventional implementations, this study integrates probabilistic topic distributions from LDA with multi-level n-gram TF-IDF features and evaluates their combined impact using XGBoost for enhanced classification performance. Using three n-gram levels (unigrams, bigrams, and trigrams), we evaluate various aspects of the text data, including common linguistic patterns and sentiment trends; the higher-order n-grams are included to capture contextual dependencies beyond single-word features. Overall, our results demonstrate that the proposed framework outperforms traditional corpus-based models on multiple evaluation metrics, achieving a classification accuracy of 96.07%, sensitivity of 95.43%, specificity of 97.12%, and F1-score of 96.16%.
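As a rough illustration of two ideas in the abstract, the sketch below builds TF-IDF weights over unigrams, bigrams, and trigrams, and derives accuracy, sensitivity, specificity, and F1-score from binary predictions. It is a minimal standard-library sketch, not the authors' implementation: the function names (`ngrams`, `tfidf`, `binary_metrics`), the whitespace tokenizer, the smoothed IDF variant, and the toy reviews are all assumptions made for illustration, and the LDA topic features and XGBoost classifier stages are omitted.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=3):
    """Return all contiguous n-grams of length 1..n_max as space-joined strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf(docs, n_max=3):
    """Compute one {term: weight} dict per document over 1..n_max-gram terms.

    Assumption: raw term frequency times a smoothed IDF,
    log((1 + N) / (1 + df)) + 1. The paper does not state its weighting variant.
    """
    # Hypothetical tokenizer: lowercase and split on whitespace.
    term_lists = [ngrams(doc.lower().split(), n_max) for doc in docs]
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for terms in term_lists for term in set(terms))
    n_docs = len(docs)
    weights = []
    for terms in term_lists:
        counts = Counter(terms)
        weights.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return weights


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and F1 for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)

    def ratio(num, den):
        # Return 0.0 instead of raising when a class is absent.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = ratio(tn, tn + fp)   # true negative rate
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


if __name__ == "__main__":
    # Toy reviews, invented for illustration only.
    reviews = ["the food was not good", "the food was good"]
    features = tfidf(reviews)
    # The bigram "not good" appears only in the negative review.
    print(sorted(t for t in features[0] if t not in features[1]))
    print(binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0]))
```

In the toy example, the unigram "good" occurs in both reviews and so cannot separate them, while the bigram "not good" occurs only in the negative one. This is the kind of contextual dependency that higher-order n-grams are meant to capture.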

Cite This Paper

Vijay Gupta, Punam Rattan, Mukesh Kumar, "Integrated Topic Modeling and Feature Engineering for High-accuracy Sentiment Classification in Consumer Reviews", International Journal of Intelligent Systems and Applications (IJISA), Vol.18, No.2, pp.13-26, 2026. DOI:10.5815/ijisa.2026.02.02
