Car Price Analysis Using Data Collected from an Online Sales Platform

Author(s)

Bui Quang Phu 1, Pham Hoang Phuc 1, Pham The Son 2,*

1. Faculty of Computer Science, University of Information Technology, VNU-HCM, Ho Chi Minh City, Vietnam

2. Faculty of Information Science and Engineering, University of Information Technology, VNU-HCM, Ho Chi Minh City, Vietnam

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2026.01.07

Received: 19 Jun. 2025 / Revised: 8 Oct. 2025 / Accepted: 11 Nov. 2025 / Published: 8 Feb. 2026

Index Terms

Data Collection, Predictive Analytics, Price Prediction Analysis, Exploratory Data Analysis, EDA, Extreme Gradient Boosting, XGBoost, MSE

Abstract

This paper develops a car price prediction model using data collected from an online sales platform. The work proceeds in three steps: (1) collecting sales data from the online sales platform; (2) performing exploratory data analysis before and after preprocessing; and (3) experimenting to find a suitable prediction model for the collected dataset. The novelty of this study lies in constructing a real-world dataset of pre-owned car prices collected directly from an online sales platform and in building a car price prediction model through an empirical approach combined with machine learning models. Unlike previous studies based on existing structured datasets, this study emphasizes the discovery of data-driven insights through exploratory analysis and the identification of key variables affecting car prices, and it reports the essential insights about car prices obtained from the dataset. Experimental results show that the XGBoost model achieved an R² of 0.776 with default parameters and an R² of 0.779 with optimized parameters. These findings offer a practical basis for real-world car price prediction systems, helping buyers and sellers make more informed pricing decisions.
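
The abstract reports model quality as R², and the index terms also list MSE. As a reading aid, the following is a minimal sketch of how these two metrics are conventionally computed from actual and predicted prices. It is not the authors' code: the function names and the price values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average squared difference between actual and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean price.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all actual values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical prices in arbitrary units (not taken from the paper's dataset).
actual = [300.0, 450.0, 520.0, 610.0, 700.0]
predicted = [320.0, 430.0, 540.0, 590.0, 690.0]

print("MSE:", mean_squared_error(actual, predicted))
print("R2:", round(r2_score(actual, predicted), 3))
```

Under this definition, an R² of 0.779 means the model's squared error is about 22% of the squared error of a baseline that always predicts the mean price.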

Cite This Paper

Bui Quang Phu, Pham Hoang Phuc, Pham The Son, "Car Price Analysis Using Data Collected from an Online Sales Platform", International Journal of Information Technology and Computer Science (IJITCS), Vol.18, No.1, pp.124-139, 2026. DOI:10.5815/ijitcs.2026.01.07
