Quantitative Analysis of Socio-Economic Determinants of Adult Income Using Machine Learning Techniques

Pages: 1-17

Author(s)

Sabrina Akter 1, Sadia Enam 1, Md. Moshiur Rahman 2, Fahmida Ahmed Antara 1,*

1. Department of IoT and Robotics Engineering, Gazipur Digital University, Kaliakair, Gazipur-1750, Bangladesh

2. Department of Software Engineering, Gazipur Digital University, Kaliakair, Gazipur-1750, Bangladesh

* Corresponding author.

DOI: https://doi.org/10.5815/ijieeb.2025.06.01

Received: 9 Nov. 2024 / Revised: 21 Mar. 2025 / Accepted: 26 Aug. 2025 / Published: 8 Dec. 2025

Index Terms

Socioeconomic, Income Prediction, Exploratory Data Analysis, Economic Inequality, Gender Disparities

Abstract

Income inequality persists in both developed and developing economies and is shaped by complex socio-economic factors such as education, occupation, and gender. This study addresses a gap in the literature by applying machine learning techniques to analyze the socio-economic determinants of income in Bangladesh and in a global context. The primary objectives were to identify the factors that most strongly affect income and to assess how well various machine learning models predict income levels. Using datasets from Bangladesh and global sources, the study employed Random Forest, Gradient Boosting, Logistic Regression, and Support Vector Machines to predict income and assess feature importance. Education, occupation, gender, and hours worked per week were the most significant predictors of income. The Bangladeshi dataset highlighted limited access to higher education and pronounced gender disparities, while the global dataset reflected gender pay gaps and more equitable educational access. The Random Forest Classifier emerged as the most effective model, achieving 100% accuracy on the Bangladeshi dataset and 96% accuracy on the global dataset. These findings underscore the need for targeted policies to improve educational access, promote vocational training, and address gender inequality in order to reduce income disparities. The study also demonstrates the potential of machine learning to uncover non-linear relationships in socio-economic data, providing insights for evidence-based policymaking. This research highlights the importance of integrating data-driven methods to address the socio-economic drivers of income inequality and promote inclusive economic growth.
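
The model comparison described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic data from `make_classification` as a stand-in for an income dataset; the feature names, hyperparameters, and train/test split are hypothetical and are not taken from the paper, so the printed numbers say nothing about the authors' results. It fits the four model families named above, reports held-out accuracy, and reads impurity-based feature importances from the Random Forest.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical feature names: the synthetic columns carry no real meaning.
FEATURES = ["education", "occupation", "gender", "hours_per_week"]

# Synthetic stand-in for an income dataset with a binary target
# (for example, high vs. low income). This is NOT the paper's data.
X, y = make_classification(
    n_samples=600, n_features=4, n_informative=3, n_redundant=1, random_state=0
)

# Hold out a stratified test set so accuracy is measured on unseen rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The four model families named in the abstract, with assumed settings.
# Logistic Regression and SVM are scaled first; tree ensembles do not need it.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "SVM": make_pipeline(StandardScaler(), SVC()),
}

accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy[name] = model.score(X_test, y_test)  # fraction classified correctly
    print(f"{name}: {accuracy[name]:.3f}")

# Impurity-based importances from the fitted Random Forest (normalized to sum to 1).
importances = dict(zip(FEATURES, models["Random Forest"].feature_importances_))
for feature, weight in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{feature}: {weight:.3f}")
```

Accuracy here is computed on the held-out split only. Impurity-based importances are one of several possible ranking methods, and the abstract does not state which one the authors used.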

Cite This Paper

Sabrina Akter, Sadia Enam, Md. Moshiur Rahman, Fahmida Ahmed Antara, "Quantitative Analysis of Socio-Economic Determinants of Adult Income Using Machine Learning Techniques", International Journal of Information Engineering and Electronic Business (IJIEEB), Vol. 17, No. 6, pp. 1-17, 2025. DOI: 10.5815/ijieeb.2025.06.01
