Information Engineering for Data-Driven Analysis of h-Index Formation Across Academic Career Stages Using Large-Scale Bibliometric Parameters, Statistical and Clustering Methods

pp. 159-213


Author(s)

Yurii Ushenko 1,2,* Victoria Vysotska 3,4 Serhii Vladov 4,5 Zhengbing Hu 6 Lyubomyr Chyrun 3,7

1. Department of Physics, Shaoxing University, Shaoxing, Zhejiang Province 312000, China

2. Department of Computer Science, Yuriy Fedkovych Chernivtsi National University, Chernivtsi, 58012, Ukraine

3. Information Systems and Networks Department, Lviv Polytechnic National University, Lviv, 79013, Ukraine

4. Combating Cybercrime Department, Kharkiv National University of Internal Affairs, Kharkiv, 61080, Ukraine

5. Department of Scientific Activity Organisation, Kharkiv National University of Internal Affairs, 27, L. Landau Avenue, 61080 Kharkiv, Ukraine

6. School of Computer Science and Artificial Intelligence, Hubei University of Technology, Wuhan, China

7. Department of Applied Mathematics, Ivan Franko National University of Lviv, Lviv, 79000, Ukraine

* Corresponding author.

DOI: https://doi.org/10.5815/ijieeb.2026.01.10

Received: 26 Oct. 2026 / Revised: 12 Dec. 2025 / Accepted: 7 Jan. 2026 / Published: 8 Feb. 2026

Index Terms

Hirsch Index, Scientometrics, Scientific Effectiveness, Correlation Analysis, Clustering, Academic Career, Citation, Co-Authorship

Abstract

In the context of the globalisation of the scientific space and the growing role of scientometric indicators, the Hirsch index (h-index) remains one of the key tools for assessing scientific performance. At the same time, the influence of individual factors on the h-index varies significantly across the stages of a scientist's academic career, which calls for their comparative analysis. The purpose of this work is to conduct a comparative study of the Hirsch index and the factors that influence its formation for both novice and experienced scientists. The study employed descriptive statistics, visual analysis, time-series smoothing (Kendall's method, Pollard's method, exponential and median smoothing), correlation analysis (Pearson's coefficients), and the k-means clustering method. The study was conducted on two large datasets representing novice and experienced scientists. It was found that the average h-index of experienced scientists is 37.78, approximately 2.6 times that of novice scientists (14.59). Correlation analysis revealed a weak or negative relationship between the h-index and self-citation, while the strongest correlation was observed between the h-index and co-authorship (r = 0.68–0.80). K-means clustering identified six clusters, including one that unites scientific leaders with extremely high h-index values. The results confirm that, in the early stages of a scientific career, geographical and institutional factors play a significant role. In contrast, for experienced scientists, the Hirsch index becomes more predictable and is determined by the quality of scientific publications, the level of citation, and practical cooperation within scientific teams.
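For readers unfamiliar with the quantities named in the abstract, the following is a minimal sketch of how an h-index and a Pearson coefficient can be computed, together with a check of the reported ratio of average h-indices. It is not the authors' code: the function names `h_index` and `pearson_r` and the toy citation counts are assumptions introduced purely for illustration, and only the two averages (37.78 and 14.59) come from the abstract.

```python
from statistics import mean


def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical citation counts for one author (illustrative, not from the paper).
toy_citations = [10, 8, 5, 4, 3]
toy_h = h_index(toy_citations)

# Averages reported in the abstract: experienced vs. novice scientists.
ratio = round(37.78 / 14.59, 2)

print(toy_h, ratio)
```

With the toy counts above, four papers have at least four citations each, so the sketch yields an h-index of 4; the ratio of the two reported averages rounds to 2.59, consistent with the "approximately 2.6 times" stated in the abstract.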

Cite This Paper

Yurii Ushenko, Victoria Vysotska, Serhii Vladov, Zhengbing Hu, Lyubomyr Chyrun, "Information Engineering for Data-Driven Analysis of h-Index Formation Across Academic Career Stages Using Large-Scale Bibliometric Parameters, Statistical and Clustering Methods", International Journal of Information Engineering and Electronic Business (IJIEEB), Vol. 18, No. 1, pp. 159-213, 2026. DOI: 10.5815/ijieeb.2026.01.10
