Streamlining Stock Price Analysis: Hadoop Ecosystem for Machine Learning Models and Big Data Analytics




Jesslyn Noverlita 1, Herison Surbakti 2,*

1. School of ICT, Faculty of Computing and Digital Technology, HELP University, Malaysia

2. Information and Communication Technology, Rangsit University International College, Pathumthani, Thailand

* Corresponding author.


Received: 23 May 2023 / Revised: 4 Jul. 2023 / Accepted: 18 Aug. 2023 / Published: 8 Oct. 2023

Index Terms

Big Data Analytics, Hadoop Ecosystem, Machine Learning, Data Warehousing, Scalability, Distributed Processing, Predictive Modeling


The rapid growth of data across industries has made big data analytics vital for extracting valuable insights and making informed decisions. However, analyzing such massive volumes of data poses significant challenges in storage and processing. The Hadoop ecosystem has gained substantial attention for its ability to handle large-scale data storage and processing, and integrating machine learning models within this ecosystem enables advanced analytics and predictive modeling. This article explores how the Hadoop ecosystem can enhance big data analytics through the construction of machine learning models and the implementation of efficient data warehousing techniques. Applied to stock price analysis, the proposed approach of combining machine learning models with data warehousing helps organizations derive meaningful insights, optimize data processing, and make data-driven decisions efficiently.

Cite This Paper

Jesslyn Noverlita, Herison Surbakti, "Streamlining Stock Price Analysis: Hadoop Ecosystem for Machine Learning Models and Big Data Analytics", International Journal of Information Technology and Computer Science (IJITCS), Vol.15, No.5, pp.25-34, 2023. DOI:10.5815/ijitcs.2023.05.03

