Analysis of Amazon Product Reviews Using Big Data- Apache Pig Tool

PP. 11-18


Amrit Pal Singh 1,* Gurvinder Singh 2

1. USICT, GGSIPU, Delhi, 110078, India

2. GlobalLogic Technologies, Associate Analyst, New Delhi, 110064, India

* Corresponding author.


Received: 26 Dec. 2017 / Revised: 20 Apr. 2018 / Accepted: 18 Oct. 2018 / Published: 8 Jan. 2019

Index Terms

Big Data, Apache Pig Tool, Amazon


We live in an era of digital technologies in which data grows day by day at a very high rate. This data is popularly classified as ‘Big Data’ because of its velocity, veracity, variety and huge volume. Being divergent in nature, it may be unstructured, semi-structured or structured. In this work, we assess various categories of Amazon Product Reviews, a large collection of datasets containing around 144 million reviews in total. The datasets consist of product reviews collected from Amazon across 11 different categories, each with a varying number of attributes. The aim of this work is to find and compare product ratings over the lifespan of the product reviews. A further goal is to help Amazon with the listing of products in its database.
This work also aims to relate users’ ratings and reviews in order to discover how beneficial and good a product is [6]. User ratings are collected and analyzed by category (dataset), which gives insight into which products perform well and what problems are associated with a given non-performing product.

Cite This Paper

Amrit Pal Singh, Gurvinder Singh, "Analysis of Amazon Product Reviews Using Big Data- Apache Pig Tool", International Journal of Information Engineering and Electronic Business (IJIEEB), Vol. 11, No. 1, pp. 11-18, 2019. DOI: 10.5815/ijieeb.2019.01.02


[1] J. Dean & S. Ghemawat (2010). MapReduce: a flexible data processing tool. Communications of the ACM, 53(1), 72-77.
[2] J. Mehine (2011). Raamistiku Apache Pig kasutamine suuremahulises andmeanalüüsis (Doctoral dissertation, Tartu Ülikool).
[3] B. Jopson (2011). Amazon urges California referendum on online tax. The Financial Times, 4.
[4] J. McAuley, R. Pandey, & J. Leskovec (2015, August). Inferring networks of substitutable and complementary products. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785-794). ACM.
[5] J. McAuley, C. Targett, Q. Shi, & A. Van Den Hengel (2015, August). Image-based recommendations on styles and substitutes. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 43-52). ACM.
[6] J. McAuley & A. Yang (2016, April). Addressing complex and subjective product-related queries with customer reviews. In Proceedings of the 25th International Conference on World Wide Web (pp. 625-635). International World Wide Web Conferences Steering Committee.
[7] S. Mohanty, K. NathRout, S. Barik, & S. K. Das. A Study on Evolution of Data in Traditional RDBMS to Big Data Analytics.
[8] S. Singh, V. Mandal, & S. Srivastava. The Big Data Analytics with Hadoop.
[9] Apache Hadoop.
[10] R. Shobana & D. Saranya. Hadoop on Big Data Analysis. International Journal of Advanced Research Trends in Engineering and Technology.
[11] S. Dhawan & S. Rathee (2013). Big data analytics using Hadoop components like pig and hive. American International Journal of Research in Science, Technology, Engineering & Mathematics, 88, 13-131.
[12] Pig Latin Reference Manual 2.
[13] S. Rathi. A brief Study of Big Data Analytics using Apache Pig and Hadoop Distributed File System.
[14] E. L. Lydia & M. B. Swarup. Analysis of Big data through Hadoop Ecosystem Components like Flume, MapReduce, Pig and Hive.