A Survey Work on Optimization Techniques Utilizing Map Reduce Framework in Hadoop Cluster

Full Text (PDF, 382KB), PP.61-68



Bibhudutta Jena 1,*, Mahendra Kumar Gourisaria 1, Siddharth Swarup Rautaray 1, Manjusha Pandey 1


* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2017.04.07

Received: 4 Jul. 2016 / Revised: 25 Nov. 2016 / Accepted: 15 Jan. 2017 / Published: 8 Apr. 2017

Index Terms

MapReduce, Optimization, Big Data, Hadoop, NoSQL, Processing Capabilities


Data is one of the most important aspects of many activities in today's world, and vast amounts of it are generated every second. The rapid growth of data across different domains in recent years calls for intelligent data analysis tools that can analyze huge volumes of data. The MapReduce framework is designed to process large amounts of data and to support effective decision making. It consists of two main tasks, map and reduce. Optimization is the act of achieving the best possible result under given circumstances. The goal of MapReduce optimization is to minimize execution time and to maximize system performance. This survey paper compares different optimization techniques used in the MapReduce framework and in big data analytics. Various sources of big data generation are summarized according to the applications of big data. The wide range of application domains for big data analytics stems from its characteristics: volume, velocity, variety, veracity and value. These characteristics arise from the inclusion of structured, semi-structured and unstructured data, which require a new set of tools such as NoSQL, MapReduce and Hadoop. Although the survey provides insight into the fundamentals of big data analytics, its main aim is to analyze the various optimization techniques used in the MapReduce framework and in big data analytics.
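To make the two tasks concrete, the following is a minimal in-process sketch of the map and reduce phases, using word counting as the example job. It is not taken from the paper and does not use the Hadoop API; the function names `map_task`, `shuffle`, `reduce_task` and `word_count` are hypothetical, and the grouping step only imitates what a MapReduce framework does between the two phases.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_task(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key (done by the framework in a real cluster)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_task(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values that share a key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce sequentially in one process (illustration only)."""
    intermediate = (pair for line in lines for pair in map_task(line))
    grouped = shuffle(intermediate)
    return dict(reduce_task(key, values) for key, values in grouped.items())
```

In a real cluster the map and reduce calls would run in parallel on different nodes, and the amount of intermediate data moved during the grouping step is one place where execution time can plausibly be reduced.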

Cite This Paper

Bibhudutta Jena, Mahendra Kumar Gourisaria, Siddharth Swarup Rautaray, Manjusha Pandey, "A Survey Work on Optimization Techniques Utilizing Map Reduce Framework in Hadoop Cluster", International Journal of Intelligent Systems and Applications (IJISA), Vol. 9, No. 4, pp. 61-68, 2017. DOI: 10.5815/ijisa.2017.04.07

