A CV Parser Model using Entity Extraction Process and Big Data Tools

PP. 21-31


Papiya Das 1,*, Manjusha Pandey 1, Siddharth Swarup Rautaray 1

1. KIIT University, School of Computer Engineering, Odisha, Bhubaneswar 751024

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2018.09.03

Received: 8 Feb. 2018 / Revised: 10 May 2018 / Accepted: 14 Jul. 2018 / Published: 8 Sep. 2019

Index Terms

Computerized Database, Conventional Database, Entity Extraction, Natural Language Processing (NLP), Temporal Database, Text Data Mining, Text Analytics


Private organizations such as offices, libraries, and hospitals began using computers for computerized databases once computers became cost-effective devices. E. F. Codd then introduced the relational database model, i.e., the conventional database. A conventional database can be enhanced into a temporal database. Conventional, or traditional, databases are structured in nature, but data is not always pre-organized, and we have to deal with different types of data. Such data is huge in volume, i.e., big data. Big data mostly emphasizes internal data sources such as transactions, log data, and emails. From these sources, highly enriched information is extracted by means of text data mining, or text analytics. Entity extraction is a part of text analysis. An entity can be anything, such as a person, company, place, monetary amount, link, or phone number. Text documents, blog posts, and other long articles contain a large number of entities in many forms. Extracting those entities to gain valuable information is the main target. Entities can be extracted using natural language processing (NLP) with the R language. In this research work, we briefly discuss the text analysis process and how to extract entities with different big data tools.
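
As a rough illustration of what entity extraction from a CV can look like, the following minimal sketch pulls emails, links, and phone numbers out of a short text using regular expressions. It is written in Python rather than the R language named above, and it is not the authors' method: the `PATTERNS` table, the `extract_entities` helper, and the sample CV text are all hypothetical, and the patterns are deliberately simple.

```python
import re

# Hypothetical, deliberately simple patterns (assumptions for illustration only);
# real CVs need far more robust rules or a trained NLP model.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "url": re.compile(r"https?://[^\s,;]+"),
    "phone": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def extract_entities(text):
    """Return a dict mapping each entity type to the list of matches in text."""
    return {label: pattern.findall(text) for label, pattern in PATTERNS.items()}


# Made-up sample text standing in for a parsed CV.
sample_cv = (
    "Jane Doe, Bhubaneswar. Email: jane.doe@example.com, "
    "Phone: +91 98765 43210, Profile: https://example.com/janedoe"
)
entities = extract_entities(sample_cv)
print(entities)
```

Pattern matching of this kind only covers entities with a regular surface form. Entities such as people, companies, and places generally require a named-entity recognition model instead.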

Cite This Paper

Papiya Das, Manjusha Pandey, Siddharth Swarup Rautaray, "A CV Parser Model using Entity Extraction Process and Big Data Tools", International Journal of Information Technology and Computer Science (IJITCS), Vol. 10, No. 9, pp. 21-31, 2018. DOI: 10.5815/ijitcs.2018.09.03

