Towards an Efficient Big Data Indexing Approach under an Uncertain Environment




Asma Omri 1,*, Mohamed Nazih Omri 1

1. MARS Research Laboratory LR17ES05, University of Sousse, Tunisia

* Corresponding author.


Received: 7 Oct. 2021 / Revised: 18 Nov. 2021 / Accepted: 2 Jan. 2022 / Published: 8 Apr. 2022

Index Terms

Indexing, Probabilistic, Big Data, Syntactic, Uncertain


It is generally accepted that data production has grown spectacularly for several years, driven by the proliferation of new technologies such as mobile devices, smart meters, social networks, cloud computing and sensors, and this data explosion is expected to continue and even accelerate. To find all the documents that answer a query, an information retrieval system must determine whether the terms of each document match those of the user's query. Most systems assume that the terms extracted from documents are certain and precise; for some data, however, this assumption is difficult to sustain. The main objective of this article is to propose a new model for indexing data services in an uncertain environment, that is, an environment in which the data a source contains may be untrustworthy or may contradict another data source because of failures in the collection or integration mechanisms. The intelligence of the proposed solution comes from an efficient fuzzy module capable of reasoning over uncertain and imprecise data. Concretely, the approach is organized around two main phases: (i) the first phase processes the uncertain data in a textual document, and (ii) the second phase applies a new uncertain syntactic indexing method. We carried out a series of experiments on several standard test collections to evaluate our solution and compare it with approaches from the literature, using the standard performance measures precision, recall and F_measure. The results show that our solution is more efficient than the main approaches proposed in the literature.
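The abstract does not detail the fuzzy module, so the following is only a minimal sketch, under stated assumptions, of what indexing with uncertain terms can look like. Each extracted term is assumed to carry a certainty degree in [0, 1]; repeated or conflicting evidence for a term is merged with the fuzzy union (max), and a conjunctive query is scored with the fuzzy intersection (min, Zadeh's operators) followed by an alpha-cut. The class `FuzzyInvertedIndex` and its methods are hypothetical names; this illustrates graded postings in general, not the authors' model.

```python
from collections import defaultdict

# Hypothetical input format: a document is a list of (term, certainty) pairs,
# where certainty in [0, 1] expresses how much the extracted term is trusted.


class FuzzyInvertedIndex:
    """Sketch of an inverted index whose postings carry membership degrees."""

    def __init__(self):
        # term -> {doc_id -> degree to which the term belongs to the document}
        self.postings = defaultdict(dict)

    def add(self, doc_id, terms):
        for term, certainty in terms:
            if not 0.0 <= certainty <= 1.0:
                raise ValueError(f"certainty out of range: {certainty}")
            key = term.lower()
            # Assumption of this sketch: repeated or conflicting evidence for
            # the same term is merged with the fuzzy union (max).
            previous = self.postings[key].get(doc_id, 0.0)
            self.postings[key][doc_id] = max(previous, certainty)

    def search_all(self, query, alpha=0.0):
        """Conjunctive query: fuzzy AND (min) over terms, keep scores > alpha."""
        if not query:
            return []
        first, *rest = [t.lower() for t in query]
        # .get() on the defaultdict does not create entries for unknown terms.
        scores = dict(self.postings.get(first, {}))
        for term in rest:
            hits = self.postings.get(term, {})
            # A document must contain every query term; its score is the
            # smallest membership degree among those terms.
            scores = {d: min(s, hits[d]) for d, s in scores.items() if d in hits}
        ranked = [(d, s) for d, s in scores.items() if s > alpha]
        # Highest degree first; ties broken by document id for stable output.
        return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))
```

Min and max are the simplest t-norm/t-conorm pair; a product t-norm or a possibilistic (necessity/possibility) score could be swapped in without changing the index structure.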
The results show that the proposed approach provides an efficient Big Data indexing solution in an uncertain environment, improving precision, recall and F_measure. In the experiments, the proposed uncertain model obtained its best precision, 0.395, on the KDD database and its best recall, 0.254, on the same database.
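The evaluation relies on precision, recall and F_measure. For a single query with retrieved set A and relevant set R, precision P = |A ∩ R| / |A|, recall R = |A ∩ R| / |R|, and the F-measure is their weighted harmonic mean F_β = (1 + β²)PR / (β²P + R), which reduces to 2PR / (P + R) for β = 1. The sketch below computes these quantities; the function names and the macro-averaging over queries are assumptions, since the abstract does not state how scores were aggregated across queries.

```python
def precision_recall_f(retrieved, relevant, beta=1.0):
    """Set-based precision, recall and F_beta for one query (sketch)."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision == 0.0 and recall == 0.0:
        # Harmonic mean is undefined here; report 0 by convention.
        return precision, recall, 0.0
    b2 = beta * beta
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_beta


def macro_average(runs, beta=1.0):
    """Average (P, R, F) over (retrieved, relevant) pairs, one per query.

    Macro-averaging is an assumption of this sketch, not the paper's protocol.
    """
    if not runs:
        raise ValueError("no queries to average")
    scores = [precision_recall_f(ret, rel, beta) for ret, rel in runs]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))
```

For example, retrieving {d1, d2, d3, d4} when {d1, d3, d5} are relevant gives P = 1/2, R = 2/3 and F = 4/7 ≈ 0.571.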

Cite This Paper

Asma Omri, Mohamed Nazih Omri, "Towards an Efficient Big Data Indexing Approach under an Uncertain Environment", International Journal of Intelligent Systems and Applications (IJISA), Vol.14, No.2, pp.1-13, 2022. DOI: 10.5815/ijisa.2022.02.01

