ComPer: A Comprehensive Performance Evaluation Method for Recommender Systems


Alaa Alslaity 1,* Thomas Tran 1

1. University of Ottawa, Ottawa, K1N 6N5, Canada

* Corresponding author.


Received: 10 Jun. 2019 / Revised: 22 Aug. 2019 / Accepted: 25 Oct. 2019 / Published: 8 Dec. 2019

Index Terms

Recommender Systems, Recommendation Evaluation, Experiments Replication, Performance, Unified Evaluation


Recommender systems are receiving substantial attention in several application areas (such as healthcare and e-commerce), each of which has different requirements. These systems are multifaceted by nature, so many metrics, some of which conflict with one another, have been introduced to assess their different aspects. The existence of several alternatives and dimensions for recommendation approaches complicates the evaluation of recommender systems. In such a situation, it is desirable to evaluate and compare recommenders in a unified way that assesses their multifaceted aspects fairly and uniformly. Despite the abundance of evaluation dimensions, the literature still lacks an evaluation method that assesses the multiple properties of these systems all at once. As a potential solution, this paper proposes an evaluation methodology that provides a multidimensional assessment of recommender systems. The proposed method, which we call ComPer, combines the most common evaluation dimensions into a single yet general evaluation metric. ComPer is inspired by the idea that a recommender system mimics human beings; hence, it can be viewed as a human, and its outputs can be assessed as a human's outputs would be. To the best of our knowledge, this is the first evaluation approach that treats recommenders as humans. ComPer aims to be thorough (by combining multiple dimensions), simple (by presenting the final result as a single value), and independent (by providing setting-independent results). The applicability of the proposed methodology is evaluated empirically using three different datasets. The initial results are promising in the sense that ComPer gives comparable results regardless of the experimental settings.

Cite This Paper

Alaa Alslaity, Thomas Tran, "ComPer: A Comprehensive Performance Evaluation Method for Recommender Systems", International Journal of Information Technology and Computer Science (IJITCS), Vol. 11, No. 12, pp. 1-18, 2019. DOI: 10.5815/ijitcs.2019.12.01

