IJCNIS Vol. 17, No. 6, 8 Dec. 2025
PEGASUS, Flan-T5, Dual Summary Framework, ROUGE, BERTScore, Dataset Diversity, Abstractive Text Summarization
Abstractive summarization plays a critical role in managing large volumes of textual data, yet it faces persistent challenges in consistency and evaluation. Our study compares two state-of-the-art models, PEGASUS and Flan-T5, across a diverse range of benchmark datasets using both ROUGE and BARTScore metrics. Findings reveal that PEGASUS excels at generating detailed, coherent summaries for large-scale texts, as evidenced by a ROUGE-1 score of 0.5874 on Gigaword, while Flan-T5, enhanced by our novel T5 Dual Summary Framework, produces concise outputs that closely align with reference lengths. Although ROUGE effectively measures lexical overlap, its only moderate correlation with BARTScore indicates that it may overlook deeper semantic quality. This underscores the need for hybrid evaluation approaches that integrate semantic analysis with human judgment to capture summary meaning more accurately. By introducing a robust benchmark and the T5 Dual Summary Framework, our research advocates task-specific optimization and more comprehensive evaluation methods. Moreover, current dataset limitations point to the need for broader, more inclusive training sets in future studies.
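The hybrid evaluation idea described above, scoring the same candidate summaries with a lexical metric and a semantic metric and then checking how strongly the two agree, can be illustrated with a minimal sketch. The code below is not the authors' released pipeline; it uses BERTScore (listed in the keywords) as the semantic metric, whereas the abstract names BARTScore, and the reference/candidate pairs are illustrative placeholders. In the study itself such scores would be computed over full benchmark test splits rather than a handful of examples.

```python
# Minimal sketch of lexical-vs-semantic scoring for summaries (assumed setup, not the paper's code).
from rouge_score import rouge_scorer          # pip install rouge-score
from bert_score import score as bert_score    # pip install bert-score
from scipy.stats import spearmanr

# Placeholder reference summaries and model-generated candidates.
references = [
    "the central bank raised interest rates by a quarter point",
    "heavy rain caused flooding across the northern provinces",
    "researchers report a new battery chemistry with longer cycle life",
]
candidates = [
    "central bank lifts rates by 25 basis points",
    "flooding hits northern provinces after heavy rainfall",
    "new battery design promises more charge cycles, study says",
]

# Lexical overlap: ROUGE-1 F1 per reference/candidate pair.
scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)
rouge1_f1 = [scorer.score(ref, cand)["rouge1"].fmeasure
             for ref, cand in zip(references, candidates)]

# Semantic similarity: BERTScore F1 per pair (returns precision, recall, F1 tensors).
_, _, bert_f1 = bert_score(candidates, references, lang="en", verbose=False)
bert_f1 = bert_f1.tolist()

# A moderate rank correlation between the two metrics suggests that lexical overlap
# alone misses part of the semantic quality captured by the embedding-based metric.
corr, _ = spearmanr(rouge1_f1, bert_f1)
print(f"ROUGE-1 F1:     {rouge1_f1}")
print(f"BERTScore F1:   {bert_f1}")
print(f"Spearman corr.: {corr:.3f}")
```

With only a few pairs the correlation estimate is unstable; the point of the sketch is the per-summary pairing of a lexical and a semantic score, which is what a hybrid evaluation would aggregate across an entire dataset.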
Abdulrahman Mohsen Ahmed Zeyad, Arun Biradar, "Abstractive Text Summarization: A Hybrid Evaluation of Integrating Flan-T5 (Dual Framework) with Pegasus Reveals Conciseness Advantages across Diverse Datasets", International Journal of Computer Network and Information Security (IJCNIS), Vol.17, No.6, pp.98-115, 2025. DOI: 10.5815/ijcnis.2025.06.07