MediBERT: A Medical Chatbot Built Using KeyBERT, BioBERT and GPT-2



Sabbir Hossain 1, Rahman Sharar 1,*, Md. Ibrahim Bahadur 1, Abu Sufian 1, Rashidul Hasan Nabil 1

1. Department of Computer Science, Faculty of Science and Technology, American International University-Bangladesh, Dhaka, Bangladesh

* Corresponding author.


Received: 1 Mar. 2023 / Revised: 23 Apr. 2023 / Accepted: 29 May 2023 / Published: 8 Aug. 2023

Index Terms

Medical chatbot, RNN, LSTM, GRU, TNN, KeyBERT, BioBERT, GPT-2


The emergence of chatbots over the last 50 years has been driven primarily by the need for a virtual aide. Unlike their human counterparts, chatbots can present themselves instantly, whenever and wherever the user needs them. Whether for something as benign as wanting a friend to talk to, or for a more dire case such as medical assistance, chatbots are ubiquitous in their utility. This paper aims to develop one such chatbot, capable not only of analyzing human text (and, in the near future, speech), but also of refining its ability to assist users medically by accumulating data from relevant datasets. Although Recurrent Neural Networks (RNNs) are often used to develop chatbots, the vanishing gradient problem introduced by backpropagation, coupled with the cumbersome process of parsing each word sequentially, has led to the increased use of Transformer Neural Networks (TNNs), which process entire sentences at once while giving each token context via embeddings, allowing far greater parallelization. Two variants of the TNN-based Bidirectional Encoder Representations from Transformers (BERT) are used: KeyBERT tags the keywords in each sentence, and BioBERT performs contextual vectorization into Q/A pairs for matrix multiplication. A final GPT-2 (Generative Pre-trained Transformer 2) layer refines the BioBERT output into a human-readable form. Such a system could reduce the need for trips to the nearest physician, along with the time and financial resources those trips require.
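The retrieval step described above, matching a vectorized user query against embedded Q/A pairs via matrix multiplication, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy vectors stand in for BioBERT sentence embeddings, and the function name `best_answer` is hypothetical.

```python
import numpy as np

def best_answer(query_vec, qa_matrix, answers):
    """Return the answer whose question embedding is most similar to the query.

    query_vec : 1-D embedding of the user's question (e.g. from BioBERT).
    qa_matrix : 2-D array, one row per stored question embedding.
    answers   : list of answer strings aligned with qa_matrix rows.
    """
    # Normalize so that the matrix-vector product yields cosine similarities.
    q = query_vec / np.linalg.norm(query_vec)
    m = qa_matrix / np.linalg.norm(qa_matrix, axis=1, keepdims=True)
    scores = m @ q                      # one matrix multiplication, all pairs at once
    best = int(np.argmax(scores))
    return answers[best], float(scores[best])

# Toy usage: two stored Q/A pairs, query closest to the first one.
stored = np.array([[1.0, 0.0], [0.0, 1.0]])
ans, score = best_answer(np.array([1.0, 0.2]), stored, ["fever advice", "headache advice"])
```

In the full system, the selected answer would then be passed to the GPT-2 layer for rephrasing into fluent, human-readable text.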

Cite This Paper

Sabbir Hossain, Rahman Sharar, Md. Ibrahim Bahadur, Abu Sufian, Rashidul Hasan Nabil, "MediBERT: A Medical Chatbot Built Using KeyBERT, BioBERT and GPT-2", International Journal of Intelligent Systems and Applications (IJISA), Vol.15, No.4, pp.53-69, 2023. DOI:10.5815/ijisa.2023.04.05
