Data Analysis and Visualization of Continental Cancer Situation by Twitter Scraping



Md. Hosne Al Walid 1,*, D. M. Anisuzzaman 1, A. F. M. Saifuddin Saif 2

1. Department of Computer Science and Engineering, Ahsanullah University of Science and Technology, Dhaka, Bangladesh

2. Department of Computer Science, American International University-Bangladesh, Dhaka, Bangladesh

* Corresponding author.


Received: 29 Apr. 2019 / Revised: 15 May 2019 / Accepted: 23 May 2019 / Published: 8 Jul. 2019

Index Terms

Data Analysis, Data Visualization, Cancer Situation, and Twitter Analysis


With the advent of user-generated content and the growing usability and interoperability of web platforms, people are more eager than ever to express and share their opinions on the web about both daily activities and global issues. Cancer often goes undetected, leading to serious problems that continue to affect a person's life and the people around them. Twitter has recently become a popular tool for predicting and monitoring real-world outcomes, including health-related concerns. People now use social media in nearly every situation. Cancer patients, their friends, and their families are increasingly sharing their experiences on social media, which has made it easier for patients to find others with similar conditions, discuss treatment options, suggest lifestyle changes, and offer support. Our work aims to connect patients who share a particular illness (cancer) and to provide researchers with enriched patient data that may be useful for future analysis of this disease. We wanted our work to create a meeting point between the healthcare sector and social media. Our goal was to collect Twitter data from the different continents of the world and analyze it. We scraped tweets posted over the last two years from all around the world. We then cleaned the data using regular expressions and processed it to prepare our own dataset. We used sentiment analysis and natural language processing to classify the tweets as positive, negative, or neutral, in order to determine which tweets indicate having cancer and which do not. Finally, we analyzed and visualized the prepared dataset and compared it with authoritative cancer-related information to ascertain whether people's tweets are consistent with the actual cancer situation.
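The cleaning and classification steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the regular expressions, the function names `clean_tweet` and `label_tweet`, and the tiny word lists are all assumptions chosen for illustration, and a simple lexicon count stands in for whatever sentiment model the paper actually used.

```python
import re

# Hypothetical cleaning patterns; the paper does not list its exact expressions.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
RT_RE = re.compile(r"^RT\s+", re.IGNORECASE)
NON_ALPHA_RE = re.compile(r"[^a-z\s']")
WS_RE = re.compile(r"\s+")


def clean_tweet(text):
    """Strip retweet markers, URLs, mentions, hashtag signs, and punctuation."""
    text = RT_RE.sub("", text)
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")  # keep the hashtag word, drop the sign
    text = NON_ALPHA_RE.sub(" ", text.lower())
    return WS_RE.sub(" ", text).strip()


# Toy lexicons, assumed for illustration only.
POSITIVE = {"survivor", "hope", "recovered", "remission", "grateful", "support"}
NEGATIVE = {"diagnosed", "pain", "lost", "died", "scared", "suffering"}


def label_tweet(cleaned):
    """Label a cleaned tweet by counting positive and negative lexicon words."""
    tokens = cleaned.split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    raw = "RT @user: My mom was diagnosed with #cancer today https://t.co/abc123 :("
    cleaned = clean_tweet(raw)
    print(cleaned, "->", label_tweet(cleaned))
```

A lexicon count like this is only a stand-in. A trained classifier or an established sentiment library would be needed to reproduce results of the kind the abstract describes.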

Cite This Paper

Md. Hosne Al Walid, D. M. Anisuzzaman, A. F. M. Saifuddin Saif, "Data Analysis and Visualization of Continental Cancer Situation by Twitter Scraping", International Journal of Modern Education and Computer Science (IJMECS), Vol.11, No.7, pp. 23-31, 2019. DOI: 10.5815/ijmecs.2019.07.03

