Data Analysis and Visualization of Continental Cancer Situation by Twitter Scraping

Md. Hosne Al Walid 1,*, D. M. Anisuzzaman 1, A. F. M. Saifuddin Saif 2

1. Department of Computer Science and Engineering, Ahsanullah University of Science and Technology, Dhaka, Bangladesh

2. Department of Computer Science, American International University-Bangladesh, Dhaka, Bangladesh

* Corresponding author.


Received: 29 Apr. 2019 / Revised: 15 May 2019 / Accepted: 23 May 2019 / Published: 8 Jul. 2019

Index Terms

Data Analysis, Data Visualization, Cancer Situation, and Twitter Analysis


With the rise of user-generated content and the growing usability and interoperability of web platforms, people are increasingly eager to express and share their opinions online about both daily activities and global issues. Cancer often goes undetected, leading to serious problems that continue to affect patients' lives and the people around them. Twitter has recently become a popular source for predicting and monitoring real-world outcomes, including health-related concerns. People now use social media in almost every situation. Cancer patients, their friends, and their families increasingly share their experiences on social media, which has made it easier for patients to find others with similar conditions, discuss treatment options, suggest lifestyle changes, and offer support. Our work aims to connect patients with a particular illness (cancer) and to provide researchers with enriched patient data that may be useful for future analysis of the disease. We wanted to create a meeting point between the healthcare sector and social media. Our goal was to collect Twitter data from different continents and analyze it. We scraped tweets posted over the last two years from around the world. We then cleaned the data using regular expressions and processed it to prepare our own dataset. We used sentiment analysis and natural language processing to classify the tweets as positive, negative, or neutral, in order to determine which tweets indicate a cancer case and which do not. Finally, we analyzed and visualized the prepared dataset and compared it with reliable cancer-related information to ascertain whether people's tweets are consistent with the actual cancer situation.
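The regular-expression cleaning step mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not list the patterns the authors used, so the patterns below and the function name clean_tweet are assumptions chosen for illustration (removal of retweet markers, URLs, user mentions, and punctuation), not the paper's actual rules.

```python
import re

# Hypothetical cleaning patterns; the paper does not specify its own.
RETWEET_RE = re.compile(r"^RT\s+", re.IGNORECASE)    # leading retweet marker
URL_RE = re.compile(r"https?://\S+|www\.\S+")        # web links
MENTION_RE = re.compile(r"@\w+")                     # user mentions
NON_TEXT_RE = re.compile(r"[^A-Za-z0-9\s']")         # punctuation, emoji, symbols
SPACE_RE = re.compile(r"\s+")                        # runs of whitespace


def clean_tweet(text: str) -> str:
    """Return a lowercase, whitespace-normalized version of a raw tweet."""
    text = RETWEET_RE.sub("", text)
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")      # keep the hashtag word, drop the symbol
    text = NON_TEXT_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip().lower()


if __name__ == "__main__":
    raw = "RT @user: Fighting #cancer every day! https://t.co/abc123"
    print(clean_tweet(raw))
```

Keeping the hashtag word while dropping the "#" symbol is one possible design choice: terms such as "cancer" often appear only as hashtags, so removing them entirely could discard the signal a later classifier needs.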

Cite This Paper

Md. Hosne Al Walid, D. M. Anisuzzaman, A. F. M. Saifuddin Saif, "Data Analysis and Visualization of Continental Cancer Situation by Twitter Scraping", International Journal of Modern Education and Computer Science (IJMECS), Vol. 11, No. 7, pp. 23-31, 2019. DOI: 10.5815/ijmecs.2019.07.03


