Feature Selection based Breast Cancer Prediction




Rakibul Hasan 1, A. S. M. Shafi 2,*

1. Department of Computer Science and Engineering, Khwaja Yunus Ali University (KYAU), Enayetpur, Sirajganj-6751, Bangladesh

2. Department of Computer Science and Engineering, University of Information Technology & Sciences (UITS), Baridhara, Dhaka-1212, Bangladesh

* Corresponding author.

DOI: https://doi.org/10.5815/ijigsp.2023.02.02

Received: 23 Jun. 2022 / Revised: 2 Aug. 2022 / Accepted: 13 Jan. 2023 / Published: 8 Apr. 2023

Index Terms

Breast Cancer Prediction, Machine Learning, Feature Selection, Classification


Breast cancer is one of the leading causes of mortality among women worldwide. This mortality could be reduced if breast cancer were diagnosed at an early stage. The factors that lead to the development of breast cancer are hard to determine, yet predicting the probability of the disease remains important. The likelihood of breast cancer can be assessed using machine learning algorithms and routine diagnostic data. Although cancer datasets store a variety of patient attributes, not all of them are important for predicting cancer. In such situations, feature selection approaches can be applied to retain only the pertinent features. In this research, we perform a comprehensive analysis of Machine Learning (ML) classification algorithms, with and without feature selection, on the Wisconsin Breast Cancer Original (WBCO), Wisconsin Diagnostic Breast Cancer (WDBC), and Wisconsin Prognostic Breast Cancer (WPBC) datasets for breast cancer prediction. We employ wrapper-based feature selection and three classifiers: Logistic Regression (LR), Linear Support Vector Machine (LSVM), and Quadratic Support Vector Machine (QSVM). The experimental results show that the LR classifier with feature selection performs significantly better on the WBCO and WPBC datasets, with accuracies of 97.1% and 83.5% respectively. On the WDBC dataset, the QSVM classifier without feature selection achieves an accuracy of 97.9%. These results outperform the existing methods.
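The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, uses the library's bundled copy of the Wisconsin diagnostic data as a stand-in for WDBC, and takes sequential forward selection as the wrapper method, since the abstract does not name one. The degree-2 polynomial kernel for QSVM, the choice of three selected features, and the helper names `make_classifiers`, `evaluate`, and `compare` are likewise assumptions made for illustration, so the scores it produces should not be expected to match the reported accuracies.

```python
from sklearn.base import clone
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# scikit-learn's bundled Breast Cancer Wisconsin (Diagnostic) data:
# 569 samples, 30 numeric features, binary target.
X, y = load_breast_cancer(return_X_y=True)


def make_classifiers():
    """The three classifier families named in the abstract (settings assumed)."""
    return {
        "LR": LogisticRegression(max_iter=1000),
        "LSVM": SVC(kernel="linear"),
        # Assumption: "quadratic SVM" is taken to mean a degree-2 polynomial kernel.
        "QSVM": SVC(kernel="poly", degree=2),
    }


def evaluate(clf, n_selected=None, folds=5):
    """Mean cross-validated accuracy, with optional wrapper feature selection."""
    steps = [StandardScaler()]
    if n_selected is not None:
        # Wrapper selection: candidate feature subsets are scored by the
        # classifier itself. Keeping the selector inside the pipeline means
        # it is refit on the training part of every outer fold.
        steps.append(
            SequentialFeatureSelector(
                clone(clf),
                n_features_to_select=n_selected,
                direction="forward",
                cv=3,
            )
        )
    steps.append(clone(clf))
    return cross_val_score(make_pipeline(*steps), X, y, cv=folds).mean()


def compare(n_selected=3):
    """Map each classifier name to (accuracy without, accuracy with selection)."""
    return {
        name: (evaluate(clf), evaluate(clf, n_selected))
        for name, clf in make_classifiers().items()
    }
```

Calling `compare()` returns one pair of accuracies per classifier, which mirrors the with/without feature selection comparison in the abstract. Placing the selector inside the pipeline is a deliberate choice in this sketch: selecting features on the full dataset before cross-validation would leak information from the held-out folds into the selection step.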

Cite This Paper

Rakibul Hasan, A. S. M. Shafi, "Feature Selection based Breast Cancer Prediction", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol. 15, No. 2, pp. 13-23, 2023. DOI: 10.5815/ijigsp.2023.02.02

