An Optimization-Based Framework for Feature Selection and Parameters Determination of SVMs




Seyyid Ahmed Medjahed 1,*, Mohammed Ouali 2, Tamazouzt Ait Saadi 3, Abdelkader Benyettou 1

1. University of Sciences and Technology Mohamed Boudiaf USTO-MB, Faculty of Mathematics and Computer Science, Oran, 31000, Algeria

2. Department of Computer Science, College of Computers and Information Technology, Taif University, KSA

3. University of Le Havre, Le Havre, 76600, France

* Corresponding author.


Received: 3 Aug. 2014 / Revised: 6 Dec. 2014 / Accepted: 11 Feb. 2015 / Published: 8 Apr. 2015

Index Terms

Feature Selection, Parameter Determination, Learning Set Selection, Support Vector Machine, Simulated Annealing


In this paper, feature selection and parameter determination for SVMs are cast as an energy minimization problem. The task is particularly difficult when the number of features is very large and the features are highly correlated. We formulate joint feature selection and parameter determination as a combinatorial optimization problem and solve it with a stochastic method that is, in theory, guaranteed to reach the global optimum. The approach is evaluated on several public datasets, including DNA microarray datasets, which are characterized by a large number of features. To further validate the approach, we apply it to image classification, with image feature descriptors extracted using the Pyramid Histogram of Oriented Gradients. The proposed approach is compared with twenty feature selection methods. Experimental results indicate that its classification accuracy exceeds that of the other approaches.
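The index terms name simulated annealing as the stochastic method, but the abstract does not give the energy function, the neighborhood moves, or the cooling schedule. The following is therefore only a minimal sketch of the general idea, not the authors' algorithm. The function names `anneal` and `toy_energy`, the move set, the geometric cooling schedule, and the log-scale bounds are all assumptions made for illustration. The state couples a binary feature mask with two SVM hyperparameters (here called `log_c` and `log_gamma`), and candidates are accepted with the Metropolis rule.

```python
import math
import random


def anneal(energy, n_features, steps=5000, t0=1.0, alpha=0.999, seed=0):
    """Minimize energy(mask, log_c, log_gamma) with simulated annealing.

    Hypothetical sketch: the move set and the geometric cooling schedule
    are assumptions, not details taken from the paper.
    """
    rng = random.Random(seed)
    # State: binary feature mask plus two hyperparameters on a log10 scale.
    mask = [rng.random() < 0.5 for _ in range(n_features)]
    state = (mask, 0.0, 0.0)
    current_e = energy(*state)
    best, best_e = state, current_e
    t = t0
    for _ in range(steps):
        mask, log_c, log_gamma = state
        move = rng.randrange(3)
        if move == 0:
            # Flip one feature in or out of the selected subset.
            i = rng.randrange(n_features)
            mask = mask[:i] + [not mask[i]] + mask[i + 1:]
        elif move == 1:
            # Perturb log10(C), clipped to [-3, 3].
            log_c = min(3.0, max(-3.0, log_c + rng.gauss(0.0, 0.3)))
        else:
            # Perturb log10(gamma), clipped to [-3, 3].
            log_gamma = min(3.0, max(-3.0, log_gamma + rng.gauss(0.0, 0.3)))
        candidate = (mask, log_c, log_gamma)
        candidate_e = energy(*candidate)
        delta = candidate_e - current_e
        # Metropolis rule: always accept improvements; accept a worse
        # candidate with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            state, current_e = candidate, candidate_e
            if current_e < best_e:
                best, best_e = state, current_e
        t *= alpha  # geometric cooling (assumed schedule)
    return best, best_e


def toy_energy(mask, log_c, log_gamma):
    """Stand-in energy for demonstration only.

    In a real setting this would be something like the cross-validated
    error of an SVM trained on the selected features with the given
    hyperparameters.
    """
    target = [True, False, True, False, False, True]
    mismatches = sum(m != t for m, t in zip(mask, target))
    return mismatches + (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


best_state, best_energy = anneal(toy_energy, n_features=6)
print(best_state, best_energy)
```

The theoretical guarantee of reaching the global optimum applies to simulated annealing under sufficiently slow cooling. The geometric schedule used in this sketch is a common practical shortcut and does not carry that guarantee.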

Cite This Paper

Seyyid Ahmed Medjahed, Mohammed Ouali, Tamazouzt Ait Saadi, Abdelkader Benyettou, "An Optimization-Based Framework for Feature Selection and Parameters Determination of SVMs", International Journal of Information Technology and Computer Science (IJITCS), vol. 7, no. 5, pp. 1-9, 2015. DOI: 10.5815/ijitcs.2015.05.01

