Internet Traffic Classification for Educational Institutions Using Machine Learning

pp. 37-45



Jaspreet Kaur 1,*, Sunil Agrawal 1, B. S. Sohi 2

1. University Institute of Engineering & Technology, Panjab University, Chandigarh (India)-160014

2. Chandigarh Group of Colleges, Gharuan, Punjab (India)

* Corresponding author.


Received: 9 Aug. 2011 / Revised: 10 Jan. 2012 / Accepted: 16 Mar. 2012 / Published: 8 Jul. 2012

Index Terms

Internet traffic classification, Educational websites, Non-educational websites, Proxy websites, Machine Learning, Features


In recent years, machine learning (ML) algorithms have been applied to Internet traffic classification. The enormous number of websites on the Internet can be grouped into categories in many different ways. In educational institutions, they can be divided into two categories: educational websites and non-educational websites. Educational websites are used to acquire knowledge and explore academic topics, while non-educational websites are used for entertainment and for keeping in touch with people. When non-educational websites are blocked, students turn to proxy websites to get around the block. Therefore, to make optimal use of network resources in educational institutions, access to both non-educational and proxy websites should be banned. In this paper, we use five ML classifiers (Naïve Bayes, radial basis function (RBF) network, C4.5, multilayer perceptron (MLP) and Bayes Net) to classify educational and non-educational websites. Results show that Bayes Net outperforms the other four classifiers in classification accuracy, recall and precision on both the full-feature and the reduced-feature data sets.
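The abstract names the classifiers and the evaluation metrics but not how they work. Purely as an illustration, the Python sketch below implements Gaussian Naïve Bayes, one common variant of the first classifier listed (the abstract does not say which variant or which features the authors used), and then computes overall accuracy plus precision and recall for one class. The two per-flow features (mean packet size in bytes, flow duration in seconds), the labels `edu` and `non-edu`, the function names and all numeric values are hypothetical assumptions, not the paper's features, data or results.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate each class's prior and per-feature mean and variance."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # Floor the variance so a constant feature cannot cause division by zero.
        variances = [
            max(sum((v - m) ** 2 for v in col) / len(col), 1e-9)
            for col, m in zip(columns, means)
        ]
        model[label] = (len(rows) / len(y), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior (up to a shared constant)."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        # Naive independence assumption: add one Gaussian log-likelihood per feature.
        for x, m, v in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def evaluate(y_true, y_pred, positive):
    """Overall accuracy, plus precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical toy flows: [mean packet size (bytes), flow duration (s)].
# These values are invented for illustration and are not the paper's data.
X_train = [[1200, 30.0], [1100, 25.0], [1300, 40.0],
           [400, 2.0], [350, 1.5], [450, 3.0]]
y_train = ["edu", "edu", "edu", "non-edu", "non-edu", "non-edu"]
X_test = [[1250, 35.0], [380, 2.5], [900, 20.0], [420, 1.8]]
y_test = ["edu", "non-edu", "edu", "non-edu"]

model = fit_gaussian_nb(X_train, y_train)
y_pred = [predict_gaussian_nb(model, row) for row in X_test]
accuracy, precision, recall = evaluate(y_test, y_pred, positive="non-edu")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Here precision is TP/(TP+FP) and recall is TP/(TP+FN) for the chosen positive class; with two classes, the same function reports the other class's values by changing `positive`. Comparing further classifiers such as C4.5 or an MLP would only require a different fit/predict pair over the same `evaluate` function.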

Cite This Paper

Jaspreet Kaur, Sunil Agrawal, B. S. Sohi, "Internet Traffic Classification for Educational Institutions Using Machine Learning", International Journal of Intelligent Systems and Applications (IJISA), vol. 4, no. 8, pp. 37-45, 2012. DOI: 10.5815/ijisa.2012.08.05


