Automatic Ethical Filtering using Semantic Vectors Creating Normative Tag Cloud from Big Data




Ahsan N. Khan 1,*

1. Teradata Corporation, Lahore Office, Pakistan

* Corresponding author.


Received: 20 Aug. 2014 / Revised: 14 Nov. 2014 / Accepted: 6 Dec. 2014 / Published: 8 Mar. 2015

Index Terms

Semantic Vectors, Censorship, Distributional Semantics, Normative Systems, Tag Cloud


Ethical filtering is a painful and controversial issue, viewed from different angles worldwide. Stalwarts of freedom keep finding new methods to circumvent banned URLs, while the generative power of the Internet outpaces the velocity of censorship. Hence, keeping online content free of anti-religious and sexually provocative material is a growing concern in conservative countries in Asia and the Middle East. Solutions for online ethical filtering are linearly upper-bounded, given the growth scales of computation and big data. In this scenario, Semantic Vectors are applied as automatic ethical filters, and accuracy and efficiency metrics are calculated. The results show that a normative tag cloud is generated with performance superior to that of industry solutions.

Cite This Paper

Ahsan N. Khan, "Automatic Ethical Filtering using Semantic Vectors Creating Normative Tag Cloud from Big Data", International Journal of Intelligent Systems and Applications (IJISA), vol. 7, no. 4, pp. 17-25, 2015. DOI: 10.5815/ijisa.2015.04.03

