Source Code Author Attribution Using Author’s Programming Style and Code Smells

Full Text (PDF, 455KB), PP.27-33



Muqaddas Gull 1,*, Tehseen Zia 2, Muhammad Ilyas 1

1. University of Sargodha, Sargodha, 40100, Pakistan

2. COMSATS Institute of Information Technology, Islamabad, 44000, Pakistan

* Corresponding author.


Received: 5 Jul. 2016 / Revised: 26 Oct. 2016 / Accepted: 12 Jan. 2017 / Published: 8 May 2017

Index Terms

Authorship, Source Code, Stylistic Feature, Code Smell, Author style


Source code is intellectual property, and using it without the author’s permission violates property rights. Source code authorship attribution is vital for dealing with software theft, copyright disputes, and piracy. The core task of authorship attribution is characterizing an author’s signature so that their footprints can be identified. Different aspects of source code have been considered for characterizing signatures, including the author’s coding style and programming structure. The objective of this research is to explore another trait of authors’ coding behavior for characterizing their footprints. The main question we address is: “Can code smells be useful for characterizing authors’ signatures?” A machine learning based methodology is described, both to answer this question and to design an attribution system. Two aspects of source code are used to represent it as features: the author’s style and code smells. The style-based feature representation serves as the baseline. Results show that code smells can improve authorship attribution.
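The two feature representations named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not list the actual style metrics, smell detectors, thresholds, or classifier. The function names (`style_features`, `smell_features`, `feature_vector`, `attribute`), the two thresholds, and the nearest-neighbour rule are all hypothetical stand-ins, and the regex-based method detection is a rough heuristic for Java-like code rather than a real parser.

```python
import re

# Hypothetical thresholds; the abstract does not give the paper's values.
LONG_PARAM_LIST = 4      # more parameters than this counts as a smell
LONG_METHOD_LINES = 30   # more lines than this counts as a smell

# Rough heuristic for a Java-like method header that opens its body on the same line.
METHOD_HEADER = re.compile(
    r"^\s*(?:(?:public|private|protected|static|final)\s+)*"
    r"[\w<>\[\]]+\s+(\w+)\s*\(([^)]*)\)\s*\{"
)
CONTROL_KEYWORDS = {"if", "for", "while", "switch", "catch"}


def style_features(source):
    """Layout-level style metrics (an illustrative choice of features)."""
    lines = source.splitlines() or [""]
    n = len(lines)
    blank = sum(1 for ln in lines if not ln.strip())
    comment = sum(1 for ln in lines if ln.strip().startswith(("//", "/*", "*")))
    tabbed = sum(1 for ln in lines if ln.startswith("\t"))
    mean_len = sum(len(ln) for ln in lines) / n
    return [mean_len, blank / n, comment / n, tabbed / n]


def smell_features(source):
    """Counts of two classic smells: long parameter list and long method."""
    lines = source.splitlines()
    long_params = long_methods = 0
    i = 0
    while i < len(lines):
        m = METHOD_HEADER.match(lines[i])
        if not m or m.group(1) in CONTROL_KEYWORDS:
            i += 1
            continue
        params = [p for p in m.group(2).split(",") if p.strip()]
        if len(params) > LONG_PARAM_LIST:
            long_params += 1
        # Walk forward until the braces opened by this method are balanced.
        depth, start = 0, i
        while i < len(lines):
            depth += lines[i].count("{") - lines[i].count("}")
            i += 1
            if depth <= 0:
                break
        if i - start > LONG_METHOD_LINES:
            long_methods += 1
    return [float(long_params), float(long_methods)]


def feature_vector(source, use_smells=True):
    """Style-only vector (baseline) or style plus smell counts."""
    vec = style_features(source)
    return vec + smell_features(source) if use_smells else vec


def attribute(unknown, labelled, use_smells=True):
    """Return the author of the nearest labelled sample (unscaled squared distance).

    `labelled` is a list of (author, source) pairs. A 1-nearest-neighbour rule
    is used here only as a stand-in classifier.
    """
    target = feature_vector(unknown, use_smells)

    def distance(pair):
        vec = feature_vector(pair[1], use_smells)
        return sum((a - b) ** 2 for a, b in zip(target, vec))

    return min(labelled, key=distance)[0]
```

Calling `attribute` once with `use_smells=False` and once with `use_smells=True` on the same held-out samples mirrors the comparison the abstract describes: style features alone as the baseline, then style features extended with code smells. In practice the features would need scaling before any distance-based comparison.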

Cite This Paper

Muqaddas Gull, Tehseen Zia, Muhammad Ilyas, "Source Code Author Attribution Using Author's Programming Style and Code Smells", International Journal of Intelligent Systems and Applications (IJISA), Vol. 9, No. 5, pp. 27-33, 2017. DOI: 10.5815/ijisa.2017.05.04

