Sky-CNN: A CNN-based Learning Approach for Skyline Scene Understanding

Full Text (PDF, 841KB), PP.14-25

Views: 0 Downloads: 0


Ameni Sassi 1,2,* Wael Ouarda 1 Chokri Ben Amar 1 Serge Miguet 2

1. REGIM-Lab.: REsearch Groups in Intelligent Machines, University of Sfax, ENIS, BP 1173, 3038, Sfax, Tunisia

2. LIRIS, Université de Lyon, UMR CNRS 5202, Université Lumière Lyon 2, 5 av. Mendès-France, Bât C, N 123, 69676.

* Corresponding author.


Received: 16 Oct. 2018 / Revised: 20 Nov. 2018 / Accepted: 13 Dec. 2018 / Published: 8 Apr. 2019

Index Terms

Convolutional Neural Network, deep learning, scene categorization, skyline, features representation, deep learned features


Skyline scenes are a scientific matter of interest for some geographers and urbanists. These scenes have not been well-handled in computer vision tasks. Understanding the context of a skyline scene could refer to approaches based on hand-crafted features combined with linear classifiers; which are somewhat side-lined in favor of the Convolutional Neural Networks based approaches. In this paper, we proposed a new CNN learning approach to categorize skyline scenes. The proposed model requires a pre-processing step enhancing the deep-learned features and the training time. To evaluate our suggested system; we constructed the SKYLINEScene database. This new DB contains 2000 images of urban and rural landscape scenes with a skyline view. In order to examine the performance of our Sky-CNN system, many fair comparisons were carried out using well-known CNN architectures and the SKYLINEScene DB for tests. Our approach shows it robustness in Skyline context understanding and outperforms the hand-crafted approaches based on global and local features.

Cite This Paper

Ameni Sassi, Wael Ouarda, Chokri Ben Amar, Serge Miguet, "Sky-CNN: A CNN-based Learning Approach for Skyline Scene Understanding", International Journal of Intelligent Systems and Applications(IJISA), Vol.11, No.4, pp.14-25, 2019. DOI:10.5815/ijisa.2019.04.02


[1]Wei, X., Phung, S.L., Bouzerdoum, A.: ‘Visual descriptors for scene categorization: experimental evaluation’, Artificial Intelligence Review, 2016, 45, (3), pp. 333–368. Available from:
[2]Sassi, A., Amar, C.B., Miguet, S. ‘Skyline-based approach for natural scene identification’. In: 13th IEEE/ACS International Conference of Computer Systems and Applications, AICCSA 2016, Agadir, Morocco, November 29 - December 2, 2016. pp. 1–8.
[3]Day, A.: ‘Urban visualization and public inquiries: the case of the heron tower, london’, Architectural Research Quarterly, 2002, 6, (4), pp. 363–372
[4]III, A.S., Nasar, J.L., Hanyu, K.: ‘Using pre-construction validation to regulate urban skylines’, Journal of the American Planning Association, 2005, 71, (1), pp. 73–91
[5]Nasar, J.L., Terzano, K.: ‘The desirability of views of city skylines after dark’, Journal of Environmental. Psychology, 2010, 30, (2), pp. 215 – 225
[6]Ayadi, M., Suta, L., Scuturici, M., Miguet, S., Ben.Amar, C. In: Blanc.Talon, J., Distante, C., Philips, W., Popescu, D., Scheunders, P., editors. ‘A parametric algorithm for skyline extraction’. (Cham: Springer International Publishing, 2016. pp. 604–615
[7]Tonge, R., Maji, S., Jawahar, C.V. ‘Parsing world’s skylines using shape-constrained mrfs’. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition 2014, pp. 3174–3181
[8]Sassi, A., Ouarda, W., Ben.Amar, C., Miguet, S. ‘Neural Approach for Context Scene Image Classification based on Geometric, Texture and Color Information’. In: Representation, analysis and recognition of shape and motion FroM Image data. (Aussois, France: RFIA, 2017. Availablefrom:
[9]Yassin, F.M., Lazzez, O., Ouarda, W., Alimi, A.M. ‘Travel user interest discovery from visual shared data in social networks’. In: 2017 Sudan Conference on Computer Science and Information Technology (SCCSIT), pp. 1–7
[10]Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., et al. ‘Going deeper with convolutions’. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. pp. 1–9
[11]Zuo, Z., Shuai, B., Wang, G., Liu, X., Wang, X., Wang, B., et al. ‘Convolutional recurrent neural networks: Learning spatial dependencies for image representation’. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2015. pp. 18–26
[12]Gong, Y., Wang, L., Guo, R., Lazebnik, S.: ‘Multi-scale orderless pooling of deep convolutional activation features’, CoRR, 2014, abs/1403.1840. Available from:
[13]Krizhevsky, A., Sutskever, I., Hinton, G.E. ‘Imagenet classification with deep convolutional neural networks’. In: Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1. NIPS’12. (USA: Curran Associates Inc., 2012. pp. 1097–1105
[14]Sánchez, J., Perronnin, F., Mensink, T., Verbeek, J.: ‘Image classification with the fisher vector: Theory and practice’, Int J Comput Vision, 2013, 105, (3), pp. 222–245
[15]Yang, J., Yu, K., Gong, Y., Huang, T.S. ‘Linear spatial pyramid matching using sparse coding for image classification’. In: CVPR. (IEEE Computer Society, 2009. pp. 1794–1801
[16]Xiao, J., Ehinger, K.A., Hays, J., Torralba, A., Oliva, A.: ‘Sun database: Exploring a large collection of scene categories’, International Journal of Computer Vision, 2016, 119, (1), pp. 3–22
[17]Oliva, A. & Torralba, A.: ‘Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope’. In: International Journal of Computer Vision, 2001, 42: 145–147.
[18]Ojala, T., PietikÃd’inen, M., Harwood, D.: ‘A comparative study of texture measures with classification based on featured distributions’, Pattern Recognition, 1996, 29, (1), pp. 51 – 59
[19]Huttunen, S., Rahtu, E., Kunttu, I., Gren, J., Heikkilä, J. In: Heyden, A., Kahl, F., editors. ‘Real-time detection of landscape scenes’. (Berlin, Heidelberg: Springer Berlin Heidelberg, 2011. pp. 338–347
[20]Han, X., Chen, Y. ‘Image categorization by learned PCA subspace of combined visual-words and low-level features’. In: Fifth International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP 2009), Kyoto, Japan, 12-14 September, 2009, Proceedings, 2009. pp. 1282–1285
[21]Serrano, N., Savakis, A.E., Luo, J.: ‘Improved scene classification using efficient low-level features and semantic cues’, Pattern Recognition, 2004, 37, (9), pp. 1773– 1784
[22]Vailaya, A., Jain, A., Zhang, H.J.: ‘On image classification: City images vs. landscapes’, Pattern Recognition, 1998, 31, (12), pp. 1921 – 1935
[23]Chen, Z., Chi, Z., Fu, H. ‘A hybrid holistic/semantic approach for scene classification’. In: 22nd International Conference on Pattern Recognition, ICPR 2014, Stockholm, Sweden, August 24-28, 2014. (, 2014. pp. 2299–2304
[24]Lecun, Y., Bottou, L., Bengio, Y., Haffner, P. ‘Gradient-based learning applied to document recognition’. In: Proceedings of the IEEE. (, 1998. pp. 2278–2324
[25]He, K., Zhang, X., Ren, S., Sun, J. ‘Deep residual learning for image recognition’. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. pp. 770–778
[26]Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., et al. ‘Tensorflow: A system for large-scale machine learning’. In: Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation. OSDI’16. (Berkeley, CA, USA: USENIX Association, 2016. pp. 265–283. Available from:
[27]Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., et al.: ‘A survey on deep learning in medical image analysis’, Medical Image Analysis, 2017, 42, pp. 60 – 88. Available from:
[28]Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: ‘Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs’, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40, (4), pp. 834–848
[29]Balduzzi, D., Frean, M., Leary, L., Lewis, J.P., Ma, K.W., McWilliams, B.: ‘The shattered gradients problem: If resnets are the answer, then what is the question?’, CoRR, 2017, Available from:
[30]Philipp, G., Song, D., Carbonell, J.G.. ‘Gradients explode - deep networks are shallow - resnet explained’, 2018. Available from: https://openreview. net/forum?id=HkpYwMZRb
[31]Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: ‘Rethinking the inception architecture for computer vision’, CoRR, 2015, abs/1512.00567. Available from:
[32]He, K., Zhang, X., Ren, S., Sun, J.: ‘Identity mappings in deep residual networks’, CoRR, 2016, abs/1603.05027. Available from:
[33]Hiippala, T. ‘Recognizing military vehicles in social media images using deep learning’. In: 2017 IEEE International Conference on Intelligence and Security Informatics (ISI), 2017. pp. 60–65
[34]Alvarez, S., Vanrell, M.: ‘Texton theory revisited: A bag-of-words approach to combine textons’, Pattern Recognition, 2012, 45, (12), pp. 4312– 4325.