Island Loss for Improving the Classification of Facial Attributes with Transfer Learning on Deep Convolutional Neural Network

Full Text (PDF, 940KB), PP.18-29



Shuvendu Roy 1,*

1. Department of Computer Science and Engineering, Khulna University of Engineering & Technology, Bangladesh

* Corresponding author.


Received: 9 Aug. 2019 / Revised: 16 Aug. 2019 / Accepted: 27 Aug. 2019 / Published: 8 Feb. 2020

Index Terms

Island Loss, Transfer learning, Facial attribute classification, CNN


Classifying human facial attributes is hard because the classes closely resemble one another. Emotion classification and age estimation are two important examples. The differences between one person's expressions of different emotions are small, and different people express the same emotion in different ways. Age classification has the same problem: faces at consecutive ages differ very little. Image resolution is another difficulty. Small images contain less information, while large images require a large model and a large amount of data to train properly. To address both problems, this work proposes transfer learning on a pre-trained model combined with a custom loss function called Island Loss, which reduces intra-class variation and increases inter-class variation. In experiments on both applications, the method achieved higher accuracy than previous methods on several benchmark datasets.
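As an illustration of how a loss can reduce intra-class variation while increasing inter-class variation, the following is a minimal sketch. It assumes the commonly published form of island loss: a center-loss term (squared distance of each feature to its class center) plus a penalty on the cosine similarity between every pair of distinct class centers. The function name `island_loss`, the weight `lam1`, and the toy inputs are hypothetical and are not taken from this paper. In practice this term would be added to a softmax loss and the centers would be learned during training.

```python
import math


def island_loss(features, labels, centers, lam1=10.0):
    """Sketch of an island-loss value for one batch (assumed formulation).

    features: list of feature vectors, one per sample
    labels:   list of integer class ids, same length as features
    centers:  list of nonzero class-center vectors, indexed by class id
    lam1:     hypothetical weight on the inter-class term
    """
    # Intra-class term (center loss): half the squared distance between
    # each sample's feature vector and the center of its own class.
    center_term = 0.0
    for x, y in zip(features, labels):
        center_term += 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, centers[y]))

    # Inter-class term: (cosine similarity + 1) summed over every ordered
    # pair of distinct centers. Each pair contributes 0 only when the two
    # centers point in opposite directions, so minimizing it pushes
    # class centers apart.
    pair_term = 0.0
    for j, cj in enumerate(centers):
        for k, ck in enumerate(centers):
            if j == k:
                continue
            dot = sum(a * b for a, b in zip(cj, ck))
            norm = math.sqrt(sum(a * a for a in cj)) * math.sqrt(sum(b * b for b in ck))
            pair_term += dot / norm + 1.0

    return center_term + lam1 * pair_term
```

With two orthogonal centers, each ordered pair contributes 1 to the inter-class term. With two opposite centers and features sitting exactly on their centers, the loss is 0, which is the configuration the loss rewards.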

Cite This Paper

Shuvendu Roy, "Island Loss for Improving the Classification of Facial Attributes with Transfer Learning on Deep Convolutional Neural Network", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol. 12, No. 1, pp. 18-29, 2020. DOI: 10.5815/ijigsp.2020.01.03

