A Novel Approach for Video Inpainting Using Autoencoders




Irfan Siddavatam 1,*, Ashwini Dalvi 1, Dipti Pawade 1, Akshay Bhatt 1, Jyeshtha Vartak 1, Arnav Gupta 1

1. Department of Information Technology, K J Somaiya College of Engineering, Mumbai, Maharashtra, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijieeb.2021.06.05

Received: 4 Sep. 2021 / Revised: 25 Sep. 2021 / Accepted: 12 Oct. 2021 / Published: 8 Dec. 2021

Index Terms

Object Removal, Image Inpainting, Video Inpainting, Background Regeneration, Autoencoders.


Inpainting is the task of filling in damaged or missing parts of an image or video frame with believable content. Its aim is to realistically complete images or video frames for applications such as the conservation and restoration of art and the editing of images and videos for aesthetic purposes, although it can also be misused for malpractices such as evidence tampering. From the image and video editing perspective, inpainting is used mainly to generate content that fills the gaps left after a particular object is removed from an image or video. Video inpainting, an extension of image inpainting, is a much more challenging task because of the constraint added by the time dimension. Several techniques exist for removing an object from a given video, but they are still at a nascent stage. The major objective of this paper is to study the available inpainting approaches and to propose a solution to the limitations of existing techniques. After studying existing inpainting techniques, we found that most of them rely on a ground truth frame to generate plausible results. A 'ground truth' frame is an image without the target object, in other words an image that provides maximum information about the background, which is then used to fill the space left after object removal. In this paper, we propose an approach that does not require a 'ground truth' frame, provided that the video contains enough context about the background that is to be recreated. We use frames from the video at hand to gather context for the background. As the position of the target object varies from one frame to the next, each subsequent frame reveals the region that was initially behind the object and provides more information about the background as a whole.
We also discuss the potential limitations of our approach and some workarounds for them, and indicate directions for further research.
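
The observation that a moving object uncovers different parts of the background in different frames can be illustrated with a minimal sketch. This is not the autoencoder-based method proposed in the paper; it is a plain per-pixel average over the frames in which each pixel is visible, assuming a static camera and a known object mask for every frame. The function names `aggregate_background` and `make_demo_video` are hypothetical and exist only for this illustration.

```python
import numpy as np


def aggregate_background(frames, object_masks):
    """Average each pixel over the frames in which the object does not cover it.

    frames:       array of shape (T, H, W) or (T, H, W, C)
    object_masks: boolean array of shape (T, H, W), True where the object is
    Returns (background, coverage), where coverage is True for pixels that
    were visible in at least one frame.
    """
    frames = np.asarray(frames, dtype=np.float64)
    visible = ~np.asarray(object_masks, dtype=bool)
    # Add trailing axes so the mask broadcasts over colour channels, if any.
    weights = visible.reshape(visible.shape + (1,) * (frames.ndim - visible.ndim))
    counts = weights.sum(axis=0)
    summed = (frames * weights).sum(axis=0)
    # Pixels never revealed stay at 0 and are flagged in `coverage`.
    background = np.divide(summed, counts, out=np.zeros_like(summed), where=counts > 0)
    coverage = visible.any(axis=0)
    return background, coverage


def make_demo_video(num_frames=4, height=4, width=6):
    """Synthetic clip: a 2x2 object slides one pixel to the right per frame."""
    background = np.arange(height * width, dtype=np.float64).reshape(height, width)
    frames = np.repeat(background[None], num_frames, axis=0)
    masks = np.zeros((num_frames, height, width), dtype=bool)
    for t in range(num_frames):
        masks[t, 1:3, t:t + 2] = True
        frames[t, masks[t]] = -1.0  # object pixels overwrite the background
    return background, frames, masks


if __name__ == "__main__":
    true_background, frames, masks = make_demo_video()
    recovered, coverage = aggregate_background(frames, masks)
    print("every pixel revealed:", bool(coverage.all()))
    print("background recovered:", bool(np.allclose(recovered, true_background)))
```

In the synthetic clip a single frame leaves a hole where the object sits, while the full sequence reveals every background pixel at least once. When the object never leaves some region, `coverage` stays False there, which corresponds to the case where the video does not provide enough context about the background.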

Cite This Paper

Irfan Siddavatam, Ashwini Dalvi, Dipti Pawade, Akshay Bhatt, Jyeshtha Vartak, Arnav Gupta, "A Novel Approach for Video Inpainting Using Autoencoders", International Journal of Information Engineering and Electronic Business (IJIEEB), Vol.13, No.6, pp. 48-61, 2021. DOI: 10.5815/ijieeb.2021.06.05

