Mask-Aware Localized Inpainting Method for CPU-Based Inference

pp. 30-41


Author(s)

Volodymyr Oliinyk 1,*, Serhii Hatsan 1

1. Department of Information Systems and Technologies of the National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute,” Kyiv, 03056, Ukraine

* Corresponding author.

DOI: https://doi.org/10.5815/ijem.2026.03.03

Received: 15 Jan. 2026 / Revised: 8 Mar. 2026 / Accepted: 18 May 2026 / Published: 8 Jun. 2026

Index Terms

Image Inpainting, Edge Computing, Model Compression, CPU Inference, Edge AI, Latent Diffusion Models, Resource-Constrained Systems

Abstract

Image generation methods, including inpainting, are evolving rapidly; however, high memory requirements continue to limit their practical deployment. As a result, running Latent Diffusion Models efficiently on edge devices has become increasingly important. This work explores techniques for reducing the memory usage of Latent Diffusion Models while preserving their generative capabilities.
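
To make the memory pressure concrete, a back-of-the-envelope estimate of the self-attention score tensor helps. This is a minimal sketch under illustrative assumptions that are not taken from the paper: a Stable Diffusion-style U-Net with a VAE downsampling factor of 8, 8 attention heads, float32 values, batch size 1, and only the highest-resolution attention layer counted. The helper `attention_score_bytes` and its parameters are hypothetical.

```python
# Back-of-the-envelope size of the self-attention score tensor softmax(QK^T)
# in a latent diffusion U-Net. All defaults are illustrative assumptions:
# VAE downsampling factor 8, 8 heads, float32, batch size 1 (classifier-free
# guidance would double it), highest-resolution attention layer only.

MIB = 1024 ** 2


def attention_score_bytes(height, width, heads=8, downsample=8,
                          bytes_per_el=4, heads_per_slice=None):
    """Bytes needed to hold the attention scores that are live at once."""
    tokens = (height // downsample) * (width // downsample)
    # With head-wise slicing, only `heads_per_slice` heads are materialized
    # at a time; without slicing, all heads are.
    live_heads = heads if heads_per_slice is None else heads_per_slice
    return live_heads * tokens * tokens * bytes_per_el


for side in (256, 512, 768, 1024):
    full = attention_score_bytes(side, side)
    one_head = attention_score_bytes(side, side, heads_per_slice=1)
    print(f"{side}x{side}: all heads {full / MIB:.0f} MiB, "
          f"one head at a time {one_head / MIB:.0f} MiB")
```

Because the token count grows with image area, the score tensor grows with the fourth power of the side length: under these assumptions it is 512 MiB at 512x512 but 8192 MiB at 1024x1024. Computing one head at a time divides that peak by the number of heads, while processing a smaller region shrinks the token count itself.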

We propose a resource-efficient inpainting method for CPU-based inference that combines VAE tiling, attention slicing, and dynamic region-of-interest slicing. Experimental results show that the model's memory footprint can be significantly reduced while maintaining output quality and without a substantial increase in computation time, enabling execution on systems with as little as 4 GB of memory and only two processing cores. These optimizations, particularly those based on localized image processing, carry an inherent trade-off between memory usage and computational cost, so inference is slower than with GPU-accelerated solutions; nevertheless, they show strong potential for deployment in memory-limited environments.
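
The abstract does not spell out how the region of interest is chosen. One plausible reading, sketched here as an assumption rather than the authors' method: take the bounding box of the mask, grow it by a context margin, snap it outward to multiples of 8 (the latent downsampling factor of Stable Diffusion-style VAEs), run the model on that crop only, and paste back just the masked pixels. The helpers `mask_roi` and `inpaint_localized`, the `margin` value, and the grey-fill stand-in for the diffusion call are all hypothetical.

```python
import numpy as np


def mask_roi(mask, margin=32, align=8):
    """Return (top, bottom, left, right) of the mask's bounding box, grown by
    `margin` pixels of context and snapped outward to multiples of `align`.
    Assumes the image sides are already multiples of `align`."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        raise ValueError("mask has no pixels to inpaint")
    h, w = mask.shape
    top = max(0, (int(rows[0]) - margin) // align * align)
    left = max(0, (int(cols[0]) - margin) // align * align)
    # -(-x // a) is ceiling division, so the box only ever grows.
    bottom = min(h, -(-(int(rows[-1]) + 1 + margin) // align) * align)
    right = min(w, -(-(int(cols[-1]) + 1 + margin) // align) * align)
    return top, bottom, left, right


def inpaint_localized(image, mask, inpaint_fn, margin=32, align=8):
    """Run `inpaint_fn` on the ROI crop only, then paste back masked pixels."""
    top, bottom, left, right = mask_roi(mask, margin, align)
    crop = image[top:bottom, left:right]
    crop_mask = mask[top:bottom, left:right] > 0
    filled = inpaint_fn(crop, crop_mask)
    out = image.copy()
    region = out[top:bottom, left:right]  # a view into `out`
    # Only masked pixels are replaced; the context margin stays untouched.
    region[crop_mask] = filled[crop_mask]
    return out


# Demo: a 512x512 image with a small rectangular hole.
image = np.zeros((512, 512, 3), dtype=np.uint8)
mask = np.zeros((512, 512), dtype=np.uint8)
mask[200:240, 300:330] = 1
# Stand-in for the diffusion inpainting call: fill the crop with mid-grey.
result = inpaint_localized(image, mask, lambda crop, m: np.full_like(crop, 128))
print(mask_roi(mask))
```

In this demo the model would see a 104x104 crop (13x13 latent tokens) instead of the full 512x512 image (64x64 tokens). How the authors choose the margin, and whether they resize the crop to the model's native resolution, is not stated in the abstract.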

Additionally, we analyze key deployment bottlenecks, including model compilation to mitigate cold-start overhead, appropriate runtime configuration, and scheduler selection. These findings confirm the feasibility of deploying Latent Diffusion Models for inpainting on CPU-only, resource-constrained platforms, thereby broadening their applicability to edge computing scenarios.
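
The abstract names runtime configuration as a bottleneck without listing concrete settings. A common, generic step on a small CPU machine is to size the math libraries' thread pools to the cores the process may actually use, before those libraries are imported. This is a minimal sketch of that practice, not the paper's configuration; `configure_cpu_threads` is a hypothetical helper, while `OMP_NUM_THREADS`, `MKL_NUM_THREADS`, and `OPENBLAS_NUM_THREADS` are the standard environment variables read by OpenMP, Intel MKL, and OpenBLAS.

```python
import os


def configure_cpu_threads(max_threads=None):
    """Set common CPU threading variables to the cores available to this
    process. Call before importing numerical libraries such as PyTorch,
    because their threading runtimes read these variables at start-up."""
    try:
        # Honours the process's CPU affinity mask (e.g. taskset or cpusets).
        available = len(os.sched_getaffinity(0))
    except AttributeError:
        # os.sched_getaffinity is not available on macOS or Windows.
        available = os.cpu_count() or 1
    if max_threads is None:
        threads = available
    else:
        threads = max(1, min(max_threads, available))
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
        os.environ[var] = str(threads)
    return threads


threads = configure_cpu_threads()
print(f"using {threads} thread(s)")
```

On a two-core device this simply pins every pool to two threads, which avoids oversubscription when several libraries would otherwise each start their own default-sized pool. In PyTorch the intra-op pool can also be adjusted after import with `torch.set_num_threads`.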

Cite This Paper

Volodymyr Oliinyk, Serhii Hatsan, "Mask-Aware Localized Inpainting Method for CPU-Based Inference", International Journal of Engineering and Manufacturing (IJEM), Vol.16, No.3, pp.30-41, 2026. DOI:10.5815/ijem.2026.03.03
