Self-adaptive Resource Allocation in Fog-Cloud Systems Using Multi-agent Deep Reinforcement Learning with Meta-learning

PDF (886 KB), pp. 107-118

Author(s)

Tapas K. Das 1,*, Santosh K. Das 2, Swarupananda Bissoyi 2, Deepak K. Patel 1

1. Department of Computer Science and Engineering, Institute of Technical Education and Research (ITER), Siksha 'O' Anusandhan (SOA) Deemed to be University, Bhubaneswar, 751030, India

2. Department of Computer Application, Maharaja Sriram Chandra Bhanja Deo University, Baripada, 757003, India

3. Department of Computer Science and Information Technology, Institute of Technical Education and Research (ITER), Siksha 'O' Anusandhan (SOA) Deemed to be University, Bhubaneswar, 751030, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2026.01.08

Received: 25 Jul. 2025 / Revised: 21 Sep. 2025 / Accepted: 26 Nov. 2025 / Published: 8 Feb. 2026

Index Terms

Fog Computing, Cloud Computing, Resource Allocation, Meta-Learning, Multi-Agent Reinforcement Learning, Deep RL, Task Offloading

Abstract

The rapid growth of IoT ecosystems has intensified the complexity of fog–cloud infrastructures, necessitating adaptive and energy-efficient task offloading strategies. This paper proposes MADRL-MAML, a Multi-Agent Deep Reinforcement Learning framework enhanced with Model-Agnostic Meta-Learning for dynamic fog–cloud resource allocation. The approach integrates curriculum learning, centralized attention-based critics, and KL-divergence regularization to ensure stable convergence and rapid adaptation to unseen workloads. A unified cost-based reward formulation is used, where less negative values indicate better joint optimization of energy, latency, and utilization. MADRL-MAML is benchmarked against six baselines (Greedy, Random, Round-Robin, PPO, Federated PPO, and Meta-RL) using consistent energy, latency, utilization, and reward metrics. Across these baselines, performance remains similar: energy (3.64–3.71 J), latency (85.4–86.7 ms), and utilization (0.51–0.54). MADRL-MAML achieves substantially better results, with a reward of -21.92 ± 3.88, energy of 1.16 J, latency of 12.80 ms, and utilization of 0.39, corresponding to 68% lower energy and 85% lower latency than Round-Robin. For unseen workloads characterized by new task sizes, arrival rates, and node heterogeneity, the meta-learned variant (MADRL-MAML-Unseen) achieves a reward of -6.50 ± 3.98, energy of 1.14 J, latency of 12.76 ms, and utilization of 0.73, demonstrating strong zero-shot generalization. Experiments were conducted in a realistic simulated environment with 10 fog and 2 cloud nodes, heterogeneous compute capacities, and Poisson task arrivals. Inference latency remains below 5 ms, confirming real-time applicability. Overall, MADRL-MAML provides a scalable and adaptive solution for energy-efficient and latency-aware orchestration in fog–cloud systems.
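To make the reward formulation concrete, the minimal Python sketch below shows one plausible reading of the unified cost-based reward and the Poisson task-arrival process described above. The trade-off weights, arrival rate, and function names are illustrative assumptions, not the paper's actual implementation.

import numpy as np

# Hypothetical trade-off weights: the abstract states the reward jointly
# optimizes energy, latency, and utilization, but gives no coefficients.
W_ENERGY, W_LATENCY, W_UTIL = 1.0, 0.1, 5.0

def step_reward(energy_j: float, latency_ms: float, utilization: float) -> float:
    """Unified cost-based reward where less negative values are better.

    Penalizes energy (J) and latency (ms); rewards node utilization.
    """
    cost = W_ENERGY * energy_j + W_LATENCY * latency_ms
    return -(cost - W_UTIL * utilization)

# Poisson task arrivals per decision epoch, as in the simulated
# environment of 10 fog nodes and 2 cloud nodes (rate is assumed).
rng = np.random.default_rng(seed=42)
ARRIVAL_RATE = 3.0  # mean tasks per epoch (assumed)
tasks = rng.poisson(ARRIVAL_RATE)

# Reward at the per-step operating point reported for MADRL-MAML.
print(tasks, round(step_reward(1.16, 12.80, 0.39), 3))

Under these assumed weights, the per-step reward at the reported operating point is about -0.49; the episode-level reward of -21.92 ± 3.88 would then correspond to such per-step costs accumulated over an episode, under whatever coefficients the authors actually used.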

Cite This Paper

Tapas K. Das, Santosh K. Das, Swarupananda Bissoyi, Deepak K. Patel, "Self-adaptive Resource Allocation in Fog-Cloud Systems Using Multi-agent Deep Reinforcement Learning with Meta-learning", International Journal of Intelligent Systems and Applications (IJISA), Vol.18, No.1, pp.107-118, 2026. DOI: 10.5815/ijisa.2026.01.08
