KL-triggered Continual Adaptation for Non-stationary Resource Allocation: An Off-policy Actor–critic Approach with Nash Social Welfare

Author(s)

Yih-Chang Chen 1,*

1. Department of Social Work, Chang Jung Christian University, Tainan 711, Taiwan, China

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2026.03.01

Received: 12 Nov. 2025 / Revised: 4 Jan. 2026 / Accepted: 20 Mar. 2026 / Published: 8 Jun. 2026

Index Terms

Deep Reinforcement Learning, Non-Stationary Environments, Constrained Resource Allocation, Nash Social Welfare, Continual Adaptation

Abstract

This paper proposes a drift-aware off-policy deterministic actor–critic framework for constrained continuous resource allocation in non-stationary environments. Feasibility is guaranteed by construction: a simplex-parameterized policy applies softmax normalization followed by budget scaling, so no projection step or Lagrange-multiplier tuning is required. The reward combines a Nash social welfare term (computed as mean log-utility) with efficiency and fairness terms and constraint-violation penalties, all weighted adaptively. To improve sample efficiency, we adopt prioritized experience replay whose priorities are based on temporal-difference (TD) error and state novelty. Non-stationarity is detected with the Kullback-Leibler (KL) divergence between recent and historical state-visitation distributions; detected drift triggers a replay-buffer refresh and incremental fine-tuning, while Elastic Weight Consolidation (EWC) mitigates catastrophic forgetting. Experiments across six application-motivated domains (food, medical, housing, education services, employment support, and elderly care) show higher utilization and welfare, lower inequality, and low decision latency compared with optimization-based, heuristic, and deep reinforcement learning (DRL) baselines. Results are reported as mean ± standard deviation over multiple runs, together with corrected significance tests.
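Three of the mechanisms named in the abstract (simplex-parameterized allocation, the mean log-utility form of Nash social welfare, and KL-based drift detection) can be illustrated with a short Python sketch. It is not the author's implementation: the function names, the equal-width histogram over one-dimensional states, the smoothing constant `eps`, the bin range, and the drift threshold are all illustrative assumptions.

```python
import math


def simplex_allocation(logits, budget):
    """Map unconstrained actor outputs to a feasible allocation.

    Softmax places the shares on the probability simplex; multiplying by
    `budget` yields non-negative amounts that sum to the budget.
    """
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [budget * e / total for e in exps]


def nash_welfare_term(utilities, eps=1e-8):
    """Mean log-utility, i.e. the log of the geometric mean of utilities.

    Ignoring the small `eps` (which guards against log(0)), maximizing this
    is equivalent to maximizing the Nash social welfare (product of utilities).
    """
    return sum(math.log(u + eps) for u in utilities) / len(utilities)


def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) for two discrete distributions over the same bins.

    Bins where p is zero contribute nothing; `eps` keeps the ratio finite
    when q has an empty bin that p visits.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def histogram(states, n_bins, lo, hi):
    """Empirical visitation distribution of 1-D states over equal-width bins."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for s in states:
        idx = int((s - lo) / width)
        idx = min(max(idx, 0), n_bins - 1)  # clip out-of-range states to edge bins
        counts[idx] += 1
    return [c / len(states) for c in counts]


def drift_detected(recent, historical, threshold, n_bins=10, lo=0.0, hi=1.0):
    """Hypothetical trigger: flag drift when KL(recent || historical) > threshold."""
    p = histogram(recent, n_bins, lo, hi)
    q = histogram(historical, n_bins, lo, hi)
    return kl_divergence(p, q) > threshold


# Example usage
shares = simplex_allocation([2.0, -1.0, 0.5], budget=100.0)
print([round(a, 2) for a in shares], round(sum(shares), 6))
history = [i / 100 for i in range(100)]
print(drift_detected([0.95] * 50, history, threshold=0.5))  # window concentrated in one bin
```

In such a setup, `drift_detected` would be evaluated periodically on a sliding window of recent states, and a `True` result marks the point at which the buffer refresh and incremental fine-tuning described above would be triggered. Because feasibility comes from the softmax-and-scale mapping itself, the allocation never needs a separate projection step.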

Cite This Paper

Yih-Chang Chen, "KL-triggered Continual Adaptation for Non-stationary Resource Allocation: An Off-policy Actor–critic Approach with Nash Social Welfare," International Journal of Intelligent Systems and Applications (IJISA), Vol.18, No.3, pp.1-23, 2026. DOI: 10.5815/ijisa.2026.03.01
