Soft-HGRNs: soft hierarchical graph recurrent networks for multi-agent partially observable environments

doi:10.1631/FITEE.2200073

Frontiers of Information Technology & Electronic Engineering

2023, Vol. 24, Issue 1: 117-130. https://doi.org/10.1631/FITEE.2200073


Soft-HGRNs: soft hierarchical graph recurrent networks for multi-agent partially observable environments

Yixiang REN¹, Zhenhui YE¹,², Yining CHEN¹, Xiaohong JIANG², Guanghua SONG¹

¹. School of Aeronautics and Astronautics, Zhejiang University, Hangzhou 310027, China
². College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China



Abstract:

Recent progress in multi-agent deep reinforcement learning (MADRL) has made it more practical for real-world tasks, but its relatively poor scalability and the constraint of partial observability pose further challenges to its performance and deployment. Human society can be regarded as a large-scale partially observable environment in which each individual can communicate with neighbors and remember their own experience. Inspired by this observation, we propose a novel network structure, the hierarchical graph recurrent network (HGRN), for multi-agent cooperation under partial observability. Specifically, we model the multi-agent system as a graph, use a novel graph convolution structure to enable communication between heterogeneous neighboring agents, and adopt a recurrent unit so that agents can retain historical information. To encourage exploration and improve robustness, we design a maximum-entropy learning method that learns stochastic policies with a configurable target action entropy. Building on these techniques, we propose a value-based MADRL algorithm called Soft-HGRN and its actor-critic variant, SAC-HGRN. Experiments on three homogeneous tasks and one heterogeneous environment show that our approach achieves clear improvements over four MADRL baselines and demonstrate the interpretability, scalability, and transferability of the proposed model.
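The phrase "stochastic policies with a configurable target action entropy" can be made concrete with a small sketch. The abstract does not say how the target entropy is enforced, so the code below is only a generic illustration of the maximum-entropy idea and not the authors' algorithm: it assumes a Boltzmann (softmax) policy over one agent's Q-values and a simple temperature update of the kind used in SAC-style methods. All names (`soft_policy`, `tune_temperature`, `alpha`) and all numbers are hypothetical.

```python
import math

def soft_policy(q_values, alpha):
    """Boltzmann policy pi(a) proportional to exp(Q(a) / alpha)."""
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / alpha) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def tune_temperature(q_values, target_entropy, alpha=1.0, lr=0.1, steps=500):
    """Adjust log(alpha) until the policy entropy approaches the target.

    Assumption for illustration: a plain proportional update on log(alpha),
    not the update rule from the paper.
    """
    log_alpha = math.log(alpha)
    for _ in range(steps):
        h = entropy(soft_policy(q_values, math.exp(log_alpha)))
        # Entropy below target: raise the temperature; above: lower it.
        log_alpha += lr * (target_entropy - h)
    return math.exp(log_alpha)

q = [1.0, 0.5, 0.0, -0.5]          # hypothetical Q-values for four actions
target = 0.8 * math.log(len(q))    # hypothetical target: 80% of max entropy
alpha = tune_temperature(q, target)
policy = soft_policy(q, alpha)
```

In this toy setting a higher target entropy yields a larger temperature and a flatter, more exploratory policy, while a lower target pushes the policy toward the greedy action.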

Key words: Deep reinforcement learning; Graph-based communication; Maximum-entropy learning; Partial observability; Heterogeneous settings

Received: 2022-02-25 Published: 2023-07-05

Corresponding author: Guanghua SONG
E-mail: yixiangren@zju.edu.cn; zhenhuiye@zju.edu.cn; ch19930611@zju.edu.cn; ghsong@zju.edu.cn

Cite this article:

Yixiang REN, Zhenhui YE, Yining CHEN, Xiaohong JIANG, Guanghua SONG. Soft-HGRNs: soft hierarchical graph recurrent networks for multi-agent partially observable environments. Front. Inform. Technol. Electron. Eng., 2023, 24(1): 117-130.

Link to this article:

https://academic.hep.com.cn/fitee/CN/10.1631/FITEE.2200073
https://academic.hep.com.cn/fitee/CN/Y2023/V24/I1/117

Supplementary materials: FITEE-0117-23007-YXR_suppl_1; FITEE-0117-23007-YXR_suppl_2
