Open and real-world human-AI coordination by heterogeneous training with communication
Cong GUAN1,2, Ke XUE1,2, Chunpeng FAN3, Feng CHEN1,2, Lichao ZHANG3, Lei YUAN1,2,3, Chao QIAN1,3, Yang YU1,2,3
1. National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
2. School of Artificial Intelligence, Nanjing University, Nanjing 210023, China
3. Polixir Technologies, Nanjing 211106, China
Abstract Human-AI coordination aims to develop AI agents capable of coordinating effectively with human partners, making it a crucial aspect of cooperative multi-agent reinforcement learning (MARL). Achieving satisfactory performance from AI agents remains a long-standing challenge. Recently, ad hoc teamwork and zero-shot coordination have shown promising advances in open-world settings, where agents must coordinate efficiently with a range of unseen human partners. However, these methods usually rest on the overly idealistic assumption that the agent and its partner are homogeneous, which deviates from real-world conditions. To facilitate the practical deployment of human-AI coordination in open and real-world environments, we propose the first benchmark for open and real-world human-AI coordination (ORC), called ORCBench. ORCBench comprises widely used human-AI coordination environments. Notably, to reflect real-world scenarios, ORCBench considers heterogeneity between AI agents and partners, encompassing variations in both capabilities and observations, which aligns more closely with real-world applications. Furthermore, we introduce a framework, Heterogeneous training with Communication (HeteC), for ORC. HeteC builds upon a heterogeneous training framework and enhances partner population diversity through mixed partner training and frozen historical partners. Additionally, HeteC incorporates a communication module that enables human partners to communicate with AI agents, mitigating the adverse effects of partial observability. Through a series of experiments, we demonstrate the effectiveness of HeteC in improving coordination performance. Our contribution serves as an initial but important step towards addressing the challenges of ORC.
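To make the training scheme concrete, the following is a minimal, hypothetical Python sketch of the partner-population loop outlined above: a pool of co-trained partners is mixed with frozen historical snapshots so the agent keeps facing diverse behaviors. All names here (Partner, train_one_iteration, hetec_population_loop, snapshot_every) are illustrative assumptions, not the paper's actual API.

```python
# A minimal sketch of mixed partner training with frozen historical
# partners, under assumed names; not the authors' implementation.
import copy
import random


class Partner:
    """Placeholder for a partner policy; frozen partners are never updated."""

    def __init__(self, policy=None, frozen=False):
        self.policy = policy if policy is not None else {}
        self.frozen = frozen

    def clone_frozen(self):
        # Deep-copy the current policy so later updates cannot alter it.
        return Partner(copy.deepcopy(self.policy), frozen=True)


def train_one_iteration(learner, partner):
    """Stand-in for one MARL update of `learner` paired with `partner`,
    e.g., collecting joint rollouts and applying a policy-gradient step."""
    pass


def hetec_population_loop(agent, population, iterations, snapshot_every=10):
    history = []  # frozen historical partners
    for t in range(iterations):
        # Mixed partner training: sample from live partners and frozen history.
        partner = random.choice(population + history)
        train_one_iteration(agent, partner)
        if not partner.frozen:
            # Co-train only the live partners; snapshots stay fixed.
            train_one_iteration(partner, agent)
        # Periodically freeze a snapshot so earlier behaviors stay in the pool.
        if t % snapshot_every == 0:
            history.append(random.choice(population).clone_frozen())
    return agent, population, history
```

Under this sketch, robustness comes from the agent repeatedly training against both current and past partner behaviors; the communication module described above would sit inside train_one_iteration, letting the partner's messages augment the agent's partial observations.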
Keywords
human-AI coordination
multi-agent reinforcement learning
communication
open-environment coordination
real-world coordination
Corresponding Author(s): Yang YU
Just Accepted Date: 22 March 2024
Issue Date: 05 June 2024