1. Key Laboratory of Biomedical Information Engineering of Ministry of Education, Key Laboratory of Neuro-informatics & Rehabilitation Engineering of Ministry of Civil Affairs, and Institute of Health and Rehabilitation Science, School of Life Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China
2. Lanzhou Center for Theoretical Physics and Key Laboratory of Theoretical Physics of Gansu Province, Lanzhou University, Lanzhou 730000, China
Whether a complex game system composed of a large number of artificial intelligence (AI) agents empowered with reinforcement learning can produce highly favorable collective behaviors through agent self-exploration alone is a question of practical importance. In this paper, we address this question by combining a typical theoretical model of resource allocation systems, the minority game, with reinforcement learning. Each individual participating in the game is endowed with a certain degree of intelligence based on a reinforcement learning algorithm. In particular, we demonstrate that as the AI agents gradually become familiar with the unknown environment and try to choose optimal actions to maximize their payoff, the whole system continues to approach the optimal state under certain parameter combinations: herding is effectively suppressed by an oscillating collective behavior, a self-organizing pattern that arises without any external interference. Interestingly, some numerical results reveal a first-order phase transition in our multi-agent system with reinforcement learning. To further understand the dynamics of agent learning, we define and analyze the conversion path of belief modes, and find that self-organized condensation of belief modes appears for given trial-and-error rates in the AI system. Finally, we provide a method, based on the Kullback-Leibler divergence, for detecting the emergence of the period-two oscillation collective pattern, and give the parameter position where the period-two oscillation appears.
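To make the setting concrete, the following is a minimal sketch of a two-resource minority game played by tabular Q-learning agents, together with one possible Kullback-Leibler check for period-two oscillation. It rests on several assumptions that the abstract does not state: each agent's state is taken to be the resource it currently occupies, exploration is epsilon-greedy, the reward is 1 on the minority side and 0 otherwise, and the oscillation check compares the attendance histograms of even and odd time steps. The names `simulate`, `histogram`, `kl_divergence`, and `even_odd_divergence`, and the parameters `alpha`, `gamma`, and `epsilon`, are hypothetical and chosen only for illustration; this is not necessarily the model or detection method used in the paper.

```python
import math
import random


def simulate(n_agents=101, steps=2000, alpha=0.9, gamma=0.9, epsilon=0.02, seed=0):
    """Return the number of agents choosing resource 1 at each time step."""
    rng = random.Random(seed)
    # q[i][s][a]: agent i's estimated value of moving to resource a from resource s
    # (assumption: the state is simply the resource the agent currently occupies)
    q = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(n_agents)]
    state = [rng.randrange(2) for _ in range(n_agents)]
    attendance = []
    for _ in range(steps):
        actions = []
        for i in range(n_agents):
            row = q[i][state[i]]
            if rng.random() < epsilon or row[0] == row[1]:
                actions.append(rng.randrange(2))  # explore, or break a tie at random
            else:
                actions.append(0 if row[0] > row[1] else 1)
        n1 = sum(actions)
        attendance.append(n1)
        # The less crowded resource wins; with odd n_agents there is never a tie.
        winner = 1 if n1 < n_agents - n1 else 0
        for i, a in enumerate(actions):
            reward = 1.0 if a == winner else 0.0
            s, s_next = state[i], a
            target = reward + gamma * max(q[i][s_next])
            q[i][s][a] += alpha * (target - q[i][s][a])  # standard Q-learning update
            state[i] = s_next
    return attendance


def histogram(values, n_bins, upper):
    """Normalized histogram of integers in [0, upper] using n_bins equal bins."""
    counts = [0] * n_bins
    for v in values:
        counts[v * n_bins // (upper + 1)] += 1
    return [c / len(values) for c in counts]


def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for discrete distributions; eps guards against empty bins in q."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def even_odd_divergence(attendance, n_agents, n_bins=10):
    """KL divergence between attendance histograms of even and odd time steps."""
    tail = attendance[len(attendance) // 2:]  # discard the first half as transient
    even = histogram(tail[0::2], n_bins, n_agents)
    odd = histogram(tail[1::2], n_bins, n_agents)
    return kl_divergence(even, odd)


if __name__ == "__main__":
    att = simulate()
    print("mean attendance on resource 1:", sum(att) / len(att))
    print("even/odd KL divergence:", even_odd_divergence(att, 101))
```

In this sketch, a divergence near zero means that even and odd steps look statistically alike, while a large value means that the attendance alternates between two distinct levels, which is the signature of a period-two pattern. The threshold separating the two cases is left open, since it depends on the bin count and on the parameters explored.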