Frontiers of Physics

ISSN 2095-0462

ISSN 2095-0470(Online)

CN 11-5994/O4

Postal Subscription Code 80-965

2018 Impact Factor: 2.483

Front. Phys.    2024, Vol. 19 Issue (4) : 40201    https://doi.org/10.1007/s11467-023-1378-z
Self organizing optimization and phase transition in reinforcement learning minority game system
Si-Ping Zhang1, Jia-Qi Dong2, Hui-Yu Zhang1, Yi-Xuan Lü1, Jue Wang1, Zi-Gang Huang1
1. Key Laboratory of Biomedical Information Engineering of Ministry of Education, Key Laboratory of Neuro-informatics & Rehabilitation Engineering of Ministry of Civil Affairs, and Institute of Health and Rehabilitation Science, School of Life Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China
2. Lanzhou Center for Theoretical Physics and Key Laboratory of Theoretical Physics of Gansu Province, Lanzhou University, Lanzhou 730000, China
Abstract

Whether a complex game system composed of a large number of artificial intelligence (AI) agents empowered with reinforcement learning can produce highly favorable collective behaviors through agent self-exploration alone is a matter of practical importance. In this paper, we address this question by combining a typical theoretical model of resource allocation systems, the minority game, with reinforcement learning. Each individual participating in the game is endowed with a certain degree of intelligence based on a reinforcement learning algorithm. In particular, we demonstrate that as the AI agents gradually become familiar with the unknown environment and try to choose optimal actions to maximize payoff, the whole system continues to approach the optimal state under certain parameter combinations: herding is effectively suppressed by an oscillating collective behavior, a self-organizing pattern that arises without any external interference. Interestingly, numerical results reveal a first-order phase transition in our multi-agent system with reinforcement learning. To further understand the dynamic behavior of agent learning, we define and analyze the conversion paths of belief modes, and find that self-organized condensation of belief modes appears for given trial-and-error rates in the AI system. Finally, we provide a method based on the Kullback-Leibler divergence for detecting the emergence of the period-two oscillating collective pattern, and give the parameter values at which period-two oscillation appears.
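The setup described in the abstract can be sketched as a toy simulation. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes two resources, takes an agent's state to be the resource it occupied at the previous step, pays a reward of 1 to agents on the minority side and 0 otherwise, and uses epsilon-greedy tabular Q-learning. The function name `simulate_rlmg` and all default parameter values are hypothetical.

```python
import random


def simulate_rlmg(n_agents=201, steps=500, alpha=0.3, gamma=0.9,
                  epsilon=0.02, seed=0):
    """Toy two-resource minority game with epsilon-greedy Q-learning agents.

    Assumptions (not taken from the source): state = resource occupied at
    the previous step, action = resource chosen next, reward = 1 for agents
    on the minority side and 0 otherwise.
    Returns the time series of the occupation ratio of resource 1.
    """
    rng = random.Random(seed)
    # q[i][s][a]: agent i's value for choosing resource a while in resource s,
    # initialized randomly in (0, 1)
    q = [[[rng.random(), rng.random()] for _ in range(2)]
         for _ in range(n_agents)]
    state = [rng.randrange(2) for _ in range(n_agents)]
    rho1 = []
    for _ in range(steps):
        actions = []
        for i in range(n_agents):
            if rng.random() < epsilon:
                a = rng.randrange(2)          # explore: random resource
            else:
                row = q[i][state[i]]
                a = 0 if row[0] >= row[1] else 1   # exploit: greedy action
            actions.append(a)
        load1 = sum(actions)                  # number of agents in resource 1
        # Resource 1 wins if fewer than half chose it; an exact tie
        # (possible only for even n_agents) is arbitrarily given to resource 0.
        winner = 1 if load1 < n_agents / 2 else 0
        for i, a in enumerate(actions):
            reward = 1.0 if a == winner else 0.0
            s = state[i]
            # Standard Q-learning update; the next state is the chosen resource.
            target = reward + gamma * max(q[i][a])
            q[i][s][a] += alpha * (target - q[i][s][a])
            state[i] = a
        rho1.append(load1 / n_agents)
    return rho1


if __name__ == "__main__":
    series = simulate_rlmg()
    print(series[-10:])   # inspect the last few occupation ratios
```

Varying `epsilon` between 0 and 1 in this sketch is one way to explore how the load fluctuations depend on the exploration rate; whether the toy reproduces the oscillating pattern reported in the paper depends on the assumptions above.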

Keywords: oscillatory evolution, collective behaviors, phase transition, reinforcement learning, minority game
Corresponding Author(s): Zi-Gang Huang   
Issue Date: 19 January 2024
 Cite this article:   
Si-Ping Zhang, Jia-Qi Dong, Hui-Yu Zhang, et al. Self organizing optimization and phase transition in reinforcement learning minority game system[J]. Front. Phys., 2024, 19(4): 40201.
 URL:  
https://academic.hep.com.cn/fop/EN/10.1007/s11467-023-1378-z
https://academic.hep.com.cn/fop/EN/Y2024/V19/I4/40201
Fig.1  The flow chart of the protocol for Q-learning minority games. Green arrows indicate that agent i takes an action following the h-function in the logic diagram. δ is an arbitrary small positive constant (δ>0).
Fig.2  Curves of σ as a function of the exploration rate ε for different learning rates and discount factors. The parameters α, γ and the system size N in each subplot are (a) γ=0.7 and N=10000; (b) γ=0.9 and N=10000; (c) α=0.9 and N=10000; (d) α=0.3 and γ=0.9. The straight line in each panel represents the standard deviation of a binomial random system, i.e., ε=1. In (d), the three lines of different colors represent different system sizes. The Q-function of each agent is initialized randomly before the simulations.
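The binomial baseline mentioned in this caption can be worked out directly. If each of N agents picks one of two resources with probability 1/2, the load of a resource is binomially distributed with standard deviation sqrt(N)/2. The sketch below compares that value with a sampled estimate; it assumes σ denotes the standard deviation of the resource load, and the function names are hypothetical.

```python
import math
import random
import statistics


def binomial_baseline_sigma(n_agents):
    """Std of one resource's load when every agent flips a fair coin."""
    return math.sqrt(n_agents) / 2


def sampled_sigma(n_agents, n_rounds, seed=0):
    """Estimate the same quantity by simulating fully random choices."""
    rng = random.Random(seed)
    loads = [sum(rng.getrandbits(1) for _ in range(n_agents))
             for _ in range(n_rounds)]
    return statistics.pstdev(loads)


if __name__ == "__main__":
    print(binomial_baseline_sigma(10000))   # sqrt(10000) / 2 = 50.0
    print(sampled_sigma(400, 2000))         # should be close to 10
```

Values of σ below this baseline indicate that the agents coordinate better than random choice, while values above it indicate herding.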
Fig.3  Time series of ρ1(t) for different ε. The exploration rates ε are smaller than εc in (a, b). The time series close to the transition point εc is shown in (c). Those for ε∈[εc,1] are shown in (d–h). The red dashed line indicates the average occupation ratio Cr/N. The learning parameters and the number of agents are identical across panels, i.e., α=0.9, γ=0.9, and N=10000.
Fig.4  Examples of several oscillating modes of the resource load ρr in the RLMG system, for ε=0.4, ε=0.2, ε=0.1, and εc≈0.235, respectively. The learning parameters are α=0.3 and γ=0.9, and the system size is N=10000. (a) A period-two oscillation mode. (b, c) Non-periodic oscillations with different amplitudes A. (d) Switching between period-two and other oscillation modes near εc. (e) The power spectrum of ρr with the period τ.
Fig.5  Phenomena near the phase transition point εc, including critical slowing down, a hysteresis-like loop, and the Binder cumulant. The learning parameters are α=0.3 and γ=0.9, and the system size is N=10000. (a) Binder cumulant Bc(ε) as a function of the exploration rate ε; the inset shows Ωr versus ε. (b) The critical-state lifetime as an exponential function of system size N, namely τq∼exp(νN). The solid green line is the fit. (c) σ/N versus ε, showing the hysteresis loop; blue solid circles are for increasing ε and yellow squares are for decreasing ε. (d) The gap Δgap of σ/N between the bistable states near εc as a function of system size N. The solid yellow line is the fit.
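For readers unfamiliar with the Binder cumulant, the sketch below computes the standard fourth-order form, U4 = 1 - <m^4> / (3 <m^2>^2), from samples of an order parameter m. Which order parameter the paper uses for Bc(ε) is not stated on this page, so the choice of m is left to the caller, and the function name is hypothetical.

```python
import random


def binder_cumulant(samples):
    """Fourth-order Binder cumulant U4 = 1 - <m^4> / (3 <m^2>^2)."""
    n = len(samples)
    m2 = sum(x ** 2 for x in samples) / n
    m4 = sum(x ** 4 for x in samples) / n
    return 1.0 - m4 / (3.0 * m2 * m2)


if __name__ == "__main__":
    rng = random.Random(0)
    two_state = [1.0, -1.0] * 1000                           # sharply bimodal m
    gaussian = [rng.gauss(0.0, 1.0) for _ in range(20000)]   # disordered m
    print(binder_cumulant(two_state))   # two-state limit gives 2/3
    print(binder_cumulant(gaussian))    # Gaussian limit is close to 0
```

The two limits shown (2/3 for a two-valued order parameter, about 0 for Gaussian fluctuations) are the usual reference points; near a first-order transition the cumulant often develops a pronounced dip between them.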
Fig.6  Evolution scheme of belief modes in the two-resource RLMG system. There are two types of evolution paths of belief modes: the mR-process and the tE-process. In the mR-process, the evolution paths are represented by solid lines (the top layer). In the tE-process, the evolution paths are represented by dashed lines (the bottom layer). The arrows between the layers indicate that the AI agent’s belief mode itself is maintained through either the mR-process or the tE-process. At the end of each arrow, the element Qsa of the Q-function is updated when the mR-process or tE-process occurs.
Fig.7  The initialization sensitivity and the self-organization of optimal resource allocation. The learning parameters are α=0.3 and γ=0.9, and the system size is N=10000. (a) The initial occupation ratio of resource 1 is ρ1(0)=0.1, and the Q-function has different initializations: [1,0;0,1], [0,1;1,0], and random in the interval (0,1). (b–d) The system has a given initialization of the Q-function and different initial occupation ratios, namely ρ1(0)=0.1, ρ1(0)=0.3, and ρ1(0)=1. (b) The Q-function is randomly initialized in the interval (0,1). (c) Q(0)=[1,0;0,1]. (d) Q(0)=[0,1;1,0].
Fig.8  As ε moves away from 1, the period-two oscillatory collective behavior of the system gradually becomes observable. (a) Statistical distribution of the resource load ρr for different system sizes, based on the ensemble average of long time series after the system reaches the steady state. Curves of different colors correspond to different exploration rates ε: red empty symbols are for ε=0.76, green for ε=0.85, and the blue shading for ε=1 (a completely random system). The systems share the learning parameters α=0.3 and γ=0.9. (b) KL divergence between the distribution of the resource load ρr at a given exploration rate ε and that of a random system (ε=1), for different combinations of the learning parameters α and γ. The inset shows the trend of the KL divergence in linear coordinates without rescaling. The system size is N=10000.
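The comparison in panel (b) can be illustrated with a small sketch that bins two load time series into histograms and computes the Kullback-Leibler divergence D(P||Q) = sum_i p_i log(p_i / q_i). This is only an illustration of the quantity, not the authors' detection procedure: the binning, the `floor` guard against empty bins, and the function names are assumptions.

```python
import math
from collections import Counter


def histogram(samples, n_bins):
    """Normalized histogram of values in [0, 1] over equal-width bins."""
    counts = Counter(min(int(x * n_bins), n_bins - 1) for x in samples)
    total = len(samples)
    return [counts.get(b, 0) / total for b in range(n_bins)]


def kl_divergence(p, q, floor=1e-12):
    """D(P || Q) in nats; `floor` avoids division by zero for empty bins of q."""
    return sum(pi * math.log(pi / max(qi, floor))
               for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    # Hypothetical load series: a period-two pattern versus a flat reference.
    period_two = [0.45, 0.55] * 500
    reference = [0.5] * 1000
    p = histogram(period_two, 20)
    q = histogram(reference, 20)
    print(kl_divergence(p, q))
```

A divergence near zero means the load distribution is indistinguishable from the reference, while a bimodal load distribution, as produced by period-two oscillation, gives a clearly positive value.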
Fig.A1  Snapshots of the evolution of the agents’ occupancy ratios of the four belief modes in the RLMG system at time steps t=1, t=50, t=257, and t=600, respectively. The learning parameters are α=0.3 and γ=0.9, and the system size is N=10000. All agents are fixed to nodes of a regular grid in the RLMG system. If an agent’s belief mode is s|a, its corresponding node is highlighted in the grid for belief mode s|a. The color indicates the belief strength B [Eq. (9)]. (a1–d1) The grids for the A|A belief mode. (a2–d2) The grids for the A|B belief mode. (a3–d3) The grids for the B|A belief mode. (a4–d4) The grids for the B|B belief mode. The number at the top of each subplot indicates the occupancy density of the belief mode at the current moment.