A survey on large language model based autonomous agents

doi:10.1007/s11704-024-40231-1

Frontiers of Computer Science

2024, Vol. 18, Issue 6: 186345



Lei WANG, Chen MA, Xueyang FENG, Zeyu ZHANG, Hao YANG, Jingsen ZHANG, Zhiyuan CHEN, Jiakai TANG, Xu CHEN, Yankai LIN, Wayne Xin ZHAO, Zhewei WEI, Jirong WEN

Gaoling School of Artificial Intelligence, Renmin University of China, Beijing 100872, China


Abstract:

Autonomous agents have long been a research focus in academic and industry communities. Previous research often focuses on training agents with limited knowledge within isolated environments, which diverges significantly from human learning processes and makes it hard for the agents to make human-like decisions. Recently, through the acquisition of vast amounts of Web knowledge, large language models (LLMs) have shown potential for achieving human-level intelligence, leading to a surge in research on LLM-based autonomous agents. In this paper, we present a comprehensive survey of these studies, delivering a systematic review of LLM-based autonomous agents from a holistic perspective. We first discuss the construction of LLM-based autonomous agents, proposing a unified framework that encompasses much of the previous work. Then, we present an overview of the diverse applications of LLM-based autonomous agents in social science, natural science, and engineering. Finally, we delve into the evaluation strategies commonly used for LLM-based autonomous agents. Based on these studies, we also present several challenges and future directions in this field.

Key words: autonomous agent; large language model; human-level intelligence

Received: 2024-03-05; Published: 2024-03-20

Corresponding Author(s): Xu CHEN, Yankai LIN

Cite this article:

Lei WANG, Chen MA, Xueyang FENG, Zeyu ZHANG, Hao YANG, Jingsen ZHANG, Zhiyuan CHEN, Jiakai TANG, Xu CHEN, Yankai LIN, Wayne Xin ZHAO, Zhewei WEI, Jirong WEN. A survey on large language model based autonomous agents. Front. Comput. Sci., 2024, 18(6): 186345.

Link to this article:

https://academic.hep.com.cn/fcs/CN/10.1007/s11704-024-40231-1
https://academic.hep.com.cn/fcs/CN/Y2024/V18/I6/186345

Fig.1

Fig.2

Fig.3

Fig.4

Model	Profile	Memory operation	Memory structure	Planning	Action	CA	Time
WebGPT [66]	−	−	−	−	②	①	12/2021
SayCan [78]	−	−	−	①	①	②	04/2022
MRKL [72]	−	−	−	①	②	−	05/2022
Inner Monologue [61]	−	−	−	②	①	②	07/2022
Social Simulacra [98]	②	−	−	−	①	−	08/2022
ReAct [59]	−	−	−	②	②	①	10/2022
MALLM [43]	−	①	②	−	①	−	01/2023
DEPS [33]	−	−	−	②	①	②	02/2023
Toolformer [15]	−	−	−	①	②	①	02/2023
Reflexion [12]	−	②	②	②	①	②	03/2023
CAMEL [99]	① ②	−	−	②	①	−	03/2023
API-Bank [69]	−	−	−	②	②	②	04/2023
ViperGPT [74]	−	−	−	−	②	−	03/2023
HuggingGPT [13]	−	−	①	①	②	−	03/2023
Generative Agents [20]	①	②	②	②	①	−	04/2023
LLM+P [57]	−	−	−	①	①	−	04/2023
ChemCrow [75]	−	−	−	②	②	−	04/2023
OpenAGI [73]	−	−	−	②	②	①	04/2023
AutoGPT [100]	−	①	②	②	②	②	04/2023
SCM [35]	−	②	②	−	①	−	04/2023
Socially Alignment [84]	−	①	②	−	①	①	05/2023
GITM [16]	−	②	②	②	①	②	05/2023
Voyager [38]	−	②	②	②	①	②	05/2023
Introspective Tips [101]	−	−	−	②	①	②	05/2023
RET-LLM [42]	−	①	②	−	①	①	05/2023
ChatDB [40]	−	①	②	②	②	−	06/2023
S3 [77]	③	②	②	−	①	−	07/2023
ChatDev [18]	①	②	②	②	①	②	07/2023
ToolLLM [14]	−	−	−	②	②	①	07/2023
MemoryBank [39]	−	②	②	−	①	−	07/2023
MetaGPT [23]	①	②	②	②	②	−	08/2023

Tab.1

Fig.5

Tab.2

Model	Subjective	Objective	Benchmark	Time
WebShop [80]	−	① ③	√	07/2022
Social Simulacra [98]	①	②	−	08/2022
TE [102]	−	②	−	08/2022
LIBRO [168]	−	④	−	09/2022
ReAct [59]	−	①	√	10/2022
Out of One, Many [29]	②	② ③	−	02/2023
DEPS [33]	−	①	√	02/2023
Jalil et al. [169]	−	④	−	02/2023
Reflexion [12]	−	① ③	−	03/2023
IGLU [122]	−	①	√	04/2023
Generative Agents [20]	① ②	−	−	04/2023
ToolBench [149]	−	③	√	04/2023
GITM [16]	−	①	√	05/2023
Two-Failures [162]	−	③	−	05/2023
Voyager [38]	−	①	√	05/2023
SocKET [165]	−	② ③	√	05/2023
MobileEnv [163]	−	① ③	√	05/2023
Clembench [173]	−	① ③	√	05/2023
Dialop [175]	−	②	√	06/2023
Feldt et al. [170]	−	④	−	06/2023
CO-LLM [22]	①	①	−	07/2023
Tachikuma [164]	①	①	√	07/2023
WebArena [171]	−	①	√	07/2023
RocoBench [89]	−	① ② ③	−	07/2023
AgentSims [34]	−	②	−	08/2023
AgentBench [167]	−	③	√	08/2023
BOLAA [166]	−	① ③ ④	√	08/2023
Gentopia [172]	−	③	√	08/2023
EmotionBench [160]	①	−	√	08/2023
PTB [128]	−	④	−	08/2023

Tab.3

1	Mnih V, Kavukcuoglu K, Silver D, Rusu A A, Veness J, Bellemare M G, Graves A, Riedmiller M, Fidjeland A K, Ostrovski G, Petersen S, Beattie C, Sadik A, Antonoglou I, King H, Kumaran D, Wierstra D, Legg S, Hassabis D. Human-level control through deep reinforcement learning. Nature, 2015, 518(7540): 529–533
2	Lillicrap T P, Hunt J J, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D. Continuous control with deep reinforcement learning. 2019, arXiv preprint arXiv: 1509.02971
3	Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal policy optimization algorithms. 2017, arXiv preprint arXiv: 1707.06347
4	Haarnoja T, Zhou A, Abbeel P, Levine S. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: Proceedings of the 35th International Conference on Machine Learning. 2018, 1861–1870
5	Brown T B, Mann B, Ryder N, Subbiah M, Kaplan J D, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S, Herbert-Voss A, Krueger G, Henighan T, Child R, Ramesh A, Ziegler D M, Wu J, Winter C, Hesse C, Chen M, Sigler E, Litwin M, Gray S, Chess B, Clark J, Berner C, McCandlish S, Radford A, Sutskever I, Amodei D. Language models are few-shot learners. In: Proceedings of the 34th Conference on Neural Information Processing Systems. 2020, 1877–1901
6	Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I. Language models are unsupervised multitask learners. OpenAI Blog, 2019, 1(8): 9
7	OpenAI. GPT-4 technical report. 2024, arXiv preprint arXiv: 2303.08774
8	Anthropic. Model card and evaluations for Claude models. See files.anthropic.com/production/images/Model-Card-Claude-2, 2023
9	Touvron H, Lavril T, Izacard G, Martinet X, Lachaux M A, Lacroix T, Rozière B, Goyal N, Hambro E, Azhar F, Rodriguez A, Joulin A, Grave E, Lample G. LLaMA: open and efficient foundation language models. 2023, arXiv preprint arXiv: 2302.13971
10	Touvron H, Martin L, Stone K, Albert P, Almahairi A, et al. Llama 2: open foundation and fine-tuned chat models. 2023, arXiv preprint arXiv: 2307.09288
11	Chen X, Li S, Li H, Jiang S, Qi Y, Song L. Generative adversarial user model for reinforcement learning based recommendation system. In: Proceedings of the 36th International Conference on Machine Learning. 2019, 1052–1061
12	Shinn N, Cassano F, Gopinath A, Narasimhan K, Yao S. Reflexion: language agents with verbal reinforcement learning. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2023, 36
13	Shen Y, Song K, Tan X, Li D, Lu W, Zhuang Y. HuggingGPT: solving AI tasks with ChatGPT and its friends in Hugging Face. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2023, 36
14	Qin Y, Liang S, Ye Y, Zhu K, Yan L, Lu Y, Lin Y, Cong X, Tang X, Qian B, Zhao S, Hong L, Tian R, Xie R, Zhou J, Gerstein M, Li D, Liu Z, Sun M. ToolLLM: facilitating large language models to master 16000+ real-world APIs. 2023, arXiv preprint arXiv: 2307.16789
15	Schick T, Dwivedi-Yu J, Dessì R, Raileanu R, Lomeli M, Hambro E, Zettlemoyer L, Cancedda N, Scialom T. Toolformer: language models can teach themselves to use tools. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2023, 36
16	Zhu X, Chen Y, Tian H, Tao C, Su W, Yang C, Huang G, Li B, Lu L, Wang X, Qiao Y, Zhang Z, Dai J. Ghost in the Minecraft: generally capable agents for open-world environments via large language models with text-based knowledge and memory. 2023, arXiv preprint arXiv: 2305.17144
17	Sclar M, Kumar S, West P, Suhr A, Choi Y, Tsvetkov Y. Minding language models’ (lack of) theory of mind: a plug-and-play multi-character belief tracker. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. 2023, 13960–13980
18	Qian C, Cong X, Liu W, Yang C, Chen W, Su Y, Dang Y, Li J, Xu J, Li S, Liu Z, Sun M. Communicative agents for software development. 2023, arXiv preprint arXiv: 2307.07924
19	Chen W, Su Y, Zuo J, Yang C, Yuan C, Chan C, Yu H, Lu Y, Hung Y, Qian C, Qin Y, Cong X, Xie R, Liu Z, Sun M, Zhou J. AgentVerse: facilitating multi-agent collaboration and exploring emergent behaviors in agents. arXiv preprint arXiv: 2308.10848
20	Park J S, O’Brien J, Cai C J, Morris M R, Liang P, Bernstein M S. Generative agents: interactive simulacra of human behavior. In: Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. 2023, 2
21	Zhang H, Du W, Shan J, Zhou Q, Du Y, Tenenbaum J B, Shu T, Gan C. Building cooperative embodied agents modularly with large language models. 2024, arXiv preprint arXiv: 2307.02485
22	Hong S, Zhuge M, Chen J, Zheng X, Cheng Y, Zhang C, Wang J, Wang Z, Yau S K S, Lin Z, Zhou L, Ran C, Xiao L, Wu C, Schmidhuber J. MetaGPT: meta programming for a multi-agent collaborative framework. 2023, arXiv preprint arXiv: 2308.00352
23	Dong Y, Jiang X, Jin Z, Li G. Self-collaboration code generation via ChatGPT. 2023, arXiv preprint arXiv: 2304.07590
24	Serapio-García G, Safdari M, Crepy C, Sun L, Fitz S, Romero P, Abdulhai M, Faust A, Matarić M. Personality traits in large language models. 2023, arXiv preprint arXiv: 2307.00184
25	Johnson J A. Measuring thirty facets of the five factor model with a 120-item public domain inventory: development of the IPIP-NEO-120. Journal of Research in Personality, 2014, 51: 78–89
26	John O P, Donahue E M, Kentle R L. Big five inventory. Journal of Personality and Social Psychology, 1991
27	Deshpande A, Murahari V, Rajpurohit T, Kalyan A, Narasimhan K. Toxicity in ChatGPT: analyzing persona-assigned language models. In: Proceedings of Findings of the Association for Computational Linguistics. 2023, 1236–1270
28	Wang L, Zhang J, Yang H, Chen Z, Tang J, Zhang Z, Chen X, Lin Y, Song R, Zhao W X, Xu J, Dou Z, Wang J, Wen J R. User behavior simulation with large language model based agents. 2024, arXiv preprint arXiv: 2306.02552
29	Argyle L P, Busby E C, Fulda N, Gubler J R, Rytting C, Wingate D. Out of one, many: using language models to simulate human samples. Political Analysis, 2023, 31(3): 337–351
30	Fischer K A. Reflective linguistic programming (RLP): a stepping stone in socially-aware AGI (socialAGI). 2023, arXiv preprint arXiv: 2305.12647
31	Rana K, Haviland J, Garg S, Abou-Chakra J, Reid I, Suenderhauf N. SayPlan: grounding large language models using 3D scene graphs for scalable robot task planning. In: Proceedings of the 7th Conference on Robot Learning. 2023, 23–72
32	Zhu A, Martin L, Head A, Callison-Burch C. CALYPSO: LLMs as dungeon master’s assistants. In: Proceedings of the 19th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment. 2023, 380–390
33	Wang Z, Cai S, Chen G, Liu A, Ma X, Liang Y. Describe, explain, plan and select: interactive planning with large language models enables open-world multi-task agents. 2023, arXiv preprint arXiv: 2302.01560
34	Lin J, Zhao H, Zhang A, Wu Y, Ping H, Chen Q. AgentSims: an open-source sandbox for large language model evaluation. 2023, arXiv preprint arXiv: 2308.04026
35	Wang B, Liang X, Yang J, Huang H, Wu S, Wu P, Lu L, Ma Z, Li Z. Enhancing large language model with self-controlled memory framework. 2024, arXiv preprint arXiv: 2304.13343
36	Ng Y, Miyashita D, Hoshi Y, Morioka Y, Torii O, Kodama T, Deguchi J. SimplyRetrieve: a private and lightweight retrieval-centric generative AI tool. 2023, arXiv preprint arXiv: 2308.03983
37	Huang Z, Gutierrez S, Kamana H, Macneil S. Memory sandbox: transparent and interactive memory management for conversational agents. In: Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. 2023, 97
38	Wang G, Xie Y, Jiang Y, Mandlekar A, Xiao C, Zhu Y, Fan L, Anandkumar A. Voyager: an open-ended embodied agent with large language models. 2023, arXiv preprint arXiv: 2305.16291
39	Zhong W, Guo L, Gao Q, Ye H, Wang Y. MemoryBank: enhancing large language models with long-term memory. 2023, arXiv preprint arXiv: 2305.10250
40	Hu C, Fu J, Du C, Luo S, Zhao J, Zhao H. ChatDB: augmenting LLMs with databases as their symbolic memory. 2023, arXiv preprint arXiv: 2306.03901
41	Zhou X, Li G, Liu Z. LLM as DBA. 2023, arXiv preprint arXiv: 2308.05481
42	Modarressi A, Imani A, Fayyaz M, Schütze H. RET-LLM: towards a general read-write memory for large language models. 2023, arXiv preprint arXiv: 2305.14322
43	Schuurmans D. Memory augmented large language models are computationally universal. 2023, arXiv preprint arXiv: 2301.04589
44	Zhao A, Huang D, Xu Q, Lin M, Liu Y J, Huang G. ExpeL: LLM agents are experiential learners. 2023, arXiv preprint arXiv: 2308.10144
45	Wei J, Wang X, Schuurmans D, Bosma M, Ichter B, Xia F, Chi E H, Le Q V, Zhou D. Chain-of-thought prompting elicits reasoning in large language models. In: Proceedings of the 36th Conference on Neural Information Processing Systems. 2022, 24824–24837
46	Kojima T, Gu S S, Reid M, Matsuo Y, Iwasawa Y. Large language models are zero-shot reasoners. In: Proceedings of the 36th Conference on Neural Information Processing Systems. 2022, 22199–22213
47	Raman S S, Cohen V, Rosen E, Idrees I, Paulius D, Tellex S. Planning with large language models via corrective re-prompting. In: Proceedings of Foundation Models for Decision Making Workshop at Neural Information Processing Systems. 2022
48	Xu B, Peng Z, Lei B, Mukherjee S, Liu Y, Xu D. ReWOO: decoupling reasoning from observations for efficient augmented language models. 2023, arXiv preprint arXiv: 2305.18323
49	Wang X, Wei J, Schuurmans D, Le Q V, Chi E H, Narang S, Chowdhery A, Zhou D. Self-consistency improves chain of thought reasoning in language models. In: Proceedings of the 11th International Conference on Learning Representations. 2023
50	Yao S, Yu D, Zhao J, Shafran I, Griffiths T L, Cao Y, Narasimhan K. Tree of thoughts: deliberate problem solving with large language models. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2023, 36
51	Wang Y, Jiang Z, Chen Z, Yang F, Zhou Y, Cho E, Fan X, Huang X, Lu Y, Yang Y. RecMind: large language model powered agent for recommendation. 2023, arXiv preprint arXiv: 2308.14296
52	Besta M, Blach N, Kubicek A, Gerstenberger R, Podstawski M, Gianinazzi L, Gajda J, Lehmann T, Niewiadomski H, Nyczyk P, Hoefler T. Graph of thoughts: solving elaborate problems with large language models. 2024, arXiv preprint arXiv: 2308.09687
53	Sel B, Al-Tawaha A, Khattar V, Jia R, Jin M. Algorithm of thoughts: enhancing exploration of ideas in large language models. 2023, arXiv preprint arXiv: 2308.10379
54	Huang W, Abbeel P, Pathak D, Mordatch I. Language models as zero-shot planners: extracting actionable knowledge for embodied agents. In: Proceedings of the 39th International Conference on Machine Learning. 2022, 9118–9147
55	Gramopadhye M, Szafir D. Generating executable action plans with environmentally-aware language models. In: Proceedings of 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems. 2023, 3568–3575
56	Hao S, Gu Y, Ma H, Hong J, Wang Z, Wang D, Hu Z. Reasoning with language model is planning with world model. In: Proceedings of 2023 Conference on Empirical Methods in Natural Language Processing. 2023, 8154–8173
57	Liu B, Jiang Y, Zhang X, Liu Q, Zhang S, Biswas J, Stone P. LLM+P: empowering large language models with optimal planning proficiency. 2023, arXiv preprint arXiv: 2304.11477
58	Dagan G, Keller F, Lascarides A. Dynamic planning with a LLM. 2023, arXiv preprint arXiv: 2308.06391
59	Yao S, Zhao J, Yu D, Du N, Shafran I, Narasimhan K R, Cao Y. ReAct: synergizing reasoning and acting in language models. In: Proceedings of the 11th International Conference on Learning Representations. 2023
60	Song C H, Sadler B M, Wu J, Chao W L, Washington C, Su Y. LLM-planner: few-shot grounded planning for embodied agents with large language models. In: Proceedings of 2023 IEEE/CVF International Conference on Computer Vision. 2023, 2986–2997
61	Huang W, Xia F, Xiao T, Chan H, Liang J, Florence P, Zeng A, Tompson J, Mordatch I, Chebotar Y, Sermanet P, Jackson T, Brown N, Luu L, Levine S, Hausman K, Ichter B. Inner monologue: embodied reasoning through planning with language models. In: Proceedings of the 6th Conference on Robot Learning. 2023, 1769–1782
62	Madaan A, Tandon N, Gupta P, Hallinan S, Gao L, Wiegreffe S, Alon U, Dziri N, Prabhumoye S, Yang Y, Gupta S, Majumder B P, Hermann K, Welleck S, Yazdanbakhsh A, Clark P. Self-refine: iterative refinement with self-feedback. Advances in Neural Information Processing Systems, 2024, 36
63	Miao N, Teh Y W, Rainforth T. SelfCheck: using LLMs to zero-shot check their own step-by-step reasoning. 2023, arXiv preprint arXiv: 2308.00436
64	Chen P L, Chang C S. InterAct: exploring the potentials of ChatGPT as a cooperative agent. 2023, arXiv preprint arXiv: 2308.01552
65	Chen Z, Zhou K, Zhang B, Gong Z, Zhao X, Wen J R. ChatCoT: tool-augmented chain-of-thought reasoning on chat-based large language models. In: Proceedings of Findings of the Association for Computational Linguistics. 2023, 14777–14790
66	Nakano R, Hilton J, Balaji S, Wu J, Ouyang L, Kim C, Hesse C, Jain S, Kosaraju V, Saunders W, Jiang X, Cobbe K, Eloundou T, Krueger G, Button K, Knight M, Chess B, Schulman J. WebGPT: browser-assisted question-answering with human feedback. 2022, arXiv preprint arXiv: 2112.09332
67	Ruan J, Chen Y, Zhang B, Xu Z, Bao T, Du G, Shi S, Mao H, Li Z, Zeng X, Zhao R. TPTU: large language model-based AI agents for task planning and tool usage. 2023, arXiv preprint arXiv: 2308.03427
68	Patil S G, Zhang T, Wang X, Gonzalez J E. Gorilla: large language model connected with massive APIs. 2023, arXiv preprint arXiv: 2305.15334
69	Li M, Zhao Y, Yu B, Song F, Li H, Yu H, Li Z, Huang F, Li Y. API-Bank: a comprehensive benchmark for tool-augmented LLMs. In: Proceedings of 2023 Conference on Empirical Methods in Natural Language Processing. 2023, 3102–3116
70	Song Y, Xiong W, Zhu D, Wu W, Qian H, Song M, Huang H, Li C, Wang K, Yao R, Tian Y, Li S. RestGPT: connecting large language models with real-world RESTful APIs. 2023, arXiv preprint arXiv: 2306.06624
71	Liang Y, Wu C, Song T, Wu W, Xia Y, Liu Y, Ou Y, Lu S, Ji L, Mao S, Wang Y, Shou L, Gong M, Duan N. TaskMatrix.AI: completing tasks by connecting foundation models with millions of APIs. 2023, arXiv preprint arXiv: 2303.16434
72	Karpas E, Abend O, Belinkov Y, Lenz B, Lieber O, Ratner N, Shoham Y, Bata H, Levine Y, Leyton-Brown K, Muhlgay D, Rozen N, Schwartz E, Shachaf G, Shalev-Shwartz S, Shashua A, Tenenholtz M. MRKL systems: a modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning. 2022, arXiv preprint arXiv: 2205.00445
73	Ge Y, Hua W, Mei K, Tan J, Xu S, Li Z, Zhang Y. OpenAGI: when LLM meets domain experts. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2023, 36
74	Surís D, Menon S, Vondrick C. ViperGPT: visual inference via Python execution for reasoning. 2023, arXiv preprint arXiv: 2303.08128
75	Bran A M, Cox S, Schilter O, Baldassari C, White A D, Schwaller P. ChemCrow: augmenting large-language models with chemistry tools. 2023, arXiv preprint arXiv: 2304.05376
76	Yang Z, Li L, Wang J, Lin K, Azarnasab E, Ahmed F, Liu Z, Liu C, Zeng M, Wang L. MM-REACT: prompting ChatGPT for multimodal reasoning and action. 2023, arXiv preprint arXiv: 2303.11381
77	Gao C, Lan X, Lu Z, Mao J, Piao J, Wang H, Jin D, Li Y. S3: social-network simulation system with large language model-empowered agents. 2023, arXiv preprint arXiv: 2307.14984
78	Ichter B, Brohan A, Chebotar Y, Finn C, Hausman K, et al. Do as I can, not as I say: grounding language in robotic affordances. In: Proceedings of the 6th Conference on Robot Learning. 2023, 287–318
79	Liu H, Sferrazza C, Abbeel P. Chain of hindsight aligns language models with feedback. arXiv preprint arXiv: 2302.02676
80	Yao S, Chen H, Yang J, Narasimhan K. WebShop: towards scalable real-world Web interaction with grounded language agents. In: Proceedings of the 36th Conference on Neural Information Processing Systems. 2022, 20744–20757
81	Dan Y, Lei Z, Gu Y, Li Y, Yin J, Lin J, Ye L, Tie Z, Zhou Y, Wang Y, Zhou A, Zhou Z, Chen Q, Zhou J, He L, Qiu X. EduChat: a large-scale language model-based chatbot system for intelligent education. 2023, arXiv preprint arXiv: 2308.02773
82	Lin B Y, Fu Y, Yang K, Brahman F, Huang S, Bhagavatula C, Ammanabrolu P, Choi Y, Ren X. SwiftSage: a generative agent with fast and slow thinking for complex interactive tasks. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2023, 36
83	Evans J S B T, Stanovich K E. Dual-process theories of higher cognition: advancing the debate. Perspectives on Psychological Science, 2013, 8(3): 223–241
84	Liu R, Yang R, Jia C, Zhang G, Zhou D, Dai A M, Yang D, Vosoughi S. Training socially aligned language models on simulated social interactions. 2023, arXiv preprint arXiv: 2305.16960
85	Weng X, Gu Y, Zheng B, Chen S, Stevens S, Wang B, Sun H, Su Y. Mind2Web: towards a generalist agent for the Web. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2023, 36
86	Sun R, Arik S O, Nakhost H, Dai H, Sinha R, Yin P, Pfister T. SQL-PaLM: improved large language model adaptation for text-to-SQL. 2023, arXiv preprint arXiv: 2306.00739
87	Yao W, Heinecke S, Niebles J C, Liu Z, Feng Y, Xue L, Murthy R, Chen Z, Zhang J, Arpit D, Xu R, Mui P, Wang H, Xiong C, Savarese S. Retroformer: retrospective large language agents with policy gradient optimization. 2023, arXiv preprint arXiv: 2308.02151
88	Shu Y, Zhang H, Gu H, Zhang P, Lu T, Li D, Gu N. RAH! RecSys-assistant-human: a human-centered recommendation framework with LLM agents. 2023, arXiv preprint arXiv: 2308.09904
89	Mandi Z, Jain S, Song S. RoCo: dialectic multi-robot collaboration with large language models. 2023, arXiv preprint arXiv: 2307.04738
90	Zhang C, Liu L, Wang J, Wang C, Sun X, Wang H, Cai M. PREFER: prompt ensemble learning via feedback-reflect-refine. 2023, arXiv preprint arXiv: 2308.12033
91	Du Y, Li S, Torralba A, Tenenbaum J B, Mordatch I. Improving factuality and reasoning in language models through multiagent debate. 2023, arXiv preprint arXiv: 2305.14325
92	Zhang C, Yang Z, Liu J, Han Y, Chen X, Huang Z, Fu B, Yu G. AppAgent: multimodal agents as smartphone users. 2023, arXiv preprint arXiv: 2312.13771
93	Madaan A, Tandon N, Clark P, Yang Y. Memory-assisted prompt editing to improve GPT-3 after deployment. In: Proceedings of 2022 Conference on Empirical Methods in Natural Language Processing. 2022, 2833–2861
94	Colas C, Teodorescu L, Oudeyer P Y, Yuan X, Côté M A. Augmenting autotelic agents with large language models. In: Proceedings of the 2nd Conference on Lifelong Learning Agents. 2023, 205–226
95	Nascimento N, Alencar P, Cowan D. Self-adaptive large language model (LLM)-based multiagent systems. In: Proceedings of 2023 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion. 2023, 104–109
96	Saha S, Hase P, Bansal M. Can language models teach weaker agents? Teacher explanations improve students via personalization. 2023, arXiv preprint arXiv: 2306.09299
97	Zhuge M, Liu H, Faccio F, Ashley D R, Csordás R, Gopalakrishnan A, Hamdi A, Hammoud H A A K, Herrmann V, Irie K, Kirsch L, Li B, Li G, Liu S, Mai J, Piękos P, Ramesh A, Schlag I, Shi W, Stanić A, Wang W, Wang Y, Xu M, Fan D P, Ghanem B, Schmidhuber J. Mindstorms in natural language-based societies of mind. 2023, arXiv preprint arXiv: 2305.17066
98	Park J S, Popowski L, Cai C, Morris M R, Liang P, Bernstein M S. Social simulacra: creating populated prototypes for social computing systems. In: Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology. 2022, 74
99	Li G, Hammoud H A A K, Itani H, Khizbullin D, Ghanem B. CAMEL: communicative agents for "mind" exploration of large language model society. 2023, arXiv preprint arXiv: 2303.17760
100	AutoGPT. See github.com/Significant-Gravitas/Auto, 2023
101	Chen L, Wang L, Dong H, Du Y, Yan J, Yang F, Li S, Zhao P, Qin S, Rajmohan S, Lin Q, Zhang D. Introspective tips: large language model for in-context decision making. 2023, arXiv preprint arXiv: 2305.11598
102	Aher G V, Arriaga R I, Kalai A T. Using large language models to simulate multiple humans and replicate human subject studies. In: Proceedings of the 40th International Conference on Machine Learning. 2023, 337–371
103	Akata E, Schulz L, Coda-Forno J, Oh S J, Bethge M, Schulz E. Playing repeated games with large language models. 2023, arXiv preprint arXiv: 2305.16867
104	Ma Z, Mei Y, Su Z. Understanding the benefits and challenges of using large language model-based conversational agents for mental well-being support. In: Proceedings of AMIA Symposium. 2023, 1105–1114
105	Ziems C, Held W, Shaikh O, Chen J, Zhang Z, Yang D. Can large language models transform computational social science? 2024, arXiv preprint arXiv: 2305.03514
106	Horton J J. Large language models as simulated economic agents: what can we learn from homo silicus? 2023, arXiv preprint arXiv: 2301.07543
107	Li S, Yang J, Zhao K. Are you in a masquerade? Exploring the behavior and impact of large language model driven social bots in online social networks. 2023, arXiv preprint arXiv: 2307.10337
108	Li C, Su X, Han H, Xue C, Zheng C, Fan C. Quantifying the impact of large language models on collective opinion dynamics. 2023, arXiv preprint arXiv: 2308.03313
109	Kovač G, Portelas R, Dominey P F, Oudeyer P Y. The SocialAI school: insights from developmental psychology towards artificial socio-cultural agents. 2023, arXiv preprint arXiv: 2307.07871
110	Williams R, Hosseinichimeh N, Majumdar A, Ghaffarzadegan N. Epidemic modeling with generative agents. 2023, arXiv preprint arXiv: 2307.04986
111	Shi J, Zhao J, Wang Y, Wu X, Li J, He L. CGMI: configurable general multi-agent interaction framework. 2023, arXiv preprint arXiv: 2308.12503
112	Cui J, Li Z, Yan Y, Chen B, Yuan L. ChatLaw: open-source legal large language model with integrated external knowledge bases. 2023, arXiv preprint arXiv: 2306.16092
113	Hamilton S. Blind judgement: agent-based supreme court modelling with GPT. 2023, arXiv preprint arXiv: 2301.05327
114	Bail C A. Can generative AI improve social science? 2023
115	D A, Boiko R, MacKnight G Gomes . Emergent autonomous scientific research capabilities of large language models. 2023, arXiv preprint arXiv: 2304.05332
116	Y, Kang J Kim . ChatMOF: an autonomous AI system for predicting and generating metal-organic frameworks. 2023, arXiv preprint arXiv: 2308.01423
117	M, Swan T, Kido E, Roland R P D Santos . Math agents: computational infrastructure, mathematical embedding, and genomics. 2023, arXiv preprint arXiv: 2307.02502
118	I, Drori S, Zhang R, Shuttleworth L, Tang A, Lu E, Ke K, Liu L, Chen S, Tran N, Cheng R, Wang N, Singh T L, Patti J, Lynch A, Shporer N, Verma E, Wu G Strang . A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level. Proceedings of the National Academy of Sciences of the United States of America, 2022, 119( 32): e2123433119
119	M, Chen J, Tworek H, Jun Q, Yuan Oliveira Pinto H P, de , et al.. Evaluating large language models trained on code. 2021, arXiv preprint arXiv: 2107.03374
120	M, Liffiton B E, Sheese J, Savelka P Denny . CodeHelp: using large language models with guardrails for scalable support in programming classes. In: Proceedings of the 23rd Koli Calling International Conference on Computing Education Research. 2023, 8
121	J K, Matelsky F, Parodi T, Liu R D, Lange K P Kording . A large language model-assisted education tool to provide feedback on open-ended responses. 2023, arXiv preprint arXiv: 2308.02439
122	N, Mehta M, Teruel P F, Sanz X, Deng A H, Awadallah J Kiseleva . Improving grounded language understanding in a collaborative environment by interacting with agents through help feedback. 2024, arXiv preprint arXiv: 2304.10750
123	SmolModels. See Githubcom/smol-ai/developer website, 2023
124	DemoGPT. See Github.com/melih-unsal/Demo website, 2023
125	GPT-engineer. See Github.com/AntonOsika/gpt website, 2023
126	H, Li Y, Hao Y, Zhai Z Qian . The hitchhiker’s guide to program analysis: a journey with large language models. 2023, arXiv preprint arXiv: 2308.00245
127	He Z, Wu H, Zhang X, Yao X, Zheng S, Zheng H, Yu B. ChatEDA: a large language model powered autonomous agent for EDA. In: Proceedings of the 5th ACM/IEEE Workshop on Machine Learning for CAD. 2023, 1−6
128	G, Deng Y, Liu V, Mayoral-Vilches P, Liu Y, Li Y, Xu T, Zhang Y, Liu M, Pinzger S Rass . PentestGPT: an LLM-empowered automatic penetration testing tool. 2023, arXiv preprint arXiv: 2308.06782
129	Y, Xia M, Shenoy N, Jazdi M Weyrich . Towards autonomous system: flexible modular production system enhanced with large language model agents. In: Proceedings of the 2023 IEEE 28th International Conference on Emerging Technologies and Factory Automation. 2023, 1−8
130	O, Ogundare S, Madasu N Wiggins . Industrial engineering with large language models: a case study of chatGPT’s performance on oil & gas problems. In: Proceedings of the 2023 11th International Conference on Control, Mechatronics and Automation. 2023, 458−461
131	B, Hu C, Zhao P, Zhang Z, Zhou Y, Yang Z, Xu B Liu . Enabling intelligent interactions between an agent and an LLM: a reinforcement learning approach. 2024, arXiv preprint arXiv: 2306.03604
132	Y, Wu S Y, Min Y, Bisk R, Salakhutdinov A, Azaria Y, Li T, Mitchell S Prabhumoye . Plan, eliminate, and track−language models are good teachers for embodied agents. 2023, arXiv preprint arXiv: 2305.02412
133	D, Zhang L, Chen S, Zhang H, Xu Z, Zhao K Yu . Large language models are semi-parametric reinforcement learning agents. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2023, 36
134	Di P N, Byravan A, Hasenclever L, Wulfmeier M, Heess N, Riedmiller M. Towards a unified agent with foundation models. 2023, arXiv preprint arXiv: 2307.09668
135	I, Dasgupta C, Kaeser-Chen K, Marino A, Ahuja S, Babayan F, Hill R Fergus . Collaborating with language models for embodied reasoning. 2023, arXiv preprint arXiv: 2302.00763
136	W, Zhou X, Peng M O Riedl . Dialogue shaping: empowering agents through NPC interaction. 2023, arXiv preprint arXiv: 2307.15833
137	K, Nottingham P, Ammanabrolu A, Suhr Y, Choi H, Hajishirzi S, Singh R Fox . Do embodied agents dream of pixelated sheep: embodied decision making using language guided world modelling. In: Proceedings of the 40th International Conference on Machine Learning. 2023, 26311–26325
138	Z, Wu Z, Wang X, Xu J, Lu H Yan . Embodied task planning with large language models. 2023, arXiv preprint arXiv: 2307.01848
139	J, Wu R, Antonova A, Kan M, Lepert A, Zeng S, Song J, Bohg S, Rusinkiewicz T Funkhouser . TidyBot: personalized robot assistance with large language models. In: Proceedings of 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems. 2023, 3546−3553
140	AgentGPT. See Github.com/reworkd/Agent website, 2023
141	Ai-legion. See Github.com/eumemic/ai website, 2023
142	AGiXT. See Github.com/Josh-XT/AGiXT website, 2023
143	Xlang. See Github.com/xlang-ai/xlang website, 2023
144	Babyagi. See Github.com/yoheinakajima website, 2023
145	LangChain. See Docs.langchain.com/docs/ website, 2023
146	WorkGPT. See Github.com/team-openpm/workgpt website, 2023
147	LoopGPT. See Github.com/farizrahman4u/loopgpt website, 2023
148	GPT-researcher. See Github.com/assafelovic/gpt website, 2023
149	Qin Y, Hu S, Lin Y, Chen W, Ding N, Cui G, Zeng Z, Huang Y, Xiao C, Han C, Fung Y R, Su Y, Wang H, Qian C, Tian R, Zhu K, Liang S, Shen X, Xu B, Zhang Z, Ye Y, Li B, Tang Z, Yi J, Zhu Y, Dai Z, Yan L, Cong X, Lu Y, Zhao W, Huang Y, Yan J, Han X, Sun X, Li D, Phang J, Yang X, Wu T, Ji H, Liu Z, Sun M. Tool learning with foundation models. 2023, arXiv preprint arXiv: 2304.08354
150	Transformers agent. See Huggingface.co/docs/transformers/transformers website, 2023
151	Mini-agi. See Github.com/muellerberndt/mini website, 2023
152	SuperAGI. See Github.com/TransformerOptimus/Super website, 2023
153	Wu Q, Bansal G, Zhang J, Wu Y, Li B, Zhu E, Jiang L, Zhang X, Zhang S, Liu J, Awadallah A H, White R W, Burger D, Wang C. AutoGen: enabling next-gen LLM applications via multi-agent conversation. 2023, arXiv preprint arXiv: 2308.08155
154	Grossmann I, Feinberg M, Parker D C, Christakis N A, Tetlock P E, Cunningham W A. AI and the transformation of social science research: careful bias management and data fidelity are key. Science, 2023, 380(6650): 1108–1109
155	Huang X, Lian J, Lei Y, Yao J, Lian D, Xie X. Recommender AI agent: integrating large language models for interactive recommendations. 2023, arXiv preprint arXiv: 2308.16505
156	Zhang C, Yang K, Hu S, Wang Z, Li G, Sun Y, Zhang C, Zhang Z, Liu A, Zhu S C, Chang X, Zhang J, Yin F, Liang Y, Yang Y. ProAgent: building proactive cooperative agents with large language models. 2024, arXiv preprint arXiv: 2308.11339
157	Xiang J, Tao T, Gu Y, Shu T, Wang Z, Yang Z, Hu Z. Language models meet world models: embodied experiences enhance language models. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2023, 36
158	Lee M, Srivastava M, Hardy A, Thickstun J, Durmus E, Paranjape A, Gerard-Ursin I, Li X L, Ladhak F, Rong F, Wang R E, Kwon M, Park J S, Cao H, Lee T, Bommasani R, Bernstein M, Liang P. Evaluating human-language model interaction. 2024, arXiv preprint arXiv: 2212.09746
159	Krishna R, Lee D, Fei-Fei L, Bernstein M S. Socially situated artificial intelligence enables learning from human interaction. Proceedings of the National Academy of Sciences of the United States of America, 2022, 119(39): e2115730119
160	Huang J T, Lam M H, Li E J, Ren S, Wang W, Jiao W, Tu Z, Lyu M R. Emotionally numb or empathetic? Evaluating how LLMs feel using EmotionBench. 2024, arXiv preprint arXiv: 2308.03656
161	Chan C M, Chen W, Su Y, Yu J, Xue W, Zhang S, Fu J, Liu Z. ChatEval: towards better LLM-based evaluators through multi-agent debate. 2023, arXiv preprint arXiv: 2308.07201
162	Chen A, Phang J, Parrish A, Padmakumar V, Zhao C, Bowman S R, Cho K. Two failures of self-consistency in the multi-step reasoning of LLMs. 2024, arXiv preprint arXiv: 2305.14279
163	Zhang D, Xu H, Zhao Z, Chen L, Cao R, Yu K. Mobile-env: an evaluation platform and benchmark for LLM-GUI interaction. 2024, arXiv preprint arXiv: 2305.08144
164	Liang Y, Zhu L, Yang Y. Tachikuma: understanding complex interactions with multi-character and novel objects by large language models. 2023, arXiv preprint arXiv: 2307.12573
165	Choi M, Pei J, Kumar S, Shu C, Jurgens D. Do LLMs understand social knowledge? Evaluating the sociability of large language models with socKET benchmark. In: Proceedings of 2023 Conference on Empirical Methods in Natural Language Processing. 2023, 11370–11403
166	Liu Z, Yao W, Zhang J, Xue L, Heinecke S, Murthy R, Feng Y, Chen Z, Niebles J C, Arpit D, Xu R, Mui P, Wang H, Xiong C, Savarese S. BOLAA: benchmarking and orchestrating LLM-augmented autonomous agents. 2023, arXiv preprint arXiv: 2308.05960
167	Liu X, Yu H, Zhang H, Xu Y, Lei X, Lai H, Gu Y, Ding H, Men K, Yang K, Zhang S, Deng X, Zeng A, Du Z, Zhang C, Shen S, Zhang T, Su Y, Sun H, Huang M, Dong Y, Tang J. AgentBench: evaluating LLMs as agents. 2023, arXiv preprint arXiv: 2308.03688
168	Kang S, Yoon J, Yoo S. Large language models are few-shot testers: exploring LLM-based general bug reproduction. In: Proceedings of the 45th IEEE/ACM International Conference on Software Engineering. 2023, 2312–2323
169	Jalil S, Rafi S, LaToza T D, Moran K, Lam W. ChatGPT and software testing education: promises & perils. In: Proceedings of 2023 IEEE International Conference on Software Testing, Verification and Validation Workshops. 2023, 4130–4137
170	Feldt R, Kang S, Yoon J, Yoo S. Towards autonomous testing agents via conversational large language models. In: Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering. 2023, 1688–1693
171	Zhou S, Xu F F, Zhu H, Zhou X, Lo R, Sridhar A, Cheng X, Ou T, Bisk Y, Fried D, Alon U, Neubig G. WebArena: a realistic Web environment for building autonomous agents. 2023, arXiv preprint arXiv: 2307.13854
172	Xu B, Liu X, Shen H, Han Z, Li Y, Yue M, Peng Z, Liu Y, Yao Z, Xu D. Gentopia.AI: a collaborative platform for tool-augmented LLMs. In: Proceedings of 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. 2023, 237–245
173	Chalamalasetti K, Götze J, Hakimov S, Madureira B, Sadler P, Schlangen D. clembench: using game play to evaluate chat-optimized language models as conversational agents. In: Proceedings of 2023 Conference on Empirical Methods in Natural Language Processing. 2023, 11174–11219
174	Banerjee D, Singh P, Avadhanam A, Srivastava S. Benchmarking LLM powered chatbots: methods and metrics. 2023, arXiv preprint arXiv: 2308.04624
175	Lin J, Tomlin N, Andreas J, Eisner J. Decision-oriented dialogue for human-AI collaboration. 2023, arXiv preprint arXiv: 2305.20076
176	Zhao W X, Zhou K, Li J, Tang T, Wang X, Hou Y, Min Y, Zhang B, Zhang J, Dong Z, Du Y, Yang C, Chen Y, Chen Z, Jiang J, Ren R, Li Y, Tang X, Liu Z, Liu P, Nie J Y, Wen J R. A survey of large language models. 2023, arXiv preprint arXiv: 2303.18223
177	Yang J, Jin H, Tang R, Han X, Feng Q, Jiang H, Zhong S, Yin B, Hu X. Harnessing the power of LLMs in practice: a survey on ChatGPT and beyond. ACM Transactions on Knowledge Discovery from Data, 2024, doi:
178	Wang Y, Zhong W, Li L, Mi F, Zeng X, Huang W, Shang L, Jiang X, Liu Q. Aligning large language models with human: a survey. 2023, arXiv preprint arXiv: 2307.12966
179	Huang J, Chang K C C. Towards reasoning in large language models: a survey. In: Proceedings of Findings of the Association for Computational Linguistics: ACL 2023. 2023, 1049–1065
180	Mialon G, Dessì R, Lomeli M, Nalmpantis C, Pasunuru R, Raileanu R, Rozière B, Schick T, Dwivedi-Yu J, Celikyilmaz A, Grave E, LeCun Y, Scialom T. Augmented language models: a survey. 2023, arXiv preprint arXiv: 2302.07842
181	Chang Y, Wang X, Wang J, Wu Y, Yang L, Zhu K, Chen H, Yi X, Wang C, Wang Y, Ye W, Zhang Y, Chang Y, Yu P S. A survey on evaluation of large language models. ACM Transactions on Intelligent Systems and Technology, 2023, doi:
182	Chang T A, Bergen B K. Language model behavior: a comprehensive survey. Computational Linguistics, 2024, doi:
183	Li C, Wang J, Zhu K, Zhang Y, Hou W, Lian J, Xie X. EmotionPrompt: leveraging psychology for large language models enhancement via emotional stimulus. 2023, arXiv preprint arXiv: 2307.11760
184	Zhuo T Y, Li Z, Huang Y, Shiri F, Wang W, Haffari G, Li Y F. On robustness of prompt-based semantic parsing with large pre-trained language model: an empirical study on Codex. In: Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics. 2023, 1090–1102
185	Gekhman Z, Oved N, Keller O, Szpektor I, Reichart R. On the robustness of dialogue history representation in conversational question answering: a comprehensive study and a new prompt-based method. Transactions of the Association for Computational Linguistics, 2023, 11(11): 351–366
186	Ji Z, Lee N, Frieske R, Yu T, Su D, Xu Y, Ishii E, Bang Y J, Madotto A, Fung P. Survey of hallucination in natural language generation. ACM Computing Surveys, 2023, 55(12): 248
