Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP


Front. Comput. Sci.    2025, Vol. 19 Issue (8) : 198343    https://doi.org/10.1007/s11704-024-40678-2
Artificial Intelligence
Tool learning with large language models: a survey
Changle QU1, Sunhao DAI1, Xiaochi WEI2, Hengyi CAI3, Shuaiqiang WANG2, Dawei YIN2, Jun XU1, Ji-Rong WEN1
1. Gaoling School of Artificial Intelligence, Renmin University of China, Beijing 100872, China
2. Baidu Inc., Beijing 100193, China
3. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100864, China
Abstract

Recently, tool learning with large language models (LLMs) has emerged as a promising paradigm for augmenting the capabilities of LLMs to tackle highly complex problems. Despite growing attention and rapid advancements in this field, the existing literature remains fragmented and lacks systematic organization, posing barriers to entry for newcomers. This gap motivates us to conduct a comprehensive survey of existing works on tool learning with LLMs. In this survey, we review the existing literature from two primary aspects: (1) why tool learning is beneficial, and (2) how tool learning is implemented, enabling a comprehensive understanding of tool learning with LLMs. We first explore the “why” by reviewing both the benefits of tool integration and the inherent benefits of the tool learning paradigm from six specific aspects. In terms of “how”, we systematically review the literature according to a taxonomy of four key stages in the tool learning workflow: task planning, tool selection, tool calling, and response generation. Additionally, we provide a detailed summary of existing benchmarks and evaluation methods, categorizing them according to their relevance to different stages. Finally, we discuss current challenges and outline potential future directions, aiming to inspire both researchers and industrial developers to further explore this emerging and promising area.

Keywords: tool learning; large language models; agent
Corresponding Author(s): Jun XU   
Just Accepted Date: 15 October 2024   Issue Date: 21 November 2024
 Cite this article:   
Changle QU, Sunhao DAI, Xiaochi WEI, et al. Tool learning with large language models: a survey[J]. Front. Comput. Sci., 2025, 19(8): 198343.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-024-40678-2
https://academic.hep.com.cn/fcs/EN/Y2025/V19/I8/198343
Fig.1  An illustration of the development trajectory of tool learning. We present the statistics of papers with the publication year and venue, with each venue uniquely represented by a distinct color. For each time period, we have selected a range of representative landmark studies that have significantly contributed to the field. (Note that we use the institution of the first author as the representing institution in the figure.)
Fig.2  The overall structure of this paper
Fig.3  The overall workflow for tool learning with large language models. The left part illustrates the four stages of tool learning: task planning, tool selection, tool calling, and response generation. The right part shows two paradigms of tool learning: tool learning with one-step task solving and tool learning with iterative task solving
Benchmark Focus # Tools # Instances Tool Source Multi-tool? Executable? Time
General benchmarks
API-Bank [30] ①, ②, ③, ④ 73 314 Manual Creation 2023-04
APIBench [53] ②, ③ 1,645 16,450 Public Models × × 2023-05
ToolBench1 [33] ②, ③ 232 2,746 Public APIs × 2023-05
ToolAlpaca [19] ②, ③, ④ 426 3,938 Public APIs × × 2023-06
RestBench [93] ①, ②, ③ 94 157 RESTful APIs × 2023-06
ToolBench2 [18] ①, ②, ③, ④ 16,464 126,486 Rapid API 2023-07
MetaTool [31] ①, ② 199 21,127 OpenAI Plugins × 2023-10
TaskBench [188] ①, ②, ③ 103 28,271 Public APIs 2023-11
T-Eval [32] ①, ②, ③ 15 533 Manual Creation 2023-12
ToolEyes [138] ①, ②, ③, ④ 568 382 Manual Creation 2024-01
UltraTool [139] ①, ②, ③ 2,032 5,824 Manual Creation × 2024-01
API-BLEND [141] ②, ③ 189,040 Existing Datasets 2024-02
Seal-Tools [140] ②, ③ 4,076 14,076 Manual Creation × 2024-05
ShortcutsBench [142] ②, ③ 1,414 7,627 Public APIs 2024-07
GTA [143] ②, ③, ④ 14 229 Public APIs 2024-07
WTU-Eval [144] 4 916 BMTools 2024-07
AppWorld [145] ①, ②, ③ 457 750 FastAPI 2024-07
Other benchmarks
ToolQA [55] QA 13 1,530 Manual Creation × 2023-06
ToolEmu [146] Safety 311 144 Manual Creation × 2023-09
ToolTalk [147] Conversation 28 78 Manual Creation × 2023-11
VIoT [148] VIoT 11 1,841 Public Models × 2023-12
RoTBench [149] Robustness 568 105 ToolEyes 2024-01
MLLM-Tool [88] Multi-modal 932 11,642 Public Models 2024-01
ToolSword [150] Safety 100 440 Manual Creation 2024-02
SciToolBench [151] Sci-Reasoning 2,446 856 Manual Creation 2024-02
InjecAgent [153] Safety 17 1,054 Public APIs × 2024-02
StableToolBench [152] Stable 16,464 126,486 ToolBench2 2024-03
m&m’s [87] Multi-modal 33 4,427 Public Models 2024-03
GeoLLM-QA [166] Remote Sensing 117 1,000 Public Models 2024-04
ToolLens [124] Tool Retrieval 464 18,770 ToolBench2 2024-05
SoAyBench [111] Academic Seeking 7 792 AMiner 2024-05
ToolSandbox [155] Conversation 34 1,032 Rapid API 2024-08
CToolEval [154] Chinese 398 6,816 Public Apps 2024-08
Tab.1  A detailed list of different benchmarks and their specific configurations. Symbols ①, ②, ③, and ④ represent the four stages in tool learning—task planning, tool selection, tool calling, and response generation, respectively
1 S L Washburn . Tools and human evolution. Scientific American, 1960, 203( 3): 62–75
2 K R, Gibson T Ingold . Tools, Language, and Cognition in Human Evolution. Cambridge: Cambridge University Press, 1994
3 B Von Eckardt . What Is Cognitive Science? Cambridge, MA: The MIT Press, 1995, ISBN: 9780262720236
4 R W, Shumaker K R, Walkup B B Beck . Animal Tool Behavior: the Use and Manufacture of Tools by Animals. Baltimore: Johns Hopkins University Press, 2011
5 J, Achiam S, Adler S, Agarwal L, Ahmad I, Akkaya , et al.. GPT-4 technical report. 2024, arXiv preprint arXiv: 2303.08774
6 W S, El-Kassas C R, Salama A A, Rafea H K Mohamed . Automatic text summarization: a comprehensive survey. Expert Systems with Applications, 2021, 165: 113679
7 T, Zhang F, Ladhak E, Durmus P, Liang K, McKeown T B Hashimoto . Benchmarking large language models for news summarization. Transactions of the Association for Computational Linguistics, 2024, 12: 39–57
8 B, Zhang B, Haddow A Birch . Prompting large language model for machine translation: a case study. In: Proceedings of the 40th International Conference on Machine Learning. 2023, 41092–41110
9 Z, Feng Y, Zhang H, Li W, Liu J, Lang Y, Feng J, Wu Z Liu . Improving LLM-based machine translation with systematic self-correction. 2024, arXiv preprint arXiv: 2402.16379v2
10 Z, Yang P, Qi S, Zhang Y, Bengio W W, Cohen R, Salakhutdinov C D Manning . HotpotQA: a dataset for diverse, explainable multi-hop question answering. In: Proceedings of 2018 Conference on Empirical Methods in Natural Language Processing. 2018, 2369–2380
11 T, Kwiatkowski J, Palomaki O, Redfield M, Collins A, Parikh C, Alberti D, Epstein I, Polosukhin J, Devlin K, Lee K, Toutanova L, Jones M, Kelcey M W, Chang A M, Dai J, Uszkoreit Q, Le S Petrov . Natural questions: a benchmark for question answering research. Transactions of the Association for Computational Linguistics, 2019, 7: 452–466
12 A, Mallen A, Asai V, Zhong R, Das D, Khashabi H Hajishirzi . When not to trust language models: Investigating effectiveness of parametric and non-parametric memories. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. 2023, 9802–9822
13 T, Vu M, Iyyer X, Wang N, Constant J, Wei J, Wei C, Tar Y H, Sung D, Zhou Q, Le T Luong . FreshLLMs: refreshing large language models with search engine augmentation. In: Proceedings of Findings of the Association for Computational Linguistics ACL 2024. 2024, 13697–13720
14 Z, Ji N, Lee R, Frieske T, Yu D, Su Y, Xu E, Ishii Y J, Bang A, Madotto P Fung . Survey of hallucination in natural language generation. ACM Computing Surveys, 2023, 55( 12): 248
15 Y, Zhang Y, Li L, Cui D, Cai L, Liu T, Fu X, Huang E, Zhao Y, Zhang Y, Chen L, Wang A T, Luu W, Bi F, Shi S Shi . Siren’s song in the AI ocean: a survey on hallucination in large language models. 2023, arXiv preprint arXiv: 2309.01219
16 Y, Qin S, Hu Y, Lin W, Chen N, Ding , et al.. Tool learning with foundation models. 2024, arXiv preprint arXiv: 2304.08354
17 T, Schick J, Dwivedi-Yu R, Dessì R, Raileanu M, Lomeli E, Hambro L, Zettlemoyer N, Cancedda T Scialom . Toolformer: language models can teach themselves to use tools. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2023, 36
18 Y, Qin S, Liang Y, Ye K, Zhu L, Yan Y, Lu Y, Lin X, Cong X, Tang B, Qian S, Zhao L, Hong R, Tian R, Xie J, Zhou M, Gerstein D, Li Z, Liu M Sun . ToolLLM: facilitating large language models to master 16000+ real-world APIs. In: Proceedings of the 12th International Conference on Learning Representations. 2024
19 Q, Tang Z, Deng H, Lin X, Han Q, Liang B, Cao L Sun . ToolAlpaca: generalized tool learning for language models with 3000 simulated cases. 2023, arXiv preprint arXiv: 2306.05301
20 H, Wang Y, Qin Y, Lin J Z, Pan K F Wong . Empowering large language models: tool learning for real-world interaction. In: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2024, 2983–2986
21 S, Yao H, Chen J, Yang K Narasimhan . WebShop: towards scalable real-world web interaction with grounded language agents. In: Proceedings of the 36th Conference on Neural Information Processing Systems. 2022, 20744–20757
22 A, Lazaridou E, Gribovskaya W, Stokowiec N Grigorev . Internet-augmented language models through few-shot prompting for open-domain question answering. 2022, arXiv preprint arXiv: 2203.05115
23 Y, Lu H, Yu D Khashabi . GEAR: augmenting language models with generalizable and efficient tool resolution. In: Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics. 2024, 112–138
24 L, Pan X, Wu X, Lu A T, Luu W Y, Wang M Y, Kan P Nakov . Fact-checking complex claims with program-guided reasoning. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. 2023, 6981–7004
25 X, Wang Z, Wang J, Liu Y, Chen L, Yuan H, Peng H Ji . MINT: evaluating LLMs in multi-turn interaction with tools and language feedback. In: Proceedings of the 12th International Conference on Learning Representations. 2024
26 A, Parisi Y, Zhao N Fiedel . TALM: tool augmented language models. 2022, arXiv preprint arXiv: 2205.12255
27 E, Karpas O, Abend Y, Belinkov B, Lenz O, Lieber N, Ratner Y, Shoham H, Bata Y, Levine K, Leyton-Brown D, Muhlgay N, Rozen E, Schwartz G, Shachaf S, Shalev-Shwartz A, Shashua M Tenenholtz . MRKL systems: a modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning. 2022, arXiv preprint arXiv: 2205.00445
28 R, Nakano J, Hilton S, Balaji J, Wu L, Ouyang C, Kim C, Hesse S, Jain V, Kosaraju W, Saunders X, Jiang K, Cobbe T, Eloundou G, Krueger K, Button M, Knight B, Chess J Schulman . WebGPT: Browser-assisted question-answering with human feedback. 2022, arXiv preprint arXiv: 2112.09332
29 D, Surís S, Menon C Vondrick . ViperGPT: visual inference via python execution for reasoning. In: Proceedings of 2023 IEEE/CVF International Conference on Computer Vision. 2023, 11854–11864
30 M, Li Y, Zhao B, Yu F, Song H, Li H, Yu Z, Li F, Huang Y Li . API-Bank: a comprehensive benchmark for tool-augmented LLMs. In: Proceedings of 2023 Conference on Empirical Methods in Natural Language Processing. 2023, 3102−3116
31 Y, Huang J, Shi Y, Li C, Fan S, Wu Q, Zhang Y, Liu P, Zhou Y, Wan N Z, Gong L C Sun . MetaTool benchmark for large language models: deciding whether to use tools and which to use. In: Proceedings of the 12th International Conference on Learning Representations. 2024
32 Z, Chen W, Du W, Zhang K, Liu J, Liu M, Zheng J, Zhuo S, Zhang D, Lin K, Chen F Zhao . T-Eval: evaluating the tool utilization capability of large language models step by step. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. 2024, 9510−9529
33 Q, Xu F, Hong B, Li C, Hu Z, Chen J Zhang . On the tool manipulation capability of open-source large language models. 2023, arXiv preprint arXiv: 2305.16504
34 S, Gao Z, Shi M, Zhu B, Fang X, Xin P, Ren Z, Chen J, Ma Z Ren . Confucius: iterative tool learning from introspection feedback by easy-to-difficult curriculum. In: Proceedings of the 38th AAAI Conference on Artificial Intelligence. 2024, 18030−18038
35 Y, Zhao J, Wu X, Wang W, Tang D, Wang M De Rijke . Let me do it for you: towards LLM empowered recommendation via tool learning. In: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2024, 1796−1806
36 W X, Zhao K, Zhou J, Li T, Tang X, Wang , et al.. A survey of large language models. 2024, arXiv preprint arXiv: 2303.18223
37 X, Huang W, Liu X, Chen X, Wang H, Wang D, Lian Y, Wang R, Tang E Chen . Understanding the planning of LLM agents: a survey. 2024, arXiv preprint arXiv: 2402.02716
38 S, Qiao Y, Ou N, Zhang X, Chen Y, Yao S, Deng C, Tan F, Huang H Chen . Reasoning with language model prompting: a survey. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. 2023, 5368−5393
39 J, Sun C, Zheng E, Xie Z, Liu R, Chu , et al.. A survey of reasoning with foundation models. 2024, arXiv preprint arXiv: 2312.11562
40 L, Wang C, Ma X, Feng Z, Zhang H, Yang J, Zhang Z, Chen J, Tang X, Chen Y, Lin W X, Zhao Z, Wei J Wen . A survey on large language model based autonomous agents. Frontiers of Computer Science, 2024, 18( 6): 186345
41 T R, Sumers S, Yao K, Narasimhan T L Griffiths . Cognitive architectures for language agents. Transactions on Machine Learning Research, 2024, ISSN: 2835-8856
42 Z, Xi W, Chen X, Guo W, He Y, Ding , et al.. The rise and potential of large language model based agents: a survey. 2023, arXiv preprint arXiv: 2309.07864
43 Y, Gao Y, Xiong X, Gao K, Jia J, Pan Y, Bi Y, Dai J, Sun M, Wang H Wang . Retrieval-augmented generation for large language models: a survey. 2024, arXiv preprint arXiv: 2312.10997
44 P, Zhao H, Zhang Q, Yu Z, Wang Y, Geng F, Fu L, Yang W, Zhang J, Jiang B Cui . Retrieval-augmented generation for AI-generated content: a survey. 2024, arXiv preprint arXiv: 2402.19473
45 G, Mialon R, Dessì M, Lomeli C, Nalmpantis R, Pasunuru R, Raileanu B, Rozière T, Schick J, Dwivedi-Yu A, Celikyilmaz E, Grave Y, LeCun T Scialom . Augmented language models: a survey. Transactions on Machine Learning Research, 2023, ISSN: 2835-8856
46 Z, Wang Z, Cheng H, Zhu D, Fried G Neubig . What are tools anyway? A survey from the language model perspective. 2024, arXiv preprint arXiv: 2403.15452
47 M, Komeili K, Shuster J Weston . Internet-augmented dialogue generation. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. 2022, 8460–8478
48 K, Zhang H, Zhang G, Li J, Li Z, Li Z Jin . ToolCoder: teach code generation models to use API search tools. 2023, arXiv preprint arXiv: 2305.04032
49 W, Shi S, Min M, Yasunaga M, Seo R, James M, Lewis L, Zettlemoyer W T Yih . REPLUG: retrieval-augmented black-box language models. In: Proceedings of 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2024, 8371–8384
50 B, Paranjape S, Lundberg S, Singh H, Hajishirzi L, Zettlemoyer M T Ribeiro . ART: automatic multi-step reasoning and tool-use for large language models. 2023, arXiv preprint arXiv: 2303.09014
51 Z, Gou Z, Shao Y, Gong Y, Shen Y, Yang N, Duan W Chen . CRITIC: large language models can self-correct with tool-interactive critiquing. In: Proceedings of the 12th International Conference on Learning Representations. 2024
52 R, Thoppilan D, De Freitas J, Hall N, Shazeer A, Kulshreshtha , et al.. LaMDA: language models for dialog applications. 2022, arXiv preprint arXiv: 2201.08239
53 S G, Patil T, Zhang X, Wang J E Gonzalez . Gorilla: large language model connected with massive APIs. 2023, arXiv preprint arXiv: 2305.15334
54 S, Hao T, Liu Z, Wang Z Hu . ToolkenGPT: augmenting frozen language models with massive tools via tool embeddings. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2023, 36
55 Y, Zhuang Y, Yu K, Wang H, Sun C Zhang . ToolQA: a dataset for LLM question answering with external tools. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2024, 36
56 K, Zhang H, Chen L, Li W Wang . Syntax error-free and generalizable tool use for LLMs via finite-state decoding. 2024, arXiv preprint arXiv: 2310.07075v1
57 Y, Gu Y, Shu H, Yu X, Liu Y, Dong J, Tang J, Srinivasa H, Latapie Y Su . Middleware for LLMs: tools are instrumental for language agents in complex environments. 2024, arXiv preprint arXiv: 2402.14672
58 K, Cobbe V, Kosaraju M, Bavarian M, Chen H, Jun L, Kaiser M, Plappert J, Tworek J, Hilton R, Nakano C, Hesse J Schulman . Training verifiers to solve math word problems. 2021, arXiv preprint arXiv: 2110.14168
59 Z, Shao F, Huang M Huang . Chaining simultaneous thoughts for numerical reasoning. In: Proceedings of Findings of the Association for Computational Linguistics. 2022, 2533−2547
60 M, Kadlčík M, Štefánik O, Sotolar V Martinek . Calc-X and calcformers: empowering arithmetical chain-of-thought through interaction with symbolic systems. In: Proceedings of 2023 Conference on Empirical Methods in Natural Language Processing. 2023, 12101−12108
61 J, He-Yueya G, Poesia R E, Wang N D Goodman . Solving math word problems by combining language models with symbolic solvers. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2023
62 B, Zhang K, Zhou X, Wei X, Zhao J, Sha S, Wang J R Wen . Evaluating and improving tool-augmented computation-intensive math reasoning. In: Proceedings of the 37th International Conference on Neural Information Processing Systems. 2024, 1023
63 Z, Gou Z, Shao Y, Gong Y, Shen Y, Yang M, Huang N, Duan W Chen . ToRA: a tool-integrated reasoning agent for mathematical problem solving. In: Proceedings of the 12th International Conference on Learning Representations. 2024
64 D, Das D, Banerjee S, Aditya A Kulkarni . MATHSENSEI: a tool-augmented large language model for mathematical reasoning. In: Proceedings of 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2024, 942–966
65 V, Veerendranath V, Shah K Ghate . Calc-CMU at semeval-2024 task 7: pre-calc – learning to use the calculator improves numeracy in language models. In: Proceedings of the 18th International Workshop on Semantic Evaluation. 2024, 1468–1475
66 A, Bulusu B, Man A, Jagmohan A, Vempaty J, Mari-Wyka D Akkil . MathViz-E: A case-study in domain-specialized tool-using agents. 2024, arXiv preprint arXiv: 2407.17544
67 L, Gao A, Madaan S, Zhou U, Alon P, Liu Y, Yang J, Callan G Neubig . PAL: program-aided language models. In: Proceedings of the 40th International Conference on Machine Learning. 2023, 10764–10799
68 W, Chen X, Ma X, Wang W W Cohen . Program of thoughts prompting: disentangling computation from reasoning for numerical reasoning tasks. Transactions on Machine Learning Research, 2023, ISSN: 2835-8856
69 P, Lu B, Peng H, Cheng M, Galley K W, Chang Y N, Wu S C, Zhu J Gao . Chameleon: plug-and-play compositional reasoning with large language models. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2023, 36
70 X, Wang H, Peng R, Jabbarvand H Ji . LETI: learning to generate from textual interactions. In: Proceedings of Findings of the Association for Computational Linguistics. 2024, 223−239
71 J, Wu R, Zhu N, Chen Q, Sun X, Li M Gao . Structure-aware fine-tuning for code pre-trained models. In: Proceedings of 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation. 2024, 15362−15372
72 K, Zhang J, Li G, Li X, Shi Z Jin . CodeAgent: enhancing code generation with tool-integrated agent systems for real-world repo-level coding challenges. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. 2024, 13643−13658
73 T, Inaba H, Kiyomaru F, Cheng S Kurohashi . MultiTool-CoT: GPT-3 can use multiple external tools with chain of thought prompting. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. 2023, 1522–1532
74 A M, Bran S, Cox O, Schilter C, Baldassari A D, White P Schwaller . Augmenting large language models with chemistry tools. Nature Machine Intelligence, 2024, 6( 5): 525–535
75 M C, Ramos C J, Collison A D White . A review of large language models and autonomous agents in chemistry. 2024, arXiv preprint arXiv: 2407.01603
76 Q, Jin Y, Yang Q, Chen Z Lu . GeneGPT: augmenting large language models with domain tools for improved access to biomedical information. Bioinformatics, 2024, 40( 2): btae075
77 A, Theuma E Shareghi . Equipping language models with tool use capability for tabular data analysis in finance. In: Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics. 2024, 90–103
78 S, Gao Y, Wen M, Zhu J, Wei Y, Cheng Q, Zhang S Shang . Simulating financial market via large language model based agents. 2024, arXiv preprint arXiv: 2406.19966
79 W, Zhang L, Zhao H, Xia S, Sun J, Sun M, Qin X, Li Y, Zhao Y, Zhao X, Cai L T, Zheng X R, Wang B An . A multimodal foundation agent for financial trading: tool-augmented, diversified, and generalist. In: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2024, 4314−4325
80 Q, Jin Z, Wang Y, Yang Q, Zhu D, Wright T, Huang W J, Wilbur Z, He A, Taylor Q, Chen Z Lu . AgentMD: empowering language agents for risk prediction with large-scale clinical tool learning. 2024, arXiv preprint arXiv: 2402.13225
81 B, Li T, Yan Y, Pan J, Luo R, Ji J, Ding Z, Xu S, Liu H, Dong Z, Lin Y Wang . MmedAgent: learning to use medical tools with multi-modal agent. 2024, arXiv preprint arXiv: 2407.02483
82 Z, Yang L, Li J, Wang K, Lin E, Azarnasab F, Ahmed Z, Liu C, Liu M, Zeng L Wang . MM-REACT: prompting chatGPT for multimodal reasoning and action. 2023, arXiv preprint arXiv: 2303.11381
83 Z, Liu Y, He W, Wang W, Wang Y, Wang S, Chen Q, Zhang Y, Yang Q, Li J, Yu K, Li Z, Chen X, Yang X, Zhu Y, Wang L, Wang P, Luo J, Dai Y Qiao . InternChat: solving vision-centric tasks by interacting with chatbots beyond language. 2023, arXiv preprint arXiv: 2305.05662v2
84 D, Gao L, Ji L, Zhou K Q, Lin J, Chen Z, Fan M Z Shou . AssistGPT: a general multi-modal assistant that can plan, execute, inspect, and learn. 2023, arXiv preprint arXiv: 2306.08640
85 Z, Gao Y, Du X, Zhang X, Ma W, Han S C, Zhu Q Li . CLOVA: a closed-loop visual assistant with tool usage and update. In: Proceedings of 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024, 13258−13268
86 L, Zhao Y, Yang K, Zhang W, Shao Y, Zhang Y, Qiao P, Luo R Ji . DiffAgent: fast and accurate text-to-image API selection with large language model. In: Proceedings of 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024, 6390−6399
87 Z, Ma W, Huang J, Zhang T, Gupta R Krishna . m&m’s: a benchmark to evaluate tool-use for multi-step multi-modal tasks. 2024, arXiv preprint arXiv: 2403.11085
88 C, Wang W, Luo Q, Chen H, Mai J, Guo S, Dong X, Xuan Z, Li L, Ma S Gao . Tool-LMM: a large multi-modal model for tool agent learning. 2023, arXiv preprint arXiv: 2401.10727v1
89 Y, Shen K, Song X, Tan D, Li W, Lu Y Zhuang . HuggingGPT: solving AI tasks with chatGPT and its friends in hugging face. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2024, 36
90 B, Lyu X, Cong H, Yu P, Yang Y, Qin Y, Ye Y, Lu Z, Zhang Y, Yan Y, Lin Z, Liu M Sun . GitAgent: facilitating autonomous agent with GitHub by tool extension. 2023, arXiv preprint arXiv: 2312.17294
91 J, Wei X, Wang D, Schuurmans M, Bosma B, Ichter F, Xia E H, Chi Q V, Le D Zhou . Chain-of-thought prompting elicits reasoning in large language models. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. 2022, 24824–24837
92 S, Yao J, Zhao D, Yu N, Du I, Shafran K R, Narasimhan Y Cao . ReAct: synergizing reasoning and acting in language models. In: Proceedings of the 11th International Conference on Learning Representations. 2023
93 Y, Song W, Xiong D, Zhu C, Li K, Wang Y, Tian S Li . RestGPT: connecting large language models with real-world applications via RESTful APIs. 2023, arXiv preprint arXiv: 2306.06624v1
94 J, Ruan Y, Chen B, Zhang Z, Xu T, Bao G, Du S, Shi H, Mao X, Zeng R Zhao . TPTU: task planning and tool usage of large language model-based AI agents. 2023, arXiv preprint arXiv: 2308.03427v1
95 Y, Zhuang X, Chen T, Yu S, Mitra V S, Bursztyn R A, Rossi S, Sarkhel C Zhang . ToolChain*: efficient action space navigation in large language models with a* search. In: Proceedings of the 12th International Conference on Learning Representations. 2024
96 Z, Liu Z, Lai Z, Gao E, Cui Z, Li X, Zhu L, Lu Q, Chen Y, Qiao J, Dai W Wang . ControlLLM: augment language models with tools by searching on graphs. 2023, arXiv preprint arXiv: 2310.17796
97 Y, Chen A, Lv T E, Lin C, Chen Y, Wu F, Huang Y, Li R Yan . Fortify the shortest stave in attention: enhancing context awareness of large language models for effective tool use. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. 2024, 11160−11174
98 T, Huang D, Jung V, Kumar M, Kachuee X, Li P, Xu M Chen . Planning and editing what you retrieve for enhanced tool learning. In: Proceedings of Findings of the Association for Computational Linguistics. 2024, 975−988
99 Z, Shi S, Gao X, Chen Y, Feng L, Yan H, Shi D, Yin Z, Chen S, Verberne Z Ren . Chain of tools: large language model is an automatic multi-tool learner. 2024, arXiv preprint arXiv: 2405.16533v1
100 X, Wu Y, Shen C, Shan K, Song S, Wang B, Zhang J, Feng H, Cheng W, Chen Y, Xiong D Li . Can graph learning improve task planning? 2024, arXiv preprint arXiv: 2405.19119v1
101 Y, Liu Y, Yuan C, Wang J, Han Y, Ma L, Zhang N, Zheng H Xu . From summary to action: enhancing large language models for complex tasks with open world APIs. 2024, arXiv preprint arXiv: 2402.18157
102 Y, Zheng P, Li M, Yan J, Zhang F, Huang Y Liu . Budget-constrained tool learning with planning. In: Proceedings of Findings of the Association for Computational Linguistics ACL 2024. 2024, 9039−9052
103 C, Qu S, Dai X, Wei H, Cai S, Wang D, Yin J, Xu J R Wen . From exploration to mastery: enabling LLMs to master tools via self-driven interactions. 2024, arXiv preprint arXiv: 2410.08197
104 Y, Liang C, Wu T, Song W, Wu Y, Xia Y, Liu Y, Ou S, Lu L, Ji S, Mao Y, Wang L, Shou M, Gong N Duan . TaskMatrix.AI: completing tasks by connecting foundation models with millions of APIs. Intelligent Computing, 2024, 3: 0063
105 C, Qian C, Xiong Z, Liu Z Liu . Toolink: linking toolkit creation and using through chain-of-solving on open-source model. In: Proceedings of 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2024, 831−854
106 Y, Kong J, Ruan Y, Chen B, Zhang T, Bao S, Shi G, Du X, Hu H, Mao Z, Li X, Zeng R Zhao . TPTU-v2: boosting task planning and tool usage of large language model-based agents in real-world systems. 2023, arXiv preprint arXiv: 2311.11315
107 W, Shen C, Li H, Chen M, Yan X, Quan H, Chen J, Zhang F Huang . Small LLMs are weak tool learners: a multi-LLM agent. 2024, arXiv preprint arXiv: 2401.07324
108 S, Gao J, Dwivedi-Yu P, Yu X E, Tan R, Pasunuru O, Golovneva K, Sinha A, Celikyilmaz A, Bosselut T Wang . Efficient tool use with chain-of-abstraction reasoning. 2024, arXiv preprint arXiv: 2401.17464
109 A, Gui J, Li Y, Dai N, Du H Xiao . Look before you leap: towards decision-aware and generalizable tool-usage for large language models. 2024, arXiv preprint arXiv: 2402.16696
110 Y, Ge W, Hua K, Mei J, Tan S, Xu Z, Li Y Zhang . OpenAGI: when LLM meets domain experts. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2023, 36
111 Y, Wang J, Yu Z, Yao J, Zhang Y, Xie S, Tu Y, Fu Y, Feng J, Zhang J, Zhang B, Huang Y, Li H, Yuan L, Hou J, Li J Tang . A solution-based LLM API-using methodology for academic information seeking. 2024, arXiv preprint arXiv: 2405.15165
112 S, Chen Y, Wang Y F, Wu Q G, Chen Z, Xu W, Luo K, Zhang L Zhang . Advancing tool-augmented large language models: integrating insights from errors in inference trees. 2024, arXiv preprint arXiv: 2406.07115
113 Z, Liu T, Hoang J, Zhang M, Zhu T, Lan S, Kokane J, Tan W, Yao Z, Liu Y, Feng R, Murthy L, Yang S, Savarese J C, Niebles H, Wang S, Heinecke C Xiong . APIGen: automated pipeline for generating verifiable and diverse function-calling datasets. 2024, arXiv preprint arXiv: 2406.18518
114 K Sparck Jones . A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 1972, 28( 1): 11–21
115 S, Robertson H Zaragoza . The probabilistic relevance framework: BM25 and beyond. Foundations and Trends® in Information Retrieval, 2009, 3( 4): 333–389
116 N, Reimers I Gurevych . Sentence-BERT: sentence embeddings using Siamese BERT-networks. In: Proceedings of 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. 2019, 3982–3992
117 L, Xiong C, Xiong Y, Li K F, Tang J, Liu P N, Bennett J, Ahmed A Overwijk . Approximate nearest neighbor negative contrastive learning for dense text retrieval. In: Proceedings of the 9th International Conference on Learning Representations. 2021
118 S, Hofstätter S C, Lin J H, Yang J, Lin A Hanbury . Efficiently teaching an effective dense retriever with balanced topic aware sampling. In: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2021, 113–122
119 G, Izacard M, Caron L, Hosseini S, Riedel P, Bojanowski A, Joulin E Grave . Unsupervised dense information retrieval with contrastive learning. Transactions on Machine Learning Research, 2022, ISSN: 2835-8856
120 L, Gao J Callan . Unsupervised corpus aware language model pre-training for dense passage retrieval. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. 2022, 2843–2853
121 L, Yuan Y, Chen X, Wang Y R, Fung H, Peng H Ji . CRAFT: customizing LLMs by creating and retrieving from specialized toolsets. In: Proceedings of the 12th International Conference on Learning Representations. 2024
122 R, Anantha B, Bandyopadhyay A, Kashi S, Mahinder A W, Hill S Chappidi . ProTIP: progressive tool retrieval improves planning. 2023, arXiv preprint arXiv: 2312.10332
123 Y, Zheng P, Li W, Liu Y, Liu J, Luan B Wang . ToolRerank: adaptive and hierarchy-aware reranking for tool retrieval. In: Proceedings of 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation. 2024, 16263−16273
124 C, Qu S, Dai X, Wei H, Cai S, Wang D, Yin J, Xu J R Wen . Towards completeness-oriented tool retrieval for large language models. In: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management. 2024, 1930–1940
125 Chen Z, Zhou K, Zhang B, Gong Z, Zhao X, Wen J R. ChatCoT: tool-augmented chain-of-thought reasoning on chat-based large language models. In: Proceedings of Findings of the Association for Computational Linguistics: EMNLP 2023. 2023, 14777–14790
126 Liu X, Peng Z, Yi X, Xie X, Xiang L, Liu Y, Xu D. ToolNet: connecting large language models with massive tools via tool graph. 2024, arXiv preprint arXiv: 2403.00839
127 Mekala D, Weston J, Lanchantin J, Raileanu R, Lomeli M, Shang J, Dwivedi-Yu J. TOOLVERIFIER: generalization to new tools via self-verification. 2024, arXiv preprint arXiv: 2402.14158
128 Qiao S, Gui H, Lv C, Jia Q, Chen H, Zhang N. Making language models better tool learners with execution feedback. In: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics. 2024, 3550–3568
129 Du Y, Wei F, Zhang H. AnyTool: self-reflective, hierarchical agents for large-scale API calls. In: Proceedings of the 41st International Conference on Machine Learning. 2024
130 Fore M, Singh S, Stamoulis D. GeckOpt: LLM system efficiency via intent-based tool selection. In: Proceedings of the Great Lakes Symposium on VLSI 2024. 2024, 353–354
131 Zhang Y, Cai H, Song X, Chen Y, Sun R, Zheng J. Reverse chain: a generic-rule for LLMs to master multi-API planning. In: Proceedings of Findings of the Association for Computational Linguistics. 2024, 302–325
132 Yuan S, Song K, Chen J, Tan X, Shen Y, Kan R, Li D, Yang D. EASYTOOL: enhancing LLM-based agents with concise tool instruction. 2024, arXiv preprint arXiv: 2401.06201
133 Shi Z, Gao S, Chen X, Feng Y, Yan L, Shi H, Yin D, Ren P, Verberne S, Ren Z. Learning to use tools via cooperative and interactive agents. 2024, arXiv preprint arXiv: 2403.03031v4
134 Yang R, Song L, Li Y, Zhao S, Ge Y, Li X, Shan Y. GPT4Tools: teaching large language model to use tools via self-instruction. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2023, 36
135 Li L, Chai Y, Wang S, Sun Y, Tian H, Zhang N, Wu H. Tool-augmented reward modeling. In: Proceedings of the 12th International Conference on Learning Representations. 2024
136 Wang B, Fang H, Eisner J, Van Durme B, Su Y. LLMs in the imaginarium: tool learning through simulated trial and error. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. 2024, 10583–10604
137 Xu F, Shi W, Choi E. RECOMP: improving retrieval-augmented LMs with compression and selective augmentation. In: Proceedings of the 12th International Conference on Learning Representations. 2024
138 Ye J, Li G, Gao S, Huang C, Wu Y, Li S, Fan X, Dou S, Zhang Q, Gui T, Huang X. ToolEyes: fine-grained evaluation for tool learning capabilities of large language models in real-world scenarios. 2024, arXiv preprint arXiv: 2401.00741
139 Huang S, Zhong W, Lu J, Zhu Q, Gao J, Liu W, Hou Y, Zeng X, Wang Y, Shang L, Jiang X, Xu R, Liu Q. Planning, creation, usage: benchmarking LLMs for comprehensive tool utilization in real-world complex scenarios. In: Proceedings of Findings of the Association for Computational Linguistics: ACL 2024. 2024, 4363–4400
140 Wu M, Zhu T, Han H, Tan C, Zhang X, Chen W. Seal-tools: self-instruct tool learning dataset for agent tuning and detailed benchmark. 2024, arXiv preprint arXiv: 2405.08355v1
141 Basu K, Abdelaziz I, Chaudhury S, Dan S, Crouse M, Munawar A, Austel V, Kumaravel S, Muthusamy V, Kapanipathi P, Lastras L A. API-BLEND: a comprehensive corpora for training and benchmarking API LLMs. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. 2024, 12859–12870
142 Shen H, Li Y, Meng D, Cai D, Qi S, Zhang L, Xu M, Ma Y. ShortcutsBench: a large-scale real-world benchmark for API-based agents. 2024, arXiv preprint arXiv: 2407.00132v2
143 Wang J, Ma Z, Li Y, Zhang S, Chen C, Chen K, Le X. GTA: a benchmark for general tool agents. 2024, arXiv preprint arXiv: 2407.08713
144 Ning K, Su Y, Lv X, Zhang Y, Liu J, Liu K, Xu J. WTU-EVAL: a whether-or-not tool usage evaluation benchmark for large language models. 2024, arXiv preprint arXiv: 2407.12823
145 Trivedi H, Khot T, Hartmann M, Manku R, Dong V, Li E, Gupta S, Sabharwal A, Balasubramanian N. AppWorld: a controllable world of apps and people for benchmarking interactive coding agents. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. 2024, 16022–16076
146 Ruan Y, Dong H, Wang A, Pitis S, Zhou Y, Ba J, Dubois Y, Maddison C J, Hashimoto T. Identifying the risks of LM agents with an LM-emulated sandbox. In: Proceedings of the 12th International Conference on Learning Representations. 2024
147 Farn N, Shin R. ToolTalk: evaluating tool-usage in a conversational setting. 2023, arXiv preprint arXiv: 2311.10775
148 Zhong Y, Qi M, Wang R, Qiu Y, Zhang Y, Ma H. VioTGPT: learning to schedule vision tools towards intelligent video internet of things. 2023, arXiv preprint arXiv: 2312.00401
149 Ye J, Wu Y, Gao S, Huang C, Li S, Li G, Fan X, Zhang Q, Gui T, Huang X. RoTBench: a multi-level benchmark for evaluating the robustness of large language models in tool learning. 2024, arXiv preprint arXiv: 2401.08326
150 Ye J, Li S, Li G, Huang C, Gao S, Wu Y, Zhang Q, Gui T, Huang X. ToolSword: unveiling safety issues of large language models in tool learning across three stages. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. 2024, 2181–2211
151 Ma Y, Gou Z, Hao J, Xu R, Wang S, Pan L, Yang Y, Cao Y, Sun A, Awadalla H, Chen W. SciAgent: tool-augmented language models for scientific reasoning. 2024, arXiv preprint arXiv: 2402.11451
152 Guo Z, Cheng S, Wang H, Liang S, Qin Y, Li P, Liu Z, Sun M, Liu Y. StableToolBench: towards stable large-scale benchmarking on tool learning of large language models. In: Proceedings of Findings of the Association for Computational Linguistics: ACL 2024. 2024, 11143–11156
153 Zhan Q, Liang Z, Ying Z, Kang D. InjecAgent: benchmarking indirect prompt injections in tool-integrated large language model agents. In: Proceedings of Findings of the Association for Computational Linguistics: ACL 2024. 2024, 10471–10506
154 Guo Z, Huang Y, Xiong D. CToolEval: a Chinese benchmark for LLM-powered agent evaluation in real-world API interactions. In: Proceedings of Findings of the Association for Computational Linguistics: ACL 2024. 2024, 15711–15724
155 Lu J, Holleis T, Zhang Y, Aumayer B, Nan F, Bai F, Ma S, Ma S, Li M, Yin G, Wang Z, Pang R. ToolSandbox: a stateful, conversational, interactive evaluation benchmark for LLM tool use capabilities. 2024, arXiv preprint arXiv: 2408.04682
156 Zhu M. Recall, precision and average precision. Dissertation, University of Waterloo, 2004
157 Järvelin K, Kekäläinen J. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems, 2002, 20(4): 422–446
158 Papineni K, Roukos S, Ward T, Zhu W J. Bleu: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. 2002, 311–318
159 Lin C Y. ROUGE: a package for automatic evaluation of summaries. In: Proceedings of Text Summarization Branches Out. 2004, 74–81
160 Blackwell M, Iacus S, King G, Porro G. CEM: coarsened exact matching in Stata. The Stata Journal, 2009, 9(4): 524–546
161 Ouyang L, Wu J, Jiang X, Almeida D, Wainwright C, Mishkin P, Zhang C, Agarwal S, Slama K, Ray A, Schulman J, Hilton J, Kelton F, Miller L, Simens M, Askell A, Welinder P, Christiano P, Leike J, Lowe R. Training language models to follow instructions with human feedback. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. 2022, 2011
162 Dao X Q, Le N B. Investigating the effectiveness of ChatGPT in mathematical reasoning and problem solving: evidence from the Vietnamese national high school graduation examination. 2023, arXiv preprint arXiv: 2306.06331
163 Wei T, Luan J, Liu W, Dong S, Wang B. CMATH: can your language model pass Chinese elementary school math test? 2023, arXiv preprint arXiv: 2306.16636
164 Chen M, Tworek J, Jun H, Yuan Q, de Oliveira Pinto H P, et al. Evaluating large language models trained on code. 2021, arXiv preprint arXiv: 2107.03374
165 Austin J, Odena A, Nye M, Bosma M, Michalewski H, Dohan D, Jiang E, Cai C, Terry M, Le Q, Sutton C. Program synthesis with large language models. 2021, arXiv preprint arXiv: 2108.07732
166 Singh S, Fore M, Stamoulis D. Evaluating tool-augmented agents in remote sensing platforms. 2024, arXiv preprint arXiv: 2405.00709v1
167 Linardatos P, Papastefanopoulos V, Kotsiantis S. Explainable AI: a review of machine learning interpretability methods. Entropy, 2021, 23(1): 18
168 Zhao H, Chen H, Yang F, Liu N, Deng H, Cai H, Wang S, Yin D, Du M. Explainability for large language models: a survey. ACM Transactions on Intelligent Systems and Technology, 2024, 15(2): 20
169 Weidinger L, Mellor J, Rauh M, Griffin C, Uesato J, et al. Ethical and social risks of harm from language models. 2021, arXiv preprint arXiv: 2112.04359
170 Gao T, Yen H, Yu J, Chen D. Enabling large language models to generate text with citations. In: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023, 6465–6488
171 Sun H, Cai H, Wang B, Hou Y, Wei X, Wang S, Zhang Y, Yin D. Towards verifiable text generation with evolving memory and self-reflection. 2024, arXiv preprint arXiv: 2312.09075
172 Wallace E, Feng S, Kandpal N, Gardner M, Singh S. Universal adversarial triggers for attacking and analyzing NLP. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. 2019, 2153–2162
173 Jin D, Jin Z, Zhou J T, Szolovits P. Is BERT really robust? A strong baseline for natural language attack on text classification and entailment. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence. 2020, 8018–8025
174 Wu F, Zhang N, Jha S, McDaniel P, Xiao C. A new era in LLM security: exploring security concerns in real-world LLM-based systems. 2024, arXiv preprint arXiv: 2402.18649
175 Zhang J. Graph-ToolFormer: to empower LLMs with graph reasoning ability via prompt augmented by ChatGPT. 2023, arXiv preprint arXiv: 2304.11116
176 Li C, Yang R, Li T, Bafarassat M, Sharifi K, Bergemann D, Yang Z. STRIDE: a tool-assisted LLM agent framework for strategic and interactive decision-making. 2024, arXiv preprint arXiv: 2405.16376
177 Huang W, Abbeel P, Pathak D, Mordatch I. Language models as zero-shot planners: extracting actionable knowledge for embodied agents. In: Proceedings of the 39th International Conference on Machine Learning. 2022, 9118–9147
178 Chern I C, Chern S, Chen S, Yuan W, Feng K, Zhou C, He J, Neubig G, Liu P. FacTool: factuality detection in generative AI – a tool augmented framework for multi-task and multi-domain scenarios. 2023, arXiv preprint arXiv: 2307.13528
179 Xu S, Pang L, Shen H, Cheng X, Chua T S. Search-in-the-chain: interactively enhancing large language models with search for knowledge-intensive tasks. In: Proceedings of the ACM Web Conference 2024. 2024, 1362–1373
180 Kim G, Baldi P, McAleer S. Language models can solve computer tasks. In: Proceedings of the 37th International Conference on Neural Information Processing Systems. 2023, 1723
181 Liu Y, Peng X, Zhang Y, Cao J, Zhang X, Cheng S, Wang X, Yin J, Du T. Tool-planner: dynamic solution tree planning for large language model with tool clustering. 2024, arXiv preprint arXiv: 2406.03807v1
182 Erbacher P, Falissard L, Guigue V, Soulier L. Navigating uncertainty: optimizing API dependency for hallucination reduction in closed-book QA. In: Proceedings of the 46th European Conference on Information Retrieval. 2024, 393–402
183 Xu Q, Li Y, Xia H, Li W. Enhancing tool retrieval with iterative feedback from large language models. 2024, arXiv preprint arXiv: 2406.17465
184 Xiao S, Liu Z, Zhang P, Muennighoff N. C-Pack: packaged resources to advance general Chinese embedding. 2023
185 Google Gemini Team. Gemini: a family of highly capable multimodal models. 2024, arXiv preprint arXiv: 2312.11805
186 Hsieh C Y, Chen S A, Li C L, Fujii Y, Ratner A, Lee C Y, Krishna R, Pfister T. Tool documentation enables zero-shot tool-usage with large language models. 2023, arXiv preprint arXiv: 2308.00675
187 Xu Y, Feng Y, Mu H, Hou Y, Li Y, Wang X, Zhong W, Li Z, Tu D, Zhu Q, Zhang M, Che W. Concise and precise context compression for tool-using language models. In: Proceedings of Findings of the Association for Computational Linguistics: ACL 2024. 2024, 16430–16441
188 Shen Y, Song K, Tan X, Zhang W, Ren K, Yuan S, Lu W, Li D, Zhuang Y. TaskBench: benchmarking large language models for task automation. 2023, arXiv preprint arXiv: 2311.18760
189 Wang H, Wang H, Wang L, Hu M, Wang R, Xue B, Lu H, Mi F, Wong K F. TPE: towards better compositional reasoning over conceptual tools with multi-persona collaboration. 2023, arXiv preprint arXiv: 2309.16090
190 Qian C, Han C, Fung Y, Qin Y, Liu Z, Ji H. CREATOR: tool creation for disentangling abstract and concrete reasoning of large language models. In: Proceedings of Findings of the Association for Computational Linguistics: EMNLP 2023. 2023, 6922–6939
191 Jacovi A, Caciularu A, Herzig J, Aharoni R, Bohnet B, Geva M. A comprehensive evaluation of tool-assisted generation strategies. In: Proceedings of Findings of the Association for Computational Linguistics: EMNLP 2023. 2023, 13856–13878
192 Nathani D, Wang D, Pan L, Wang W Y. MAF: multi-aspect feedback for improving reasoning in large language models. In: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023, 6591–6616
193 Yao Y, Duan J, Xu K, Cai Y, Sun Z, Zhang Y. A survey on large language model (LLM) security and privacy: the good, the bad, and the ugly. High-Confidence Computing, 2024, 4: 100211
194 Cui T, Wang Y, Fu C, Xiao Y, Li S, Deng X, Liu Y, Zhang Q, Qiu Z, Li P, Tan Z, Xiong J, Kong X, Wen Z, Xu K, Li Q. Risk taxonomy, mitigation, and assessment benchmarks of large language model systems. 2024, arXiv preprint arXiv: 2401.05778
195 Das B C, Amini M H, Wu Y. Security and privacy challenges of large language models: a survey. 2024, arXiv preprint arXiv: 2402.00888
196 Qin Y, Cai Z, Jin D, Yan L, Liang S, Zhu K, Lin Y, Han X, Ding N, Wang H, Xie R, Qi F, Liu Z, Sun M, Zhou J. WebCPM: interactive web search for Chinese long-form question answering. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. 2023, 8968–8988
197 Miao X, Oliaro G, Zhang Z, Cheng X, Jin H, Chen T, Jia Z. Towards efficient generative large language model serving: a survey from algorithms to systems. 2023, arXiv preprint arXiv: 2312.15234
198 Cai T, Wang X, Ma T, Chen X, Zhou D. Large language models as tool makers. In: Proceedings of the 12th International Conference on Learning Representations. 2024
199 Wang Z, Neubig G, Fried D. TroVE: inducing verifiable and efficient toolboxes for solving programmatic tasks. In: Proceedings of the 41st International Conference on Machine Learning. 2024, 51177–51191