A stable actor-critic algorithm for solving robotic tasks with multiple constraints

doi:10.1007/s11704-022-1612-9

Front. Comput. Sci., 2023, Vol. 17, Issue 4: 174328

https://doi.org/10.1007/s11704-022-1612-9

LETTER


Peiyao ZHAO, Fei ZHU, Quan LIU, Xinghong LING

School of Computer Science and Technology, Soochow University, Suzhou 215006, China


Corresponding Author(s): Fei ZHU

Just Accepted Date: 07 September 2022
Issue Date: 05 December 2022

Cite this article:

Peiyao ZHAO, Fei ZHU, Quan LIU, et al. A stable actor-critic algorithm for solving robotic tasks with multiple constraints[J]. Front. Comput. Sci., 2023, 17(4): 174328.

URL:

https://academic.hep.com.cn/fcs/EN/10.1007/s11704-022-1612-9
https://academic.hep.com.cn/fcs/EN/Y2023/V17/I4/174328

Growth trend	< threshold	≥ threshold
Increased	Not update	Update
Decreased	Update	Not update
Stable	Update	Update

Tab.1 Update strategy considering the growth trend
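
The rule in Tab.1 can be read as a small lookup keyed on the growth trend and on whether a monitored value is below or at least a threshold. The following is a minimal sketch of that lookup only. This page does not say which quantity is compared against the threshold or what is being updated, so `Trend`, `UPDATE_RULE`, `should_update`, `value` and `threshold` are hypothetical names chosen for illustration, not the authors' implementation.

```python
from enum import Enum


class Trend(Enum):
    """Growth trend of the monitored quantity (rows of Tab.1)."""
    INCREASED = "increased"
    DECREASED = "decreased"
    STABLE = "stable"


# (trend, value >= threshold) -> update?  Transcribed cell by cell from Tab.1.
UPDATE_RULE = {
    (Trend.INCREASED, False): False,  # Increased, < threshold: not update
    (Trend.INCREASED, True): True,    # Increased, >= threshold: update
    (Trend.DECREASED, False): True,   # Decreased, < threshold: update
    (Trend.DECREASED, True): False,   # Decreased, >= threshold: not update
    (Trend.STABLE, False): True,      # Stable, < threshold: update
    (Trend.STABLE, True): True,       # Stable, >= threshold: update
}


def should_update(trend: Trend, value: float, threshold: float) -> bool:
    """Return the Tab.1 decision for a trend and a value/threshold pair.

    `value` and `threshold` are placeholders: the source table only
    distinguishes "< threshold" from ">= threshold".
    """
    return UPDATE_RULE[(trend, value >= threshold)]
```

For instance, `should_update(Trend.INCREASED, 0.2, 0.5)` falls in the "Increased, < threshold" cell and returns `False`, while any call with `Trend.STABLE` returns `True`.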

Fig.1 Structure of Algorithm 1

Tab.2 Average results in the Ant environment for the main objective, electronic cost, progress reward, and joint cost

Tab.3 Average results in the HalfCheetah environment for the main objective, electronic cost, progress reward, and joint cost

Tab.4 Average results in the Hopper environment for the main objective, electronic cost, progress reward, and joint cost

Tab.5 Average results in the Humanoid environment for the main objective, electronic cost, progress reward, and joint cost

Tab.6 Average results in the Walker2D environment for the main objective, electronic cost, progress reward, and joint cost

Fig.2 Average results of Walker2D


