Frontiers of Computer Science

Front. Comput. Sci.    2023, Vol. 17 Issue (4) : 174328    https://doi.org/10.1007/s11704-022-1612-9
LETTER
A stable actor-critic algorithm for solving robotic tasks with multiple constraints
Peiyao ZHAO, Fei ZHU, Quan LIU, Xinghong LING
School of Computer Science and Technology, Soochow University, Suzhou 215006, China
Corresponding Author(s): Fei ZHU   
Just Accepted Date: 07 September 2022   Issue Date: 05 December 2022
Cite this article:
Peiyao ZHAO, Fei ZHU, Quan LIU, et al. A stable actor-critic algorithm for solving robotic tasks with multiple constraints[J]. Front. Comput. Sci., 2023, 17(4): 174328.
URL:
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-022-1612-9
https://academic.hep.com.cn/fcs/EN/Y2023/V17/I4/174328
Growth trend    < threshold     ≥ threshold
Increased       Not update      Update
Decreased       Update          Not update
Stable          Update          Update
Tab.1  Update strategy considering the growth trend
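A minimal Python sketch of the decision rule summarized in Tab.1, assuming the monitored quantity is a running estimate of a constraint-related value and that "update" refers to performing the corresponding parameter update; the names (cost_value, threshold, Trend, should_update) are illustrative and are not taken from the paper.

from enum import Enum

class Trend(Enum):
    """Growth trend of the monitored quantity (labels follow Tab.1)."""
    INCREASED = "increased"
    DECREASED = "decreased"
    STABLE = "stable"

def should_update(cost_value: float, threshold: float, trend: Trend) -> bool:
    """Decision rule read off Tab.1: update or not, given the growth
    trend and a comparison against the threshold (hypothetical reading)."""
    below = cost_value < threshold
    if trend is Trend.INCREASED:
        return not below   # increased and below threshold: do not update
    if trend is Trend.DECREASED:
        return below       # decreased and at/above threshold: do not update
    return True            # stable: update in either case

For example, should_update(0.8, 1.0, Trend.INCREASED) returns False, matching the "Increased / < threshold / Not update" cell of Tab.1.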
Fig.1  Structure of Algorithm 1
  
                   SCPOCA     RCPO      PPO       PPOCA     TH
Main objective     4277.54    2000.32   5177.66   5209.23   4000
Electronic cost    −775.81    −36.33    −835.45   −804.66   −750
Progress reward    3139.76    256.56    4070.88   4140.9    3200
Joint cost         −134.4     −267.89   −105.77   −175.0    −160
Tab.2  Average results in the Ant environment for the main objective, electronic cost, progress reward, and joint cost, respectively
                   SCPOCA     RCPO      PPO       PPOCA     TH
Main objective     3948.49    1637.32   4020.68   2650.7    3200
Electronic cost    −468.07    −61.42    −841.3    −371.08   −600
Progress reward    3190.4     71.53     3335.42   1404.46   3000
Joint cost         −506.2     −416.89   −492.15   −430.47   −400
Tab.3  Average results in the HalfCheetah environment for the main objective, electronic cost, progress reward, and joint cost, respectively
                   SCPOCA     RCPO      PPO       PPOCA     TH
Main objective     4325.3     3877.45   5053.3    4445.24   4000
Electronic cost    −646.58    −516.51   −902.67   −738.21   −650
Progress reward    3069.84    2390.18   3955.2    3250.03   3200
Joint cost         −247.87    −214.76   −297.78   −242.46   −250
Tab.4  Average results in the Hopper environment for the main objective, electronic cost, progress reward, and joint cost, respectively
                   SCPOCA     RCPO      PPO       PPOCA     TH
Main objective     4420.84    4084.54   5216.68   4114.5    4000
Electronic cost    −800.1     −169.26   −1130.12  −739.63   −800
Progress reward    1910.41    486.02    3086.11   1485.78   2400
Joint cost         −389.47    −277.46   −832.53   −696.58   −800
Tab.5  Average results in the Humanoid environment for the main objective, electronic cost, progress reward, and joint cost, respectively
                   SCPOCA     RCPO      PPO       PPOCA     TH
Main objective     3639.66    1803.27   3020.74   1848.27   3700
Electronic cost    −689.81    −47.91    −601.12   −44.48    −700
Progress reward    2599.02    183.88    1693.99   59.11     2000
Joint cost         −317.25    −378.83   −119.4    −214.16   −200
Tab.6  Average results in the Walker2D environment for the main objective, electronic cost, progress reward, and joint cost, respectively
Fig.2  Average results of Walker2D