Proximal policy optimization with an integral compensator for quadrotor control
Huan HU, Qing-ling WANG
School of Automation, Southeast University, Nanjing 210096, China
Abstract We use the proximal policy optimization (PPO) reinforcement learning algorithm to optimize a stochastic control policy that achieves “model-free” speed control of a quadrotor. The quadrotor is controlled by four learned neural networks, which map the system states directly to control commands in an end-to-end manner. Introducing an integral compensator into the actor-critic framework greatly improves speed tracking accuracy and robustness. In addition, a two-phase learning scheme comprising offline and online learning is developed for practical use: a model with strong generalization ability is learned in the offline phase, and its flight policy is then continuously optimized in the online phase. Finally, the performance of the proposed algorithm is compared with that of the traditional PID algorithm.
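As a rough illustration of the two ingredients named in the abstract, the sketch below pairs the standard PPO clipped surrogate objective with one possible form of integral compensation, in which the accumulated speed tracking error is appended to the observation seen by the actor and critic. This is a minimal sketch under stated assumptions, not the authors' implementation: the class `IntegralCompensator`, the function `ppo_clip_objective`, the time step `dt`, the clamp `limit`, and the three-axis speed vector are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IntegralCompensator:
    """Accumulates the speed tracking error for use as an extra policy input.

    Hypothetical helper: dt, limit, and the three-axis layout are
    illustrative assumptions, not values taken from the paper.
    """
    dt: float = 0.01
    limit: float = 1.0
    integral: List[float] = field(default_factory=lambda: [0.0, 0.0, 0.0])

    def update(self, target_speed, speed):
        # Integrate the per-axis error and clamp it (a simple anti-windup).
        for i, (ref, v) in enumerate(zip(target_speed, speed)):
            acc = self.integral[i] + (ref - v) * self.dt
            self.integral[i] = max(-self.limit, min(self.limit, acc))
        return list(self.integral)

    def augment(self, state, target_speed, speed):
        # Observation for the actor and critic: raw state plus integral error.
        return list(state) + self.update(target_speed, speed)


def ppo_clip_objective(ratios, advantages, epsilon=0.2):
    """Mean PPO clipped surrogate objective (a quantity to be maximized).

    ratios are the probability ratios pi_new(a|s) / pi_old(a|s).
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = max(1.0 - epsilon, min(1.0 + epsilon, r))
        total += min(r * a, clipped * a)
    return total / len(ratios)
```

Feeding the integral term to the networks lets the learned policy react to a persistent steady-state speed error, much as the integral term of a PID controller does; clamping the accumulator is one common way to keep it bounded.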
Keywords
Reinforcement learning
Proximal policy optimization
Quadrotor control
Neural network
Corresponding Author(s):
Qing-ling WANG
Issue Date: 17 June 2020