Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184

Front. Inform. Technol. Electron. Eng., 2020, Vol. 21, Issue 5: 777-795    https://doi.org/10.1631/FITEE.1900641
Original Article
Proximal policy optimization with an integral compensator for quadrotor control
Huan HU, Qing-ling WANG()
School of Automation, Southeast University, Nanjing 210096, China
Abstract

We use the proximal policy optimization (PPO) reinforcement learning algorithm to optimize a stochastic control strategy for speed control of a “model-free” quadrotor. The vehicle is controlled by four learned neural networks, which map the system states directly to control commands in an end-to-end style. By introducing an integral compensator into the actor-critic framework, speed-tracking accuracy and robustness are greatly enhanced. In addition, a two-phase learning scheme comprising both offline and online learning is developed for practical use: a model with strong generalization ability is learned in the offline phase, and the flight policy is then continuously optimized in the online phase. Finally, the performance of the proposed algorithm is compared with that of the traditional PID algorithm.
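The abstract describes two ingredients: the PPO clipped surrogate objective and an integral compensator added to the actor-critic framework to remove steady-state speed-tracking error. A minimal sketch of both is given below; the compensator form, gains, and variable names are illustrative assumptions, since the abstract does not specify the paper's exact formulation.

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate objective (to be maximized).

    ratio: pi_new(a|s) / pi_old(a|s) per sample; advantage: estimated
    advantage per sample. Clipping bounds the policy update step."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(unclipped, clipped).mean()

class IntegralCompensator:
    """Hypothetical integral compensator: accumulates the speed-tracking
    error over time and appends it to the observation, so the learned
    policy can act on it much like the integral term of a PI controller."""
    def __init__(self, gain=1.0, limit=1.0):
        self.gain = gain
        self.limit = limit      # anti-windup clamp on the accumulated error
        self.integral = 0.0

    def augment(self, obs, speed_error, dt=0.01):
        self.integral = float(np.clip(
            self.integral + self.gain * speed_error * dt,
            -self.limit, self.limit))
        return np.append(obs, self.integral)
```

In this sketch the compensator simply widens the state vector fed to the actor and critic networks by one element; the anti-windup clamp (`limit`) is a common practical choice, not something stated in the abstract.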

Keywords: Reinforcement learning; Proximal policy optimization; Quadrotor control; Neural network
Corresponding Author(s): Qing-ling WANG   
Issue Date: 17 June 2020
 Cite this article:   
Huan HU, Qing-ling WANG. Proximal policy optimization with an integral compensator for quadrotor control[J]. Front. Inform. Technol. Electron. Eng., 2020, 21(5): 777-795.
 URL:  
https://academic.hep.com.cn/fitee/EN/10.1631/FITEE.1900641
https://academic.hep.com.cn/fitee/EN/Y2020/V21/I5/777
Supplementary materials: FITEE-0777-20010-HH_suppl_1, FITEE-0777-20010-HH_suppl_2