Proximal policy optimization with an integral compensator for quadrotor control

Front. Inform. Technol. Electron. Eng, 2020, Vol. 21, Issue 5: 777-795

https://doi.org/10.1631/FITEE.1900641

Original Article


Huan HU, Qing-ling WANG

School of Automation, Southeast University, Nanjing 210096, China


Abstract

We use the proximal policy optimization (PPO) reinforcement learning algorithm to optimize a stochastic control policy for speed control of a “model-free” quadrotor. The quadrotor model is controlled by four learned neural networks, which map the system states directly to control commands in an end-to-end manner. Introducing an integral compensator into the actor-critic framework greatly improves speed tracking accuracy and robustness. In addition, a two-phase learning scheme that includes both offline and online learning is developed for practical use. A model with strong generalization ability is learned in the offline phase; the flight policy of the model is then continuously optimized in the online learning phase. Finally, the performance of the proposed algorithm is compared with that of the traditional PID algorithm.
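As a rough illustration of the two ingredients named in the abstract, the sketch below shows the standard PPO clipped surrogate objective and one possible way to feed an integrated speed tracking error to the actor and critic. The abstract does not say how the integral compensator is wired into the networks, so the `IntegralCompensator` class, its `limit` anti-windup bound, and the choice to append the integral to the state vector are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate objective (to be maximized), batch mean.

    ratio     : pi_new(a|s) / pi_old(a|s) for each sample
    advantage : advantage estimate for each sample
    eps       : clipping range (0.2 is a common default, assumed here)
    """
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return float(np.mean(np.minimum(ratio * advantage, clipped * advantage)))


class IntegralCompensator:
    """Hypothetical helper: integrates the speed error and appends it to the state."""

    def __init__(self, dim, dt, limit=1.0):
        self.integral = np.zeros(dim)  # running integral of the speed error
        self.dt = dt                   # control period in seconds
        self.limit = limit             # assumed anti-windup bound

    def reset(self):
        self.integral[:] = 0.0

    def augment(self, state, v_ref, v):
        """Return the state extended with the accumulated speed error."""
        error = np.asarray(v_ref, dtype=float) - np.asarray(v, dtype=float)
        self.integral = np.clip(
            self.integral + error * self.dt, -self.limit, self.limit
        )
        return np.concatenate([np.asarray(state, dtype=float), self.integral])
```

In this reading, the augmented state returned by `augment` would be the input to both the policy and value networks, so a persistent steady-state speed error grows the integral term and gives the policy a signal to correct it, much as the I term does in a PID loop.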

Keywords: Reinforcement learning; Proximal policy optimization; Quadrotor control; Neural network

Corresponding Author(s): Qing-ling WANG

Issue Date: 17 June 2020

Cite this article:

Huan HU, Qing-ling WANG. Proximal policy optimization with an integral compensator for quadrotor control[J]. Front. Inform. Technol. Electron. Eng, 2020, 21(5): 777-795.

URL:

https://academic.hep.com.cn/fitee/EN/10.1631/FITEE.1900641
https://academic.hep.com.cn/fitee/EN/Y2020/V21/I5/777


