Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2021, Vol. 15 Issue (5) : 155333    https://doi.org/10.1007/s11704-020-0431-0
RESEARCH ARTICLE
Parallel exploration via negatively correlated search
Peng YANG, Qi YANG, Ke TANG, Xin YAO
Guangdong Provincial Key Laboratory of Brain-inspired Intelligent Computation, Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China.
Abstract

Effective exploration is key to a successful search process. The recently proposed negatively correlated search (NCS) tries to achieve this through coordinated parallel exploration, where a set of search processes are driven to be negatively correlated so that different promising areas of the search space can be visited simultaneously. Despite successful applications of NCS, the negatively correlated search behaviors were mostly devised by intuition, while a deeper (e.g., mathematical) understanding is missing. In this paper, a more principled NCS, namely NCNES, is presented, showing that the parallel exploration is equivalent to a process of seeking probabilistic models that both lead to solutions of high quality and are distant from previously obtained probabilistic models. Reinforcement learning, for which exploration is of particular importance, is considered for empirical assessment. The proposed NCNES is applied to directly train a deep convolutional network with 1.7 million connection weights for playing Atari games. Empirical results show that the significant advantages of NCNES, especially on games with uncertain and delayed rewards, can be largely attributed to its effective parallel exploration ability.
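To make the core NCS idea concrete, the following is a minimal sketch, not the paper's NCNES algorithm: several parallel search processes each propose a Gaussian-perturbed candidate, and a candidate is accepted only when it improves the balance between solution quality (low objective value) and diversity (distance from the other processes). The objective `sphere`, the plain Euclidean distance standing in for the Bhattacharyya distance between probabilistic models, and all parameter values are illustrative assumptions.

```python
import math
import random

def sphere(x):
    # Toy minimization objective: f(x) = sum of squares.
    return sum(v * v for v in x)

def min_distance(point, others):
    # Distance to the nearest other search process; a simple stand-in
    # for the between-model distance used in NCS.
    return min(math.dist(point, o) for o in others)

def ncs_sketch(f, dim=2, n_proc=3, sigma=0.3, steps=200, lam=1.0, seed=0):
    rng = random.Random(seed)
    # One mean vector per parallel search process, initialized at random.
    means = [[rng.uniform(-3, 3) for _ in range(dim)] for _ in range(n_proc)]
    for _ in range(steps):
        for i in range(n_proc):
            cand = [m + rng.gauss(0, sigma) for m in means[i]]
            others = [means[j] for j in range(n_proc) if j != i]
            # Score = quality / diversity: lower is better, so a process is
            # rewarded both for a low f and for staying far from its peers.
            old_score = f(means[i]) / (min_distance(means[i], others) + 1e-12)
            new_score = f(cand) / (min_distance(cand, others) + 1e-12)
            if new_score < lam * old_score:
                means[i] = cand
    return min(means, key=f)

best = ncs_sketch(sphere)
```

The negative correlation enters through the denominator: a candidate that merely crowds an area already covered by a sibling process sees its score inflated and is rejected, so the processes are pushed toward different promising regions simultaneously.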

Keywords evolutionary computation      reinforcement learning      exploration     
Corresponding Author(s): Ke TANG   
Just Accepted Date: 16 March 2021   Issue Date: 30 August 2021
 Cite this article:   
Peng YANG, Qi YANG, Ke TANG, et al. Parallel exploration via negatively correlated search[J]. Front. Comput. Sci., 2021, 15(5): 155333.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-020-0431-0
https://academic.hep.com.cn/fcs/EN/Y2021/V15/I5/155333