
Front. Comput. Sci.    2025, Vol. 19 Issue (3) : 193309    https://doi.org/10.1007/s11704-024-3380-1
Artificial Intelligence
Behaviour-diverse automatic penetration testing: a coverage-based deep reinforcement learning approach
Yizhou YANG1,2, Longde CHEN3, Sha LIU3, Lanning WANG4, Haohuan FU5, Xin LIU3, Zuoning CHEN3
1. Zhongguancun Laboratory, Beijing 100081, China
2. Zhejiang Lab, Hangzhou 311121, China
3. National Research Centre of Parallel Computer Engineering and Technology, Wuxi 214000, China
4. Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China
5. Department of Earth System Science, Tsinghua University, Beijing 100084, China
Abstract

Reinforcement Learning (RL) is gaining importance in automating penetration testing as it reduces human effort and increases reliability. Nonetheless, given the rapidly expanding scale of modern network infrastructure, the limited testing scale and monotonous strategies of existing RL-based automated penetration testing methods make them less effective in practical applications. In this paper, we present CLAP (Coverage-Based Reinforcement Learning to Automate Penetration Testing), an RL penetration-testing agent that provides comprehensive network security assessments with diverse adversary testing behaviours on a massive scale. CLAP employs a novel neural-network component, the coverage mechanism, to address the enormous and growing action spaces of large networks. It also utilizes a Chebyshev decomposition critic to identify diverse adversary strategies and strike a balance between them. Experimental results across various scenarios demonstrate that CLAP outperforms state-of-the-art methods, further reducing attack operations by nearly 35%. CLAP also provides improved training efficiency and stability, and can effectively perform pen-testing over large-scale networks with up to 500 hosts. The proposed agent is also able to discover Pareto-dominant strategies that are both diverse and effective in achieving multiple objectives.
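As context for the Chebyshev decomposition critic mentioned above: weighted Chebyshev scalarisation is a standard technique from multi-objective optimisation that scores a vector of objective returns by its largest weighted gap to a reference (utopia) point, which discourages neglecting any single objective. The minimal NumPy sketch below illustrates the scalarisation only; the weights, reference point, and objective names are hypothetical, and this is not the authors' implementation.

import numpy as np

def chebyshev_scalarise(returns, weights, utopia_point):
    """Weighted Chebyshev scalarisation: the scalar score is driven by the
    objective that is currently furthest (after weighting) from the utopia
    (reference) point, which discourages neglecting any single objective."""
    returns = np.asarray(returns, dtype=float)
    weights = np.asarray(weights, dtype=float)
    utopia_point = np.asarray(utopia_point, dtype=float)
    weighted_gaps = weights * np.abs(utopia_point - returns)
    return np.max(weighted_gaps)  # minimising this balances all objectives

# Hypothetical example with two objectives (e.g. attack success vs. stealth).
value = chebyshev_scalarise(returns=[0.8, 0.4],
                            weights=[0.5, 0.5],
                            utopia_point=[1.0, 1.0])
print(value)  # 0.3 -> dominated by the lagging second objective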

Keywords network security      penetration testing      reinforcement learning      artificial intelligence     
Corresponding Author(s): Xin LIU   
Just Accepted Date: 28 February 2024   Issue Date: 29 May 2024
 Cite this article:   
Yizhou YANG, Longde CHEN, Sha LIU, et al. Behaviour-diverse automatic penetration testing: a coverage-based deep reinforcement learning approach[J]. Front. Comput. Sci., 2025, 19(3): 193309.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-024-3380-1
https://academic.hep.com.cn/fcs/EN/Y2025/V19/I3/193309
Fig.1  Penetration testing as a sequential decision making process
Fig.2  Architecture of the proposed method. The attack agent's observations are fed through separate MLP extractors into the actor-critic network and the RND module, respectively. CLAP's different neural network components are highlighted with coloured boxes. The coverage score C_out is fused with the logits output a_out of the actor network. (a) System architecture; (b) fusion layer
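Fig. 2(b) states only that the coverage score C_out is fused with the actor logits a_out; the exact fusion operator is not specified in this excerpt. The PyTorch sketch below is a minimal illustration assuming a simple additive fusion ahead of the softmax, with hypothetical feature and action dimensions; it should not be read as the paper's actual fusion layer.

import torch
import torch.nn as nn

class FusedActorHead(nn.Module):
    """Toy actor head that fuses a per-action coverage score with the actor
    logits. The additive fusion and the meaning of the coverage input are
    assumptions; the paper's fusion layer may combine the signals differently."""

    def __init__(self, feature_dim, num_actions):
        super().__init__()
        self.logit_head = nn.Linear(feature_dim, num_actions)     # produces a_out
        self.coverage_head = nn.Linear(num_actions, num_actions)  # produces C_out (|A| -> |A|, as in Tab. 2)

    def forward(self, features, per_action_input):
        a_out = self.logit_head(features)             # raw action logits
        c_out = self.coverage_head(per_action_input)  # coverage score for each action
        fused = a_out + c_out                         # assumed fusion: element-wise sum
        return torch.softmax(fused, dim=-1)           # action probabilities

# Hypothetical dimensions: 256-dim extracted features, 50 discrete actions.
head = FusedActorHead(feature_dim=256, num_actions=50)
probs = head(torch.randn(1, 256), torch.zeros(1, 50))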
Fig.3  An illustration of vectorised host information, encompassing the physical address of the host, agent reachability, and other host attributes
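As a rough counterpart to Fig. 3, the sketch below assembles a flat observation vector for a single host from a one-hot address, reachability/compromise flags, and a multi-hot service indicator. The field layout, dimensions, and flags are illustrative assumptions; as noted below, the actual configuration depends on the pen-testing simulator.

import numpy as np

def encode_host(address_id, num_addresses, reachable, compromised,
                services_present, num_services):
    """Build a flat observation vector for one host: one-hot address,
    two status flags, and a multi-hot indicator of discovered services.
    The layout is illustrative only and varies across simulators."""
    address = np.zeros(num_addresses)
    address[address_id] = 1.0
    flags = np.array([float(reachable), float(compromised)])
    services = np.zeros(num_services)
    services[list(services_present)] = 1.0
    return np.concatenate([address, flags, services])

# Hypothetical host: address 3 of 35, reachable, not yet compromised,
# running services 0 and 4 out of 50 known services.
obs = encode_host(3, 35, True, False, [0, 4], 50)
print(obs.shape)  # (87,)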

The specific configurations presented here vary depending on the pen-testing simulator used.

Scenario Hosts OS Services Processes
NASim-Pocp1 35 2 50 2
NASim-200 200 2 5 3
NASim-500 500 3 7 3
Tab.1  Benchmark network scenarios
Module NN type Layers Activation Hidden size Input shape Output shape
Critic MLP 4 Tanh 256 |S| 1
Actor MLP 3 Tanh 256 |S| |A|
Coverage MLP 3 Tanh 256 |A| |A|
Cw_learner MLP 1 Tanh 128 |S| |A|
Aw_learner MLP 1 Tanh 128 |S| |A|
RND Predictor MLP 3 Tanh 256 |S| 256
RND Target MLP 3 Tanh 256 |S| 256
Tab.2  Neural network module configurations
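For readers who want to reproduce the module shapes in Tab. 2, the helper below builds an MLP in the same style (a given number of linear layers, Tanh activations, a shared hidden width, and explicit input/output sizes). It is a generic PyTorch sketch, not the authors' code, and the example state dimension |S| is hypothetical.

import torch.nn as nn

def make_mlp(num_layers, in_dim, hidden, out_dim):
    """Build an MLP in the Tab. 2 style: `num_layers` linear layers with Tanh
    activations between them, a shared hidden width, and a final linear
    projection to `out_dim`."""
    dims = [in_dim] + [hidden] * (num_layers - 1) + [out_dim]
    layers = []
    for i in range(num_layers):
        layers.append(nn.Linear(dims[i], dims[i + 1]))
        if i < num_layers - 1:
            layers.append(nn.Tanh())
    return nn.Sequential(*layers)

# Example: the critic row of Tab. 2 (4 layers, Tanh, hidden size 256, scalar value
# output), assuming a hypothetical state dimension |S| of 1024.
critic = make_mlp(num_layers=4, in_dim=1024, hidden=256, out_dim=1)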
Hyperparameter Default value
RL Hyperparameters
Learning Rate 2.5×10−4
Num Steps 512
Gamma 0.99
GAE Lambda 0.95
Batch Size 4096
Update Epochs 4
Clip Coefficient 0.2
Entropy Coefficient 0.01
Value Function Coefficient 0.5
Max Grad Norm 0.5
RND Hyperparameters
Intrinsic Reward Coefficient 1.0
Extrinsic Reward Coefficient 2.0
Intrinsic Reward Gamma 0.99
Tab.3  Hyperparameters configurations
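The RND-related entries in Tab. 3 follow the usual random network distillation recipe: the curiosity bonus is the prediction error of a trained predictor against a frozen, randomly initialised target network, and it is mixed with the environment reward using the intrinsic/extrinsic coefficients. The sketch below shows that mixing with the Tab. 3 values; discounting each stream with its own gamma and combining at the advantage level, as is common in RND implementations, is omitted for brevity.

import torch

INTRINSIC_COEF = 1.0   # Intrinsic Reward Coefficient from Tab. 3
EXTRINSIC_COEF = 2.0   # Extrinsic Reward Coefficient from Tab. 3

def intrinsic_reward(predictor, target, obs):
    """RND curiosity bonus: squared prediction error between a trained predictor
    network and a frozen, randomly initialised target network on the same
    observation (both map |S| to a 256-dim embedding in Tab. 2)."""
    with torch.no_grad():
        target_feat = target(obs)
    pred_feat = predictor(obs)
    return ((pred_feat - target_feat) ** 2).mean(dim=-1)

def combined_reward(extrinsic, intrinsic):
    """Mix the environment reward with the curiosity bonus using the Tab. 3
    coefficients; a simplified view of how the two reward streams are weighted."""
    return EXTRINSIC_COEF * extrinsic + INTRINSIC_COEF * intrinsic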
Fig.4  Training performance of different methods across various NASim scenarios.

In some panels of Fig. 4, baselines are not compared due to their poor performance under these scenarios.

(a) Training performance in the NASim-Pocp1 network scenario (left: number of actions; right: rewards); (b) training performance in the NASim-200-host network scenario (left: number of actions; right: rewards); (c) training performance in the NASim-500-host network scenario (left: number of actions; right: rewards)
Fig.5  Mean episodic length for different methods
Fig.6  Ablation study on different modules of CLAP in the nasim:Pocp2Gen network
Fig.7  Distribution of attack strategies under multi-objective settings
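The Pareto-dominant strategies referred to in the abstract and visualised in Fig. 7 can be identified with a standard dominance filter over each strategy's per-objective returns. The NumPy sketch below uses the textbook definition, assuming higher is better for every objective; the example objectives and scores are hypothetical.

import numpy as np

def pareto_front(points):
    """Return a boolean mask marking the non-dominated rows of `points`, where
    each row holds one strategy's returns on several objectives (higher is
    better). A row is dominated if another row is >= on every objective and
    strictly > on at least one."""
    n = points.shape[0]
    non_dominated = np.ones(n, dtype=bool)
    for i in range(n):
        for j in range(n):
            if i != j and np.all(points[j] >= points[i]) and np.any(points[j] > points[i]):
                non_dominated[i] = False
                break
    return non_dominated

# Hypothetical strategies scored on (success rate, stealth).
scores = np.array([[0.9, 0.2], [0.6, 0.6], [0.5, 0.5], [0.3, 0.9]])
print(pareto_front(scores))  # [ True  True False  True]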