Frontiers of Computer Science  2024, Vol. 18 Issue (4): 184318   https://doi.org/10.1007/s11704-023-2639-2
A prompt-based approach to adversarial example generation and robustness enhancement
Yuting YANG1,2, Pei HUANG3, Juan CAO1,2, Jintao LI1, Yun LIN4, Feifei MA2,5
1. Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
3. Department of Computer Science, Stanford University, CA 94305, USA
4. School of Computing, National University of Singapore, Singapore 119077, Singapore
5. Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
Abstract

Recent years have seen the wide application of natural language processing (NLP) models in crucial areas such as finance, medical treatment, and news media, raising concerns about model robustness and vulnerabilities. We find that the prompt paradigm can probe special robustness defects of pre-trained language models. Malicious prompt texts are first constructed for inputs, and a pre-trained language model then generates adversarial examples for victim models via mask-filling. Experimental results show that the prompt paradigm can efficiently generate adversarial examples that are more diverse than those produced by synonym substitution. We then propose a novel robust training approach based on the prompt paradigm, which incorporates prompt texts as alternatives to adversarial examples and enhances robustness under a lightweight minimax-style optimization framework. Experiments on three real-world tasks and two deep neural models show that our approach significantly improves the robustness of models against adversarial attacks.
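To make the generation step concrete, the following is a minimal sketch of the mask-filling idea, assuming a HuggingFace fill-mask pipeline and a stand-in sentiment classifier as the victim; the prompt template, the top_k budget, and the helper names are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch (not the paper's exact algorithm): append a prompt
# containing a mask token, let a pre-trained masked LM fill it, and keep
# fillings that flip the victim model's prediction.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
victim = pipeline("sentiment-analysis")  # stand-in victim model

def prompt_attack(text, template=" It is a {} movie."):
    original_label = victim(text)[0]["label"]
    prompt = text + template.format(fill_mask.tokenizer.mask_token)
    for cand in fill_mask(prompt, top_k=20):        # candidate fillings
        if victim(cand["sequence"])[0]["label"] != original_label:
            return cand["sequence"]                 # prediction flipped
    return None                                     # attack failed

print(prompt_attack("You watch for that sense of openness, the little surprises."))
```

Because the masked language model proposes contextually fluent fillings rather than dictionary synonyms, the candidates go beyond synonym substitution, which is the diversity advantage the abstract claims.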

Keywords: robustness; adversarial example; prompt learning; pre-trained language model
Received: 2022-10-22      Published online: 2023-06-09
Corresponding author(s): Juan CAO, Feifei MA
Cite this article:
Yuting YANG, Pei HUANG, Juan CAO, Jintao LI, Yun LIN, Feifei MA. A prompt-based approach to adversarial example generation and robustness enhancement. Front. Comput. Sci., 2024, 18(4): 184318.
Article links:
https://academic.hep.com.cn/fcs/CN/10.1007/s11704-023-2639-2
https://academic.hep.com.cn/fcs/CN/Y2024/V18/I4/184318
| Dataset | Attack | BERT (Succ / PPL / Sim / Disj) | RoBERTa (Succ / PPL / Sim / Disj) | BART (Succ / PPL / Sim / Disj) | BiLSTM (Succ / PPL / Sim / Disj) |
|---|---|---|---|---|---|
| MR | TextFooler | 48.35 / 600.49 / 99.92 / – | 41.99 / 620.46 / 99.93 / – | 44.21 / 629.23 / 99.28 / – | 69.70 / 694.67 / 99.14 / – |
| MR | SemPSO | 73.63 / 708.28 / 98.65 / – | 69.06 / 763.90 / 98.65 / – | 46.32 / 729.01 / 98.64 / – | 88.27 / 100.52 / 98.26 / – |
| MR | BERT-Attack | 48.96 / 463.82 / 95.16 / – | 47.13 / 528.36 / 95.32 / – | 61.05 / 529.11 / 95.17 / – | 52.72 / 472.18 / 95.18 / – |
| MR | PAT | 43.02 / 395.77 / 97.93 / 19.35 | 55.00 / 404.72 / 98.35 / 25.25 | 58.75 / 382.34 / 98.26 / 23.17 | 42.04 / 419.26 / 98.01 / 20.15 |
| MR | PAT* | 55.31 / 590.32 / 95.26 / 22.84 | 63.82 / 600.74 / 96.63 / 27.38 | 69.62 / 580.23 / 96.93 / 25.40 | 50.96 / 574.82 / 98.52 / 22.66 |
| IMDB | TextFooler | 82.63 / 191.90 / 99.74 / – | 79.13 / 192.74 / 99.52 / – | 56.98 / 192.12 / 99.62 / – | 86.24 / 188.58 / 99.15 / – |
| IMDB | SemPSO | 92.51 / 183.74 / 98.32 / – | 85.45 / 190.26 / 98.62 / – | 23.26 / 190.25 / 98.02 / – | 95.83 / 183.74 / 98.06 / – |
| IMDB | BERT-Attack | 80.22 / 132.36 / 96.14 / – | 77.42 / 140.62 / 96.92 / – | 66.28 / 142.86 / 96.18 / – | 82.24 / 108.26 / 96.63 / – |
| IMDB | PAT | 30.27 / 93.96 / 96.94 / 12.62 | 26.23 / 99.56 / 98.70 / 16.82 | 29.07 / 95.47 / 98.32 / 16.11 | 38.46 / 81.22 / 98.53 / 16.23 |
| IMDB | PAT* | 53.51 / 148.49 / 96.31 / 15.92 | 60.72 / 160.62 / 95.29 / 20.17 | 54.71 / 150.26 / 96.85 / 21.03 | 55.49 / 147.46 / 98.72 / 18.63 |
| SNLI | TextFooler | 69.94 / 1023.13 / 99.72 / – | 64.64 / 1000.73 / 99.56 / – | 40.91 / 1000.26 / 99.25 / – | 75.16 / 1322.70 / 99.73 / – |
| SNLI | SemPSO | 71.10 / 456.09 / 98.26 / – | 75.69 / 509.72 / 97.75 / – | 44.32 / 500.71 / 98.02 / – | 80.54 / 504.62 / 97.83 / – |
| SNLI | BERT-Attack | 59.77 / 317.36 / 97.13 / – | 63.53 / 382.44 / 97.05 / – | 52.27 / 356.59 / 97.02 / – | 69.13 / 415.92 / 96.27 / – |
| SNLI | PAT | 66.29 / 127.82 / 98.01 / 23.61 | 70.22 / 142.33 / 99.62 / 26.25 | 70.21 / 130.22 / 99.42 / 25.37 | 64.85 / 116.67 / 98.01 / 27.22 |
| SNLI | PAT* | 84.00 / 602.43 / 94.14 / 25.91 | 88.14 / 620.67 / 95.71 / 31.27 | 86.83 / 600.65 / 95.38 / 30.63 | 83.64 / 459.99 / 98.83 / 26.34 |

Tab.1 Attack results on three datasets and four victim models: attack success rate (Succ, %), perplexity of adversarial examples (PPL), semantic similarity to the original input (Sim, %), and Disj (%), reported for PAT and PAT* only
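For context on the metric columns: perplexity (PPL) for adversarial text is conventionally scored with an autoregressive language model such as GPT-2, and Sim with a sentence encoder. The snippet below is a generic perplexity sketch under that assumption, not necessarily the evaluation code used in the paper.

```python
# Generic GPT-2 perplexity sketch for scoring (adversarial) sentences;
# the paper's exact evaluation setup may differ.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss  # mean token-level NLL
    return torch.exp(loss).item()

print(perplexity("You watch for that sense of openness, of little surprises."))
```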
| Dataset | Model | PAT (Succ / PPL / Sim) | PAT* (Succ / PPL / Sim) |
|---|---|---|---|
| MR | BERT | 39.37 / 81.57 / 94.12 | 57.50 / 108.51 / 93.05 |
| MR | RoBERTa | 43.81 / 85.17 / 95.76 | 56.10 / 110.41 / 93.36 |
| MR | BART | 42.12 / 86.28 / 95.76 | 55.39 / 108.74 / 93.46 |
| IMDB | BERT | 37.97 / 90.35 / 95.67 | 52.41 / 140.77 / 94.53 |
| IMDB | RoBERTa | 32.14 / 98.43 / 96.13 | 59.15 / 156.92 / 93.20 |
| IMDB | BART | 36.65 / 96.83 / 96.28 | 53.96 / 150.73 / 94.22 |
| SNLI | BERT | 51.11 / 44.71 / 97.13 | 75.56 / 35.90 / 96.52 |
| SNLI | RoBERTa | 55.31 / 58.43 / 98.07 | 80.31 / 45.85 / 97.13 |
| SNLI | BART | 54.39 / 59.65 / 98.35 | 79.13 / 48.24 / 97.90 |

Tab.2 Attack performance of PAT and PAT* (Succ, %; PPL; Sim, %) on three datasets and three victim models
Word-level

Original: The film might have been more satisfying if it had, in fact, been fleshed out a little more instead of going for easy smiles.
Label: 0 (Negative sentiment) → 1 (Positive sentiment)
TextFooler: The film might experience been more satisfying if it took, in matter, been fleshed out a little more instead of going for gentle smiles.
PAT: while it might have been more satisfying if it had, in fact, been thinking going a little more instead of going for easy smiles.

Original: You watch for that sense of openness, the little surprises.
Label: 1 (Positive sentiment) → 0 (Negative sentiment)
TextFooler: [Attack failed]
PAT: You watch for that sense of openness, of little surprises.

Original: [x1]: A guy riding a motorcycle near junk cars. [x2]: A man is riding a motorcycle.
Label: 2 (Entailment) → 1 (Contradiction)
TextFooler: ... [x2]: A man is riding a motorbike.
PAT: ... [x2]: A young, riding a motorcycle.

Sentence-level

Original: The film is predictable in the reassuring manner of a beautifully sung holiday carol.
Label: 1 (Positive sentiment) → 0 (Negative sentiment)
Continued sentence: but it’s also one of the funniest movies I’ve ever seen.
PAT: The film is predictable in the reassuring manner of a beautifully sung Christmas carol, but it’s also one of the best movies I’ve ever seen.

Original: Rates an for effort and a for boring.
Label: 0 (Negative sentiment) → 1 (Positive sentiment)
Continued sentence: it’s hard to think of a better way to describe it.
PAT: Rates an for effort and a for boring, it’s hard to think of a better way to like it.

Tab.3 Word-level and sentence-level adversarial examples (original label → flipped prediction); TextFooler outputs are shown for comparison
| Trigger | Positive variant | Negative variant |
|---|---|---|
| t1 | It is a good movie. | It is a bad movie. |
| t2 | I like the movie so much. | I hate the movie so much. |
| t3 | It is a funny movie. | It is a boring movie. |
| t4 | I think it is funny. | I think it is boring. |

Tab.4 Trigger sentence pairs
Original: you know those films that are blatantly awful but you can’t help but love them? well that’s what evil ed is. possibly the best awful film in the world. the sound is rubbish. the dubbing is crap, the screenplay is nonsense and the special effects are pap. however, i can’t help but love this film dearly and i have recommended it to at least 50 people over the years. sam campbell (or the guy who plays him) should be featured on the actor’s studio series as he is that memorable. possibly the greatest movie villain not named tony montana. seriously, if you don’t expect a lot then you won’t be disappointed. keep a light-hearted approach to watching this film and you’ll soon rate it a ten afterwards.
Label: 1 (positive sentiment)
Adv: you know those films that are blatantly awful but you can’t help but love them? well that’s what evil ed is, possibly the best awful film in the world. the sound is rubbish. the dubbing is crap, the screenplay is nonsense and the special effects are pap. yes, i can’t help but love this film dearly and i have recommended it to at least 50 people over the years. sam campbell (or the guy who plays him) should be featured on the actor’s studio series as he is that memorable. possibly the greatest movie villain not named tony montana. but seriously, if you don’t expect a lot then you won’t be disappointed. keep a light-hearted approach to watching this film and you’ll soon rate it a ten afterwards.
Label: 0 (negative sentiment)

Tab.5 An adversarial example in which minor edits to connectives flip the prediction from positive to negative
BERT:

| Dataset | Training | Acc | TextFooler (Succ / Rob) | SemPSO (Succ / Rob) | BERT-Attack (Succ / Rob) | PAT (Succ / Rob) |
|---|---|---|---|---|---|---|
| MR | Original | 89.60 | 48.35 / 46.52 | 73.63 / 24.13 | 48.96 / 45.91 | 43.02 / 51.04 |
| MR | Adv | 88.00 | 40.22 / 52.13 | 73.08 / 24.62 | 47.13 / 46.87 | 43.82 / 49.13 |
| MR | ASCC | 89.03 | 41.67 / 52.51 | 72.12 / 25.18 | 45.63 / 47.83 | 40.33 / 54.02 |
| MR | Ours | 88.57 | 38.33 / 55.53 | 66.01 / 30.64 | 40.01 / 53.27 | 36.87 / 56.52 |
| IMDB | Original | 93.68 | 82.63 / 16.48 | 92.51 / 7.01 | 80.22 / 18.25 | 30.27 / 65.72 |
| IMDB | Adv | 91.00 | 38.95 / 58.35 | 58.42 / 38.54 | 55.32 / 42.12 | 28.12 / 65.53 |
| IMDB | ASCC | 91.02 | 36.28 / 61.32 | 55.62 / 41.16 | 52.11 / 38.96 | 23.01 / 70.46 |
| IMDB | Ours | 92.20 | 26.80 / 71.33 | 46.18 / 50.26 | 45.27 / 51.01 | 12.71 / 80.52 |
| SNLI | Original | 86.77 | 69.94 / 26.00 | 71.10 / 25.35 | 59.77 / 36.11 | 66.29 / 29.42 |
| SNLI | Adv | 82.53 | 52.98 / 39.51 | 54.17 / 38.22 | 44.63 / 47.82 | 65.18 / 29.08 |
| SNLI | ASCC | 84.13 | 52.63 / 41.51 | 46.12 / 44.26 | 42.18 / 50.27 | 65.02 / 31.14 |
| SNLI | Ours | 84.01 | 50.18 / 43.86 | 43.14 / 47.42 | 38.85 / 53.16 | 62.82 / 34.56 |

RoBERTa:

| Dataset | Training | Acc | TextFooler (Succ / Rob) | SemPSO (Succ / Rob) | BERT-Attack (Succ / Rob) | PAT (Succ / Rob) |
|---|---|---|---|---|---|---|
| MR | Original | 89.03 | 41.99 / 52.53 | 69.06 / 28.03 | 47.13 / 45.72 | 55.00 / 40.64 |
| MR | Adv | 87.44 | 35.20 / 58.24 | 64.25 / 32.42 | 45.13 / 48.05 | 60.00 / 37.17 |
| MR | ASCC | 87.21 | 33.45 / 59.50 | 57.14 / 40.02 | 39.82 / 52.13 | 53.82 / 41.44 |
| MR | Ours | 86.22 | 31.52 / 63.03 | 40.76 / 54.57 | 33.14 / 59.76 | 48.41 / 45.26 |
| IMDB | Original | 92.09 | 79.13 / 18.22 | 85.45 / 11.12 | 77.42 / 19.21 | 26.23 / 67.09 |
| IMDB | Adv | 91.50 | 37.11 / 48.02 | 40.98 / 51.03 | 44.15 / 50.34 | 24.92 / 69.63 |
| IMDB | ASCC | 92.05 | 36.28 / 49.13 | 35.91 / 57.27 | 42.94 / 52.07 | 23.74 / 68.47 |
| IMDB | Ours | 92.02 | 7.89 / 87.53 | 20.00 / 76.29 | 21.24 / 75.03 | 14.52 / 81.57 |
| SNLI | Original | 88.74 | 64.64 / 32.15 | 75.69 / 22.03 | 63.53 / 33.25 | 70.22 / 26.53 |
| SNLI | Adv | 88.64 | 54.34 / 38.47 | 70.61 / 27.84 | 57.27 / 35.01 | 68.52 / 31.17 |
| SNLI | ASCC | 87.64 | 53.12 / 40.31 | 66.42 / 30.73 | 55.02 / 37.29 | 68.72 / 31.98 |
| SNLI | Ours | 87.71 | 45.35 / 47.32 | 50.13 / 45.23 | 48.74 / 44.38 | 65.82 / 36.71 |

Tab.6 Clean accuracy (Acc, %) and, under each attack, attack success rate (Succ, %) and robust accuracy (Rob, %) for BERT (top) and RoBERTa (bottom) with different training methods
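The "Ours" rows correspond to the minimax-style robust training described in the abstract. Below is a rough sketch of that optimization pattern, assuming a generic PyTorch classifier and a hypothetical `make_variants` generator of perturbed inputs (e.g., prompt-based alternatives); it is not the paper's exact algorithm.

```python
import torch
import torch.nn.functional as F

def robust_training_step(model, optimizer, batch, make_variants):
    """One minimax-style update: the inner step picks the worst-case
    variant of each input, the outer step minimizes loss on it."""
    model.train()
    worst = []
    for x, y in batch:  # x: input tensor with batch dim, y: label tensor
        candidates = [x] + make_variants(x)  # hypothetical perturbations
        with torch.no_grad():
            # Inner maximization: keep the candidate with the highest loss.
            losses = torch.stack([F.cross_entropy(model(c), y) for c in candidates])
        worst.append((candidates[losses.argmax().item()], y))
    # Outer minimization on the collected worst-case inputs.
    optimizer.zero_grad()
    loss = torch.stack([F.cross_entropy(model(x), y) for x, y in worst]).mean()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Evaluating the candidate variants under `torch.no_grad()` keeps the inner search cheap, which matches the "lightweight" framing in the abstract.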