Frontiers of Computer Science


Front. Comput. Sci.    2025, Vol. 19 Issue (3) : 193803    https://doi.org/10.1007/s11704-024-3587-1
Information Security
Fairness is essential for robustness: fair adversarial training by identifying and augmenting hard examples
Ningping MOU, Xinli YUE, Lingchen ZHAO, Qian WANG()
Key Laboratory of Aerospace Information Security and Trusted Computing (Ministry of Education), School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, China
Abstract

Adversarial training is widely considered the most effective defense against adversarial attacks. However, recent studies have shown that it produces a large discrepancy in class-wise robustness, which leads to two problems: first, the overall robustness of a model is limited by its weakest class; second, unequal protection raises ethical concerns, since certain societal demographic groups may receive weaker defenses than others. Despite these issues, solutions that address the discrepancy remain largely underexplored. In this paper, we move beyond existing methods that operate at the class level. Our investigation reveals that hard examples, identified by higher cross-entropy values, provide more fine-grained information about the discrepancy, and that enhancing the diversity of hard examples effectively narrows the robustness gap between classes. Motivated by these observations, we propose Fair Adversarial Training (FairAT) to mitigate the discrepancy in class-wise robustness. Extensive experiments on various benchmark datasets and adversarial attacks demonstrate that FairAT outperforms state-of-the-art methods in both overall robustness and fairness. For a WRN-28-10 model trained on CIFAR10, FairAT improves the average and worst-class robustness by 2.13% and 4.50%, respectively.
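The identification step described in the abstract can be illustrated concretely. The snippet below is a minimal sketch, not the authors' released code: it assumes a PyTorch classifier `model`, a data loader that iterates the training set in a fixed order, and a hypothetical `hard_fraction` parameter, and it flags the examples with the largest per-example cross-entropy loss as hard.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def identify_hard_examples(model, loader, hard_fraction=0.1, device="cpu"):
    """Return indices of the `hard_fraction` of examples with the largest
    per-example cross-entropy loss (illustrative sketch)."""
    model.eval()
    losses = []
    for x, y in loader:  # loader must iterate in a fixed order (shuffle=False)
        x, y = x.to(device), y.to(device)
        logits = model(x)
        # reduction="none" keeps one loss value per example
        losses.append(F.cross_entropy(logits, y, reduction="none").cpu())
    losses = torch.cat(losses)
    k = max(1, int(hard_fraction * len(losses)))
    hard_idx = torch.topk(losses, k).indices  # positions in loader order
    return hard_idx, losses
```

Because the scores are recomputed as the model evolves, the set of hard examples can change from epoch to epoch (cf. Fig. 5).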

Keywords robust fairness      adversarial training      hard example      data augmentation     
Corresponding Author(s): Qian WANG   
Just Accepted Date: 17 January 2024   Issue Date: 23 April 2024
 Cite this article:   
Ningping MOU, Xinli YUE, Lingchen ZHAO, et al. Fairness is essential for robustness: fair adversarial training by identifying and augmenting hard examples[J]. Front. Comput. Sci., 2025, 19(3): 193803.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-024-3587-1
https://academic.hep.com.cn/fcs/EN/Y2025/V19/I3/193803
Fig.1  Demonstration of unfairness in adversarial training with a robust PreAct ResNet18 on CIFAR10. We show the class-wise/average robust errors of (a) PGD Adv. Training [6] and (b) our FairAT. Larger deviations in class-wise robustness relative to average robustness indicate more serious unfairness. The robust errors are evaluated by PGD-20 attack (ε=8/255) following [10]
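For readers who want to reproduce the kind of measurement shown in Fig. 1, the following is a hedged sketch of an L-infinity PGD-20 evaluation with ε=8/255 together with a per-class robust-error tally. The step size (2/255) and random start are common choices rather than settings stated here, and `pgd_attack`/`classwise_robust_error` are illustrative names.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=20):
    """L-infinity PGD with random start (a common configuration; illustrative)."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # project back into the eps-ball around x and into the valid pixel range
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

@torch.no_grad()
def classwise_robust_error(model, x_adv, y, num_classes=10):
    """Fraction of adversarial examples misclassified, per ground-truth class."""
    pred = model(x_adv).argmax(dim=1)
    wrong = (pred != y).float()
    errors = torch.zeros(num_classes)
    for c in range(num_classes):
        mask = (y == c)
        if mask.any():
            errors[c] = wrong[mask].mean()
    return errors
```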
Method Avg. Rob. Worst. Rob.
PGD Adv. Training [6] 57.00 83.80
Baseline Reweight 57.44 82.90
Tian et al. [9] 51.19 79.60
FRL [10] 54.13 70.10
FAT [14] 48.24 69.80
Tab.1  Robust errors (%) of different reweighting algorithms. Avg. Rob. refers to the average robust error. Worst. Rob. refers to the worst-class robust error
Fig.2  Analyses of the relationship between cross-entropy loss values and (a) hard classes and (b) hard examples. (a) The class-wise robust error on the test set and the average cross-entropy loss computed over the training examples of each class. (b) We select the two most robust classes (Class 9 and 1) and the two least robust classes (Class 3 and 2). For the hard examples with the largest cross-entropy loss values (from the top 10% to the top 90%), we plot the proportion that belongs to each of these four classes
Fig.3  (a) Adding extra training examples from 80M-TI to the three least robust classes (Class 3, 2, and 4). We add 10% (500 of 5,000), 20%, and 30% examples to these classes respectively; (b) removing hard examples from the training set. We directly remove the top 10% to 90% hard examples of the whole training set
Fig.4  (a) Using Cutout to augment hard examples at different proportions. 0% corresponds to the original TRADES (λ=6); 10% means that 5,000 examples of the training set are identified as hard examples and augmented; (b) the training process of our FairAT, evaluated on a hold-out validation set and on the test set. We report the worst-class and average robust errors on both
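Fig. 4(a) uses Cutout [15] as the augmentation applied to hard examples. As a rough illustration of the operation (not the exact configuration used in the paper), the sketch below zeroes one randomly placed square patch per image; the patch size is a hypothetical choice.

```python
import torch

def cutout(batch, patch=8):
    """Zero out one random square patch per image in an NCHW batch (illustrative)."""
    out = batch.clone()
    _, _, h, w = out.shape
    for img in out:  # iterating yields views, so edits write into `out`
        cy = torch.randint(0, h, (1,)).item()
        cx = torch.randint(0, w, (1,)).item()
        y1, y2 = max(0, cy - patch // 2), min(h, cy + patch // 2)
        x1, x2 = max(0, cx - patch // 2), min(w, cx + patch // 2)
        img[:, y1:y2, x1:x2] = 0.0
    return out
```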
Fig.5  The bar plots show the hard examples selected at different epochs of the same training run by a PreAct ResNet18 model trained with TRADES (λ=6). The training example indices are re-ordered so that selected examples form contiguous blocks. The plots show strong consistency in which individual examples are selected as hard across epochs
Fig.6  Training the model at Epoch M in FairAT. FairAT first feeds the original training dataset into the model from Epoch M-1 to compute the cross-entropy value of each example, then uses these values to separate hard examples from non-hard examples. After augmenting the hard examples, it combines the augmented examples with the non-hard examples to form the training set for the model at Epoch M
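Putting the steps of Fig. 6 together, one training epoch could be organized roughly as follows. This is a schematic sketch, not the authors' implementation: it reuses the illustrative `identify_hard_examples` and `cutout` helpers from the earlier snippets, and the `adv_loss` argument stands in for whatever adversarial objective (e.g., TRADES or PGD adversarial training) is used inside the loop.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset

class IndexedDataset(Dataset):
    """Wrap a dataset so each item also returns its index (illustrative helper)."""
    def __init__(self, base):
        self.base = base
    def __len__(self):
        return len(self.base)
    def __getitem__(self, i):
        x, y = self.base[i]
        return x, y, i

def fairat_epoch(model, optimizer, base_dataset, hard_fraction=0.1,
                 adv_loss=None, device="cpu"):
    """Schematic FairAT-style epoch: score every example with the previous-epoch
    model, mark the top `hard_fraction` by cross-entropy as hard, augment only
    those, then train on the mixed batches."""
    score_loader = DataLoader(base_dataset, batch_size=256, shuffle=False)
    hard_idx, _ = identify_hard_examples(model, score_loader, hard_fraction, device)
    is_hard = torch.zeros(len(base_dataset), dtype=torch.bool)
    is_hard[hard_idx] = True

    model.train()
    for x, y, idx in DataLoader(IndexedDataset(base_dataset), batch_size=128, shuffle=True):
        x, y = x.to(device), y.to(device)
        mask = is_hard[idx].to(device)
        if mask.any():
            x[mask] = cutout(x[mask])  # augment only the hard examples
        # In practice the inner loss would be TRADES or PGD adversarial training.
        loss = adv_loss(model, x, y) if adv_loss else F.cross_entropy(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

The proportion of examples treated as hard is itself a tunable quantity; Fig. 8 reports how varying it affects FairAT.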
Method Avg. Nat. Worst. Nat. Avg. Bndy. Worst. Bndy. Avg. Rob. Worst. Rob.
PGD Adv. Training 15.52 32.20 41.48 51.60 57.00 83.80
TRADES (λ=1) 12.49 27.50 43.44 58.30 56.93 85.80
TRADES (λ=6) 19.38 37.20 29.26 40.30 48.64 77.50
Baseline Reweight 15.21 30.30 42.23 52.60 57.44 82.90
FRL (Reweight) 16.20 28.80 38.06 44.50 54.26 73.30
FRL (Remargin) 15.61 27.40 37.48 48.90 53.09 76.30
FRL (Reweight+Remargin) 16.68 26.20 37.45 43.90 54.13 70.10
FAT 20.08 33.70 28.16 36.10 48.24 69.80
FairAT (Ours) 19.16 32.80 28.50 33.20 47.66 66.00
Tab.2  Average & worst-class natural error, boundary error, and robust error for various algorithms against PreAct ResNet18 on CIFAR10. Robust errors are evaluated by PGD-20. The best results are in bold
Method Natural FGSM PGD C&W AutoAttack
Avg. Worst. Avg. Worst. Avg. Worst. Avg. Worst. Avg. Worst.
PGD Adv. Training 15.52 32.20 46.40 74.50 57.00 83.80 56.19 82.60 58.67 85.50
TRADES (λ=1) 12.49 27.50 47.00 75.90 56.93 85.80 57.50 86.30 59.36 88.20
TRADES (λ=6) 19.38 37.20 44.00 72.10 48.64 77.50 51.35 81.40 52.27 82.70
Baseline Reweight 15.21 30.30 47.02 74.50 57.44 82.90 56.75 82.10 59.30 85.00
FRL (Reweight) 16.20 28.80 46.34 65.10 54.26 73.30 54.82 74.30 56.56 77.50
FRL (Remargin) 15.61 27.40 45.40 67.50 53.09 76.30 53.64 76.90 55.36 79.90
FRL (Reweight+Remargin) 16.68 26.20 46.82 62.10 54.13 70.10 54.98 71.80 56.74 74.60
FAT 20.08 33.70 43.81 64.80 48.24 69.80 51.13 76.20 52.33 77.90
FairAT (Ours) 19.16 32.80 42.32 61.40 47.66 66.00 50.18 71.60 51.33 73.30
Tab.3  Average & worst-class natural error and robust error for various algorithms against PreAct ResNet18 on CIFAR10. Robust errors are evaluated by four popular attacks, including FGSM, PGD-20, C&W, and AutoAttack. The best results are in bold
Fig.7  Comparison of different adversarial training algorithms with regard to class-wise robust errors. The robust errors are evaluated by PGD-20. FRL is FRL (Reweight+Remargin). PGD AT. is PGD Adv. Training. (a) PreAct ResNet18 on CIFAR10; (b) WRN-28-10 on CIFAR10; (c) PreAct ResNet18 on SVHN; (d) WRN-28-10 on SVHN
Method Natural FGSM PGD C&W AutoAttack
Avg. Worst. Avg. Worst. Avg. Worst. Avg. Worst. Avg. Worst.
PGD Adv. Training 13.37 26.10 42.49 68.70 53.00 79.00 52.39 78.40 54.39 80.30
TRADES (λ=1) 11.15 21.40 40.75 66.40 51.74 76.90 51.00 76.50 53.23 78.60
TRADES (λ=6) 13.78 26.20 38.43 62.60 47.04 72.50 47.17 72.20 49.24 74.80
Baseline Reweight 13.28 25.40 42.58 67.20 53.08 78.00 52.47 77.60 54.42 79.20
FRL (Reweight) 14.55 27.90 42.37 65.70 53.40 76.80 52.42 75.50 55.07 78.60
FRL (Remargin) 15.59 24.40 48.26 62.30 56.36 71.10 58.75 75.20 60.18 77.30
FRL (Reweight+Remargin) 15.66 30.10 45.09 63.90 51.77 67.70 53.79 70.10 55.17 71.30
FAT 15.90 30.90 40.24 58.30 45.53 64.10 47.94 68.90 49.55 72.30
FairAT (Ours) 15.35 28.30 38.10 58.00 44.59 63.20 46.02 65.70 47.42 67.80
Tab.4  Average & worst-class natural error and robust error for various algorithms against WRN-28-10 on CIFAR10
Method Natural FGSM PGD C&W AutoAttack
Avg. Worst. Avg. Worst. Avg. Worst. Avg. Worst. Avg. Worst.
PGD Adv. Training 9.16 23.86 33.68 63.49 46.47 74.28 49.24 74.10 53.33 78.43
TRADES (λ=1) 7.59 17.59 33.12 57.47 48.21 72.89 49.38 72.89 53.63 76.02
TRADES (λ=6) 9.07 19.34 32.44 55.48 44.37 69.52 47.21 71.39 51.92 76.20
Baseline Reweight 9.35 18.13 34.15 58.61 48.25 72.17 49.61 72.41 54.28 75.54
FRL (Reweight) 9.23 16.36 33.65 45.96 46.67 70.06 46.96 69.76 59.33 76.33
FRL (Remargin) 9.61 15.71 32.25 42.76 46.87 67.89 46.54 67.29 62.54 76.33
FRL (Reweight+Remargin) 9.64 15.73 32.33 44.54 44.60 65.18 44.31 64.58 64.19 75.90
FAT 10.21 19.81 33.39 46.64 42.83 57.26 48.51 62.22 52.40 65.00
FairAT (Ours) 9.63 17.14 26.69 40.31 38.92 53.23 42.89 56.75 46.92 62.77
Tab.5  Average & worst-class natural error and robust error for various algorithms against PreAct ResNet18 on SVHN
Method Natural FGSM PGD C&W AutoAttack
Avg. Worst. Avg. Worst. Avg. Worst. Avg. Worst. Avg. Worst.
PGD Adv. Training 7.70 16.39 33.84 55.12 46.96 71.27 49.56 73.49 54.11 77.29
TRADES (λ=1) 8.97 20.36 36.76 63.49 48.57 72.89 50.90 74.46 55.30 76.75
TRADES (λ=6) 10.64 21.74 19.34 41.02 43.72 71.20 46.97 73.92 54.90 78.61
Baseline Reweight 7.19 13.92 37.61 59.52 50.21 72.29 51.70 73.43 55.50 75.60
FRL (Reweight) 8.08 14.53 24.09 40.06 54.38 71.08 52.93 69.76 63.67 76.39
FRL (Remargin) 7.78 13.26 22.64 35.72 49.21 66.02 47.88 64.88 59.18 71.81
FRL (Reweight+Remargin) 7.85 15.18 24.93 36.45 47.41 63.86 46.27 63.07 55.54 69.58
FAT 7.28 12.48 22.75 35.00 38.01 52.17 41.73 54.32 51.86 63.67
FairAT (Ours) 7.38 14.82 17.08 31.12 34.84 51.04 37.79 53.64 49.36 63.13
Tab.6  Average & worst-class natural error and robust error for various algorithms against WRN-28-10 on SVHN
Method Natural FGSM PGD C&W AutoAttack
Avg. Worst. Avg. Worst. Avg. Worst. Avg. Worst. Avg. Worst.
PGD Adv. Training 59.21 83.40 81.92 97.50 83.71 98.60 88.34 99.00 88.92 99.25
TRADES (λ=1) 58.53 82.75 82.43 97.70 83.27 98.65 87.11 99.05 87.63 99.20
TRADES (λ=6) 59.42 83.15 80.20 96.90 81.82 97.50 86.10 98.90 86.55 99.00
Baseline Reweight 59.13 82.90 82.08 96.70 83.96 98.35 88.45 99.10 88.73 99.15
FRL (Reweight) 59.62 82.75 81.52 90.35 82.34 96.85 87.02 98.70 87.56 98.95
FRL (Remargin) 59.58 82.60 81.19 91.60 82.71 97.20 86.33 98.75 87.34 99.00
FRL (Reweight+Remargin) 59.78 82.30 81.53 89.90 82.06 96.45 86.49 98.40 87.49 98.85
FAT 59.93 83.80 79.13 90.20 80.38 96.30 84.76 98.90 85.40 99.10
FairAT (Ours) 58.97 82.65 78.00 85.65 79.71 95.90 83.52 98.05 84.15 98.30
Tab.7  Average & worst-class natural error and robust error for various algorithms against PreAct ResNet18 on TinyImageNet-200
Method Avg. Rob. Worst. Rob.
Margin [40] 49.11 67.00
LeastConf [45] 48.83 66.80
Maximum Entropy [46] 47.93 66.10
Cross Entropy (Default) 47.66 66.00
Tab.8  Comparison of different metrics for identifying hard examples, for PreAct ResNet18 on CIFAR10. The robust errors are evaluated by PGD-20
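Tab. 8 compares several uncertainty measures for flagging hard examples. The sketch below shows how such scores are commonly computed from a model's logits (with larger score meaning harder here); the exact definitions used in the paper may differ in sign or normalization, and `hardness_scores` is an illustrative name.

```python
import torch
import torch.nn.functional as F

def hardness_scores(logits, labels):
    """Common uncertainty measures over a batch of logits (illustrative)."""
    probs = F.softmax(logits, dim=1)
    top2 = probs.topk(2, dim=1).values
    margin = -(top2[:, 0] - top2[:, 1])               # small margin -> hard
    least_conf = 1.0 - probs.max(dim=1).values        # low confidence -> hard
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
    cross_entropy = F.cross_entropy(logits, labels, reduction="none")
    return {"Margin": margin, "LeastConf": least_conf,
            "Maximum Entropy": entropy, "Cross Entropy": cross_entropy}
```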
Fig.8  Analyses of the effect of the proportions of hard examples on FairAT. We test different proportions from 5% to 100% on two datasets and two models. The proportion of 0% corresponds to the original TRADES (λ=6). The robust errors are evaluated by PGD-20. (a) PreAct ResNet18 on CIFAR10; (b) WRN-28-10 on CIFAR10; (c) PreAct ResNet18 on SVHN; (d) WRN-28-10 on SVHN
1 Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow I J, Fergus R. Intriguing properties of neural networks. In: Proceedings of the 2nd International Conference on Learning Representations. 2014
2 Goodfellow I J, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. In: Proceedings of the 3rd International Conference on Learning Representations. 2015
3 Carlini N, Wagner D. Towards evaluating the robustness of neural networks. In: Proceedings of IEEE Symposium on Security and Privacy. 2017, 39–57
4 Croce F, Hein M. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In: Proceedings of the 37th International Conference on Machine Learning. 2020, 206
5 Athalye A, Carlini N, Wagner D A. Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. In: Proceedings of the 35th International Conference on Machine Learning. 2018, 274–283
6 Madry A, Makelov A, Schmidt L, Tsipras D, Vladu A. Towards deep learning models resistant to adversarial attacks. In: Proceedings of the 6th International Conference on Learning Representations. 2018
7 Zhang H, Yu Y, Jiao J, Xing E P, El Ghaoui L, Jordan M I. Theoretically principled trade-off between robustness and accuracy. In: Proceedings of the 36th International Conference on Machine Learning. 2019, 7472–7482
8 Rice L, Wong E, Kolter J Z. Overfitting in adversarially robust deep learning. In: Proceedings of the 37th International Conference on Machine Learning. 2020, 749
9 Tian Q, Kuang K, Jiang K, Wu F, Wang Y. Analysis and applications of class-wise robustness in adversarial training. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 2021, 1561–1570
10 Xu H, Liu X, Li Y, Jain A K, Tang J. To be robust or to be fair: towards fairness in adversarial training. In: Proceedings of the 38th International Conference on Machine Learning. 2021, 11492–11501
11 Hardt M, Price E, Srebro N. Equality of opportunity in supervised learning. In: Proceedings of the 30th International Conference on Neural Information Processing Systems. 2016, 3323–3331
12 Krasanakis E, Spyromitros-Xioufis E, Papadopoulos S, Kompatsiaris Y. Adaptive sensitive reweighting to mitigate bias in fairness-aware classification. In: Proceedings of 2018 World Wide Web Conference. 2018, 853–862
13 Ustun B, Liu Y, Parkes D C. Fairness without harm: decoupled classifiers with preference guarantees. In: Proceedings of the 36th International Conference on Machine Learning. 2019, 6373–6382
14 Ma X, Wang Z, Liu W. On the tradeoff between robustness and fairness. In: Proceedings of the 36th Conference on Neural Information Processing Systems. 2022, 26230–26241
15 DeVries T, Taylor G W. Improved regularization of convolutional neural networks with cutout. 2017, arXiv preprint arXiv: 1708.04552
16 Zhang H, Cissé M, Dauphin Y N, Lopez-Paz D. Mixup: beyond empirical risk minimization. In: Proceedings of the 6th International Conference on Learning Representations. 2018
17 Yun S, Han D, Chun S, Oh S J, Yoo Y, Choe J. CutMix: regularization strategy to train strong classifiers with localizable features. In: Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. 2019, 6022–6031
18 Wang Y, Zou D, Yi J, Bailey J, Ma X, Gu Q. Improving adversarial robustness requires revisiting misclassified examples. In: Proceedings of the 8th International Conference on Learning Representations. 2020
19 Zhan Y, Zheng B, Wang Q, Mou N, Guo B, Li Q, Shen C, Wang C. Towards black-box adversarial attacks on interpretable deep learning systems. In: Proceedings of 2022 IEEE International Conference on Multimedia and Expo. 2022, 1–6
20 Mou N, Zheng B, Wang Q, Ge Y, Guo B. A few seconds can change everything: fast decision-based attacks against DNNs. In: Proceedings of the 31st International Joint Conference on Artificial Intelligence. 2022, 3342–3350
21 Tramèr F, Carlini N, Brendel W, Mądry A. On adaptive attacks to adversarial example defenses. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. 2020, 138
22 Aghaei S, Azizi M J, Vayanos P. Learning optimal and fair decision trees for non-discriminative decision-making. In: Proceedings of the 33rd AAAI Conference on Artificial Intelligence. 2019, 1418–1426
23 Goel N, Yaghini M, Faltings B. Non-discriminatory machine learning through convex fairness criteria. In: Proceedings of 2018 AAAI/ACM Conference on AI, Ethics, and Society. 2018, 116
24 Mehrabi N, Morstatter F, Saxena N, Lerman K, Galstyan A. A survey on bias and fairness in machine learning. ACM Computing Surveys, 2022, 54(6): 115
25 Wang Y X, Ramanan D, Hebert M. Learning to model the tail. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017, 7032–7042
26 Cao K, Wei C, Gaidon A, Aréchiga N, Ma T. Learning imbalanced datasets with label-distribution-aware margin loss. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. 2019, 1567–1578
27 Agarwal A, Beygelzimer A, Dudík M, Langford J, Wallach H. A reductions approach to fair classification. In: Proceedings of the 35th International Conference on Machine Learning. 2018, 60–69
28 Cui Y, Jia M, Lin T Y, Song Y, Belongie S. Class-balanced loss based on effective number of samples. In: Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019, 9260–9269
29 Zhan X, Liu H, Li Q, Chan A B. A comparative survey: benchmarking for pool-based active learning. In: Proceedings of the 30th International Joint Conference on Artificial Intelligence. 2021, 4679–4686
30 Beluch W H, Genewein T, Nürnberger A, Köhler J M. The power of ensembles for active learning in image classification. In: Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018, 9368–9377
31 Gal Y, Islam R, Ghahramani Z. Deep Bayesian active learning with image data. In: Proceedings of the 34th International Conference on Machine Learning. 2017, 1183–1192
32 Rade R, Moosavi-Dezfooli S M. Reducing excessive margin to achieve a better accuracy vs. robustness trade-off. In: Proceedings of the 10th International Conference on Learning Representations. 2022
33 Zhang J, Zhu J, Niu G, Han B, Sugiyama M, Kankanhalli M S. Geometry-aware instance-reweighted adversarial training. In: Proceedings of the 9th International Conference on Learning Representations. 2021
34 Carmon Y, Raghunathan A, Schmidt L, Liang P, Duchi J C. Unlabeled data improves adversarial robustness. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. 2019, 1004
35 Hendrycks D, Lee K, Mazeika M. Using pre-training can improve model robustness and uncertainty. In: Proceedings of the 36th International Conference on Machine Learning. 2019, 2712–2721
36 Najafi A, Maeda S I, Koyama M, Miyato T. Robustness to adversarial perturbations in learning from incomplete data. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. 2019, 497
37 Zhai R, Cai T, He D, Dan C, He K, Hopcroft J, Wang L. Adversarially robust generalization just requires more unlabeled data. In: Proceedings of ICLR 2020. 2020
38 Torralba A, Fergus R, Freeman W T. 80 million tiny images: a large data set for nonparametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 30(11): 1958–1970
39 Cubuk E D, Zoph B, Mané D, Vasudevan V, Le Q V. AutoAugment: learning augmentation policies from data. 2018, arXiv preprint arXiv: 1805.09501
40 Krizhevsky A. Learning multiple layers of features from tiny images. Technical Report, University of Toronto, 2009
41 Netzer Y, Wang T, Coates A, Bissacco A, Wu B, Ng A Y. Reading digits in natural images with unsupervised feature learning. In: Proceedings of NIPS Workshop on Deep Learning and Unsupervised Feature Learning. 2011
42 He K, Zhang X, Ren S, Sun J. Identity mappings in deep residual networks. In: Proceedings of the 14th European Conference on Computer Vision. 2016, 630–645
43 Zagoruyko S, Komodakis N. Wide residual networks. In: Proceedings of British Machine Vision Conference. 2016
44 Deng J, Dong W, Socher R, Li L J, Li K, Fei-Fei L. ImageNet: a large-scale hierarchical image database. In: Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition. 2009, 248–255
45 Croce F, Andriushchenko M, Sehwag V, Debenedetti E, Flammarion N, Chiang M, Mittal P, Hein M. RobustBench: a standardized adversarial robustness benchmark. In: Proceedings of the 35th Conference on Neural Information Processing Systems. 2021
46 Wang D, Shang Y. A new active labeling method for deep learning. In: Proceedings of 2014 International Joint Conference on Neural Networks. 2014, 112–119
47 Shannon C E. A mathematical theory of communication. ACM SIGMOBILE Mobile Computing and Communications Review, 2001, 5(1): 3–55