Toward few-shot domain adaptation with perturbation-invariant representation and transferable prototypes

doi:10.1007/s11704-022-2015-7

Frontiers of Computer Science

2022, Vol. 16, Issue 3: 163347. https://doi.org/10.1007/s11704-022-2015-7



Junsong FAN¹,², Yuxi WANG¹,²,³, He GUAN¹,², Chunfeng SONG¹,², Zhaoxiang ZHANG¹,²,³

¹. Center for Research on Intelligent Perception and Computing, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
². University of Chinese Academy of Sciences, Beijing 100049, China
³. Centre for Artificial Intelligence and Robotics, HKISI_CAS, Hong Kong 999077, China


Abstract:

Domain adaptation (DA) for semantic segmentation aims to reduce the annotation burden of dense pixel-level prediction. It tackles the domain gap by transferring knowledge learned from abundant source data to new target scenes. Although recent works have made rapid progress in this field, they still underperform fully supervised models by a large margin because no supervision is available in the target domain. Considering that few-shot labels are cheap to obtain in practical applications, we leverage them to narrow the performance gap between DA and fully supervised methods. The key to this problem is to use the few-shot labels effectively to learn robust, domain-invariant predictions. To this end, we first design a data perturbation strategy that enhances the robustness of the representations. We then propose a transferable prototype module that bridges the domain gap using the source data and the few-shot target samples. With these components, our approach moves closer to the performance of fully supervised models and is on par with them to some extent. Extensive experiments demonstrate the effectiveness of the proposed methods, which achieve state-of-the-art performance on two popular DA tasks: GTA5 to Cityscapes and SYNTHIA to Cityscapes.
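The abstract names its two components without giving their form. As a rough illustration only, the sketch below shows one common way such ideas are realized: a consistency penalty between the predictions for a clean and a perturbed view of the same pixel, and class prototypes averaged over labeled features (in the spirit of prototypical networks [35]) used for nearest-prototype assignment. All function names, the squared-error penalty, and the Euclidean distance are assumptions made for this example; they are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def consistency_loss(logits_clean, logits_perturbed):
    """Mean squared difference between the class probabilities of two views.

    Assumption: squared error is used here for simplicity; other
    divergences (e.g. KL) are equally plausible.
    """
    p = softmax(logits_clean)
    q = softmax(logits_perturbed)
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


def class_prototypes(features, labels):
    """Average the feature vectors of each class into one prototype."""
    sums, counts = {}, {}
    for feat, lab in zip(features, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(feat)
            counts[lab] = 0
        sums[lab] = [s + f for s, f in zip(sums[lab], feat)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def nearest_prototype(feature, prototypes):
    """Return the class whose prototype is closest in squared Euclidean distance."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(prototypes, key=lambda lab: sqdist(feature, prototypes[lab]))


# Toy data: labeled features (hypothetical values) for two classes.
labeled_feats = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
labeled_classes = [0, 0, 1, 1]
protos = class_prototypes(labeled_feats, labeled_classes)

print(protos)                                  # {0: [0.0, 1.0], 1: [5.0, 4.0]}
print(nearest_prototype([1.0, 1.0], protos))   # 0
print(consistency_loss([1.0, 2.0], [1.0, 2.0]))  # 0.0
```

Pooling source features with the few labeled target features before averaging is one simple way to let target samples shift the prototypes; whether the paper does exactly this is not stated in the abstract.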

Keywords: domain adaptation; semantic segmentation

Received: 2022-01-08 Published: 2022-04-27

Corresponding Author(s): Zhaoxiang ZHANG


Cite this article:

Junsong FAN, Yuxi WANG, He GUAN, Chunfeng SONG, Zhaoxiang ZHANG. Toward few-shot domain adaptation with perturbation-invariant representation and transferable prototypes. Front. Comput. Sci., 2022, 16(3): 163347.

Link to this article:

https://academic.hep.com.cn/fcs/CN/10.1007/s11704-022-2015-7
https://academic.hep.com.cn/fcs/CN/Y2022/V16/I3/163347

Fig.1

GTA5→Cityscapes
Method	Arch.	Road	Side	Build	Wall	Fence	Pole	Light	Sign	Vege	Terr	Sky	Pers	Rider	Car	Truck	Bus	Train	Motor	Bike	mIoU
Source-only	V	26.0	14.9	65.1	5.5	12.9	8.9	6.0	2.5	70.0	2.9	47.0	24.5	0.0	40.0	12.1	1.5	0.0	0.0	0.0	17.9
FCNs [2]	V	70.4	32.4	62.1	14.9	5.4	10.9	14.2	2.7	79.2	21.3	64.6	44.1	4.2	70.4	8.0	7.3	0.0	3.5	0.0	27.1
CyCADA [24]	V	85.6	30.7	74.7	14.4	13.0	17.6	13.7	5.8	74.6	15.8	69.9	38.2	3.5	72.3	16.0	5.0	0.1	3.6	0.0	29.2
MCD [3]	V	86.4	8.5	76.1	18.6	9.7	14.9	7.8	0.6	82.8	32.7	71.4	25.2	1.1	76.3	16.1	17.1	1.4	0.2	0.0	28.8
AdaptSeg [7]	V	87.3	29.8	78.6	21.1	18.2	22.5	21.5	11.0	79.7	29.6	71.3	46.8	6.5	80.1	23.0	26.9	0.0	10.6	0.3	35.0
CLAN [5]	V	88.0	30.6	79.2	23.4	20.5	26.1	23.0	14.8	81.6	34.5	72.0	45.8	7.9	80.5	26.6	29.9	0.0	10.7	0.0	36.6
Baseline	V	93.4	57.6	79.9	23.0	21.3	23.7	15.1	11.7	80.9	37.8	83.5	42.2	9.2	78.4	9.5	0.9	15.4	4.8	3.7	36.4
$M_{s\_cyc}$	V	94.2	62.4	82.5	20.8	30.6	26.9	23.6	22.9	82.3	39.0	87.3	50.5	16.2	79.9	17.7	4.9	11.9	6.6	15.9	40.8
$M_{proto}$	V	94.4	62.9	82.2	21.4	26.3	27.9	23.8	21.5	84.7	38.5	85.3	51.4	13.9	80.6	14.2	4.1	3.8	3.8	24.0	40.3
Ours (all)	V	93.7	58.9	82.7	31.4	28.1	26.8	22.2	22.8	83.5	40.2	86.1	49.0	17.1	78.9	25.4	3.9	20.6	5.8	21.0	42.1

Source-only	R	75.8	16.8	77.2	12.5	21.0	25.5	30.1	20.1	81.3	24.6	70.3	53.8	26.4	49.9	17.2	25.9	6.5	25.3	36.0	36.0
AdaptSeg [7]	R	86.5	25.9	79.8	22.1	20.0	23.6	33.1	21.8	81.8	25.9	75.9	57.3	26.2	76.3	29.8	32.1	7.2	29.5	32.5	41.1
CLAN [5]	R	87.0	27.1	79.6	27.3	23.3	28.3	35.5	24.2	83.6	27.4	74.2	58.6	28.0	76.2	33.1	36.7	6.7	31.9	31.4	43.2
MRNet [42]	R	89.1	23.9	82.2	19.5	20.1	33.5	42.2	39.1	85.3	33.7	76.4	60.2	33.7	86.0	36.1	43.3	5.9	22.8	30.8	45.5
R-MRNet [43]	R	90.4	31.2	85.1	36.9	25.6	37.5	48.8	48.5	85.3	34.8	81.1	64.4	36.8	86.3	34.9	52.2	1.7	29.0	44.6	50.3
Baseline	R	93.8	59.4	79.9	21.5	19.9	26.2	22.9	18.9	83.5	40.7	84.7	58.3	25.6	86.1	37.6	39.8	3.7	11.3	10.2	43.4
$M_{s\_cyc}$	R	95.2	67.6	85.0	27.0	30.5	33.0	38.2	47.8	86.6	44.3	85.9	60.3	33.8	86.7	20.6	14.9	24.2	15.7	56.4	50.2
$M_{proto}$	R	95.2	65.2	85.1	26.4	30.5	34.1	39.1	48.7	86.5	46.4	86.0	62.2	35.2	85.4	8.75	10.4	25.5	24.0	58.4	50.3
Ours (all)	R	95.6	68.8	85.6	27.6	35.6	35.4	40.2	45.2	88.3	46.5	87.6	61.3	36.5	86.3	30.8	10.2	32.7	22.4	57.2	52.6

Tab.1
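The columns in the table above are per-class intersection-over-union (IoU) scores in percent, with mIoU as their mean over classes. As a minimal sketch of the standard definition, IoU = TP / (TP + FP + FN) per class, the code below computes both from a confusion matrix. The function names and the toy label lists are hypothetical and only illustrate the metric, not the authors' evaluation code.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_iou(cm):
    """IoU per class; NaN for classes absent from both labels and predictions."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true class differs
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))
    return ious


def mean_iou(ious):
    """Average over classes, skipping NaN entries."""
    valid = [v for v in ious if v == v]  # NaN is the only value not equal to itself
    return sum(valid) / len(valid)


# Toy example with three classes and six "pixels".
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
ious = per_class_iou(confusion_matrix(y_true, y_pred, 3))

print([round(v, 3) for v in ious])   # [0.333, 0.667, 0.5]
print(round(mean_iou(ious), 3))      # 0.5
```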

SYNTHIA → Cityscapes
Method	Arch.	Road	Side.	Build.	Light	Sign	Vege.	Sky	Pers.	Rider	Car	Bus	Motor	Bike	mIoU
Source-only	V	6.4	17.7	29.7	0.0	7.2	30.3	66.8	51.5	1.5	47.3	3.9	0.1	0.0	20.2
FCNs [2]	V	11.5	19.6	30.8	0.1	11.7	42.3	68.7	51.2	3.8	54.0	3.2	0.2	0.6	22.9
CDA [8]	V	65.2	26.1	74.9	3.7	3.0	76.1	70.6	47.1	8.2	43.2	20.7	0.7	13.1	34.8
Cross-city [41]	V	62.7	25.6	78.3	1.2	5.4	81.3	81.0	37.4	6.4	63.5	16.1	1.2	4.6	35.7
AdaptSeg [7]	V	78.9	29.2	75.5	0.1	4.8	72.6	76.7	43.4	8.8	71.1	16.0	3.6	8.4	37.6
CLAN [5]	V	80.4	30.7	74.7	1.4	8.0	77.1	79.0	46.5	8.9	73.8	18.2	2.2	9.9	39.3
Baseline	V	89.8	43.6	73.1	2.3	19.1	79.4	77.5	43.8	7.7	74.8	6.5	0.7	15.2	41.0
$M_{s\_cyc}$	V	93.7	56.2	79.6	5.7	16.3	80.4	85.0	47.8	11.4	78.6	6.6	7.1	22.0	45.4
$M_{proto}$	V	92.8	54.2	78.7	6.1	12.8	81.1	83.5	47.9	9.6	76.6	3.6	9.5	28.0	44.9
Ours	V	94.7	60.7	82.6	5.5	19.7	84.3	85.6	52.9	10.7	80.2	9.1	10.2	36.7	48.7

Source-only	R	55.6	23.8	74.6	6.1	12.1	74.8	79.0	55.3	19.1	39.6	23.3	13.7	25.0	38.6
AdaptSeg [7]	R	79.2	37.2	78.8	9.9	10.5	78.2	80.5	53.5	19.6	67.0	29.5	21.6	31.3	45.9
CLAN [5]	R	81.3	37.0	80.1	16.1	13.7	78.2	81.5	53.4	21.2	73.0	32.9	22.6	30.7	47.8
MRNet [42]	R	82.0	36.5	80.4	18.0	13.4	81.1	80.8	61.3	21.7	84.4	32.4	14.8	45.7	50.2
R-MRNet [43]	R	87.6	41.9	83.1	31.3	19.9	81.6	80.6	63.0	21.8	86.2	40.7	23.6	53.1	54.9
Baseline	R	88.1	42.4	79.9	16.4	21.8	80.0	77.1	57.6	24.6	75.5	20.0	11.2	40.5	48.9
$M_{s\_cyc}$	R	94.5	61.3	83.4	16.9	24.0	84.8	88.2	61.6	21.9	84.1	27.8	7.1	49.4	54.2
$M_{proto}$	R	93.4	56.9	82.7	7.2	27.6	83.5	86.8	60.6	24.0	82.0	22.0	11.5	46.7	52.7
Ours	R	93.4	57.5	83.2	18.3	29.0	83.9	87.3	60.1	30.2	83.6	38.3	11.3	49.3	55.8

Tab.2

Tab.3

GTA5→Cityscapes
Method	VGG	ResNet
Baseline	36.4	43.4
$M_{s\_cyc}$	40.8	50.2
$M_{cate}$	38.3	45.4
$M_{task}$	39.2	48.8
$M_{proto}$	40.3	50.3
Ours	42.1	52.6

Tab.4

Fig.2

Fig.3

Fig.4

GTA5→Cityscapes
Method	VGG	ResNet
1-shot	36.8	46.9
2-shot	37.6	48.8
3-shot	38.6	49.7
5-shot	42.1	52.6
10-shot	48.6	56.5
Full	58.5	65.1

Tab.5
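To read the few-shot table above against the fully supervised reference, the remaining gap can be computed directly from its entries. The snippet below is a small worked calculation using only the mIoU values listed in that table; the dictionary layout and the helper name `gap_to_full` are choices made for this example.

```python
# mIoU (%) on GTA5→Cityscapes, copied from the few-shot table above.
MIOU = {
    "1-shot": {"VGG": 36.8, "ResNet": 46.9},
    "2-shot": {"VGG": 37.6, "ResNet": 48.8},
    "3-shot": {"VGG": 38.6, "ResNet": 49.7},
    "5-shot": {"VGG": 42.1, "ResNet": 52.6},
    "10-shot": {"VGG": 48.6, "ResNet": 56.5},
    "Full": {"VGG": 58.5, "ResNet": 65.1},
}


def gap_to_full(setting, backbone):
    """Remaining mIoU gap (percentage points) to the fully supervised model."""
    return round(MIOU["Full"][backbone] - MIOU[setting][backbone], 1)


for setting in MIOU:
    if setting != "Full":
        print(setting, gap_to_full(setting, "VGG"), gap_to_full(setting, "ResNet"))
```

With the ResNet backbone, for instance, the gap shrinks from 18.2 points at 1-shot to 12.5 at 5-shot and 8.6 at 10-shot.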

Fig.5

1	L C Chen, G Papandreou, I Kokkinos, K Murphy, A L Yuille. Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834–848
2	J Hoffman, D Wang, F Yu, T Darrell. FCNs in the wild: pixel-level adversarial and constraint-based adaptation. 2016, arXiv preprint arXiv:1612.02649
3	K Saito, K Watanabe, Y Ushiku, T Harada. Maximum classifier discrepancy for unsupervised domain adaptation. In: Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018, 3723–3732
4	Y Wang, J Peng, Z Zhang. Uncertainty-aware pseudo label refinery for domain adaptive semantic segmentation. In: Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. 2021, 9072–9081
5	Y Luo, L Zheng, T Guan, J Yu, Y Yang. Taking a closer look at domain shift: category-level adversaries for semantics consistent domain adaptation. In: Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019, 2502–2511
6	Y Zhang, Z Qiu, T Yao, D Liu, T Mei. Fully convolutional adaptation networks for semantic segmentation. In: Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018, 6810–6818
7	Y H Tsai, W C Hung, S Schulter, K Sohn, M H Yang, M Chandraker. Learning to adapt structured output space for semantic segmentation. In: Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018, 7472–7481
8	Y Zhang, P David, B Gong. Curriculum domain adaptation for semantic segmentation of urban scenes. In: Proceedings of 2017 IEEE International Conference on Computer Vision. 2017, 2039–2049
9	W C Hung, Y H Tsai, Y T Liou, Y Y Lin, M H Yang. Adversarial learning for semi-supervised semantic segmentation. In: Proceedings of British Machine Vision Conference 2018. 2018, 65
10	T Kalluri, G Varma, M Chandraker, C V Jawahar. Universal semi-supervised semantic segmentation. In: Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. 2019, 5258–5269
11	S Mittal, M Tatarchenko, T Brox. Semi-supervised semantic segmentation with high- and low-level consistency. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 43(4): 1369–1379
12	I J Goodfellow, J Pouget-Abadie, M Mirza, B Xu, D Warde-Farley, S Ozair, A Courville, Y Bengio. Generative adversarial nets. In: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2. 2014, 2672–2680
13	S Lim, I Kim, T Kim, C Kim, S Kim. Fast autoaugment. In: Proceedings of Neural Information Processing Systems 32. 2019, 6662–6672
14	T Liu, Q Yang, D Tao. Understanding how feature structure transfers in transfer learning. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence. 2017, 2365–2371
15	P Ge, C X Ren, D Q Dai, H Yan. Domain adaptation and image classification via deep conditional adaptation network. 2020, arXiv preprint arXiv:2006.07776
16	D Wittich, F Rottensteiner. Appearance based deep domain adaptation for the classification of aerial images. ISPRS Journal of Photogrammetry and Remote Sensing, 2021, 180: 82–102
17	Z He, L Zhang. Multi-adversarial faster-RCNN for unrestricted object detection. In: Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. 2019, 6667–6676
18	Z He, L Zhang. Domain adaptive object detection via asymmetric tri-way faster-RCNN. In: Proceedings of the 16th European Conference on Computer Vision. 2020, 309–324
19	L Song, Y Xu, L Zhang, B Du, Q Zhang, X Wang. Learning from synthetic images via active pseudo-labeling. IEEE Transactions on Image Processing, 2020, 29: 6452–6465
20	L Gao, J Zhang, L Zhang, D Tao. DSP: dual soft-paste for unsupervised domain adaptive semantic segmentation. In: Proceedings of the 29th ACM International Conference on Multimedia. 2021, 2825–2833
21	C Pape, A Matskevych, A Wolny, J Hennies, G Mizzon, M Louveaux, J Musser, A Maizel, D Arendt, A Kreshuk. Leveraging domain knowledge to improve microscopy image segmentation with lifted multicuts. Frontiers in Computer Science, 2019, 1: 6
22	T M Quan, D G C Hildebrand, W K Jeong. Fusionnet: a deep fully residual convolutional neural network for image segmentation in connectomics. Frontiers in Computer Science, 2021, 3: 613981
23	P Baniukiewicz, E J Lutton, S Collier, T Bretschneider. Generative adversarial networks for augmenting training data of microscopic cell images. Frontiers in Computer Science, 2019, 1: 10
24	J Hoffman, E Tzeng, T Park, J Y Zhu, P Isola, K Saenko, A A Efros, T Darrell. CyCADA: cycle-consistent adversarial domain adaptation. In: Proceedings of the 35th International Conference on Machine Learning. 2018, 1989–1998
25	T Yao, Y Pan, C W Ngo, H Li, T Mei. Semi-supervised domain adaptation with subspace learning for visual recognition. In: Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. 2015, 2142–2150
26	K Saito, D Kim, S Sclaroff, T Darrell, K Saenko. Semi-supervised domain adaptation via minimax entropy. In: Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. 2019, 8049–8057
27	H Zhang, M Cisse, Y N Dauphin, D Lopez-Paz. Mixup: beyond empirical risk minimization. In: Proceedings of the 6th International Conference on Learning Representations. 2018
28	T DeVries, G W Taylor. Improved regularization of convolutional neural networks with cutout. 2017, arXiv preprint arXiv:1708.04552
29	S Yun, D Han, S Chun, S J Oh, Y Yoo, J Choe. CutMix: regularization strategy to train strong classifiers with localizable features. In: Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. 2019, 6022–6031
30	E D Cubuk, B Zoph, D Mane, V Vasudevan, Q V Le. Autoaugment: learning augmentation policies from data. 2018, arXiv preprint arXiv:1805.09501
31	B Zoph, E D Cubuk, G Ghiasi, T Y Lin, J Shlens, Q V Le. Learning data augmentation strategies for object detection. In: Proceedings of the 16th European Conference on Computer Vision. 2020, 566–583
32	L Zhang, Y Zhou, L Zhang. On the robustness of domain adaption to adversarial attacks. 2021, arXiv preprint arXiv:2108.01807
33	G Koch, R Zemel, R Salakhutdinov. Siamese neural networks for one-shot image recognition. In: Proceedings of the 32nd International Conference on Machine Learning. 2015
34	O Vinyals, C Blundell, T Lillicrap, D Wierstra. Matching networks for one shot learning. In: Proceedings of the 30th International Conference on Neural Information Processing Systems. 2016, 3637–3645
35	J Snell, K Swersky, R Zemel. Prototypical networks for few-shot learning. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017, 4077–4087
36	J S Bergstra, R Bardenet, Y Bengio, B Kégl. Algorithms for hyper-parameter optimization. In: Proceedings of the 24th International Conference on Neural Information Processing Systems. 2011, 2546–2554
37	T Miyato, S I Maeda, M Koyama, S Ishii. Virtual adversarial training: a regularization method for supervised and semi-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(8): 1979–1993
38	S R Richter, V Vineet, S Roth, V Koltun. Playing for data: Ground truth from computer games. In: Proceedings of the 14th European Conference on Computer Vision. 2016, 102–118
39	M Cordts, M Omran, S Ramos, T Rehfeld, M Enzweiler, R Benenson, U Franke, S Roth, B Schiele. The cityscapes dataset for semantic urban scene understanding. In: Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. 2016, 3213–3223
40	G Ros, L Sellart, J Materzynska, D Vazquez, A M Lopez. The SYNTHIA dataset: a large collection of synthetic images for semantic segmentation of urban scenes. In: Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. 2016, 3234–3243
41	Y H Chen, W Y Chen, Y T Chen, B C Tsai, Y C F Wang, M Sun. No more discrimination: cross city adaptation of road scene segmenters. In: Proceedings of 2017 IEEE International Conference on Computer Vision. 2017, 2011–2020
42	Z Zheng, Y Yang. Unsupervised scene adaptation with memory regularization in vivo. In: Proceedings of the 29th International Joint Conference on Artificial Intelligence. 2021, 150
43	Z Zheng, Y Yang. Rectifying pseudo label learning via uncertainty estimation for domain adaptive semantic segmentation. International Journal of Computer Vision, 2021, 129(4): 1106–1120
44	I Loshchilov, F Hutter. SGDR: stochastic gradient descent with warm restarts. In: Proceedings of the 5th International Conference on Learning Representations. 2016
45	A Paszke, S Gross, S Chintala, G Chanan, E Yang, Z DeVito, Z Lin, A Desmaison, L Antiga, A Lerer. Automatic differentiation in PyTorch. In: Proceedings of the 31st Conference on Neural Information Processing Systems. 2017
46	L van der Maaten, G Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 2008, 9(86): 2579–2605
