Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2025, Vol. 19 Issue (3) : 193310    https://doi.org/10.1007/s11704-024-3810-0
Artificial Intelligence
Robust domain adaptation with noisy and shifted label distribution
Shao-Yuan LI(), Shi-Ji ZHAO, Zheng-Tao CAO, Sheng-Jun HUANG, Songcan CHEN
MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
Abstract

Unsupervised Domain Adaptation (UDA) aims to transfer knowledge from labeled source domains to unlabeled target domains whose data or label distribution differs. Previous UDA methods have achieved great success when the source-domain labels are clean. However, acquiring even scarce clean labels in the source domain is costly, and in the presence of source label noise, traditional UDA methods degrade severely because they do not handle mislabeled data. In this paper, we propose an approach named Robust Self-training with Label Refinement (RSLR) to address this issue. RSLR adopts a self-training framework that maintains a Labeling Network (LNet) on the source domain, which provides confident pseudo-labels to target samples, and a Target-specific Network (TNet) trained on the pseudo-labeled samples. To combat the effect of label noise, LNet progressively distinguishes and refines the mislabeled source samples. Combined with class re-balancing to address the label distribution shift issue, RSLR achieves effective performance on extensive benchmark datasets.
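As a minimal sketch of the self-training idea described above (the function name and the 0.95 threshold are illustrative assumptions, not the paper's exact procedure), a labeling network's softmax outputs can be filtered so that only high-confidence predictions become pseudo-labels for the target-specific network:

```python
# Hypothetical sketch: an LNet scores unlabeled target samples and only
# confident predictions are handed to the TNet as pseudo-labels.
def select_confident(probs, threshold=0.95):
    """Return (index, label) pairs for samples whose top class
    probability meets the confidence threshold."""
    selected = []
    for i, p in enumerate(probs):
        label = max(range(len(p)), key=p.__getitem__)  # argmax class
        if p[label] >= threshold:
            selected.append((i, label))
    return selected

# Example: only the first sample passes the 0.95 threshold.
probs = [[0.97, 0.03], [0.60, 0.40]]
print(select_confident(probs))  # [(0, 0)]
```

In practice the threshold trades off pseudo-label coverage against pseudo-label accuracy; a stricter threshold retains fewer but cleaner target samples.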

Keywords unsupervised domain adaptation      label noise      label distribution shift      self-training      class rebalancing     
Corresponding Author(s): Shao-Yuan LI   
Issue Date: 30 April 2024
 Cite this article:   
Shao-Yuan LI, Shi-Ji ZHAO, Zheng-Tao CAO, et al. Robust domain adaptation with noisy and shifted label distribution[J]. Front. Comput. Sci., 2025, 19(3): 193310.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-024-3810-0
https://academic.hep.com.cn/fcs/EN/Y2025/V19/I3/193310
Fig.1  Illustration of the challenge in noisy UDA. Left: previous robust UDA methods filter out the mislabeled source samples, thus losing important class-boundary information (e.g., samples 1, 2, 3). They also ignore the label distribution shift issue and tend to be biased towards the majority circle class in the source domain (the black classification boundary), so the majority triangle class in the target domain is under-fitted. Right: we propose keeping the essential information of the mislabeled samples and learning a classification boundary that is not biased towards any class
Methods Adversarial training Pseudo labeling Using noisy samples Considering LDS Branching structure
Butterfly [19]
TCL [20]
RDA [21]
GearNet [22]
RSLR
Tab.1  Comparison of the proposed RSLR approach and previous robust UDA methods
X: Instance space        Y: Label space
Ps: Source distribution  Pt: Target distribution
Ds: Source data          Dt: Target data
G: Feature extractor     C1: Common classifier 1
C2: Common classifier 2  Ct: Target classifier
A(x): Strong augmentation  α(x): Weak augmentation
Tab.2  Summary of the key notations used in the paper
Fig.2  Flowchart of the proposed RSLR approach. G is a shared deep feature extractor, C1 and C2 are dual Labeling Networks trained to correct the noisy labels and predict target sample pseudo-labels in an ensemble way. Ct is the Target-specific Network (TNet) trained on pseudo-labeled target samples and acts as the final desired target classifier
Fig.3  The pseudo-labeling strategy for noisy source samples and unlabeled target samples. The two LNets C1, C2 predict labels for the weakly-augmented and strongly-augmented versions of the unlabeled data; only the pseudo-labels that are consistent across the ensemble of predictions are retained
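The consistency rule in Fig. 3 can be sketched as follows. This toy function is a hypothetical simplification of the described strategy, not the paper's implementation: it keeps a pseudo-label only when both LNets agree on both the weak and the strong view of a sample.

```python
def consistent_pseudo_labels(c1_weak, c1_strong, c2_weak, c2_strong):
    """Keep a pseudo-label only when both classifiers agree on both
    augmented views; rejected samples are marked None."""
    labels = []
    for preds in zip(c1_weak, c1_strong, c2_weak, c2_strong):
        # All four predictions identical -> unanimous ensemble vote.
        labels.append(preds[0] if len(set(preds)) == 1 else None)
    return labels

# Sample 0 is unanimous (kept); sample 1 has a disagreement (rejected).
print(consistent_pseudo_labels([1, 2], [1, 3], [1, 2], [1, 2]))  # [1, None]
```

Requiring agreement across both networks and both augmentations acts as a double filter: the dual networks guard against label-noise memorization, while weak/strong consistency guards against unstable predictions.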
Fig.4  The label distribution shift issue on Office-31
Fig.5  The label distribution shift issue on Office-Home. (a) Art; (b) Clipart; (c) Product; (d) Real-world
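One common way to counter a shifted label distribution like those shown in Figs. 4 and 5 is inverse-frequency class weighting. The sketch below is an illustrative assumption (not necessarily the exact re-balancing used by RSLR): it up-weights minority classes so the classifier is not biased towards majority classes.

```python
from collections import Counter

def class_weights(labels, num_classes):
    """Inverse-frequency weights, normalized so that a uniform label
    distribution yields weight 1.0 for every class."""
    counts = Counter(labels)
    return [len(labels) / (num_classes * max(counts.get(c, 0), 1))
            for c in range(num_classes)]

# Class 1 is 3x rarer than class 0, so it gets a 3x larger weight.
print(class_weights([0, 0, 0, 1], num_classes=2))  # [0.666..., 2.0]
```

Such weights can scale the per-sample loss, or equivalently drive a weighted sampler, so minority-class (pseudo-)labeled samples contribute proportionally more to training.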
Tasks Noise type DAN DANN ATDA TCL Butterfly RSLR
S→M S0.20 90.7 75.2 89.9 80.2 95.9 98.4
S0.45 89.3 65.9 87.5 68.5 95.0 98.5
P0.20 90.2 79.1 56.0 80.8 95.3 98.4
P0.45 67.0 55.3 53.7 56.0 90.2 97.7
M→S S0.20 30.6 53.5 49.8 56.7 57.1 89.4
S0.45 28.2 43.8 17.2 49.9 56.2 88.7
P0.20 40.8 58.8 33.7 58.9 60.4 93.9
P0.45 28.4 43.7 19.5 45.3 56.6 80.3
Average 58.2 58.0 50.9 62.1 75.8 93.1
Tab.3  Target domain accuracy (%) on MNIST↔SYDN (8 tasks). Bold value represents the best method
Method Label corruption: 0.4
A→W W→A A→D D→A W→D D→W Avg
MentorNet 74.4 54.2 75.0 43.2 85.9 70.6 67.2
DAN 63.2 39.0 58.0 36.7 71.6 61.6 55.0
RTN 64.6 56.2 76.1 49.0 82.7 71.7 66.7
DANN 61.2 46.2 57.4 42.4 74.5 62.0 57.3
ADDA 61.5 49.2 61.2 45.5 74.7 65.1 59.5
MDD 74.7 55.1 76.7 54.3 89.2 81.6 71.9
TCL 82.0 65.7 83.3 60.5 90.8 77.2 76.6
GearNet 76.7 58.2 82.5 59.1 91.3 85.3 76.7
RDA 89.7 67.2 92.0 65.5 96.0 92.7 83.6
RSLR 90.9 71.6 86.9 69.7 98.6 95.8 85.6
Label corruption: 0.5
RDA 61.4 51.4 71.9 50.3 90.0 82.9 68.0
RSLR 85.7 67.5 82.3 54.1 98.1 87.4 79.2
Label corruption: 0.6
RDA 58.6 45.2 63.1 47.5 80.1 67.4 60.3
RSLR 74.6 61.5 81.3 55.8 84.4 72.4 71.6
Label corruption: 0.7
RDA 51.8 36.1 60.4 27.7 72.1 42.3 48.4
RSLR 67.5 54.5 66.3 44.0 76.3 62.6 61.8
Label corruption: 0.8
RDA 45.4 19.4 47.2 0.1 31.7 0.1 24.0
RSLR 55.5 46.3 61.5 32.3 61.0 43.4 50.0
Tab.4  Target domain accuracy (%) on Office-31. Bold value represents the best method
Methods Label corruption: 0.2
Ar→Cl Ar→Pr Ar→Rw Cl→Pr Cl→Rw Cl→Ar Pr→Rw Pr→Cl Pr→Ar Rw→Cl Rw→Pr Rw→Ar Avg
MentorNet 33.29 47.89 61.92 44.91 50.08 39.93 65.16 31.75 43.88 36.88 68.48 56.45 48.39
DAN 40.78 58.93 68.05 57.45 60.20 49.16 68.69 37.37 51.42 44.49 71.84 60.82 55.77
RTN 32.35 51.84 59.35 48.52 52.47 40.05 62.27 30.84 42.97 36.31 66.05 53.93 48.08
DANN 36.29 52.92 31.34 50.01 53.39 40.96 65.99 32.07 44.17 39.73 71.01 55.71 47.80
ADDA 28.36 42.26 52.03 41.36 43.29 30.61 57.61 28.50 31.40 31.98 63.17 49.98 41.71
MDD 35.10 56.95 64.91 54.27 55.86 43.02 65.46 31.55 46.64 34.85 70.74 56.24 51.30
TCL 29.07 43.84 56.87 42.37 45.26 37.25 64.08 33.22 43.79 37.37 71.05 60.24 47.03
GearNet 37.78 59.62 69.99 61.68 61.58 49.08 73.21 37.82 51.71 43.36 75.59 62.04 56.96
RDA 54.04 70.17 75.79 65.06 67.94 61.10 75.63 51.89 61.15 57.80 80.27 68.85 65.81
RSLR 58.46 72.60 76.68 74.96 74.79 58.96 78.45 56.20 60.13 61.81 82.12 66.01 68.43
Label corruption: 0.4
MentorNet 23.89 37.10 50.40 33.30 37.40 36.71 55.30 28.40 40.17 33.60 63.90 53.03 41.10
DAN 30.20 48.70 61.00 52.40 56.70 48.58 64.90 34.10 46.64 40.10 68.20 55.62 50.60
RTN 19.40 38.70 46.70 45.00 47.10 36.55 55.40 24.20 33.13 32.20 57.70 46.15 40.19
DANN 27.70 44.20 57.20 42.30 44.10 38.81 60.40 27.80 37.82 34.70 67.10 52.58 44.56
ADDA 16.30 26.10 38.80 31.60 33.50 30.82 35.10 16.90 29.21 20.40 41.80 41.99 30.21
MDD 34.40 54.30 60.80 53.90 53.80 43.35 62.80 28.60 40.71 37.40 68.40 50.60 49.09
TCL 24.20 37.40 44.50 30.40 33.30 32.55 52.40 25.90 34.61 33.60 60.30 50.23 38.28
GearNet 32.10 51.68 62.52 50.63 54.55 44.17 67.26 33.09 47.58 40.40 72.31 56.79 51.09
RDA 31.71 50.92 63.32 52.50 54.70 57.89 66.20 34.01 56.90 42.00 68.60 67.78 53.88
RSLR 50.91 60.72 66.52 68.20 70.40 57.95 77.90 51.40 56.29 60.50 80.56 63.59 63.75
Label corruption: 0.6
MentorNet 19.84 30.46 38.99 32.28 36.01 28.51 49.53 24.99 33.79 29.26 53.01 45.90 35.21
DAN 23.05 35.41 42.67 43.16 50.65 37.25 59.79 28.98 38.32 33.93 61.41 45.86 41.71
RTN 13.72 24.85 32.84 32.85 35.16 22.13 46.96 23.28 27.48 26.23 47.02 37.12 30.80
DANN 20.53 31.34 38.97 35.50 40.74 28.22 52.03 27.42 32.67 30.13 55.58 42.89 36.34
ADDA 15.10 22.89 26.44 29.38 31.03 19.61 44.89 20.78 26.16 24.97 43.75 36.84 28.49
MDD 19.73 31.72 37.37 43.30 42.94 25.96 53.87 24.95 33.99 28.98 58.64 41.24 36.89
TCL 15.66 29.22 38.56 27.82 31.95 26.97 47.19 23.07 32.10 29.16 50.62 43.22 32.96
GearNet 25.99 44.63 54.21 44.20 47.45 39.58 60.09 28.61 40.36 34.83 63.93 53.88 44.81
RDA 29.39 45.03 52.26 43.03 50.01 36.92 61.28 32.85 41.08 38.74 67.04 55.13 46.06
RSLR 36.42 50.71 59.04 63.04 64.27 49.21 72.05 49.91 52.46 55.42 75.52 57.16 57.10
Tab.5  Target domain accuracy (%) on Office-Home. Bold value represents the best method
Methods Mixed corruption: 0.4
Ar→Cl Ar→Pr Ar→Rw Cl→Pr Cl→Rw Pr→Rw Pr→Cl Rw→Cl Rw→Pr Avg
MentorNet 34.5 57.1 66.7 56.1 57.6 70.2 34.0 37.2 70.4 53.8
DAN 31.2 52.3 61.2 53.1 54.6 61.5 30.3 36.7 67.4 49.8
RTN 29.3 57.8 66.3 58.6 58.3 67.5 30.1 32.2 69.9 52.2
DANN 32.9 50.6 60.1 49.2 50.6 60.4 32.6 38.4 67.4 49.1
ADDA 32.6 52.0 60.6 53.5 54.3 63.1 31.6 37.7 67.5 50.3
MDD 44.6 62.4 68.8 58.9 60.8 65.2 39.5 47.1 72.9 57.8
TCL 38.8 62.1 69.4 58.5 59.8 72.3 39.5 43.5 74.0 57.5
GearNet 38.6 58.5 68.3 59.2 60.8 71.6 37.5 44.3 75.1 57.1
RDA 50.8 68.7 72.3 67.4 67.9 74.6 50.5 57.7 80.2 65.6
RSLR 52.6 69.5 73.1 68.7 70.2 73.1 51.2 58.3 77.0 66.0
Tab.6  Target domain accuracy (%) on Office-Home with 40% mixed noise. The bold value represents the best method
Fig.6  Target domain t-SNE visualization of class features on the Office-31 A→W task under low (0.4, top) and high (0.6, bottom) noise corruption. Office-31 includes 31 classes, shown in different colors
Fig.7  Comparison of the accuracy of target pseudo-labeled samples during training on Office-31 under 40% label corruption. (a) A→W; (b) A→D; (c) D→A; (d) W→A
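The quantity plotted in Fig. 7 — accuracy over the retained pseudo-labeled target samples — can be computed as in this small helper (a hypothetical sketch; here `None` marks samples whose pseudo-label was rejected by the consistency filter):

```python
def pseudo_label_accuracy(pseudo, true):
    """Accuracy over retained pseudo-labels only; rejected samples
    (pseudo-label None) are excluded from the denominator."""
    kept = [(p, t) for p, t in zip(pseudo, true) if p is not None]
    if not kept:
        return 0.0
    return sum(p == t for p, t in kept) / len(kept)

# Two of three samples retained, one of those correct -> 0.5.
print(pseudo_label_accuracy([1, None, 2], [1, 0, 3]))  # 0.5
```

Tracking this metric per epoch shows whether the pseudo-label pool stays clean as self-training progressively admits more target samples.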
Methods Label corruption: 0.4
Double LNets Rebalance Recycling A→W W→A A→D D→A D→W W→D Avg
84.4 69.7 85.4 63.2 92.4 96.5 81.9
83.1 68.6 87.3 55.2 88.4 96.3 79.8
89.9 70.6 84.9 67.7 97.6 94.8 84.3
90.9 71.6 86.9 69.7 95.8 98.6 85.6
Tab.7  Ablation study of our method on 6 sub-tasks of Office-31 under 40% label corruption
1 M Wang, W Deng. Deep visual domain adaptation: a survey. Neurocomputing, 2018, 312: 135−153
2 G Csurka. A comprehensive survey on domain adaptation for visual applications. In: Csurka G, ed. Domain Adaptation in Computer Vision Applications. Cham: Springer, 2017, 1−35
3 C S Perone, P Ballester, R C Barros, J Cohen-Adad. Unsupervised domain adaptation for medical imaging segmentation with self-ensembling. NeuroImage, 2019, 194: 1−11
4 R Ojha, C C Sekhar. Unsupervised domain adaptation in speech recognition using phonetic features. 2021, arXiv preprint arXiv: 2108.02850
5 S Ben-David, J Blitzer, K Crammer, F Pereira. Analysis of representations for domain adaptation. In: Schölkopf B, Platt J, Hofmann T, eds. Advances in Neural Information Processing Systems 19: Proceedings of 2006 Conference. Cambridge: MIT Press, 2007, 137−144
6 Y Mansour, M Mohri, A Rostamizadeh. Domain adaptation: learning bounds and algorithms. 2009, arXiv preprint arXiv: 0902.3430
7 M Ghifary, W B Kleijn, M Zhang. Domain adaptive neural networks for object recognition. In: Proceedings of the 13th Pacific Rim International Conference on Artificial Intelligence. 2014, 898−904
8 H Yan, Y Ding, P Li, Q Wang, Y Xu, W Zuo. Mind the class weight bias: weighted maximum mean discrepancy for unsupervised domain adaptation. In: Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. 2017, 945−954
9 Y Ganin, V Lempitsky. Unsupervised domain adaptation by backpropagation. In: Proceedings of the 32nd International Conference on Machine Learning. 2015, 1180−1189
10 K Bousmalis, G Trigeorgis, N Silberman, D Krishnan, D Erhan. Domain separation networks. In: Proceedings of the 30th International Conference on Neural Information Processing Systems. 2016, 343−351
11 M Long, Y Cao, J Wang, M I Jordan. Learning transferable features with deep adaptation networks. In: Proceedings of the 32nd International Conference on Machine Learning. 2015, 97−105
12 Y Taigman, A Polyak, L Wolf. Unsupervised cross-domain image generation. In: Proceedings of the International Conference on Learning Representations. 2017
13 J Hoffman, E Tzeng, T Park, J Y Zhu, P Isola, K Saenko, A Efros, T Darrell. CyCADA: cycle-consistent adversarial domain adaptation. In: Proceedings of the 35th International Conference on Machine Learning. 2018, 1989−1998
14 K Bousmalis, N Silberman, D Dohan, D Erhan, D Krishnan. Unsupervised pixel-level domain adaptation with generative adversarial networks. In: Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. 2017, 95−104
15 S Li, M Xie, K Gong, C H Liu, Y Wang, W Li. Transferable semantic augmentation for domain adaptation. In: Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021, 11511−11520
16 K Saito, Y Ushiku, T Harada. Asymmetric tri-training for unsupervised domain adaptation. In: Proceedings of the 34th International Conference on Machine Learning. 2017, 2988−2997
17 V Prabhu, S Khare, D Kartik, J Hoffman. SENTRY: selective entropy optimization via committee consistency for unsupervised domain adaptation. In: Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. 2021, 8538−8547
18 V S Sheng, J Zhang. Machine learning with crowdsourcing: a brief summary of the past research and future directions. In: Proceedings of the 33rd AAAI Conference on Artificial Intelligence. 2019, 9837−9843
19 F Liu, J Lu, B Han, G Niu, G Zhang, M Sugiyama. Butterfly: a panacea for all difficulties in wildly unsupervised domain adaptation. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems Workshop. 2019
20 Y Shu, Z Cao, M Long, J Wang. Transferable curriculum for weakly-supervised domain adaptation. In: Proceedings of the 33rd AAAI Conference on Artificial Intelligence. 2019, 4951−4958
21 Z Han, X J Gui, C Cui, Y Yin. Towards accurate and robust domain adaptation under noisy environments. In: Proceedings of the 29th International Joint Conference on Artificial Intelligence. 2020, 2269−2276
22 R Xie, H Wei, L Feng, B An. GearNet: stepwise dual learning for weakly supervised domain adaptation. In: Proceedings of the 36th AAAI Conference on Artificial Intelligence. 2022, 8717−8725
23 A Gretton, K M Borgwardt, M J Rasch, B Schölkopf, A Smola. A kernel two-sample test. The Journal of Machine Learning Research, 2012, 13: 723−773
24 J Lee, M Raginsky. Minimax statistical learning with Wasserstein distances. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. 2018, 2692−2701
25 M Long, H Zhu, J Wang, M I Jordan. Deep transfer learning with joint adaptation networks. In: Proceedings of the 34th International Conference on Machine Learning. 2017, 2208−2217
26 Y Ganin, E Ustinova, H Ajakan, P Germain, H Larochelle, F Laviolette, M Marchand, V Lempitsky. Domain-adversarial training of neural networks. The Journal of Machine Learning Research, 2016, 17(1): 2096−2030
27 E Tzeng, J Hoffman, K Saenko, T Darrell. Adversarial discriminative domain adaptation. In: Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. 2017, 2962−2971
28 J N Kundu, A R Kulkarni, S Bhambri, D Mehta, S A Kulkarni, V Jampani, V B Radhakrishnan. Balancing discriminability and transferability for source-free domain adaptation. In: Proceedings of the 39th International Conference on Machine Learning. 2022, 11710−11728
29 H Liu, J Wang, M Long. Cycle self-training for domain adaptation. In: Proceedings of the 35th International Conference on Neural Information Processing Systems. 2021, 22968−22981
30 D Arpit, S Jastrzębski, N Ballas, D Krueger, E Bengio, M S Kanwal, T Maharaj, A Fischer, A Courville, Y Bengio, S Lacoste-Julien. A closer look at memorization in deep networks. In: Proceedings of the 34th International Conference on Machine Learning. 2017, 233−242
31 B Han, Q Yao, X Yu, G Niu, M Xu, W Hu, I W Tsang, M Sugiyama. Co-teaching: robust training of deep neural networks with extremely noisy labels. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. 2018, 8536−8546
32 E Arazo, D Ortego, P Albert, N O’Connor, K McGuinness. Unsupervised label noise modeling and loss correction. In: Proceedings of the International Conference on Machine Learning. 2019, 312−321
33 J Li, R Socher, S C H Hoi. DivideMix: learning with noisy labels as semi-supervised learning. In: Proceedings of the International Conference on Learning Representations. 2020
34 X Yu, B Han, J Yao, G Niu, I Tsang, M Sugiyama. How does disagreement help generalization against label corruption? In: Proceedings of the 36th International Conference on Machine Learning. 2019, 7164−7173
35 L Yi, S Liu, Q She, A I McLeod, B Wang. On learning contrastive representations for learning with noisy labels. In: Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022, 16661−16670
36 A Tarvainen, H Valpola. Mean teachers are better role models: weight-averaged consistency targets improve semi-supervised deep learning results. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017, 1195−1204
37 H Permuter, J Francos, I Jermyn. A study of Gaussian mixture models of color and texture features for image classification and segmentation. Pattern Recognition, 2006, 39(4): 695−706
38 E D Cubuk, B Zoph, J Shlens, Q V Le. RandAugment: practical automated data augmentation with a reduced search space. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2020, 3008−3017
39 G Patrini, A Rozza, A Krishna Menon, R Nock, L Qu. Making deep neural networks robust to label noise: a loss correction approach. In: Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. 2017, 2233−2241
40 L Jiang, Z Zhou, T Leung, L J Li, L Fei-Fei. MentorNet: learning data-driven curriculum for very deep neural networks on corrupted labels. In: Proceedings of the 35th International Conference on Machine Learning. 2018, 2304−2313
41 M Long, H Zhu, J Wang, M I Jordan. Unsupervised domain adaptation with residual transfer networks. In: Proceedings of the 30th International Conference on Neural Information Processing Systems. 2016, 136−144
42 Y Zhang, T Liu, M Long, M Jordan. Bridging theory and algorithm for domain adaptation. In: Proceedings of the 36th International Conference on Machine Learning. 2019, 7404−7413
43 K Sohn, D Berthelot, C L Li, Z Zhang, N Carlini, E D Cubuk, A Kurakin, H Zhang, C Raffel. FixMatch: simplifying semi-supervised learning with consistency and confidence. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. 2020, 51
44 J Donahue, Y Jia, O Vinyals, J Hoffman, N Zhang, E Tzeng, T Darrell. DeCAF: a deep convolutional activation feature for generic visual recognition. In: Proceedings of the 31st International Conference on Machine Learning. 2014, I-647−I-655