Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP


Front. Comput. Sci.    2023, Vol. 17 Issue (2) : 172312    https://doi.org/10.1007/s11704-022-1250-2
RESEARCH ARTICLE
Teachers cooperation: team-knowledge distillation for multiple cross-domain few-shot learning
Zhong JI1,2,3, Jingwei NI1,3, Xiyao LIU1,3, Yanwei PANG1,3
1. School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
2. Science and Technology on Electro-Optical Information Security Control Laboratory, Tianjin 300308, China
3. Tianjin Key Laboratory of Brain-Inspired Intelligence Technology, Tianjin 300072, China
Abstract

Although few-shot learning (FSL) has achieved great progress, it remains an enormous challenge, especially when the source and target sets come from different domains, a setting known as cross-domain few-shot learning (CD-FSL). Utilizing more source domain data is an effective way to improve the performance of CD-FSL. However, knowledge from different source domains may become entangled and confused with each other, which hurts performance on the target domain. Therefore, we propose team-knowledge distillation networks (TKD-Net) to tackle this problem by exploring a strategy that helps multiple teachers cooperate. Specifically, we distill knowledge from the cooperation of teacher networks to a single student network in a meta-learning framework. TKD-Net incorporates task-oriented knowledge distillation and cooperation among multiple teachers to train an efficient student with better generalization ability on unseen tasks. Moreover, it employs both response-based knowledge and relation-based knowledge to transfer more comprehensive and effective knowledge. Extensive experimental results on four fine-grained datasets demonstrate the effectiveness and superiority of the proposed TKD-Net.
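To make the distillation objective concrete, the following is a minimal PyTorch sketch of the kind of loss structure the abstract describes: response-based knowledge transferred through temperature-softened soft labels (a KL-divergence term) and relation-based knowledge transferred through correlation congruence between instance-similarity matrices. The function and hyperparameter names (team_distillation_loss, temperature, alpha, beta) are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def correlation_matrix(features):
    # Pairwise cosine-similarity ("correlation") matrix of the embeddings.
    f = F.normalize(features, dim=1)
    return f @ f.t()

def team_distillation_loss(student_logits, teacher_logits_list, teacher_weights,
                           student_feats, teacher_feats_list, labels,
                           temperature=4.0, alpha=0.5, beta=0.5):
    # Hard-label supervision on the student.
    ce = F.cross_entropy(student_logits, labels)

    # Response-based knowledge: KL divergence to the weighted average of the
    # teachers' temperature-softened predictions.
    teacher_probs = sum(w * F.softmax(t / temperature, dim=1)
                        for w, t in zip(teacher_weights, teacher_logits_list))
    kl = F.kl_div(F.log_softmax(student_logits / temperature, dim=1),
                  teacher_probs, reduction="batchmean") * temperature ** 2

    # Relation-based knowledge: match the student's instance-correlation matrix
    # to the teachers' weighted correlation matrix (correlation congruence).
    s_corr = correlation_matrix(student_feats)
    t_corr = sum(w * correlation_matrix(f)
                 for w, f in zip(teacher_weights, teacher_feats_list))
    cc = F.mse_loss(s_corr, t_corr)

    return ce + alpha * kl + beta * cc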

Keywords: cross-domain few-shot learning; meta-learning; knowledge distillation; multiple teachers
Corresponding Author(s): Xiyao LIU   
Just Accepted Date: 09 September 2021   Issue Date: 01 August 2022
 Cite this article:   
Zhong JI, Jingwei NI, Xiyao LIU, et al. Teachers cooperation: team-knowledge distillation for multiple cross-domain few-shot learning[J]. Front. Comput. Sci., 2023, 17(2): 172312.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-022-1250-2
https://academic.hep.com.cn/fcs/EN/Y2023/V17/I2/172312
Fig.1  Different types of knowledge distillation frameworks. (a) The generic framework of knowledge distillation; (b) multi-teacher distillation; (c) multi-teacher distillation under meta-learning
Fig.2  The architecture of the proposed TKD-Net for 5-way 1-shot classification. In the teacher development stage, teacher models are first pre-trained on the multiple seen domains by supervised learning. In the second stage, which follows the meta-learning paradigm, multi-level knowledge is distilled from the cooperation of the teachers to the student model through soft labels and correlation congruence. The final student model is applied to few-shot classification tasks on the unseen domain
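As background on the meta-learning setup shown in Fig.2, the self-contained sketch below illustrates how an N-way K-shot episode is typically assembled from a pool of labelled images. It is a generic example: the helper name sample_episode, the class pool, and the tensor shapes are assumptions, not the authors' data pipeline.

import random
import torch

def sample_episode(images_by_class, n_way=5, k_shot=1, q_queries=15):
    # Pick N classes, then split each class's images into support and query sets.
    classes = random.sample(list(images_by_class), n_way)
    support, query, query_labels = [], [], []
    for episode_label, cls in enumerate(classes):
        imgs = random.sample(images_by_class[cls], k_shot + q_queries)
        support.append(torch.stack(imgs[:k_shot]))
        query.append(torch.stack(imgs[k_shot:]))
        query_labels += [episode_label] * q_queries
    return torch.stack(support), torch.cat(query), torch.tensor(query_labels)

# Toy usage with random "images":
pool = {c: [torch.randn(3, 84, 84) for _ in range(30)] for c in range(20)}
s, q, y = sample_episode(pool)
print(s.shape, q.shape, y.shape)
# torch.Size([5, 1, 3, 84, 84]) torch.Size([75, 3, 84, 84]) torch.Size([75])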
Dataset               Training set    Validation set    Testing set
mini-ImageNet         64              16                20
CUB-200               100             50                50
Stanford Cars         98              49                49
Places365-Standard    183             91                91
Plantae               100             50                50
Tab.1  Summary of the datasets (number of classes in each split)
5-way 1-shot
Method                        CUB/%           Cars/%          Places/%        Plantae/%
MatchingNet [1]               39.24 ± 0.55    30.33 ± 0.45    41.46 ± 0.59    31.71 ± 0.51
ProtoNet [3]                  36.54 ± 0.52    29.38 ± 0.42    40.12 ± 0.59    31.42 ± 0.49
RelationNet [4]               39.29 ± 0.54    30.46 ± 0.44    40.86 ± 0.61    32.81 ± 0.52
MAML (w/o pre-train) [5]      35.06 ± 0.54    31.12 ± 0.54    36.14 ± 0.56    30.95 ± 0.49
MAML [5]                      35.50 ± 0.53    26.76 ± 0.42    39.12 ± 0.60    31.35 ± 0.49
Proto-MAML [15]               36.05 ± 0.53    29.46 ± 0.44    38.71 ± 0.57    31.20 ± 0.49
LFT [13] on MatchingNet       34.20 ± 0.53    30.15 ± 0.46    39.43 ± 0.60    29.50 ± 0.39
LFT [13] on RelationNet       39.70 ± 0.58    32.59 ± 0.54    39.92 ± 0.59    33.11 ± 0.56
TKD-Net (Ours)                40.69 ± 0.51    31.36 ± 0.46    45.71 ± 0.60    34.53 ± 0.52

5-way 5-shot
Method                        CUB/%           Cars/%          Places/%        Plantae/%
MatchingNet [1]               53.71 ± 0.53    38.39 ± 0.48    55.96 ± 0.58    45.00 ± 0.54
ProtoNet [3]                  54.79 ± 0.56    41.76 ± 0.55    59.91 ± 0.56    42.99 ± 0.51
RelationNet [4]               55.29 ± 0.52    43.58 ± 0.56    58.62 ± 0.58    46.01 ± 0.55
MAML (w/o pre-train) [5]      53.20 ± 0.54    43.71 ± 0.56    53.91 ± 0.57    44.70 ± 0.53
MAML [5]                      52.66 ± 0.52    43.43 ± 0.53    56.61 ± 0.58    42.72 ± 0.55
Proto-MAML [15]               57.21 ± 0.54    45.06 ± 0.56    58.38 ± 0.57    47.45 ± 0.55
LFT [13] on MatchingNet       49.09 ± 0.53    42.42 ± 0.53    54.15 ± 0.54    43.32 ± 0.53
LFT [13] on RelationNet       55.53 ± 0.59    46.05 ± 0.55    53.17 ± 0.55    45.66 ± 0.56
TKD-Net (Ours)                57.85 ± 0.53    43.93 ± 0.52    64.96 ± 0.56    48.61 ± 0.53
Tab.2  Multiple cross-domain few-shot classification accuracy of TKD-Net with 95% confidence intervals
Method                  5-way 1-shot/%    5-way 5-shot/%
ProtoNet                36.54 ± 0.52      54.79 ± 0.56
TKD-Net (w/o L_CC)      38.56 ± 0.53      55.47 ± 0.52
TKD-Net (w/o L_KL)      39.54 ± 0.52      56.92 ± 0.52
TKD-Net (Ours)          40.69 ± 0.51      57.85 ± 0.53
Tab.3  Ablation studies of our TKD-Net on the CUB dataset (L_CC: correlation congruence loss; L_KL: soft-label KL-divergence loss)
Fig.3  Impact of different temperatures on the CUB dataset. (a) 5-way 1-shot classification result; (b) 5-way 5-shot classification result
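For readers unfamiliar with the temperature hyperparameter analysed in Fig.3, the short self-contained snippet below shows its usual effect in knowledge distillation: higher temperatures soften a teacher's output distribution so that more information about non-target classes reaches the student. The logits here are made up for illustration.

import torch
import torch.nn.functional as F

logits = torch.tensor([6.0, 2.0, 1.0, 0.5, 0.2])  # hypothetical 5-way teacher logits
for T in (1.0, 4.0, 16.0):
    probs = F.softmax(logits / T, dim=0)
    print(T, [round(p, 3) for p in probs.tolist()])
# As T grows, the distribution flattens: non-target classes receive noticeably
# more probability mass, which is the "dark knowledge" the student learns from.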
Method                  5-way 1-shot/%    5-way 5-shot/%
ProtoNet                36.54 ± 0.52      54.79 ± 0.56
TKD-Net_specific        38.64 ± 0.54      55.31 ± 0.54
TKD-Net_average         39.43 ± 0.50      56.38 ± 0.54
TKD-Net (Ours)          40.69 ± 0.51      57.85 ± 0.53
Tab.4  Results of our TKD-Net with different teacher weights αi on the CUB dataset
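The sketch below illustrates, in a generic way, how different teacher weights αi change the combined soft target that the student imitates. Mapping these schemes onto the exact variants in Tab.4 is our assumption, and the helper combine_teachers is hypothetical, not the authors' implementation.

import torch
import torch.nn.functional as F

def combine_teachers(teacher_logits, alphas, temperature=4.0):
    # Weighted mixture of the teachers' temperature-softened predictions.
    probs = [F.softmax(t / temperature, dim=1) for t in teacher_logits]
    return sum(a * p for a, p in zip(alphas, probs))

teacher_logits = [torch.randn(8, 5) for _ in range(4)]              # 4 teachers, 8 queries, 5 ways
uniform = combine_teachers(teacher_logits, [0.25] * 4)              # equal weights ("average"-style)
weighted = combine_teachers(teacher_logits, [0.4, 0.3, 0.2, 0.1])   # non-uniform weights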
Fig.4  Qualitative results of our TKD-Net on the Places and Cars datasets. The task setting is 5-way 1-shot
1 Vinyals O, Blundell C, Lillicrap T, Kavukcuoglu K, Wierstra D. Matching networks for one shot learning. In: Proceedings of the 30th International Conference on Neural Information Processing Systems. 2016, 3637–3645
2 Ravi S, Larochelle H. Optimization as a model for few-shot learning. In: Proceedings of the International Conference on Learning Representations. 2017, 1–11
3 Snell J, Swersky K, Zemel R S. Prototypical networks for few-shot learning. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017, 4080–4090
4 Sung F, Yang Y, Zhang L, Xiang T, Torr P H S, Hospedales T M. Learning to compare: relation network for few-shot learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018, 1199–1208
5 Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast adaptation of deep networks. In: Proceedings of the 34th International Conference on Machine Learning. 2017, 1126–1135
6 Liu X Y, Wang S T, Zhang M L. Transfer synthetic over-sampling for class-imbalance learning with limited minority class data. Frontiers of Computer Science, 2019, 13(5): 996–1009
7 Wang Y X, Hebert M. Learning to learn: model regression networks for easy small sample learning. In: Proceedings of the 14th European Conference on Computer Vision. 2016, 616–634
8 Chen W Y, Liu Y C, Kira Z, Wang Y C F, Huang J B. A closer look at few-shot classification. In: Proceedings of the International Conference on Learning Representations. 2019, 1–17
9 Luo Z, Zou Y, Hoffman J, Li F F. Label efficient learning of transferable representations across domains and tasks. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017, 164–176
10 Lu J, Cao Z, Wu K, Zhang G, Zhang C. Boosting few-shot image recognition via domain alignment prototypical networks. In: Proceedings of the 30th International Conference on Tools with Artificial Intelligence. 2018, 260–264
11 Ge H W, Han Y X, Kang W J, Sun L. Unpaired image to image transformation via informative coupled generative adversarial networks. Frontiers of Computer Science, 2021, 15(4): 154326
12 Liu L, Hamilton W L, Long G, Jiang J, Larochelle H. A universal representation transformer layer for few-shot image classification. In: Proceedings of the 9th International Conference on Learning Representations. 2021, 1–11
13 Tseng H Y, Lee H Y, Huang J B, Yang M H. Cross-domain few-shot classification via learned feature-wise transformation. In: Proceedings of the 8th International Conference on Learning Representations. 2020, 1–18
14 Dvornik N, Schmid C, Mairal J. Selecting relevant features from a multi-domain representation for few-shot classification. In: Proceedings of the 16th European Conference on Computer Vision. 2020, 769–786
15 Triantafillou E, Zhu T, Dumoulin V, Lamblin P, Evci U, Xu K, Goroshin R, Gelada C, Swersky K, Manzagol P A, et al. Meta-dataset: a dataset of datasets for learning to learn from few examples. In: Proceedings of the 8th International Conference on Learning Representations. 2020, 1–24
16 He T, Shen C, Tian Z, Gong D, Sun C, Yan Y. Knowledge adaptation for efficient semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019, 578–587
17 Mukherjee S, Hassan Awadallah A. XtremeDistil: multi-stage distillation for massive multilingual models. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020, 2221–2234
18 Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network. 2015, arXiv preprint arXiv: 1503.02531
19 Peng B, Jin X, Li D, Zhou S, Wu Y, Liu J, Zhang Z, Liu Y. Correlation congruence for knowledge distillation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019, 5006–5015
20 Nichol A, Achiam J, Schulman J. On first-order meta-learning algorithms. 2018, arXiv preprint arXiv: 1803.02999
21 Rusu A A, Rao D, Sygnowski J, Vinyals O, Pascanu R, Osindero S, Hadsell R. Meta-learning with latent embedding optimization. In: Proceedings of the 7th International Conference on Learning Representations. 2019, 1–17
22 Chen X, Dai H, Li Y, Gao X, Song L. Learning to stop while learning to predict. In: Proceedings of the 37th International Conference on Machine Learning. 2020, 1520–1530
23 Ji Z, Liu X, Pang Y, Ouyang W, Li X. Few-shot human-object interaction recognition with semantic-guided attentive prototypes network. IEEE Transactions on Image Processing, 2021, 30: 1648–1661
24 Tian Y, Wang Y, Krishnan D, Tenenbaum J B, Isola P. Rethinking few-shot image classification: a good embedding is all you need? In: Proceedings of the 16th European Conference on Computer Vision. 2020, 266–282
25 Wang Y X, Girshick R, Hebert M, Hariharan B. Low-shot learning from imaginary data. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018, 7278–7286
26 Li K, Zhang Y, Li K, Fu Y. Adversarial feature hallucination networks for few-shot learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020, 13467–13476
27 Chen Z, Fu Y, Wang Y X, Ma L, Liu W, Hebert M. Image deformation meta-networks for one-shot learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019, 8672–8681
28 Zhang H, Zhang J, Koniusz P. Few-shot learning via saliency-guided hallucination of samples. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019, 2765–2774
29 Yang C, Xie L, Qiao S, Yuille A L. Training deep neural networks in generations: a more tolerant teacher educates better students. In: Proceedings of the 33rd AAAI Conference on Artificial Intelligence. 2019, 5628–5635
30 Romero A, Ballas N, Kahou S E, Chassang A, Gatta C, Bengio Y. FitNets: hints for thin deep nets. In: Proceedings of the 3rd International Conference on Learning Representations. 2015, 1–13
31 Zagoruyko S, Komodakis N. Paying more attention to attention: improving the performance of convolutional neural networks via attention transfer. In: Proceedings of the 5th International Conference on Learning Representations. 2017, 1–13
32 Yim J, Joo D, Bae J, Kim J. A gift from knowledge distillation: fast optimization, network minimization and transfer learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017, 7130–7138
33 Fukuda T, Suzuki M, Kurata G, Thomas S, Cui J, Ramabhadran B. Efficient knowledge distillation from an ensemble of teachers. In: Proceedings of the 18th Annual Conference of the International Speech Communication Association. 2017, 3697–3701
34 Zhou Z H, Jiang Y, Chen S F. Extracting symbolic rules from trained neural network ensembles. AI Communications, 2003, 16(1): 3–15
35 Zhou Z H, Jiang Y. NeC4.5: neural ensemble based C4.5. IEEE Transactions on Knowledge and Data Engineering, 2004, 16(6): 770–773
36 Ruder S, Ghaffari P, Breslin J G. Knowledge adaptation: teaching to adapt. 2017, arXiv preprint arXiv: 1702.02052
37 Li N, Yu Y, Zhou Z H. Diversity regularized ensemble pruning. In: Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases. 2012, 330–345
38 Deng J, Dong W, Socher R, Li L J, Li K, Li F F. ImageNet: a large-scale hierarchical image database. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2009, 248–255
39 Wah C, Branson S, Welinder P, Perona P, Belongie S. The Caltech-UCSD Birds-200-2011 dataset. Technical Report CNS-TR-2011-001. Pasadena: California Institute of Technology, 2011
40 Hilliard N, Phillips L, Howland S, Yankov A, Corley C D, Hodas N O. Few-shot learning with metric-agnostic conditional embeddings. 2018, arXiv preprint arXiv: 1802.04376
41 Krause J, Stark M, Deng J, Li F F. 3D object representations for fine-grained categorization. In: Proceedings of the IEEE International Conference on Computer Vision Workshops. 2013, 554–561
42 Zhou B, Lapedriza A, Khosla A, Oliva A, Torralba A. Places: a 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(6): 1452–1464
43 Van Horn G, Mac Aodha O, Song Y, Cui Y, Sun C, Shepard A, Adam H, Perona P, Belongie S. The iNaturalist species classification and detection dataset. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018, 8769–8778
Supplementary material: FCS-21250-OF-ZJ_suppl_1