Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP


Front. Comput. Sci.    2023, Vol. 17 Issue (2) : 172312    https://doi.org/10.1007/s11704-022-1250-2
RESEARCH ARTICLE
Teachers cooperation: team-knowledge distillation for multiple cross-domain few-shot learning
Zhong JI1,2,3, Jingwei NI1,3, Xiyao LIU1,3, Yanwei PANG1,3
1. School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
2. Science and Technology on Electro-Optical Information Security Control Laboratory, Tianjin 300308, China
3. Tianjin Key Laboratory of Brain-Inspired Intelligence Technology, Tianjin 300072, China
Abstract

Although few-shot learning (FSL) has achieved great progress, it remains an enormous challenge, especially when the source and target sets come from different domains, a setting known as cross-domain few-shot learning (CD-FSL). Utilizing more source domain data is an effective way to improve the performance of CD-FSL. However, knowledge from different source domains may become entangled and confused with each other, which hurts performance on the target domain. Therefore, we propose team-knowledge distillation networks (TKD-Net) to tackle this problem by exploring a strategy that helps multiple teachers cooperate. Specifically, we distill knowledge from the cooperation of teacher networks to a single student network in a meta-learning framework. TKD-Net incorporates task-oriented knowledge distillation and cooperation among multiple teachers to train an efficient student with better generalization ability on unseen tasks. Moreover, it employs both response-based knowledge and relation-based knowledge to transfer more comprehensive and effective knowledge. Extensive experimental results on four fine-grained datasets demonstrate the effectiveness and superiority of the proposed TKD-Net.
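To make the distillation objective concrete, the following is a minimal PyTorch sketch of the kind of loss structure the abstract describes: response-based knowledge transferred through temperature-softened soft labels (a KL-divergence term) and relation-based knowledge transferred through correlation congruence between instance-similarity matrices. The function and hyperparameter names (team_distillation_loss, temperature, alpha, beta) are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def correlation_matrix(features):
    # Pairwise cosine-similarity ("correlation") matrix of the embeddings.
    f = F.normalize(features, dim=1)
    return f @ f.t()

def team_distillation_loss(student_logits, teacher_logits_list, teacher_weights,
                           student_feats, teacher_feats_list, labels,
                           temperature=4.0, alpha=0.5, beta=0.5):
    # Hard-label supervision on the student.
    ce = F.cross_entropy(student_logits, labels)

    # Response-based knowledge: KL divergence to the weighted average of the
    # teachers' temperature-softened predictions.
    teacher_probs = sum(w * F.softmax(t / temperature, dim=1)
                        for w, t in zip(teacher_weights, teacher_logits_list))
    kl = F.kl_div(F.log_softmax(student_logits / temperature, dim=1),
                  teacher_probs, reduction="batchmean") * temperature ** 2

    # Relation-based knowledge: match the student's instance-correlation matrix
    # to the teachers' weighted correlation matrix (correlation congruence).
    s_corr = correlation_matrix(student_feats)
    t_corr = sum(w * correlation_matrix(f)
                 for w, f in zip(teacher_weights, teacher_feats_list))
    cc = F.mse_loss(s_corr, t_corr)

    return ce + alpha * kl + beta * cc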

Keywords: cross-domain few-shot learning; meta-learning; knowledge distillation; multiple teachers
Corresponding Author(s): Xiyao LIU   
Just Accepted Date: 09 September 2021   Issue Date: 01 August 2022
 Cite this article:   
Zhong JI, Jingwei NI, Xiyao LIU, et al. Teachers cooperation: team-knowledge distillation for multiple cross-domain few-shot learning[J]. Front. Comput. Sci., 2023, 17(2): 172312.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-022-1250-2
https://academic.hep.com.cn/fcs/EN/Y2023/V17/I2/172312
Fig.1  Different types of knowledge distillation frameworks. (a) The generic framework of knowledge distillation; (b) multi-teacher distillation; (c) multi-teacher distillation under meta-learning
Fig.2  The architecture of the proposed TKD-Net for 5-way 1-shot classification. In the teacher development stage, teacher models are first pre-trained on the multiple seen domains by supervised learning. In the second stage, which follows the meta-learning paradigm, multi-level knowledge is distilled from the cooperation of the teachers to the student model through soft labels and correlation congruence. The final student model is applied to few-shot classification tasks on the unseen domain
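As background on the meta-learning setup shown in Fig.2, the self-contained sketch below illustrates how an N-way K-shot episode is typically assembled from a pool of labelled images. It is a generic example: the helper name sample_episode, the class pool, and the tensor shapes are assumptions, not the authors' data pipeline.

import random
import torch

def sample_episode(images_by_class, n_way=5, k_shot=1, q_queries=15):
    # Pick N classes, then split each class's images into support and query sets.
    classes = random.sample(list(images_by_class), n_way)
    support, query, query_labels = [], [], []
    for episode_label, cls in enumerate(classes):
        imgs = random.sample(images_by_class[cls], k_shot + q_queries)
        support.append(torch.stack(imgs[:k_shot]))
        query.append(torch.stack(imgs[k_shot:]))
        query_labels += [episode_label] * q_queries
    return torch.stack(support), torch.cat(query), torch.tensor(query_labels)

# Toy usage with random "images":
pool = {c: [torch.randn(3, 84, 84) for _ in range(30)] for c in range(20)}
s, q, y = sample_episode(pool)
print(s.shape, q.shape, y.shape)
# torch.Size([5, 1, 3, 84, 84]) torch.Size([75, 3, 84, 84]) torch.Size([75])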
Dataset               Training set    Validation set    Testing set
mini-ImageNet         64              16                20
CUB-200               100             50                50
Stanford Cars         98              49                49
Places365-Standard    183             91                91
Plantae               100             50                50
Tab.1  Summary of the datasets (number of classes in each split)
5-way 1-shot
Method                        CUB/%           Cars/%          Places/%        Plantae/%
MatchingNet [1]               39.24 ± 0.55    30.33 ± 0.45    41.46 ± 0.59    31.71 ± 0.51
ProtoNet [3]                  36.54 ± 0.52    29.38 ± 0.42    40.12 ± 0.59    31.42 ± 0.49
RelationNet [4]               39.29 ± 0.54    30.46 ± 0.44    40.86 ± 0.61    32.81 ± 0.52
MAML (w/o pre-train) [5]      35.06 ± 0.54    31.12 ± 0.54    36.14 ± 0.56    30.95 ± 0.49
MAML [5]                      35.50 ± 0.53    26.76 ± 0.42    39.12 ± 0.60    31.35 ± 0.49
Proto-MAML [15]               36.05 ± 0.53    29.46 ± 0.44    38.71 ± 0.57    31.20 ± 0.49
LFT [13] on MatchingNet       34.20 ± 0.53    30.15 ± 0.46    39.43 ± 0.60    29.50 ± 0.39
LFT [13] on RelationNet       39.70 ± 0.58    32.59 ± 0.54    39.92 ± 0.59    33.11 ± 0.56
TKD-Net (Ours)                40.69 ± 0.51    31.36 ± 0.46    45.71 ± 0.60    34.53 ± 0.52

5-way 5-shot
Method                        CUB/%           Cars/%          Places/%        Plantae/%
MatchingNet [1]               53.71 ± 0.53    38.39 ± 0.48    55.96 ± 0.58    45.00 ± 0.54
ProtoNet [3]                  54.79 ± 0.56    41.76 ± 0.55    59.91 ± 0.56    42.99 ± 0.51
RelationNet [4]               55.29 ± 0.52    43.58 ± 0.56    58.62 ± 0.58    46.01 ± 0.55
MAML (w/o pre-train) [5]      53.20 ± 0.54    43.71 ± 0.56    53.91 ± 0.57    44.70 ± 0.53
MAML [5]                      52.66 ± 0.52    43.43 ± 0.53    56.61 ± 0.58    42.72 ± 0.55
Proto-MAML [15]               57.21 ± 0.54    45.06 ± 0.56    58.38 ± 0.57    47.45 ± 0.55
LFT [13] on MatchingNet       49.09 ± 0.53    42.42 ± 0.53    54.15 ± 0.54    43.32 ± 0.53
LFT [13] on RelationNet       55.53 ± 0.59    46.05 ± 0.55    53.17 ± 0.55    45.66 ± 0.56
TKD-Net (Ours)                57.85 ± 0.53    43.93 ± 0.52    64.96 ± 0.56    48.61 ± 0.53
Tab.2  Multiple cross-domain few-shot classification accuracy of TKD-Net with 95% confidence intervals
Method                  5-way 1-shot/%    5-way 5-shot/%
ProtoNet                36.54 ± 0.52      54.79 ± 0.56
TKD-Net (w/o L_CC)      38.56 ± 0.53      55.47 ± 0.52
TKD-Net (w/o L_KL)      39.54 ± 0.52      56.92 ± 0.52
TKD-Net (Ours)          40.69 ± 0.51      57.85 ± 0.53
Tab.3  Ablation studies of our TKD-Net on the CUB dataset (L_CC: correlation congruence loss; L_KL: soft-label KL-divergence loss)
Fig.3  Impact of different temperatures on the CUB dataset. (a) 5-way 1-shot classification result; (b) 5-way 5-shot classification result
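For readers unfamiliar with the temperature hyperparameter analysed in Fig.3, the short self-contained snippet below shows its usual effect in knowledge distillation: higher temperatures soften a teacher's output distribution so that more information about non-target classes reaches the student. The logits here are made up for illustration.

import torch
import torch.nn.functional as F

logits = torch.tensor([6.0, 2.0, 1.0, 0.5, 0.2])  # hypothetical 5-way teacher logits
for T in (1.0, 4.0, 16.0):
    probs = F.softmax(logits / T, dim=0)
    print(T, [round(p, 3) for p in probs.tolist()])
# As T grows, the distribution flattens: non-target classes receive noticeably
# more probability mass, which is the "dark knowledge" the student learns from.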
Method                  5-way 1-shot/%    5-way 5-shot/%
ProtoNet                36.54 ± 0.52      54.79 ± 0.56
TKD-Net_specific        38.64 ± 0.54      55.31 ± 0.54
TKD-Net_average         39.43 ± 0.50      56.38 ± 0.54
TKD-Net (Ours)          40.69 ± 0.51      57.85 ± 0.53
Tab.4  Results of our TKD-Net with different teacher weights αi on the CUB dataset
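The sketch below illustrates, in a generic way, how different teacher weights αi change the combined soft target that the student imitates. Mapping these schemes onto the exact variants in Tab.4 is our assumption, and the helper combine_teachers is hypothetical, not the authors' implementation.

import torch
import torch.nn.functional as F

def combine_teachers(teacher_logits, alphas, temperature=4.0):
    # Weighted mixture of the teachers' temperature-softened predictions.
    probs = [F.softmax(t / temperature, dim=1) for t in teacher_logits]
    return sum(a * p for a, p in zip(alphas, probs))

teacher_logits = [torch.randn(8, 5) for _ in range(4)]              # 4 teachers, 8 queries, 5 ways
uniform = combine_teachers(teacher_logits, [0.25] * 4)              # equal weights ("average"-style)
weighted = combine_teachers(teacher_logits, [0.4, 0.3, 0.2, 0.1])   # non-uniform weights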
Fig.4  Qualitative results of our TKD-Net on the Places and Cars datasets. The task setting is 5-way 1-shot
1 Vinyals O, Blundell C, Lillicrap T, Kavukcuoglu K, Wierstra D. Matching networks for one shot learning. In: Proceedings of the 30th International Conference on Neural Information Processing Systems. 2016, 3637–3645
2 Ravi S, Larochelle H. Optimization as a model for few-shot learning. In: Proceedings of the International Conference on Learning Representations. 2017, 1–11
3 Snell J, Swersky K, Zemel R S. Prototypical networks for few-shot learning. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017, 4080–4090
4 Sung F, Yang Y, Zhang L, Xiang T, Torr P H S, Hospedales T M. Learning to compare: relation network for few-shot learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018, 1199–1208
5 Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast adaptation of deep networks. In: Proceedings of the 34th International Conference on Machine Learning. 2017, 1126–1135
6 Liu X Y, Wang S T, Zhang M L. Transfer synthetic over-sampling for class-imbalance learning with limited minority class data. Frontiers of Computer Science, 2019, 13(5): 996–1009
7 Wang Y X, Hebert M. Learning to learn: model regression networks for easy small sample learning. In: Proceedings of the 14th European Conference on Computer Vision. 2016, 616–634
8 Chen W Y, Liu Y C, Kira Z, Wang Y C F, Huang J B. A closer look at few-shot classification. In: Proceedings of the International Conference on Learning Representations. 2019, 1–17
9 Luo Z, Zou Y, Hoffman J, Li F F. Label efficient learning of transferable representations across domains and tasks. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017, 164–176
10 Lu J, Cao Z, Wu K, Zhang G, Zhang C. Boosting few-shot image recognition via domain alignment prototypical networks. In: Proceedings of the 30th International Conference on Tools with Artificial Intelligence. 2018, 260–264
11 Ge H W, Han Y X, Kang W J, Sun L. Unpaired image to image transformation via informative coupled generative adversarial networks. Frontiers of Computer Science, 2021, 15(4): 154326
12 Liu L, Hamilton W L, Long G, Jiang J, Larochelle H. A universal representation transformer layer for few-shot image classification. In: Proceedings of the 9th International Conference on Learning Representations. 2021, 1–11
13 Tseng H Y, Lee H Y, Huang J B, Yang M H. Cross-domain few-shot classification via learned feature-wise transformation. In: Proceedings of the 8th International Conference on Learning Representations. 2020, 1–18
14 Dvornik N, Schmid C, Mairal J. Selecting relevant features from a multi-domain representation for few-shot classification. In: Proceedings of the 16th European Conference on Computer Vision. 2020, 769–786
15 Triantafillou E, Zhu T, Dumoulin V, Lamblin P, Evci U, Xu K, Goroshin R, Gelada C, Swersky K, Manzagol P A, et al. Meta-dataset: a dataset of datasets for learning to learn from few examples. In: Proceedings of the 8th International Conference on Learning Representations. 2020, 1–24
16 He T, Shen C, Tian Z, Gong D, Sun C, Yan Y. Knowledge adaptation for efficient semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019, 578–587
17 Mukherjee S, Hassan Awadallah A. XtremeDistil: multi-stage distillation for massive multilingual models. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020, 2221–2234
18 Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network. 2015, arXiv preprint arXiv: 1503.02531
19 Peng B, Jin X, Li D, Zhou S, Wu Y, Liu J, Zhang Z, Liu Y. Correlation congruence for knowledge distillation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019, 5006–5015
20 Nichol A, Achiam J, Schulman J. On first-order meta-learning algorithms. 2018, arXiv preprint arXiv: 1803.02999
21 Rusu A A, Rao D, Sygnowski J, Vinyals O, Pascanu R, Osindero S, Hadsell R. Meta-learning with latent embedding optimization. In: Proceedings of the 7th International Conference on Learning Representations. 2019, 1–17
22 Chen X, Dai H, Li Y, Gao X, Song L. Learning to stop while learning to predict. In: Proceedings of the 37th International Conference on Machine Learning. 2020, 1520–1530
23 Ji Z, Liu X, Pang Y, Ouyang W, Li X. Few-shot human-object interaction recognition with semantic-guided attentive prototypes network. IEEE Transactions on Image Processing, 2021, 30: 1648–1661
24 Tian Y, Wang Y, Krishnan D, Tenenbaum J B, Isola P. Rethinking few-shot image classification: a good embedding is all you need? In: Proceedings of the 16th European Conference on Computer Vision. 2020, 266–282
25 Wang Y X, Girshick R, Hebert M, Hariharan B. Low-shot learning from imaginary data. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018, 7278–7286
26 Li K, Zhang Y, Li K, Fu Y. Adversarial feature hallucination networks for few-shot learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020, 13467–13476
27 Chen Z, Fu Y, Wang Y X, Ma L, Liu W, Hebert M. Image deformation meta-networks for one-shot learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019, 8672–8681
28 Zhang H, Zhang J, Koniusz P. Few-shot learning via saliency-guided hallucination of samples. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019, 2765–2774
29 Yang C, Xie L, Qiao S, Yuille A L. Training deep neural networks in generations: a more tolerant teacher educates better students. In: Proceedings of the 33rd AAAI Conference on Artificial Intelligence. 2019, 5628–5635
30 Romero A, Ballas N, Kahou S E, Chassang A, Gatta C, Bengio Y. FitNets: hints for thin deep nets. In: Proceedings of the 3rd International Conference on Learning Representations. 2015, 1–13
31 Zagoruyko S, Komodakis N. Paying more attention to attention: improving the performance of convolutional neural networks via attention transfer. In: Proceedings of the 5th International Conference on Learning Representations. 2017, 1–13
32 Yim J, Joo D, Bae J, Kim J. A gift from knowledge distillation: fast optimization, network minimization and transfer learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017, 7130–7138
33 Fukuda T, Suzuki M, Kurata G, Thomas S, Cui J, Ramabhadran B. Efficient knowledge distillation from an ensemble of teachers. In: Proceedings of the 18th Annual Conference of the International Speech Communication Association. 2017, 3697–3701
34 Zhou Z H, Jiang Y, Chen S F. Extracting symbolic rules from trained neural network ensembles. AI Communications, 2003, 16(1): 3–15
35 Zhou Z H, Jiang Y. NeC4.5: neural ensemble based C4.5. IEEE Transactions on Knowledge and Data Engineering, 2004, 16(6): 770–773
36 Ruder S, Ghaffari P, Breslin J G. Knowledge adaptation: teaching to adapt. 2017, arXiv preprint arXiv: 1702.02052
37 Li N, Yu Y, Zhou Z H. Diversity regularized ensemble pruning. In: Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases. 2012, 330–345
38 Deng J, Dong W, Socher R, Li L J, Li K, Li F F. ImageNet: a large-scale hierarchical image database. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2009, 248–255
39 Wah C, Branson S, Welinder P, Perona P, Belongie S. The Caltech-UCSD Birds-200-2011 dataset. Technical Report CNS-TR-2011-001. Pasadena: California Institute of Technology, 2011
40 Hilliard N, Phillips L, Howland S, Yankov A, Corley C D, Hodas N O. Few-shot learning with metric-agnostic conditional embeddings. 2018, arXiv preprint arXiv: 1802.04376
41 Krause J, Stark M, Deng J, Li F F. 3D object representations for fine-grained categorization. In: Proceedings of the IEEE International Conference on Computer Vision Workshops. 2013, 554–561
42 Zhou B, Lapedriza A, Khosla A, Oliva A, Torralba A. Places: a 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(6): 1452–1464
43 Van Horn G, Mac Aodha O, Song Y, Cui Y, Sun C, Shepard A, Adam H, Perona P, Belongie S. The iNaturalist species classification and detection dataset. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018, 8769–8778
Supplementary material: FCS-21250-OF-ZJ_suppl_1