Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2025, Vol. 19 Issue (1) : 191302    https://doi.org/10.1007/s11704-023-3578-7
Artificial Intelligence
KD-Crowd: a knowledge distillation framework for learning from crowds
Shaoyuan LI, Yuxiang ZHENG, Ye SHI, Shengjun HUANG, Songcan CHEN
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, Nanjing 211106, China
Abstract

Crowdsourcing has established itself as an efficient labeling solution that distributes tasks to crowd workers. Because workers differ in expertise and can make mistakes, one core learning task is to estimate each worker’s expertise and aggregate their annotations to infer the latent true labels. In this paper, we show that methods that model worker expertise with a noise transition matrix, one of the major research directions, commonly overfit the annotation noise, owing either to an oversimplified noise assumption or to inaccurate estimation. To solve this problem, we propose a knowledge distillation framework (KD-Crowd) that combines the complementary strengths of noise-model-free robust learning techniques and transition matrix based worker expertise modeling. The framework consists of two stages. In Stage 1, a noise-model-free robust student model is trained by treating the predictions of a transition matrix based crowdsourcing teacher model as noisy labels, with the aim of correcting the teacher’s mistakes and obtaining better true label predictions. In Stage 2, the roles are switched: a better crowdsourcing model is retrained using the crowds’ annotations, supervised by the refined true label predictions given by Stage 1. Additionally, we propose an f-mutual information gain (MIGf) based knowledge distillation loss, which finds the maximum information intersection between the student’s and the teacher’s predictions. Our experiments show that MIGf clearly improves on the regular KL divergence knowledge distillation loss, which tends to force the student to memorize all the information in the teacher’s predictions, including its errors. Extensive experiments show that, as a universal framework, KD-Crowd substantially improves previous crowdsourcing methods on both true label prediction and worker expertise estimation.

Keywords crowdsourcing      label noise      worker expertise      knowledge distillation      robust learning     
Corresponding Author(s): Shaoyuan LI   
Just Accepted Date: 03 November 2023   Issue Date: 12 March 2024
 Cite this article:   
Shaoyuan LI,Yuxiang ZHENG,Ye SHI, et al. KD-Crowd: a knowledge distillation framework for learning from crowds[J]. Front. Comput. Sci., 2025, 19(1): 191302.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-023-3578-7
https://academic.hep.com.cn/fcs/EN/Y2025/V19/I1/191302
Fig.1  Illustrative example of the dynamic memorization of annotation noise. The experiment is conducted on CIFAR-10 with 60% symmetric noise and independent mistakes. The vertical axis shows the fraction of wrong annotations that are memorized by each worker’s true label predictions. The shaded regions around the curves span one standard deviation from the respective means
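The quantity plotted in Fig.1 can be illustrated with a small sketch. The code below is not the authors' implementation: the function names are hypothetical, and the symmetric-noise convention used here (a corrupted label is drawn uniformly from the other classes) is one common choice that the page itself does not specify. It shows how a "fraction of wrong annotations that are memorized" could be computed from true labels, one worker's annotations, and a model's predictions.

```python
import random


def symmetric_noise_annotations(true_labels, num_classes, noise_rate, rng):
    """Simulate one worker under symmetric noise (assumed convention):
    with probability noise_rate the label is replaced by a class drawn
    uniformly from the classes other than the true one."""
    noisy = []
    for y in true_labels:
        if rng.random() < noise_rate:
            other_classes = [c for c in range(num_classes) if c != y]
            noisy.append(rng.choice(other_classes))
        else:
            noisy.append(y)
    return noisy


def memorized_fraction(true_labels, annotations, predictions):
    """Fraction of the worker's wrong annotations that the predictions
    reproduce exactly. Returns 0.0 when the worker made no mistakes."""
    wrong = [(a, p) for y, a, p in zip(true_labels, annotations, predictions)
             if a != y]
    if not wrong:
        return 0.0
    return sum(1 for a, p in wrong if p == a) / len(wrong)
```

A model that predicts the true label everywhere scores 0 on this measure, while a model that copies the worker's annotations scores 1, which matches the reading of the vertical axis as a degree of noise memorization.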
Fig.2  The KD-Crowd framework consists of two stages: (1) We train a noise-model-free robust student model to refine the true label predictions of the transition matrix based crowdsourcing teacher model; (2) We retrain the crowdsourcing model using crowds’ annotations and the refined true label predictions simultaneously
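Two ingredients named in the abstract can be sketched in a few lines: the transition matrix view of a worker, and the regular KL divergence distillation loss that serves as the baseline for MIGf. This is a minimal illustration under stated assumptions, not the paper's code. The function names are hypothetical, the transition matrix is assumed to be row-stochastic with `transition[k][j]` read as the probability that the worker answers `j` when the true class is `k`, and the MIGf loss itself is not reproduced because its exact form is not given on this page.

```python
import math


def worker_annotation_probs(class_probs, transition):
    """Predicted annotation distribution for one worker:
    p(annotation = j | x) = sum_k p(y = k | x) * transition[k][j]."""
    num_classes = len(class_probs)
    num_answers = len(transition[0])
    return [
        sum(class_probs[k] * transition[k][j] for k in range(num_classes))
        for j in range(num_answers)
    ]


def kl_divergence(teacher_probs, student_probs, eps=1e-12):
    """KL(teacher || student) for one instance, the usual distillation loss.
    Terms with zero teacher probability contribute nothing; eps guards
    against taking the log of zero in the student distribution."""
    return sum(
        t * math.log((t + eps) / (s + eps))
        for t, s in zip(teacher_probs, student_probs)
        if t > 0
    )
```

The KL term is zero only when the student matches the teacher exactly, which is why, as the abstract notes, minimizing it pushes the student to copy the teacher's errors as well as its correct predictions.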
  
| Methods | Result | Inst 20% / Independent Mistakes | Inst 20% / Effortless Workers | Inst 20% / Correlated Mistakes | Sym 20% / Independent Mistakes | Sym 20% / Effortless Workers | Sym 20% / Correlated Mistakes | Sym 40% / Independent Mistakes | Sym 40% / Effortless Workers | Sym 40% / Correlated Mistakes |
|---|---|---|---|---|---|---|---|---|---|---|
| DL-MV | Best | 72.45 ± 0.61 | 54.08 ± 5.83 | 64.79 ± 4.24 | 74.14 ± 0.12 | 10.00 ± 0.00 | 73.72 ± 0.44 | 69.65 ± 0.45 | 10.00 ± 0.00 | 69.46 ± 0.13 |
| DL-MV | Last | 71.65 ± 0.63 | 52.52 ± 6.13 | 63.69 ± 4.24 | 73.40 ± 0.47 | 10.00 ± 0.00 | 72.60 ± 1.28 | 69.19 ± 0.35 | 10.00 ± 0.00 | 67.69 ± 0.32 |
| WDN | Best | 68.73 ± 1.48 | 10.60 ± 0.68 | 68.47 ± 1.86 | 74.16 ± 0.52 | 10.30 ± 0.17 | 73.10 ± 0.58 | 69.39 ± 0.65 | 10.01 ± 0.07 | 67.39 ± 2.29 |
| WDN | Last | 68.25 ± 1.48 | 10.47 ± 0.58 | 68.15 ± 1.77 | 73.77 ± 0.62 | 10.24 ± 0.15 | 72.82 ± 0.65 | 69.03 ± 0.76 | 10.04 ± 0.04 | 66.79 ± 2.24 |
| CrowdLayer | Best | 68.31 ± 1.65 | 69.43 ± 1.30 | 63.79 ± 0.84 | 73.51 ± 0.15 | 73.81 ± 0.48 | 73.91 ± 0.21 | 70.86 ± 0.65 | 70.54 ± 0.26 | 69.76 ± 0.43 |
| CrowdLayer | Last | 61.75 ± 0.95 | 62.06 ± 0.58 | 62.95 ± 1.10 | 71.22 ± 0.66 | 71.02 ± 0.41 | 70.01 ± 0.21 | 61.54 ± 0.34 | 61.25 ± 0.51 | 60.59 ± 0.30 |
| AggNet | Best | 72.87 ± 0.47 | 51.98 ± 2.01 | 72.54 ± 0.30 | 74.27 ± 0.18 | 10.00 ± 0.00 | 73.91 ± 0.21 | 69.98 ± 0.11 | 10.00 ± 0.00 | 69.63 ± 0.23 |
| AggNet | Last | 72.12 ± 0.45 | 50.98 ± 2.06 | 71.82 ± 0.59 | 73.94 ± 0.38 | 10.00 ± 0.00 | 72.85 ± 0.34 | 69.05 ± 0.72 | 10.00 ± 0.00 | 68.80 ± 0.69 |
| CoNAL | Best | 72.13 ± 0.40 | 71.88 ± 0.64 | 70.47 ± 0.52 | 74.40 ± 0.29 | 71.99 ± 3.59 | 73.95 ± 0.25 | 70.42 ± 0.23 | 66.69 ± 1.74 | 70.15 ± 0.05 |
| CoNAL | Last | 65.84 ± 0.38 | 64.80 ± 0.19 | 63.76 ± 2.01 | 71.23 ± 0.32 | 67.47 ± 5.98 | 70.76 ± 0.84 | 60.79 ± 1.12 | 59.79 ± 3.59 | 59.98 ± 0.70 |
| Max-MIG | Best | 71.16 ± 0.55 | 70.61 ± 0.23 | 70.99 ± 0.52 | 73.91 ± 0.13 | 74.23 ± 0.35 | 72.72 ± 0.11 | 70.14 ± 0.45 | 70.36 ± 0.42 | 68.94 ± 0.44 |
| Max-MIG | Last | 63.65 ± 1.17 | 62.26 ± 0.28 | 63.47 ± 0.14 | 73.17 ± 0.74 | 73.64 ± 0.23 | 71.79 ± 0.45 | 68.37 ± 0.19 | 68.01 ± 0.54 | 64.17 ± 0.56 |
| KD-Crowd | Stage 1 | 77.25 ± 0.42 | 77.03 ± 0.92 | 77.05 ± 0.78 | 85.12 ± 0.37 | 85.48 ± 0.32 | 84.77 ± 0.25 | 84.57 ± 0.29 | 84.62 ± 0.43 | 84.52 ± 0.50 |
| KD-Crowd | Stage 2 | 88.99 ± 0.86 | 88.23 ± 0.97 | 88.28 ± 0.58 | 89.10 ± 0.96 | 89.33 ± 0.84 | 89.28 ± 0.10 | 87.92 ± 0.52 | 87.91 ± 0.52 | 88.44 ± 0.30 |
| KD-Crowd | Ensemble | 89.29 ± 0.69 | 88.73 ± 0.76 | 88.75 ± 0.30 | 89.96 ± 0.81 | 90.28 ± 0.15 | 90.10 ± 0.13 | 88.86 ± 0.56 | 88.89 ± 0.21 | 89.40 ± 0.24 |

| Methods | Result | Sym 60% / Independent Mistakes | Sym 60% / Effortless Workers | Sym 60% / Correlated Mistakes | Sym 80% / Independent Mistakes | Sym 80% / Effortless Workers | Sym 80% / Correlated Mistakes | Asym / Independent Mistakes | Asym / Effortless Workers | Asym / Correlated Mistakes |
|---|---|---|---|---|---|---|---|---|---|---|
| DL-MV | Best | 70.18 ± 0.40 | 23.48 ± 1.06 | 69.32 ± 1.05 | 32.13 ± 1.36 | 10.02 ± 0.04 | 36.62 ± 0.98 | 53.62 ± 0.15 | 42.37 ± 0.56 | 53.00 ± 0.78 |
| DL-MV | Last | 41.16 ± 1.30 | 19.47 ± 0.96 | 37.49 ± 1.53 | 18.37 ± 0.72 | 10.00 ± 0.00 | 17.33 ± 0.30 | 52.27 ± 1.01 | 35.75 ± 0.79 | 50.32 ± 0.28 |
| WDN | Best | 71.26 ± 0.40 | 16.47 ± 1.71 | 55.72 ± 1.59 | 46.93 ± 1.58 | 10.23 ± 0.23 | 17.00 ± 0.95 | 68.78 ± 2.27 | 40.72 ± 3.99 | 66.09 ± 0.76 |
| WDN | Last | 45.93 ± 0.66 | 14.79 ± 4.19 | 36.05 ± 1.20 | 26.29 ± 9.14 | 10.16 ± 0.19 | 15.50 ± 1.35 | 52.56 ± 4.33 | 26.01 ± 4.12 | 55.27 ± 6.03 |
| CrowdLayer | Best | 73.56 ± 0.50 | 71.67 ± 0.27 | 57.70 ± 2.49 | 56.50 ± 1.55 | 41.79 ± 1.74 | 10.99 ± 0.94 | 81.48 ± 0.29 | 71.26 ± 4.29 | 72.41 ± 0.48 |
| CrowdLayer | Last | 51.81 ± 1.01 | 50.17 ± 1.02 | 48.14 ± 0.88 | 24.28 ± 1.28 | 13.72 ± 1.54 | 10.53 ± 0.56 | 80.91 ± 0.93 | 70.15 ± 5.11 | 71.54 ± 0.47 |
| AggNet | Best | 76.92 ± 0.67 | 76.43 ± 0.38 | 71.35 ± 0.64 | 59.79 ± 1.50 | 40.57 ± 3.19 | 44.49 ± 1.18 | 82.10 ± 0.41 | 78.35 ± 1.29 | 69.40 ± 0.36 |
| AggNet | Last | 55.49 ± 1.47 | 54.41 ± 0.16 | 44.32 ± 0.52 | 26.96 ± 1.07 | 25.46 ± 0.87 | 15.43 ± 0.99 | 80.43 ± 0.43 | 76.93 ± 1.19 | 65.56 ± 2.64 |
| CoNAL | Best | 73.80 ± 0.06 | 16.83 ± 9.56 | 74.06 ± 0.77 | 55.46 ± 0.98 | 10.00 ± 0.00 | 53.12 ± 0.45 | 79.96 ± 0.28 | 57.80 ± 4.37 | 80.29 ± 0.24 |
| CoNAL | Last | 47.82 ± 5.55 | 16.79 ± 9.49 | 47.02 ± 3.47 | 22.93 ± 0.92 | 10.00 ± 0.00 | 21.69 ± 0.48 | 79.18 ± 0.64 | 56.99 ± 3.97 | 80.04 ± 0.50 |
| Max-MIG | Best | 74.07 ± 0.23 | 74.04 ± 0.45 | 72.47 ± 0.76 | 55.07 ± 3.21 | 56.91 ± 0.93 | 49.73 ± 0.34 | 80.99 ± 0.24 | 80.25 ± 0.20 | 78.16 ± 0.27 |
| Max-MIG | Last | 61.00 ± 0.44 | 62.26 ± 0.28 | 54.50 ± 0.50 | 27.56 ± 1.39 | 29.16 ± 0.97 | 22.15 ± 0.16 | 80.05 ± 0.70 | 79.38 ± 0.29 | 76.21 ± 1.37 |
| KD-Crowd | Stage 1 | 84.60 ± 0.51 | 86.66 ± 0.81 | 83.95 ± 0.44 | 68.94 ± 0.88 | 71.54 ± 0.57 | 47.75 ± 1.28 | 87.11 ± 1.09 | 88.34 ± 0.87 | 86.60 ± 0.88 |
| KD-Crowd | Stage 2 | 83.46 ± 0.13 | 83.43 ± 0.32 | 82.69 ± 0.96 | 73.22 ± 0.37 | 74.69 ± 0.21 | 51.10 ± 1.60 | 84.41 ± 0.89 | 84.92 ± 0.32 | 84.35 ± 0.58 |
| KD-Crowd | Ensemble | 86.08 ± 0.10 | 87.17 ± 0.16 | 85.87 ± 0.15 | 72.23 ± 0.16 | 74.35 ± 0.33 | 51.74 ± 0.85 | 89.95 ± 0.08 | 89.60 ± 0.30 | 87.65 ± 0.25 |
Tab.1  Test accuracy (%) on CIFAR-10 under Inst 20%, Sym 20%, Sym 40%, Sym 60%, Sym 80% and Asym noise cases. For the baselines, we report both the best and last performance. For the proposed KD-Crowd, we report three results for better understanding. Bold values represent the best three methods, bold and underlined values represent the best methods
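| Methods | Result | LabelMe | CIFAR10H |
|---|---|---|---|
| DL-MV | Best | 81.23 ± 2.78 | 63.87 ± 0.54 |
| DL-MV | Last | 76.37 ± 0.67 | 58.43 ± 0.80 |
| WDN | Best | 83.78 ± 0.86 | 69.43 ± 0.32 |
| WDN | Last | 82.24 ± 0.37 | 68.67 ± 0.45 |
| CrowdLayer | Best | 85.63 ± 0.53 | 67.26 ± 0.43 |
| CrowdLayer | Last | 80.78 ± 0.39 | 65.30 ± 1.17 |
| AggNet | Best | 87.05 ± 0.24 | 69.58 ± 0.30 |
| AggNet | Last | 83.87 ± 0.87 | 67.94 ± 0.77 |
| CoNAL | Best | 84.01 ± 0.36 | 66.46 ± 0.52 |
| CoNAL | Last | 79.97 ± 0.93 | 61.53 ± 0.88 |
| Max-MIG | Best | 87.40 ± 1.55 | 67.87 ± 0.21 |
| Max-MIG | Last | 82.32 ± 0.59 | 65.83 ± 0.24 |
| KD-Crowd | Stage 1 | 89.22 ± 1.12 | 70.19 ± 0.51 |
| KD-Crowd | Stage 2 | 89.11 ± 0.21 | 69.99 ± 0.65 |
| KD-Crowd | Ensemble | 89.03 ± 0.52 | 71.46 ± 0.28 |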
Tab.2  Test accuracy (%) on real-world datasets LabelMe and CIFAR10H. For the baselines, we report both the best and last performance. For the proposed KD-Crowd, we report three results for better understanding. Bold values represent the best three methods, bold and underlined values represent the best methods
| Stages | Student type | Distillation method | Test accuracy, Sym 60% | Test accuracy, Sym 80% | Test accuracy, Asym | Train accuracy, Sym 60% | Train accuracy, Sym 80% | Train accuracy, Asym |
|---|---|---|---|---|---|---|---|---|
| Initialization | | | 70.32 ± 0.02 | 49.48 ± 2.69 | 79.64 ± 0.41 | 78.01 ± 0.15 | 51.47 ± 2.96 | 90.26 ± 0.14 |
| Initialization | | | 61.00 ± 0.44 | 27.56 ± 1.39 | 80.05 ± 0.70 | 67.05 ± 0.24 | 31.49 ± 0.21 | 90.75 ± 0.07 |
| Stage 1 | ELR | | 72.87 ± 1.04 | 53.74 ± 0.92 | 78.23 ± 0.54 | 79.21 ± 0.30 | 53.65 ± 1.77 | 87.76 ± 0.36 |
| Stage 1 | ELR+ | | 84.60 ± 0.51 | 68.94 ± 0.88 | 87.11 ± 1.09 | 86.21 ± 0.34 | 70.23 ± 0.29 | 91.26 ± 0.12 |
| Stage 2 | ELR → Di | KL Divergence | 71.14 ± 0.20 | 50.92 ± 1.21 | 78.77 ± 0.42 | 77.28 ± 0.31 | 50.61 ± 0.71 | 87.61 ± 0.06 |
| Stage 2 | ELR → M | KL Divergence | 72.74 ± 0.24 | 51.93 ± 2.30 | 79.25 ± 0.18 | 77.90 ± 0.41 | 52.04 ± 2.20 | 87.86 ± 0.25 |
| Stage 2 | ELR → M | MIGf | 79.12 ± 0.48 | 65.67 ± 2.21 | 82.22 ± 0.43 | 84.13 ± 0.58 | 67.40 ± 2.12 | 89.59 ± 0.26 |
| Stage 2 | ELR+ → M | MIGf | 81.87 ± 0.45 | 70.09 ± 0.70 | 82.94 ± 0.60 | 85.74 ± 0.09 | 69.98 ± 0.49 | 90.87 ± 0.03 |
| Stage 2 | ELR+ → M | MIGf, Disturb | 83.46 ± 0.13 | 73.22 ± 0.37 | 84.41 ± 0.89 | 88.56 ± 0.07 | 72.89 ± 0.09 | 92.79 ± 0.13 |
| Ensemble | ELR+ → M | MIGf, Disturb | 86.08 ± 0.10 | 72.23 ± 0.16 | 89.95 ± 0.08 | 88.73 ± 0.14 | 72.06 ± 0.14 | 93.26 ± 0.03 |
Tab.3  Ablation study results of test and train accuracy (%) on CIFAR-10 with independent mistakes. We only report the last performance results of stages in KD-Crowd. The two rows of Initialization mean Max-MIG pre-trained for 10 epochs (row 1) and 50 epochs (row 2). Bold value represents the best method
Fig.3  Illustrative results of the entry wise transition matrix estimation error $T_e^m$ on CIFAR-10 with 60% symmetric independent mistakes, which is calculated as $T_e^m = |T^m - \hat{T}^m| / T^m$, with $T^m$, $\hat{T}^m$ respectively denoting the groundtruth and estimated transition matrix. Darker (lighter) entries mean larger (smaller) errors
Fig.4  The gap between the overall error $E^m$ of Max-MIG and KD-Crowd for all workers on CIFAR10H (a) and LabelMe (b). $E^m$ is calculated as the mean value of entries in $T_e^m = |T^m - \hat{T}^m| / T^m$
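The two error measures described in the captions of Fig.3 and Fig.4 are easy to state in code. The sketch below is an illustration only, with hypothetical function names. It assumes transition matrices are given as nested lists and that every entry of the ground-truth matrix is strictly positive, since the relative error divides by it; how the paper handles zero entries is not stated on this page.

```python
def entrywise_relative_error(true_matrix, est_matrix):
    """Entry-wise relative error |T - T_hat| / T for one worker.
    Assumes every entry of true_matrix is strictly positive."""
    return [
        [abs(t - e) / t for t, e in zip(true_row, est_row)]
        for true_row, est_row in zip(true_matrix, est_matrix)
    ]


def overall_error(true_matrix, est_matrix):
    """Mean of the entry-wise relative errors, a single score per worker."""
    errors = entrywise_relative_error(true_matrix, est_matrix)
    flat = [value for row in errors for value in row]
    return sum(flat) / len(flat)
```

For a worker with ground-truth matrix `[[0.8, 0.2], [0.4, 0.6]]` and estimate `[[0.6, 0.4], [0.4, 0.6]]`, the entry-wise errors are approximately 0.25, 1.0, 0 and 0, so the overall error is about 0.3125. Note that small ground-truth entries amplify the relative error, which is why the off-diagonal entry dominates here.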
| Methods | Stage | Test accuracy, LabelMe | Test accuracy, CIFAR10H | Train accuracy, LabelMe | Train accuracy, CIFAR10H |
|---|---|---|---|---|---|
| CrowdLayer ELR+ | Stage 1 | 79.53 ± 3.58 | 69.77 ± 0.47 | 86.28 ± 4.72 | 80.65 ± 1.21 |
| CrowdLayer ELR+ | Stage 2 | 82.21 ± 0.65 | 69.67 ± 1.63 | 83.27 ± 0.57 | 77.65 ± 0.82 |
| CrowdLayer ELR+ | Ensemble | 80.78 ± 3.00 | 72.10 ± 0.80 | | |
| CrowdLayer DivideMix | Stage 1 | 83.42 ± 0.67 | 68.07 ± 2.23 | 85.80 ± 2.10 | 77.83 ± 3.34 |
| CrowdLayer DivideMix | Stage 2 | 82.18 ± 0.95 | 64.40 ± 6.90 | 76.43 ± 0.87 | 72.60 ± 8.14 |
| CrowdLayer DivideMix | Ensemble | 86.56 ± 0.98 | 71.73 ± 1.37 | | |
| Max-MIG ELR+ | Stage 1 | 89.33 ± 0.24 | 69.43 ± 0.32 | 86.69 ± 4.72 | 79.60 ± 1.29 |
| Max-MIG ELR+ | Stage 2 | 87.77 ± 0.31 | 68.67 ± 0.45 | 91.63 ± 1.07 | 79.67 ± 3.68 |
| Max-MIG ELR+ | Ensemble | 89.79 ± 0.10 | 71.46 ± 0.28 | | |
| Max-MIG DivideMix | Stage 1 | 84.37 ± 1.99 | 66.73 ± 0.97 | 87.90 ± 1.15 | 73.13 ± 0.46 |
| Max-MIG DivideMix | Stage 2 | 88.97 ± 0.42 | 71.27 ± 1.33 | 91.17 ± 0.63 | 77.39 ± 0.21 |
| Max-MIG DivideMix | Ensemble | 88.24 ± 1.66 | 70.43 ± 1.27 | | |
Tab.4  Different implementation of KD-Crowd on real-world datasets LabelMe and CIFAR10H. Bold values represent the best methods
  
  
  
  
  