Frontiers of Computer Science

Front. Comput. Sci.    2024, Vol. 18 Issue (5) : 185323    https://doi.org/10.1007/s11704-023-2751-3
Artificial Intelligence
Label distribution similarity-based noise correction for crowdsourcing
Lijuan REN1, Liangxiao JIANG1, Wenjun ZHANG1, Chaoqun LI2,3
1. School of Computer Science, China University of Geosciences, Wuhan 430074, China
2. Key Laboratory of Artificial Intelligence, Ministry of Education, Shanghai 200240, China
3. School of Mathematics and Physics, China University of Geosciences, Wuhan 430074, China
Abstract

In crowdsourcing scenarios, we can obtain each instance’s multiple noisy labels from different crowd workers and then infer its integrated label via label aggregation. Despite the effectiveness of label aggregation methods, a certain level of noise remains in the integrated labels. Thus, several noise correction methods have been proposed in recent years to reduce the impact of this noise. However, to the best of our knowledge, existing methods rarely consider an instance’s information from both its features and its multiple noisy labels simultaneously when identifying a noise instance. In this study, we argue that the more distinguishable an instance’s features but the noisier its multiple noisy labels, the more likely it is to be a noise instance. Based on this premise, we propose a label distribution similarity-based noise correction (LDSNC) method. To measure whether an instance’s features are distinguishable, we obtain each instance’s predicted label distribution by building multiple classifiers on the instances’ features and their integrated labels. To measure whether an instance’s multiple noisy labels are noisy, we obtain each instance’s multiple noisy label distribution from its multiple noisy labels. Then, we use the Kullback-Leibler (KL) divergence to calculate the similarity between the predicted label distribution and the multiple noisy label distribution, and define instances with lower similarity as noise instances. Extensive experimental results on 34 simulated and four real-world crowdsourced datasets validate the effectiveness of our method.
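To make the core similarity measure concrete, the following is a minimal sketch of the KL divergence between a predicted label distribution and a multiple noisy label distribution. The function names, the smoothing constant, and the use of negative KL divergence as the similarity score are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete label distributions (with smoothing)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def label_distribution_similarity(predicted_dist, noisy_dist):
    """Lower KL divergence means higher similarity; negative KL is used as the score."""
    return -kl_divergence(predicted_dist, noisy_dist)

# An instance whose predicted distribution is peaked (distinguishable features)
# but whose crowd-label distribution disagrees with it gets a low similarity,
# so it is flagged as a likely noise instance.
pred = [0.90, 0.05, 0.05]   # predicted label distribution from the classifiers
noisy = [0.20, 0.60, 0.20]  # distribution of the instance's multiple noisy labels
print(label_distribution_similarity(pred, noisy))
```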

Keywords: crowdsourcing learning, noise correction, label distribution similarity, Kullback-Leibler divergence
Corresponding Author(s): Liangxiao JIANG   
Just Accepted Date: 15 May 2023   Issue Date: 10 July 2023
 Cite this article:   
Lijuan REN, Liangxiao JIANG, Wenjun ZHANG, et al. Label distribution similarity-based noise correction for crowdsourcing[J]. Front. Comput. Sci., 2024, 18(5): 185323.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-023-2751-3
https://academic.hep.com.cn/fcs/EN/Y2024/V18/I5/185323
Fig.1  Several cases of label aggregation. The features of an instance determine whether the instance is distinguishable. In cases where an instance’s features are distinguishable but its multiple noisy labels are noisy, the instance is more likely to be a noise instance
Fig.2  The process of obtaining each instance’s predicted labels. We build K random trees based on instances’ features and integrated labels, which are used to obtain each instance’s predicted labels $\hat{L}_i$
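A rough analogue of this step, assuming scikit-learn's ExtraTreeClassifier trained on bootstrap resamples as a stand-in for the paper's K random trees (the paper's exact tree construction may differ), could look like this:

```python
import numpy as np
from sklearn.tree import ExtraTreeClassifier
from sklearn.utils import resample

def predicted_label_distributions(X, y_integrated, n_classes, K=10, seed=0):
    """Average the hard votes of K randomized trees trained on bootstrap resamples.

    Assumes the integrated labels are encoded as integers 0..n_classes-1."""
    rng = np.random.RandomState(seed)
    votes = np.zeros((len(X), n_classes))
    for _ in range(K):
        Xb, yb = resample(X, y_integrated, random_state=rng.randint(1 << 30))
        tree = ExtraTreeClassifier(random_state=rng.randint(1 << 30)).fit(Xb, yb)
        for i, c in enumerate(tree.predict(X)):
            votes[i, int(c)] += 1
    return votes / K  # row i is instance i's predicted label distribution
```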
Fig.3  Overall framework of LDSNC. In LDSNC, we first obtain each instance’s predicted label distribution $\hat{P}_i$ and multiple noisy label distribution $P_i$ using $\hat{L}_i$ and $L_i$, respectively. Then, we measure the label distribution similarity $s_i$ using the KL divergence to identify and filter noise instances, thus obtaining a clean set $\hat{D}_c$ and a noise set $\hat{D}_n$. Finally, we build three heterogeneous classifiers on $\hat{D}_c$ to correct the instances in $\hat{D}_n$ and thus obtain a corrected dataset $\tilde{D}$
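Putting the pieces together, here is a hedged sketch of the filter-and-correct stage. The similarity threshold, the particular three classifiers (decision tree, naive Bayes, and SVM stand-ins), and the majority-vote relabeling rule are assumptions for illustration rather than the authors' exact settings.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def correct_noise(X, y_integrated, similarities, threshold):
    """Split instances into clean/noise sets by a similarity threshold, then relabel
    the noise set with the majority vote of three classifiers trained on the clean set.

    X is assumed to be a numeric numpy array; the threshold rule and the three
    classifiers are illustrative choices."""
    sims = np.asarray(similarities)
    clean = sims >= threshold              # clean set
    noisy = ~clean                         # noise set
    y_corrected = np.array(y_integrated, copy=True)
    models = [DecisionTreeClassifier(), GaussianNB(), SVC()]
    preds = np.stack([m.fit(X[clean], y_corrected[clean]).predict(X[noisy])
                      for m in models])    # shape: (3, number of noise instances)
    for j, idx in enumerate(np.flatnonzero(noisy)):
        vals, counts = np.unique(preds[:, j], return_counts=True)
        y_corrected[idx] = vals[np.argmax(counts)]   # majority-vote relabeling
    return y_corrected
```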
Dataset Instances Features Classes Feature type
Anneal 898 38 6 Hybrid
Audiology 226 69 24 Nominal
Autos 205 25 7 Hybrid
Balance-scale 625 4 3 Numeric
Biodeg 1055 41 2 Numeric
Breast-cancer 286 9 2 Nominal
Breast-w 699 9 2 Numeric
Car 1728 6 4 Nominal
Credit-a 690 15 2 Hybrid
Credit-g 1000 20 2 Hybrid
Diabetes 768 8 2 Numeric
Heart-c 303 13 5 Hybrid
Heart-h 294 13 5 Hybrid
Heart-statlog 270 13 2 Numeric
Hepatitis 155 19 2 Hybrid
Horse-colic 368 22 2 Hybrid
Hypothyroid 3772 29 4 Hybrid
Ionosphere 351 34 2 Numeric
Iris 150 4 3 Numeric
Kr-vs-kp 3196 36 2 Nominal
Labor 57 16 2 Hybrid
Letter 20000 16 26 Numeric
Lymph 148 18 4 Hybrid
Mushroom 8124 22 2 Nominal
Segment 2310 19 7 Numeric
Sick 3772 29 2 Hybrid
Sonar 208 60 2 Numeric
Spambase 4601 57 2 Numeric
Tic-tac-toe 958 9 2 Nominal
Vehicle 846 18 4 Numeric
Vote 435 16 2 Nominal
Vowel 990 13 11 Hybrid
Waveform 5000 40 3 Numeric
Zoo 101 17 7 Hybrid
Tab.1  The description of 34 simulated datasets
Dataset MV PL STC CC AVNC CENC LDNC LDSNC
Anneal 10.33 19.74 10.33 16.12 10.33 8.9 5.12 7.65
Audiology 18.1 31.9 18.1 23.5 24.38 25.18 17.48 16.55
Autos 8.15 45.07 8.15 22.59 18.29 18.29 6.68 8.59
Balance-scale 17.78 24.7 17.78 15.18 17.97 17.46 12.56 10.05
Biodeg 20.07 29.31 20.49 15.61 13.96 14.53 14.64 13.87
Breast-cancer 21.05 26.71 24.16 24.23 23.29 24.62 18.85 17.97
Breast-w 19.86 4.32 11.39 4.38 4.26 4.88 7.97 4.88
Car 14.81 21.57 14.13 22.02 9.38 9.1 7.85 5.91
Credit-a 20.36 19.71 17.48 14.71 13.22 12.48 12.07 11.67
Credit-g 20.19 25.03 23.53 22.41 21.99 22.26 18.05 15.48
Diabetes 21.03 25.72 22.77 22.49 22.28 22.23 17.62 15.34
Heart-c 20.26 17.52 20.26 17.19 18.45 19.14 15.28 11.95
Heart-h 20.68 34.93 20.68 19.39 16.53 16.87 14.83 12.52
Heart-statlog 20.56 16.7 21.59 19.96 17.3 17.7 17.3 12.89
Hepatitis 21.42 18.58 19.1 15.1 16 16.71 14.65 12.52
Horse-colic 22.09 16.55 17.99 19.81 14.27 14.08 14.1 14.81
Hypothyroid 12.41 1.3 12.41 9.54 0.55 0.49 3.2 2.88
Ionosphere 20.06 16.3 14.84 9.54 10.8 10.68 12.45 11.54
Iris 12.8 26.2 7.8 3.6 4.87 4.93 4.27 3
Kr-vs-kp 19.74 25.41 7.52 12.65 1.81 2.13 6.27 8.01
Labor 20 26.67 21.05 13.86 20.53 20.35 16.67 12.63
Letter 1.89 9.85 10.81 3.28 8.76 7.57 1.16 1.25
Lymph 18.31 20.34 18.31 16.15 18.72 17.97 15.27 11.82
Mushroom 20.51 6.85 5.32 1.98 0.1 0.17 5.73 7.44
Segment 4.83 4.34 4.76 3.18 3.03 2.13 0.89 1.51
Sick 18.26 2.09 6.05 5.46 1.46 1.48 5.56 8.79
Sonar 23.61 40.67 26.92 22.84 23.85 24.38 22.45 19.42
Spambase 20.16 33.16 15.27 10.66 7.41 7.75 10.08 12.15
Tic-tac-toe 22.39 33.96 21.26 18.97 16.6 16.63 15.48 12.78
Vehicle 7.67 21.08 23.13 19.26 18.7 17.68 7.07 7.73
Vote 19.2 5.03 10.6 9.08 4.02 4.25 7.06 5.86
Vowel 3.76 29.08 15 3.2 10.18 11.03 3.26 3.92
Waveform 9.86 13.36 21.25 14.77 13.56 13.71 8.02 6.01
Zoo 13.07 19.5 13.07 5.15 9.9 10.4 8.22 5.35
Average 16.63 20.98 15.98 14.05 12.85 12.89 10.83 9.84
Tab.2  Noise ratio (%) comparisons on the uniform distribution for LDSNC versus MV, PL, STC, CC, AVNC, CENC and LDNC
(Pairwise Wilcoxon significance matrix among MV, PL, STC, CC, AVNC, CENC, LDNC and LDSNC)
Tab.3  Noise ratio (%) comparisons on the uniform distribution using Wilcoxon tests for LDSNC versus MV, PL, STC, CC, AVNC, CENC and LDNC
Dataset MV PL STC CC AVNC CENC LDNC LDSNC
Anneal 10.39 19.78 10.39 15.31 10.36 9.47 5.51 7.52
Audiology 18.89 34.34 18.89 23.76 25.53 27.26 19.25 17.08
Autos 8.73 42.98 8.73 22.83 20.44 20.49 7.66 9.41
Balance-scale 16.37 23.1 16.37 14.88 17.22 16.46 10.77 8.7
Biodeg 21.45 28.75 21.51 15.4 14.32 15.19 14.81 14.29
Breast-cancer 20.56 27.8 24.62 24.97 25.07 25.49 17.83 17.06
Breast-w 20.87 4.52 11.4 4.92 4.05 4.69 9 5.81
Car 14.36 21.73 13.92 22.07 9.13 8.32 7.52 5.76
Credit-a 20.29 18.91 17.19 14.54 13.28 13.12 12.3 12.2
Credit-g 19.09 24.75 23.15 22.44 22.36 22.17 17.41 14.91
Diabetes 19.6 23.82 22.47 21.48 22.12 22.17 16.41 14.53
Heart-c 19.34 18.12 19.34 17.76 18.98 18.81 17.39 13.53
Heart-h 18.78 29.73 18.78 17.41 17.24 17.01 13.64 11.56
Heart-statlog 21.07 17.19 20.89 19.7 18.37 20.07 17.22 13.04
Hepatitis 20.84 18.52 18.77 16.26 16.84 17.35 13.87 12.58
Horse-colic 20 15.98 17.31 19.16 14.16 13.7 12.96 13.34
Hypothyroid 12.78 1.25 12.78 9.46 0.52 0.5 3.39 2.97
Ionosphere 21.17 15.61 15.38 10.4 10.14 10.66 12.85 12.25
Iris 10.8 24.73 7.27 3.2 5.4 4.6 4.2 2.33
Kr-vs-kp 19.59 25.19 7.63 12.58 1.72 1.59 6.08 7.89
Labor 23.51 28.25 24.04 14.74 20 19.47 16.49 14.91
Letter 1.9 10.16 10.82 3.27 9.28 7.76 1.17 1.16
Lymph 19.66 24.53 19.66 19.05 21.76 21.35 16.35 11.35
Mushroom 20.03 5.83 5.07 1.94 0.06 0.14 5.64 7.19
Segment 4.77 4.21 4.82 3.14 2.93 2.17 0.81 1.53
Sick 19.54 2.16 6.73 5.54 1.64 1.68 6.19 10.2
Sonar 21.01 39.47 24.81 21.06 20.58 22.64 19.23 17.31
Spambase 20.56 30.67 15.64 10.57 7.35 7.63 10.19 12.57
Tic-tac-toe 20.23 34.42 20.35 17.84 15.21 14.52 13.68 11.52
Vehicle 7.62 21.28 23.53 19.57 18.94 17.47 7.04 7.86
Vote 20.51 5.66 12.11 10.71 4.6 4.46 7.77 6.51
Vowel 3.53 28.54 15.71 3.21 9.57 11.04 3.25 4.19
Waveform 11.52 13.82 22.9 14.94 14.21 14.27 9.1 6.85
Zoo 13.56 16.93 13.56 6.73 10.59 10.4 8.51 6.04
Average 16.56 20.67 16.07 14.14 13.06 13.06 10.75 9.88
Tab.4  Noise ratio (%) comparisons on the Gaussian distribution for LDSNC versus MV, PL, STC, CC, AVNC, CENC and LDNC
(Pairwise Wilcoxon significance matrix among MV, PL, STC, CC, AVNC, CENC, LDNC and LDSNC)
Tab.5  Noise ratio (%) comparisons on the Gaussian distribution using Wilcoxon tests for LDSNC versus MV, PL, STC, CC, AVNC, CENC and LDNC
Dataset Instances Features Classes Workers Labels
Income 600 10 2 67 6000
LabelMe 1000 512 8 59 2547
Leaves 384 64 6 83 3840
Music_genre 700 124 10 44 2946
Tab.6  The description of four real-world datasets
Fig.4  Noise ratio (%) comparisons for LDSNC versus MV, PL, STC, CC, AVNC, CENC and LDNC on four real-world datasets. (a) Income; (b) LabelMe; (c) Leaves; (d) Music_genre
Fig.5  Noise ratio (%) comparisons for LDSNC when K varies from 5 to 50 on the “Income” dataset
Fig.6  Noise ratio (%) comparisons for LDSNC versus KOS, PL, STC, CC, AVNC, CENC and LDNC on the “Income” dataset