|
|
A robust optimization method for label noisy datasets based on adaptive threshold: Adaptive-k
Enes DEDEOGLU, Himmet Toprak KESGIN, Mehmet Fatih AMASYALI
Department of Computer Engineering, Yildiz Technical University, Istanbul 34220, Turkey |
|
|
Abstract Using all samples in the optimization process does not produce robust results on datasets with label noise, because the gradients computed from the losses of noisy samples push the optimization in the wrong direction. In this paper, we recommend using only the samples in each mini-batch whose loss is below a threshold determined during optimization, instead of all samples. Our proposed method, Adaptive-k, aims to exclude label-noise samples from the optimization process and thereby make training robust. On noisy datasets, we find that a threshold-based approach such as Adaptive-k produces better results than using all samples or a fixed number of low-loss samples in each mini-batch. Our theoretical analysis and experimental results show that Adaptive-k comes closest to the performance of the Oracle, in which noisy samples are entirely removed from the dataset. Adaptive-k is a simple but effective method: it requires no prior knowledge of the dataset's noise ratio, no additional model training, and no significant increase in training time. Our experiments also show that Adaptive-k is compatible with different optimizers such as SGD, SGDM, and Adam. The code for Adaptive-k is available on GitHub.
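For illustration, the selection step described in the abstract can be sketched as follows. This is a minimal sketch assuming a PyTorch-style training loop; the exponential-moving-average threshold, the ema_beta parameter, and all function and variable names are illustrative assumptions rather than the authors' exact Adaptive-k rule.

import torch.nn.functional as F

def thresholded_step(model, optimizer, inputs, targets, loss_ema, ema_beta=0.9):
    # Per-sample losses for the current mini-batch.
    per_sample_loss = F.cross_entropy(model(inputs), targets, reduction="none")

    # Running estimate of the typical sample loss, used as the adaptive threshold
    # (an assumed statistic; the paper's actual threshold rule may differ).
    batch_mean = per_sample_loss.mean().detach()
    loss_ema = batch_mean if loss_ema is None else ema_beta * loss_ema + (1 - ema_beta) * batch_mean

    # Keep only samples whose loss is below the threshold; high-loss samples are
    # treated as potentially mislabeled and excluded from the gradient update.
    mask = per_sample_loss.detach() < loss_ema
    if mask.any():
        optimizer.zero_grad()
        per_sample_loss[mask].mean().backward()
        optimizer.step()
    return loss_ema

Because only the subset of samples contributing to the gradient changes, such a selection step can be wrapped around SGD, SGDM, or Adam without modifying the optimizer itself.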
|
Keywords
robust optimization
label noise
noisy label
deep learning
noisy datasets
noise ratio estimation
robust training
|
Corresponding Author(s):
Himmet Toprak KESGIN
|
Just Accepted Date: 04 April 2023
Issue Date: 05 June 2023
|
|
|