A robust optimization method for label noisy datasets based on adaptive threshold: Adaptive-k
Enes DEDEOGLU, Himmet Toprak KESGIN, Mehmet Fatih AMASYALI
Department of Computer Engineering, Yildiz Technical University, Istanbul 34220, Turkey
Abstract
Using all samples in the optimization process does not yield robust results on datasets with label noise, because the gradients computed from the losses of the noisy samples push the optimization in the wrong direction. In this paper, we propose using only the samples in the mini-batch whose loss is below a threshold determined during optimization, instead of all samples in the mini-batch. Our proposed method, Adaptive-k, aims to exclude samples with noisy labels from the optimization process and thereby make it robust. On noisy datasets, we found that a threshold-based approach such as Adaptive-k produces better results than using all samples or a fixed number of low-loss samples in the mini-batch. Based on our theoretical analysis and experimental results, we show that Adaptive-k comes closest to the performance of the Oracle, in which noisy samples are removed entirely from the dataset. Adaptive-k is simple but effective: it requires no prior knowledge of the dataset's noise ratio, needs no additional model training, and does not significantly increase training time. Our experiments also show that Adaptive-k is compatible with different optimizers, such as SGD, SGDM, and Adam. The code for Adaptive-k is available on GitHub.
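The abstract contrasts four ways of choosing which mini-batch samples contribute to a gradient step: all samples, a fixed number of lowest-loss samples, samples whose loss falls below a threshold, and the Oracle that knows which labels are clean. The following sketch expresses each choice as a boolean mask over per-sample losses. It is illustrative only and is not the authors' implementation: the abstract does not say how Adaptive-k computes its threshold, so `placeholder_threshold` (the mini-batch mean loss) is a hypothetical stand-in, and every function name here is invented for the example.

```python
from statistics import mean
from typing import Sequence

# Illustrative sketch only: each rule maps the per-sample losses of one
# mini-batch to a boolean mask of samples that contribute to the update.
# None of these names come from the Adaptive-k paper or its code.


def select_all(losses: Sequence[float]) -> list[bool]:
    """Baseline: every sample in the mini-batch is used."""
    return [True] * len(losses)


def select_lowest_k(losses: Sequence[float], k: int) -> list[bool]:
    """Fixed-k baseline: keep the k samples with the smallest loss."""
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    keep = set(order[:k])
    return [i in keep for i in range(len(losses))]


def select_below_threshold(losses: Sequence[float], threshold: float) -> list[bool]:
    """Threshold rule: keep samples whose loss is below `threshold`.
    Unlike fixed-k, the number of kept samples varies from batch to batch."""
    return [loss < threshold for loss in losses]


def select_oracle(is_clean: Sequence[bool]) -> list[bool]:
    """Oracle: keep exactly the samples known to have correct labels.
    Only possible when the label noise was injected synthetically."""
    return list(is_clean)


def placeholder_threshold(losses: Sequence[float]) -> float:
    """HYPOTHETICAL threshold (mini-batch mean loss), used only so the
    example runs. The abstract does not specify Adaptive-k's rule."""
    return mean(losses)


def masked_mean_loss(losses: Sequence[float], mask: Sequence[bool]) -> float:
    """Average only the selected losses; the optimizer (SGD, SGDM, Adam, ...)
    would follow the gradient of this quantity. Returns 0.0 if nothing is kept."""
    kept = [loss for loss, keep in zip(losses, mask) if keep]
    return sum(kept) / len(kept) if kept else 0.0


if __name__ == "__main__":
    # Hypothetical batch: indices 2 and 4 stand in for mislabeled samples.
    losses = [0.2, 0.1, 3.5, 0.3, 2.9, 0.4]
    tau = placeholder_threshold(losses)
    print(select_below_threshold(losses, tau))  # [True, True, False, True, False, True]
    print(select_lowest_k(losses, 3))           # [True, True, False, True, False, False]
    print(select_lowest_k(losses, 5))           # [True, True, False, True, True, True]
```

In this toy batch, a fixed k that is too small discards a clean sample (loss 0.4), while one that is too large keeps a high-loss sample (2.9); the threshold rule lets the count adapt to the batch. Because every rule only decides which per-sample losses enter the average, the optimizer update itself is unchanged, which fits the abstract's statement that Adaptive-k works with SGD, SGDM, and Adam. In a framework such as PyTorch, this would correspond to computing unreduced per-sample losses (`reduction="none"`) and averaging only the masked entries before calling `backward()`.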
Keywords: robust optimization, label noise, noisy label, deep learning, noisy datasets, noise ratio estimation, robust training
Corresponding Author(s):
Himmet Toprak KESGIN
Just Accepted Date: 04 April 2023
Issue Date: 05 June 2023