New logarithmic step size for stochastic gradient descent

Mahsa Soheil SHAMAEE1, Sajad Fathi HAFSHEJANI2, Zeinab SAEIDIAN3

1. Department of Computer Science, Faculty of Mathematical Science, University of Kashan, Kashan 87317-53153, Iran
2. Department of Applied Mathematics, Shiraz University of Technology, Shiraz 13876-71557, Iran
3. Department of Mathematical Sciences, University of Kashan, Kashan 87317-53153, Iran

Abstract In this paper, we propose a novel warm restart technique using a new logarithmic step size for the stochastic gradient descent (SGD) approach. For smooth and non-convex functions, we establish a convergence rate for SGD. We conduct a comprehensive implementation to demonstrate the efficiency of the newly proposed step size on the FashionMNIST, CIFAR-10, and CIFAR-100 datasets. Moreover, we compare our results with nine other existing approaches and demonstrate that the new logarithmic step size improves test accuracy by 0.9% on the CIFAR-100 dataset when we utilize a convolutional neural network (CNN) model.

Keywords
stochastic gradient descent
logarithmic step size
warm restart technique
Corresponding Author(s):
Sajad Fathi HAFSHEJANI
Just Accepted Date: 18 October 2023
Issue Date: 14 March 2024
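
The abstract describes SGD whose step size decays logarithmically within each warm-restart cycle and is reset when a new cycle begins. The paper defines the exact schedule; the sketch below is an illustration only, using an assumed logarithmic decay eta_t = eta0 * (1 - ln t / ln T) over a cycle of length T, applied to a toy one-dimensional quadratic with noisy gradients. The function names, the schedule's exact form, and all hyperparameters here are hypothetical stand-ins, not the authors' implementation.

```python
import math
import random

def log_step_size(eta0, t, T):
    # Illustrative logarithmic decay within one restart cycle:
    # eta_t = eta0 * (1 - ln(t) / ln(T)) for t = 1, ..., T - 1.
    # (An assumed form; the paper defines its own logarithmic schedule.)
    return eta0 * (1.0 - math.log(t) / math.log(T))

def sgd_warm_restart(grad, x0, eta0=0.5, cycle_len=100, cycles=3,
                     noise=0.01, seed=0):
    """SGD with a logarithmically decaying step size; the schedule is
    reset to eta0 at the start of each warm-restart cycle."""
    rng = random.Random(seed)
    x = x0
    for _ in range(cycles):
        for t in range(1, cycle_len):           # t = 1 .. cycle_len - 1
            eta = log_step_size(eta0, t, cycle_len)
            g = grad(x) + rng.gauss(0.0, noise)  # noisy (stochastic) gradient
            x -= eta * g
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3).
x_star = sgd_warm_restart(lambda x: 2.0 * (x - 3.0), x0=10.0)
```

At t = 1 the step size equals eta0 (since ln 1 = 0) and it shrinks toward zero as t approaches the cycle length, after which the restart returns it to eta0; this mirrors the warm-restart pattern the abstract refers to.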