Frontiers of Mathematics in China

ISSN 1673-3452

ISSN 1673-3576(Online)

CN 11-5739/O1

Postal Subscription Code 80-964

2018 Impact Factor: 0.565

Front. Math. China    2023, Vol. 18 Issue (6) : 381-414    https://doi.org/10.3868/s140-DDD-023-0027-x
Analysis of loss functions in support vector machines
Huajun WANG, Naihua XIU
Department of Mathematics, School of Science, Beijing Jiaotong University, Beijing 100044, China
Abstract

Support vector machines (SVMs) are an important class of machine learning methods arising from the interplay between statistical learning theory and optimization, and they have been widely applied to text categorization, disease diagnosis, face detection, and many other tasks. The loss function is the central object of study in SVMs: its variational properties play an important role in the analysis of optimality conditions, the design of optimization algorithms, the representation of support vectors, and the study of dual problems. This paper surveys and analyzes the 0-1 loss function and its eighteen popular surrogate loss functions in SVMs, and presents three variational properties for each of these loss functions: the subdifferential, the proximal operator, and the Fenchel conjugate. Nine of the proximal operators and fifteen of the Fenchel conjugates are given in this paper.
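For orientation, the three variational objects named in the abstract have the following standard definitions from convex and variational analysis (see, e.g., [3, 13, 48]); this is a generic summary under standard notation, not a restatement of the paper's own formulas. For a loss ℓ: ℝ → ℝ ∪ {+∞} and λ > 0,

\[
\partial \ell(t) = \{\, v \in \mathbb{R} : \ell(s) \ge \ell(t) + v\,(s - t) \ \text{for all } s \,\}
\]

(the convex subdifferential; for nonconvex losses the Clarke subdifferential [13] is used instead), and

\[
\operatorname{prox}_{\lambda \ell}(t) = \operatorname*{arg\,min}_{u \in \mathbb{R}} \Big\{ \lambda\,\ell(u) + \tfrac{1}{2}(u - t)^2 \Big\},
\qquad
\ell^*(v) = \sup_{t \in \mathbb{R}} \{\, t v - \ell(t) \,\}.
\]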

Keywords: Support vector machines; loss function; subdifferential; proximal operator; Fenchel conjugate
Corresponding Author(s): Naihua XIU   
Online First Date: 27 February 2024    Issue Date: 05 March 2024
 Cite this article:   
Huajun WANG, Naihua XIU. Analysis of loss functions in support vector machines[J]. Front. Math. China, 2023, 18(6): 381-414.
 URL:  
https://academic.hep.com.cn/fmc/EN/10.3868/s140-DDD-023-0027-x
https://academic.hep.com.cn/fmc/EN/Y2023/V18/I6/381
Fig.1  Classification of the surrogate loss functions for SVM
Fig.2  Schematic of the convex nonsmooth loss functions
Fig.3  Schematic of the convex smooth loss functions
Fig.4  Schematic of the nonconvex nonsmooth loss functions
Fig.5  Schematic of the nonconvex smooth loss functions
Fig.6  Subdifferentials of the convex nonsmooth loss functions, where the subdifferential sets at non-differentiable points are shown by dashed lines
Fig.7  Gradients of the convex smooth loss functions
Fig.8  Clarke subdifferentials of the nonconvex nonsmooth loss functions, where the Clarke subdifferential sets at non-differentiable points are shown by dashed lines
Fig.9  Gradients of the nonconvex smooth loss functions
Fig.10  Proximal operators of the convex nonsmooth loss functions
Fig.11  Proximal operators of the convex smooth loss functions
Fig.12  Proximal operators of the nonconvex nonsmooth loss functions, where the multi-valued points are indicated by dashed lines
Fig.13  (a) The 0-1 loss function, where the discontinuity is indicated by dashed lines; (b) the Clarke subdifferential of the 0-1 loss function, where the Clarke subdifferential set at the non-differentiable point is shown by a dashed line; (c) the proximal operator of the 0-1 loss function, where the multi-valued points are shown by dashed lines
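The shape of Fig.13(c) follows from a short computation. As a sketch, under the common convention that the 0-1 loss is ℓ_{0/1}(t) = 1 for t < 0 and ℓ_{0/1}(t) = 0 for t ≥ 0 (conventions differ slightly across the cited references), the proximal operator minimizes λℓ_{0/1}(u) + ½(u − x)² over u: for x ≥ 0 the minimizer is u = x at cost 0, while for x < 0 one compares staying at u = x (cost λ) with jumping to u = 0 (cost x²/2), which gives

\[
\operatorname{prox}_{\lambda \ell_{0/1}}(x) =
\begin{cases}
x, & x \ge 0 \ \text{or} \ x < -\sqrt{2\lambda},\\
0, & -\sqrt{2\lambda} < x < 0,\\
\{0,\, x\}, & x = -\sqrt{2\lambda}.
\end{cases}
\]

The two-valued point at x = −√(2λ) is exactly the dashed multi-valued point in Fig.13(c).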
(“Y” means the loss function has the corresponding property, “N” means it does not, and “[*]” means the corresponding property is given by this paper.)

| Loss function | Convexity | Boundedness | Sparsity | Robustness | Subdifferential | Proximal operator | Fenchel conjugate |
|---|---|---|---|---|---|---|---|
| 0-1 loss [62] | N | Y | Y | Y | Y | Y [72] | Y [62], Y[*] |
| Hinge loss [16] | Y | N | Y | N | Y [16] | Y [67] | Y [16] |
| Generalized hinge loss [2] | Y | N | Y | N | Y [2] | Y[*] | Y[*] |
| Pinball loss [30] | Y | N | N | N | Y [30] | Y[*] | Y [30] |
| ε-insensitive pinball loss [26] | Y | N | Y | N | Y [26] | Y[*] | Y [26] |
| Squared hinge loss [16] | Y | N | Y | N | Y [16] | Y[*] | Y[*] |
| Huber hinge loss [10] | Y | N | Y | N | Y [10] | Y[*] | Y[*] |
| Logistic loss [60] | Y | N | N | N | Y [60] | N | Y[*] |
| Least squares loss [57] | Y | N | N | N | Y [57] | Y [23] | Y [57] |
| Huberized pinball loss [78] | Y | N | N | N | Y [78] | Y[*] | Y[*] |
| Ramp loss [54] | N | Y | Y | Y | Y [54] | Y [61] | Y[*] |
| Truncated logistic loss [46] | N | Y | Y | Y | Y [46] | N | Y[*] |
| Truncated least squares loss [42] | N | Y | Y | Y | Y | Y [42], Y[*] | Y[*] |
| Truncated pinball loss [55] | N | N | Y | N | Y [55] | Y[*] | Y[*] |
| Doubly truncated pinball loss [68] | N | Y | Y | Y | Y [68] | Y[*] | Y[*] |
| Generalized exponential loss [22] | N | Y | Y | Y | Y [22] | N | Y[*] |
| Generalized logistic loss [22] | N | Y | Y | Y | Y [22] | N | Y[*] |
| Sigmoid loss [47] | N | Y | N | Y | Y [47] | N | Y[*] |
| Cumulative distribution loss [24] | N | Y | N | Y | Y [24] | N | Y[*] |

Tab.1  Properties of the 19 loss functions in the four classes of SVM
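As a concrete illustration of the three right-most columns of Tab.1, take the hinge loss of [16], ℓ(t) = max(0, 1 − t). The following closed forms are standard convex-analysis facts, easily checked from the definitions above, and consistent with the hinge-loss row:

\[
\partial \ell(t) =
\begin{cases}
\{-1\}, & t < 1,\\
[-1, 0], & t = 1,\\
\{0\}, & t > 1,
\end{cases}
\qquad
\operatorname{prox}_{\lambda \ell}(t) =
\begin{cases}
t + \lambda, & t < 1 - \lambda,\\
1, & 1 - \lambda \le t \le 1,\\
t, & t > 1,
\end{cases}
\qquad
\ell^*(v) =
\begin{cases}
v, & -1 \le v \le 0,\\
+\infty, & \text{otherwise}.
\end{cases}
\]

The bounded domain [−1, 0] of ℓ* is what produces the box constraints on the dual variables in the classical SVM dual problem. Below is a minimal numerical sanity check of the proximal formula; this is an illustrative sketch, and the function names are ours, not from the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def hinge(t):
    """Hinge loss max(0, 1 - t)."""
    return max(0.0, 1.0 - t)

def prox_hinge(x, lam):
    """Closed-form prox of lam * hinge at x (derived above)."""
    if x < 1.0 - lam:
        return x + lam
    return 1.0 if x <= 1.0 else x

lam = 0.7
for x in np.linspace(-3.0, 3.0, 25):
    # Numerically minimize lam * hinge(u) + 0.5 * (u - x)^2 over u
    # and compare with the closed-form proximal point.
    num = minimize_scalar(
        lambda u: lam * hinge(u) + 0.5 * (u - x) ** 2,
        bounds=(-10.0, 10.0), method="bounded",
        options={"xatol": 1e-10},
    ).x
    assert abs(num - prox_hinge(x, lam)) < 1e-5
print("closed-form prox matches the numerical minimizer")
```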
1 M F Akay. Support vector machines combined with feature selection for breast cancer diagnosis. Expert Syst Appl 2009; 36(2): 3240–3247
2 P L Bartlett, M H Wegkamp. Classification with a reject option using a hinge loss. J Mach Learn Res 2008; 9: 1823–1840
3 A Beck. First-Order Methods in Optimization. MOS-SIAM Series on Optimization, Vol 25. Philadelphia, PA: SIAM, 2017
4 J P Brooks. Support vector machines with the ramp loss and the hard margin loss. Oper Res 2011; 59(2): 467–479
5 L J Cao, S S Keerthi, C J Ong, J Q Zhang, U Periyathamby, X J Fu, H P Lee. Parallel sequential minimal optimization for the training of support vector machines. IEEE Trans Neural Netw 2006; 17(4): 1039–1049
6 E Carrizosa, A Nogales-Gómez, D Romero Morales. Heuristic approaches for support vector machines with the ramp loss. Optim Lett 2014; 8(3): 1125–1135
7 C-C Chang, C-W Hsu, C-J Lin. The analysis of decomposition methods for support vector machines. IEEE Trans Neural Netw 2000; 11(4): 1003–1008
8 C-C Chang, C-J Lin. LIBSVM: A library for support vector machines. ACM Trans Intell Syst Technol 2011; 2(3): 27
9 K-W Chang, C-J Hsieh, C-J Lin. Coordinate descent method for large-scale L2-loss linear support vector machines. J Mach Learn Res 2008; 9: 1369–1398
10 O Chapelle. Training a support vector machine in the primal. Neural Comput 2007; 19(5): 1155–1178
11 O Chapelle, P Haffner, V N Vapnik. Support vector machines for histogram-based image classification. IEEE Trans Neural Netw 1999; 10(5): 1055–1064
12 H L Chen, B Yang, J Liu, D Y Liu. A support vector machine classifier with rough set-based feature selection for breast cancer diagnosis. Expert Syst Appl 2011; 38(7): 9014–9022
13 F H Clarke. Optimization and Nonsmooth Analysis. New York: John Wiley & Sons, 1983
14 R Collobert, F Sinz, J Weston, L Bottou. Trading convexity for scalability. In: ICML 2006, Proceedings of the 23rd International Conference on Machine Learning (Cohen W W, Moore A, eds). New York: Association for Computing Machinery, 2006, 201–208
15 R Collobert, F Sinz, J Weston, L Bottou. Large scale transductive SVMs. J Mach Learn Res 2006; 7: 1687–1712
16 C Cortes, V Vapnik. Support vector networks. Mach Learn 1995; 20(3): 273–297
17 B J De Kruif, T J A De Vries. Pruning error minimization in least squares support vector machines. IEEE Trans Neural Netw 2003; 14(3): 696–702
18 N Y Deng, Y J Tian, C H Zhang. Support Vector Machines: Optimization Based Theory, Algorithms, and Extensions. Boca Raton, FL: CRC Press, 2012
19 S Ertekin, L Bottou, C L Giles. Nonconvex online support vector machines. IEEE Trans Pattern Anal Mach Intell 2011; 33(2): 368–381
20 R-E Fan, K-W Chang, C-J Hsieh, X-R Wang, C-J Lin. LIBLINEAR: A library for large linear classification. J Mach Learn Res 2008; 9: 1871–1874
21 R-E Fan, P-H Chen, C-J Lin. Working set selection using second order information for training support vector machines. J Mach Learn Res 2005; 6: 1889–1918
22 Y L Feng, Y N Yang, X L Huang, S Mehrkanoon, J A K Suykens. Robust support vector machines for classification with nonconvex and smooth losses. Neural Comput 2016; 28(6): 1217–1247
23 I E Frank, J H Friedman. A statistical view of some chemometrics regression tools. Technometrics 1993; 35(2): 109–135
24 H Ghanbari, M H Li, K Scheinberg. Novel and efficient approximations for zero-one loss of linear classifiers, 2019, arXiv: 1903.00359
25 L W Huang, Y H Shao, J Zhang, Y T Zhao, J Y Teng. Robust rescaled hinge loss twin support vector machine for imbalanced noisy classification. IEEE Access 2019; 7: 65390–65404
26 X L Huang, L Shi, J A K Suykens. Support vector machine classifier with pinball loss. IEEE Trans Pattern Anal Mach Intell 2014; 36(5): 984–997
27 X L Huang, L Shi, J A K Suykens. Ramp loss linear programming support vector machine. J Mach Learn Res 2014; 15: 2185–2211
28 X L Huang, L Shi, J A K Suykens. Sequential minimal optimization for SVM with pinball loss. Neurocomputing 2015; 149(C): 1596–1603
29 X L Huang, L Shi, J A K Suykens. Solution path for pin-SVM classifiers with positive and negative τ values. IEEE Trans Neural Netw Learn Syst 2017; 28(7): 1584–1593
30 V Jumutc, X L Huang, J A K Suykens. Fixed-size Pegasos for hinge and pinball loss SVM. In: The 2013 International Joint Conference on Neural Networks (IJCNN). Piscataway, NJ: IEEE, 2013
31 S S Keerthi, D DeCoste. A modified finite Newton method for fast solution of large scale linear SVMs. J Mach Learn Res 2005; 6: 341–361
32 S S Keerthi, E G Gilbert. Convergence of a generalized SMO algorithm for SVM classifier design. Mach Learn 2002; 46: 351–360
33 S S Keerthi, S K Shevade. SMO algorithm for least-squares SVM formulations. Neural Comput 2003; 15(2): 487–507
34 S S Keerthi, S K Shevade, C Bhattacharyya, K R K Murthy. Improvements to Platt’s SMO algorithm for SVM classifier design. Neural Comput 2001; 13(3): 637–649
35 N M Khan, R Ksantini, I S Ahmad, B Boufama. A novel SVM+NDA model for classification with an application to face recognition. Pattern Recognition 2012; 45(1): 66–79
36 C-P Lee, C-J Lin. A Study on L2-loss (squared hinge-loss) multiclass SVM. Neural Comput 2013; 25(5): 1302–1323
37 Y-J Lee, O L Mangasarian. SSVM: a smooth support vector machine for classification. Comput Optim Appl 2001; 20(1): 5–22
38 H Li. Statistical Learning Methods, 2nd ed. Beijing: Tsinghua Univ Press, 2019 (in Chinese)
39 J T Li, Y M Jia, W L Li. Adaptive huberized support vector machine and its application to microarray classification. Neural Comput Appl 2011; 20(1): 123–132
40 C-J Lin. On the convergence of the decomposition method for support vector machines. IEEE Trans Neural Netw 2001; 12(6): 1288–1298
41 C-J Lin. Asymptotic convergence of an SMO algorithm without any assumptions. IEEE Trans Neural Netw 2002; 13(1): 248–250
42 D L Liu, Y Shi, Y J Tian, X K Huang. Ramp loss least squares support vector machine. J Comput Sci 2016; 14: 61–68
43 J López, J A K Suykens. First and second order SMO algorithms for LS-SVM classifiers. Neural Process Lett 2011; 33(1): 31–44
44 D Mančev. A sequential dual method for the structured ramp loss minimization. Facta Univ Ser Math Inform 2005; 30(1): 13–27
45 L Mason, P L Bartlett, J Baxter. Improved generalization through explicit optimization of margins. Mach Learn 2000; 38(3): 243–255
46 S Y Park, Y F Liu. Robust penalized logistic regression with truncated loss functions. Canad J Statist 2011; 39(2): 300–323
47 F Pérez-Cruz, A Navia-Vázquez, A R Figueiras-Vidal, A Artés-Rodríguez. Empirical risk minimization for support vector classifiers. IEEE Trans Neural Netw 2003; 14(2): 296–303
48 R T Rockafellar, R J-B Wets. Variational Analysis, Corrected 3rd printing. Grundlehren der Mathematischen Wissenschaften, Vol 317. Berlin: Springer-Verlag, 2009
49 T Sabbah, M Ayyash, M Ashraf. Hybrid support vector machine based feature selection method for text classification. The International Arab Journal of Information Technology 2018; 15(3A): 599–609
50 S Shalev-Shwartz, Y Singer, N Srebro, A Cotter. Pegasos: primal estimated sub-gradient solver for SVM. Math Program 2011; 127(1): Ser B, 3–30
51 Y H Shao, L M Liu, L W Huang, N Y Deng. Key issues of support vector machines and future prospects. Sci Sin Math 2020; 50(9): 1233–1248
52 Y H Shao, K L Yang, M Z Liu, Z Wang, C N Li, W J Chen. From support vector machine to nonparallel support vector machine. Operations Research Transactions 2018; 22(2): 55–65 (in Chinese)
53 W Sharif, I T R Yanto, N A Samsudin, M M Deris, A Khan, M F Mushtaq, M Ashraf. An optimised support vector machine with Ringed Seal Search algorithm for efficient text classification. Journal of Engineering Science and Technology 2019; 14(3): 1601–1613
54 X T Shen, G C Tseng, X G Zhang, W H Wong. On ψ-learning. J Amer Statist Assoc 2003; 98(463): 724–734
55 X Shen, L F Niu, Z Q Qi, Y J Tian. Support vector machine classifier with truncated pinball loss. Pattern Recognition 2017; 68: 199–210
56 I Steinwart, A Christmann. Support Vector Machines. New York: Springer, 2008
57 J A K Suykens, J Vandewalle. Least squares support vector machine classifiers. Neural Process Lett 1999; 9(3): 293–300
58 M Tanveer, S Sharma, R Rastogi, P Anand. Sparse support vector machine with pinball loss. Trans Emerg Telecommun Technol 2021; 32(2): e3820
59 P VenkateswarLal, G R Nitta, A Prasad. Ensemble of texture and shape descriptors using support vector machine classification for face recognition. J Ambient Intell Humaniz Comput, 2019, https://doi.org/10.1007/s12652-019-01192-7, in press
60 G Wahba. Support vector machines, reproducing kernel Hilbert spaces, and randomized GACV. In: Advances in Kernel Methods—Support Vector Learning (Schölkopf B, Burges C J C, Smola A J, eds). Cambridge, MA: MIT Press, 1998, 69–88
61 H J Wang, Y H Shao, N H Xiu. Proximal operator and optimality conditions for ramp loss SVM. Optim Lett 2022; 16: 999–1014
62 H J Wang, Y H Shao, S L Zhou, C Zhang, N H Xiu. Support vector machine classifier via L0/1 soft-margin loss. IEEE Trans Pattern Anal Mach Intell 2022; 44(10): 7253–7265
63 Q Wang, Y Ma, K Zhao, Y J Tian. A comprehensive survey of loss functions in machine learning. Ann Data Sci 2020; 9: 187–212
64 Y C Wu, Y F Liu. Robust truncated hinge loss support vector machines. J Amer Statist Assoc 2007; 102(479): 974–983
65 J M Xu, L Li. A face recognition algorithm based on sparse representation and support vector machine. Computer Technology and Development 2018; 28(2): 59–63 (in Chinese)
66 Y Y Xu, I Akrotirianakis, A Chakraborty. Proximal gradient method for huberized support vector machine. Pattern Anal Appl 2016; 19(4): 989–1005
67 Y Q Yan, Q N Li. An efficient augmented Lagrangian method for support vector machine. Optim Methods Softw 2020; 35(4): 855–883
68 L M Yang, H W Dong. Support vector machine with truncated pinball loss and its application in pattern recognition. Chemometrics Intell Lab Syst 2018; 177: 89–99
69 Y Yang, H Zou. An efficient algorithm for computing the HHSVM and its generalizations. J Comput Graph Statist 2013; 22(2): 396–415
70 Z J Yang, Y T Xu. A safe accelerative approach for pinball support vector machine classifier. Knowledge-Based Syst 2018; 147: 12–24
71 J Yin, Q N Li. A semismooth Newton method for support vector classification and regression. Comput Optim Appl 2019; 73(2): 477–508
72 C Zhang. Support Vector Machine Classifier via 0-1 Loss Function. MS Thesis. Beijing: Beijing Jiaotong University, 2019 (in Chinese)
73 T Zhang, F J Oles. Text categorization based on regularized linear classification methods. Information Retrieval 2001; 4(1): 5–31
74 W Zhang, T Yoshida, X J Tang. Text classification based on multi-word with support vector machine. Knowledge-Based Syst 2008; 21(8): 879–886
75 L Zhao, M Mammadov, J Yearwood. From convex to nonconvex: a loss function analysis for binary classification. In: 2010 IEEE International Conference on Data Mining Workshops. Piscataway, NJ: IEEE, 2010, 1281–1288
76 Y P Zhao, J G Sun. Recursive reduced least squares support vector regression. Pattern Recognition 2009; 42(5): 837–842
77 S S Zhou. Sparse LSSVM in primal using Cholesky factorization for large-scale problems. IEEE Trans Neural Netw Learn Syst 2016; 27(4): 783–795
78 W X Zhu, Y Y Song, Y Y Xiao. Support vector machine classifier with huberized pinball loss. Eng Appl Artif Intell 2020; 91: 103635