|
|
Learning rates for multi-kernel linear programming classifiers |
Feilong CAO(), Xing XING |
Department of Mathematics, China Jiliang University, Hangzhou 310018, China |
|
|
Abstract In this paper, we consider the learning rates of multi-kernel linear programming classifiers. Our analysis shows that the convergence behavior of multi-kernel linear programming classifiers is almost the same as that of multi-kernel quadratic programming. This is implemented by setting a stepping stone between the linear programming and the quadratic programming. An upper bound is presented for general probability distributions and distribution satisfying some Tsybakov noise condition.
|
Keywords
Multi-kernel
linear programming
learning rate
classification
|
Corresponding Author(s):
CAO Feilong,Email:feilongcao@gmail.com
|
Issue Date: 01 April 2011
|
|
1 |
Anthony M, Bartlett P L. Neural Network Learning: Theoretical Foundations. Cambridge: Cambridge University Press, 1999 doi: 10.1017/CBO9780511624216
|
2 |
Aronszajn N. Theory of reproducing kernel. Trans Amer Math Soc , 1950, 68: 337-404
|
3 |
Bartleet P L. The sample complexity of pattern classification with neural networks: the size of the wights is more important than the size of the network. IEEE Trans Inform Theory , 1998, 44: 525-536 doi: 10.1109/18.661502
|
4 |
Bartleet P L, Jordan M I, Mcaulifle J D. Convexity, classification, and risk bounds. J Amer Stati Assoc , 2006, 101: 138-156 doi: 10.1198/016214505000000907
|
5 |
Chen D R, Wu Q, Ying Y, Zhou D X. Support vector machine soft margin classifier: error analysis. J Machine Learning Research , 2004, 5: 1143-1175
|
6 |
Cucker F, Smale S. On the mathematical foundations of learning. Bull Amer Math Soc , 2001, 39: 1-49 doi: 10.1090/S0273-0979-01-00923-5
|
7 |
Devroye L, Gy?rfi L, Lugosi G. A Probabilistic Theory of Pattern Recognition. Berlin: Springer-Verlag, 1997
|
8 |
Lugosi G, Vayatis N. On the Bayes-risk consistency of regularized boosting methods. Ann Statist , 2004, 32: 30-55
|
9 |
Pollsard D. Convergence of Stochastic Processes. Berlin: Springer-Verlag, 1984
|
10 |
Scovel C, Steinwart I. Fast Rates for Support Vector Machines. Berlin/Heidelberg: Springer-Verlag, 2005
|
11 |
Smale S, Zhou D X. Shannon sampling and function reconstruction from point values. Bull Amer Math Soc , 2004, 41: 279-305 doi: 10.1090/S0273-0979-04-01025-0
|
12 |
Tsybakov A B. Optimal aggregation of classifiers in statistical learning. Ann Statis , 2004, 32: 135-166 doi: 10.1214/aos/1079120131
|
13 |
Vapnik V. Statistical Learning Theory. New York: John Wiley & Sons, 1998
|
14 |
Wahba G. Spline Models for Observational Data. Philadelphia: SIAM, 1990
|
15 |
Wu Q, Ying Y, Zhou D X. Multi-kernel regularized classifiers. J Complexity , 2007, 23: 108-134 doi: 10.1016/j.jco.2006.06.007
|
16 |
Wu Q, Zhou D X. SVM soft margin classifiers: linear programming versus quadratic programming. Neural Computation , 2005, 17: 1160-1187 doi: 10.1162/0899766053491896
|
17 |
Zhang T. Covering number bounds of certain regularized linear function classes. J Machine Learning Research , 2002, 2: 527-550 doi: 10.1162/153244302760200713
|
18 |
Zhang T. Statistical behavior and consistency of classification methods based on convex risk minimization. Ann Stat , 2004, 32: 56-85 doi: 10.1214/aos/1079120130
|
|
Viewed |
|
|
|
Full text
|
|
|
|
|
Abstract
|
|
|
|
|
Cited |
|
|
|
|
|
Shared |
|
|
|
|
|
Discussed |
|
|
|
|