Follow the perturbed approximate leader for solving semi-bandit combinatorial optimization
Feidiao YANG1,2, Wei CHEN3, Jialin ZHANG1,2, Xiaoming SUN1,2
1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
2. University of Chinese Academy of Sciences, Beijing 100190, China
3. Microsoft Research Asia, Beijing 100080, China
Abstract Combinatorial optimization in the face of uncertainty is a challenge in both operational research and machine learning. In this paper, we consider a special and important class of problems, adversarial online combinatorial optimization with semi-bandit feedback, in which a player repeatedly makes combinatorial decisions and receives the corresponding feedback. While existing algorithms focus on the regret guarantee or assume the existence of an efficient offline oracle, it remains a challenge to solve this problem efficiently when the offline counterpart is NP-hard. In this paper, we propose a variant of the Follow-the-Perturbed-Leader (FPL) algorithm to solve this problem. Unlike existing FPL approaches, our method employs an approximation algorithm as the offline oracle and perturbs the collected data by adding nonnegative random variables. Our approach is simple and computationally efficient. Moreover, it guarantees a sublinear (1+ε)-scaled regret of order O() for any small ε>0 for an important class of combinatorial optimization problems that admit an FPTAS (fully polynomial time approximation scheme), in which T is the number of rounds of the learning process. In addition to the theoretical analysis, we also conduct a series of experiments to demonstrate the performance of our algorithm.
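The loop the abstract describes (add nonnegative random perturbations to the accumulated loss estimates, then call an approximate offline oracle on the perturbed totals) can be sketched as follows. This is a minimal illustration, not the paper's exact construction: the oracle interface, the exponential perturbation distribution, and the parameter `eta` are assumptions made for the sketch, and the loss-estimation details of the semi-bandit setting are omitted (only the losses of the chosen items are observed and accumulated directly).

```python
import random

def fpl_approx(n_items, oracle, losses_per_round, eta=1.0, seed=None):
    """Sketch of Follow-the-Perturbed-(approximate-)Leader with
    semi-bandit feedback.  `oracle` is an (approximate) offline solver
    mapping a loss vector to a feasible subset of item indices; it is a
    stand-in for the FPTAS assumed in the paper."""
    rng = random.Random(seed)
    cum = [0.0] * n_items            # accumulated losses of observed items
    total_loss = 0.0
    for losses in losses_per_round:  # one adversarial loss vector per round
        # perturb by adding nonnegative (here: exponential) random noise
        perturbed = [cum[i] + rng.expovariate(eta) for i in range(n_items)]
        action = oracle(perturbed)   # play the perturbed approximate leader
        for i in action:             # semi-bandit: observe only chosen items
            cum[i] += losses[i]
            total_loss += losses[i]
    return total_loss
```

For instance, with a toy oracle that greedily returns the k items of smallest perturbed loss, each round costs one oracle call plus O(n) perturbation work, which is the computational appeal of the FPL approach.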
Keywords: online learning; online combinatorial optimization; semi-bandit; follow-the-perturbed-leader
Corresponding Author(s): Xiaoming SUN

Just Accepted Date: 30 July 2020
Issue Date: 30 August 2021