Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal distribution code 80-970

2019 Impact Factor: 1.275

Frontiers of Computer Science  2019, Vol. 13 Issue (5): 911-912   https://doi.org/10.1007/s11704-019-9901-7
A perspective on off-policy evaluation in reinforcement learning
Lihong LI
Google Brain, Kirkland, WA 98033, USA
Full text: PDF (189 KB)
Received: 2019-04-04      Published: 2019-06-25
Corresponding Author(s): Lihong LI   
Cite this article:
Lihong LI. A perspective on off-policy evaluation in reinforcement learning. Front. Comput. Sci., 2019, 13(5): 911-912.
Link to this article:
https://academic.hep.com.cn/fcs/CN/10.1007/s11704-019-9901-7
https://academic.hep.com.cn/fcs/CN/Y2019/V13/I5/911
1 L Bottou, J Peters, J Quiñonero-Candela, D X Charles, D M Chickering, E Portugaly, D Ray, P Simard, E Snelson. Counterfactual reasoning and learning systems: the example of computational advertising. Journal of Machine Learning Research, 2013, 14(1): 3207–3260
2 K Hofmann, L Li, F Radlinski. Online evaluation for information retrieval. Foundations and Trends in Information Retrieval, 2016, 10(1): 1–117
https://doi.org/10.1561/1500000051
3 L Li, W Chu, J Langford, R E Schapire. A contextual-bandit approach to personalized news article recommendation. In: Proceedings of the 19th International Conference on World Wide Web. 2010, 661–670
https://doi.org/10.1145/1772690.1772758
4 M Dudík, J Langford, L Li. Doubly robust policy evaluation and learning. In: Proceedings of the 28th International Conference on Machine Learning. 2011, 1097–1104
5 A Swaminathan, T Joachims. The self-normalized estimator for counterfactual learning. In: Proceedings of the 28th International Conference on Neural Information Processing Systems. 2015, 3231–3239
6 Y X Wang, A Agarwal, M Dudík. Optimal and adaptive off-policy evaluation in contextual bandits. In: Proceedings of the 34th International Conference on Machine Learning. 2017, 3589–3597
7 N Jiang, L Li. Doubly robust off-policy evaluation for reinforcement learning. In: Proceedings of the 33rd International Conference on Machine Learning. 2016, 652–661
8 L Li, R Munos, C Szepesvári. Toward minimax off-policy value estimation. In: Proceedings of the 18th International Conference on Artificial Intelligence and Statistics. 2015, 608–616
9 D Precup, R S Sutton, S P Singh. Eligibility traces for off-policy policy evaluation. In: Proceedings of the 17th International Conference on Machine Learning. 2000, 759–766
10 Q Liu, L Li, Z Tang, D Zhou. Breaking the curse of horizon: infinite-horizon off-policy estimation. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2018, 5361–5371