Please wait a minute...
Frontiers of Mathematics in China

ISSN 1673-3452

ISSN 1673-3576(Online)

CN 11-5739/O1

邮发代号 80-964

2019 Impact Factor: 1.03

Frontiers of Mathematics in China  2015, Vol. 10 Issue (4): 1005-1023   https://doi.org/10.1007/s11464-015-0479-6
  本期目录
First passage Markov decision processes with constraints and varying discount factors
Xiao WU1,2,Xiaolong ZOU2,Xianping GUO2,*()
1. School of Mathematics and Statistics, Zhaoqing University, Zhaoqing 526061, China
2. School of Mathematics and Computational Science, Sun Yat-sen University, Guangzhou 510275, China
 全文: PDF(175 KB)  
Abstract

This paper focuses on the constrained optimality problem (COP) of first passage discrete-time Markov decision processes (DTMDPs) in denumerable state and compact Borel action spaces with multi-constraints, state-dependent discount factors, and possibly unbounded costs. By means of the properties of a so-called occupation measure of a policy, we show that the constrained optimality problem is equivalent to an (infinite-dimensional) linear programming on the set of occupation measures with some constraints, and thus prove the existence of an optimal policy under suitable conditions. Furthermore, using the equivalence between the constrained optimality problem and the linear programming, we obtain an exact form of an optimal policy for the case of finite states and actions. Finally, as an example, a controlled queueing system is given to illustrate our results.

Key wordsDiscrete-time Markov decision process (DTMDP)    constrained optimality    varying discount factor    unbounded cost
收稿日期: 2015-01-28      出版日期: 2015-06-05
Corresponding Author(s): Xianping GUO   
 引用本文:   
. [J]. Frontiers of Mathematics in China, 2015, 10(4): 1005-1023.
Xiao WU,Xiaolong ZOU,Xianping GUO. First passage Markov decision processes with constraints and varying discount factors. Front. Math. China, 2015, 10(4): 1005-1023.
 链接本文:  
https://academic.hep.com.cn/fmc/CN/10.1007/s11464-015-0479-6
https://academic.hep.com.cn/fmc/CN/Y2015/V10/I4/1005
1 Altman E. Denumerable constrained Markov decision processes and finite approximations. Math Methods Oper Res, 1994, 19: 169-191
https://doi.org/10.1287/moor.19.1.169
2 Altman E. Constrained Markov Decision Processes. Boca Raton: Chapman & Hall/CRC, 1999
3 Alvarez-Mena J, Hernández-Lerma O. Convergence of the optimal values of constrained Markov control processes. Math Methods Oper Res, 2002, 55: 461-484
https://doi.org/10.1007/s001860200209
4 Borkar V. A convex analytic approach to Markov decision processes. Probab Theory Related Fields, 1988, 78: 583-602
https://doi.org/10.1007/BF00353877
5 Derman C. Finite State Markovian Decision Processes. Mathematics in Science and Engineering, Vol 67. New York: Academic Press, 1970
6 Dufour F, Prieto-Rumeau T. Finite linear programming approximations of constrained discounted Markov decision processes. SIAM J Control Optim, 2013, 51: 1298-1324
https://doi.org/10.1137/120867925
7 González-Hernández J, Hernández-Lerma O. Extreme points of sets of randomized strategies in constrained optimization and control problems. SIAM J Optim, 2005, 15: 1085-1104
https://doi.org/10.1137/040605345
8 Guo X P, Hernández-del-Valle A, Hernández-Lerma O. First passage problems for nonstationary discrete-time stochastic control systems. Eur J Control, 2012, 18: 528-538
https://doi.org/10.3166/EJC.18.528-538
9 Guo X P, Hernández-Lerma O. Continuous-Time Markov Decision Processes: Theory and Applications. Berlin: Springer-Verlag, 2009
https://doi.org/10.1007/978-3-642-02547-1
10 Guo X P, Piunovskiy A. Discounted continuous-time Markov decision processes with constraints: unbounded transition and loss rates. Math Oper Res, 2011, 36(1): 105-132
https://doi.org/10.1287/moor.1100.0477
11 Guo X P, Song X Y, Zhang Y. First passage criteria for continuous-time Markov decision processes with varying discount factors and history-dependent policies. IEEE Trans Automat Control, 2014, 59: 163-174
https://doi.org/10.1109/TAC.2013.2281475
12 Guo X P, Zhang W Z. Convergence of controlled models and finite-state approximation for discounted continuous-time Markov decision processes with constraints. Eur J Control, 2014, 238: 486-496
https://doi.org/10.1016/j.ejor.2014.03.037
13 Hernández-Lerma O, González-Hernández J. Constrained Markov decision processes in Borel spaces: the discounted case. Math Methods Oper Res, 2000, 52: 271-285
https://doi.org/10.1007/s001860000071
14 Hernández-Lerma O, Lasserre J B. Discrete-Time Markov Control Processes. Basic Optimality Criteria. New York: Springer-Verlag, 1996
https://doi.org/10.1007/978-1-4612-0729-0
15 Hernández-Lerma O, Lasserre J B. Further Topics on Discrete-Time Markov Control Processes. New York: Springer-Verlag, 1999
https://doi.org/10.1007/978-1-4612-0561-6
16 Huang Y H, Wei Q D, Guo X P. Constrained Markov decision processes with first passage criteria. Ann Oper Res, 2013, 206: 197-219
https://doi.org/10.1007/s10479-012-1292-1
17 Mao X, Piunovskiy A. Strategic measures in optimal control problems for stochastic sequences. Stoch Anal Appl, 2000, 18: 755-776
https://doi.org/10.1080/07362990008809696
18 Piunovskiy A. Optimal Control of Random Sequences in Problems with Constraints. Dordrecht: Kluwer Academic, 1997
https://doi.org/10.1007/978-94-011-5508-3
19 Piunovskiy A. Controlled random sequences: the convex analytic approach and constrained problems. Russian Math Surveys, 2000, 53: 1233-1293
https://doi.org/10.1070/RM1998v053n06ABEH000090
20 Prokhorov Y. Convergence of random processes and limit theorems in probability theory. Theory Probab Appl, 1956, 1: 157-214
https://doi.org/10.1137/1101016
21 Saldi N, Linder T, Yüksel S. Asymptotic optimality and rates of convergence of quantized stationary policies in stochastic control. IEEE Trans Automat Control, 2015, 60 (to appear)
https://doi.org/10.1109/TAC.2014.2343831
22 Sennott L I. Stochastic Dynamic Programming and the Control of Queueing Systems. New York: Wiley, 1999
23 Wei Q D, Guo X P. Markov decision processes with state-dependent discount factors and unbounded rewards/costs. Oper Res Lett, 2011, 39: 369-374
https://doi.org/10.1016/j.orl.2011.06.014
24 Wu X, Guo X P. First passage optimality and variance minimization of Markov decision processes with varying discount factors. J Appl Probab, 2015, 52(2) (to appear)
25 Zhang Y. Convex analytic approach to constrained discounted Markov decision processes with non-constant discount factors. TOP, 2013, 21: 378-408
https://doi.org/10.1007/s11750-011-0186-8
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed