In this paper, a hierarchical reinforcement learning algorithm is investigated for Markov decision processes with average reward.
Reinforcement learning based on Markov decision processes is an online learning method that can be applied well to single-agent environments.
To plan ahead for multiple moves, a model known as a Markov decision process is commonly used when the number of possible world states is reasonably small.
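As an illustration of planning over a small state space, the following is a minimal value-iteration sketch. The three-state MDP, its rewards, the discount factor, and the names `P`, `q_value`, and `value_iteration` are all assumptions made for this example; none of them come from the source sentence.

```python
# Hypothetical 3-state, 2-action MDP; all numbers are made up for illustration.
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 2, 1.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},  # absorbing state
}


def q_value(P, V, s, a, gamma):
    """Expected one-step reward plus discounted value of the successor state."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma=0.9, tol=1e-8):
    """Sweep all states in place until the largest change falls below tol."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}
    return V, policy


V, policy = value_iteration(P)
```

Each sweep touches every state and every action, which is why this kind of exhaustive planning is practical only when the state set is small.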
This paper deals with the continuous-time average Markov decision process with polynomially unbounded reward rates, a countable state space, and general action sets.
The paper first presents an objective model of task scheduling and then, based on an analysis of the Q-learning algorithm, gives a Markov decision process description of the scheduling problem. To address the slow update speed of Q-learning in task scheduling, it proposes a multi-step Q-learning scheduling algorithm that updates the value function using multi-step information.
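The idea of updating the value function from multi-step information can be sketched with a generic n-step Q-learning update. This is a textbook-style n-step return, not necessarily the update rule used in the cited paper; the function name `n_step_q_update`, the state and action labels, and the values of `alpha` and `gamma` are assumptions for illustration.

```python
from collections import defaultdict


def n_step_q_update(Q, transitions, final_state, actions, alpha=0.1, gamma=0.9):
    """Update Q for the first (state, action) pair using an n-step return.

    transitions: list of (state, action, reward) tuples of length n.
    final_state: the state reached after the last transition.
    """
    # Discounted sum of the n observed rewards.
    G = 0.0
    for k, (_, _, r) in enumerate(transitions):
        G += (gamma ** k) * r
    # Bootstrap from the greedy value of the state reached after n steps.
    n = len(transitions)
    G += (gamma ** n) * max(Q[(final_state, a)] for a in actions)
    # Move the estimate for the first pair toward the n-step target.
    s0, a0, _ = transitions[0]
    Q[(s0, a0)] += alpha * (G - Q[(s0, a0)])
    return G


# Hypothetical two-step trajectory with made-up rewards.
Q = defaultdict(float)
G = n_step_q_update(
    Q,
    transitions=[("s0", "a", 1.0), ("s1", "b", 2.0)],
    final_state="s2",
    actions=["a", "b"],
)
```

Compared with the one-step update, the target here carries several observed rewards at once, so reward information reaches earlier state-action pairs in fewer updates.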
The cross-layer design models the problem as a Markov decision process and uses linear programming to derive the optimal adaptive transmission policy.
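For reference, one standard way to solve an MDP by linear programming is the discounted-reward formulation below. It is shown only as an assumed illustration: the source sentence does not say which criterion or constraints the cross-layer design uses, and the symbols $\mu$, $r$, $P$, and $\gamma$ are introduced here.

```latex
\begin{aligned}
\min_{V} \quad & \sum_{s \in S} \mu(s)\, V(s) \\
\text{s.t.} \quad & V(s) \;\ge\; r(s,a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V(s')
  \qquad \forall s \in S,\ a \in A,
\end{aligned}
```

Here $\mu(s) > 0$ is an arbitrary positive state weighting, $r(s,a)$ is the reward, $P(s' \mid s,a)$ is the transition probability, and $0 \le \gamma < 1$ is the discount factor. The optimal solution is the optimal value function, and an optimal policy selects, in each state, an action whose constraint is tight.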
In this paper, a hierarchical reinforcement learning algorithm is investigated for Markov decision processes with average cost.