Policy Iteration
Markov decision processes (MDPs), named after Andrey Markov, provide a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. MDPs are useful for studying a wide range of optimization problems solved via dynamic programming and reinforcement learning. MDPs were known at least as early as the 1950s (cf. Bellman 1957). A core body of research on Markov decision processes resulted from Ronald A. Howard's 1960 book, Dynamic Programming and Markov Processes. They are used in many disciplines, including robotics, automated control, economics, and manufacturing.

More precisely, a Markov decision process is a discrete-time stochastic control process. At each time step, the process is in some state s, and the decision maker may choose any action a that is available in state s. The process responds at the next time step by randomly moving into a new state s', and the decision maker receives a corresponding reward.
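Since this entry is titled Policy Iteration, a short sketch may help make the above concrete: policy iteration solves an MDP by repeatedly evaluating the current policy and then improving it greedily, the scheme developed in Howard's 1960 book mentioned above. The NumPy code below is a minimal illustration under assumed conventions (a transition tensor P[a, s, s'], a reward matrix R[s, a], and a discount factor gamma); the function name, array layout, and the toy MDP at the end are illustrative assumptions, not taken from this article.

```python
# Minimal policy-iteration sketch for a small finite MDP (assumed conventions:
# P[a, s, s'] = transition probability, R[s, a] = expected reward).
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Alternate exact policy evaluation and greedy policy improvement
    until the policy is stable."""
    n_actions, n_states, _ = P.shape
    policy = np.zeros(n_states, dtype=int)          # start from an arbitrary policy

    while True:
        # Policy evaluation: solve the linear system (I - gamma * P_pi) V = R_pi.
        P_pi = P[policy, np.arange(n_states)]       # row s is P[policy[s], s, :]
        R_pi = R[np.arange(n_states), policy]       # one-step reward under the policy
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)

        # Policy improvement: act greedily with respect to the evaluated values.
        Q = R.T + gamma * P @ V                     # Q[a, s] = R[s, a] + gamma * E[V(s')]
        new_policy = np.argmax(Q, axis=0)

        if np.array_equal(new_policy, policy):      # a stable policy is optimal
            return policy, V
        policy = new_policy

# Toy 2-state, 2-action MDP (numbers are made up for illustration).
P = np.array([[[0.8, 0.2],                          # action 0, from state 0
               [0.1, 0.9]],                         # action 0, from state 1
              [[0.5, 0.5],                          # action 1, from state 0
               [0.6, 0.4]]])                        # action 1, from state 1
R = np.array([[1.0, 0.0],                           # state 0: rewards of actions 0, 1
              [0.0, 2.0]])                          # state 1: rewards of actions 0, 1
policy, V = policy_iteration(P, R)
print("policy:", policy, "state values:", V)
```

Because the evaluation step solves the Bellman equations exactly, each iteration produces a policy at least as good as the last, and with finitely many policies the loop terminates at an optimal one.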