Markov Decision Process

MDP, Stochastic Finite State Automata

Random process (stochastic process)

A collection of random variables indexed by time (or by some other index set)

MDP Types

Tabular Ergodic MDP

1. Markov Process

New twist in reinforcement learning - R and T are not known (unlike the traditional MDP setting) → (Step 1)

→ The agent must actually try out actions and visit states to learn them

A set of states s ∈ S

A set of actions (per state) a ∈ A

A model T(s, a, s') (the probability of reaching s' from s when taking action a)

A reward function R(s, a, s')
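
The four components above can be written down as plain lookup tables. The sketch below is only an illustration: the two states, the action names, and all probabilities and rewards are made-up assumptions, not taken from these notes. It stores T as a dictionary keyed by (s, a) and checks that each T(s, a, ·) sums to 1.

```python
# Hypothetical two-state MDP used only for illustration; the state and
# action names and all numbers are assumptions, not from the notes.
states = ["s0", "s1"]
actions = {"s0": ["stay", "go"], "s1": ["stay"]}  # actions available per state

# T[(s, a)] maps each next state s' to its probability T(s, a, s').
T = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "go"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
}

# R[(s, a, s')] is the reward for that transition; missing entries mean 0.
R = {("s0", "go", "s1"): 1.0}


def reward(s, a, s_next):
    """Look up R(s, a, s'), defaulting to 0 for unlisted transitions."""
    return R.get((s, a, s_next), 0.0)


def is_valid_model(T, states, actions, tol=1e-9):
    """Check that T(s, a, .) is a probability distribution for every (s, a)."""
    for s in states:
        for a in actions[s]:
            dist = T[(s, a)]
            if any(p < 0 for p in dist.values()):
                return False
            if abs(sum(dist.values()) - 1.0) > tol:
                return False
    return True
```

Keying T by (s, a) keeps the per-state action sets explicit, which matches the "actions (per state)" wording above.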

Bellman equation - when the environment model is not known, use the action-value function Q(s, a), which can be learned from sampled transitions
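
One common way to use the action-value function without a model is tabular Q-learning, sketched below. This is a swapped-in illustration, not necessarily the method the notes have in mind. The 4-state chain environment, the `step` function, and all hyperparameters are assumptions. The learner only calls `step()` and never reads T or R directly.

```python
import random

# Hypothetical environment: a chain of states 0..3. Action 1 moves right,
# action 0 moves left; entering state 3 gives reward 1 and ends the episode.
N_STATES, GOAL = 4, 3
ACTIONS = [0, 1]


def step(s, a, rng):
    """Sample one transition; this stands in for the unknown T and R."""
    # With probability 0.1 the chosen move is flipped (stochastic dynamics).
    if rng.random() < 0.1:
        a = 1 - a
    s_next = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == GOAL else 0.0
    return s_next, r, s_next == GOAL


def q_learning(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit, sometimes try a random action.
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: Q[s][x])
            s_next, r, done = step(s, a, rng)
            # Sampled Bellman backup: target = r + gamma * max_a' Q(s', a').
            target = r if done else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
    return Q
```

Calling `q_learning()` returns a table with one row per state and one column per action; the greedy action in a state is the column with the largest value.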

2. Markov Reward Process (1. with reward)

An MDP (Markov Decision Process) is a tuple of states, actions, a state transition probability matrix (the probability of moving from one state to another under a given action), a reward function, and a discount factor
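
The tuple can be written compactly as follows. The symbols S, A, T, R follow the lists earlier in these notes. The symbol \gamma for the discount factor and Q^* for the optimal action-value function are standard notation assumed here, since the notes do not name them. The last line is the Bellman optimality equation for Q^*, which uses all five components.

```latex
\mathcal{M} = (S, A, T, R, \gamma), \qquad \gamma \in [0, 1]

T(s, a, s') = \Pr(s_{t+1} = s' \mid s_t = s,\ a_t = a), \qquad
\sum_{s' \in S} T(s, a, s') = 1

Q^*(s, a) = \sum_{s' \in S} T(s, a, s') \left[ R(s, a, s') + \gamma \max_{a'} Q^*(s', a') \right]
```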

3. Markov Decision Process (2. with action)

\tau = (s_1, a_1, \dots, s_T, a_T)
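
A trajectory of this form can be sampled by repeatedly picking an action and drawing the next state from the model. The sketch below assumes a made-up two-state model and a uniformly random policy. The names `MODEL`, `sample_trajectory`, `random_policy`, and `discounted_return` are hypothetical helpers introduced only for this example.

```python
import random

# Hypothetical model: MODEL[(s, a)] maps next state s' to T(s, a, s').
# All names and numbers are illustrative assumptions.
MODEL = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "go"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 0.5, "s0": 0.5},
    ("s1", "go"): {"s1": 1.0},
}
ACTIONS = ["stay", "go"]


def random_policy(s, rng):
    """Pick an action uniformly at random, ignoring the state."""
    return rng.choice(ACTIONS)


def sample_trajectory(policy, s_start, horizon, rng):
    """Return [(s_1, a_1), ..., (s_T, a_T)] where T equals horizon."""
    tau, s = [], s_start
    for _ in range(horizon):
        a = policy(s, rng)
        tau.append((s, a))
        next_states = list(MODEL[(s, a)])
        weights = [MODEL[(s, a)][n] for n in next_states]
        # Draw s' according to the transition probabilities.
        s = rng.choices(next_states, weights=weights, k=1)[0]
    return tau


def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a reward sequence r_0, r_1, ..."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))
```

For example, `sample_trajectory(random_policy, "s0", 5, random.Random(0))` yields a list of five (state, action) pairs starting in "s0".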

OpenRL - Reinforcement Learning and OpenAI - 2: Intro to Reinforcement Learning (1) MDP & Value Function

2016.07.12 Leewoongwon Reinforcement Learning and OpenAI <Contents> 1. Introduction to OpenAI 2-1. Intro to Reinforcement Learning (1) MDP & Value Function 2-2. Intro to Reinforcement Learning (2) Q Learning 3-1. CartPole with Deep Q Learning (1) CartPole example 3-2. CartPole with Deep Q Learning (2) DQN (Deep Q-Networks) 3-3.

http://www.modulabs.co.kr/RL_library/2136

arxiv.org

https://arxiv.org/pdf/2310.08833.pdf
