MDP, Stochastic Finite State Automata
- Random process (stochastic process)
- A collection of random variables indexed by time (or some set)
MDP Types
1. Markov Process

New twist (reinforcement learning) - R and T are not known in advance, unlike in a traditional MDP → (Step 1)
→ The agent must actually try out actions and states to learn
- A set of states s ∈ S
- A set of actions (per state) a ∈ A
- A model T(s,a,s’) (probability that taking action a in state s leads to state s’)
- A reward function R(s,a,s’)
Bellman equation - with an action-value function Q(s,a), the agent does not need to know the environment model
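A minimal sketch of how an action-value function can be learned without access to T or R, using tabular Q-learning. The 4-state chain environment, the `step` function, and all hyperparameters here are hypothetical and chosen only for illustration; the learner sees nothing but sampled (next state, reward) pairs.

```python
import random

# Hypothetical 4-state chain: states 0..3, state 3 is terminal.
# Actions: 0 = left, 1 = right.
N_STATES = 4
ACTIONS = (0, 1)

def step(state, action, rng):
    """Hidden dynamics standing in for the unknown T and R (assumed for this sketch)."""
    # With probability 0.1 the chosen move is flipped.
    if rng.random() < 0.1:
        action = 1 - action
    next_state = max(0, state - 1) if action == 0 else state + 1
    # Reward 1 only when the terminal state is reached.
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def q_learning(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            if rng.random() < epsilon:
                # Explore: pick a random action.
                action = rng.choice(ACTIONS)
            else:
                # Exploit: pick a best-known action, breaking ties at random.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward = step(state, action, rng)
            # Sample-based Bellman backup; no model T or R is consulted.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

if __name__ == "__main__":
    for s, row in enumerate(q_learning()):
        print(s, [round(v, 3) for v in row])
```

After training, the greedy action in the states next to the goal should be "right", since that is where the only reward lies. The row for the terminal state stays at zero because it is never updated.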
2. Markov Reward Process (1. with reward)
An MDP (Markov Decision Process) is a tuple of states, actions, a state transition probability matrix (the probability of moving from one state to another under a given action), a reward function, and a discount factor
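One common way to write the Bellman equations for this tuple is sketched below. It reuses T(s,a,s’) and R(s,a,s’) from the list above; the symbols π (a policy) and γ (the discount factor) are notation assumed here, not taken from the notes.

```latex
% Bellman expectation equation for the state-value function of a policy \pi
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} T(s,a,s') \left[ R(s,a,s') + \gamma V^{\pi}(s') \right]

% Bellman optimality equation for the action-value function
Q^{*}(s,a) = \sum_{s'} T(s,a,s') \left[ R(s,a,s') + \gamma \max_{a'} Q^{*}(s',a') \right]
```

The first equation needs the model T to evaluate; the second is the fixed point that sample-based methods approximate when T and R are unknown.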

3. Markov Decision Process (2. with action)
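When T and R are known (the traditional MDP setting), the optimal values can be computed by planning rather than by trial and error. The following is a minimal value-iteration sketch on a hypothetical 3-state MDP; the state and action names, probabilities, and rewards are made up for illustration.

```python
# Hypothetical MDP: MODEL[s][a] is a list of (probability, next_state, reward).
# State 2 is terminal and has no actions.
GAMMA = 0.9
MODEL = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.8, 1, 0.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "go": [(0.8, 2, 1.0), (0.2, 1, 0.0)]},
    2: {},
}

def backup(model, v, gamma, s, a):
    """Expected return of taking action a in state s, then following values v."""
    return sum(p * (r + gamma * v[s2]) for p, s2, r in model[s][a])

def value_iteration(model, gamma, tol=1e-8):
    v = {s: 0.0 for s in model}
    while True:
        delta = 0.0
        for s, actions in model.items():
            if not actions:
                continue  # terminal state keeps value 0
            best = max(backup(model, v, gamma, s, a) for a in actions)
            delta = max(delta, abs(best - v[s]))
            v[s] = best  # in-place update
        if delta < tol:
            return v

def greedy_policy(model, v, gamma):
    """Pick, in each non-terminal state, the action with the highest backup."""
    return {
        s: max(actions, key=lambda a: backup(model, v, gamma, s, a))
        for s, actions in model.items()
        if actions
    }

if __name__ == "__main__":
    values = value_iteration(MODEL, GAMMA)
    print(values)
    print(greedy_policy(MODEL, values, GAMMA))
```

In this toy model the greedy policy chooses "go" in both non-terminal states, because "stay" never collects a reward.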


OpenRL - Reinforcement Learning and OpenAI - 2: Intro to Reinforcement Learning (1) MDP & Value Function
2016.07.12 Leewoongwon Reinforcement Learning and OpenAI <Contents> 1. Introduction to OpenAI 2-1. Intro to Reinforcement Learning (1) MDP & Value Function 2-2. Intro to Reinforcement Learning (2) Q Learning 3-1. CartPole with Deep Q Learning (1) CartPole example 3-2. CartPole with Deep Q Learning (2) DQN (Deep Q-Networks) 3-3.
http://www.modulabs.co.kr/RL_library/2136


Seonglae Cho