Markov Decision Process

Creator: Seonglae Cho
Created: 2023 Mar 5 6:47
Edited: 2024 Oct 21 10:58
Refs: MDP, Stochastic FSM

  • Random process (stochastic process)
  • A collection of random variables indexed by time (or some set)
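In notation, such a process is a family of random variables

$$ \{ S_t \}_{t \in \mathcal{T}} $$

where $\mathcal{T}$ is the index set (typically time steps $t = 0, 1, 2, \dots$).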
 

1. Markov Process


Markov Property

  • The future is independent of the past given the present
  • The state captures all relevant information from the history
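Formally, a state $S_t$ is Markov if and only if

$$ \mathbb{P}\left[ S_{t+1} \mid S_t \right] = \mathbb{P}\left[ S_{t+1} \mid S_1, \dots, S_t \right] $$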
 
Reinforcement Learning is the process of solving an MDP
New twist - we don't know R and T (unlike a traditional MDP) → (Step 1)
→ Must actually try out actions and states to learn
  • A set of states s ∈ S
  • A set of actions a ∈ A (per state)
  • A transition model T(s,a,s′) - the probability of reaching s′ from s by taking action a
  • A reward function R(s,a,s′)
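With these components, the Bellman optimality equation for the action-value function takes its standard form (γ is the discount factor):

$$ Q^{*}(s,a) = \sum_{s'} T(s,a,s') \left[ R(s,a,s') + \gamma \max_{a'} Q^{*}(s',a') \right] $$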
Bellman equation - with the action-value function we do not need to know the environment model (model-free learning)
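When T and R are unknown (the RL setting above), the same backup can be estimated from sampled transitions. Below is a minimal tabular Q-learning sketch in Python; the environment interface (`env.reset()`, `env.step(a)`, `env.n_actions`) and the hyperparameters are illustrative assumptions, not something defined in this note.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning: learn Q(s, a) from sampled transitions only,
    without ever using the transition model T or the reward function R."""
    Q = defaultdict(float)                # Q[(state, action)] -> value estimate
    actions = list(range(env.n_actions))  # assumed: discrete action ids 0..n-1

    for _ in range(episodes):
        s = env.reset()                   # assumed: returns the initial state
        done = False
        while not done:
            # epsilon-greedy: actually try actions out to learn
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a_: Q[(s, a_)])

            s_next, r, done = env.step(a)  # one sampled transition (s, a, r, s')

            # sample-based Bellman backup; no access to T or R needed
            best_next = 0.0 if done else max(Q[(s_next, a_)] for a_ in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```

The greedy policy is then read off as π(s) = argmax over a of Q(s, a).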
 
 

2. Markov Reward Process (1. with reward)

An MDP (Markov Decision Process) is a tuple of states, actions, a state transition probability matrix (the probability of moving from one state to another under a given action), a reward function, and a discount factor.
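For the reward part, the standard definitions are the discounted return and the state-value function (γ is the discount factor):

$$ G_t = R_{t+1} + \gamma R_{t+2} + \gamma^{2} R_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}, \qquad v(s) = \mathbb{E}\left[ G_t \mid S_t = s \right] $$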
 
 

3. Markov Decision Process (2. with action)
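Written out, the tuple described above (states, actions, transition probabilities, rewards, discount factor) and the action-conditioned transition probability are:

$$ \mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma), \qquad P^{a}_{ss'} = \mathbb{P}\left[ S_{t+1} = s' \mid S_t = s,\ A_t = a \right] $$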
