Markov Decision Process

Editor: Alan Jo
Creator: Alan Jo
Created: 2023 Mar 5 6:47
Edited: 2024 Apr 23 14:43

MDP, Stochastic
FSM

  • Random process (stochastic process)
  • A collection of random variables indexed by time (or some set)
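As a quick illustration (not from the original note), a simple random walk is a stochastic process: one random variable per time index t.

```python
import random

def random_walk(steps=10, seed=0):
    """Sample one path of a simple random walk:
    X_t = X_{t-1} + step, where each step is +1 or -1 with equal probability."""
    rng = random.Random(seed)
    x = 0
    path = [x]
    for _ in range(steps):
        x += rng.choice([-1, 1])
        path.append(x)  # value of the random variable at this time index
    return path

print(random_walk())
```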
 

1. Markov Process


Markov Property

  • The future is independent of the past given the present
  • The state captures all relevant information from the history
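Written as a formula (standard notation, added for reference), a state S_t is Markov if and only if:

```latex
\mathbb{P}\left[S_{t+1} \mid S_t\right] = \mathbb{P}\left[S_{t+1} \mid S_1, S_2, \dots, S_t\right]
```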
 
Reinforcement Learning is the process of solving an MDP.
New twist: R and T are not known in advance (unlike a traditional MDP)
→ the agent must actually try out actions and states to learn them (see the update rule after the Bellman note below)
  • A set of states s ∈ S
  • A set of actions a ∈ A (possibly per state)
  • A transition model T(s, a, s’): the probability of reaching s’ when taking action a in state s
  • A reward function R(s, a, s’)
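A minimal sketch of that tuple written directly in code; the states, actions, probabilities, and rewards below are invented purely for illustration.

```python
# Toy 2-state MDP: the (S, A, T, R) tuple above, as plain dictionaries.
states = ["cool", "hot"]
actions = ["slow", "fast"]

# T[(s, a)] maps next state s' -> probability T(s, a, s')
T = {
    ("cool", "slow"): {"cool": 1.0},
    ("cool", "fast"): {"cool": 0.5, "hot": 0.5},
    ("hot", "slow"):  {"cool": 0.5, "hot": 0.5},
    ("hot", "fast"):  {"hot": 1.0},
}

# R[(s, a, s')] is the reward for that transition
R = {
    ("cool", "slow", "cool"): 1.0,
    ("cool", "fast", "cool"): 2.0,
    ("cool", "fast", "hot"):  2.0,
    ("hot", "slow", "cool"):  1.0,
    ("hot", "slow", "hot"):   1.0,
    ("hot", "fast", "hot"):  -10.0,
}

def expected_reward(s, a):
    """Expected immediate reward of taking action a in state s."""
    return sum(p * R[(s, a, s2)] for s2, p in T[(s, a)].items())

print(expected_reward("cool", "fast"))  # 0.5*2.0 + 0.5*2.0 = 2.0
```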
Bellman equation: with an action-value function Q(s, a), the agent does not need an explicit model of the environment.
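One standard way to write this (added here for reference, not spelled out in the note): the Bellman optimality equation below uses the model T and R, while the sample-based Q-learning update only needs observed transitions (s, a, r, s’).

```latex
Q^{*}(s,a) = \sum_{s'} T(s,a,s') \left[ R(s,a,s') + \gamma \max_{a'} Q^{*}(s',a') \right]

% model-free update from a single observed transition (s, a, r, s'):
Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]
```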
 
 

2. Markov Reward Process (1. with reward)

An MDP (Markov Decision Process) is a tuple of states, actions, a state transition probability matrix (the probability of moving from one state to another under a given action), a reward function, and a discount factor.
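In the usual notation (added for reference), that tuple and the quantities a Markov reward process adds on top of a plain Markov process look like:

```latex
\text{MDP} = \langle \mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma \rangle,
\qquad \mathcal{P}^{a}_{ss'} = \mathbb{P}\left[S_{t+1} = s' \mid S_t = s,\ A_t = a\right]

% a Markov reward process is the action-free version \langle \mathcal{S}, \mathcal{P}, \mathcal{R}, \gamma \rangle,
% with discounted return and state-value function:
G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1},
\qquad v(s) = \mathbb{E}\left[ G_t \mid S_t = s \right]
```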
 
 

3. Markov Decision Process (2. with action)
