MDP, Stochastic FSM
- Random process (stochastic process)
- A collection of random variables indexed by time (or some set)
1. Markov Process
Markov Property
- The future is independent of the past given the present
- The state captures all relevant information from the history
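
Written out, the Markov property above says that conditioning on the full history adds nothing beyond the current state. The symbols here (S_t for the state at time t) are assumed notation, not taken from these notes:

```latex
\mathbb{P}\left[ S_{t+1} \mid S_t \right]
  = \mathbb{P}\left[ S_{t+1} \mid S_1, \dots, S_t \right]
```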
Reinforcement learning is the process of solving an MDP
New twist - R and T are unknown (unlike in a traditional MDP) → (Step 1)
→ Must actually try out actions and visit states to learn
- A set of states s ∈ S
- A set of actions A (per state)
- A transition model T(s,a,s’): the probability of reaching s’ from s by taking action a
- A reward function R(s,a,s’)
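
The four components above can be sketched as plain Python dictionaries. This is a minimal illustration only: the two-state example, its numbers, and the names `STATES`, `ACTIONS`, `T`, `R`, and `sample_step` are all hypothetical and not from these notes.

```python
import random

# Hypothetical two-state example; all names and numbers are illustrative.
STATES = ["cool", "warm"]

# Actions available in each state (the "per state" action set).
ACTIONS = {"cool": ["slow", "fast"], "warm": ["slow", "fast"]}

# T[(s, a)] maps each successor s' to its probability T(s, a, s').
T = {
    ("cool", "slow"): {"cool": 1.0},
    ("cool", "fast"): {"cool": 0.5, "warm": 0.5},
    ("warm", "slow"): {"cool": 0.5, "warm": 0.5},
    ("warm", "fast"): {"warm": 1.0},
}

# R[(s, a, s')] is the reward received for that transition.
R = {
    ("cool", "slow", "cool"): 1.0,
    ("cool", "fast", "cool"): 2.0,
    ("cool", "fast", "warm"): 2.0,
    ("warm", "slow", "cool"): 1.0,
    ("warm", "slow", "warm"): 1.0,
    ("warm", "fast", "warm"): -10.0,
}


def sample_step(state, action, rng):
    """Sample one transition (s, a) -> (s', reward) from T and R."""
    successors = T[(state, action)]
    next_states = list(successors)
    weights = [successors[s] for s in next_states]
    next_state = rng.choices(next_states, weights=weights, k=1)[0]
    return next_state, R[(state, action, next_state)]
```

In the reinforcement learning setting, the agent would only be able to call something like `sample_step`; it would not be allowed to read `T` or `R` directly.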
Bellman equation - with the action-value function Q(s,a), the environment model does not need to be known
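
One common way to use the action-value function without a model is tabular Q-learning, sketched below. The notes do not name a specific algorithm, so Q-learning is a choice made here. The four-state chain environment, the hyperparameters, and the names `step` and `q_learning` are assumptions for illustration. The update moves Q(s,a) toward the sampled target r + γ max_a’ Q(s’,a’), which uses only observed transitions and never T or R.

```python
import random

N_STATES = 4      # hypothetical chain: states 0..3, state 3 is terminal
ACTIONS = [0, 1]  # 0 = move left, 1 = move right


def step(state, action):
    """Environment the agent can only sample from, never inspect."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0  # reward only for reaching the goal
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on steps per episode
            # Epsilon-greedy: mostly exploit, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice(
                    [a for a in ACTIONS if q[state][a] == best]
                )
            next_state, reward, done = step(state, action)
            # Sampled Bellman target; no bootstrapping past a terminal state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = q_learning()
# Greedy action in each non-terminal state.
greedy_policy = [
    max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)
]
```

With these settings the greedy policy should choose "right" in every non-terminal state, since that is the only way to reach the rewarded goal.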
2. Markov Reward Process (the Markov Process from 1., with rewards added)
An MDP (Markov Decision Process) is a tuple of states, actions, a state transition probability matrix (the probability of moving from one state to another under a given action), a reward function, and a discount factor
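
In symbols, the tuple described above is often written as follows. The calligraphic notation and the return G_t are assumed conventions, not taken from these notes. The discount factor γ weights future rewards in the return:

```latex
\mathcal{M} = \langle \mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma \rangle,
\qquad
\mathcal{P}^{a}_{ss'} = \mathbb{P}\left[ S_{t+1} = s' \mid S_t = s, A_t = a \right],
\qquad
\gamma \in [0, 1]

G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}
```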