Reinforcement Learning Term

Creator
Seonglae Cho
Created
2023 Mar 5 6:49
Edited
2024 Apr 19 3:9

Structure

  • Agent, Environment, Action, State, Reward
Basic idea - act (randomly at first) → observe the reward and the next state → learn from the outcome → improve the policy
  • Receive feedback in the form of rewards
  • Agent’s utility is defined by the reward function
  • Must (learn to) act so as to maximize expected rewards
  • All learning is based on observed samples of outcomes
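The loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-state environment `step` and the helper `learn_policy` are hypothetical and are not taken from any library. The agent acts randomly, averages the rewards it observes for each (state, action) pair, and then reads off a greedy policy. It looks only at immediate rewards, which is enough here because the toy reward does not depend on future states.

```python
import random

def step(state, action, rng):
    """Hypothetical toy environment: reward 1 when the action matches the state."""
    reward = 1.0 if action == state else 0.0
    next_state = rng.choice([0, 1])  # the next state is drawn uniformly at random
    return next_state, reward

def learn_policy(num_steps=200, seed=0):
    """Act randomly, average observed rewards, then act greedily on the averages."""
    rng = random.Random(seed)
    totals = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
    counts = {(s, a): 0 for s in (0, 1) for a in (0, 1)}
    state = 0
    for _ in range(num_steps):
        action = rng.choice([0, 1])                    # random action
        next_state, reward = step(state, action, rng)  # observe reward and next state
        totals[(state, action)] += reward              # learn from the sampled outcome
        counts[(state, action)] += 1
        state = next_state

    def mean(s, a):
        return totals[(s, a)] / counts[(s, a)] if counts[(s, a)] else 0.0

    # Make the policy: in each state, pick the action with the best average reward.
    return {s: max((0, 1), key=lambda a: mean(s, a)) for s in (0, 1)}

policy = learn_policy()
print(policy)
```

The learned policy maps each state to the action that earned the highest average reward, which is the "maximize expected rewards from observed samples" idea in its simplest form.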
Reinforcement Learning Terms

States

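  • environment state - the environment's own internal representation, usually not fully visible to the agent
  • agent state - the agent's internal representation; this is the state most often used in practice
  • information state (Markov state) - the probability of the next state given the full history equals its probability given only the current state, so the future is independent of the past given the present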
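Written as a formula, the Markov property reads as follows. The notation is an assumption introduced here for illustration: S_t denotes the state at time step t.

```latex
\mathbb{P}\left[ S_{t+1} \mid S_t \right]
  = \mathbb{P}\left[ S_{t+1} \mid S_1, \ldots, S_t \right]
```

In other words, once the current state is known, the earlier states add no further information about what happens next.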

Others

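  • history - the sequence of observations, actions, and rewards so far
  • value iteration - repeatedly applies the Bellman optimality backup to the value estimates until they converge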
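  • policy iteration - alternates policy evaluation and greedy policy improvement until the policy stops changing
  • MDP (Markov decision process) - defines states, actions, transition probabilities, and rewards; solving it yields a policy and its value function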
Value iteration and policy iteration both compute the same result: the optimal value of every state.
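A minimal sketch can make this concrete. The two-state, two-action MDP `P`, the discount `GAMMA`, and all function names below are hypothetical choices made for illustration. Value iteration applies the Bellman optimality backup directly, while policy iteration alternates evaluation and greedy improvement; on this small example both should arrive at the same optimal values up to numerical tolerance.

```python
GAMMA = 0.9  # discount factor (assumed for this example)

# Hypothetical MDP: P[state][action] = [(probability, next_state, reward), ...]
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}

def q_value(V, s, a):
    """Expected return of taking action a in state s, then following values V."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])

def value_iteration(tol=1e-10):
    """Apply the Bellman optimality backup until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        new_V = {s: max(q_value(V, s, a) for a in P[s]) for s in P}
        if max(abs(new_V[s] - V[s]) for s in P) < tol:
            return new_V
        V = new_V

def policy_evaluation(policy, tol=1e-10):
    """Iteratively compute the value of a fixed policy."""
    V = {s: 0.0 for s in P}
    while True:
        new_V = {s: q_value(V, s, policy[s]) for s in P}
        if max(abs(new_V[s] - V[s]) for s in P) < tol:
            return new_V
        V = new_V

def policy_iteration():
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = {s: 0 for s in P}
    while True:
        V = policy_evaluation(policy)
        improved = {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}
        if improved == policy:
            return V, policy
        policy = improved

V_vi = value_iteration()
V_pi, pi_star = policy_iteration()
print(V_vi, V_pi, pi_star)
```

The two methods differ in how they get there: value iteration never stores an explicit policy while iterating, whereas policy iteration keeps one and re-evaluates it after each improvement step.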