Basic idea - take a random action → check reward → observe the resulting state → learn → improve the policy (sketched in the loop below)
Receive feedback in the form of rewards
Agent’s utility is defined by the reward function
Must (learn to) act so as to maximize expected rewards
All learning is based on observed samples of outcomes
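A minimal sketch of this act → reward → observe → learn loop, assuming a hypothetical `env` with Gym-style `reset()`/`step()` methods and a placeholder random-action agent (all names here are illustrative, not from the notes):

```python
import random

# Sketch of the basic RL interaction loop.
# Assumptions: env.reset() returns a state, env.step(action) returns
# (next_state, reward, done); `learn` is any callback that consumes
# the observed sample (state, action, reward, next_state).

def run_episode(env, actions, learn):
    state = env.reset()
    done = False
    total_reward = 0.0
    while not done:
        action = random.choice(actions)               # random action (exploration)
        next_state, reward, done = env.step(action)   # check reward, observe next state
        learn(state, action, reward, next_state)      # learn from the observed sample
        total_reward += reward
        state = next_state
    return total_reward
```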
States
environment state - the environment's own internal representation (generally not visible to the agent)
agent state - the agent's internal representation - the state most commonly used to choose actions
information state (Markov state) - the probability of the next state given the current state equals the probability given the entire history → the state is Markov (the future is independent of the past given the present)
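Stated precisely (standard notation, with S_t the state at time t; the symbols are the usual convention, not from these notes):

```latex
% Markov property: the current state is a sufficient statistic of the history
\mathbb{P}[S_{t+1} \mid S_t] = \mathbb{P}[S_{t+1} \mid S_1, S_2, \ldots, S_t]
```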
Others
history - the sequence of observations, actions, and rewards up to time t: H_t = O_1, R_1, A_1, ..., A_{t-1}, O_t, R_t
value iteration - repeatedly apply the Bellman optimality backup until the values converge, then read off the greedy policy
policy iteration - alternate policy evaluation (compute the value of the current policy) and policy improvement (act greedily with respect to it) until the policy stops changing
MDP - the underlying model (states, actions, transitions, rewards); solving it yields a policy and a value function
Both value iteration and policy iteration compute the same thing (all optimal values); a sketch of both follows below
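A minimal tabular sketch of both methods, under assumed (hypothetical) inputs: `states` and `actions` are lists, `P[s][a]` is a list of `(prob, next_state, reward)` transitions, and `gamma` is the discount factor. This is an illustration of the standard algorithms, not code from the course:

```python
def value_iteration(states, actions, P, gamma=0.9, theta=1e-8):
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Bellman optimality backup: V(s) = max_a sum_s' p * (r + gamma * V(s'))
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    # Greedy policy extraction from the converged values
    policy = {
        s: max(actions,
               key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
        for s in states
    }
    return V, policy


def policy_iteration(states, actions, P, gamma=0.9, theta=1e-8):
    policy = {s: actions[0] for s in states}   # arbitrary initial policy
    V = {s: 0.0 for s in states}
    while True:
        # Policy evaluation: iterate Bellman expectation backups for the current policy
        while True:
            delta = 0.0
            for s in states:
                v = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][policy[s]])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < theta:
                break
        # Policy improvement: act greedily with respect to the evaluated values
        stable = True
        for s in states:
            best_a = max(actions,
                         key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
            if best_a != policy[s]:
                policy[s] = best_a
                stable = False
        if stable:
            return V, policy
```

Both return the same optimal values in the limit: value iteration folds the improvement step into every backup, while policy iteration fully evaluates the current policy before improving it.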