RL: learn a policy that maps situations (states) to actions so as to maximize a numeric reward signal
Applies to sequential decision-making problems, which show up almost everywhere
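Unlike supervised and unsupervised learning, which assume i.i.d. data, RL must deal with compounding error: each action changes the next state, so early mistakes carry over into later steps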
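A minimal sketch of why errors compound, under a simplifying assumption of our own: at every step the policy makes an independent mistake with probability `eps`, and a single mistake knocks it off the expert's trajectory for the rest of the episode. The function names are hypothetical. In an i.i.d. setting the expected error per sample stays at `eps`, but over a horizon of `T` steps the chance of staying on track decays as (1 - eps)^T, and the expected number of off-track steps grows roughly like eps * T^2 / 2 for small `eps`.

```python
def prob_still_on_track(eps: float, horizon: int) -> float:
    """Probability that no mistake happens in `horizon` independent steps."""
    return (1.0 - eps) ** horizon


def expected_off_track_steps(eps: float, horizon: int) -> float:
    """Expected number of steps spent off the expert's trajectory.

    Assumption (ours): once a mistake happens the agent never recovers,
    so step t is off-track iff a mistake occurred at some step 1..t.
    """
    return sum(1.0 - (1.0 - eps) ** t for t in range(1, horizon + 1))


if __name__ == "__main__":
    eps = 0.01
    for T in (10, 100, 1000):
        # Doubling T roughly quadruples the off-track steps while eps*T is small.
        print(T, round(prob_still_on_track(eps, T), 4),
              round(expected_off_track_steps(eps, T), 2))
```

The "no recovery" assumption is deliberately pessimistic; it isolates the effect of the policy's own actions shifting the state distribution, which is what i.i.d. learning never has to face.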
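Indirect supervision: an agent defined within an environment perceives its current state and chooses, among the available actions, the one that maximizes reward
The agent obtains the state of the world by interacting with it (perception)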
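To make the perceive/act/reward loop concrete, here is a sketch using tabular Q-learning, which is our choice of technique rather than anything the notes specify. The `Corridor` environment, its Gym-like `reset`/`step` methods, and all parameter values are hypothetical. The only supervision is the reward: no one labels the "correct" action, yet the agent learns to pick the action with the highest estimated value in each state.

```python
import random


class Corridor:
    """Hypothetical toy environment: states 0..n-1, start at 0, goal at n-1.
    Actions: 0 = left, 1 = right. Reward 1.0 on reaching the goal, else 0.0."""

    def __init__(self, n: int = 5):
        self.n = n
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int):
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n - 1)  # walls at both ends
        done = self.state == self.n - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def greedy(q, state):
    # Choose the available action with the highest estimated value (ties broken randomly).
    best = max(q[state])
    return random.choice([a for a, v in enumerate(q[state]) if v == best])


def train(episodes=200, alpha=0.5, gamma=0.9, eps=0.1, seed=0, max_steps=100):
    random.seed(seed)
    env = Corridor()
    q = [[0.0, 0.0] for _ in range(env.n)]  # q[state][action]
    for _ in range(episodes):
        s = env.reset()  # perceive the initial state
        for _ in range(max_steps):
            # Explore with probability eps, otherwise act greedily.
            a = random.randrange(2) if random.random() < eps else greedy(q, s)
            s2, r, done = env.step(a)  # act, then observe next state and reward
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q
```

After `q = train()`, calling `greedy(q, s)` for each non-goal state should return 1 (move right), since that is the reward-maximizing direction.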
Expected return of a policy: the return R(τ) averaged over all possible trajectories τ, each weighted by its probability under that policy
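In symbols (notation as in OpenAI's Spinning Up, used here as an assumption): J(π) = E_{τ~π}[R(τ)] = Σ_τ P(τ|π) R(τ). The sketch below computes this two ways for a hypothetical toy task where each of `horizon` steps the policy picks action 0 or 1 and receives a fixed reward `REWARD[action]`: once by enumerating every trajectory and weighting its return by its probability, and once by Monte Carlo averaging over sampled trajectories. The reward values and function names are illustrative only.

```python
import itertools
import random

# Hypothetical per-step rewards for the toy task (our assumption).
REWARD = {0: 0.2, 1: 1.0}


def policy_prob(action: int, p_one: float) -> float:
    """pi(a): a stochastic policy that picks action 1 with probability p_one."""
    return p_one if action == 1 else 1.0 - p_one


def trajectory_return(actions) -> float:
    """R(tau): undiscounted sum of rewards along one trajectory."""
    return sum(REWARD[a] for a in actions)


def expected_return_exact(p_one: float, horizon: int) -> float:
    """J(pi) = sum over all trajectories tau of P(tau | pi) * R(tau)."""
    total = 0.0
    for actions in itertools.product((0, 1), repeat=horizon):
        prob = 1.0
        for a in actions:
            prob *= policy_prob(a, p_one)
        total += prob * trajectory_return(actions)
    return total


def expected_return_mc(p_one: float, horizon: int, n: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of J(pi): average R(tau) over n sampled trajectories."""
    rng = random.Random(seed)
    returns = []
    for _ in range(n):
        actions = [1 if rng.random() < p_one else 0 for _ in range(horizon)]
        returns.append(trajectory_return(actions))
    return sum(returns) / n
```

For `p_one=0.5` and `horizon=3`, the exact value is 3 × (0.5 × 1.0 + 0.5 × 0.2) = 1.8, and the Monte Carlo estimate lands close to it. Enumeration visits 2^horizon trajectories, which is why practical RL estimates J(π) from samples instead.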
- An approach to learning decision-making
Reinforcement Learning Notion
Reinforcement Learning Usages
OpenAI
CS285