Reinforcement Learning

Created: 2019 Nov 5 5:18
Edited: 2024 May 7 17:42

Reinforcement learning (RL) trains a policy model that maps situations (states) to actions, guided by a numeric reward signal.

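Unlike supervised and unsupervised learning, which assume iid data, RL has to consider compounding error: each action influences the states seen later, so small mistakes accumulate over a trajectory.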
An agent defined within some environment perceives its current state and, among the actions available to it, selects the one that maximizes reward.
We obtain the world state by interacting with the world (perception).
The expected return of a policy is the expectation of the return over all possible trajectories that the policy can generate.
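Written out, the definition of expected return might look as follows. The notation is an assumption and does not appear in the notes: τ is a trajectory, r_t the reward at step t, γ a discount factor, and T the horizon.

```latex
J(\pi) = \mathbb{E}_{\tau \sim \pi}\left[ R(\tau) \right],
\qquad
R(\tau) = \sum_{t=0}^{T} \gamma^{t} r_t
```

Here τ ∼ π means the trajectory is produced by following π in the environment, so the expectation covers both the policy's action choices and the environment's transitions.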
  • RL targets sequential decision-making problems (sequential decision making is everywhere)
  • It is an approach for learning how to make decisions
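The ideas above (an agent perceiving state, choosing actions, collecting reward, and a policy's expected return over trajectories) can be sketched in a few lines. This is only an illustrative toy under stated assumptions: the corridor environment, the names `step`, `make_policy`, `rollout`, and `estimate_expected_return`, and the constants `GOAL`, `MAX_STEPS`, and `GAMMA` are all hypothetical and not taken from the notes. The expected return is approximated by averaging sampled trajectories (a Monte Carlo estimate) rather than summing over every possible one.

```python
import random

GOAL = 3          # hypothetical corridor with states 0..3; the goal is state 3
MAX_STEPS = 20    # assumed episode horizon
GAMMA = 0.9       # assumed discount factor


def step(state, action):
    """Toy environment: move left (-1) or right (+1); reward 1.0 on reaching the goal."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def make_policy(p_right, rng):
    """Stochastic policy: move right with probability p_right, ignoring the state."""
    def policy(state):
        return 1 if rng.random() < p_right else -1
    return policy


def rollout(policy):
    """Run one episode from state 0 and return its discounted return."""
    state, ret, discount = 0, 0.0, 1.0
    for _ in range(MAX_STEPS):
        action = policy(state)                      # map situation to action
        state, reward, done = step(state, action)   # interact with the world
        ret += discount * reward
        discount *= GAMMA
        if done:
            break
    return ret


def estimate_expected_return(policy, episodes=2000):
    """Monte Carlo estimate: average the return over sampled trajectories."""
    return sum(rollout(policy) for _ in range(episodes)) / episodes


rng = random.Random(0)  # seeded so repeated runs sample the same trajectories
print(estimate_expected_return(make_policy(1.0, rng), episodes=10))
print(estimate_expected_return(make_policy(0.7, rng)))
```

The policy that always moves right reaches the goal in the minimum three steps, so no stochastic policy in this toy can achieve a higher estimated return. An actual RL algorithm would go further and adjust the policy to raise this quantity.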
Reinforcement Learning Notion
 
 
Reinforcement Learning Usages
 
 
 

OpenAI

CS285
 
 

Recommendations