Reinforcement Learning

Created: 2019 Nov 5 5:18
Edited: 2024 Oct 25 22:30

Learn a policy that maps situations to actions so as to maximize a numeric reward signal
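One concrete way to learn such a state-to-action mapping from reward alone is tabular Q-learning. The sketch below uses Q-learning as an illustrative example method; this page does not prescribe it. The chain environment, `step`, `q_learning`, and all hyperparameters are hypothetical.

```python
import random

# Hypothetical toy environment: a 1-D chain of N_STATES cells.
# Action 0 moves left, action 1 moves right; reaching the last cell gives reward 1 and ends the episode.
N_STATES = 5
ACTIONS = (0, 1)


def step(state, action):
    """Return (next_state, reward, done) for the toy chain."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore with probability epsilon (or on ties), otherwise exploit.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # TD target: bootstrap from the best next action unless the episode ended.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = q_learning()
# The learned policy maps each non-terminal state to its highest-valued action.
policy = [0 if q[s][0] > q[s][1] else 1 for s in range(N_STATES - 1)]
print(policy)
```

The only supervision is the reward at the goal, so the update has to propagate that signal backward through the bootstrapped target; the greedy policy read off `q` should choose "move right" in every state.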

Targets sequential decision-making problems (and sequential decision making is everywhere)
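Thus, unlike supervised and unsupervised learning, which assume i.i.d. data, RL must account for compounding error: each action changes the states the agent sees next, so early mistakes propagate through the rest of the episode.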
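Compounding error can be made concrete with a deliberately simplified model (an assumption, not a general result): a learned policy errs with probability `eps` per step, and after its first mistake it is off the training distribution, never recovers, and pays cost 1 on every remaining step. Under i.i.d. predictions the expected number of mistakes grows linearly in the horizon; under this sequential model it grows roughly like eps·T²/2 until it saturates at T. The function names are hypothetical.

```python
def iid_expected_mistakes(eps: float, horizon: int) -> float:
    # i.i.d. setting: each prediction is independent, so mistakes add up linearly.
    return eps * horizon


def sequential_expected_cost(eps: float, horizon: int) -> float:
    # Simplified sequential model (assumption): the first mistake moves the agent
    # off-distribution, where it keeps paying cost 1 per step and never recovers.
    # P(off-distribution at step t) = 1 - (1 - eps) ** t
    return sum(1 - (1 - eps) ** t for t in range(1, horizon + 1))


for horizon in (10, 100, 1000):
    print(horizon, iid_expected_mistakes(0.01, horizon), round(sequential_expected_cost(0.01, horizon), 1))
```

The same per-step error rate yields a far larger total cost once mistakes can change which states are visited next.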
Indirect supervision: an agent defined within an environment perceives its current state and chooses, among the available actions, the one that maximizes reward
The agent obtains the world state by interacting with the world (perception)
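The perceive, act, receive-reward cycle can be sketched as a plain interaction loop. `LineWorld`, `run_episode`, `always_right`, and the `reset`/`step` method names are hypothetical illustrations that loosely follow the reset/step convention common in RL environment libraries; they are not a specific library's API.

```python
from typing import Callable, List, Tuple


class LineWorld:
    """Hypothetical toy environment: positions 0..goal, rewarded for reaching goal."""

    def __init__(self, goal: int = 3):
        self.goal = goal
        self.pos = 0

    def reset(self) -> int:
        self.pos = 0
        return self.pos  # initial observation: the agent's perception of the world state

    def available_actions(self) -> List[int]:
        return [-1, +1]

    def step(self, action: int) -> Tuple[int, float, bool]:
        self.pos = min(max(self.pos + action, 0), self.goal)
        done = self.pos == self.goal
        reward = 1.0 if done else -0.1  # small per-step cost favors short paths
        return self.pos, reward, done


def run_episode(env: LineWorld, policy: Callable[[int, List[int]], int],
                max_steps: int = 50) -> List[Tuple[int, int, float]]:
    trajectory = []
    state = env.reset()
    for _ in range(max_steps):
        action = policy(state, env.available_actions())  # act on the perceived state
        next_state, reward, done = env.step(action)      # world responds with new state and reward
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory


def always_right(state: int, actions: List[int]) -> int:
    return max(actions)


traj = run_episode(LineWorld(goal=3), always_right)
print(traj)
```

The list of `(state, action, reward)` tuples collected by the loop is exactly the kind of trajectory over which returns are computed.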
The expected return of a policy is the expectation of the return over all possible trajectories, weighted by how likely the policy is to generate each one
  • An approach for learning to make decisions
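Written out, the expected return is J(π) = Σ_τ p_π(τ) R(τ). The sketch below computes it exactly by enumerating every trajectory of a tiny problem, then approximates the same quantity by Monte Carlo sampling. The setup (`P_RIGHT`, `HORIZON`, `GAMMA`, the reward rule) is a hypothetical toy, and the discount factor is an added assumption.

```python
import itertools
import random

# Hypothetical setup: a stochastic policy picks action 1 with probability P_RIGHT at each
# of HORIZON steps; the reward at step t is 1 for action 1 and 0 otherwise, discounted by GAMMA**t.
P_RIGHT, HORIZON, GAMMA = 0.7, 3, 0.9


def trajectory_return(actions, gamma=GAMMA):
    # Discounted return R(tau) = sum_t gamma^t r_t
    return sum(gamma ** t * (1.0 if a == 1 else 0.0) for t, a in enumerate(actions))


def trajectory_prob(actions, p=P_RIGHT):
    # p_pi(tau): product of per-step action probabilities
    # (in this toy, a trajectory is just its action sequence)
    prob = 1.0
    for a in actions:
        prob *= p if a == 1 else 1 - p
    return prob


def exact_expected_return():
    # J(pi) = sum over ALL possible trajectories of p_pi(tau) * R(tau)
    return sum(trajectory_prob(tau) * trajectory_return(tau)
               for tau in itertools.product((0, 1), repeat=HORIZON))


def monte_carlo_expected_return(n_samples=100_000, seed=0):
    # Approximate the same expectation by averaging returns of sampled trajectories
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        tau = [1 if rng.random() < P_RIGHT else 0 for _ in range(HORIZON)]
        total += trajectory_return(tau)
    return total / n_samples


exact = exact_expected_return()
mc = monte_carlo_expected_return()
print(exact, mc)
```

Enumeration is feasible here only because there are just 2**HORIZON trajectories; in realistic problems the trajectory space is far too large, which is why sampled estimates like the Monte Carlo average are used instead.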
Reinforcement Learning Notion
 
 
Reinforcement Learning Usages
 
 
 

OpenAI

CS285
 
 

Recommendations