Reward model

Creator
Seonglae Cho
Created
2024 May 17 3:06
Edited
2024 Jun 29 13:18
Refs

Reward Learning (Without reward model), Inverse reinforcement learning

Reward function design, Reward can’t be taken for granted

A reward function can be exploited through AI Reward Hacking or Wireheading, so GAIL updates the reward model from a newly explored dataset.
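As a concrete illustration of that update, here is a minimal GAIL-style sketch (assuming PyTorch; the class names and batch format are illustrative, not GAIL's official implementation): the discriminator acts as the reward model, is refit on newly explored policy data against expert demonstrations, and the policy is rewarded for fooling it.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Classifies (state, action) pairs as expert (label 1) or policy-generated (label 0)."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))  # raw logits

def update_reward_model(disc, opt, expert_batch, policy_batch):
    """One discriminator step on newly explored policy data vs. expert demos.

    expert_batch / policy_batch: (obs, act) tensor tuples (illustrative format).
    """
    bce = nn.BCEWithLogitsLoss()
    expert_logits = disc(*expert_batch)
    policy_logits = disc(*policy_batch)
    loss = bce(expert_logits, torch.ones_like(expert_logits)) + \
           bce(policy_logits, torch.zeros_like(policy_logits))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def gail_reward(disc, obs, act):
    """Policy reward: higher when the discriminator mistakes the sample for expert data."""
    with torch.no_grad():
        d = torch.sigmoid(disc(obs, act))
    return -torch.log(1.0 - d + 1e-8)
```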
When the task is simple, the reward is easy to compute from domain knowledge and the changes in state and action (e.g., if the goal is distance, the reward is velocity). But as the task becomes even slightly more complicated, the reward function grows incredibly complex.
So how can we reason about what an expert is trying to achieve? Inferring the intent of others is a key human ability. Then we can devise a simple binary classifier trained on positive and negative examples.
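A minimal sketch of that idea, assuming scikit-learn and illustrative data: a logistic-regression classifier is fit on positive (goal-reaching) and negative states, and its predicted success probability is used as the reward.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Positive examples: states labeled as achieving the goal; negatives: everything else.
# Shapes and data here are illustrative placeholders.
positive_states = np.random.randn(100, 4) + 2.0   # e.g. states near the goal
negative_states = np.random.randn(100, 4)         # e.g. random exploration states

X = np.vstack([positive_states, negative_states])
y = np.concatenate([np.ones(100), np.zeros(100)])

clf = LogisticRegression().fit(X, y)

def learned_reward(state: np.ndarray) -> float:
    """Reward = predicted probability that this state counts as a success."""
    return float(clf.predict_proba(state.reshape(1, -1))[0, 1])
```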
  • A pre-defined reward function can be considered a hand-engineered, handcrafted, hard-coded, or AI-generated skill
  • Imitation learning or offline RL can be considered learning skills from data
  1. Learning reward from goals & demos
  2. Learning reward from human or AI preferences (see the sketch after this list)
  3. How about zero supervision?
    1. unsupervised RL
    2. skill discovery
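For option 2, here is a minimal sketch of learning a reward from pairwise preferences with the Bradley-Terry loss (assuming PyTorch; RewardNet and the segment shapes are illustrative assumptions): the trajectory segment the judge preferred should receive the higher summed reward.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardNet(nn.Module):
    """Maps a state vector to a scalar reward."""
    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs):
        return self.net(obs).squeeze(-1)

def preference_loss(reward_net, preferred, rejected):
    """Bradley-Terry loss: the preferred segment should get the higher total reward.

    preferred / rejected: tensors of shape (batch, segment_len, obs_dim), where a
    human or AI judge preferred the first segment over the second.
    """
    r_pref = reward_net(preferred).sum(dim=1)  # summed reward over each segment
    r_rej = reward_net(rejected).sum(dim=1)
    return -F.logsigmoid(r_pref - r_rej).mean()

# Illustrative usage with random data
net = RewardNet(obs_dim=8)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
preferred = torch.randn(32, 20, 8)
rejected = torch.randn(32, 20, 8)
loss = preference_loss(net, preferred, rejected)
opt.zero_grad(); loss.backward(); opt.step()
```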