Reward Learning (without a reward model), Inverse Reinforcement Learning
Reward function design: the reward function can’t be taken for granted.
A fixed reward function can be exploited through reward hacking or wireheading, so GAIL instead keeps updating its reward model (the discriminator) from newly explored data.
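A minimal sketch of that discriminator-as-reward loop in PyTorch: expert state-action pairs are labeled 1, freshly collected policy pairs 0, and the policy's reward is read off the current discriminator. The tensor names (`expert_sa`, `policy_sa`), the network sizes, and `sa_dim` are illustrative assumptions, not taken from any particular implementation.

```python
import torch
import torch.nn as nn

sa_dim = 8  # assumed state-action dimensionality, for illustration only

disc = nn.Sequential(nn.Linear(sa_dim, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(disc.parameters(), lr=3e-4)
bce = nn.BCEWithLogitsLoss()

def update_discriminator(expert_sa, policy_sa):
    """One discriminator step: expert pairs labeled 1, policy pairs labeled 0."""
    logits_expert = disc(expert_sa)
    logits_policy = disc(policy_sa)
    loss = bce(logits_expert, torch.ones_like(logits_expert)) + \
           bce(logits_policy, torch.zeros_like(logits_policy))
    opt.zero_grad()
    loss.backward()
    opt.step()

def gail_reward(policy_sa):
    """Reward handed to the RL step: -log(1 - D(s, a)), which shifts
    as the discriminator is refit on newly explored data."""
    with torch.no_grad():
        d = torch.sigmoid(disc(policy_sa))
    return -torch.log(1.0 - d + 1e-8)

# Example call with random data standing in for real batches:
update_discriminator(torch.randn(32, sa_dim), torch.randn(32, sa_dim))
print(gail_reward(torch.randn(32, sa_dim)).shape)  # torch.Size([32, 1])
```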
When the situation is simple, the reward is easy to compute from domain knowledge and the changes in state and action (e.g., if the goal is distance traveled, the reward can be velocity; see the toy sketch below). But as the task gets even a little more complicated, the reward function becomes incredibly complex.
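A toy sketch of such a hand-designed reward, assuming a hypothetical state dict with an `x` position and a fixed timestep `dt` (both made up for illustration):

```python
# Hand-designed reward from domain knowledge: if the goal is to travel far,
# reward forward velocity, i.e. the change in x position per timestep.
def handcrafted_reward(prev_state, state, dt=0.05):
    return (state["x"] - prev_state["x"]) / dt  # forward velocity as reward

# Moving 0.1 m in one 0.05 s step gives a reward of about 2.0 (m/s).
print(handcrafted_reward({"x": 1.0}, {"x": 1.1}))
```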
So, how can we reason about what an expert is trying to achieve? Inferring the intent of others is a key human ability. One simple device is a binary classifier over positive and negative examples (sketched just below).
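A minimal sketch of that classifier-as-reward idea, assuming hypothetical tensors `pos_states` (goal/success states) and `neg_states` (other states) of shape (n, state_dim); the classifier's success probability is then used as the reward. All names and sizes are illustrative.

```python
import torch
import torch.nn as nn

state_dim = 4  # assumed state dimensionality
clf = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def fit_step(pos_states, neg_states):
    """One training step: positive examples labeled 1, negative examples 0."""
    x = torch.cat([pos_states, neg_states])
    y = torch.cat([torch.ones(len(pos_states), 1), torch.zeros(len(neg_states), 1)])
    loss = bce(clf(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

def learned_reward(state):
    """Reward = classifier's probability that `state` looks like a success."""
    with torch.no_grad():
        return torch.sigmoid(clf(state))

# Example with random stand-in data (positives shifted away from negatives):
fit_step(torch.randn(64, state_dim) + 2.0, torch.randn(64, state_dim))
print(learned_reward(torch.randn(5, state_dim)).squeeze())
```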
- A pre-defined reward function corresponds to hand-engineered, handcrafted, hard-coded, or AI-generated skills
- Imitation learning or offline RL can be seen as learning skills from data
- Learning reward from goals & demos
- Learning reward from human or AI preferences (see the sketch after this list)
- What about zero supervision?
    - unsupervised RL
    - skill discovery
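As a companion to the preference bullet above, here is a minimal sketch of Bradley-Terry-style preference learning (the form used in RLHF-like pipelines): the reward model is trained so that the probability of preferring segment A over B is the sigmoid of the reward difference. The names `seg_a`, `seg_b`, `prefer_a`, and the feature size are assumptions, and for brevity each segment is scored directly instead of summing per-step rewards.

```python
import torch
import torch.nn as nn

feat_dim = 16  # assumed segment feature dimensionality
reward_net = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(reward_net.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def preference_update(seg_a, seg_b, prefer_a):
    """P(A preferred) = sigmoid(r(A) - r(B)); maximize likelihood of labels.

    prefer_a is 1.0 where the human/AI rater chose segment A, else 0.0.
    """
    logits = reward_net(seg_a) - reward_net(seg_b)
    loss = bce(logits, prefer_a)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Example call with random stand-in comparison data:
print(preference_update(torch.randn(16, feat_dim),
                        torch.randn(16, feat_dim),
                        torch.ones(16, 1)))
```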
Reward Models