Reward Learning (without a reward model), Inverse Reinforcement Learning
Reward function design: the reward function can’t be taken for granted.
A fixed reward function can be exploited through reward hacking or wireheading, so GAIL instead keeps updating its reward model (the discriminator) from newly explored data.
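A minimal sketch of that discriminator-as-reward loop in PyTorch: expert state-action pairs are labeled 1, freshly collected policy pairs 0, and the policy's reward is read off the current discriminator. The tensor names (`expert_sa`, `policy_sa`), the network sizes, and `sa_dim` are illustrative assumptions, not taken from any particular implementation.

```python
import torch
import torch.nn as nn

sa_dim = 8  # assumed state-action dimensionality, for illustration only

disc = nn.Sequential(nn.Linear(sa_dim, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(disc.parameters(), lr=3e-4)
bce = nn.BCEWithLogitsLoss()

def update_discriminator(expert_sa, policy_sa):
    """One discriminator step: expert pairs labeled 1, policy pairs labeled 0."""
    logits_expert = disc(expert_sa)
    logits_policy = disc(policy_sa)
    loss = bce(logits_expert, torch.ones_like(logits_expert)) + \
           bce(logits_policy, torch.zeros_like(logits_policy))
    opt.zero_grad()
    loss.backward()
    opt.step()

def gail_reward(policy_sa):
    """Reward handed to the RL step: -log(1 - D(s, a)), which shifts
    as the discriminator is refit on newly explored data."""
    with torch.no_grad():
        d = torch.sigmoid(disc(policy_sa))
    return -torch.log(1.0 - d + 1e-8)

# Example call with random data standing in for real batches:
update_discriminator(torch.randn(32, sa_dim), torch.randn(32, sa_dim))
print(gail_reward(torch.randn(32, sa_dim)).shape)  # torch.Size([32, 1])
```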
When the situation is simple, the reward is easy to compute from domain knowledge and the changes in state and action (e.g., if the goal is distance traveled, the reward can be velocity; see the toy sketch below). But as the task gets even a little more complicated, the reward function becomes incredibly complex.
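A toy sketch of such a hand-designed reward, assuming a hypothetical state dict with an `x` position and a fixed timestep `dt` (both made up for illustration):

```python
# Hand-designed reward from domain knowledge: if the goal is to travel far,
# reward forward velocity, i.e. the change in x position per timestep.
def handcrafted_reward(prev_state, state, dt=0.05):
    return (state["x"] - prev_state["x"]) / dt  # forward velocity as reward

# Moving 0.1 m in one 0.05 s step gives a reward of about 2.0 (m/s).
print(handcrafted_reward({"x": 1.0}, {"x": 1.1}))
```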
So, how can we reason about what an expert is trying to achieve? Inferring the intent of others is a key human ability. One simple device is a binary classifier over positive and negative examples (sketched just below).
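A minimal sketch of that classifier-as-reward idea, assuming hypothetical tensors `pos_states` (goal/success states) and `neg_states` (other states) of shape (n, state_dim); the classifier's success probability is then used as the reward. All names and sizes are illustrative.

```python
import torch
import torch.nn as nn

state_dim = 4  # assumed state dimensionality
clf = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def fit_step(pos_states, neg_states):
    """One training step: positive examples labeled 1, negative examples 0."""
    x = torch.cat([pos_states, neg_states])
    y = torch.cat([torch.ones(len(pos_states), 1), torch.zeros(len(neg_states), 1)])
    loss = bce(clf(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

def learned_reward(state):
    """Reward = classifier's probability that `state` looks like a success."""
    with torch.no_grad():
        return torch.sigmoid(clf(state))

# Example with random stand-in data (positives shifted away from negatives):
fit_step(torch.randn(64, state_dim) + 2.0, torch.randn(64, state_dim))
print(learned_reward(torch.randn(5, state_dim)).squeeze())
```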
- A pre-defined reward function corresponds to hand-engineered, handcrafted, hard-coded, or AI-generated skills
- Imitation learning or offline RL can be seen as learning skills from data
- Learning reward from goals & demos
- Learning reward from human or AI preferences (see the sketch after this list)
- What about zero supervision?
    - unsupervised RL
    - skill discovery
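As a companion to the preference bullet above, here is a minimal sketch of Bradley-Terry-style preference learning (the form used in RLHF-like pipelines): the reward model is trained so that the probability of preferring segment A over B is the sigmoid of the reward difference. The names `seg_a`, `seg_b`, `prefer_a`, and the feature size are assumptions, and for brevity each segment is scored directly instead of summing per-step rewards.

```python
import torch
import torch.nn as nn

feat_dim = 16  # assumed segment feature dimensionality
reward_net = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(reward_net.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def preference_update(seg_a, seg_b, prefer_a):
    """P(A preferred) = sigmoid(r(A) - r(B)); maximize likelihood of labels.

    prefer_a is 1.0 where the human/AI rater chose segment A, else 0.0.
    """
    logits = reward_net(seg_a) - reward_net(seg_b)
    loss = bce(logits, prefer_a)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Example call with random stand-in comparison data:
print(preference_update(torch.randn(16, feat_dim),
                        torch.randn(16, feat_dim),
                        torch.ones(16, 1)))
```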
Reward Models