Inappropriate generalization of reward signals leading to inaccurate decisions
A phenomenon where the reward function learned during training is misinterpreted when applied to new situations, leading to unintended behaviors that deviate from the original objectives