RL exploration guided by high intrinsic rewards for unexperienced states or changes (e.g., unexpected events). Exploration methods (exploration policies) are learned across diverse environments using these intrinsic rewards, and the learned exploration policies can then be applied directly to new environments via transfer learning; a generic sketch of such an intrinsic bonus follows below.
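The notes do not specify the exact form of the intrinsic reward, but the idea can be illustrated with a generic prediction-error (curiosity-style) bonus: a learned forward model predicts the next state, and its error is paid out as reward, so unexperienced states and unexpected transitions score high. The module name `ForwardDynamicsBonus`, the network sizes, and the `beta` mixing coefficient are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn


class ForwardDynamicsBonus(nn.Module):
    """Hypothetical prediction-error intrinsic reward (not the paper's exact method).

    The agent receives a bonus proportional to how poorly a learned forward
    model predicts the next observation, so novel states/transitions yield
    high intrinsic reward while familiar ones fade toward zero as the model
    is trained on them.
    """

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 128):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, obs_dim),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor,
                next_obs: torch.Tensor) -> torch.Tensor:
        # Predict next observation from (obs, act); per-sample squared error
        # is returned as the intrinsic reward.
        pred = self.model(torch.cat([obs, act], dim=-1))
        return ((pred - next_obs) ** 2).mean(dim=-1)


# Usage sketch: mix intrinsic and extrinsic rewards when training the policy,
# e.g. r_total = r_extrinsic + beta * bonus(obs, act, next_obs), while the
# forward model is trained to minimize the same prediction error.
```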
In complex environments with sparse rewards, the method explores efficiently and captures important states and changes even when external rewards are rare, recording higher extrinsic returns than the standard baseline. In simple environments, however, exploration is over-induced: the agent takes actions misaligned with the task goal and converges to a suboptimal policy with low extrinsic return.
NeurIPS 2021