Guides exploration by assigning high intrinsic rewards to unexperienced states or changes (e.g., unexpected events) in RL Exploration. Exploration methods (exploration policies) are learned across diverse environments through these intrinsic rewards, and the learned exploration policies can then be applied directly to new environments via Transfer Learning.
In complex environments with sparse rewards, the agent explores efficiently and captures important states and changes even when external rewards are rare, recording higher extrinsic rewards than the standard version. In simple environments, however, the exploration bonus over-drives behavior away from the task goal, so the agent converges to a suboptimal policy with low extrinsic reward.
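A minimal sketch of such a change-based intrinsic bonus, assuming the reward is proportional to the change between consecutive latent states from some encoder; the function names, the norm-based change measure, and the mixing weight beta below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def change_bonus(phi_s, phi_s_next, scale=1.0):
    # Intrinsic reward grows with the magnitude of change between
    # consecutive latent states (an assumed norm-based change measure).
    return scale * float(np.linalg.norm(phi_s_next - phi_s))

def shaped_reward(r_ext, phi_s, phi_s_next, beta=0.1):
    # Total reward mixes the extrinsic signal with the change bonus;
    # beta is a hypothetical mixing weight, not taken from the paper.
    return r_ext + beta * change_bonus(phi_s, phi_s_next)

# Toy usage: with a sparse extrinsic reward of 0, the bonus alone drives exploration
phi_s = np.zeros(8)
phi_s_next = np.full(8, 0.5)
print(shaped_reward(0.0, phi_s, phi_s_next))
```

In this sketch, beta captures the trade-off described above: a large beta helps in sparse-reward environments but can drown out the extrinsic signal in simple ones.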
NeurIPS 2021
https://proceedings.neurips.cc/paper/2021/file/abe8e03e3ac71c2ec3bfb0de042638d8-Paper.pdf
World Model Agents with Change-Based Intrinsic Motivation
Sparse reward environments pose a significant challenge for reinforcement learning due to the scarcity of feedback. Intrinsic motivation and transfer learning have emerged as promising strategies...
https://openreview.net/forum?id=0io7gvXniL#discussion


Seonglae Cho