Controllability-aware Skill Discovery
We always seek new things
Learn which states are easy to control and which are hard to control → more reward for changing hard-to-control states (low transition probability, large distance in skill space)
Shrink the skill space for easy-to-control states, so easily controllable skills earn less reward
LSD with a log-probability distance
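The distance swap can be written as a constrained objective. This is a sketch under assumed notation, not something stated in this note: the skill vector z, the learned transition model q, and the additive constant are assumptions taken from a reading of the linked paper.

```latex
\max_{\pi,\,\phi}\;
\mathbb{E}_{z,\,s,\,s'}\!\left[ \big(\phi(s') - \phi(s)\big)^{\top} z \right]
\quad \text{s.t.} \quad
\lVert \phi(s') - \phi(s) \rVert \le d(s, s')

% LSD (assumed form): Euclidean distance between states
d^{\mathrm{LSD}}(s, s') = \lVert s' - s \rVert

% Controllability-aware variant (assumed form): negative log probability
d^{\mathrm{CSD}}(s, s') = -\log q(s' \mid s) + \text{const}
```

Under this reading, only the right-hand side of the constraint changes: a rare transition gets a large budget for moving in ϕ-space, and a common one gets a small budget.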
Easy-to-control and hard-to-control states are told apart by the transition probability:
- hard transition → small p(s′ ∣ s) → large ∥ϕ(s′) − ϕ(s)∥
- easy transition → high p(s′ ∣ s) → small ∥ϕ(s′) − ϕ(s)∥
Put more simply, we only run LSD on data that satisfies the condition above: easy transitions are only allowed to stay close in z-space, while hard transitions are allowed to move far in z-space.
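A toy numeric sketch of the condition above, assuming a diagonal Gaussian transition model. The function names, the Gaussian form, and all numbers are hypothetical choices for illustration; they are not taken from the paper's code.

```python
import numpy as np


def controllability_distance(s_next, mu, sigma):
    """Squared Mahalanobis distance of s_next under N(mu, diag(sigma^2)).

    Assumption: the transition model q(s'|s) is a diagonal Gaussian whose
    mean mu and std sigma were predicted from s. This quantity matches
    -log q(s'|s) up to scale and terms that do not depend on s_next.
    """
    return float(np.sum(((s_next - mu) / sigma) ** 2))


def constraint_violation(phi_s, phi_next, d):
    """How far the latent step exceeds its allowed distance d (0 if within)."""
    return max(0.0, float(np.linalg.norm(phi_next - phi_s)) - d)


def skill_reward(phi_s, phi_next, z):
    """LSD-style reward: latent displacement projected onto the skill z."""
    return float((phi_next - phi_s) @ z)


# Hypothetical model prediction for the current state s.
mu = np.array([0.0, 0.0])
sigma = np.array([1.0, 1.0])

# Easy transition: s' lands close to the predicted mean (high p(s'|s)).
d_easy = controllability_distance(np.array([0.1, 0.0]), mu, sigma)
# Hard transition: s' lands far from the predicted mean (small p(s'|s)).
d_hard = controllability_distance(np.array([3.0, 4.0]), mu, sigma)

# The same unit-length step in latent space for both transitions.
phi_s = np.zeros(2)
phi_next = np.array([0.6, 0.8])

viol_easy = constraint_violation(phi_s, phi_next, d_easy)  # exceeds the small budget
viol_hard = constraint_violation(phi_s, phi_next, d_hard)  # fits in the large budget

# Reward for a skill vector aligned with the latent step.
reward = skill_reward(phi_s, phi_next, z=np.array([0.6, 0.8]))
```

The same latent step violates the constraint for the easy transition but not for the hard one, so only hard transitions can keep a large displacement in z-space and the reward that comes with it.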

Controllability-Aware Unsupervised Skill Discovery
One of the key capabilities of intelligent agents is the ability to discover useful skills without external supervision. However, the current unsupervised skill discovery methods are often limited...
https://arxiv.org/abs/2302.05103


Seonglae Cho