Controllability-aware Skill Discovery
The agent should always seek new things.
Learn which states are easy to control and which are hard to control → give more reward for changing hard-to-control states (low transition probability, hence a large distance in skill space)
Shrink the skill space for easy-to-control states, so that easily controllable skills receive less reward.
LSD with a log-probability-based distance
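One way to write this objective, as a sketch. LSD maximizes the latent displacement along the skill z while bounding how far φ may move per transition. Here the bound becomes a distance d(s, s′) built from a learned transition density q_θ(s′ ∣ s), assumed Gaussian with mean μ_θ(s) and covariance Σ_θ(s). Plain LSD uses the Euclidean distance ‖s′ − s‖ in place of d. Comparing d with the squared latent norm is an assumption made here so both sides are squared quantities; the exact form and scaling used by CSD may differ.

```latex
\max_{\pi,\phi}\ \mathbb{E}_{z,(s,s')}\big[(\phi(s') - \phi(s))^\top z\big]
\quad \text{s.t.} \quad \|\phi(s') - \phi(s)\|_2^2 \le d(s, s')

d(s, s') = (s' - \mu_\theta(s))^\top \Sigma_\theta(s)^{-1} (s' - \mu_\theta(s))
         = -2 \log q_\theta(s' \mid s) - \log\det\big(2\pi \Sigma_\theta(s)\big)
```

The second line is the Gaussian log-density rearranged. A low-probability (hard) transition gives a large d, so φ is allowed to move far. A high-probability (easy) transition gives a small d, so φ must barely move.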
Learn which states are easy or hard to control from the transition probability p(s′ ∣ s):
- hard transition → small p(s′ ∣ s) → large ∥ϕ(s′) − ϕ(s)∥
- easy transition → high p(s′ ∣ s) → small ∥ϕ(s′) − ϕ(s)∥
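A minimal numerical sketch of the two bullets above, assuming a hypothetical toy Gaussian transition model with diagonal covariance. The state is `[agent_x, object_x]`: the agent moves freely (large variance), while the object rarely moves (tiny variance), so it is hard to control. The names `VAR`, `predict`, `csd_distance` and `neg_log_prob` are illustrative only and do not come from any CSD codebase.

```python
import numpy as np

# Hypothetical toy model q(s'|s) = N(mu(s), diag(VAR)) over [agent_x, object_x].
# Agent dimension: large variance (easy to change).
# Object dimension: tiny variance (hard to change).
VAR = np.array([1.0, 0.01])

def predict(s):
    """Toy density model: the next state is expected to equal the current one."""
    return s.copy(), VAR

def csd_distance(s, s_next):
    """Mahalanobis distance (s' - mu)^T Sigma^{-1} (s' - mu).

    For a Gaussian this equals 2 * (-log q(s'|s)) - log det(2*pi*Sigma),
    so low-probability transitions receive a large distance."""
    mu, var = predict(s)
    diff = s_next - mu
    return float(np.sum(diff ** 2 / var))

def neg_log_prob(s, s_next):
    """-log q(s'|s) under the same diagonal Gaussian."""
    mu, var = predict(s)
    diff = s_next - mu
    return float(0.5 * np.sum(diff ** 2 / var + np.log(2 * np.pi * var)))

s = np.array([0.0, 0.0])
easy = np.array([0.5, 0.0])  # agent moves by 0.5: likely under q
hard = np.array([0.0, 0.5])  # object moves by 0.5: unlikely under q

print(csd_distance(s, easy), csd_distance(s, hard))
```

Both transitions change the state by the same Euclidean amount, but the object move is about 100 times farther under this distance, because its transition probability is much lower.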
Put simply, this is LSD with a transition-dependent constraint instead of a fixed one: for easy transitions, φ(s) and φ(s′) must stay close in the latent (z) space, while for hard transitions they are allowed to be far apart.
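A sketch of how such a constraint could enter training, assuming a simple hinge penalty with a fixed, hypothetical weight `lam`. A dual-gradient method would adapt this weight instead, and the exact penalty CSD uses may differ. The LSD-style reward (φ(s′) − φ(s))ᵀz is kept as-is. `d` holds precomputed controllability-aware distances, for example from a Gaussian density model.

```python
import numpy as np

def skill_reward(phi_s, phi_next, z):
    """LSD-style reward (phi(s') - phi(s))^T z for each transition in a batch."""
    return np.sum((phi_next - phi_s) * z, axis=-1)

def penalized_objective(phi_s, phi_next, z, d, lam=10.0):
    """Reward minus a hinge penalty on ||phi(s') - phi(s)||^2 - d(s, s').

    d is small for easy transitions and large for hard ones, so the same
    latent displacement is penalized only when the transition was easy.
    lam is a hypothetical fixed penalty weight (assumption, not from CSD)."""
    sq_dist = np.sum((phi_next - phi_s) ** 2, axis=-1)
    violation = np.maximum(0.0, sq_dist - d)
    return skill_reward(phi_s, phi_next, z) - lam * violation

# Two transitions with the same latent displacement (length 2, aligned with z).
phi_s = np.zeros((2, 2))
phi_next = np.array([[2.0, 0.0], [2.0, 0.0]])
z = np.array([[1.0, 0.0], [1.0, 0.0]])
d = np.array([0.25, 25.0])  # easy transition, hard transition

print(penalized_objective(phi_s, phi_next, z, d))
```

The easy transition violates its small bound and is heavily penalized, while the hard transition keeps its full reward. Maximizing this objective therefore pushes φ to spread out only along hard-to-control changes.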