CSD

Controllability-aware Skill Discovery

We always seek new things

Learn which states are easy to control and which are hard to control → more reward for changing hard-to-control states (low transition probability, so a large distance is allowed in skill space)

Shrink distances in the skill space for easy-to-control states so that they earn less reward. (less reward for easily controllable skills)

LSD with log probability

Measure how easy or hard a transition is to control by its negative log-probability:

-\log p(s'|s)

||\phi(s') - \phi(s)|| \le -\log p(s'|s)

hard transition → small p(s′ ∣ s) → large bound, so ∥ϕ(s′) − ϕ(s)∥ can be large

easy transition → high p(s′ ∣ s) → small bound, so ∥ϕ(s′) − ϕ(s)∥ must be small
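
To make the bound above concrete, here is a minimal Python sketch. It assumes p(s′ ∣ s) is a diagonal Gaussian over the state change s′ − s; that modelling choice, and the names `gaussian_nll`, `satisfies_bound`, `mean_delta`, and `std`, are illustrative assumptions and are not taken from these notes.

```python
import math

def gaussian_nll(s, s_next, mean_delta, std):
    """-log p(s'|s), ASSUMING a diagonal Gaussian over the change s' - s.

    mean_delta and std stand in for the outputs of a fitted density model
    (hypothetical here; in practice they would be predicted from s).
    """
    nll = 0.0
    for x, x_next, mu, sigma in zip(s, s_next, mean_delta, std):
        resid = (x_next - x) - mu
        nll += 0.5 * (resid / sigma) ** 2 + math.log(sigma) + 0.5 * math.log(2 * math.pi)
    return nll

def satisfies_bound(phi_s, phi_next, nll):
    """Check ||phi(s') - phi(s)|| <= -log p(s'|s)."""
    return math.dist(phi_s, phi_next) <= nll

# The model expects the state to move by about 0.1 per step.
s, mean_delta, std = [0.0], [0.1], [1.0]
easy = gaussian_nll(s, [0.1], mean_delta, std)  # matches the prediction
hard = gaussian_nll(s, [3.1], mean_delta, std)  # far from the prediction
```

With these numbers `hard` is larger than `easy`, so the hard transition is allowed a larger step in ϕ-space. One caveat of this particular sketch: a continuous density can exceed 1, which makes the negative log-probability negative for small `std`, so a practical version would need a non-negative variant of the distance.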

Put more simply, we run LSD with ϕ constrained by the condition above: easy transitions must stay close together in z-space, while hard transitions are allowed to be far apart in z-space.
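
A rough Python sketch of how this could look during training, assuming the usual LSD-style reward (ϕ(s′) − ϕ(s))ᵀz and treating the condition above as a penalty on violations. The function names and the penalty form are assumptions for illustration, not details given in these notes.

```python
import math

def skill_reward(phi_s, phi_next, z):
    """LSD-style reward (assumed): movement in phi-space projected onto skill z."""
    return sum((b - a) * zi for a, b, zi in zip(phi_s, phi_next, z))

def constraint_violation(phi_s, phi_next, nll):
    """How far ||phi(s') - phi(s)|| exceeds -log p(s'|s); 0.0 when the bound holds."""
    return max(0.0, math.dist(phi_s, phi_next) - nll)

z = [1.0, 0.0]
phi_s, phi_next = [0.0, 0.0], [2.0, 0.0]

reward = skill_reward(phi_s, phi_next, z)
easy_violation = constraint_violation(phi_s, phi_next, nll=0.9)  # easy transition: step too large
hard_violation = constraint_violation(phi_s, phi_next, nll=5.0)  # hard transition: step allowed
```

The same step in ϕ-space violates the bound for the easy transition but not for the hard one, so training ϕ to reduce violations pulls easy transitions together and leaves room for hard transitions to earn a large reward.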

Controllability-Aware Unsupervised Skill Discovery

One of the key capabilities of intelligent agents is the ability to discover useful skills without external supervision. However, the current unsupervised skill discovery methods are often limited...

https://arxiv.org/abs/2302.05103
