Controllability-aware Skill Discovery
We always seek new things
Learn which states are easy to control and which are hard to control → more reward for changing hard-to-control states (low transition probability, so a large distance in skill space)
Shrink distances in skill space for easy-to-control states so that they earn less reward (less reward for easily controllable skills)
LSD with log probability
Learn which states are easy to control and which are hard to control by tying the distance in ϕ space to the transition probability:
- hard transition → small p(s′ ∣ s) → large ∥ϕ(s′) − ϕ(s)∥
- easy transition → high p(s′ ∣ s) → small ∥ϕ(s′) − ϕ(s)∥
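In other words, LSD is run only on data that satisfies the condition above: easy transitions are allowed only small displacements in z space, while hard transitions are allowed large displacements as well.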
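The condition above can be sketched in a few lines of NumPy. This is only an illustration under assumptions that are not in these notes: the transition model p(s′ ∣ s) is taken to be an isotropic Gaussian, the distance budget is the data-dependent part of −log p(s′ ∣ s), and the reward is the LSD-style projection (ϕ(s′) − ϕ(s))ᵀz. The names `transition_distance`, `constraint_violation`, `skill_reward`, and `predict_mean` are hypothetical.

```python
import numpy as np

def transition_distance(s, s_next, predict_mean, var=1.0):
    """Data-dependent part of -log p(s' | s) for an assumed isotropic Gaussian model.

    Low-probability (hard) transitions get a large value, likely (easy) ones a small value.
    """
    diff = np.asarray(s_next, dtype=float) - predict_mean(np.asarray(s, dtype=float))
    return float(np.sum(diff ** 2) / (2.0 * var))

def constraint_violation(phi_s, phi_next, budget):
    """Amount by which ||phi(s') - phi(s)|| exceeds the budget (0.0 if it fits)."""
    step = float(np.linalg.norm(np.asarray(phi_next, dtype=float) - np.asarray(phi_s, dtype=float)))
    return max(0.0, step - budget)

def skill_reward(phi_s, phi_next, z):
    """LSD-style reward: displacement in phi space projected onto the skill vector z."""
    delta = np.asarray(phi_next, dtype=float) - np.asarray(phi_s, dtype=float)
    return float(np.dot(delta, np.asarray(z, dtype=float)))

# Hypothetical dynamics model: predicts that the state stays where it is.
predict_mean = lambda s: s

s = np.array([0.0, 0.0])
easy_next = np.array([0.1, 0.0])   # close to the prediction -> high p(s' | s)
hard_next = np.array([2.0, 0.0])   # far from the prediction -> low p(s' | s)

easy_budget = transition_distance(s, easy_next, predict_mean)
hard_budget = transition_distance(s, hard_next, predict_mean)

# The same unit step in phi space fits the hard budget but violates the easy one.
phi_s, phi_next, z = [0.0, 0.0], [1.0, 0.0], [1.0, 0.0]
print("easy budget:", easy_budget, "violation:", constraint_violation(phi_s, phi_next, easy_budget))
print("hard budget:", hard_budget, "violation:", constraint_violation(phi_s, phi_next, hard_budget))
print("reward for that step:", skill_reward(phi_s, phi_next, z))
```

Because the reward grows with the displacement in ϕ space, and that displacement is capped by the budget, an easy transition can only ever earn a small reward while a hard transition can earn a large one.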