Diversity is all you need
Mutual information between skills and states can be maximized by maximizing the reward below (MI-based skill discovery).
Diversity-promoting reward function
We want skills and the states they reach to depend on each other, as measured by mutual information.
Reward states that are unlikely under other skills z' ≠ z, using DIAYN's pseudo-reward r_z(s) = log q(z|s) - log p(z).
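The goal of a diverse skill space is to make the skill probability q(z|s) differ sharply across skills: a state should be likely under the skill that produced it and unlikely under the others.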
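A minimal sketch of such a diversity-promoting reward, assuming it takes the DIAYN form log q(z|s) - log p(z), where q is a learned skill discriminator and p(z) is a fixed prior over skills. The function names, array shapes, and logits below are illustrative assumptions, not taken from these notes.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def diversity_reward(disc_logits, z, log_p_z):
    """Pseudo-reward log q(z|s) - log p(z) for a batch of states.

    disc_logits: (batch, num_skills) discriminator outputs, one row per state
    z:           (batch,) index of the skill that generated each state
    log_p_z:     (num_skills,) log prior over skills
    """
    log_q = log_softmax(disc_logits)
    rows = np.arange(len(z))
    return log_q[rows, z] - log_p_z[z]

num_skills = 4
log_p_z = np.full(num_skills, -np.log(num_skills))  # uniform prior (assumption)

# Hypothetical discriminator outputs for two states, both visited by skill 0.
disc_logits = np.array([
    [5.0, 0.0, 0.0, 0.0],  # state the discriminator clearly attributes to skill 0
    [0.0, 0.0, 0.0, 0.0],  # state that looks equally likely under every skill
])
z = np.array([0, 0])

rewards = diversity_reward(disc_logits, z, log_p_z)
print(rewards)
```

The first state earns a positive reward because it is distinctive for skill 0, while the second earns zero because it says nothing about which skill was active. With a uniform prior over K skills the reward is bounded above by log K.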
Problems
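Mutual information is scale-invariant: it depends only on whether skills can be told apart from their states, not on how far apart those states are. Naively maximizing mutual information therefore cannot encourage dynamic states, since skills that differ by tiny state changes already achieve the maximum.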
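A toy check of this problem, under simplifying assumptions that are not in the notes: four skills with a uniform prior, each ending deterministically in a single one-dimensional state. The helper names are hypothetical. Shrinking the distances between final states leaves the mutual information unchanged.

```python
import numpy as np

def mutual_information(joint):
    """I(S;Z) in nats from a joint probability table joint[z, s]."""
    p_z = joint.sum(axis=1, keepdims=True)  # marginal over skills, shape (K, 1)
    p_s = joint.sum(axis=0, keepdims=True)  # marginal over states, shape (1, N)
    mask = joint > 0                        # skip zero-probability cells
    ratio = joint[mask] / (p_z @ p_s)[mask]
    return float((joint[mask] * np.log(ratio)).sum())

def joint_from_final_states(final_states):
    """Skill z always ends in final_states[z]; p(z) is uniform."""
    states, state_idx = np.unique(final_states, return_inverse=True)
    state_idx = state_idx.reshape(-1)
    k = len(final_states)
    joint = np.zeros((k, len(states)))
    joint[np.arange(k), state_idx] = 1.0 / k
    return joint

far = np.array([0.0, 10.0, 20.0, 30.0])  # skills that travel far apart
near = far * 1e-4                        # same skills, barely moving

mi_far = mutual_information(joint_from_final_states(far))
mi_near = mutual_information(joint_from_final_states(near))
print(mi_far, mi_near)
```

Both cases reach the maximum log 4, so the objective gives no reason to prefer the far-reaching skills over the nearly static ones.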
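In the decomposition I(S;Z) = H(Z) - H(Z|S), the first term is automatically high when a uniform p(z) is used, and the second term, -H(Z|S), increases over the course of training through the choice of reward function.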
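The two terms can be written out as follows. This is a sketch using the standard variational lower bound; the discriminator q_\phi and the notation s \sim \pi(z) for states visited by the policy under skill z are assumptions introduced here.

```latex
\begin{align}
I(S;Z) &= H(Z) - H(Z \mid S) \\
       &= \mathbb{E}_{z \sim p(z),\, s \sim \pi(z)}
          \big[\log p(z \mid s) - \log p(z)\big] \\
       &\ge \mathbb{E}_{z \sim p(z),\, s \sim \pi(z)}
          \big[\log q_\phi(z \mid s) - \log p(z)\big]
\end{align}
```

The inequality holds because the KL divergence between p(z|s) and q_\phi(z|s) is non-negative. With a uniform p(z) over K skills, H(Z) = log K is already at its maximum, so training only needs to raise the log q_\phi(z|s) term, which is exactly what the per-step reward log q_\phi(z|s) - log p(z) encourages.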