DIAYN

The mutual information between skills and states can be maximized by maximizing the reward below (MI-based skill discovery).

We want skills and their desired states to depend on each other, which is what mutual information measures.
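
The formula that belonged here did not survive, so the following is a reconstruction rather than the original text. It assumes S is the state, Z the skill, p(z) the skill prior, and q_φ(z|s) a learned skill discriminator (the symbols S, Z and q_φ are assumptions, taken from the usual DIAYN notation). Mutual information splits into two terms, and replacing the intractable posterior p(z|s) with q_φ gives a lower bound:

$$
I(S; Z) = H(Z) - H(Z \mid S)
$$

$$
H(Z) - H(Z \mid S)
  = \mathbb{E}_{z \sim p(z),\, s \sim \pi(z)}\big[\log p(z \mid s) - \log p(z)\big]
  \ge \mathbb{E}_{z \sim p(z),\, s \sim \pi(z)}\big[\log q_\phi(z \mid s) - \log p(z)\big]
$$

The inequality holds because the gap between the two sides is an expected KL divergence between p(z|s) and q_φ(z|s), which is never negative.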

Reward states that are unlikely under other skills.
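
The reward described above can be sketched in a few lines, assuming the usual DIAYN form log q(z|s) - log p(z). The function name `diayn_reward`, the logits layout, and the toy numbers are all hypothetical and chosen only for illustration; the discriminator here is just an array of made-up outputs, not a trained network.

```python
import numpy as np

def diayn_reward(disc_logits, z, log_p_z):
    """Pseudo-reward log q(z|s) - log p(z) for a batch of states.

    disc_logits: (batch, n_skills) unnormalized discriminator outputs (hypothetical).
    z:           (batch,) integer index of the skill that produced each state.
    log_p_z:     (n_skills,) log of the skill prior.
    """
    # Numerically stable log-softmax over skills gives log q(z'|s) for every z'.
    shifted = disc_logits - disc_logits.max(axis=1, keepdims=True)
    log_q = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    idx = np.arange(len(z))
    # Pick the entry of the skill actually being executed, then subtract the prior.
    return log_q[idx, z] - log_p_z[z]

n_skills = 4
log_p_z = np.full(n_skills, -np.log(n_skills))  # uniform prior (assumption)

logits = np.array([
    [5.0, 0.0, 0.0, 0.0],  # state that clearly points to skill 0
    [0.0, 0.0, 0.0, 0.0],  # state that says nothing about the skill
])
z = np.array([0, 0])

rewards = diayn_reward(logits, z, log_p_z)
print(rewards)
```

A state that the discriminator attributes to the current skill, and therefore not to the others, earns a positive reward. A state that is equally likely under every skill earns zero under a uniform prior.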

What diversity should achieve is a difference between probabilities: a state should be much more probable under its own skill than under the others.

Mutual information is skill invariant so naively maximizing mutual information cannot encourage dynamic states.

The former becomes high automatically when a uniform p(z) is used; the latter rises over the course of training through the choice of reward function.
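
Assuming "the former" refers to the skill entropy H(Z), the claim about a uniform p(z) can be checked numerically. This is a minimal sketch; the `entropy` helper and the two example priors are hypothetical.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]  # skip zero-probability entries, which contribute nothing
    return float(-(nz * np.log(nz)).sum())

uniform = np.full(4, 0.25)               # uniform prior over 4 skills
skewed = np.array([0.7, 0.1, 0.1, 0.1])  # a non-uniform prior for comparison

h_uniform = entropy(uniform)
h_skewed = entropy(skewed)
print(h_uniform, h_skewed)
```

With a fixed number of skills, the uniform prior attains the largest possible entropy, log(n_skills), so that term needs no training. Only the discriminability term has to be learned.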