DIAYN

Created: 2024 May 22 1:29
Creator: Seonglae Cho
Edited: 2024 Jun 18 13:32
Diversity is all you need
DIAYN is a mutual-information-based (MI-based) skill discovery method: it maximizes the mutual information between skills and states by maximizing the reward below.

Diversity-promoting reward function

We want skills and the states they reach to depend on each other, which is measured by mutual information.
Reward a skill for visiting states that are unlikely under the other skills.
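The point of covering a diverse state space is to make skills distinguishable: for a state visited by one skill, the probability of that skill should clearly differ from the probability of every other skill.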
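A minimal sketch of such a reward, assuming the form used in the DIAYN paper, r = log q(z | s) - log p(z), where q is a learned skill discriminator and p(z) is a uniform prior. The function name `diayn_reward` and the example logits are hypothetical and only serve to illustrate the computation.

```python
import numpy as np

def diayn_reward(logits, skill, num_skills):
    """Pseudo-reward log q(z|s) - log p(z) for one state, assuming a uniform skill prior."""
    # Numerically stable log-softmax over the discriminator's skill logits.
    shifted = logits - np.max(logits)
    log_q = shifted - np.log(np.sum(np.exp(shifted)))
    log_p_z = -np.log(num_skills)  # uniform prior: p(z) = 1 / num_skills
    return float(log_q[skill] - log_p_z)

# Hypothetical discriminator outputs for a single state, with 4 skills.
confident = np.array([5.0, 0.0, 0.0, 0.0])  # state clearly identifies skill 0
ambiguous = np.zeros(4)                     # state is equally likely under every skill

r_confident = diayn_reward(confident, skill=0, num_skills=4)
r_ambiguous = diayn_reward(ambiguous, skill=0, num_skills=4)
```

Under these assumptions the reward is zero when the state says nothing about the skill and approaches log(num_skills) as the discriminator becomes certain, so a skill is only paid for reaching states that the other skills do not reach.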

Problems

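Mutual information is invariant to scaling (more generally, to any invertible transformation) of the state, so skills only need to be distinguishable, not far apart. Naively maximizing mutual information therefore does not encourage dynamic, far-reaching behaviors.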
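The invariance can be checked on a toy case. The sketch below assumes two skills with a uniform prior, each ending deterministically in a single final state; the helper names `mutual_information` and `joint_from_final_states` are made up for this illustration.

```python
import numpy as np

def mutual_information(joint):
    """I(S; Z) in nats from a joint table with rows = states and columns = skills."""
    joint = np.asarray(joint, dtype=float)
    p_s = joint.sum(axis=1, keepdims=True)  # marginal over states, shape (S, 1)
    p_z = joint.sum(axis=0, keepdims=True)  # marginal over skills, shape (1, Z)
    mask = joint > 0                        # skip zero cells to avoid log(0)
    return float(np.sum(joint[mask] * np.log(joint[mask] / (p_s @ p_z)[mask])))

def joint_from_final_states(final_states):
    """Joint p(s, z) when skill z (uniform prior) always ends in final_states[z]."""
    states = sorted(set(final_states))
    joint = np.zeros((len(states), len(final_states)))
    for z, s in enumerate(final_states):
        joint[states.index(s), z] = 1.0 / len(final_states)
    return joint

# Two skills that barely move vs. two skills that travel far apart.
mi_near = mutual_information(joint_from_final_states([-0.01, 0.01]))
mi_far = mutual_information(joint_from_final_states([-100.0, 100.0]))
```

Both settings give the same value, log 2, because the skills are perfectly distinguishable in each case. The objective alone gives no reason to prefer the far-reaching pair.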
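Mutual information decomposes as I(S; Z) = H(Z) - H(Z | S). The first term, H(Z), is maximized automatically by using a uniform p(z); the second term, -H(Z | S), increases over training through the choice of reward function.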
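A short derivation of why a uniform p(z) together with a discriminator-based reward raises the mutual information. This follows the standard variational argument; the discriminator q_φ and the skill-conditioned policy π are notation assumed here, not symbols taken from the note.

```latex
\begin{aligned}
I(S; Z) &= H(Z) - H(Z \mid S) \\
        &= \mathbb{E}_{z \sim p(z),\, s \sim \pi(z)}\big[\log p(z \mid s) - \log p(z)\big] \\
        &\ge \mathbb{E}_{z \sim p(z),\, s \sim \pi(z)}\big[\log q_\phi(z \mid s) - \log p(z)\big]
\end{aligned}
```

The inequality holds because the gap between the last two lines is an expected KL divergence between p(z | s) and q_φ(z | s), which is non-negative. With p(z) fixed to uniform, H(Z) is already at its maximum, and training the policy on the bracketed quantity as a reward pushes down H(Z | S).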