Transition Policy

Creator
Seonglae Cho
Created
2024 Jun 18 14:54
Edited
2024 Jun 18 14:54
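Naive fine-tuning of skills fails because each skill ends up with a very wide terminal state distribution, so the next skill often starts from states it was never trained on.
The reward for learning a transition policy is whether the following policy succeeds. Because that binary signal is sparse, a proximity predictor is used instead: it estimates how close a state is to the good initial states of the following policy, and its output serves as a proximity reward in place of the binary reward. The paper defines proximity as a value that decays with the number of steps remaining to a good initial state, with a decay factor that acts like a discounting factor, and provides the proximity reward at every step. The proximity predictor is learned from a success buffer and a failure buffer.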
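The proximity idea above can be sketched in a few lines of NumPy. This is only an illustration under assumptions that the note does not spell out: the label for a state is taken to be `delta ** steps_to_go`, the per-step reward is taken to be the change in predicted proximity, and states from the failure buffer are regressed toward 0. The function names and the default `delta=0.95` are hypothetical.

```python
import numpy as np


def proximity_targets(traj_len, delta=0.95):
    """Labels for a successful trajectory: delta ** (steps left to the final state).

    Assumption: the last state of a successful transition has proximity 1.
    """
    steps_to_go = np.arange(traj_len - 1, -1, -1)
    return delta ** steps_to_go


def proximity_rewards(predicted):
    """Dense reward per step, assumed to be the increase in predicted proximity."""
    predicted = np.asarray(predicted, dtype=float)
    return predicted[1:] - predicted[:-1]


def predictor_loss(pred_success, target_success, pred_failure):
    """Squared-error loss over both buffers.

    Success-buffer states regress to their proximity labels;
    failure-buffer states regress to 0 (an assumption of this sketch).
    """
    pred_success = np.asarray(pred_success, dtype=float)
    target_success = np.asarray(target_success, dtype=float)
    pred_failure = np.asarray(pred_failure, dtype=float)
    success_term = np.mean((pred_success - target_success) ** 2)
    failure_term = np.mean(pred_failure ** 2)
    return 0.5 * success_term + 0.5 * failure_term
```

With a difference-based reward, the per-step rewards of a trajectory sum to the final proximity minus the initial one, so the transition policy is pushed toward states the predictor rates as close to a good initial state without waiting for the binary outcome.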
