Transition Policy

Creator

Seonglae Cho

Created

2024 Jun 18 14:54

Editor

Seonglae Cho

Edited

2024 Jun 18 14:54

Naive fine-tuning of skills fails since the skills end up with very large terminal state distributions.

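The natural reward for training a transition policy is whether the following skill policy succeeds. Because that binary signal is sparse, the paper uses a proximity reward from a learned proximity predictor, which estimates how close a state is to the good initial states of the next policy. The paper defines proximity as $P(s) = \delta^{\text{step}}$, where $\delta$ acts like a discount factor and step is the number of timesteps needed to reach a good initial state, and it provides the proximity reward $P(s_{t+1}) - P(s_t)$ at every step. The proximity predictor is trained on states from a success buffer and a failure buffer.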
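
The labeling and reward computation can be illustrated with a minimal sketch. It assumes that proximity decays as delta ** step, where step is the number of timesteps remaining until the hand-off to the next policy, that states from failed trajectories are labeled 0, and that the per-step reward is the change in proximity between consecutive states. The function names, the delta value, and the trajectory layout are hypothetical and not taken from the paper's code.

```python
from typing import List

DELTA = 0.95  # assumed decay factor, playing the role of a discount factor


def proximity_targets(num_states: int, success: bool, delta: float = DELTA) -> List[float]:
    """Regression targets for the states of one transition trajectory.

    Assumption: in a successful trajectory the last state is a good initial
    state for the next policy (0 steps away), so its target is delta ** 0 = 1.
    States from a failed trajectory all get target 0.
    """
    if not success:
        return [0.0] * num_states
    return [delta ** (num_states - 1 - t) for t in range(num_states)]


def proximity_rewards(proximities: List[float]) -> List[float]:
    """Dense reward per step: change in proximity between consecutive states."""
    return [nxt - cur for cur, nxt in zip(proximities, proximities[1:])]


# Toy example with delta = 0.5 so the numbers are easy to check by hand.
targets = proximity_targets(4, success=True, delta=0.5)
rewards = proximity_rewards(targets)
print(targets)  # [0.125, 0.25, 0.5, 1.0]
print(rewards)  # [0.125, 0.25, 0.5]
```

In practice the proximities passed to the reward function would come from the learned predictor rather than from ground-truth labels; the targets above are only what the predictor would be regressed toward using the success and failure buffers.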
