Filtered Behavioral Cloning

Creator
Seonglae Cho
Created
2024 May 1 1:45
Edited
2024 May 31 3:13

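Can we avoid estimating values on out-of-distribution (OOD) actions altogether? Behavioral cloning (BC) does so, because it never learns a value function.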

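Plain BC imitates every trajectory in the dataset regardless of its quality. Filtered BC addresses this by imitating only the good trajectories. Its own limitation is that it uses reward only in a binary way (a trajectory is either good or bad) and treats all transitions within a kept trajectory equally.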
10% filtered BC, which imitates only the top 10% of trajectories, outperforms almost all other methods.
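The filtering step can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes each trajectory is a list of `(state, action, reward)` tuples, that "good" means a high undiscounted return, and the helper names `trajectory_return`, `filter_top_trajectories`, and `to_supervised_dataset` are hypothetical.

```python
import numpy as np


def trajectory_return(trajectory):
    """Undiscounted sum of rewards over one trajectory (an assumed quality score)."""
    return float(sum(reward for _, _, reward in trajectory))


def filter_top_trajectories(trajectories, fraction=0.1):
    """Keep the top `fraction` of trajectories ranked by return."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    returns = np.array([trajectory_return(t) for t in trajectories])
    n_keep = max(1, int(np.ceil(fraction * len(trajectories))))
    # argsort is ascending, so the last n_keep indices have the highest returns.
    top_indices = np.argsort(returns)[-n_keep:]
    return [trajectories[i] for i in top_indices]


def to_supervised_dataset(trajectories):
    """Flatten kept trajectories into (states, actions) arrays for BC.

    Rewards are dropped here: every kept transition is weighted equally,
    which is the binary use of reward described above.
    """
    states = np.array([s for t in trajectories for s, _, _ in t])
    actions = np.array([a for t in trajectories for _, a, _ in t])
    return states, actions
```

The resulting `states` and `actions` arrays can be passed to any supervised learner (a classifier for discrete actions, a regressor for continuous ones) to obtain the cloned policy.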
