Filtered Behavioral Cloning

Creator

Seonglae Cho

Created

2024 May 1 1:45

Editor

Seonglae Cho

Edited

2024 May 31 3:13

Refs

Behavior Cloning

Can we never estimate values on OOD actions? (BC)

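Plain BC imitates every trajectory in the dataset regardless of its quality. Filtered BC addresses this by imitating only the good trajectories. Its limitation is that reward is used only in a binary way (a trajectory is either good or bad), so all transitions within a kept trajectory are treated equally.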

10% filtered BC, which clones only the top 10% of trajectories ranked by return, reportedly outperforms almost all other methods.
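The filtering step can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the `(state, action, reward)` tuple layout, the undiscounted return, and the helper names `trajectory_return`, `filter_top_fraction`, and `to_bc_dataset` are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

# Assumed layout: a transition is (state, action, reward),
# and a trajectory is a list of transitions.
Transition = Tuple[Sequence[float], int, float]
Trajectory = List[Transition]


def trajectory_return(traj: Trajectory) -> float:
    """Undiscounted sum of rewards along one trajectory."""
    return sum(reward for _, _, reward in traj)


def filter_top_fraction(trajs: List[Trajectory], fraction: float = 0.1) -> List[Trajectory]:
    """Keep the top `fraction` of trajectories ranked by return (at least one)."""
    if not trajs:
        return []
    n_keep = max(1, int(len(trajs) * fraction))
    ranked = sorted(trajs, key=trajectory_return, reverse=True)
    return ranked[:n_keep]


def to_bc_dataset(trajs: List[Trajectory]) -> List[Tuple[Sequence[float], int]]:
    """Flatten kept trajectories into (state, action) pairs.

    The reward is dropped here, so every kept transition carries equal
    weight in the supervised BC loss.
    """
    return [(state, action) for traj in trajs for state, action, _ in traj]
```

The resulting `(state, action)` pairs would then be fed to an ordinary supervised learner. Note that the reward enters only through the keep-or-discard decision in `filter_top_fraction`, which is the binary use of reward described above.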
