Behavior Cloning

Creator
Seonglae Cho
Created
2024 Jan 8 9:19
Edited
2024 Apr 27 10:10

Simplest imitation learning

Train a policy with supervised learning on expert demonstration data, i.e. (state, action) pairs. Rewards and next states are not used for training.
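A minimal sketch of the idea, assuming a linear policy fitted by least squares. The expert controller `expert_policy`, its gain `K`, and the synthetic states are hypothetical stand-ins for real demonstrations, not part of the original note.

```python
import numpy as np

def expert_policy(states):
    # Hypothetical expert: a fixed linear controller (illustration only).
    K = np.array([[-1.0, -0.5]])
    return states @ K.T

def fit_bc_policy(states, actions):
    # Behavior cloning as plain regression: map states to expert actions.
    # No reward and no next state appear anywhere in the objective.
    W, *_ = np.linalg.lstsq(states, actions, rcond=None)
    return W

rng = np.random.default_rng(0)
demo_states = rng.normal(size=(500, 2))      # states visited by the expert
demo_actions = expert_policy(demo_states)    # expert labels for those states
W = fit_bc_policy(demo_states, demo_actions)

def bc_policy(states):
    # Cloned policy: predicts the action the expert would likely take.
    return states @ W
```

With noise-free demonstrations and a policy class that contains the expert, the fitted weights match the expert gain. In practice the policy is usually a neural network trained with a regression or classification loss, but the structure of the data and objective is the same.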
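Cannot handle Compounding Error: small mistakes move the agent into states that are absent from the demonstrations, where further mistakes accumulate.
DAgger (Dataset Aggregation) is efficient for learning, but it is hard to obtain expert labels in real time while the agent interacts with the environment (as in Online RL).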
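The DAgger loop can be sketched as follows, assuming a toy 1-D system. The dynamics, the expert `expert_action`, and the scalar policy weight `w` are all hypothetical choices made for illustration. The key point is that states come from rolling out the learner, while the labels always come from the expert.

```python
import numpy as np

def expert_action(x):
    # Hypothetical expert that can be queried on any state.
    return -0.5 * x

def rollout(w, rng, horizon=20):
    # Run the current learner policy a = w * x and record visited states.
    x = rng.normal()
    visited = []
    for _ in range(horizon):
        visited.append(x)
        x = x + w * x + 0.1 * rng.normal()   # assumed toy dynamics
    return np.array(visited)

def fit(states, actions):
    # 1-D least squares for the scalar policy weight.
    return float(states @ actions / (states @ states))

def dagger(n_iters=5, seed=0):
    rng = np.random.default_rng(seed)
    states = rng.normal(size=20)             # initial expert demonstrations
    actions = expert_action(states)
    w = fit(states, actions)
    for _ in range(n_iters):
        new_states = rollout(w, rng)             # learner's own state distribution
        new_actions = expert_action(new_states)  # expert relabels those states
        states = np.concatenate([states, new_states])    # aggregate the dataset
        actions = np.concatenate([actions, new_actions])
        w = fit(states, actions)                 # retrain on everything so far
    return w, len(states)

w, n = dagger()
```

The call to `expert_action` inside the loop is the expensive part in practice: it requires an expert who can label arbitrary states the learner reaches, which is why real-time expert data is the bottleneck.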
 
 

Limitations

  • Compounding errors
  • Multimodal demonstration data
  • Mismatch in observability between expert & agent
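The multimodal limitation can be illustrated with a toy case, assuming an expert who avoids an obstacle by steering either left (-1) or right (+1) equally often from the same state. The data and the constant-action policy are hypothetical; the sketch only shows what a mean-squared-error fit does with two modes.

```python
import numpy as np

# Hypothetical demonstrations from one state: half steer left, half steer right.
actions = np.array([-1.0, 1.0] * 50)

# Under mean-squared error, the best single prediction is the mean of the labels.
mse_optimal_action = actions.mean()

# Squared error of that prediction versus committing to one of the expert modes.
loss_mean = np.mean((actions - mse_optimal_action) ** 2)
loss_left = np.mean((actions - (-1.0)) ** 2)
```

Here the regression optimum is to steer straight ahead, an action the expert never takes, even though it has lower squared error than either real mode. Policies that can represent several modes (for example mixture or discretized action outputs) are a common way around this.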
 
 
 
