Behavior Cloning

Creator
Alan Jo
Created
2024 Jan 8 9:19
Editor
Alan Jo
Edited
2024 Apr 27 10:10

The simplest form of imitation learning

Train a policy with supervised learning on expert demonstration data, i.e. state-action pairs (the reward and next state are not used for training).
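A minimal sketch of this supervised step, assuming a linear policy fitted by least squares. The expert (`expert_gain`), the 2-dimensional state, and the helper names are hypothetical and chosen only for illustration. Note that the fit sees only (state, action) pairs, never rewards or next states.

```python
import numpy as np

def collect_expert_data(n_samples, rng):
    """Sample states and label them with a hypothetical linear expert."""
    states = rng.normal(size=(n_samples, 2))
    expert_gain = np.array([[1.0, 0.5]])  # assumed expert, for illustration only
    actions = states @ expert_gain.T      # shape (n_samples, 1)
    return states, actions

def behavior_cloning(states, actions):
    """Fit a linear policy a = s @ weights on (state, action) pairs only."""
    weights, *_ = np.linalg.lstsq(states, actions, rcond=None)
    return weights

rng = np.random.default_rng(0)
states, actions = collect_expert_data(200, rng)
weights = behavior_cloning(states, actions)

def policy(state):
    """Cloned policy: predicts the expert's action for a given state."""
    return state @ weights
```

In practice the linear model would usually be replaced by a neural network trained with a regression or classification loss, but the data flow stays the same.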
Behavior cloning cannot handle compounding error.
DAgger (Dataset Aggregation) is efficient for learning, but it is hard to obtain expert data in real time (see Online RL).
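The DAgger loop can be sketched as follows. This is a toy illustration under stated assumptions, not a reference implementation: the 1-D dynamics `s' = s + a`, the saturating `expert_action`, and the linear learner `a = w * s` are all hypothetical. The key point is that the expert must label the states the learner itself visits, which is why expert access is needed during training.

```python
import numpy as np

def expert_action(state):
    """Hypothetical expert: move toward zero, at most one unit per step."""
    return -np.clip(state, -1.0, 1.0)

def fit_policy(states, actions):
    """Supervised step: least-squares fit of a linear policy a = w * s."""
    s = np.asarray(states)
    a = np.asarray(actions)
    return float(s @ a / (s @ s))

def rollout(w, start_state, horizon):
    """Run the learner under toy dynamics s' = s + a; return visited states."""
    states, s = [], start_state
    for _ in range(horizon):
        states.append(s)
        s = s + w * s
    return states

def dagger(start_state=3.0, horizon=5, iterations=4):
    # Iteration 0: plain behavior cloning on the expert's own trajectory.
    states, s = [], start_state
    for _ in range(horizon):
        states.append(s)
        s = s + expert_action(s)
    actions = [expert_action(x) for x in states]
    w = fit_policy(states, actions)
    for _ in range(iterations):
        visited = rollout(w, start_state, horizon)       # states the learner reaches
        states += visited                                 # aggregate the dataset
        actions += [expert_action(x) for x in visited]    # expert relabels them
        w = fit_policy(states, actions)                   # retrain on everything
    return w, len(states)

w, n = dagger()
```

Each iteration adds `horizon` expert-labeled states drawn from the learner's own state distribution, so the training data gradually covers the situations the learner actually encounters.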
 
 

Limitations

  • Compounding errors
  • Multimodal demonstration data
  • Mismatch in observability between expert & agent
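To make the first limitation concrete, here is a toy model of compounding errors. It rests on a strong simplifying assumption: at each step on the expert's path the learner errs independently with probability `eps`, and once off the path it never recovers. The function name and parameters are hypothetical.

```python
def expected_mistake_steps(eps, horizon):
    """Expected number of steps spent off the expert's path.

    Toy assumption: each on-path step fails independently with probability
    eps, and once off the path the learner never recovers.
    """
    on_path = 1.0
    total = 0.0
    for _ in range(horizon):
        on_path *= 1.0 - eps    # probability of still being on the path
        total += 1.0 - on_path  # probability of being off the path at this step
    return total

for horizon in (10, 100):
    print(horizon, expected_mistake_steps(0.01, horizon))
```

Under this model, a 10x longer horizon costs far more than 10x as many off-path steps, because an early mistake affects every later step.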
 
 
 
