Behavior Cloning

Creator
Seonglae Cho
Created
2024 Jan 8 9:19
Editor
Edited
2024 Apr 27 10:10
Refs

Simplest imitation learning

Train a policy $\pi_\theta(a|s)$ with supervised learning on expert demonstrations $D = \{(s_0, a_0), (s_1, a_1), \dots\}$ (rewards and next states are not used for training):
$\min_\theta \mathbb{E}_{(s,a) \sim D}[-\log \pi_\theta(a|s)]$
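As a minimal sketch of this supervised objective, the snippet below fits a tabular softmax policy by gradient descent on the mean negative log-likelihood of the expert actions. The toy expert (action `s % 2`), the state and action counts, and the names `train_bc`, `nll`, and `greedy_action` are all assumptions made for illustration, not part of the original note.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_bc(demos, n_states, n_actions, lr=0.5, epochs=200):
    """Fit a tabular softmax policy by minimizing the mean negative log-likelihood."""
    logits = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(epochs):
        grads = [[0.0] * n_actions for _ in range(n_states)]
        for s, a in demos:
            probs = softmax(logits[s])
            for k in range(n_actions):
                # Gradient of -log pi(a|s) w.r.t. logit k is pi(k|s) - 1[k == a].
                grads[s][k] += probs[k] - (1.0 if k == a else 0.0)
        for s in range(n_states):
            for k in range(n_actions):
                logits[s][k] -= lr * grads[s][k] / len(demos)
    return logits

def nll(logits, demos):
    """Mean negative log-likelihood of the demonstrated actions."""
    return -sum(math.log(softmax(logits[s])[a]) for s, a in demos) / len(demos)

def greedy_action(logits, s):
    row = logits[s]
    return max(range(len(row)), key=lambda k: row[k])

# Hypothetical expert data: in state s the expert takes action s % 2.
# Only (state, action) pairs are used; no rewards, no next states.
demos = [(s, s % 2) for s in range(4)] * 10
policy_logits = train_bc(demos, n_states=4, n_actions=2)
```

With a neural network policy the structure stays the same: the table of logits becomes the network output and the update becomes an optimizer step on the same loss.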
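Behavior cloning cannot handle Compounding Error: small mistakes move the policy into states the demonstrations never covered, where it makes further mistakes.
DAgger (Dataset Aggregation) learns efficiently, but it needs expert labels in real time (Online Learning), which are hard to obtain.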
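The DAgger idea can be sketched roughly as follows: roll out the learner's current policy, ask the expert to label the states the learner actually visited, add those pairs to the dataset, and retrain. Everything concrete here is an assumption for illustration: the toy chain environment with a slip probability, the rule-based `expert_action`, and the majority-vote `fit_policy` standing in for a real supervised learner. The original algorithm also mixes expert and learner actions during rollouts; this sketch uses only the learner after the first iteration.

```python
import random
from collections import Counter, defaultdict

N_STATES, TARGET, HORIZON = 10, 5, 15
LEFT, STAY, RIGHT = 0, 1, 2

def expert_action(s):
    """Hypothetical expert: move toward TARGET, then stay there."""
    if s < TARGET:
        return RIGHT
    if s > TARGET:
        return LEFT
    return STAY

def step(s, a, rng, slip=0.2):
    """Toy chain dynamics; with probability `slip` a random action is taken instead."""
    if rng.random() < slip:
        a = rng.choice([LEFT, STAY, RIGHT])
    return min(N_STATES - 1, max(0, s + (a - 1)))

def fit_policy(dataset):
    """Supervised step: tabular majority vote over (state, action) pairs."""
    votes = defaultdict(Counter)
    for s, a in dataset:
        votes[s][a] += 1
    table = {s: c.most_common(1)[0][0] for s, c in votes.items()}
    return lambda s: table.get(s, STAY)  # unseen states fall back to STAY

def rollout(policy, rng, start=2):
    """Run `policy` for HORIZON steps and return the visited states."""
    s, visited = start, []
    for _ in range(HORIZON):
        visited.append(s)
        s = step(s, policy(s), rng)
    return visited

def dagger(n_iters=5, seed=0):
    rng = random.Random(seed)
    # Iteration 0: plain behavior cloning on one expert rollout.
    dataset = [(s, expert_action(s)) for s in rollout(expert_action, rng)]
    initial_states = {s for s, _ in dataset}
    policy = fit_policy(dataset)
    for _ in range(n_iters):
        # Visit states under the learner's own policy, then relabel them with the expert.
        dataset += [(s, expert_action(s)) for s in rollout(policy, rng)]
        policy = fit_policy(dataset)
    return policy, dataset, initial_states

policy, dataset, initial_states = dagger()
```

The call to `expert_action` inside the loop is the expensive part in practice: it stands for querying a human or another expert on states chosen by the learner, which is why real-time expert data is the bottleneck.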
 
 

Limitations

  • Compounding errors
  • Multimodal demonstration data
  • Mismatch in observability between expert & agent
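To illustrate the multimodal limitation, here is a small sketch with made-up data: in one and the same state, half of the demonstrations steer left and half steer right around an obstacle. A deterministic policy trained with mean squared error converges to the average of the two, which is an action no expert ever took. The data and the names `demo_actions`, `mse`, and `mode_probs` are hypothetical.

```python
# Hypothetical demonstrations for a single state: half of the experts steer
# left (-1.0) and half steer right (+1.0) to pass an obstacle straight ahead.
demo_actions = [-1.0] * 50 + [1.0] * 50

def mse(prediction, actions):
    """Mean squared error of a constant prediction against the demonstrated actions."""
    return sum((prediction - a) ** 2 for a in actions) / len(actions)

# The MSE-optimal constant prediction is the sample mean of the actions.
mean_action = sum(demo_actions) / len(demo_actions)

# A categorical policy over the observed modes keeps them separate instead.
modes = sorted(set(demo_actions))
mode_probs = {m: demo_actions.count(m) / len(demo_actions) for m in modes}
```

Here the regression policy steers straight ahead, into the obstacle, while a policy that models a distribution over actions can still sample either valid mode.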
 
 
 
