Simplest imitation learning
Train a policy with supervised learning on expert (state, action) data (rewards and next states are not used for training)
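A minimal sketch of this supervised training step, assuming a 1-D state, a linear policy, and a made-up expert `expert_action` (none of these come from the notes). It only illustrates that the fit uses (state, action) pairs and nothing else:

```python
# Hypothetical expert for illustration only: a = -0.5 * s.
def expert_action(state):
    return -0.5 * state


def fit_linear_policy(states, actions):
    """Least-squares fit of a = w * s + b from (state, action) pairs.

    No reward and no next state appear anywhere in this function.
    """
    n = len(states)
    mean_s = sum(states) / n
    mean_a = sum(actions) / n
    cov = sum((s - mean_s) * (a - mean_a) for s, a in zip(states, actions))
    var = sum((s - mean_s) ** 2 for s in states)
    w = cov / var
    b = mean_a - w * mean_s
    return w, b


# Expert demonstrations: states paired with the expert's actions.
states = [-2.0, -1.0, 0.0, 1.0, 2.0]
actions = [expert_action(s) for s in states]

w, b = fit_linear_policy(states, actions)


def policy(state):
    """The cloned policy: imitates the expert on the training distribution."""
    return w * state + b
```

In practice the linear fit would likely be replaced by a neural network trained with a regression or classification loss, but the data it consumes has the same shape.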
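Cannot correct compounding errors: a small mistake takes the agent to states not covered by the data, where further mistakes accumulate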
DAgger (Dataset Aggregation) learns efficiently, but it is hard to obtain expert labels in real time (it requires online interaction, as in online RL)
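The DAgger loop can be sketched roughly as follows. This is a toy version under stated assumptions: the expert `expert_action`, the dynamics `step`, and the linear policy class are all hypothetical. The point is that the expert is queried on states the learner itself visits, which is why the expert must be available during training:

```python
def expert_action(s):
    # Hypothetical expert, for illustration only.
    return -0.5 * s


def step(s, a):
    # Hypothetical deterministic dynamics.
    return s + a


def fit(states, actions):
    """Least-squares fit of a = w * s + b."""
    n = len(states)
    ms, ma = sum(states) / n, sum(actions) / n
    cov = sum((s - ms) * (a - ma) for s, a in zip(states, actions))
    var = sum((s - ms) ** 2 for s in states)
    w = cov / var
    return w, ma - w * ms


def rollout(policy, s0, horizon):
    """Return the states visited when running `policy` from s0."""
    visited, s = [], s0
    for _ in range(horizon):
        visited.append(s)
        s = step(s, policy(s))
    return visited


def dagger(s0, iterations, horizon):
    # Initial dataset: states from the expert's own rollout.
    states = rollout(expert_action, s0, horizon)
    actions = [expert_action(s) for s in states]
    w, b = fit(states, actions)
    for _ in range(iterations):
        # 1. Run the current learner to see which states it reaches.
        visited = rollout(lambda s: w * s + b, s0, horizon)
        # 2. Ask the expert to label those states (the costly online step).
        states += visited
        actions += [expert_action(s) for s in visited]
        # 3. Retrain on the aggregated dataset.
        w, b = fit(states, actions)
    return w, b, len(states)


w, b, n_samples = dagger(s0=4.0, iterations=3, horizon=5)
```

Step 2 is the part that is hard in practice: every iteration needs fresh expert labels for states chosen by the learner, not by the expert.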
Limitations
- Compounding errors
- Multimodal demonstration data
- Mismatch in observability between expert & agent
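The multimodal limitation can be illustrated with a tiny example, assuming a policy trained with mean-squared error and a made-up set of demonstrations at a single state (for example, an obstacle straight ahead that some experts pass on the left and others on the right):

```python
# Hypothetical demonstrations at one state:
# half the experts steer left (-1.0), half steer right (+1.0).
demo_actions = [-1.0, 1.0, -1.0, 1.0]


def mse(prediction, targets):
    """Mean-squared error of one predicted action against all demonstrations."""
    return sum((prediction - t) ** 2 for t in targets) / len(targets)


# For a single state, the MSE-minimizing prediction is the mean of the targets.
mse_optimal_action = sum(demo_actions) / len(demo_actions)

print(mse_optimal_action)                     # 0.0 (go straight, into the obstacle)
print(mse(mse_optimal_action, demo_actions))  # 1.0
print(mse(1.0, demo_actions))                 # 2.0 (a real expert mode scores worse)
```

The averaged action gets a lower loss than either action an expert actually took, so a unimodal regression policy is pushed toward behavior that no demonstration contains. Policies that model a distribution over actions are one common way around this.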