In the real world, the environment's true state is physically unavailable to the agent, so the pixel-based direction is fundamentally the right one.
Pixel-based representations, however, have poor sample efficiency.
Embedding whole trajectories can also be very hard, since raw pixel trajectories are heavy (high-dimensional).
Representation learning for RL
Usually the state representation is learned in a task-agnostic way, as in model-based RL.
- CURL (contrastive learning between augmented views of the same observation; see the InfoNCE sketch after this list)
- RAD (data augmentation tricks enable generating a lot of diversity from a limited dataset) == DrQ (concurrent work)
- Random crop worked best (translation + windowing); the translation component is what matters, since it induces translation invariance (see the random-crop sketch after this list).
- RAD improved pixel-based SAC enough to match or even surpass state-based SAC.
- Are there other methods that improve sample efficiency?
- SADA (handles both geometric and photometric augmentations)
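A minimal sketch of the CURL-style contrastive objective: a query encoder and a momentum key encoder embed two augmented crops of the same observation, and an InfoNCE loss with a learned bilinear similarity treats the matching pair as the positive and the rest of the batch as negatives. Function and variable names here are illustrative assumptions, not CURL's actual code.

```python
# Minimal CURL-style InfoNCE loss (illustrative names, not the paper's code).
import torch
import torch.nn.functional as F

def curl_infonce_loss(z_q, z_k, W):
    """z_q: (B, D) query embeddings from the online encoder.
    z_k: (B, D) key embeddings from the momentum encoder (detached).
    W:   (D, D) learned bilinear similarity matrix.
    Row i of z_k is the positive for row i of z_q; other rows are negatives."""
    logits = z_q @ W @ z_k.detach().t()                       # (B, B) similarities
    logits = logits - logits.max(dim=1, keepdim=True).values  # numerical stability
    labels = torch.arange(z_q.size(0), device=z_q.device)     # positives on the diagonal
    return F.cross_entropy(logits, labels)

# Usage with random tensors standing in for encoder outputs:
B, D = 32, 50
z_q = torch.randn(B, D)
z_k = torch.randn(B, D)
W = torch.randn(D, D, requires_grad=True)
loss = curl_infonce_loss(z_q, z_k, W)
```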
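And a minimal sketch of RAD-style random crop: each observation in the batch gets its own crop window, which is exactly the translation diversity noted above. The 100x100 -> 84x84 sizes mirror RAD's common DMControl setting, but the shapes here are assumptions for illustration.

```python
# RAD-style per-image random crop (sizes are illustrative assumptions).
import numpy as np

def random_crop(obs, out_size=84):
    """obs: (N, C, H, W) batch of pixel observations with H, W >= out_size.
    Each image is cropped at an independent random location, so the batch
    contains many translated views of the same underlying frames."""
    n, c, h, w = obs.shape
    tops = np.random.randint(0, h - out_size + 1, size=n)
    lefts = np.random.randint(0, w - out_size + 1, size=n)
    out = np.empty((n, c, out_size, out_size), dtype=obs.dtype)
    for i, (t, l) in enumerate(zip(tops, lefts)):
        out[i] = obs[i, :, t:t + out_size, l:l + out_size]
    return out

# Usage: render at 100x100, crop to 84x84 before feeding the SAC encoder.
batch = np.random.rand(32, 9, 100, 100).astype(np.float32)  # 3 stacked RGB frames
aug = random_crop(batch)  # (32, 9, 84, 84)
```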
Pixel Data Augmentation for RL
Pixel-based RL