Dreamer

The first working Dyna-style model-based RL implementation on pixel observations, using imagined rollouts as data augmentation.
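A minimal sketch of the Dyna-style idea in the note above: start from real states, roll a learned model forward under the current policy, and use the imagined transitions as extra training data. The names imagine_rollouts, toy_model, and toy_policy are hypothetical stand-ins, and the 1-D state is an assumption made only to keep the example small. Dreamer itself rolls out in a learned latent space rather than on raw states.

```python
def imagine_rollouts(start_states, model, policy, horizon):
    """Roll a model forward from real states and collect imagined transitions."""
    imagined = []
    for state in start_states:
        for _ in range(horizon):
            action = policy(state)
            next_state, reward = model(state, action)
            imagined.append((state, action, reward, next_state))
            state = next_state
    return imagined


# Toy stand-ins (assumptions, not Dreamer's networks):
# the state is one number, the action shifts it, reward is -|next_state|.
def toy_model(state, action):
    next_state = state + action
    return next_state, -abs(next_state)


def toy_policy(state):
    # Step toward zero by at most 1 per step.
    return -max(-1.0, min(1.0, state))


real_states = [3.0, -2.0]  # would be sampled from a replay buffer
batch = imagine_rollouts(real_states, toy_model, toy_policy, horizon=3)
```

Two real states with a horizon of 3 give six imagined transitions, which an actor/critic update could consume alongside or instead of real data.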

It reconstructs the original observation with a decoder to capture all details, unlike TD-MPC, so it is slower. Any on-policy algorithm can be used to train the actor and critic.
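The contrast between a decoder-based and a decoder-free world-model loss can be sketched as follows. This is a simplified illustration under stated assumptions: plain squared errors stand in for the likelihood and KL terms that the actual methods use, and the function names are hypothetical.

```python
def mse(predicted, target):
    """Mean squared error between two equal-length lists."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def reconstruction_based_loss(decoded_obs, obs, predicted_reward, reward):
    # Decoder-based: the decoder must reproduce every observation
    # dimension, so the cost grows with the size of the observation.
    return mse(decoded_obs, obs) + (predicted_reward - reward) ** 2


def decoder_free_loss(predicted_latent, target_latent, predicted_reward, reward):
    # Decoder-free: only the small latent vector and the reward are matched.
    return mse(predicted_latent, target_latent) + (predicted_reward - reward) ** 2


obs = [1.0, 0.0, 1.0, 0.0]          # stand-in for flattened pixels
decoded_obs = [0.5, 0.0, 1.0, 0.5]  # stand-in for the decoder output
loss_with_decoder = reconstruction_based_loss(decoded_obs, obs, 1.0, 0.5)
loss_without_decoder = decoder_free_loss([1.0, 2.0], [1.0, 1.0], 1.0, 0.5)
```

With real images the reconstruction term runs over every pixel, which is where the extra cost of the decoder-based approach comes from.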

The first case showing that model-based RL could be successful.

World Models for Physical Robot Learning

The first AI to mine diamonds in Minecraft without human data.

... lies in the architecture and objective function of the learned world model. DreamerV3 uses a modified objective function to learn a policy that can act effectively.

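The world model learns compressed representations of sensory inputs in complex environments, and predicts future representations and rewards for possible actions.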
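The two roles described above, compressing the input and predicting the next representation and reward for a candidate action, can be sketched with a toy linear model. ToyWorldModel, its fields, and the hand-picked weights are all hypothetical. A real world model would use learned neural networks and stochastic latents.

```python
from dataclasses import dataclass


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


@dataclass
class ToyWorldModel:
    encoder: list      # latent_dim x obs_dim
    dynamics: list     # latent_dim x (latent_dim + action_dim)
    reward_head: list  # latent_dim weights

    def encode(self, observation):
        # Compress the observation into a smaller latent vector.
        return matvec(self.encoder, observation)

    def predict(self, latent, action):
        # Predict the next latent and its reward without decoding to observations.
        next_latent = matvec(self.dynamics, latent + action)
        reward = sum(w * z for w, z in zip(self.reward_head, next_latent))
        return next_latent, reward


model = ToyWorldModel(
    encoder=[[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]],  # 4-D obs to 2-D latent
    dynamics=[[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]],           # action shifts latent[0]
    reward_head=[1.0, -1.0],
)
latent = model.encode([2.0, 4.0, 1.0, 1.0])
next_latent, reward = model.predict(latent, [0.5])
```

Calling predict repeatedly with different candidate actions yields imagined futures entirely in latent space, which is what makes policy learning inside the model cheap.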

The policy makes planning possible.