Decision Transformer

Creator: Seonglae Cho
Created: 2024 Mar 8 5:53
Edited: 2024 Oct 31 11:10

RL tasks can be framed as sequence modelling and solved with a transformer.

Tokenizes returns-to-go (sums of future rewards), states, and actions, and is trained to predict the next action from the preceding sequence.
  • A return-conditioned policy can be used for rollout:
  1. Set the initial return-to-go to the desired return
  2. After each step, subtract the received reward from the return-to-go
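The return-conditioned rollout can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's code: the trained transformer is replaced by a hypothetical `policy` callable, the environment by a hypothetical `env_step(state, action)` function, and `context_len` (how many past timesteps the model sees) is an assumed parameter. `returns_to_go` shows how the return tokens for offline training trajectories would be computed, and `interleave` shows the (return-to-go, state, action) token order.

```python
from typing import Callable, List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """R_hat_t = r_t + r_{t+1} + ... + r_T (suffix sums of the rewards)."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running += rewards[t]
        rtg[t] = running
    return rtg


def interleave(rtgs, states, actions) -> List[Tuple[str, object]]:
    """Flatten to (R_hat_1, s_1, a_1, R_hat_2, s_2, a_2, ...), one token per entry."""
    tokens: List[Tuple[str, object]] = []
    for r, s, a in zip(rtgs, states, actions):
        tokens += [("rtg", r), ("state", s), ("action", a)]
    return tokens


# Hypothetical stand-ins: `policy` replaces the trained transformer,
# `env_step(state, action) -> (next_state, reward, done)` replaces the environment.
Policy = Callable[[List[float], List[object], List[object]], object]
EnvStep = Callable[[object, object], Tuple[object, float, bool]]


def rollout(policy: Policy, env_step: EnvStep, initial_state: object,
            target_return: float, max_steps: int, context_len: int):
    rtgs: List[float] = [target_return]  # 1. start from the desired return
    states: List[object] = [initial_state]
    actions: List[object] = []
    total = 0.0
    for _ in range(max_steps):
        # Window over the last `context_len` timesteps; the current action is not known yet.
        lo = max(0, len(states) - context_len)
        action = policy(rtgs[lo:], states[lo:], actions[lo:])
        next_state, reward, done = env_step(states[-1], action)
        actions.append(action)
        total += reward
        if done:
            break
        rtgs.append(rtgs[-1] - reward)  # 2. subtract the reward just received
        states.append(next_state)
    return total, rtgs


# Toy usage (hypothetical): the policy always moves +1, each step pays 1.0,
# and the episode ends once the state reaches 3.
seen = []

def toy_policy(rtgs, states, actions):
    seen.append(len(states))  # record how many timesteps the policy sees
    return 1

def toy_env(state, action):
    nxt = state + action
    return nxt, 1.0, nxt >= 3

total, rtgs = rollout(toy_policy, toy_env, 0, target_return=3.0,
                      max_steps=10, context_len=2)
print(total, rtgs)  # 3.0 [3.0, 2.0, 1.0]
```

In the actual model these tokens are embedded and fed to a causal transformer that predicts each action from the tokens before it; the sketch only covers the return-to-go bookkeeping and the context window.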
Works well on long-horizon and sparse-reward tasks compared to traditional RL
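A small illustration of why sparse or delayed rewards are less of an obstacle here (an intuition sketch, not the paper's analysis): returns-to-go are suffix sums of the rewards, so a reward that only arrives at the final step still appears in the return-to-go token of every earlier timestep. The model is therefore conditioned on the eventual outcome at each step, without a value function having to propagate it backwards.

```python
def returns_to_go(rewards):
    """Suffix sums: R_hat_t = r_t + r_{t+1} + ... + r_T."""
    out, running = [], 0.0
    for r in reversed(rewards):
        running += r
        out.append(running)
    return out[::-1]


# Sparse/delayed reward: only the final step is rewarded.
print(returns_to_go([0.0, 0.0, 0.0, 0.0, 1.0]))  # [1.0, 1.0, 1.0, 1.0, 1.0]

# A failed episode of the same length: every return-to-go token is 0.
print(returns_to_go([0.0] * 5))  # [0.0, 0.0, 0.0, 0.0, 0.0]
```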
Why a non-Markovian (history-conditioned) approach succeeds:
  • More supervision, and the MDP can differ per task
 
 
 
NeurIPS 2021
 
 
