Decision Transformer

Creator: Seonglae Cho
Created: 2024 Mar 8 5:53
Edited: 2025 Mar 6 16:30

RL tasks can be solved as sequence modelling with a transformer

Tokenizes returns-to-go (the sum of future rewards), states, and actions, and is trained to predict the next action from this sequence
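  • The return-conditioned policy can be used for rollout:
  1. Condition on a desired target return $R_0$
  2. After each step, subtract the received reward from the return-to-go before predicting the next action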
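A minimal pure-Python sketch of the rollout steps, assuming a toy setup. The names `returns_to_go`, `interleave`, `rollout`, the chain environment (`chain_reset`, `chain_step`) and `threshold_policy` are all hypothetical. In the actual Decision Transformer the policy is a causal transformer over the interleaved tokens; here it is replaced by a hand-written rule so that the return-conditioning loop runs on its own.

```python
from typing import Callable, List, Sequence, Tuple

def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Return-to-go at step t: sum of rewards from t to the end of the episode."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg

def interleave(rtgs, states, actions) -> List[Tuple[str, object]]:
    """Flatten a trajectory into the token order (R_1, s_1, a_1, R_2, s_2, a_2, ...)."""
    tokens: List[Tuple[str, object]] = []
    for r, s, a in zip(rtgs, states, actions):
        tokens.extend([("rtg", r), ("state", s), ("action", a)])
    return tokens

# Hypothetical policy signature: (returns-to-go, states, past actions) -> next action.
Policy = Callable[[List[float], List[int], List[int]], int]

def rollout(policy: Policy,
            reset: Callable[[], int],
            step: Callable[[int, int], Tuple[int, float, bool]],
            target_return: float,
            max_steps: int):
    """Return-conditioned rollout: start from R_0 and subtract each received reward."""
    rtgs = [target_return]  # step 1: condition on the desired return R_0
    states = [reset()]
    actions: List[int] = []
    total = 0.0
    for _ in range(max_steps):
        action = policy(rtgs, states, actions)
        actions.append(action)
        next_state, reward, done = step(states[-1], action)
        total += reward
        if done:
            break
        rtgs.append(rtgs[-1] - reward)  # step 2: subtract the reward just received
        states.append(next_state)
    return total, rtgs, states, actions

# Toy environment (hypothetical): state = time step, reward = action (0 or 1), 5 steps long.
def chain_reset() -> int:
    return 0

def chain_step(state: int, action: int) -> Tuple[int, float, bool]:
    next_state = state + 1
    return next_state, float(action), next_state >= 5

# Stand-in for the transformer: take the rewarding action only while return-to-go remains.
def threshold_policy(rtgs: List[float], states: List[int], actions: List[int]) -> int:
    return 1 if rtgs[-1] > 0 else 0

total, rtgs, states, actions = rollout(threshold_policy, chain_reset, chain_step, 3.0, 10)
print(total, rtgs, actions)  # 3.0 [3.0, 2.0, 1.0, 0.0, 0.0] [1, 1, 1, 0, 0]
```

Asking for a target of 3.0 yields exactly 3.0 here only because the toy policy reads the remaining return directly; a trained model just approximates this. In this run the recorded returns-to-go also equal `returns_to_go` applied to the rewards actually collected, which is the same quantity used to build the training tokens.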
Works well on long-horizon and sparse-reward tasks compared to traditional RL
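One intuition for the sparse-reward case, sketched under the assumption that returns-to-go are computed as reverse cumulative sums of rewards: a single terminal reward is copied into the conditioning token of every earlier step, so the model sees it directly instead of having to propagate it backwards through bootstrapping. The function name `returns_to_go` and the two example reward sequences are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence

def returns_to_go(rewards: Sequence[int]) -> List[int]:
    """Reverse cumulative sum: the return-to-go at step t is sum(rewards[t:])."""
    return list(accumulate(reversed(rewards)))[::-1]

dense = [1, 1, 1, 1, 1]   # reward at every step
sparse = [0, 0, 0, 0, 5]  # same total return, delivered only at the end

print(returns_to_go(dense))   # [5, 4, 3, 2, 1]
print(returns_to_go(sparse))  # [5, 5, 5, 5, 5]
```

In the sparse case the very first token already carries the full return of 5, even though no reward has been observed yet.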
Challenges the Markovian approach (conditioning only on the current state)
  • Conditioning on the whole sequence provides more supervision, and the MDP can differ per task
NeurIPS 2021
synthetic data training
