Decision Transformer

Creator
Seonglae Cho
Created
2024 Mar 8 5:53
Edited
2024 May 31 3:33
Tokenizes states, actions, and rewards (as returns-to-go) into a single sequence, but the training objective focuses on accurately predicting future actions from that sequence
  • The return-conditioned policy can be used for policy rollout
  1. Condition on the desired return
  2. After each step, subtract the received reward from the remaining return-to-go
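
The rollout steps above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `toy_policy` and `toy_env_step` are hypothetical stand-ins for a trained sequence model and a real environment, and the `Policy` / `EnvStep` signatures are assumptions made only for this example.

```python
from typing import Callable, List, Tuple

# Hypothetical signatures, chosen only for this sketch.
Policy = Callable[[List[float], List[int], List[int]], int]
EnvStep = Callable[[int, int], Tuple[int, float, bool]]


def returns_to_go(rewards: List[float]) -> List[float]:
    """Undiscounted sum of rewards from each step to the end of the episode."""
    out: List[float] = []
    running = 0.0
    for r in reversed(rewards):
        running += r
        out.append(running)
    return out[::-1]


def rollout(
    policy: Policy,
    env_step: EnvStep,
    state: int,
    desired_return: float,
    max_steps: int,
) -> Tuple[float, List[float], List[int]]:
    """Return-conditioned rollout: start from the desired return, then
    subtract each received reward to get the next return-to-go."""
    rtgs, states, actions = [desired_return], [state], []
    total = 0.0
    for _ in range(max_steps):
        # The policy sees the history of returns-to-go, states, and actions.
        action = policy(rtgs, states, actions)
        state, reward, done = env_step(state, action)
        actions.append(action)
        total += reward
        if done:
            break
        rtgs.append(rtgs[-1] - reward)  # step 2: subtract the reward
        states.append(state)
    return total, rtgs, actions


def toy_policy(rtgs: List[float], states: List[int], actions: List[int]) -> int:
    """Stand-in for a trained model: act (1) while return is still owed."""
    return 1 if rtgs[-1] > 0 else 0


def toy_env_step(state: int, action: int) -> Tuple[int, float, bool]:
    """Stand-in environment: reward equals the action, episode never ends."""
    return state + 1, float(action), False


total, rtgs, actions = rollout(toy_policy, toy_env_step, 0, 3.0, 5)
print(total, rtgs, actions)
```

In training, `returns_to_go` would be applied to logged reward sequences to build the return tokens; at rollout time the same quantity is maintained incrementally, since future rewards are not yet known.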
 
 
Works well on long-horizon and sparse-reward tasks compared to traditional RL
Does success of Mar false approach
  • More supervision, and the MDP can differ per task
 
 
 
 
 
 
NeurIPS 2021
 
 
