RL tasks can be solved by casting them as transformer sequence modelling (Decision Transformer)
Tokenizes states, actions, and returns-to-go, and trains the transformer to predict the next action from these sequences
- Compute the return-to-go at each timestep and train on the resulting trajectory sequences (a sketch of the computation follows)
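A minimal sketch of the return-to-go computation, assuming a per-timestep reward array; the function name `returns_to_go` is illustrative, not from the paper's code. Decision Transformer conditions on undiscounted returns, i.e. suffix sums of the rewards:

```python
import numpy as np

def returns_to_go(rewards: np.ndarray) -> np.ndarray:
    """Suffix sums of the reward sequence: RTG_t = sum over t' >= t of r_t'."""
    return np.cumsum(rewards[::-1])[::-1].copy()

# Example: rewards [1, 0, 2] -> returns-to-go [3, 2, 2]
print(returns_to_go(np.array([1.0, 0.0, 2.0])))
```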
- The return-conditioned policy can be used for policy rollouts (see the sketch after this list)
  - Prompt with the desired return at the start of the episode
  - After each step, subtract the observed reward from the return-to-go
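A sketch of the evaluation-time rollout loop under these assumptions: `model.predict` is a hypothetical interface that maps the recent (return-to-go, state, action) context to the next action, `env` follows the Gymnasium step API, and `target_return` and `context_len` are placeholder parameters:

```python
import numpy as np

def rollout(model, env, target_return: float, context_len: int = 20) -> float:
    """Roll out a return-conditioned policy, Decision Transformer style."""
    obs, _ = env.reset()
    rtg, states, actions = [target_return], [obs], []
    done, total_reward = False, 0.0
    while not done:
        # Condition on the most recent K timesteps of (RTG, state, action) tokens.
        action = model.predict(
            np.array(rtg[-context_len:]),
            np.array(states[-context_len:]),
            np.array(actions[-context_len:]) if actions else None,
        )
        obs, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        total_reward += reward
        # Decrement the return-to-go by the reward actually received,
        # so the conditioning stays consistent with training.
        rtg.append(rtg[-1] - reward)
        states.append(obs)
        actions.append(action)
    return total_reward
```

The key step is the decrement of the return-to-go after every environment step; the remaining desired return is what the model is conditioned on at the next timestep.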
Works well on long-horizon, sparse-reward tasks compared to traditional RL
What explains the success of this approach?
- More supervision (a prediction target at every timestep), and the MDP can differ per task
NeurIPS 2021