
Vision Language Model augmented with additional behavior tokens
- Decoder as Policy
Every 128 steps, OmniJARVIS is forced to reason again and produce new behavior tokens with the latest observation.
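
The re-reasoning schedule can be sketched as a simple control loop. This is a minimal illustration, not OmniJARVIS's actual code: `reason`, `decode_action`, and `step_env` are hypothetical callables standing in for the vision language model, the policy decoder, and the environment. Only the 128-step interval comes from the note.

```python
from typing import Any, Callable, List

REPLAN_INTERVAL = 128  # from the note: reason again every 128 steps


def run_episode(
    reason: Callable[[Any], Any],              # hypothetical: observation -> behavior tokens
    decode_action: Callable[[Any, Any], Any],  # hypothetical: (behavior tokens, observation) -> action
    step_env: Callable[[Any], Any],            # hypothetical: action -> next observation
    first_obs: Any,
    max_steps: int,
) -> List[int]:
    """Run one episode and return the steps at which re-reasoning happened."""
    obs = first_obs
    behavior_tokens = None
    replan_steps: List[int] = []
    for t in range(max_steps):
        # At the start and every 128 steps after, produce new behavior
        # tokens from the latest observation.
        if t % REPLAN_INTERVAL == 0:
            behavior_tokens = reason(obs)
            replan_steps.append(t)
        # Between re-reasoning points, the decoder acts as the policy,
        # conditioned on the current behavior tokens.
        action = decode_action(behavior_tokens, obs)
        obs = step_env(action)
    return replan_steps
```

With stub callables and `max_steps=300`, re-reasoning happens at steps 0, 128, and 256.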

https://arxiv.org/pdf/2407.00114
Seonglae Cho
