TLI

TLI (Text Latent Interpolation) to construct Task Latent

The paper analyzes why trained tasks cannot be combined to perform new tasks (e.g., "placing cream cheese on a bowl") and proposes methods to enable this by manipulating the model's internal representations.

The analysis shows that VLAs (vision-language-action models) do not actually understand object semantics. Instead, they form fixed associations between object names and the locations those objects occupied in training scenes (spatial overfitting). In other words, to the model the term "cream cheese" refers not to actual cream cheese, but to "whatever object was at that location during training."
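
As a rough illustration of what "interpolating" two latents could look like, the sketch below blends two task latent vectors with a weight that rises over an episode. This note does not give the paper's actual formulation, so everything here is an assumption: the function names `interpolate_latents` and `linear_schedule`, the plain linear blend, and the linear schedule are all hypothetical, and the latents are plain lists of floats rather than real model activations.

```python
from typing import List


def interpolate_latents(latent_a: List[float], latent_b: List[float], alpha: float) -> List[float]:
    """Blend two latent vectors as (1 - alpha) * a + alpha * b (hypothetical helper)."""
    if len(latent_a) != len(latent_b):
        raise ValueError("latents must have the same dimension")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(latent_a, latent_b)]


def linear_schedule(step: int, total_steps: int) -> float:
    """Assumed schedule: weight goes linearly from 0 at the first step to 1 at the last."""
    if total_steps <= 1:
        return 1.0
    return min(max(step / (total_steps - 1), 0.0), 1.0)


# Toy latents standing in for two trained tasks (made-up values).
task_a = [0.0, 2.0]
task_b = [4.0, 6.0]

# Early in the episode the blend follows task A, later it follows task B.
blended_per_step = [
    interpolate_latents(task_a, task_b, linear_schedule(step, 5)) for step in range(5)
]
```

Under these assumptions the blended latent starts at the first task's latent and ends at the second's, which is one simple way to move from one task's behavior to another's within a single episode.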

https://arxiv.org/pdf/2505.03500