TLI (Text Latent Interpolation) to construct Task Latent
Analyzes why tasks learned during training cannot be composed into new tasks (e.g., "placing cream cheese on a bowl"), and proposes methods to enable this by manipulating the model's internal representations.
The analysis revealed that VLAs (vision-language-action models) do not actually understand object semantics. Instead, they form fixed associations between object names and the locations those objects occupied in training scenes (spatial overfitting). In other words, the term "cream cheese" refers not to actual cream cheese, but to "whatever object was at that location during training."
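
As a rough illustration of what interpolating between latents can look like, here is a minimal sketch. It assumes each trained task is summarized by a fixed-length latent vector and that two such latents are blended linearly, with a weight that moves from the first task to the second over an episode. The function names, the linear schedule, and the toy vectors are all hypothetical and are not taken from the original method.

```python
from typing import List


def interpolate_latents(latent_a: List[float], latent_b: List[float], alpha: float) -> List[float]:
    """Linear blend of two latents: (1 - alpha) * latent_a + alpha * latent_b."""
    if len(latent_a) != len(latent_b):
        raise ValueError("latents must have the same dimension")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(latent_a, latent_b)]


def alpha_schedule(step: int, total_steps: int) -> float:
    """Hypothetical linear schedule: 0.0 at the first step, 1.0 at the last."""
    if total_steps < 2:
        return 1.0
    return min(max(step / (total_steps - 1), 0.0), 1.0)


# Toy latents standing in for two trained tasks (assumed values, not real data).
task_a_latent = [1.0, 0.0, 2.0]
task_b_latent = [0.0, 4.0, 2.0]

# One blended latent per control step of a 5-step episode.
total_steps = 5
blended = [
    interpolate_latents(task_a_latent, task_b_latent, alpha_schedule(t, total_steps))
    for t in range(total_steps)
]
```

In this sketch the episode starts with the first task's latent and ends with the second's, which is one simple way to picture composing "pick up the object from task A" with "place it at the target from task B". How the latents are extracted from the model and where they are injected is left out, since the note does not specify it.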

Seonglae Cho