VLA


Vision Language Action Model

A VLA must include an action decoder or low-level, vector-based control. If the model only generates a high-level action sequence, it is a VLM, not a VLA.
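A minimal sketch of that distinction, assuming a PyTorch setup: the action decoder head below maps a backbone's hidden state to a continuous low-level action vector (e.g. end-effector deltas plus gripper) instead of emitting text. All module names and dimensions are illustrative assumptions, not any specific model's API.

```python
import torch
import torch.nn as nn

class ActionDecoder(nn.Module):
    """Hypothetical head that turns VLM features into low-level robot actions."""
    def __init__(self, hidden_dim: int = 4096, action_dim: int = 7):
        super().__init__()
        # Small MLP: hidden state -> 7-DoF action (dx, dy, dz, droll, dpitch, dyaw, gripper)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, 512),
            nn.GELU(),
            nn.Linear(512, action_dim),
        )

    def forward(self, hidden_state: torch.Tensor) -> torch.Tensor:
        # hidden_state: (batch, hidden_dim) pooled backbone features for the current step
        return torch.tanh(self.mlp(hidden_state))  # bounded continuous action vector

# Usage sketch: actions = ActionDecoder()(vlm_hidden)  # (batch, 7) low-level commands
```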
Vision Language Action Models
Vision Language Action Notion

Steering

A VLA's Transformer FFN neurons still encode semantic concepts such as slow, fast, and up. By selectively activating these neurons (activation steering), robot behavior can be adjusted in real time without fine-tuning, rewards, or environment interaction. In both simulation (OpenVLA, LIBERO) and on real robots (UR5, Pi 0), behavioral characteristics such as speed and movement height change zero-shot. Semantically targeted neuron intervention is more effective than prompt modification or random intervention. VLAs therefore maintain interpretable semantic structure internally, which can be manipulated directly to control robot behavior transparently and immediately.
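A hedged sketch of how such steering could be wired up with PyTorch forward hooks: the layer path, neuron indices, and scale below are placeholders, assuming the relevant "slow"/"fast"/"up" neurons have already been identified (e.g. by probing activations) and are simply boosted at inference time with no fine-tuning.

```python
import torch

def make_steering_hook(neuron_ids, scale=5.0):
    """Return a forward hook that amplifies selected FFN neurons."""
    def hook(module, inputs, output):
        steered = output.clone()
        steered[..., neuron_ids] += scale  # additive boost on the chosen neurons
        return steered                     # returned tensor replaces the layer output
    return hook

def steer_vla(model, layer_idx, neuron_ids, scale=5.0):
    # Assumes a decoder-style transformer exposing model.layers[i].mlp.up_proj;
    # adjust this attribute path to the actual VLA implementation (e.g. OpenVLA).
    ffn = model.layers[layer_idx].mlp.up_proj
    return ffn.register_forward_hook(make_steering_hook(neuron_ids, scale))

# Usage sketch (identifiers are hypothetical):
# handle = steer_vla(vla_backbone, layer_idx=20, neuron_ids=[1187, 4052], scale=4.0)
# action = vla_policy.predict(image, instruction)  # behavior shifts, e.g. slower motion
# handle.remove()                                  # restore the original behavior
```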
 
 
