AirVLA

Creator: Seonglae Cho
Created: 2026 Apr 9 14:18
Edited: 2026 Apr 9 14:21
Transferring an existing Vision-Language-Action (VLA) foundation model to aerial platforms such as drones remains an open challenge due to a fundamental dynamics mismatch. Unlike quasi-static control on fixed-base robots, quadrotors are underactuated systems in which thrust and attitude are tightly coupled, so even small control errors can lead to large attitude deviations. This paper investigates whether a VLA foundation model pre-trained for manipulation can be transferred to an aerial manipulator.
We introduce a Payload-Aware Guidance mechanism that injects physical constraints into the flow-matching sampling process. The policy's conditional action distribution is defined through a learned velocity field: starting from initial noise, the ODE driven by this field is integrated to produce an action chunk. The guidance distribution combines this policy prior with the physical constraints and balances the two. Concretely, the payload guidance loss depends on a payload-confidence value computed from the gripper-command history and the current gripper state, and the velocity field is modified into a guided velocity field that incorporates this loss. This enables inference-time physical rewards without retraining the model weights.
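The sketch below illustrates guided flow-matching sampling in this spirit, with heavy assumptions. The paper's exact loss and guided field are not reproduced here. It assumes the common gradient-based form, where the guided field is `v - lambda * grad L`, a heuristic for approximately sampling from a prior reweighted by `exp(-lambda * L)`. `toy_velocity_field`, `payload_confidence`, `payload_loss_grad`, and `sample_action_chunk` are all hypothetical stand-ins. The toy field replaces the trained VLA network with a single-point action distribution, and both the confidence formula and the loss `0.5 * c * ||a||^2` are illustrative choices, not the authors'.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def toy_velocity_field(target: Sequence[float]) -> Callable[[Vector, float], Vector]:
    """Hypothetical stand-in for the learned velocity field.

    For a toy policy whose action distribution is a single point `target`,
    the flow-matching velocity along the straight noise-to-data path is
    (target - a) / (1 - tau).
    """
    def v(a: Vector, tau: float) -> Vector:
        return [(t - x) / (1.0 - tau) for t, x in zip(target, a)]
    return v


def payload_confidence(cmd_history: Sequence[float], gripper_state: float) -> float:
    """Hypothetical payload confidence in [0, 1].

    Fraction of recent 'close' gripper commands (1.0 = close), scaled by how
    closed the gripper currently is (1.0 = fully closed).
    """
    if not cmd_history:
        return 0.0
    closed_ratio = sum(cmd_history) / len(cmd_history)
    return max(0.0, min(1.0, closed_ratio * gripper_state))


def payload_loss_grad(a: Vector, confidence: float) -> Vector:
    """Gradient of a hypothetical payload loss L(a) = 0.5 * c * ||a||^2,
    which discourages aggressive actions when a payload is likely held."""
    return [confidence * x for x in a]


def sample_action_chunk(
    velocity: Callable[[Vector, float], Vector],
    noise: Vector,
    confidence: float,
    guidance_scale: float = 1.0,
    steps: int = 10,
) -> Vector:
    """Euler-integrate the (assumed) guided ODE
    da/dtau = v(a, tau) - guidance_scale * grad L(a)
    from tau = 0 (noise) to tau = 1 (action chunk). Model weights are untouched;
    only the sampling dynamics change."""
    a = list(noise)
    dt = 1.0 / steps
    for k in range(steps):
        tau = k * dt  # stays below 1, so the toy field never divides by zero
        v = velocity(a, tau)
        g = payload_loss_grad(a, confidence)
        a = [x + dt * (vi - guidance_scale * gi) for x, vi, gi in zip(a, v, g)]
    return a


if __name__ == "__main__":
    policy_v = toy_velocity_field([1.0, 0.5])
    c = payload_confidence([1.0, 1.0, 1.0, 1.0], gripper_state=1.0)
    plain = sample_action_chunk(policy_v, [0.0, 0.0], confidence=0.0)
    guided = sample_action_chunk(policy_v, [0.0, 0.0], confidence=c)
    print("unguided:", plain)
    print("guided:  ", guided)
```

With zero confidence the sampler recovers the policy's own action. With high payload confidence the same prior is pulled toward gentler, smaller-magnitude actions. Guidance lives entirely in the sampling loop, which is why it can be switched on at inference time without retraining.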
 
 
 
AirVLA (arxiv.org)
