AirVLA

Transferring an existing Vision-Language-Action (VLA) foundation model to aerial platforms such as drones remains an open challenge due to a fundamental dynamics mismatch. Unlike quasi-static control on fixed-base robots, quadrotors are underactuated systems in which thrust and attitude are tightly coupled, so even small control errors can lead to large attitude deviations. This paper investigates whether a VLA foundation model pre-trained for manipulation can be transferred to an aerial manipulator.

We introduce a Payload-Aware Guidance mechanism that injects physical constraints into the flow-matching sampling process. The policy's conditional action distribution is defined via a velocity field: starting from initial noise, we integrate the corresponding ODE to generate an action chunk. The guidance distribution balances the policy prior with physical constraints. Concretely, the payload guidance loss depends on a payload-confidence value computed from the gripper-command history and current gripper state. The velocity field is modified with a guidance term at sampling time, enabling inference-time physical rewards without re-training the model weights.
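The general idea of guiding a flow-matching sampler with a loss gradient can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's formulation: `base_velocity` is a toy stand-in for the learned velocity field, `payload_loss_grad` is the gradient of a hypothetical quadratic loss on one action channel, the confidence value is passed in directly rather than derived from gripper history, and the ODE is integrated with plain Euler steps.

```python
import numpy as np


def base_velocity(actions, tau):
    """Toy stand-in for a learned velocity field (hypothetical).

    A real policy would evaluate a neural network conditioned on
    observations; this one simply pulls every action toward zero.
    """
    return -actions


def payload_loss_grad(actions, confidence, z_ref):
    """Gradient of a hypothetical payload loss.

    Assumed loss: confidence * 0.5 * sum((actions[:, 2] - z_ref) ** 2),
    which penalizes deviation of channel 2 from a reference value.
    """
    grad = np.zeros_like(actions)
    grad[:, 2] = confidence * (actions[:, 2] - z_ref)
    return grad


def sample_guided(horizon=8, action_dim=4, steps=10, guidance_scale=1.0,
                  confidence=1.0, z_ref=0.5, seed=0):
    """Euler-integrate the guided ODE from noise to an action chunk."""
    rng = np.random.default_rng(seed)
    actions = rng.standard_normal((horizon, action_dim))  # initial noise
    dt = 1.0 / steps
    for k in range(steps):
        tau = k * dt
        v = base_velocity(actions, tau)
        # Guided velocity: policy prior minus the scaled loss gradient.
        v_guided = v - guidance_scale * payload_loss_grad(actions, confidence, z_ref)
        actions = actions + dt * v_guided
    return actions


if __name__ == "__main__":
    unguided = sample_guided(confidence=0.0)
    guided = sample_guided(confidence=1.0)
    print("unguided channel 2:", np.round(unguided[:, 2], 3))
    print("guided channel 2:  ", np.round(guided[:, 2], 3))
```

With a confidence of zero the guidance term vanishes and the sampler reduces to the unguided policy. A nonzero confidence changes only the penalized channel, and no model weights are updated at any point.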

https://arxiv.org/pdf/2603.25038

https://airvla.github.io/
