Hi Robot

A hierarchical VLA that splits a VLM (which understands location, images, and language) into two layers, high-level (reasoning) and low-level (action), so that robots can carry out complex instructions

High-Level VLM

Low-Level VLA
Pi 0 (π0)
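
The two-layer split can be sketched as a simple control loop. This is only a minimal illustration under stated assumptions, not the paper's implementation: `Observation`, `run_hierarchy`, `stub_high`, `stub_low`, and the `replan_every` parameter are all hypothetical names, the stubs stand in for the real VLM and VLA, and the idea that the high-level layer is queried less often than the low-level layer is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Observation:
    """Hypothetical stand-in for what the robot sees at one time step."""
    image: str  # placeholder for camera input
    step: int


# Hypothetical interfaces for the two layers.
# High level: (user instruction, observation) -> short language command.
HighLevel = Callable[[str, Observation], str]
# Low level: (language command, observation) -> action vector.
LowLevel = Callable[[str, Observation], List[float]]


def run_hierarchy(
    instruction: str,
    high: HighLevel,
    low: LowLevel,
    observations: Sequence[Observation],
    replan_every: int = 2,  # assumption: the high level runs less often
) -> List[Tuple[str, List[float]]]:
    """Run both layers over a sequence of observations.

    The high-level layer turns the instruction into a short command every
    `replan_every` steps; the low-level layer turns the current command
    into an action at every step.
    """
    trace: List[Tuple[str, List[float]]] = []
    command = ""
    for i, obs in enumerate(observations):
        if i % replan_every == 0:
            command = high(instruction, obs)
        action = low(command, obs)
        trace.append((command, action))
    return trace


def stub_high(instruction: str, obs: Observation) -> str:
    # Toy rule in place of VLM reasoning: choose a sub-task by step count.
    if obs.step < 2:
        return "pick up the bread"
    return "place the bread on the plate"


def stub_low(command: str, obs: Observation) -> List[float]:
    # Toy rule in place of the VLA: a 1-D "action" derived from the command.
    return [float(len(command))]


observations = [Observation(image=f"frame{i}", step=i) for i in range(4)]
trace = run_hierarchy("make me a sandwich", stub_high, stub_low, observations)
for command, action in trace:
    print(command, action)
```

The point of the design is that the low-level layer never sees the full user instruction, only the short command chosen by the high-level layer.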

twisting and error accumulation; failure patterns in the low-level action stage include a proximity bias, a tendency to grab nearby objects more often, which leads to incorrect grasps

https://arxiv.org/pdf/2502.19417