Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills
- Pre-train motion tracking policies in simulation
- Deploy the policies in the real world and collect real-world data to train a delta (residual) action model that compensates for the dynamics mismatch
- Fine-tune the pre-trained policies with the delta action model integrated into the simulator
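The effect of integrating a delta action model into the simulator can be sketched on a toy system. This is a minimal illustration, not the ASAP implementation: the 1D point mass, the actuator gains, and the closed-form `delta_action` are all made-up assumptions (in ASAP the delta action model is a learned network trained with PPO on real-world rollouts).

```python
import numpy as np

DT = 0.02  # integration step in seconds (illustrative value)

def step(state, action, gain):
    """One Euler step of a 1D point mass; state = (position, velocity)."""
    pos, vel = state
    vel = vel + gain * action * DT
    pos = pos + vel * DT
    return np.array([pos, vel])

def f_sim(state, action):
    # The simulator assumes an ideal actuator.
    return step(state, action, gain=1.0)

def f_real(state, action):
    # Stand-in for the real robot: a weaker actuator (hypothetical mismatch).
    return step(state, action, gain=0.8)

def delta_action(state, action):
    # Hypothetical stand-in for the learned delta action model.
    return -0.2 * action

def f_aligned(state, action):
    # Simulator with the delta action model integrated on the action side.
    return f_sim(state, action + delta_action(state, action))

def rollout(dynamics, actions, s0):
    """Replay a fixed action sequence through the given dynamics."""
    states = [s0]
    for a in actions:
        states.append(dynamics(states[-1], a))
    return np.stack(states)

rng = np.random.default_rng(0)
actions = rng.uniform(-1.0, 1.0, size=100)
s0 = np.zeros(2)

real = rollout(f_real, actions, s0)
err_before = np.abs(rollout(f_sim, actions, s0) - real).max()
err_after = np.abs(rollout(f_aligned, actions, s0) - real).max()
print(f"max state error, plain sim:   {err_before:.6f}")
print(f"max state error, aligned sim: {err_after:.6f}")
```

Replaying the same actions in the aligned simulator reproduces the "real" trajectory, which is the property the fine-tuning stage relies on.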
Method components
- PPO-based delta action learning with an asymmetric actor-critic framework
- Humanoid control is inherently a POMDP
- The critic network has access to privileged information such as the global positions of the reference motion and the root linear velocity
- The actor network relies solely on proprioceptive inputs and a time-phase variable
- This design not only enhances phase-based motion tracking during training but also enables a simple, phase-driven motion goal for sim-to-real transfer
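The split between actor and critic inputs can be sketched as two observation builders. This is only an illustration under assumptions: the function names, the array shapes, and the convention that the phase runs linearly from 0 at the start of the motion to 1 at its end are chosen here for clarity and are not taken from the ASAP code.

```python
import numpy as np

def time_phase(t, motion_length):
    """Phase variable in [0, 1]; assumed to grow linearly over the motion."""
    return float(min(max(t / motion_length, 0.0), 1.0))

def actor_obs(proprio, phase):
    """Actor input: proprioception plus the time phase only (deployable)."""
    return np.concatenate([proprio, [phase]])

def critic_obs(proprio, phase, ref_body_pos, root_lin_vel):
    """Critic input: the actor's input plus privileged simulator-only signals."""
    return np.concatenate([
        proprio,
        [phase],
        ref_body_pos.ravel(),  # global positions of the reference motion
        root_lin_vel,          # root linear velocity
    ])

# Hypothetical sizes: 10 proprioceptive values, 5 tracked bodies.
proprio = np.zeros(10)
ref_body_pos = np.zeros((5, 3))
root_lin_vel = np.zeros(3)
phase = time_phase(t=1.0, motion_length=4.0)

print(actor_obs(proprio, phase).shape)
print(critic_obs(proprio, phase, ref_body_pos, root_lin_vel).shape)
```

Because the critic is only needed during training, the privileged signals never have to be estimated on the robot; the deployed actor needs just onboard sensing and a clock.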
Robust Initialization
- Reference State Initialization (RSI): start episodes from randomly sampled states along the reference motion for robustness
- Termination curriculum on tracking tolerance: permit a larger tracking error in the initial training stage so that training stays flexible
- Domain Randomization for pretraining
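The three ingredients above can be sketched as small helpers. Everything numeric here is an assumption for illustration: the linear schedule, the start and end tolerances, and the randomization ranges are placeholders, since these notes only state that the tolerance is larger early in training.

```python
import numpy as np

def reference_state_init(ref_motion, rng):
    """RSI: pick a random frame of the reference motion as the start state."""
    idx = int(rng.integers(len(ref_motion)))
    return idx, ref_motion[idx]

def termination_tolerance(progress, start=1.5, end=0.3):
    """Tracking tolerance in meters, tightened linearly as progress goes 0 to 1.

    The linear schedule and the start/end values are illustrative assumptions.
    """
    p = float(np.clip(progress, 0.0, 1.0))
    return start + p * (end - start)

def should_terminate(body_pos, ref_body_pos, tolerance):
    """End the episode if any tracked body drifts beyond the tolerance."""
    err = np.linalg.norm(body_pos - ref_body_pos, axis=-1)
    return bool(err.max() > tolerance)

def sample_domain(rng):
    """Domain randomization: hypothetical parameter ranges for one episode."""
    return {
        "friction": float(rng.uniform(0.5, 1.5)),
        "mass_scale": float(rng.uniform(0.9, 1.1)),
    }

rng = np.random.default_rng(0)
ref_motion = np.zeros((200, 5, 3))        # frames x bodies x xyz (made-up shape)
start_idx, start_state = reference_state_init(ref_motion, rng)

drifted = start_state + np.array([0.5, 0.0, 0.0])  # every body 0.5 m off
print(should_terminate(drifted, start_state, termination_tolerance(0.0)))
print(should_terminate(drifted, start_state, termination_tolerance(1.0)))
print(sample_domain(rng))
```

The same 0.5 m drift is tolerated at the start of training and terminates the episode at the end, which is the intended effect of the curriculum.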
Fine-tuning policy
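DeltaDynamics learns a residual dynamics model that corrects the simulator's predicted next state, while the ASAP method learns a delta action model that corrects the action passed to the simulator. Because ASAP adjusts the action directly instead of compensating the residual dynamics, it is more effective at reducing compounding error.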
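The difference between the two approaches is easiest to see side by side. The notation below is an assumption made for these notes: $f^{\text{sim}}$ is the simulator's transition function, $f^{\Delta}_{\theta}$ a learned residual dynamics model, and $\pi^{\Delta}_{\theta}$ the learned delta action model.

```latex
\begin{align*}
\text{DeltaDynamics:}\quad & s_{t+1} = f^{\text{sim}}(s_t, a_t) + f^{\Delta}_{\theta}(s_t, a_t) \\
\text{ASAP:}\quad & s_{t+1} = f^{\text{sim}}\bigl(s_t,\; a_t + \pi^{\Delta}_{\theta}(s_t, a_t)\bigr)
\end{align*}
```

In the first form the correction is added to the simulator's output state; in the second the corrected action still passes through the simulator's own physics.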
ASAP (LeCAR-Lab) evaluates three transfer scenarios:
- IsaacGym to IsaacSim
- IsaacGym to Genesis
- IsaacGym to the real-world Unitree G1 humanoid