ASAP method

Creator: Seonglae Cho
Created: 2025 Feb 9 11:22
Edited: 2025 Feb 22 16:19
Refs:

Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills

  1. Pre-train motion tracking policies in simulation
  2. Deploy the policies in the real world and collect real-world data to train a delta (residual) action model that compensates for the dynamics mismatch
  3. Fine-tune the pre-trained policies in the simulator with the delta action model integrated
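
The effect of integrating a delta action model into the simulator can be illustrated with a toy example. The sketch below is not the ASAP implementation: it assumes the delta action is added to the policy action before the simulator step, uses a 1-D point mass, and hand-derives the residual instead of training it from real-world data. All function names and constants are hypothetical.

```python
# Toy 1-D point mass; every name and constant here is hypothetical.
DT = 0.02  # integration time step in seconds


def sim_step(state, action):
    """Idealized simulator: the action is applied directly as acceleration."""
    pos, vel = state
    vel = vel + action * DT
    pos = pos + vel * DT
    return (pos, vel)


def real_step(state, action):
    """Stand-in for the real robot: weaker actuator plus viscous friction."""
    pos, vel = state
    vel = vel + (0.8 * action - 0.5 * vel) * DT
    pos = pos + vel * DT
    return (pos, vel)


def delta_action(state, action):
    """Residual action for this toy, derived by hand.

    In ASAP this model is learned from real-world rollouts; here it is
    written in closed form only so that the example stays self-contained.
    """
    _, vel = state
    return -0.2 * action - 0.5 * vel


def aligned_sim_step(state, action):
    """Simulator step with the delta action model integrated (assumed form)."""
    return sim_step(state, action + delta_action(state, action))


def rollout(step_fn, actions, state=(0.0, 0.0)):
    """Apply a sequence of actions and return the final (pos, vel) state."""
    for a in actions:
        state = step_fn(state, a)
    return state
```

Rolling out the same action sequence through `sim_step`, `real_step`, and `aligned_sim_step` shows the plain simulator drifting away from the "real" system while the aligned simulator follows it, which is the property the fine-tuning stage relies on.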

Method components

  • PPO-based delta action learning with an asymmetric actor-critic framework
    • Humanoid control is inherently a POMDP
      • The critic network has access to privileged information such as the global positions of the reference motion and the root linear velocity
      • The actor network relies solely on proprioceptive inputs and a time-phase variable
    • This design not only enhances phase-based motion tracking during training but also enables a simple, phase-driven motion goal for sim-to-real transfer
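
The actor/critic split described above comes down to which observations each network receives. The following is a minimal sketch under assumed field names (`joint_pos`, `base_ang_vel`, and so on are hypothetical and not taken from the ASAP code); it only shows how the two observation vectors could be assembled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RobotState:
    # Proprioception: measurable on the real robot (field names are hypothetical).
    joint_pos: List[float]
    joint_vel: List[float]
    base_ang_vel: List[float]
    # Privileged information: only available inside the simulator.
    root_lin_vel: List[float]
    ref_global_pos: List[float]


def actor_obs(s: RobotState, phase: float) -> List[float]:
    """Actor input: proprioception plus a time-phase variable in [0, 1]."""
    return s.joint_pos + s.joint_vel + s.base_ang_vel + [phase]


def critic_obs(s: RobotState, phase: float) -> List[float]:
    """Critic input: the actor observation plus privileged simulator state."""
    return actor_obs(s, phase) + s.root_lin_vel + s.ref_global_pos
```

Because only the critic consumes the privileged fields, the critic can be discarded at deployment and the actor runs on signals the hardware can provide.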

Robust Initialization

  • Reference State Initialization: episodes start from randomly sampled initial states, which improves robustness
  • Termination curriculum on tracking tolerance: a larger tracking error is tolerated in the initial training stage, which keeps early training flexible
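
Both initialization ideas can be sketched in a few lines. This is an illustration under stated assumptions, not the schedule ASAP uses: sampling the start state by drawing a random phase of the reference motion, the linear tightening of the tolerance, and the placeholder values `start_tol` and `end_tol` are all choices made for this example.

```python
import random


def sample_initial_phase(rng: random.Random) -> float:
    """Draw a random phase in [0, 1) at which to start the episode."""
    return rng.random()


def reference_state_at(phase: float, reference_motion: list):
    """Return the reference frame that corresponds to the given phase."""
    idx = min(int(phase * len(reference_motion)), len(reference_motion) - 1)
    return reference_motion[idx]


def tracking_tolerance(step: int, total_steps: int,
                       start_tol: float = 1.5, end_tol: float = 0.3) -> float:
    """Linearly tighten the tolerance from start_tol to end_tol.

    The linear schedule and both default values are placeholders.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start_tol + frac * (end_tol - start_tol)


def should_terminate(tracking_error: float, step: int, total_steps: int) -> bool:
    """Terminate the episode once the error exceeds the current tolerance."""
    return tracking_error > tracking_tolerance(step, total_steps)
```

With these placeholder values, a tracking error of 1.0 is accepted at the start of training and causes termination at the end, so the policy is first allowed to explore and is later held to a stricter standard.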

Fine tuning policy

DeltaDynamics corrects the simulator with a learned residual dynamics term, while the ASAP method adjusts the action itself. Acting directly on the action is effective at reducing compounding error.
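
The contrast between the two approaches can be written as a pair of transition equations. The symbols in the original note did not survive, so the notation below is an assumption: $f^{\text{sim}}$ denotes the simulator dynamics, $f^{\Delta}_{\theta}$ a learned residual dynamics model, and $\pi^{\Delta}_{\theta}$ a learned delta action model.

```latex
% DeltaDynamics: add a learned residual to the simulator's predicted next state
s_{t+1} = f^{\text{sim}}(s_t, a_t) + f^{\Delta}_{\theta}(s_t, a_t)

% ASAP: add a learned delta action before stepping the simulator
s_{t+1} = f^{\text{sim}}\bigl(s_t,\; a_t + \pi^{\Delta}_{\theta}(s_t, a_t)\bigr)
```

In the second form the correction passes through the simulator itself, so the next state is always one the simulator can produce. This is one way to read the note's claim about compounding error.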

ASAP (LeCAR-Lab, updated 2025 Feb 22 13:55) evaluates three transfer scenarios:

  • IsaacGym to IsaacSim
  • IsaacGym to Genesis
  • IsaacGym to the real-world Unitree G1 humanoid