ASAP method

Creator: Seonglae Cho
Created: 2025 Feb 9 11:22
Edited: 2025 Feb 22 16:19

Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills

  1. Pre-train motion tracking policies in simulation.
  2. Deploy the policies in the real world and collect real-world data to train a delta (residual) action model that compensates for the Dynamics Mismatch.
  3. Fine-tune the pre-trained policies in the simulator with the delta action model integrated into it.
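The three stages can be illustrated with a deliberately tiny 1-D toy. Everything in it is an assumption made for illustration: the linear dynamics, the proportional "policy" `a = gain * (target - s)`, grid search standing in for PPO, and a closed-form least-squares fit standing in for the RL training of the delta action model. The function names (`f_sim`, `f_real`, `f_sim_with_delta`, `tracking_error`) are hypothetical; this is a sketch of the idea, not ASAP's implementation.

```python
def f_sim(s, a):
    """Toy simulator used for pre-training."""
    return s + a


def f_real(s, a):
    """Toy 'real' system: same form, but the actuator is 20% weaker."""
    return s + 0.8 * a


def tracking_error(step, gain, target=1.0, horizon=3):
    """Roll out the policy a = gain * (target - s) from s = 0 and return the final error."""
    s = 0.0
    for _ in range(horizon):
        s = step(s, gain * (target - s))
    return abs(target - s)


GAINS = [i / 100 for i in range(1, 201)]  # candidate policies: gains 0.01 ... 2.00

# Stage 1: pre-train in simulation (grid search stands in for PPO).
g_pre = min(GAINS, key=lambda g: tracking_error(f_sim, g))

# Stage 2: deploy on the "real" system and log (s, a, s_next) transitions.
data = []
s = 0.0
for _ in range(3):
    a = g_pre * (1.0 - s)
    s_next = f_real(s, a)
    data.append((s, a, s_next))
    s = s_next

# Fit a delta action model pi_delta(s, a) = k * a so that f_sim(s, a + pi_delta) ~ s_next.
# Because this toy f_sim is linear in a, the required correction is s_next - s - a and
# k has a closed-form least-squares solution; ASAP instead trains pi_delta with RL.
corrections = [s_next - s - a for s, a, s_next in data]
k = sum(a * c for (_, a, _), c in zip(data, corrections)) / sum(a * a for _, a, _ in data)


def f_sim_with_delta(s, a):
    """Simulator with the learned delta action model plugged in."""
    return f_sim(s, a + k * a)


# Stage 3: fine-tune the policy inside the corrected simulator (grid search again stands in for PPO).
g_ft = min(GAINS, key=lambda g: tracking_error(f_sim_with_delta, g))

print(f"pre-trained gain {g_pre:.2f}: real error {tracking_error(f_real, g_pre):.4f}")
print(f"fine-tuned gain  {g_ft:.2f}: real error {tracking_error(f_real, g_ft):.4f}")
```

In this toy the fitted correction is k ≈ -0.2, so the corrected simulator reproduces the weaker actuator, and the fine-tuned gain rises to about 1.25, which reaches the target that the pre-trained gain undershoots on the "real" system.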

Method components

  • PPO-based Delta Action Learning with an Asymmetric Actor-Critic framework
    • Humanoid control is inherently a POMDP (partially observable Markov decision process): on hardware the robot cannot directly observe quantities such as its global position
      • The critic network has access to privileged information such as the global positions of the reference motion and the root linear velocity
      • The actor network relies solely on proprioceptive inputs and a time-phase variable
    • This design not only improves phase-based motion tracking during training but also enables a simple, phase-driven motion goal for sim-to-real transfer, since the deployed actor needs only onboard sensing and the phase
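The asymmetric observation split can be sketched as two functions that build the actor and critic inputs from the same state. The exact proprioceptive features (joint positions and velocities, base angular velocity, projected gravity, last action) and the three-joint size are assumptions for brevity; only the privileged items (root linear velocity, global reference positions) and the time-phase variable come from the note. `RobotState`, `phase`, `actor_obs`, and `critic_obs` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RobotState:
    # Proprioceptive signals (available on the real robot); this feature set is an assumption.
    joint_pos: list[float]
    joint_vel: list[float]
    base_ang_vel: list[float]
    projected_gravity: list[float]
    # Privileged signals (simulation only), as listed in the note.
    root_lin_vel: list[float]
    ref_body_pos_global: list[float]


def phase(t: float, motion_length: float) -> float:
    """Time-phase variable: fraction of the reference clip elapsed, clipped to [0, 1]."""
    return min(max(t / motion_length, 0.0), 1.0)


def actor_obs(st: RobotState, last_action: list[float], t: float, motion_length: float) -> list[float]:
    """Actor input: proprioception plus phase only, so the policy can run on hardware."""
    return (st.joint_pos + st.joint_vel + st.base_ang_vel + st.projected_gravity
            + last_action + [phase(t, motion_length)])


def critic_obs(st: RobotState, last_action: list[float], t: float, motion_length: float) -> list[float]:
    """Critic input: the actor's observation plus privileged simulator state."""
    return actor_obs(st, last_action, t, motion_length) + st.root_lin_vel + st.ref_body_pos_global


st = RobotState(joint_pos=[0.0] * 3, joint_vel=[0.0] * 3, base_ang_vel=[0.0] * 3,
                projected_gravity=[0.0, 0.0, -1.0], root_lin_vel=[0.5, 0.0, 0.0],
                ref_body_pos_global=[1.0, 2.0, 0.8])
a_obs = actor_obs(st, [0.0] * 3, t=1.0, motion_length=4.0)
c_obs = critic_obs(st, [0.0] * 3, t=1.0, motion_length=4.0)
```

Building the critic input as a strict superset of the actor input keeps the two consistent: the critic can use simulator-only state to estimate values during training, while nothing the actor depends on is unavailable at deployment.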

Robust Initialization

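  • Reference State Initialization (RSI): episodes start from states sampled at random points along the reference motion instead of always from its first frame, which improves robustness
  • Termination Curriculum of Tracking Tolerance: an episode terminates when the tracking error exceeds a tolerance; the tolerance is large in the initial training stage and is tightened gradually, which keeps early training flexible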
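Both initialization tricks fit in a few lines. The sketch below assumes a reference motion stored as a list of frames and a geometric tightening schedule; the start, end, and decay values, the per-iteration schedule, and the names `reference_state_init` and `TerminationCurriculum` are placeholders for illustration, not ASAP's settings.

```python
import random


def reference_state_init(reference_motion, rng):
    """Reference State Initialization: start the episode at a randomly sampled
    frame (phase) of the reference motion instead of always at frame 0."""
    idx = rng.randrange(len(reference_motion))
    return idx, reference_motion[idx]


class TerminationCurriculum:
    """Terminate an episode when the tracking error exceeds a tolerance that
    starts loose and is tightened as training progresses.
    The start/end/decay values are placeholders, not tuned settings."""

    def __init__(self, start=1.5, end=0.3, decay=0.9):
        self.threshold = start
        self.end = end
        self.decay = decay

    def should_terminate(self, tracking_error):
        return tracking_error > self.threshold

    def tighten(self):
        # Shrink the tolerance geometrically, never below the final value.
        self.threshold = max(self.end, self.threshold * self.decay)


rng = random.Random(0)
motion = [{"frame": i} for i in range(100)]  # stand-in for a retargeted motion clip
idx, init_state = reference_state_init(motion, rng)

curriculum = TerminationCurriculum()
early = curriculum.should_terminate(1.0)   # loose tolerance: episode continues
for _ in range(50):                        # e.g. once per training iteration (assumed schedule)
    curriculum.tighten()
late = curriculum.should_terminate(1.0)    # tight tolerance: episode terminates
```

The same tracking error of 1.0 is tolerated early on and triggers termination later, which is the behavior the curriculum is meant to produce.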

Policy fine-tuning

DeltaDynamics corrects the predicted next state with a learned residual:
$$s_{t+1} = f_{sim}(s_t, a_t) + f_\Delta(s_t, a_t)$$
The ASAP method instead corrects the action before it enters the simulator:
$$s_{t+1} = f_{sim}(s_t, a_t + \pi_\Delta(s_t, a_t))$$
DeltaDynamics compensates for residual dynamics in state space, while ASAP acts on the action directly, which is effective at reducing Compounding Error.
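The two update rules differ only in where the correction enters. The toy below makes that concrete under stated assumptions: a scalar state, a simulator that integrates the action and enforces a hypothetical joint limit of ±1, and hand-written linear stand-ins for the learned models `f_delta` and `pi_delta` (all names hypothetical). Away from the limit the two rules give the same next state; once the limit is active, the state residual is added after the simulator and can push the state past it, whereas the action correction is still filtered through `f_sim`. This is a property of the toy, not a result from the paper.

```python
def clip(x, lo=-1.0, hi=1.0):
    return max(lo, min(hi, x))


def f_sim(s, a):
    """Toy simulator: integrate the action, then enforce a joint limit of +/-1."""
    return clip(s + a)


def f_delta(s, a):
    """Hypothetical learned state residual (DeltaDynamics-style)."""
    return 0.3 * a


def pi_delta(s, a):
    """Hypothetical learned action correction (ASAP-style delta action)."""
    return 0.3 * a


def step_delta_dynamics(s, a):
    # s_{t+1} = f_sim(s_t, a_t) + f_delta(s_t, a_t): correction added after the simulator
    return f_sim(s, a) + f_delta(s, a)


def step_delta_action(s, a):
    # s_{t+1} = f_sim(s_t, a_t + pi_delta(s_t, a_t)): correction passes through the simulator
    return f_sim(s, a + pi_delta(s, a))


s_dd = s_da = 0.0
for _ in range(5):
    s_dd = step_delta_dynamics(s_dd, 0.5)
    s_da = step_delta_action(s_da, 0.5)
```

After five steps the delta-action rollout sits exactly at the limit (1.0), while the delta-dynamics rollout settles at 1.15, outside the range the simulator itself allows.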

ASAP (LeCAR-Lab) evaluates three transfer scenarios:

  • IsaacGym to IsaacSim
  • IsaacGym to Genesis
  • IsaacGym to the real-world Unitree G1 humanoid