Gradient based learning

GBP

  • Roll out some policy (e.g. a random policy) for H steps with the model
  • Backpropagate through the model with the objective
  • Gradient ascent on the action sequence to get a better plan (all three planner versions are sketched below)
  1. Version 1: Guess & check (random shooting)
    1. Sample random action sequences and choose the one with the highest predicted return
  2. Version 2: CEM Iteration
    1. Iteratively refit the sampling distribution to the elite action sequences
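A minimal sketch of the three planners (GBP, random shooting, CEM), assuming a differentiable world model `model(state, action) -> next_state` and a per-step `reward(state, action)`; all names and hyperparameters here are illustrative, not from the source.

```python
import torch

def rollout_return(model, reward, s0, actions):
    """Unroll the model for H steps and sum predicted rewards.
    actions: (H, A) for GBP, or (N, H, A) batched for shooting/CEM."""
    s, total = s0, 0.0
    for a in actions.unbind(dim=-2):          # iterate over the H dimension
        total = total + reward(s, a)
        s = model(s, a)
    return total

def gbp_plan(model, reward, s0, H, act_dim, steps=50, lr=0.1):
    """Gradient-based planning: backprop through the model and do
    gradient ascent on the action sequence itself."""
    actions = torch.zeros(H, act_dim, requires_grad=True)
    opt = torch.optim.Adam([actions], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        (-rollout_return(model, reward, s0, actions)).backward()  # ascend
        opt.step()
    return actions.detach()

def random_shooting(model, reward, s0, H, act_dim, N=1000):
    """Version 1, guess & check: sample N random action sequences and
    keep the one with the highest predicted return."""
    cand = torch.randn(N, H, act_dim)
    returns = rollout_return(model, reward, s0.expand(N, -1), cand)
    return cand[returns.argmax()]

def cem_plan(model, reward, s0, H, act_dim, N=1000, n_elite=100, iters=5):
    """Version 2, CEM: refit the sampling Gaussian to the elite
    sequences each iteration instead of sampling blindly."""
    mu, std = torch.zeros(H, act_dim), torch.ones(H, act_dim)
    for _ in range(iters):
        cand = mu + std * torch.randn(N, H, act_dim)
        returns = rollout_return(model, reward, s0.expand(N, -1), cand)
        elite = cand[returns.topk(n_elite).indices]
        mu, std = elite.mean(0), elite.std(0)
    return mu
```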
Gradient-based planning is weak because the planner's actions/trajectories deviate from the training distribution, causing the world model to fail and produce poor gradients → by creating training data from the distributions/worst-case scenarios that GBP will actually encounter and fine-tuning the world model on them, GBP can reach CEM-level performance at a much lower cost.
Online World Modeling (OWM)
  • Generate actions with GBP → roll those actions out in the actual simulator to obtain corrected states → retrain the world model on the corrected trajectory (sketched below)
  • Effect: includes the OOD latent regions encountered during planning in the training distribution, reducing long-horizon error accumulation.
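A minimal sketch of one OWM round, reusing `gbp_plan` from the sketch above; the simulator interface `sim.step(state, action)` is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def owm_round(model, reward, sim, s0, H, act_dim, model_opt):
    """Plan with GBP, execute in the real simulator to get corrected
    states, then retrain the model on that trajectory so the OOD
    regions GBP visits enter the training distribution."""
    actions = gbp_plan(model, reward, s0, H, act_dim)

    # Roll the planned actions out in the actual simulator.
    states, s = [s0], s0
    for a in actions:
        s = sim.step(s, a)                    # assumed ground-truth transition
        states.append(s)

    # Retrain the world model on the corrected (s_t, a_t, s_{t+1}) triples.
    loss = 0.0
    for t, a in enumerate(actions):
        loss = loss + F.mse_loss(model(states[t], a), states[t + 1])
    model_opt.zero_grad()
    loss.backward()
    model_opt.step()
    return loss.item()
```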
Adversarial World Modeling (AWM)
  • Create FGSM adversarial perturbations of the expert states/actions (in the direction that maximizes model loss) and train on them (sketched below)
  • Effect: makes the world model's input gradients, and hence the induced planning landscape, smoother, so GBP gets stuck less often in local minima/flat regions. (And this works without a simulator.)
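A minimal FGSM-style sketch of one AWM update on a batch of expert transitions `(s, a, s_next)`; `eps` and the MSE objective are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def awm_step(model, model_opt, s, a, s_next, eps=0.01):
    """Perturb expert states/actions with FGSM in the direction that
    maximizes the model's prediction loss, then train on the perturbed
    inputs. Needs only logged data, not a simulator."""
    s = s.clone().requires_grad_(True)
    a = a.clone().requires_grad_(True)
    loss = F.mse_loss(model(s, a), s_next)
    g_s, g_a = torch.autograd.grad(loss, [s, a])

    # FGSM: one signed-gradient step that increases the loss.
    s_adv = (s + eps * g_s.sign()).detach()
    a_adv = (a + eps * g_a.sign()).detach()

    # Train the model on the adversarial inputs (original targets).
    adv_loss = F.mse_loss(model(s_adv, a_adv), s_next)
    model_opt.zero_grad()
    adv_loss.backward()
    model_opt.step()
    return adv_loss.item()
```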