
ICLR 2026 Workshop first day

Date: 2026 Apr 26 0:00
Created by: Seonglae Cho
Created time: 2026 Apr 26 12:16
Last edited by: Seonglae Cho
Last edited time: 2026 Apr 26 19:07
Refs:

Kyunghyun Cho

 

Sergey Levine

  • robot foundation models

RLPD

  • no need for a new algorithm: soft actor-critic (SAC) with prior data
  • RL for specialist policies

RLDG

  • distilling RL into OpenVLA
  • training on data from RL specialists gives a better generalist
  • can the specialists also benefit from the generalist?
    • adding RL tokens (RLT) to robot foundation models
  • are CFG and PPO almost the same?
  • CFGRL: classifier-free guidance (CFG) with an optimality variable o
    • score function
  • use both good and bad data; use data from the whole RL training run, or only from the final policy?
  • condition the policy on the optimality indicator
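A minimal sketch of the guidance idea in the CFGRL notes above, assuming the policy is trained both with and without the optimality indicator o (conditioning dropout, as in standard classifier-free guidance), so that both pi(a|s) and pi(a|s, o=1) are available. The guided score is grad log pi(a|s) + w * (grad log pi(a|s, o=1) - grad log pi(a|s)), which corresponds to sampling from pi(a|s) * [pi(a|s, o=1) / pi(a|s)]^w. To keep it self-contained, the two policies here are hypothetical 1D Gaussians with analytic scores (no learned diffusion model), and sampling uses plain Langevin dynamics; this is an illustration, not the CFGRL implementation.

```python
import math
import random
from typing import Callable, Optional


def gaussian_score(a: float, mean: float, std: float) -> float:
    """Score d/da log N(a; mean, std^2) of a hypothetical 1D Gaussian policy."""
    return -(a - mean) / std**2


def guided_score(a: float, mean_uncond: float, mean_opt: float,
                 std: float, w: float) -> float:
    """CFG-style combination of the unconditional and o=1-conditioned scores.

    w = 0 gives the unconditional policy, w = 1 the optimality-conditioned
    policy, and w > 1 extrapolates further toward o = 1.
    """
    s_uncond = gaussian_score(a, mean_uncond, std)
    s_opt = gaussian_score(a, mean_opt, std)
    return s_uncond + w * (s_opt - s_uncond)


def guided_mean(mean_uncond: float, mean_opt: float, w: float) -> float:
    """Closed form for equal-variance Gaussians: the guided policy is Gaussian too."""
    return mean_uncond + w * (mean_opt - mean_uncond)


def langevin_sample(score_fn: Callable[[float], float], n_steps: int = 300,
                    step: float = 0.02,
                    rng: Optional[random.Random] = None) -> float:
    """Draw one action with unadjusted Langevin dynamics driven by score_fn."""
    rng = rng or random.Random()
    a = rng.gauss(0.0, 1.0)
    for _ in range(n_steps):
        a += step * score_fn(a) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
    return a
```

With an unconditional mean of 0.0, an o=1 mean of 1.0 and w = 2.0, Langevin samples from `guided_score` should average near `guided_mean(0.0, 1.0, 2.0)`, i.e. 2.0: raising w pushes actions past the conditioned policy, which is the knob the notes compare to a policy-improvement step.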
Takeaways
  • RL is the key to lifelong learning of robot foundation models
  • directly training end-to-end RL is hard

David Bau

The dream experiment of brain science is patching
  • transplant
  1. Where is it related? Patching: where or when does the model infer it?
  2. What is being influenced? Probing: what concepts does it use for its decision?
  3. How can we manipulate it? Steering: can we directly change its thoughts?
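A toy sketch of patching (question 1) and steering (question 3), assuming a hypothetical two-layer ReLU network with hand-set weights rather than any real model or interpretability library. Patching copies one hidden unit from a clean run into a corrupted run and measures the fraction of the output gap it restores; steering adds a scaled direction to the same hidden layer. Probing (question 2) would instead fit a classifier on the hidden activations and is left out here.

```python
from typing import Callable, List, Optional

Vector = List[float]

# Hypothetical toy weights: 2 inputs -> 2 hidden ReLU units -> 1 output.
W1 = [[1.0, 0.0], [0.0, 1.0]]
W2 = [2.0, -1.0]


def hidden(x: Vector) -> Vector:
    """Hidden-layer activations (ReLU of W1 @ x)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]


def forward(x: Vector, edit: Optional[Callable[[Vector], Vector]] = None) -> float:
    """Run the toy model; `edit` acts like a hook that rewrites the hidden layer."""
    h = hidden(x)
    if edit is not None:
        h = edit(h)
    return sum(w * hi for w, hi in zip(W2, h))


def patch_unit(clean_h: Vector, unit: int) -> Callable[[Vector], Vector]:
    """Hook that overwrites one hidden unit with its value from the clean run."""
    def edit(h: Vector) -> Vector:
        patched = list(h)
        patched[unit] = clean_h[unit]
        return patched
    return edit


def steer(direction: Vector, alpha: float) -> Callable[[Vector], Vector]:
    """Hook that adds alpha * direction to the hidden activations."""
    def edit(h: Vector) -> Vector:
        return [hi + alpha * d for hi, d in zip(h, direction)]
    return edit


def patching_effect(clean: Vector, corrupt: Vector, unit: int) -> float:
    """Fraction of the clean-vs-corrupt output gap restored by patching `unit`."""
    y_clean, y_corrupt = forward(clean), forward(corrupt)
    if y_clean == y_corrupt:
        raise ValueError("clean and corrupt runs give the same output; nothing to restore")
    y_patched = forward(corrupt, patch_unit(hidden(clean), unit))
    return (y_patched - y_corrupt) / (y_clean - y_corrupt)


if __name__ == "__main__":
    clean, corrupt = [1.0, 0.0], [0.0, 0.0]
    print([patching_effect(clean, corrupt, u) for u in range(2)])  # [1.0, 0.0]
    print(forward(corrupt, steer([1.0, 0.0], alpha=0.5)))  # 1.0
```

Here unit 0 restores the whole gap and unit 1 none of it, which localizes "where" the difference is computed; in a real model the same two hooks would be applied to transformer activations instead of this toy layer.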
 
 
Mechanistic interpretability is a springboard: mind the gap. It fills in the middle of AI's reality-knowledge gap so we can jump across.
reality → natural language → understanding

But why?

  • to debug, audit, and improve the models
    • are all the last layers needed for sequence reasoning?
  • to customize the models beyond their training
    • can we interpolate between conditions?
  • to make new scientific discoveries
    • what does the AI know that scientists/humans do not yet know?

Yoshua Bengio

 

Azalia Mirhoseini

  • test-time scaling
 
 
 
 
