ICLR 2026 Workshop first day

Date

2026 Apr 26 0:0

Created by

Seonglae Cho

Created time

2026 Apr 26 12:16

Last edited by

Seonglae Cho

Last edited time

2026 Apr 26 19:7

Refs

Kyunghyun Cho

Sergey Levine

robot foundation models

RLPD

no need for soft actor-critic

RL for specialist policies

RLDG

distilling RL into OpenVLA

training on data from RL specialists gives us a better generalist

can the specialists also benefit from the generalist?

adding RLT (RL tokens) to robot foundation models

are CFG and PPO almost the same?

CFGRL with an optimality variable o

score function

good/bad data: use data from all of RL training, or just from the final policy?

condition the policy on an optimality indicator
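A minimal sketch of the CFG-style guidance idea above, assuming one-dimensional Gaussian policies with a shared standard deviation. The function names, the guidance weight `w`, and the toy means are illustrative assumptions, not taken from the talk. The guided score mixes the unconditional policy score with the score of the policy conditioned on the optimality indicator o = 1.

```python
import numpy as np

def gaussian_score(a, mean, std):
    """Score function d/da log N(a; mean, std^2)."""
    return (mean - a) / std**2

def guided_score(a, mean_uncond, mean_opt, std, w):
    """CFG-style score: unconditional score plus w times the
    (conditional - unconditional) difference. Assumed toy form."""
    s_u = gaussian_score(a, mean_uncond, std)   # policy without the indicator
    s_c = gaussian_score(a, mean_opt, std)      # policy conditioned on o = 1
    return s_u + w * (s_c - s_u)

def guided_mean(mean_uncond, mean_opt, w):
    """For equal-variance Gaussians the guided score is again Gaussian,
    with its mean shifted along (mean_opt - mean_uncond)."""
    return mean_uncond + w * (mean_opt - mean_uncond)

if __name__ == "__main__":
    # w = 0 recovers the unconditional policy, w = 1 the conditional one,
    # w > 1 extrapolates past the conditional policy.
    for w in (0.0, 1.0, 2.0):
        print(w, guided_mean(0.0, 1.0, w))
```

In this toy case the weight `w` acts as a dial on how far the policy moves toward (or past) the optimality-conditioned behavior, without retraining.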

takeaway

RL is the key to lifelong learning of robot foundation models

training RL directly end to end is hard

David Bau

the dream of brain science is the patching experiment

transplant

where does this happen - patching - where or when does the model infer it

what is being influenced - probing - what concepts does it use for its decision

how can we manipulate it - steering - can we directly change its thoughts
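A minimal sketch of patching and steering on a toy two-layer network, assuming random weights and a linear readout. The network, inputs, and helper names are all hypothetical; real experiments patch activations inside a trained model. Probing is left out to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # input (3) -> hidden (4)
W2 = rng.normal(size=(1, 4))   # hidden (4) -> output (1)

def hidden(x):
    """Hidden activations of the toy network."""
    return np.tanh(W1 @ x)

def readout(h):
    """Linear readout from the hidden activations."""
    return float((W2 @ h)[0])

def patch(h_corrupt, h_clean, idx):
    """Patching: copy selected clean activations into the corrupted run."""
    h = h_corrupt.copy()
    h[idx] = h_clean[idx]
    return h

def steer(h, direction, alpha):
    """Steering: push the activations along a chosen direction."""
    return h + alpha * direction

x_clean = np.array([1.0, 0.0, -1.0])
x_corrupt = np.array([-1.0, 0.5, 1.0])
h_clean, h_corrupt = hidden(x_clean), hidden(x_corrupt)

# Change in output when a single hidden unit is restored to its clean value.
effects = [readout(patch(h_corrupt, h_clean, [i])) - readout(h_corrupt)
           for i in range(4)]
# Restoring every unit reproduces the clean run.
full = readout(patch(h_corrupt, h_clean, list(range(4))))
```

Because the readout here is linear, the per-unit effects add up to the full clean-versus-corrupted difference. In a deep nonlinear model they generally do not, which is part of why localization is hard.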

mechanistic interpretability is a springboard: mind the gap. fill in the middle of the gap between AI's knowledge and reality in order to jump up

reality → natural language → understanding

But why

to debug, audit, and improve the models

are all the last layers of sequence reasoning needed?

to customize the models beyond their training

can we interpolate conditions?

to discover new science

what does the AI know that scientists/humans do not yet know?

Yoshua Bengio

Azalia Mirhoseini

test time scaling
