Kyunghyun Cho
Sergey Levine
- robot foundation models
RLPD
- noneed soft actor critic
- RL for specialist policies
RLDG
- distilling RL into OpenVLA
- training on data from RL specialists gives us a better generalist
- can the specialists also benefit from the generalist
- adding RLT (RL tokens) to robot foundation models
- are CFG and PPO almost the same?
- CFGRL with an optimality variable o
- score function
- good/bad data: use data from all of RL training, or just from the final policy
- condition the policy on optimality indicator
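
The guidance idea in the bullets above can be sketched with generic classifier-free guidance (CFG) on score functions. This is only an illustration under assumptions, not necessarily the exact CFGRL formulation from the talk: the 1-D Gaussian action distributions, the names `UNCOND` and `COND`, and the guidance weight `w` are all hypothetical.

```python
def gaussian_score(a, mean, std):
    """Score of N(mean, std^2): gradient of the log-density w.r.t. the action a."""
    return -(a - mean) / std**2

# Hypothetical 1-D action distributions for one fixed state (assumed for illustration).
UNCOND = (0.0, 1.0)  # p(a | s): all data, good and bad
COND = (1.0, 1.0)    # p(a | s, o=1): data labeled optimal

def guided_score(a, w):
    """Classifier-free guidance: s_uncond + w * (s_cond - s_uncond)."""
    s_u = gaussian_score(a, *UNCOND)
    s_c = gaussian_score(a, *COND)
    return s_u + w * (s_c - s_u)

def guided_mode(w, steps=500, lr=0.1):
    """Follow the guided score by gradient ascent to find the guided mode."""
    a = 0.0
    for _ in range(steps):
        a += lr * guided_score(a, w)
    return a

for w in (0.0, 1.0, 2.0):
    print(w, round(guided_mode(w), 4))
```

With these toy distributions, w = 0 recovers the unconditional policy, w = 1 recovers the optimality-conditioned policy, and w > 1 extrapolates past it, which is the sense in which guidance strength acts like a policy-improvement knob.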
takeaway
- RL is the key to lifelong learning of robot foundation models
- directly training end-to-end RL is hard
David Bau
the dream of brain science is the patching experiment
- transplant
- where does this relate - patching - where or when does it infer
- what is being influenced - probing - what concepts does it use in its decision
- how can we manipulate - steering - can we directly change its thoughts
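
The three operations above (patching, probing, steering) can be sketched on a toy two-layer network. Everything here is an assumption for illustration: the random weights `W1` and `W2`, the `forward` helper, and the choice of probing for the first input coordinate are hypothetical and do not come from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))  # input -> hidden (toy weights)
W2 = rng.normal(size=(8, 2))  # hidden -> output logits

def forward(x, patch=None, steer=None):
    """Run the toy model, optionally overwriting or shifting the hidden state."""
    h = np.tanh(x @ W1)
    if patch is not None:
        h = patch          # patching: transplant a hidden state from another run
    if steer is not None:
        h = h + steer      # steering: push the hidden state along a direction
    return h, h @ W2

x_clean = rng.normal(size=4)
x_corrupt = rng.normal(size=4)
h_clean, y_clean = forward(x_clean)
_, y_corrupt = forward(x_corrupt)

# Patching: run the corrupt input but transplant the clean hidden state.
# Here the whole layer is replaced, so the clean output is fully restored.
_, y_patched = forward(x_corrupt, patch=h_clean)

# Probing: fit a linear readout from hidden states to a known input feature.
X = rng.normal(size=(200, 4))
H = np.tanh(X @ W1)
probe, *_ = np.linalg.lstsq(H, X[:, 0], rcond=None)
probe_error = float(np.mean((H @ probe - X[:, 0]) ** 2))

# Steering: add the direction that raises logit 0 relative to logit 1.
direction = W2[:, 0] - W2[:, 1]
_, y_steered = forward(x_corrupt, steer=0.5 * direction)
```

In a real model one would patch a single layer, head, or token position rather than the whole hidden state, and compare how much of the clean behavior each patch restores.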
mechanistic interpretability is a springboard: mind the gap. fill in the middle of AI's reality-knowledge gap to jump up
reality → natural language → understanding
But why
- to debug, audit, and improve the models
- are all the last layers of sequence reasoning needed
- to customize the models beyond their training
- can we interpolate conditions
- to discover new science
- what does the AI know that scientists/humans do not yet know
Yoshua Bengio
Azalia Mirhoseini
- test-time scaling
Seonglae Cho