ICLR 2026 Workshop first day

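Workshop day one, 일함 리스 데이. There was about as much to see as at the main conference. In particular, the opening talk by Sergey Levine was at the level of the first keynote, or even more helpful to me, since it was more technical and concrete. As for Kyunghyun Cho, I don't think I can apply to him for a PhD; the direction doesn't seem to fit. The food was much better at the workshop, maybe because there were fewer people, and David Bau, the Alan Sun paper, Yoshua Bengio (remote only), the 일함 oral "mind the gap" and so on made it a fairly full day. 지쿤 (Jikun) also arrived that day and moved straight to another place, but the massage room "1000hands soap" was hilarious.

Before dinner, after seeing all the papers and events, I left, and since Christian's place turned out to be very close, I went over to visit. The complex was really nice, with a Korean/Japanese feel, and I loved it. It had all kinds of sports facilities, and on the way up the sunset view over the lake and fountain was jaw-dropping. Then, as we headed to dinner, Alan Sun, whom I had regretted missing, also got in touch, so we decided to eat together.

Another thing I realized here: even if I can't change my mind on the spot and do everything I want along the optimal path, if I express enough regret or show it through my actions afterwards, the odds that it somehow happens go up a lot. At the end of the dancing event on the first day the chance came back around, and the same goes for meeting Alan Sun. It feels like a curious law that if you keep wanting something and state your intent, opportunities come and become easier to seize. It won't convince everyone, but maybe the reason I have heard this so often is that it is quite useful, or that it is a pattern people often see in hindsight.

Anyway, the pineapple dinner was extremely fun and funny. I heard Alan Sun's stories about the MIT PhD, a Korean advisor, and papers, heard 지쿤's really entertaining stories, ate torched cinnamon pineapple, and after paying a refreshingly high bill it was night, so I went home. At night I ran the air conditioner and walked the night streets. A fun life in Leblon, Rio.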

Kyunghyun Cho

Sergey Levine

robot foundation models

RLPD

no need soft actor critic

RL for specialist policies

RLDG

distilling RL into OpenVLA

training on data from RL specialists gives us a better generalist

can the specialists also benefit from the generalist?

adding RLT (RL tokens) to robot foundation models

CFG and PPO are almost the same?

CFGRL with optimality variable o

score function

good/bad data: data from all of RL training, or just from the final policy?

condition the policy on optimality indicator
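My own reading of the CFG / optimality-indicator notes, as a minimal sketch: if a policy is trained both with and without an optimality indicator o, the two can be combined at sampling time the way classifier-free guidance combines conditional and unconditional models. Everything below is an assumption for illustration (discrete actions, made-up logits, the names `guided_policy` and `w`); it is not the method as presented in the talk.

```python
import numpy as np

def log_softmax(x):
    # Numerically stable log-softmax over a 1-D array of logits.
    x = np.asarray(x, dtype=float)
    x = x - x.max()
    return x - np.log(np.exp(x).sum())

def guided_policy(logits_uncond, logits_cond, w):
    """CFG-style mix of an unconditional policy pi(a|s) and an
    optimality-conditioned policy pi(a|s, o=1):
        log pi_w(a|s) = log pi(a|s) + w * (log pi(a|s,o=1) - log pi(a|s)) + const
    w is a hypothetical guidance weight; w=0 gives the unconditional policy,
    w=1 gives the conditioned one, w>1 sharpens toward "optimal" actions.
    """
    lp_u = log_softmax(logits_uncond)
    lp_c = log_softmax(logits_cond)
    combined = lp_u + w * (lp_c - lp_u)
    return np.exp(log_softmax(combined))  # renormalize to a distribution

# Made-up logits for three actions (illustration only).
logits_uncond = np.array([0.0, 0.0, 0.0])
logits_cond = np.array([0.0, 1.0, 2.0])

p0 = guided_policy(logits_uncond, logits_cond, 0.0)
p1 = guided_policy(logits_uncond, logits_cond, 1.0)
p3 = guided_policy(logits_uncond, logits_cond, 3.0)
```

Raising `w` moves probability mass toward the actions the o=1 policy prefers, which is the sense in which guidance acts like a policy-improvement step in this toy setting.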

takeaway

RL is the key to lifelong learning of robot foundation models

directly training end-to-end RL is hard

David Bau

the dream of brain science is the patching experiment

transplant

where does this relate - patching - where or when does it infer

what is being influenced - probing - what concepts does it use in its decision

how can we manipulate - steering - can we directly change its thoughts
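To make the patching idea concrete for myself, here is a toy sketch: run a "clean" and a "corrupted" input through the same small network, copy one hidden activation from the clean run into the corrupted run, and see how much the output moves. The two-layer network, random weights, and the `forward` helper are all hypothetical, chosen only to show the mechanics; real experiments patch activations inside a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))  # input -> hidden (toy weights)
W2 = rng.normal(size=(3, 2))  # hidden -> output (toy weights)

def forward(x, patch=None):
    """Run the toy MLP. `patch` maps hidden-unit index -> value to overwrite."""
    h = np.tanh(x @ W1)
    if patch is not None:
        h = h.copy()
        for i, v in patch.items():
            h[i] = v
    return h @ W2, h

x_clean = rng.normal(size=4)
x_corrupt = rng.normal(size=4)

out_clean, h_clean = forward(x_clean)
out_corrupt, _ = forward(x_corrupt)

# Patch one hidden unit at a time from the clean run into the corrupted run
# and record how far the output moves: a crude "where does it matter" map.
effects = []
for i in range(3):
    out_patched, _ = forward(x_corrupt, patch={i: h_clean[i]})
    effects.append(float(np.linalg.norm(out_patched - out_corrupt)))

# Sanity check: patching every hidden unit reproduces the clean output.
out_all, _ = forward(x_corrupt, patch={i: h_clean[i] for i in range(3)})
```

Steering would use the same hook point, adding a chosen direction to `h` instead of copying values from another run.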

mechanistic interpretability is a springboard: mind the gap. fill in the middle of AI's reality-knowledge gap in order to jump up

reality → natural language → understanding

But why

to debug, audit, and improve the models

are all the last layers of sequence reasoning needed

to customize the models beyond their training

can we interpolate conditions

to discover new science

what does the AI know that scientists/humans do not yet know

Yoshua Bengio

Azalia Mirhoseini

test time scaling