ICLR 2026 First Day

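Finally, my first ICLR day, after a good night's sleep at my first lodging. The trip is quicker than I expected, and the scenery at the venue is insanely good. The atmosphere and the sunshine are overwhelming, and the first keynote is fairly impressive too. Thinking "of course, not just anyone gets to give the first keynote," I enjoy the food and the coffee and wander around in high spirits.

Up to this point I still have energy, so I walk over to the Korean meetup with a smile, just in case, and then Im Eun-woo (임은우), a friend from middle school, calls out to me. What a legendary coincidence. Anyway, the guy ditches dinner and leaves. Because of him I missed watching the dance from right up front and missed the main part, but I still squeezed into the conga line at the end, held hands and danced early on, and had a happy time. Lodging and dinner both went well, and I had a good time with Clayton, Rishi, and Ilham (클레이튼, 리시, 일함) before leaving.

What I realized here is that even when I am with other people, I should not let that sway me; I should enjoy what I want to do. An opportunity lost while being polite can be costly, and getting it back may take considerable effort. Still, I make it to McDonald's, stash the food at my first lodging, and wander around Leblon at night before heading in. On the way back, Clayton, a friend who lives in Brazil, gives us a ride and mentions that the hotel where he is staying is cheap, and we ride home in a cramped but fun car.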

First keynote

motivation behavior information

embodiment bias expectation persona dog/humanoid

only a few want a humanoid?

personality matching is important as part of the alignment vertical

why you do AI

generalized models are bad at specialized things
Proprioception as a Tool

LLM reasoning oral

overthinking reduction @router rl

DECS

Decoupled Reward
Curriculum Data Scheduling

NRP detector, cnk?

Dataset

DeepScaleR
https://huggingface.co/datasets/open-r1/OpenR1-Math-220k
nrp

GPQA-D
LiveCodeBench

baselines

TLMRE
ThinkPrune
AdaptThink
LC-R1

takeaway

efficient reasoning should penalize redundancy, not reasoning itself
DECS separates them out
The authors point out that existing length-penalty-based methods degrade performance because of a misalignment between the trajectory-level reward and token-level optimization. They identify two theoretical flaws:
(1) high-entropy exploratory tokens inside correct trajectories are wrongly penalized, and (2) redundant tokens after the Necessary Reasoning Prefix (NRP) are instead rewarded with a positive advantage.
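
To make the two flaws concrete for myself, here is a toy sketch, not the authors' implementation. It assumes a GRPO-style group-normalized advantage, a linear length penalty with a made-up coefficient `alpha`, and that the NRP end index for each trajectory is simply given (in the paper it would come from an NRP detector). The function names and the flat `penalty` value are hypothetical.

```python
from statistics import mean, pstdev


def group_advantages(rewards):
    """Group-normalized advantage: (r - mean) / std over one prompt's samples."""
    mu, sigma = mean(rewards), pstdev(rewards)
    if sigma == 0:
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]


def length_penalty_advantages(correct, lengths, alpha=0.001):
    """Trajectory-level reward (correctness - alpha * length), broadcast to every token."""
    rewards = [float(c) - alpha * n for c, n in zip(correct, lengths)]
    return [[a] * n for a, n in zip(group_advantages(rewards), lengths)]


def decoupled_advantages(correct, lengths, nrp_ends, penalty=1.0):
    """Hypothetical decoupling: correctness advantage inside the NRP,
    a flat negative value for every token after the NRP end index."""
    advs = group_advantages([float(c) for c in correct])
    return [
        [a if t < end else -penalty for t in range(n)]
        for a, n, end in zip(advs, lengths, nrp_ends)
    ]


# Flaw (2): a correct 20-token answer whose NRP ends at token 8, sampled next to
# an incorrect 10-token answer. Every token of the correct answer, redundant
# tail included, gets the same positive advantage.
coupled = length_penalty_advantages([True, False], [20, 10])
decoupled = decoupled_advantages([True, False], [20, 10], nrp_ends=[8, 10])

# Flaw (1): two correct answers with no redundancy; the longer one (say it
# explored more) gets a negative advantage on all of its tokens.
both_correct = length_penalty_advantages([True, True], [10, 20])
```

In the decoupled version the first 8 tokens of the correct answer keep the positive correctness advantage and only the 12 tokens after the NRP are penalized, which is how I read "penalize redundancy, not reasoning itself".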

belief deviation

GAE @router rl

btr entropy?

relation between memorization and exploration - summarizing from past observations of the environment

MemAgent

hqarl good dataset?

context distribution

dense retrieval learner - next-chunk prediction as a retrieval loss

LLM learning is token-level: prefix NTP (next-token prediction)

retrieval is chunk-level, with NTP on other things
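
My guess at what "next-chunk prediction as a retrieval loss" could look like, as a minimal sketch: score every candidate chunk embedding against a context embedding and take softmax cross-entropy with the true next chunk as the label (an InfoNCE-style loss). This is my assumption, not the talk's formulation; the dot-product scoring, the `temperature` parameter, and the function names are all hypothetical.

```python
import math


def dot(u, v):
    """Plain dot product between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def next_chunk_retrieval_loss(context_vec, chunk_vecs, next_index, temperature=1.0):
    """Cross-entropy over candidate chunks, with the true next chunk as the target."""
    logits = [dot(context_vec, c) / temperature for c in chunk_vecs]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[next_index]


context = [1.0, 0.0]
chunks = [[1.0, 0.0], [0.0, 1.0]]
loss_if_aligned = next_chunk_retrieval_loss(context, chunks, next_index=0)
loss_if_misaligned = next_chunk_retrieval_loss(context, chunks, next_index=1)
```

The loss is lower when the true next chunk is the one most similar to the context, so it is the chunk-level analogue of token-level NTP.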

AI Task vector with
Model Merging
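
For reference, the standard task-arithmetic recipe as I remember it: a task vector is the fine-tuned weights minus the base weights, and merging adds scaled task vectors back onto the base. This is a minimal sketch of that generic idea with weights as plain dicts of lists; it is not necessarily what the talk did, and the `scale` parameter and function names are assumptions.

```python
def task_vector(base, finetuned):
    """Per-parameter difference between a fine-tuned model and its base."""
    return {k: [f - b for f, b in zip(finetuned[k], base[k])] for k in base}


def merge(base, task_vectors, scale=1.0):
    """Add each scaled task vector onto a copy of the base weights."""
    merged = {k: list(v) for k, v in base.items()}
    for tv in task_vectors:
        for k, delta in tv.items():
            merged[k] = [m + scale * d for m, d in zip(merged[k], delta)]
    return merged


base = {"w": [1.0, 2.0]}
finetuned_a = {"w": [2.0, 2.0]}  # task A moved only the first weight
finetuned_b = {"w": [1.0, 4.0]}  # task B moved only the second weight
merged = merge(base, [task_vector(base, finetuned_a), task_vector(base, finetuned_b)])
```

When the two tasks touch different weights, as in this toy case, the merged model carries both changes; the interesting cases are when task vectors interfere.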

good

CRV

good

Second Oral

associate tokens

can we approximate the parameters?

bigram
interchangability
context

Sequences of Logits Reveal the Low Rank Structure of Language Models

Causal Structure Learning in Hawkes Processes

causal discovery in partially observed multivariate Hawkes processes

continuous to discrete reduction using bins?
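
A sketch of the generic binning idea behind my question, assuming each dimension of the process is a list of event timestamps and we count events in fixed-width bins. Whether the paper's reduction works exactly this way is something I still need to check; `bin_events` and its parameters are hypothetical names.

```python
import math


def bin_events(event_times, horizon, bin_width):
    """Count events per dimension in fixed-width bins over [0, horizon)."""
    n_bins = math.ceil(horizon / bin_width)
    counts = [[0] * n_bins for _ in event_times]
    for dim, times in enumerate(event_times):
        for t in times:
            if 0 <= t < horizon:
                # Clamp to guard against floating-point edge cases at the boundary.
                idx = min(int(t // bin_width), n_bins - 1)
                counts[dim][idx] += 1
    return counts


# Two event streams observed on [0, 3) with unit-width bins.
counts = bin_events([[0.1, 0.4, 1.2], [0.9, 2.5]], horizon=3.0, bin_width=1.0)
```

The result is a discrete-time count series per dimension, so the bin width controls how much timing information is lost.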

temporal SAE

leveraging the sequential property of language models

single level token

temporal features are naturally in-distribution for steering over generation

optimal transport

Hawkes process

causality and temporal smoothness

EigenBench

human evaluation does not really scale

an unbiased metric for value alignment