ICLR 2026 First Day

Date

2026 Apr 23 0:0

Created by

Seonglae Cho

Created time

2026 Apr 23 12:35

Last edited by

Seonglae Cho

Last edited time

2026 May 19 0:35

Refs

First keynote

motivation behavior information

embodiment bias, expectation, persona (dog/humanoid)

only a few want a humanoid?

personality matching is important as part of the alignment vertical

why you do AI

generalized models are bad at specialized things
Proprioception as a Tool

LLM reasoning oral

overthinking reduction @router rl

DECS

Decoupled Reward
Curriculum Data Scheduling

NRP detector, cnk?

Dataset

DeepScaleR
https://huggingface.co/datasets/open-r1/OpenR1-Math-220k
NRP

GPQA-D
LiveCodeBench

baselines

TLMRE
ThinkPrune
AdaptThink
LC-R1

takeaway

efficient reasoning should penalize redundancy, not reasoning itself
DECS separates them out
The authors argue that existing length-penalty methods degrade performance because of a misalignment between the trajectory-level reward and token-level optimization. They identify two theoretical flaws:
(1) high-entropy exploratory tokens inside correct trajectories are wrongly penalized, (2) redundant tokens after the Necessary Reasoning Prefix (NRP) are instead rewarded with a positive advantage.
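
A toy sketch of the misalignment, not the paper's actual algorithm. It assumes a hypothetical `nrp_end` index marking where the Necessary Reasoning Prefix ends, and compares a trajectory-level length penalty (every token gets the same signal) with a decoupled per-token reward that penalizes only tokens after the prefix. Function names and penalty values are illustrative assumptions.

```python
def trajectory_level_rewards(num_tokens, correct, length_penalty=0.01):
    """Length-penalized outcome reward, broadcast identically to every token."""
    reward = (1.0 if correct else 0.0) - length_penalty * num_tokens
    return [reward] * num_tokens


def decoupled_rewards(num_tokens, correct, nrp_end, redundancy_penalty=0.01):
    """Hypothetical decoupled scheme: tokens inside the prefix keep the outcome
    reward, tokens at or after nrp_end receive only a redundancy penalty."""
    outcome = 1.0 if correct else 0.0
    return [
        outcome if t < nrp_end else -redundancy_penalty
        for t in range(num_tokens)
    ]


# Correct 6-token trajectory whose last 2 tokens are redundant (assumed).
traj = trajectory_level_rewards(num_tokens=6, correct=True)
dec = decoupled_rewards(num_tokens=6, correct=True, nrp_end=4)
```

In `traj` the two redundant tokens still receive a positive signal, which is flaw (2); in `dec` only they are penalized and the prefix tokens are left alone.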

belief deviation

GAE @router rl
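
For reference, a minimal sketch of standard Generalized Advantage Estimation, not anything specific to the talk. It assumes `values` carries one extra bootstrap entry for the state after the last step; the default `gamma` and `lam` are common choices, not values from the paper.

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over a single trajectory.

    values must have len(rewards) + 1 entries; the last one is the
    bootstrap value of the state after the final step.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD residual.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future residuals.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# Sparse terminal reward with a zero value function.
adv = gae([0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0], gamma=0.5, lam=1.0)
```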

btr entropy?

relation between memorization and exploration - summarizing from past observations of environments

MemAgent

hqarl good dataset?

context distribution

dense retrieval learner - next chunk prediction as a retrieval loss

LLM learning is token-level: prefix NTP (next-token prediction)

retrieval is chunk-level, with NTP on other things
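
One way to read "next chunk prediction as a retrieval loss" is as a contrastive objective over candidate chunks. The sketch below assumes an InfoNCE-style cross-entropy with dot-product scores; the talk's actual loss is not recorded in these notes, and `next_chunk_retrieval_loss` is a hypothetical name.

```python
import math


def next_chunk_retrieval_loss(query, chunks, positive_index, temperature=1.0):
    """Cross-entropy over candidate chunks, scored by dot product with the query.

    query: embedding of the prefix; chunks: candidate chunk embeddings;
    positive_index: position of the true next chunk among the candidates.
    """
    scores = [
        sum(q * c for q, c in zip(query, chunk)) / temperature
        for chunk in chunks
    ]
    # Numerically stable log-sum-exp.
    max_score = max(scores)
    log_norm = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    return log_norm - scores[positive_index]


query = [1.0, 0.0]
chunks = [[1.0, 0.0], [0.0, 1.0]]
loss_right = next_chunk_retrieval_loss(query, chunks, positive_index=0)
loss_wrong = next_chunk_retrieval_loss(query, chunks, positive_index=1)
```

The loss is lower when the true next chunk is the one most aligned with the prefix embedding, the chunk-level analogue of token-level NTP.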

AI Task vector with
Model Merging
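
A minimal sketch of task-vector merging in the task-arithmetic sense, assuming each model is a plain dict of scalar parameters. Real checkpoints hold tensors, and `task_vector`, `merge`, and `scale` are illustrative names, not from the talk.

```python
def task_vector(finetuned, base):
    """Task vector = fine-tuned weights minus base weights, per parameter."""
    return {name: finetuned[name] - base[name] for name in base}


def merge(base, task_vectors, scale=1.0):
    """Add scaled task vectors back onto the base weights."""
    merged = dict(base)
    for vector in task_vectors:
        for name, delta in vector.items():
            merged[name] += scale * delta
    return merged


base = {"w": 1.0}
model_a = {"w": 1.5}   # fine-tuned on task A (toy values)
model_b = {"w": 0.75}  # fine-tuned on task B (toy values)
merged = merge(base, [task_vector(model_a, base), task_vector(model_b, base)])
```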

good

CRV

good

Second Oral

associate tokens

can we approximate the parameters?

bigram
interchangeability
context

Sequences of Logits Reveal the Low Rank Structure of Language Models

Causal Structure Learning in Hawkes Processes

causal discovery in partially observed multivariate Hawkes processes

continuous to discrete reduction using bins?
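
As a reminder of what the binning question refers to, here is a univariate sketch with an exponential kernel plus a fixed-width discretization of the event times. The kernel form and the parameters `mu`, `alpha`, `beta` are assumptions for illustration; the paper's multivariate, partially observed setting is not reproduced here.

```python
import math


def hawkes_intensity(t, events, mu=0.1, alpha=0.5, beta=1.0):
    """lambda(t) = mu + sum over past events t_i < t of alpha * exp(-beta * (t - t_i))."""
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events if ti < t)


def bin_counts(events, bin_width, horizon):
    """Discretize continuous event times on [0, horizon) into per-bin counts."""
    counts = [0] * math.ceil(horizon / bin_width)
    for ti in events:
        if 0 <= ti < horizon:
            # Clamp to guard against floating-point edge cases near the horizon.
            index = min(int(ti // bin_width), len(counts) - 1)
            counts[index] += 1
    return counts


events = [0.2, 0.7, 1.5, 2.9]
counts = bin_counts(events, bin_width=1.0, horizon=3.0)
```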

temporal SAE

leveraging sequential property of language model

single level token

temporal features are naturally in-distribution for steering over generation

optimal transport

Hawkes process

causality and temporal smoothness

EigenBench

human evaluation does not really scale

an unbiased metric for value alignment
