
ICLR 2026 First Day

Date
2026 Apr 23 0:0
Created by
Seonglae Cho
Created time
2026 Apr 23 12:35
Last edited by
Seonglae Cho
Last edited time
2026 May 19 0:35
Refs

First keynote

  • motivation, behavior, information
  • embodiment biases expectations of the persona (dog vs. humanoid)
  • only a few people want a humanoid?
  • personality matching is an important part of the alignment vertical
  • why do you do AI?

LLM reasoning oral

Overthinking reduction @ router RL

  • DECS
    • Decoupled Reward
    • Curriculum Data Scheduling
  • NRP detector, cnk?
  • baselines
    • TLMRE
    • ThinkPrune
    • AdaptThink
    • LC-R1
  • takeaways
    • efficient reasoning should penalize redundancy, not reasoning itself
    • DECS separates the two
    • The authors argue that existing length-penalty methods hurt performance because of a misalignment between the trajectory-level reward and token-level optimization, and they identify two theoretical flaws:
    • (1) high-entropy exploratory tokens inside correct trajectories are wrongly penalized, and (2) redundant tokens after the Necessary Reasoning Prefix (NRP) are instead rewarded with a positive advantage.
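A toy sketch of the two flaws and of the decoupling idea, under loud assumptions: each trajectory is reduced to (correct, length, nrp_end), where nrp_end is a hypothetical, already-detected index at which the NRP ends; the trajectory-level baseline uses a made-up linear length penalty with GRPO-style group normalization; and the decoupled variant simply gives post-NRP tokens of correct trajectories a fixed penalty. This is not the DECS reward from the paper, only an illustration of why per-token credit assignment changes.

```python
from statistics import mean, pstdev

# A trajectory is (correct, length, nrp_end); nrp_end is a HYPOTHETICAL index
# where the Necessary Reasoning Prefix ends (None if the answer is wrong).

def group_advantages(rewards):
    """GRPO-style advantage: normalize rewards within the sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # avoid dividing by zero when all rewards tie
    return [(r - mu) / sigma for r in rewards]

def trajectory_level_token_advantages(trajs, max_len, alpha=0.5):
    """Made-up length-penalized scalar reward; every token inherits its trajectory's advantage."""
    rewards = [(1.0 if correct else 0.0) - alpha * length / max_len
               for correct, length, _ in trajs]
    return [[adv] * length
            for adv, (_, length, _) in zip(group_advantages(rewards), trajs)]

def decoupled_token_advantages(trajs, tail_penalty=-1.0):
    """Correctness-only advantage for prefix tokens; a separate penalty for post-NRP tokens."""
    rewards = [1.0 if correct else 0.0 for correct, _, _ in trajs]
    out = []
    for adv, (correct, length, nrp_end) in zip(group_advantages(rewards), trajs):
        if correct and nrp_end is not None:
            out.append([adv] * nrp_end + [tail_penalty] * (length - nrp_end))
        else:
            out.append([adv] * length)
    return out

if __name__ == "__main__":
    group = [(True, 100, 60), (True, 200, 60), (False, 150, None), (False, 300, None)]
    coupled = trajectory_level_token_advantages(group, max_len=300)
    decoupled = decoupled_token_advantages(group)
    # Flaw (2): a redundant token after the NRP still gets a positive advantage.
    print(coupled[0][80] > 0, decoupled[0][80] < 0)
    # Flaw (1): prefix tokens of the longer correct trajectory are pushed down vs. the shorter one.
    print(coupled[1][10] < coupled[0][10], decoupled[1][10] == decoupled[0][10])
```

With trajectory-level credit, the shorter correct trajectory's redundant tail (tokens 60 to 99) is reinforced along with its useful prefix; splitting the signal at the NRP is what removes that.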

Belief deviation

  • btr entropy?
  • relation between memorization and exploration: summarizing from past observations of the environment

MemAgent

  • hqarl a good dataset?
  • context distribution

Dense retrieval learner: next-chunk prediction as a retrieval loss

  • LLM training is token-level: next-token prediction (NTP) conditioned on the prefix
  • retrieval is chunk-level: the same NTP idea applied to other units (the next chunk)
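One common way to turn "predict the next chunk" into a retrieval loss is an in-batch contrastive (InfoNCE) objective: the embedding of a prefix chunk should score its true next chunk above every other chunk in the batch. The sketch below assumes that formulation, plain dot-product scores, and a temperature of 0.1; the embeddings are placeholders, and this is not necessarily the exact loss used in the talk.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def next_chunk_infonce(prefix_embs, next_embs, temperature=0.1):
    """Row i of prefix_embs is a query; row i of next_embs is its positive (the true
    next chunk); every other row of next_embs acts as an in-batch negative."""
    n = len(prefix_embs)
    loss = 0.0
    for i in range(n):
        logits = [dot(prefix_embs[i], next_embs[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        loss += log_z - logits[i]  # -log softmax probability of the true next chunk
    return loss / n

if __name__ == "__main__":
    prefixes = [[1.0, 0.0], [0.0, 1.0]]
    print(next_chunk_infonce(prefixes, [[1.0, 0.0], [0.0, 1.0]]))  # aligned: near 0
    print(next_chunk_infonce(prefixes, [[0.0, 1.0], [1.0, 0.0]]))  # swapped: large
```

The chunk plays the role the next token plays in NTP; the softmax over in-batch chunks replaces the softmax over the vocabulary.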

AI Task vector with Model Merging

good
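For reference, the standard task-arithmetic recipe: a task vector is tau_i = theta_i - theta_pre (fine-tuned minus pre-trained weights), and merging computes theta_merged = theta_pre + lambda * sum_i tau_i. The sketch uses dicts of scalar "parameters" for readability; the scale factor lambda is a hypothetical hyperparameter, and the paper discussed may merge differently.

```python
def task_vector(pretrained, finetuned):
    """tau = theta_finetuned - theta_pretrained, computed per parameter name."""
    return {name: finetuned[name] - pretrained[name] for name in pretrained}

def merge(pretrained, task_vectors, scale=1.0):
    """theta_merged = theta_pretrained + scale * sum_i tau_i."""
    return {name: pretrained[name] + scale * sum(tv[name] for tv in task_vectors)
            for name in pretrained}

if __name__ == "__main__":
    pre = {"w": 1.0, "b": 0.0}
    taus = [task_vector(pre, {"w": 2.0, "b": 0.5}),
            task_vector(pre, {"w": 0.5, "b": 1.0})]
    print(merge(pre, taus, scale=0.5))
```

In a real model each value would be a weight tensor rather than a float, but the arithmetic is the same element-wise.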

CRV

good

Second Oral

associate tokens

  • can we approximate the parameters?
    • bigram
    • interchangeability
    • context

Sequences of Logits Reveal the Low Rank Structure of Language Models

Causal Structure Learning in Hawkes Processes

  • causal discovery in partially observed multivariate Hawkes processes
  • continuous-to-discrete reduction using time bins?
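The continuous-to-discrete question can be made concrete with a minimal sketch: a multivariate Hawkes process with an exponential kernel, lambda_i(t) = mu_i + sum_j sum_{t_k^j < t} alpha[i][j] * exp(-beta * (t - t_k^j)), plus a helper that bins event times into per-interval counts. In Hawkes-based causal discovery, dimension j is usually read as a cause of dimension i when alpha[i][j] is nonzero. The exponential kernel, the shared decay beta, and the bin width are assumptions for illustration, and the partially observed setting of the paper is not modeled here.

```python
import math

def hawkes_intensity(t, events, mu, alpha, beta):
    """lambda_i(t) = mu[i] + sum_j sum_{t_k in events[j], t_k < t} alpha[i][j] * exp(-beta * (t - t_k)).
    events[j] is the list of event times observed in dimension j."""
    d = len(mu)
    rates = []
    for i in range(d):
        rate = mu[i]
        for j in range(d):
            # only strictly earlier events of dimension j can excite dimension i
            rate += sum(alpha[i][j] * math.exp(-beta * (t - tk))
                        for tk in events[j] if tk < t)
        rates.append(rate)
    return rates

def bin_events(events, t_end, width):
    """Continuous-to-discrete reduction: count events per dimension in bins [k*width, (k+1)*width)."""
    n_bins = math.ceil(t_end / width)
    counts = [[0] * n_bins for _ in events]
    for j, times in enumerate(events):
        for tk in times:
            if 0 <= tk < t_end:
                counts[j][min(int(tk // width), n_bins - 1)] += 1
    return counts

if __name__ == "__main__":
    events = [[0.5, 1.2], [2.7]]
    alpha = [[0.0, 0.3], [0.5, 0.0]]  # dim 0 excites dim 1 and vice versa
    print(bin_events(events, t_end=3.0, width=1.0))
    print(hawkes_intensity(1.0, events, mu=[0.1, 0.2], alpha=alpha, beta=1.0))
```

Wider bins make the discrete counts easier to model but blur the ordering of events that fall into the same bin, which is exactly the information the excitation structure depends on.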

Temporal SAE

leveraging the sequential property of language models
  • single level token
  • temporal features are naturally in-distribution for steering over generation
optimal transport
Hawkes process
causality and temporal smoothness
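A minimal sketch of one way to add a temporal-smoothness term to an SAE objective: the usual reconstruction and L1 sparsity losses, plus a penalty on how much feature activations change between consecutive tokens. The loss weights, the ReLU encoder, and the squared-difference smoothness term are assumptions for illustration; the paper's actual objective (including any optimal-transport or Hawkes-process component) is not reproduced here.

```python
def relu(v):
    return max(0.0, v)

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def temporal_sae_loss(xs, W_enc, b_enc, W_dec, l1=1e-3, smooth=1e-2):
    """xs: token activations in generation order (list of vectors).
    W_enc: n_features x d_model, W_dec: d_model x n_features (hypothetical shapes)."""
    zs = [[relu(a + b) for a, b in zip(matvec(W_enc, x), b_enc)] for x in xs]
    recon = sum(sum((r - xi) ** 2 for r, xi in zip(matvec(W_dec, z), x))
                for z, x in zip(zs, xs)) / len(xs)
    sparsity = sum(sum(z) for z in zs) / len(xs)  # ReLU outputs are >= 0, so sum == L1 norm
    # temporal term: penalize feature changes between consecutive tokens
    pairs = list(zip(zs, zs[1:]))
    temporal = (sum(sum((a - b) ** 2 for a, b in zip(z1, z0)) for z0, z1 in pairs)
                / max(1, len(pairs)))
    parts = {"recon": recon, "sparsity": sparsity, "temporal": temporal}
    return recon + l1 * sparsity + smooth * temporal, parts

if __name__ == "__main__":
    eye = [[1.0, 0.0], [0.0, 1.0]]
    bias = [0.0, 0.0]
    print(temporal_sae_loss([[1.0, 0.0], [1.0, 0.0]], eye, bias, eye))  # steady features
    print(temporal_sae_loss([[1.0, 0.0], [0.0, 1.0]], eye, bias, eye))  # feature switch
```

A per-token SAE sees each position independently; the extra term is the simplest way to make features prefer to persist across a generation, which is what makes them natural handles for steering over many steps.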

EigenBench

  • human evaluation does not really scale
  • an unbiased metric for value alignment
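Assuming, as the name suggests, that EigenBench aggregates models' judgments of each other into scores through a principal eigenvector (EigenTrust/PageRank style), the aggregation step can be sketched with power iteration. The judgment matrix, the row normalization, and the damping factor of 0.85 are illustrative assumptions, not the benchmark's exact procedure.

```python
def eigen_scores(judgments, damping=0.85, iters=200, tol=1e-12):
    """judgments[i][j] >= 0: how favorably judge i rates model j.
    Returns t with t_j = (1 - d)/n + d * sum_i t_i * P[i][j], where P is row-normalized.
    Each model's score also weights its own votes as a judge (EigenTrust-style)."""
    n = len(judgments)
    P = []
    for row in judgments:
        s = sum(row)
        P.append([v / s for v in row] if s > 0 else [1.0 / n] * n)
    t = [1.0 / n] * n
    for _ in range(iters):
        # damping keeps the iteration contractive, so it converges even for periodic P
        new = [(1 - damping) / n + damping * sum(t[i] * P[i][j] for i in range(n))
               for j in range(n)]
        converged = sum(abs(a - b) for a, b in zip(new, t)) < tol
        t = new
        if converged:
            break
    return t

if __name__ == "__main__":
    print(eigen_scores([[1.0, 1.0], [1.0, 1.0]]))  # symmetric judgments
    print(eigen_scores([[0.0, 1.0], [0.0, 1.0]]))  # both judges prefer model 1
```

The scores always sum to 1, and no human labels enter the loop, which is the scaling argument in the note above.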
