2026 Research/Product Plan

Micro-pessimist, Macro-optimist

Eval is a direction. Developing a benchmark means finding the intended direction within the space the benchmark is designed for.

Doing research makes me realize how easily I used to throw around grandiose claims.

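Use hostages: bundle something I really love doing with something I dislike, so that I do the two together. Or set up hooks or the environment in advance so that the task naturally comes into view or nags at me until I do it.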

Jax https://build.nvidia.com/spark/jax/overview

Stop injecting intent into the research. Observe through experiments; my job is only to choose and to check that the implementation is correct.

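"I tried this idea this way and it worked well" is not persuasive. Instead, say: "With this motivation, to solve this problem, I tried this approach based on a specific insight; it had problems in practice, so I added a certain technique and it got much better."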

The more inspiration from nature is present, the more confident the top-down belief that sustains you when experiments contradict you: multifaceted beauty (Ilya Sutskever)

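There is only one kind of highly paid researcher: the one whose taste gives the insight to tell whether something will work before spending compute or scaling, which cuts that cost and makes fast progress possible.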

Rather than new theoretical insights, a "well-packaged framework + stable empirical gains" is much more effective. Concise conceptual packaging with experimental reproducibility + computational discipline with safe novelty creates a sweet spot that appears to be a reasonable extension.

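Reviews: rather than being unconditionally pessimistic, make the author feel encouraged and show my own perspective, instead of a flat, realistic "this won't work".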

Streaming is the first step toward multimodality

Plan

Go to bed early; get TLDR / AlphaSignal reading back on track

oversquashing

ICML is very important (Seoul)

icml

moe sae

routerrl

(confidence manifold)

acl workshop / iclr workshop

Neuronpedia

confidence manifold

agent state graph for ‣

cvpr / neurips

math interpretability

agent crdt

probe as a tool

Second half of the year

SNN Recurrent Model

robotics redteaming or control within simulation

streaming llm Streaming Vision Speech Model

ARR march

agent state graph

routerrl

single token

stream writer?

Expected trends

Modeling

Dynamic Tokenization

Test-Time Compute

Test-Time Training

AI Agent

Memory, Memo

Tool Calling

Bad research result description

Bad example: “This neuron/feature represents lying”

Good example: “Intervening on this activation direction (ablate/patch/steer) changes a specific behavior, accuracy, or logit in a predictable way”

State it in cause → effect form. Center it on “measurement + graphs”, not “qualitative” impressions

effect size: change in accuracy / loss / a specific metric (before vs. after the intervention)

sweep: curve over a sweep of intervention strength (α)

compare: method A vs. baseline vs. random direction vs. matched-control direction

error bars, or at minimum the distribution over multiple seeds / multiple examples
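A minimal sketch of what that checklist could look like in code, on a toy linear readout rather than a real model. Everything here is an assumption for illustration: the names `sweep`, `effect_size`, `steer`, the hidden size `DIM`, and the choice to let the readout vector itself stand in for the direction found by "method A". It covers the effect size, the α sweep, the random-direction control, and std over seeds; a matched-control direction and a real metric are left out.

```python
import math
import random
import statistics

DIM = 16  # toy hidden size (assumption)


def unit(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def random_unit(rng):
    """Random unit direction, used for the toy readout and the control."""
    return unit([rng.gauss(0.0, 1.0) for _ in range(DIM)])


def steer(h, direction, alpha):
    """Steering intervention: h + alpha * direction."""
    return [x + alpha * d for x, d in zip(h, direction)]


def effect_size(readout, examples, direction, alpha):
    """Mean logit change (after minus before) over examples."""
    deltas = [
        dot(readout, steer(h, direction, alpha)) - dot(readout, h)
        for h in examples
    ]
    return statistics.mean(deltas)


def sweep(alphas, seeds, n_examples=32):
    """For each alpha, return mean and std over seeds for method and control."""
    rows = []
    for alpha in alphas:
        method, control = [], []
        for seed in seeds:
            rng = random.Random(seed)
            readout = random_unit(rng)  # toy model: logit = readout . h
            examples = [
                [rng.gauss(0.0, 1.0) for _ in range(DIM)]
                for _ in range(n_examples)
            ]
            candidate = readout  # assumption: stands in for the direction under test
            random_dir = random_unit(rng)  # random-direction control
            method.append(effect_size(readout, examples, candidate, alpha))
            control.append(effect_size(readout, examples, random_dir, alpha))
        rows.append({
            "alpha": alpha,
            "method_mean": statistics.mean(method),
            "method_std": statistics.stdev(method),
            "control_mean": statistics.mean(control),
            "control_std": statistics.stdev(control),
        })
    return rows


if __name__ == "__main__":
    for row in sweep(alphas=[0.0, 0.5, 1.0, 2.0], seeds=range(5)):
        print(row)
```

Each row is one point on the α sweep curve, with the std columns serving as error bars. In a real experiment the readout would be replaced by the model's metric, and the candidate direction by whatever the method actually finds.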

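Don't get swayed by other people; hold on to what I believe. Don't carelessly hand off the CorrSteer title or the CRL conclusion to others. For the Umar UI/UX too, argue strongly for what I believe is right.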

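Food is food, sex is sex. Find something that really differentiates you. Don't be emotional. (On the Venice flight) Fix what still remains for me to work on for my long-term goals: speaking up comfortably and with confidence, presenting without nerves, and writing papers well.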

Personal 2026 Paper plans

Streaming Vision Speech Model

Working

Done

Deprecated

Continuous Curriculum Learning