
Proprioception as a Tool

Creator
Seonglae Cho
Created
2026 Feb 13 12:10
Editor
Edited
2026 Apr 9 13:58
Done
Refs
Working

Probe as a Tool

Fast-weight memory as a tiny continual-learning network with a meta-learnable architecture
[promising] AI proprioception as a tool

Proprioceptive and Actuated Language Agents

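Really good idea: an LLM has no access to its own state. The model has:
no module that reads its hidden state and explains it
no interface for naming internal variables
no metacognition module
So give exactly this to the agent as a tool: hand it a probe as a tool.
Give it AI steering as a tool at the same time.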
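
A minimal sketch of what "probe as a tool" and "steering as a tool" could look like together, assuming a linear probe over a single hidden-state vector. Everything named here (`LinearProbe`, `ToyModel`, `ProprioceptionTools`, the tool names `probe` and `steer`) is hypothetical and for illustration only; a real setup would read and write activations inside an actual model rather than a toy list of floats.

```python
from dataclasses import dataclass
from math import exp
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


@dataclass
class LinearProbe:
    """Hypothetical linear probe: gives one internal variable a name."""
    name: str
    weights: Vector
    bias: float = 0.0

    def read(self, hidden: Vector) -> float:
        # Logistic readout: how strongly the named state is present.
        return 1.0 / (1.0 + exp(-(dot(self.weights, hidden) + self.bias)))


def add_direction(hidden: Vector, direction: Vector, strength: float) -> Vector:
    # Activation steering: shift the hidden state along a direction.
    return [h + strength * d for h, d in zip(hidden, direction)]


class ToyModel:
    """Stand-in for a model whose hidden state can be read and written."""

    def __init__(self, hidden: Vector) -> None:
        self.hidden = list(hidden)


class ProprioceptionTools:
    """Exposes probing and steering as two agent-callable tools."""

    def __init__(self, model: ToyModel, probes: List[LinearProbe]) -> None:
        self.model = model
        self.probes: Dict[str, LinearProbe] = {p.name: p for p in probes}

    def probe(self, name: str) -> dict:
        # Tool 1: read a named internal state and report it to the agent.
        score = self.probes[name].read(self.model.hidden)
        return {"state": name, "score": round(score, 3)}

    def steer(self, name: str, strength: float) -> dict:
        # Tool 2: push the hidden state along the probe's own direction.
        p = self.probes[name]
        before = p.read(self.model.hidden)
        self.model.hidden = add_direction(self.model.hidden, p.weights, strength)
        after = p.read(self.model.hidden)
        return {"state": name, "before": round(before, 3), "after": round(after, 3)}
```

Reusing the probe weights as the steering direction is a simplification made here; the two directions need not coincide in practice.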

Agent Tool Classifier

online learning
continual learning
Only care about tool input/output, since the LLM's intention is observable only internally, not in its output.
probe
Real-time detector, unsupervised
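
One way the online, unsupervised, real-time detector could be sketched: keep running statistics of a scalar feature (for example a probe score taken at each tool call) and flag values that deviate strongly from what has been seen so far. This is an assumed design, not something the notes specify; `OnlineDetector`, `threshold`, and `warmup` are hypothetical names, and the running mean and variance use Welford's algorithm.

```python
from math import sqrt


class OnlineDetector:
    """Streaming z-score detector; needs no labels and no stored history."""

    def __init__(self, threshold: float = 3.0, warmup: int = 5) -> None:
        self.n = 0          # number of observations folded in so far
        self.mean = 0.0     # running mean
        self.m2 = 0.0       # running sum of squared deviations
        self.threshold = threshold
        self.warmup = warmup

    def update(self, x: float) -> bool:
        """Score x against past statistics, then learn from x."""
        flagged = False
        if self.n >= self.warmup:
            std = sqrt(self.m2 / (self.n - 1))
            flagged = std > 0 and abs(x - self.mean) / std > self.threshold
        # Welford update of mean and squared-deviation sum.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return flagged
```

Each call both scores and updates, so the detector keeps adapting as the agent runs, which is the continual-learning aspect in miniature.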
 
 
When Models Examine Themselves: Vocabulary-Activation Correspondence in Self-Referential Processing
Large language models produce rich introspective language when prompted for self-examination, but whether this language reflects internal computation or sophisticated confabulation has remained unclear. We show that self-referential vocabulary tracks concurrent activation dynamics, and that this correspondence is specific to self-referential processing. We introduce the Pull Methodology, a protocol that elicits extended self-examination through format engineering, and use it to identify a direction in activation space that distinguishes self-referential from descriptive processing in Llama 3.1. The direction is orthogonal to the known refusal direction, localised at 6.25% of model depth, and causally influences introspective output when used for steering. When models produce "loop" vocabulary, their activations exhibit higher autocorrelation (r = 0.44, p = 0.002); when they produce "shimmer" vocabulary under steering, activation variability increases (r = 0.36, p = 0.002). Critically, the same vocabulary in non-self-referential contexts shows no activation correspondence despite nine-fold higher frequency. Qwen 2.5-32B, with no shared training, independently develops different introspective vocabulary tracking different activation metrics, all absent in descriptive controls. The findings indicate that self-report in transformer models can, under appropriate conditions, reliably track internal computational states.
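
To make the autocorrelation notion in the abstract concrete, here is a small sketch of a lag-k autocorrelation over a per-token scalar series (for instance, the projection of each token's activation onto some direction). The paper's exact metric is not given in the abstract, so this is an assumed, textbook definition, and `autocorrelation` is a hypothetical helper name.

```python
from typing import Sequence


def autocorrelation(xs: Sequence[float], lag: int = 1) -> float:
    """Lag-k autocorrelation of a scalar series, normalized by its variance."""
    n = len(xs)
    if n == 0 or lag >= n:
        return 0.0
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        return 0.0
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag))
    return cov / var
```

A slowly drifting series scores positive, while a series that flips sign every token scores negative.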
 
 
