Texonom / Engineering / Data Engineering / Artificial Intelligence / AI Risk / AI Alignment / Explainable AI / Interpretable AI / Mechanistic interpretability / Activation Engineering / Internal Probe / Classifier Probe

Classifier Probe

Creator: Seonglae Cho
Created: 2026 Feb 13 12:51
Editor: Seonglae Cho
Edited: 2026 Feb 13 12:52

The Confidence Manifold

The Confidence Manifold: Geometric Structure of Correctness...
When a language model asserts that "the capital of Australia is Sydney," does it know this is wrong? We characterize the geometry of correctness representations across 9 models from 5 architecture...
https://arxiv.org/abs/2602.08159

Copyright Seonglae Cho