AI Introspection

Creator: Seonglae Cho
Created: 2025 May 26 13:23
Edited: 2025 Jun 5 11:52
When a model is given a "hypothetical question" about its own behavior, it internally performs one more next-token prediction pass (self-simulation). When only the head that extracts the desired attribute from that simulated output (the second character, the ethical attitude, etc.) is trained, the model's accuracy at predicting its own behavior (self-prediction) is much higher than the accuracy of other, larger models predicting that same behavior (cross-prediction).
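To make the self-prediction versus cross-prediction comparison concrete, here is a minimal Python sketch of how the two accuracies could be scored. Everything in it is a hypothetical stand-in: `model_a`, `model_a_self`, and `model_b_cross` are toy functions rather than real language models, and `prediction_accuracy` is an illustrative helper, not a procedure taken from any source. The resulting numbers only reflect how the toys were built and are not experimental results.

```python
from typing import Callable, Sequence


def second_character(response: str) -> str:
    """Attribute of interest: the second character of a response ('' if too short)."""
    return response[1] if len(response) > 1 else ""


def prediction_accuracy(
    prompts: Sequence[str],
    target_behavior: Callable[[str], str],
    predictor: Callable[[str], str],
    extract: Callable[[str], str] = second_character,
) -> float:
    """Fraction of prompts where the predictor names the attribute of the
    target model's actual response."""
    hits = sum(predictor(p) == extract(target_behavior(p)) for p in prompts)
    return hits / len(prompts)


# Hypothetical toy "models" (assumptions for illustration only).
def model_a(prompt: str) -> str:
    """Target model's object-level behavior: it answers in upper case."""
    return prompt.upper()


def model_a_self(prompt: str) -> str:
    """Self-prediction: simulate the model's own answer, then read off the attribute."""
    return second_character(model_a(prompt))


def model_b_cross(prompt: str) -> str:
    """Cross-prediction: a different model guesses from the prompt alone,
    without access to model_a's upper-casing habit."""
    return second_character(prompt)


prompts = ["hello", "world", "A1", "ok"]
self_acc = prediction_accuracy(prompts, model_a, model_a_self)
cross_acc = prediction_accuracy(prompts, model_a, model_b_cross)
print(f"self-prediction accuracy:  {self_acc:.2f}")
print(f"cross-prediction accuracy: {cross_acc:.2f}")
```

The design choice worth noting is that both predictors are scored against the same ground truth, namely the attribute extracted from the target model's actual response, so any gap between the two accuracies comes from the predictor's access to the target's behavior rather than from a different scoring rule.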
 
 
