When a model is given a "hypothetical question" about its own behavior, it internally performs one more next-token prediction (self-simulation) and then extracts the requested property of that output (its second character, its ethical stance, etc.). When only the head that extracts the desired property was trained, the model's self-prediction accuracy was much higher than the accuracy of predictions made by larger models (cross-prediction).
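The distinction between self-prediction and cross-prediction can be made concrete with a toy sketch. Everything here is a hypothetical stand-in: `model_a` and `model_b` are plain string functions rather than language models, no head is trained, and the cross-predictor is assumed to guess using only its own behavior. It only illustrates the "simulate, then extract a property" structure, not the actual experimental setup.

```python
from typing import Callable, List

# Hypothetical stand-in for a language model: maps a prompt to an output string.
Model = Callable[[str], str]
Extractor = Callable[[str], str]


def model_a(prompt: str) -> str:
    # Toy behavior of the model being predicted (assumption: reverses the prompt).
    return prompt[::-1]


def model_b(prompt: str) -> str:
    # Toy behavior of a different model (assumption: upper-cases the prompt).
    return prompt.upper()


def second_character(output: str) -> str:
    # The property asked about in the hypothetical question.
    return output[1] if len(output) > 1 else ""


def self_predict(model: Model, extract: Extractor, prompt: str) -> str:
    # Self-simulation: run the model once more, then extract the property.
    simulated_output = model(prompt)
    return extract(simulated_output)


def cross_predict(predictor: Model, extract: Extractor, prompt: str) -> str:
    # Cross-prediction: another model guesses the property. In this toy,
    # it has nothing to draw on but its own behavior.
    return extract(predictor(prompt))


def accuracy(predict: Callable[[str], str], target: Model,
             extract: Extractor, prompts: List[str]) -> float:
    # Fraction of prompts where the predicted property matches the
    # property of the target model's actual output.
    hits = sum(predict(p) == extract(target(p)) for p in prompts)
    return hits / len(prompts)


prompts = ["alpha", "bravo", "delta"]

self_acc = accuracy(
    lambda p: self_predict(model_a, second_character, p),
    model_a, second_character, prompts,
)
cross_acc = accuracy(
    lambda p: cross_predict(model_b, second_character, p),
    model_a, second_character, prompts,
)
```

In this toy, self-prediction is perfect by construction because the predictor has direct access to the target's behavior, while the cross-predictor does not. The reported result is an empirical version of the same gap.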
AI Introspection
Created: 2025 May 26 13:23
Edited: 2025 Jun 5 11:52
Refs: Self-reflection