In reasoning models, increasing test-time compute (thinking tokens) does not always improve performance; reverse (inverse) scaling, where accuracy actually decreases as more compute is spent, has been observed across multiple tasks.
Test-time Scaling
Created: 2025 Aug 1 23:31
Edited: 2025 Aug 1 23:31
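The note states the finding but not how it is measured. As a minimal sketch, one way to check your own measurements for this pattern is to fit a least-squares slope of accuracy against the logarithm of the thinking-token budget. A negative slope means accuracy falls as test-time compute grows. This is an illustrative assumption, not the procedure from any particular study. The function names `log_budget_slope` and `shows_inverse_scaling` and the `tolerance` parameter are hypothetical, and the accuracy values would come from your own evaluation runs.

```python
import math
from typing import Sequence


def log_budget_slope(budgets: Sequence[int], accuracies: Sequence[float]) -> float:
    """Least-squares slope of accuracy versus log2(thinking-token budget).

    Hypothetical helper: budgets are thinking-token limits (must be > 0),
    accuracies are the matching task accuracies in [0, 1].
    """
    if len(budgets) != len(accuracies) or len(budgets) < 2:
        raise ValueError("need at least two (budget, accuracy) pairs of equal length")
    if any(b <= 0 for b in budgets):
        raise ValueError("budgets must be positive")

    # Use log2 so that each doubling of the budget is one unit on the x-axis.
    xs = [math.log2(b) for b in budgets]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(accuracies) / n

    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("budgets must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracies))
    return sxy / sxx


def shows_inverse_scaling(
    budgets: Sequence[int], accuracies: Sequence[float], tolerance: float = 0.0
) -> bool:
    """True if accuracy trends downward as the budget grows (slope < -tolerance)."""
    return log_budget_slope(budgets, accuracies) < -tolerance
```

Regressing on log2 of the budget is a design choice that fits sweeps that double the budget at each step. A single slope can also hide non-monotonic curves (for example, accuracy that rises and then falls), so inspecting the raw points per task is still worthwhile.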