Test-time Compute

Creator
Seonglae Cho
Created
2024 Dec 21 0:30
Edited
2025 Feb 16 22:01

Inference-time compute, Reasoning Model

  • Pretraining scaling is reaching its limits due to finite data.
Test-time compute matters because solving a problem requires compute proportional to the complexity of the underlying algorithm; producing an immediate answer amounts to reciting memorized information. Since some human requests are genuinely novel, the model first has to judge how much thinking a problem warrants based on its complexity. That judgment can itself be made by pattern matching over problem types, which enables intelligence to extrapolate beyond what was memorized.
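This allocation step can be sketched as adaptive sampling: spend more inference compute on queries judged harder, then aggregate the candidates. A minimal sketch, assuming a generic `generate` callable standing in for any LLM API and a made-up length-based difficulty heuristic:

```python
from collections import Counter
from typing import Callable

def solve(prompt: str, generate: Callable[[str], str], max_samples: int = 16) -> str:
    """Allocate test-time compute in proportion to estimated difficulty:
    draw several candidate answers and return the majority vote
    (self-consistency). `generate` is any sampling LLM call."""
    # Made-up difficulty heuristic; in practice this could be a learned
    # router or a cheap draft pass over the query.
    n = max_samples if len(prompt) > 200 else 1
    samples = [generate(prompt) for _ in range(n)]
    return Counter(samples).most_common(1)[0][0]
```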
Test-time Compute Notion

Google’s efficiency-based perspective

$$R = \frac{\text{Inference Tokens}}{\text{Pretraining Tokens}}$$
While scaling test-time inference is less compute-efficient than training-time scaling overall, for relatively simple problems where R is low, test-time compute scaling is the more advantageous option. The conclusion, however, is that model scaling is still necessary in the long term to expand the range of problems that can be solved at test time.
If a problem is fundamentally difficult or something the model has never encountered, no amount of test-time computation will yield significant improvement (the model itself is inadequately trained for it), making additional training (parameter/data scaling) inherently more effective.
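As a toy illustration of the trade-off, the ratio can be computed directly; the token counts below are assumptions for illustration, not measurements:

```python
def r(inference_tokens: float, pretraining_tokens: float) -> float:
    """R = inference tokens / pretraining tokens, as defined above."""
    return inference_tokens / pretraining_tokens

# Made-up numbers: a 15T-token pretraining run versus the total tokens
# a deployment expects to spend at inference over its lifetime.
PRETRAINING = 15e12
print(r(1e9, PRETRAINING))   # R << 1: extra test-time compute is cheap
print(r(1e14, PRETRAINING))  # R > 1: amortizing more training pays off
```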
Hugging Face - new scaling after pretraining

Increasing inference-time compute often reduces the success rate of adversarial attacks

“Large language models are not a dead end”

  • Additional prompts such as “Think step by step” are unnecessary and degrade performance
  • Prefer zero-shot prompting; few-shot is acceptable
  • Provide an explicit objective (success criteria) and limitations, as in the sketch below
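A minimal sketch of a prompt that follows these recommendations; the task and the client call are hypothetical, and only the structure is the point:

```python
# Zero-shot, with no "think step by step" scaffolding; instead the
# objective (success criteria) and the limitations are stated explicitly.
prompt = (
    "Refactor the function below to remove its global state.\n"         # the task itself
    "Success criteria: behavior unchanged; all existing tests pass.\n"  # explicit objective
    "Constraints: standard library only; keep the public signature.\n"  # limitations
)
# response = client.chat(prompt)  # hypothetical client; any chat API fits here
```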

Recommendations