Reasoning Model

Creator
Seonglae Cho
Created
2024 Dec 21 0:30
Edited
2025 Apr 27 0:57

Test-time Compute (Inference-time Compute)

  • Pretraining scaling is reaching its limits due to finite data.
Test-time compute matters because solving a problem requires computation proportional to the complexity of the underlying algorithm; producing an immediate answer amounts to reciting memorized information. Since human requests include genuinely novel queries, the model must first estimate, from the problem's complexity, how much thinking is required. This estimate can itself be made through pattern matching over problems, which also enables intelligence extrapolation.
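As a concrete illustration (not from the note itself), the simplest way to spend more test-time compute is to sample several reasoning chains and aggregate them, as in self-consistency. A minimal sketch, assuming a hypothetical `generate(question)` that returns a (chain-of-thought, final-answer) pair:

```python
from collections import Counter

def self_consistency(generate, question, n_samples=8):
    # Each extra sample spends more test-time compute on the same question.
    answers = [generate(question)[1] for _ in range(n_samples)]
    # Majority vote over final answers; harder problems warrant more samples.
    return Counter(answers).most_common(1)[0][0]
```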
Test-time Compute (Notion)
https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/

Google’s efficiency perspective approach

$R = \frac{\text{Inference Tokens}}{\text{Pretraining Tokens}}$
Increasing test-time inference is less compute-efficient than spending the same compute at training time, but for relatively simple problems, where R is low, test-time compute scaling is the more advantageous option. The conclusion, however, is that model scaling remains necessary in the long run to expand the range of problems that can be solved at test time.
If a problem is fundamentally difficult, or one the model has never encountered, no amount of test-time computation will yield significant improvement (the model itself was not trained for it), making additional training (parameter/data scaling) inherently more effective.
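To make R concrete, here is a back-of-the-envelope calculation with invented numbers (none of these figures come from the source):

```python
# Hypothetical figures, purely for illustration.
pretraining_tokens = 1e13      # tokens consumed during pretraining
queries = 1e8                  # expected number of deployment queries
tokens_per_query = 4e3         # reasoning tokens spent per query

# R compares total inference spend to the one-off pretraining spend.
R = (queries * tokens_per_query) / pretraining_tokens
print(f"R = {R:.2f}")  # 0.04 here: a low R favors test-time compute scaling
```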
Hugging Face - new scaling after pretraining

Increasing inference-time compute often reduces the success of attacks

“Large language models are not a dead end”

  • Additional prompts such as “Think step by step” are unnecessary and can degrade performance
  • Prefer zero-shot prompting; few-shot is acceptable
  • Provide an explicit objective (success criteria) and limitations, as in the sketch below
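A minimal sketch of these recommendations, assuming the OpenAI Python SDK and an illustrative reasoning-model name:

```python
from openai import OpenAI

client = OpenAI()

# Zero-shot: no few-shot examples and no "think step by step" nudge.
# The objective (success criteria) and limitations are stated explicitly.
response = client.chat.completions.create(
    model="o1",  # illustrative model name
    messages=[{
        "role": "user",
        "content": (
            "Deduplicate the records in the CSV below.\n"
            "Success criteria: no two rows share an email; row order preserved.\n"
            "Limitations: Python standard library only; memory under 1 GB.\n"
        ),
    }],
)
print(response.choices[0].message.content)
```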
Reasoning Models Don’t Always Say What They Think
Through RL, faithfulness initially increased but soon plateaued. Even in reward-hacking scenarios, the model rarely revealed its hacking strategy in its CoT. This suggests that while CoT monitoring can catch some unintended behaviors, it is not by itself a reliable means of ensuring safety. In other words, even when given answer hints, the model did not disclose that it had used them in its CoT.
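As a rough sketch of how such faithfulness could be measured (hypothetical data shapes; the paper's actual transcript grading is more sophisticated):

```python
def verbalizes_hint(cot: str, hint: str) -> bool:
    # Crude check: did the chain of thought mention the hint at all?
    return hint.lower() in cot.lower()

def faithfulness_rate(transcripts: list[tuple[str, bool]], hint: str) -> float:
    """transcripts: (chain_of_thought, answer_followed_hint) pairs.
    Among cases where the answer followed the hint, the fraction
    whose CoT admits using it."""
    used = [cot for cot, followed in transcripts if followed]
    return sum(verbalizes_hint(cot, hint) for cot in used) / len(used) if used else 0.0
```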
Reasoning models outperform instruction-tuned models even without additional inference-time compute.

Recommendations