TRM

Creator: Seonglae Cho
Created: 2025 Oct 8 23:28
Edited: 2025 Oct 19 22:49

Tiny Recursive Model

TRM achieves significantly higher performance and generalization than HRM with just a single 2-layer network (7M parameters) applied through simple recursion.
Token-level parallelism is preserved, but a small number of layers is applied recursively over three variables: x (input), y (prediction), and z (reasoning vector). x is the encoding of the input, z is initialized from a projection embedding, and y starts at zero. For n steps z is updated from (x, y, z), and then y is updated from (y, z); this outer loop is repeated N times. In other words, n acts like layer depth and N is the recursive depth (see the sketch below).
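A minimal PyTorch sketch of this recursion, assuming illustrative layer sizes, step counts, and a concatenation-based way of combining (x, y, z); it is not the authors' exact architecture:

```python
# Minimal sketch of the TRM recursion described above.
# Layer sizes, step counts, and the concatenation used to combine (x, y, z)
# are illustrative assumptions, not the authors' exact architecture.
import torch
import torch.nn as nn

class TinyRecursiveModel(nn.Module):
    def __init__(self, d_model: int = 256, n: int = 6, N: int = 3):
        super().__init__()
        self.n, self.N = n, N                      # n: layer-like inner depth, N: recursive depth
        # one small 2-layer network reused for every update
        self.net = nn.Sequential(
            nn.Linear(3 * d_model, d_model),
            nn.GELU(),
            nn.Linear(d_model, d_model),
        )
        self.init_z = nn.Linear(d_model, d_model)  # projection that initializes z

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: encoded input, shape (batch, seq, d_model); all tokens are processed in parallel
        z = self.init_z(x)              # z starts as a projection embedding of the input
        y = torch.zeros_like(x)         # y (the prediction) starts at zero
        for _ in range(self.N):         # outer recursion, repeated N times
            for _ in range(self.n):     # inner steps: refine z from (x, y, z)
                z = self.net(torch.cat([x, y, z], dim=-1))
            # then refine the prediction y from (y, z); the x slot is zeroed so the
            # same network can be reused (an assumption made for this sketch)
            y = self.net(torch.cat([torch.zeros_like(x), y, z], dim=-1))
        return y

# usage: y = TinyRecursiveModel()(torch.randn(2, 16, 256))
```

Because the same tiny network is reused at every step, the parameter count stays small while the effective depth grows with n × N.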

Criticism

Hierarchical Reasoning Model's recursion doesn't truly reach a fixed point (the residual never becomes 0), so the Implicit Function Theorem cannot be applied with mathematical guarantees and the 1-step gradient approximation lacks a solid foundation. TRM therefore removes the IFT-based approximation entirely and backpropagates directly through all recursion steps, which makes training much more stable and significantly improves generalization.
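For reference, the gradient at stake can be written out. At an exact fixed point of a recursion z ← f_θ(z, x) (generic notation, not the papers' exact equations), the Implicit Function Theorem gives the full gradient, and the 1-step approximation drops the inverse-Jacobian factor:

$$z^* = f_\theta(z^*, x) \;\Rightarrow\; \frac{\partial z^*}{\partial \theta} = \Big(I - \frac{\partial f_\theta}{\partial z}\Big|_{z^*}\Big)^{-1}\frac{\partial f_\theta}{\partial \theta}, \qquad \text{1-step approximation:}\ \frac{\partial z^*}{\partial \theta} \approx \frac{\partial f_\theta}{\partial \theta}$$

Both expressions presuppose that z* exists and is actually reached; when the residual never hits zero, neither is justified, which is the point of the criticism above.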
In other words, HRM applies a fixed-point gradient to a recursion that never actually reaches a fixed point. The resulting gradient is fundamentally incorrect, so even if the model "sort of works" early in training, convergence and generalization become distorted. Since the fixed point doesn't actually exist, the clean fix is to simply backprop through every step.
Therefore, TRM uses full chain backprop through all steps, whereas HRM only backprops through the final 1-step of reasoning.
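A toy comparison of the two strategies (a stand-in recursion, not either model's real code) makes the difference concrete: detaching all but the last step yields the HRM-style 1-step gradient, while keeping the whole graph gives TRM-style full backprop.

```python
# Toy comparison of the two gradient strategies (illustrative stand-in, not either paper's code).
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Linear(8, 8)          # stand-in for the small recursive network
x = torch.randn(4, 8)
target = torch.randn(4, 8)
N = 5                          # total recursion steps

def gradient(one_step: bool) -> torch.Tensor:
    y = torch.zeros_like(x)
    for i in range(N):
        if one_step and i < N - 1:
            # HRM-style: earlier steps are treated as an already-converged fixed point,
            # so no computation graph is kept for them
            with torch.no_grad():
                y = net(x + y)
        else:
            # TRM-style: keep the graph for every step (full chain backprop)
            y = net(x + y)
    loss = ((y - target) ** 2).mean()
    net.zero_grad()
    loss.backward()
    return net.weight.grad.clone()

g_one_step = gradient(one_step=True)      # gradient flows through the final step only
g_full = gradient(one_step=False)         # gradient flows through all N steps
print((g_one_step - g_full).abs().max())  # the two gradients generally differ
```

Full backprop stores activations for every step, so its memory cost grows with the total number of steps; TRM can afford this because the network itself is tiny.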