HRM
HRM (Hierarchical Reasoning Model) achieves exceptional performance on complex reasoning tasks using only ~1,000 training samples.
HRM couples two recurrent modules running on different timescales: a slow high-level (H) module and a fast low-level (L) module, with multiple L steps per single H step. It avoids premature convergence via hierarchical convergence: L converges to a local equilibrium, then H updates the context and effectively resets L into a fresh convergence phase. Training uses a 1-step gradient (a DEQ-style first-order approximation) plus deep supervision, giving O(1)-memory backpropagation. In essence, HRM combines timescale separation, hierarchical convergence, the 1-step gradient, and ACT (adaptive computation time) for dynamic halting. A sketch of the core loop follows.
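
A minimal PyTorch sketch of the nested H/L recurrence and the 1-step gradient, under loose assumptions: GRU cells stand in for the paper's transformer blocks, and all names (HRMSketch, f_L, f_H, N, T) are illustrative, not the official implementation.

```python
import torch
import torch.nn as nn

class HRMSketch(nn.Module):
    def __init__(self, dim: int, n_classes: int, N: int = 2, T: int = 4):
        super().__init__()
        self.N, self.T = N, T                 # N high-level cycles, T low-level steps each
        self.f_L = nn.GRUCell(2 * dim, dim)   # fast L module: conditioned on input + H state
        self.f_H = nn.GRUCell(dim, dim)       # slow H module: reads L's converged state
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x, z_H, z_L):
        # 1-step gradient: run all but the final (L, H) updates without
        # gradients, so memory stays O(1) in the number of steps.
        with torch.no_grad():
            for i in range(self.N * self.T - 1):
                z_L = self.f_L(torch.cat([x, z_H], dim=-1), z_L)
                if (i + 1) % self.T == 0:        # after T L-steps: one H-step,
                    z_H = self.f_H(z_L, z_H)     # which re-contextualizes (resets) L
        # Only the final L and H updates carry gradient.
        z_L = self.f_L(torch.cat([x, z_H], dim=-1), z_L)
        z_H = self.f_H(z_L, z_H)
        return self.head(z_H), z_H, z_L
```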
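Deep supervision, under the same assumptions: a loss is taken after each forward "segment" and the hidden states are detached, so no gradient crosses segment boundaries. ACT (omitted here) would add a halting head that decides after each segment whether to stop.

```python
model = HRMSketch(dim=128, n_classes=10)
opt = torch.optim.Adam(model.parameters())
x = torch.randn(32, 128)                  # toy batch
y = torch.randint(0, 10, (32,))
z_H = torch.zeros(32, 128)
z_L = torch.zeros(32, 128)
for _ in range(4):                        # 4 supervised segments
    logits, z_H, z_L = model(x, z_H, z_L)
    loss = nn.functional.cross_entropy(logits, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    z_H, z_L = z_H.detach(), z_L.detach() # cut the graph between segments
```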
