Hierarchical Reasoning Model


HRM

HRM achieves exceptional performance on complex reasoning tasks using only 1000 training samples.
HRM combines two modules running on different timescales: a slow high-level module H and a fast low-level module L (multiple L steps per H step). It avoids premature convergence through hierarchical convergence: L converges locally, then H updates the context and L is reset. Training uses a 1-step gradient (a first-order DEQ approximation) plus deep supervision, giving O(1)-memory backpropagation. In essence, HRM combines timescale separation, hierarchical convergence, the 1-step gradient, and ACT.
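A minimal sketch of this two-timescale loop, assuming PyTorch-style modules and hypothetical names (f_L, f_H, T low-level steps per high-level cycle, N cycles); it illustrates the structure rather than reproducing the paper's implementation:

```python
import torch
import torch.nn as nn

def hrm_forward(f_L: nn.Module, f_H: nn.Module,
                z_L: torch.Tensor, z_H: torch.Tensor, x: torch.Tensor,
                T: int = 4, N: int = 4):
    """Two-timescale recurrence: the fast module f_L runs T steps per cycle,
    then the slow module f_H updates the high-level state once. The refreshed
    z_H gives f_L a new context, restarting its local convergence
    (hierarchical convergence)."""
    for _ in range(N):                # slow (H) cycles
        for _ in range(T):            # fast (L) steps within one cycle
            z_L = f_L(z_L, z_H, x)    # L always sees the current H context
        z_H = f_H(z_H, z_L)           # H integrates the locally converged L state
    return z_L, z_H
```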

Deep Equilibrium Models (DEQ)

$$z^\star = f_\theta(z^\star, x)$$

In this equation, $z^\star$ is the fixed point where the network's output is fed back into itself as input. In other words, it is a state where repeatedly applying $f$ no longer changes the value. Once we reach this fixed point by iterating the forward step, we can compute the gradient using the IFT (Implicit Function Theorem) as follows:

$$\frac{\partial \mathcal{L}}{\partial \theta} = \frac{\partial \mathcal{L}}{\partial z^\star}\left(I - \frac{\partial f_\theta}{\partial z^\star}\right)^{-1}\frac{\partial f_\theta}{\partial \theta}$$
Using this approach, we can calculate the gradient using only the final fixed point, without backpropagating through every recursion step. → This is the "1-step gradient approximation": the inverse term $\left(I - \frac{\partial f_\theta}{\partial z^\star}\right)^{-1}$ is replaced by the identity, so the backward pass touches only a single application of $f$.
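As a sketch of how this looks in training code (assumed PyTorch, hypothetical function name), the fixed-point iterations run without gradient tracking and only the last step is differentiated:

```python
import torch
import torch.nn as nn

def one_step_grad(f: nn.Module, z0: torch.Tensor, x: torch.Tensor,
                  n_iters: int = 32) -> torch.Tensor:
    """1-step gradient: approximate the IFT gradient by treating
    (I - df/dz*)^{-1} as the identity, i.e. backprop through one step only."""
    z = z0
    with torch.no_grad():             # forward-only fixed-point iteration
        for _ in range(n_iters):
            z = f(z, x)
    z = f(z, x)                       # single differentiable step at the fixed point
    return z                          # a loss on z backprops with O(1) memory
```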

Frequency Hierarchy

Two networks f_L and f_H alternately update z_L and z_H. This is based on the assumption that the brain processes information at different frequencies across hierarchical layers. The paper connects this to Hierarchical Temporal Processing in the brain (a fast sensory loop vs. a slow reasoning loop).
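Written as update rules (indexing assumed for illustration, with T low-level steps per high-level step):

$$z_L^{\,i} = f_L\!\left(z_L^{\,i-1},\, z_H^{\,k},\, x\right), \qquad i = kT+1, \dots, (k+1)T$$
$$z_H^{\,k+1} = f_H\!\left(z_H^{\,k},\, z_L^{\,(k+1)T}\right)$$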