HRM
HRM (Hierarchical Reasoning Model) achieves exceptional performance on complex reasoning tasks using only ~1,000 training samples.
HRM couples two recurrent modules running on different timescales: a slow high-level (H) module and a fast low-level (L) module, with multiple L steps per single H step. It avoids premature convergence via hierarchical convergence: L converges to a local equilibrium, then H updates the context and effectively resets L into a fresh convergence phase. Training uses a 1-step gradient (a DEQ-style first-order approximation) plus deep supervision, giving O(1)-memory backpropagation. In essence, HRM combines timescale separation, hierarchical convergence, the 1-step gradient, and ACT (adaptive computation time) for dynamic halting. A sketch of the core loop follows.
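
A minimal PyTorch sketch of the nested H/L recurrence and the 1-step gradient, under loose assumptions: GRU cells stand in for the paper's transformer blocks, and all names (HRMSketch, f_L, f_H, N, T) are illustrative, not the official implementation.

```python
import torch
import torch.nn as nn

class HRMSketch(nn.Module):
    def __init__(self, dim: int, n_classes: int, N: int = 2, T: int = 4):
        super().__init__()
        self.N, self.T = N, T                 # N high-level cycles, T low-level steps each
        self.f_L = nn.GRUCell(2 * dim, dim)   # fast L module: conditioned on input + H state
        self.f_H = nn.GRUCell(dim, dim)       # slow H module: reads L's converged state
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x, z_H, z_L):
        # 1-step gradient: run all but the final (L, H) updates without
        # gradients, so memory stays O(1) in the number of steps.
        with torch.no_grad():
            for i in range(self.N * self.T - 1):
                z_L = self.f_L(torch.cat([x, z_H], dim=-1), z_L)
                if (i + 1) % self.T == 0:        # after T L-steps: one H-step,
                    z_H = self.f_H(z_L, z_H)     # which re-contextualizes (resets) L
        # Only the final L and H updates carry gradient.
        z_L = self.f_L(torch.cat([x, z_H], dim=-1), z_L)
        z_H = self.f_H(z_L, z_H)
        return self.head(z_H), z_H, z_L
```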
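Deep supervision, under the same assumptions: a loss is taken after each forward "segment" and the hidden states are detached, so no gradient crosses segment boundaries. ACT (omitted here) would add a halting head that decides after each segment whether to stop.

```python
model = HRMSketch(dim=128, n_classes=10)
opt = torch.optim.Adam(model.parameters())
x = torch.randn(32, 128)                  # toy batch
y = torch.randint(0, 10, (32,))
z_H = torch.zeros(32, 128)
z_L = torch.zeros(32, 128)
for _ in range(4):                        # 4 supervised segments
    logits, z_H, z_L = model(x, z_H, z_L)
    loss = nn.functional.cross_entropy(logits, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    z_H, z_L = z_H.detach(), z_L.detach() # cut the graph between segments
```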
