ASI-ARCH: LLM-based multi-agents repeatedly run hypothesis→code→experiment→analysis cycles to 'invent' new model architectures. Evolving through a Researcher→Engineer→Analyst loop, the system conducted 1,773 experiments over 20,000 GPU hours and discovered 106 SOTA linear-attention architectures, some of which outperform DeltaNet, Gated-DeltaNet, and Mamba2 across multiple benchmarks. A strong linear relationship between the number of SOTA discoveries and the compute invested suggests that research productivity can scale with compute rather than being bounded by human effort. The fitness function combines quantitative signals (sigmoid-transformed differences in loss and benchmark performance) with qualitative evaluation by LLM judges, assessing innovation, correctness, complexity, and convergence while guarding against reward hacking.
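A minimal sketch of how such a composite fitness might be computed, assuming hypothetical names (loss_baseline, bench_candidate, judge_score) and a simple equal-weight average; the paper's exact sigmoid scaling and weighting are not reproduced here.

```python
import math

def sigmoid(x: float, scale: float = 1.0) -> float:
    """Squash a raw improvement into (0, 1) so no single metric dominates."""
    return 1.0 / (1.0 + math.exp(-x / scale))

def fitness(loss_baseline: float, loss_candidate: float,
            bench_baseline: float, bench_candidate: float,
            judge_score: float) -> float:
    """Composite fitness sketch: quantitative deltas are sigmoid-transformed
    (lower loss and higher benchmark score count as improvements), then
    averaged with a qualitative LLM-judge score in [0, 1].
    Equal weights are an assumption, not the paper's exact formula."""
    loss_term = sigmoid(loss_baseline - loss_candidate)    # positive when candidate loss is lower
    bench_term = sigmoid(bench_candidate - bench_baseline) # positive when candidate benchmark is higher
    return (loss_term + bench_term + judge_score) / 3.0
```

Bounding each quantitative term with a sigmoid keeps any single outlier metric from dominating the score, which is part of how reward hacking is discouraged.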