ASI-ARCH: LLM-based multi-agents repeatedly run hypothesis→code→experiment→analysis cycles to 'invent' new model architectures. Evolving through a Researcher→Engineer→Analyst loop, the system conducted 1,773 experiments over 20,000 GPU hours and discovered 106 SOTA linear-attention architectures, some of which outperform DeltaNet, Gated-DeltaNet, and Mamba2 across multiple benchmarks. A strong linear relationship between the number of SOTA discoveries and the compute invested suggests that research productivity can scale with compute rather than being bounded by human effort. The fitness function combines quantitative signals (sigmoid-transformed differences in loss and benchmark performance) with qualitative evaluation by LLM judges, assessing innovation, correctness, complexity, and convergence while guarding against reward hacking.
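A minimal sketch of how such a composite fitness might be computed, assuming hypothetical names (loss_baseline, bench_candidate, judge_score) and a simple equal-weight average; the paper's exact sigmoid scaling and weighting are not reproduced here.

```python
import math

def sigmoid(x: float, scale: float = 1.0) -> float:
    """Squash a raw improvement into (0, 1) so no single metric dominates."""
    return 1.0 / (1.0 + math.exp(-x / scale))

def fitness(loss_baseline: float, loss_candidate: float,
            bench_baseline: float, bench_candidate: float,
            judge_score: float) -> float:
    """Composite fitness sketch: quantitative deltas are sigmoid-transformed
    (lower loss and higher benchmark score count as improvements), then
    averaged with a qualitative LLM-judge score in [0, 1].
    Equal weights are an assumption, not the paper's exact formula."""
    loss_term = sigmoid(loss_baseline - loss_candidate)    # positive when candidate loss is lower
    bench_term = sigmoid(bench_candidate - bench_baseline) # positive when candidate benchmark is higher
    return (loss_term + bench_term + judge_score) / 3.0
```

Bounding each quantitative term with a sigmoid keeps any single outlier metric from dominating the score, which is part of how reward hacking is discouraged.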