ABench-Physics consists of two subsets. The first comprises 400 high-difficulty static problems at the graduate or olympiad level, providing a stable performance baseline. The second consists of 100 dynamic problems. Its core innovation is an automatic variation engine that changes the numerical constants embedded in the problems' LaTeX equations.
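To make the idea of varying constants inside LaTeX concrete, here is a minimal sketch. It is not the benchmark's actual engine. It assumes a simple regex-based approach in which every standalone numeric literal is scaled by a random factor, while exponents and subscripts (e.g. `s^2`, `v_0`) are left untouched. The names `vary_constants`, `NUMBER`, and `rel_spread` are hypothetical. A real engine would also have to recompute the reference answer for each variant, which this sketch does not attempt.

```python
import random
import re

# Hypothetical pattern: a decimal literal such as 9.8 or 3, not preceded by
# '^', '_', a digit, or '.', and not written as '^{...}' or '_{...}'.
# This keeps exponents and subscript indices fixed. Scientific notation is not handled.
NUMBER = re.compile(r"(?<![\^_\d.])(?<![\^_]\{)\d+(?:\.\d+)?")


def vary_constants(latex: str, rng: random.Random, rel_spread: float = 0.3) -> str:
    """Return a copy of `latex` with each matched numeric literal multiplied by a
    factor drawn uniformly from [1 - rel_spread, 1 + rel_spread]."""

    def perturb(match: re.Match) -> str:
        text = match.group(0)
        factor = rng.uniform(1 - rel_spread, 1 + rel_spread)
        # Preserve the number of decimal places written in the original literal.
        decimals = len(text.split(".")[1]) if "." in text else 0
        return f"{float(text) * factor:.{decimals}f}"

    return NUMBER.sub(perturb, latex)


if __name__ == "__main__":
    problem = r"v = v_0 + g t, \quad g = 9.8\,\mathrm{m/s^2}, \quad t = 3.0\,\mathrm{s}"
    print(vary_constants(problem, random.Random(0)))
```

Seeding a `random.Random` instance keeps each generated variant reproducible while still producing problem instances that a model is unlikely to have memorized.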
ABench-Physics: Benchmarking Physical Reasoning in LLMs via...
https://arxiv.org/abs/2507.04766


Seonglae Cho