ABench-Physics consists of two subsets. Phy_A comprises 400 high-difficulty static problems at graduate or olympiad level, providing a stable performance baseline. Phy_B consists of 100 dynamic problems; its core innovation is an automatic variation engine that changes the numerical constants embedded in LaTeX equations.
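
To make the idea of varying constants inside LaTeX concrete, here is a minimal sketch in Python. It is not the benchmark's actual engine: the function name `vary_constants`, the regular expression, the `spread` parameter, and the sample problem are all assumptions made for illustration. The sketch rescales free-standing decimal numbers by a random factor and deliberately leaves exponents, subscripts, and numbers inside braces (such as `\frac{1}{2}`) untouched.

```python
import random
import re

# Assumption: a "numerical constant" is a plain decimal number that is not
# an exponent (^2), a subscript (_1), or the first token inside braces ({1}).
NUMBER = re.compile(r"(?<![\w.^_{\\])\d+(?:\.\d+)?(?!\w|\.\d)")


def vary_constants(latex: str, rng: random.Random, spread: float = 0.2) -> str:
    """Return a copy of `latex` with each matched constant rescaled.

    Each constant is multiplied by a factor drawn uniformly from
    [1 - spread, 1 + spread] and printed with its original number of decimals.
    """

    def replace(match: re.Match) -> str:
        text = match.group(0)
        # Keep the same number of decimal places as the original literal.
        decimals = len(text.split(".")[1]) if "." in text else 0
        factor = 1.0 + rng.uniform(-spread, spread)
        return f"{float(text) * factor:.{decimals}f}"

    return NUMBER.sub(replace, latex)


if __name__ == "__main__":
    # Hypothetical problem fragment, not taken from the benchmark.
    problem = (
        r"m = 2.0 \, \mathrm{kg}, \quad v = 3 \, \mathrm{m/s}, "
        r"\quad E = \frac{1}{2} m v^2"
    )
    print(vary_constants(problem, random.Random(0)))
```

Passing a seeded `random.Random` keeps each variant reproducible. This sketch only rewrites the problem statement; a usable benchmark item would also need its reference answer recomputed for the new constants, which is not attempted here.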
ABench-Physics: Benchmarking Physical Reasoning in LLMs via...
Large Language Models (LLMs) have shown impressive performance in domains such as mathematics and programming, yet their capabilities in physics remain underexplored and poorly understood. Physics...
https://arxiv.org/abs/2507.04766


Seonglae Cho