One of the hardest math benchmarks, with the dataset kept unreleased (o3 scored 25%),
whereas MMLU is among the easiest.
Can AI do maths yet? Thoughts from a mathematician.
So the big news this week is that o3, OpenAI’s new language model, got 25% on FrontierMath. Let’s start by explaining what this means.
https://xenaproject.wordpress.com/2024/12/22/can-ai-do-maths-yet-thoughts-from-a-mathematician/

FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI
FrontierMath: a new benchmark of expert-level math problems designed to measure AI’s mathematical abilities. See how leading AI models perform against the collective mathematics community.
https://epoch.ai/frontiermath/the-benchmark


Seonglae Cho
