FrontierMath

Creator

Seonglae Cho

Created

2024 Nov 27 0:57

Editor

Seonglae Cho

Edited

2025 Jan 10 20:08

Refs

One of the hardest math benchmarks, and its problem set has not been publicly released (o3 scored 25%). MMLU, by contrast, is among the easiest.

https://epoch.ai/frontiermath/the-benchmark

Can AI do maths yet? Thoughts from a mathematician.

So the big news this week is that o3, OpenAI’s new language model, got 25% on FrontierMath. Let’s start by explaining what this means.

https://xenaproject.wordpress.com/2024/12/22/can-ai-do-maths-yet-thoughts-from-a-mathematician/

FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI

FrontierMath: a new benchmark of expert-level math problems designed to measure AI’s mathematical abilities. See how leading AI models perform against the collective mathematics community.

https://epoch.ai/frontiermath/the-benchmark
