Existing AI benchmarks increasingly struggle to differentiate between top-performing models. In contrast, strategy games with clear win-loss conditions (e.g., chess, Go, poker) allow AI models to be evaluated objectively and dynamically through direct competition.
Advancing AI benchmarking with Game Arena
We’re expanding Game Arena with Poker and Werewolf, while Gemini 3 Pro and Flash top our chess leaderboard.
https://blog.google/innovation-and-ai/models-and-research/google-deepmind/kaggle-game-arena-updates/
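Win-loss outcomes from head-to-head games can be turned into a dynamic ranking with an Elo-style rating system, a common choice for game leaderboards. A minimal sketch (the K-factor of 32, the starting rating of 1000, and the model names are illustrative assumptions, not Game Arena's actual rating scheme):

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that player A beats player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update(rating_a: float, rating_b: float, score_a: float,
           k: float = 32.0) -> tuple[float, float]:
    """Update both ratings after one game.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    Ratings move in proportion to how surprising the result was.
    """
    ea = expected_score(rating_a, rating_b)
    return (rating_a + k * (score_a - ea),
            rating_b + k * ((1.0 - score_a) - (1.0 - ea)))

# Hypothetical example: two models start equal; model_a wins one game.
ratings = {"model_a": 1000.0, "model_b": 1000.0}
ratings["model_a"], ratings["model_b"] = update(
    ratings["model_a"], ratings["model_b"], score_a=1.0)
print(ratings)  # → {'model_a': 1016.0, 'model_b': 984.0}
```

Because ratings update after every game, the leaderboard adapts as new models join, unlike static benchmarks that saturate once top models all score near the ceiling.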

Rethinking how we measure AI intelligence
Kaggle Game Arena is a new platform where AI models compete head-to-head in complex strategic games.
https://blog.google/technology/ai/kaggle-game-arena/


Seonglae Cho