LLM Leaderboard

Creator
Seonglae Cho
Created
2023 Aug 29 8:38
Edited
2025 Dec 12 17:7

Illusion

Major companies such as Meta, Google, and Amazon privately test multiple model versions on the Arena and publish only the highest score. Selecting the best of several private runs violates the fair sampling assumption of the Bradley-Terry model used to rank models. Further unfair advantages come from differences in API calls, sampling rates, and model maintenance policies. Scores can also be raised by fine-tuning on Arena data, and official model withdrawals do not always match vote-based eliminations.
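The selection effect described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the method of any leaderboard: every private variant is assumed to have the same true score, each measurement is assumed to carry Gaussian noise, and the numbers (`true_score=1200.0`, `noise_sd=15.0`) and function names are hypothetical. The helper `bt_win_prob` uses the Elo-scaled form of the Bradley-Terry win probability to show what a given score gap would imply.

```python
import random
import statistics


def bt_win_prob(score_a: float, score_b: float) -> float:
    """Bradley-Terry win probability of A over B on an Elo-style scale."""
    return 1.0 / (1.0 + 10.0 ** ((score_b - score_a) / 400.0))


def published_score(n_variants: int, true_score: float,
                    noise_sd: float, rng: random.Random) -> float:
    """Privately measure n_variants equally strong variants, publish only the max."""
    # Assumption: each measured score is the true score plus Gaussian noise.
    return max(rng.gauss(true_score, noise_sd) for _ in range(n_variants))


def mean_published(n_variants: int, trials: int = 2000,
                   true_score: float = 1200.0, noise_sd: float = 15.0,
                   seed: int = 0) -> float:
    """Average published score over many simulated submissions."""
    rng = random.Random(seed)
    return statistics.fmean(
        published_score(n_variants, true_score, noise_sd, rng)
        for _ in range(trials)
    )


if __name__ == "__main__":
    honest = mean_published(1)       # one variant, no selection
    best_of_10 = mean_published(10)  # ten private variants, report the best
    print(f"single submission: {honest:.1f}")
    print(f"best of 10:        {best_of_10:.1f}")
    print(f"implied win prob of inflated vs honest: "
          f"{bt_win_prob(best_of_10, honest):.3f}")
```

Under these assumptions the variants are equally strong, so any gap between the two averages comes purely from reporting the maximum of several noisy measurements rather than from a real quality difference.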

LLM Leaderboard

 

Leaderboard

Per model layer analysis

Korean Leaderboard

OpenRouter accounts for only about 1% of API usage, but its usage data roughly approximates market share
 
