LLM Leaderboard

Creator
Seonglae Cho
Created
2023 Aug 29 8:38
Edited
2025 May 5 15:50

AI Arena

https://hai.stanford.edu/ai-index/2025-ai-index-report
 
 

Illusion

Major companies such as Meta, Google, and Amazon privately test multiple model versions and publish only the highest score. This selective reporting violates the fair sampling assumption of the Bradley-Terry model on which the ranking relies. Further unfair advantages come from differences in API calls, sampling rates, and model maintenance policies. Scores can also be raised by fine-tuning on Arena data, and there are discrepancies between official withdrawals and vote-based eliminations.
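The selection effect can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the Arena's actual rating pipeline: every private variant is assumed to have identical true strength (a 50% win probability against a reference model rated 1000), each variant's rating is estimated from a fixed number of battles by inverting the Elo-scale Bradley-Terry win probability, and the provider publishes only the maximum. The function names, the battle count, and the reference rating are hypothetical choices made for illustration.

```python
import math
import random
import statistics


def elo_from_win_rate(p: float, base: float = 1000.0) -> float:
    """Invert the Elo-scale Bradley-Terry formula against a reference rated `base`.

    P(win) = 1 / (1 + 10 ** ((base - rating) / 400))
    => rating = base + 400 * log10(p / (1 - p))
    """
    p = min(max(p, 1e-3), 1 - 1e-3)  # clamp to avoid log of 0 or division by 0
    return base + 400.0 * math.log10(p / (1.0 - p))


def estimated_rating(true_win_prob: float, battles: int, rng: random.Random) -> float:
    """Estimate one variant's rating from a finite number of simulated battles."""
    wins = sum(rng.random() < true_win_prob for _ in range(battles))
    return elo_from_win_rate(wins / battles)


def mean_published_rating(
    n_variants: int, battles: int = 200, trials: int = 500, seed: int = 0
) -> float:
    """Average published rating when only the best of n_variants is reported.

    All variants share the same true strength (assumption), so any gain
    over the reference rating comes purely from picking the luckiest estimate.
    """
    rng = random.Random(seed)
    return statistics.fmean(
        max(estimated_rating(0.5, battles, rng) for _ in range(n_variants))
        for _ in range(trials)
    )


if __name__ == "__main__":
    for n in (1, 3, 10):
        print(n, round(mean_published_rating(n), 1))
```

With a single submission the average published rating stays close to the reference rating, while reporting the best of several equally strong variants shifts it upward even though no variant is actually better. That gap is the bias introduced when the sampling is no longer fair.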

LLM Leaderboard

 

LM Arena

Search Arena

Leaderboard

Per model layer analysis

Korean Leaderboard

 
 

 

