LLM Evaluation

Creator: Seonglae Cho
Created: 2023 Jun 2 12:28
Edited: 2024 Sep 10 23:21

Language Model Evaluation

LLM Benchmarks
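Desirable properties of a benchmark signal:
  • monotonicity: scores should improve steadily as the model trains or scales
  • low variance: scores should be stable across random seeds and small evaluation changes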
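One way to quantify these two properties is sketched below. It assumes you have benchmark scores from several intermediate checkpoints and from several runs that differ only in random seed. Monotonicity is measured as a Spearman rank correlation between checkpoint step and score, and variance as the standard deviation across seeds. The function names and the choice of Spearman correlation are illustrative assumptions, not taken from the linked talk.

```python
from statistics import mean, stdev


def rank(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; raises ValueError if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    if ssx == 0 or ssy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (ssx * ssy) ** 0.5


def monotonicity(steps, scores):
    """Spearman rank correlation between checkpoint step and benchmark score.

    1.0 means the score strictly increases at every checkpoint; values near 0
    mean the score carries little signal about training progress.
    """
    return pearson(rank(steps), rank(scores))


def seed_std(scores_across_seeds):
    """Sample standard deviation of one benchmark score across random seeds."""
    return stdev(scores_across_seeds)
```

For example, with hypothetical scores, `monotonicity([1000, 2000, 3000, 4000, 5000], [0.31, 0.35, 0.34, 0.40, 0.44])` returns 0.9 because one checkpoint is out of order. A rank correlation is used rather than a linear one so that a benchmark whose score rises steadily but non-linearly is not penalized.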
https://www.youtube.com/watch?v=2-SPH9hIKT8
LLM Evaluation Methods
 
 
 
LLM Evaluation Tools
 
 
Benchmark scores alone are unreliable; check results from an arena-style leaderboard (e.g., Chatbot Arena) or from trustworthy third-party evaluations.
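Arena-style leaderboards rank models from crowdsourced pairwise votes rather than from a fixed test set. The following is a minimal sketch assuming the online Elo update, which is one common way to turn pairwise outcomes into ratings. A given leaderboard may use a different fit, such as a Bradley-Terry model. The model names, K-factor, and initial rating here are arbitrary assumptions.

```python
from collections import defaultdict


def expected_score(r_a, r_b, scale=400.0):
    """Probability that A beats B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))


def elo_ratings(battles, k=32.0, initial=1000.0):
    """Compute ratings from (model_a, model_b, winner) tuples.

    winner is "a", "b", or "tie". Ratings are updated one battle at a time,
    so the result depends on the order of the battles.
    """
    ratings = defaultdict(lambda: initial)
    outcome = {"a": 1.0, "b": 0.0, "tie": 0.5}
    for a, b, winner in battles:
        ea = expected_score(ratings[a], ratings[b])  # computed before either update
        sa = outcome[winner]
        ratings[a] += k * (sa - ea)
        ratings[b] += k * ((1.0 - sa) - (1.0 - ea))  # zero-sum: B moves opposite to A
    return dict(ratings)
```

For example, `elo_ratings([("model-x", "model-y", "a"), ("model-x", "model-y", "a"), ("model-y", "model-x", "tie")])` rates the hypothetical `model-x` above `model-y`. Because online Elo is order-dependent, a batch Bradley-Terry fit over all votes is a common alternative when a stable ranking is needed.
</test>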

LLM Leaderboard

Evaluating LLMs is complex, so more comprehensive, purpose-specific evaluation methods are needed to assess their capabilities across various real-world applications.
Types
 
 

Recommendations