LLM Judge
Asking an LLM judge for a continuous score is not effective. LLMs perform better when making categorical judgments. The recommended approach is to have LLM judges make categorical assessments first, then aggregate those into continuous metrics if needed.
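As a minimal sketch of that aggregation step, the snippet below turns a list of categorical verdicts into per-label shares and a single mean score. The label set (`fail`, `partial`, `pass`), the weights, and the `aggregate` helper are all assumptions made for illustration; they do not come from the note or from any particular library, and the verdicts are hard-coded rather than produced by a real judge.

```python
from collections import Counter

# Hypothetical label set and weights (assumption, not from the note).
LABEL_WEIGHTS = {"fail": 0.0, "partial": 0.5, "pass": 1.0}


def aggregate(verdicts):
    """Aggregate categorical judge verdicts into label shares and a mean score."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    unknown = set(verdicts) - set(LABEL_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    counts = Counter(verdicts)
    n = len(verdicts)
    # Share of each label across all judged items.
    shares = {label: counts[label] / n for label in LABEL_WEIGHTS}
    # Continuous metric: mean of the per-verdict weights.
    score = sum(LABEL_WEIGHTS[v] for v in verdicts) / n
    return shares, score


# Stand-in for verdicts returned by an LLM judge on four items.
verdicts = ["pass", "pass", "partial", "fail"]
shares, score = aggregate(verdicts)
print(shares, score)
```

The judge only ever chooses among a few discrete labels; the continuous number appears at the aggregation stage, where the weighting is explicit and can be changed without re-running the judge.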
Limitation
- Egocentric bias: an LLM judge tends to prefer its own outputs (AI Introspection)
LLM Judge Types
Weaknesses
https://arxiv.org/pdf/2505.15795

Seonglae Cho