LLM Judge
Asking an LLM judge to output a continuous score directly is not effective. LLMs perform better when making categorical judgments, that is, choosing from a small set of discrete labels. It is recommended to first have LLM judges make categorical assessments, which can then be aggregated into continuous metrics if needed.
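A minimal sketch of the aggregation step, assuming the judge returns one label per evaluated response. The label set ("pass"/"fail") and the helper name `pass_rate` are hypothetical and chosen only for illustration; the hard-coded list stands in for real judge output.

```python
from collections import Counter

# Hypothetical label set; these notes do not prescribe specific categories.
LABELS = ("pass", "fail")


def pass_rate(verdicts):
    """Aggregate categorical judge verdicts into a continuous score in [0, 1]."""
    if not verdicts:
        raise ValueError("no verdicts to aggregate")
    unknown = set(verdicts) - set(LABELS)
    if unknown:
        # Reject anything outside the label set instead of silently counting it.
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    counts = Counter(verdicts)
    return counts["pass"] / len(verdicts)


# Stand-in for verdicts returned by an LLM judge, one per evaluated response.
verdicts = ["pass", "fail", "pass", "pass"]
print(pass_rate(verdicts))  # 0.75
```

The judge only ever picks a label; the continuous number is computed afterwards from many such labels, so it never has to be produced by the model itself.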
Limitation
- egocentric: an LLM judge tends to prefer its own outputs (AI Introspection)
LLM Judge Types
Weakness