Judge as A Judge: Improving the Evaluation of Retrieval-Augmented...
Retrieval-Augmented Generation (RAG) has proven effective at alleviating hallucinations in Large Language Models (LLMs). However, existing automated evaluation metrics cannot fairly...
https://arxiv.org/abs/2502.18817