LLM Evaluation with MLflow Example Notebook — MLflow 2.11.3 documentation
In this notebook, we demonstrate how to evaluate LLMs and RAG systems with MLflow, using simple metrics such as toxicity, LLM-judged metrics such as relevance, and custom LLM-judged metrics such as professionalism.
https://mlflow.org/docs/latest/llms/llm-evaluate/notebooks/question-answering-evaluation.html