Massive Multitask Language Understanding
undergraduate-level knowledge
Human expert metric
MMLU-Redux corrects errors in MMLU, revealing true LLM capabilities with 3,000 re-annotated questions and an error taxonomy.
arxiv.org
https://arxiv.org/pdf/2406.04127