Hallucination Benchmark

Creator: Seonglae Cho
Created: 2023 Nov 16 2:18
Editor: Seonglae Cho
Edited: 2025 Aug 12 22:57

Factuality benchmarks

  • reference-free factuality benchmark
  • reference-based factuality benchmark

Hallucination Benchmarks

  • OpenAI SimpleQA
  • Phare benchmark
  • FEVER
  • FACTS Grounding
  • LongFact
  • FActScore

Several types

https://arxiv.org/pdf/2410.22071

vectara/hallucination_evaluation_model · Hugging Face
https://huggingface.co/vectara/hallucination_evaluation_model

Leaderboard

https://huggingface.co/spaces/vectara/Hallucination-evaluation-leaderboard
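
A leaderboard of this kind typically reduces per-output judge scores to a single hallucination rate. The sketch below is one minimal way to do that, not the leaderboard's confirmed procedure. It assumes each output already has a consistency score in [0, 1] and that scores below a cutoff count as hallucinated. The function name `hallucination_rate`, the 0.5 cutoff, and the sample scores are illustrative assumptions.

```python
from typing import Sequence


def hallucination_rate(scores: Sequence[float], threshold: float = 0.5) -> float:
    """Return the fraction of outputs whose consistency score is below `threshold`.

    Assumption: higher scores mean the output is better supported by its source.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    flagged = sum(1 for s in scores if s < threshold)
    return flagged / len(scores)


# Hypothetical consistency scores for five model outputs (illustrative only).
example_scores = [0.92, 0.41, 0.77, 0.08, 0.63]
print(f"hallucination rate: {hallucination_rate(example_scores):.0%}")
```

The cutoff is a design choice: raising it flags more outputs as hallucinated, so rates are only comparable across models when the same judge and threshold are used.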

STS (semantic textual similarity) models as judges

dleemiller/ModernCE-base-sts · Hugging Face
https://huggingface.co/dleemiller/ModernCE-base-sts

cross-encoder/stsb-roberta-large · Hugging Face
https://huggingface.co/cross-encoder/stsb-roberta-large
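
One way an STS cross-encoder could serve as a judge is to score each (model answer, reference answer) pair and accept answers whose similarity clears a cutoff. The sketch below assumes that setup. The helper names (`load_cross_encoder_scorer`, `judge_with_sts`, `exact_match_scorer`) and the 0.5 cutoff are hypothetical. It also assumes the chosen model loads through the `sentence-transformers` `CrossEncoder` class and returns scores roughly in [0, 1]. The scorer is pluggable so the judging logic can be tried offline with a stand-in.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]
ScoreFn = Callable[[Sequence[Pair]], Sequence[float]]


def load_cross_encoder_scorer(
    model_name: str = "cross-encoder/stsb-roberta-large",
) -> ScoreFn:
    """Build a scorer backed by a cross-encoder (needs sentence-transformers and a model download)."""
    from sentence_transformers import CrossEncoder  # imported lazily on purpose

    model = CrossEncoder(model_name)
    return lambda pairs: [float(s) for s in model.predict(list(pairs))]


def judge_with_sts(
    pairs: Sequence[Pair], score_fn: ScoreFn, threshold: float = 0.5
) -> List[bool]:
    """Mark each (answer, reference) pair as acceptable when its similarity >= threshold."""
    return [score >= threshold for score in score_fn(pairs)]


def exact_match_scorer(pairs: Sequence[Pair]) -> List[float]:
    """Offline stand-in scorer: 1.0 for identical strings (case-insensitive), else 0.0."""
    return [1.0 if a.strip().lower() == b.strip().lower() else 0.0 for a, b in pairs]


# Offline demo with the stand-in scorer; swap in load_cross_encoder_scorer() for a real model.
demo_pairs = [("Paris", "paris"), ("Lyon", "Paris")]
print(judge_with_sts(demo_pairs, exact_match_scorer))
```

Similarity to a reference is only a proxy for factuality: a fluent answer that contradicts the reference on one detail can still score high, so the cutoff should be checked against labeled examples before relying on it.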
Copyright Seonglae Cho