Texonom
Texonom
/
Engineering
Engineering
/Data Engineering/Artificial Intelligence/AI Problem/AI Hacking/AI Redteaming/AI Jailbreak/AI Jailbreak Benchmark/
XSTest
Search

XSTest

Creator
Creator
Seonglae ChoSeonglae Cho
Created
Created
2025 Jul 21 15:4
Editor
Editor
Seonglae ChoSeonglae Cho
Edited
Edited
2025 Jul 21 15:6
Refs
Refs

Overrefusal Benchmark

  • 250 safe prompts across ten prompt types that well-calibrated models should not refuse to comply with.
  • 200 unsafe prompts as contrasts that, for most LLM applications, should be refused.
 
 
 
 
 
walledai/XSTest · Datasets at Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
walledai/XSTest · Datasets at Hugging Face
https://huggingface.co/datasets/walledai/XSTest
walledai/XSTest · Datasets at Hugging Face
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours...
Without proper safeguards, large language models will readily follow malicious instructions and generate toxic content. This risk motivates safety efforts such as red-teaming and large-scale...
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours...
https://arxiv.org/abs/2308.01263
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours...
 

Recommendations

Texonom
Texonom
/
Engineering
Engineering
/Data Engineering/Artificial Intelligence/AI Problem/AI Hacking/AI Redteaming/AI Jailbreak/AI Jailbreak Benchmark/
XSTest
Copyright Seonglae Cho