AI Jailbreak Benchmark

Creator: Seonglae Cho
Created: 2024 Dec 9 15:24
Editor: Seonglae Cho
Edited: 2025 Jul 21 18:17
Refs
AI Jailbreak Metric
ASR (Attack Success Rate); see the sketch after this list
Awesome-Jailbreak-on-LLMs
yueliu1999 • Updated 2025 Jan 14 7:4
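ASR (Attack Success Rate) is the standard jailbreak metric: the fraction of attack attempts whose responses a judge labels as successful jailbreaks. Below is a minimal sketch of the computation; the function and the toy keyword judge are illustrative only, since real benchmarks use LLM or classifier judges.

```python
from typing import Callable, List

def attack_success_rate(responses: List[str], judge: Callable[[str], bool]) -> float:
    """Fraction of responses the judge flags as successful jailbreaks."""
    if not responses:
        return 0.0
    return sum(1 for r in responses if judge(r)) / len(responses)

# Toy refusal-keyword judge for illustration; benchmark judges are LLMs or trained classifiers.
toy_judge = lambda r: not r.lower().startswith(("i'm sorry", "i cannot"))
print(attack_success_rate(
    ["Sure, here is how to ...", "I'm sorry, I can't help with that."],
    toy_judge,
))  # 0.5
```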
Jailbreaking Prompts
Forbidden Question Set
Collected Jailbreaking Prompts
 
 
Overrefusal Benchmarks
XSTest
StrongREJECT
HH RLHF
 
 
AI Jailbreak Benchmarks
JailbreakBench
HarmBench
Sorry Bench
 
 
 
JailbreakBench: LLM robustness benchmark
Jailbreak attacks cause large language models (LLMs) to generate harmful, unethical, or otherwise unwanted content. Evaluating these attacks presents a number of challenges, and the current landscape of benchmarks and evaluation techniques is fragmented. First, assessing whether LLM responses are indeed harmful requires open-ended evaluations which are not yet standardized. Second, existing works compute attacker costs and success rates in incomparable ways. Third, some works lack reproducibility as they withhold adversarial prompts or code, and rely on changing proprietary APIs for evaluation. Consequently, navigating the current literature and tracking progress can be challenging. To address this, we introduce JailbreakBench, a centralized benchmark with the following components: (1) an evolving repository of state-of-the-art adversarial prompts (jailbreak artifacts); (2) a jailbreaking dataset of 100 behaviors (JBB-Behaviors); (3) a standardized evaluation framework with a defined threat model, system prompts, chat templates, and scoring functions; and (4) a leaderboard that tracks the performance of attacks and defenses.
https://jailbreakbench.github.io/
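JailbreakBench also ships a Python client for its artifact repository of adversarial prompts and responses. A minimal sketch, assuming the jailbreakbench package's read_artifact helper and the PAIR/Vicuna identifiers shown in the project README; verify the names and record schema against the repository before use.

```python
# pip install jailbreakbench
import jailbreakbench as jbb

# Download the stored jailbreak artifact (adversarial prompts, model responses,
# and judged jailbroken flags) for one attack/target pair. The method and model
# identifiers below are examples; see the repository for valid options.
artifact = jbb.read_artifact(method="PAIR", model_name="vicuna-13b-v1.5")

# Inspect one per-behavior record.
print(artifact.jailbreaks[0])
```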
 
 
