Texonom
Texonom
/
Engineering
Engineering
/Data Engineering/Artificial Intelligence/AI Problem/AI Hacking/AI Red teaming/AI Jailbreak/
AI Jailbreak Benchmark
Search

AI Jailbreak Benchmark

Creator
Creator
Seonglae Cho
Created
Created
2024 Dec 20 23:31
Editor
Editor
Seonglae Cho
Edited
Edited
2025 Jan 29 11:1
Refs
Refs
AI Jailbreak Dataset
AI Jailbreak Benchmarks
JailbreakBench
HarmBench
Sorry Bench
 
 
 
 
JailbreakBench: LLM robustness benchmark
Jailbreak attacks cause large language models (LLMs) to generate harmful, unethical, or otherwise unwanted content. Evaluating these attacks presents a number of challenges, and the current landscape of benchmarks and evaluation techniques is fragmented. First, assessing whether LLM responses are indeed harmful requires open-ended evaluations which are not yet standardized. Second, existing works compute attacker costs and success rates in incomparable ways. Third, some works lack reproducibility as they withhold adversarial prompts or code, and rely on changing proprietary APIs for evaluation. Consequently, navigating the current literature and tracking progress can be challenging. To address this, we introduce JailbreakBench, a centralized benchmark with the following components:
https://jailbreakbench.github.io/
 
 

Recommendations

Texonom
Texonom
/
Engineering
Engineering
/Data Engineering/Artificial Intelligence/AI Problem/AI Hacking/AI Red teaming/AI Jailbreak/
AI Jailbreak Benchmark
Copyright Seonglae Cho