AI Jailbreak Dataset

Creator
Seonglae Cho
Created
2024 Dec 9 15:24
Edited
2025 Jun 17 17:39
 
RLHF
Dataset from Anthropic (covers harmful outputs in general, not only jailbreaks)
overrefusal
