Sorry Bench

Creator: Seonglae Cho
Created: 2025 Jan 29 11:1
Editor: Seonglae Cho
Edited: 2025 Feb 18 11:44
Refs: Refusal Vector

Paper (arXiv): https://arxiv.org/pdf/2406.14598
Unsafe instruction dataset: sorry-bench/sorry-bench-202406 (Hugging Face Datasets)
https://huggingface.co/datasets/sorry-bench/sorry-bench-202406?row=96

Backlinks

Refusal Vector based Defense
Conditional Vector Steering

Copyright Seonglae Cho