Constitutional Classifier

Creator: Seonglae Cho
Created: 2025 Feb 9 16:33
Editor: Seonglae Cho
Edited: 2025 Apr 10 23:11
 
Write a constitution (natural-language rules for allowed and restricted content) → generate a synthetic dataset from its rubric → train classifiers on that dataset
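The pipeline above can be sketched in a few lines of Python. This is a toy under loud assumptions: the CONSTITUTION entries and TEMPLATES are hypothetical, filling templates stands in for the LLM-driven synthetic data generation, and a multinomial Naive Bayes model is swapped in for the trained classifier. It shows the shape of constitution → dataset → classifier → guard, not the paper's actual method.

```python
import math
import re
from collections import Counter

# Hypothetical toy constitution: natural-language content classes split into
# restricted and allowed topics. Real constitutions are far more detailed.
CONSTITUTION = {
    "restricted": [
        "synthesize nerve agent",
        "culture dangerous pathogen",
        "enrich uranium for a weapon",
    ],
    "allowed": [
        "explain how vaccines work",
        "summarize the history of chemistry",
        "describe safe lab practices",
    ],
}

# Hypothetical prompt templates standing in for the LLM that would generate
# varied synthetic prompts from each constitution entry.
TEMPLATES = ["how do i {}", "give me steps to {}", "please {} in detail", "can you {}"]


def tokenize(text):
    """Lowercase bag-of-words tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def build_dataset(constitution):
    """Expand every rule into labeled prompts: 1 = restricted, 0 = allowed."""
    data = []
    for class_name, label in (("restricted", 1), ("allowed", 0)):
        for topic in constitution[class_name]:
            for template in TEMPLATES:
                data.append((template.format(topic), label))
    return data


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one smoothing (a swapped-in stand-in
    for the LLM-based classifiers; not the method from the paper)."""

    def fit(self, data):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.class_counts = Counter()
        for text, label in data:
            self.class_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def prob_restricted(self, text):
        """Posterior probability that `text` falls in the restricted class."""
        total = sum(self.class_counts.values())
        log_scores = {}
        for label in (0, 1):
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(self.class_counts[label] / total)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            log_scores[label] = score
        # Normalize the two log scores into a probability (numerically stable).
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights[1] / (weights[0] + weights[1])


def should_block(clf, text, threshold=0.5):
    """Guard check: block when the restricted probability exceeds the threshold."""
    return clf.prob_restricted(text) > threshold


clf = NaiveBayesClassifier().fit(build_dataset(CONSTITUTION))
print(should_block(clf, "what are the steps to culture a dangerous pathogen"))  # True
print(should_block(clf, "explain how vaccines work"))  # False
```

Replacing the templates with an LLM generator and the Naive Bayes model with a stronger trained classifier keeps the same pipeline shape; the threshold then trades false refusals against missed jailbreaks.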
Constitutional Classifiers: Defending against universal jailbreaks
A paper from Anthropic describing a new way to guard LLMs against jailbreaking
https://www.anthropic.com/research/constitutional-classifiers
Constitutional Classifiers: Defending against universal jailbreaks (arxiv.org)
https://arxiv.org/pdf/2501.18837
Copyright Seonglae Cho