Texonom / Engineering / Data Engineering / Artificial Intelligence / AI Problem / AI Alignment / Misalignment Finetuning
Misalignment Finetuning

Creator: Seonglae Cho
Created: 2025 Apr 6 18:22
Editor: Seonglae Cho
Edited: 2025 Apr 6 18:23
Refs
Fine Tuning
arxiv.org — https://arxiv.org/pdf/2310.03693
What is in Your Safe Data? Identifying Benign Data that Breaks Safety
Current Large Language Models (LLMs), even those tuned for safety and alignment, are susceptible to jailbreaking. Some have found that just further fine-tuning an aligned model with benign data...
OpenReview discussion: https://openreview.net/forum?id=Hi8jKh4HE9#discussion
arxiv.org — https://arxiv.org/pdf/2502.17424
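The first paper's core idea is to identify which benign fine-tuning examples are most likely to erode safety by ranking them by the similarity of their features (e.g. gradient or representation vectors) to known harmful anchor examples. A minimal sketch of such similarity-based ranking, with toy feature vectors and a hypothetical `rank_benign_by_harm_similarity` helper (not the paper's actual code):

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def rank_benign_by_harm_similarity(benign_feats, harmful_anchor):
    """Rank benign examples by how closely their feature vectors
    resemble a harmful anchor; the top-ranked examples are the
    'benign but safety-breaking' candidates to inspect or exclude."""
    scores = [cosine(f, harmful_anchor) for f in benign_feats]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores

# Toy illustration with made-up 3-d feature vectors
benign = [
    [1.0, 0.0, 0.0],   # very close to the harmful anchor
    [0.9, 0.4, 0.1],   # somewhat close
    [0.0, 1.0, 0.0],   # nearly orthogonal
]
harm_anchor = [1.0, 0.1, 0.0]
order, scores = rank_benign_by_harm_similarity(benign, harm_anchor)
```

In practice the feature vectors would come from the model itself (per-example gradients or hidden representations), but the ranking step reduces to this kind of nearest-to-harmful ordering.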
 

Copyright Seonglae Cho