- https://arxiv.org/pdf/2310.03693 — "What is in Your Safe Data? Identifying Benign Data that Breaks Safety": Current Large Language Models (LLMs), even those tuned for safety and alignment, are susceptible to jailbreaking. Some have found that just further fine-tuning an aligned model with benign data...
- https://openreview.net/forum?id=Hi8jKh4HE9#discussion
- https://arxiv.org/pdf/2502.17424