BPJ
Boundary Point Jailbreaking of Black-Box LLMs
Frontier LLMs are safeguarded against attempts to extract harmful information via adversarial prompts known as "jailbreaks". Recently, defenders have developed classifier-based systems that have...
https://arxiv.org/abs/2602.15001


Seonglae Cho