Many-shot jailbreaking

Creator
Seonglae Cho
Created
2024 Oct 24 10:54
Editor
Edited
2025 Jan 10 20:4
Refs
Can be applied together with other methods
Uses a lot of context
 
 
 

Breaking the narrative frame

Internally, an LLM draws only a very loose distinction between what it generated and what the user said: every word in the transcript carries the same weight. Because LLMs are next-token generators, they tend to continue a transcript in a way that stays internally consistent with it. You can therefore get around their instructions by pushing the boundary in a narratively consistent way.
 
 

Recommendations