Many shot jailbreaking

Creator
Creator
Seonglae ChoSeonglae Cho
Created
Created
2024 Oct 24 10:54
Editor
Edited
Edited
2025 Jan 10 20:4
Refs
Refs
다른 method 와 동시적용가능
context 많이 쓴다
 
 
 
Many-shot jailbreaking \ Anthropic
Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.
Many-shot jailbreaking \ Anthropic

Breaking the narrative frame

Internally there is only a very loose distinction between what the LLM generates and what the user says. All words have the same weight in the transcript. LLMs are next token generators so they like to be internally consistent. You can get around their instructions by pushing the boundary in a narratively consistent way.
Narrative jailbreaking for fun and profit
Posted on Monday 23 Dec 2024. 1,634 words, 1 link. By Matt Webb.
Narrative jailbreaking for fun and profit
 
 

Recommendations