Can be combined with other methods
Uses a lot of context
Many-shot jailbreaking \ Anthropic
Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.
https://www.anthropic.com/research/many-shot-jailbreaking

Breaking the narrative frame
Internally, there is only a very loose distinction between what the LLM generates and what the user says. All words have the same weight in the transcript. LLMs are next-token generators, so they like to be internally consistent. You can get around their instructions by pushing the boundary in a narratively consistent way.
Narrative jailbreaking for fun and profit
Posted on Monday 23 Dec 2024. 1,634 words, 1 link. By Matt Webb.
https://interconnected.org/home/2024/12/23/jailbreaking


Seonglae Cho