Many-shot jailbreaking

Can be combined with other methods.

Uses a lot of context.

Many-shot jailbreaking \ Anthropic

https://www.anthropic.com/research/many-shot-jailbreaking

Breaking the narrative frame

Internally, an LLM draws only a very loose distinction between what it generates and what the user says: every word in the transcript carries the same weight. Because LLMs are next-token generators, they tend to stay internally consistent. You can therefore get around their instructions by pushing the boundary in a narratively consistent way.

Narrative jailbreaking for fun and profit

Posted on Monday 23 Dec 2024. By Matt Webb.

https://interconnected.org/home/2024/12/23/jailbreaking

Recommendations