Prompt Hierarchy
Teaches LLMs to selectively ignore lower-privileged instructions

Can prevent Prompt Injection
Instruction hierarchy from OpenAI
Prioritize platform, developer, and user guidelines in that order, reject harmful requests, and provide clear, fact-based information (Instruction hierarchy on CoT)
Claude model is good at following Instruction Hierarchy
Findings from a pilot Anthropic–OpenAI alignment evaluation exercise: OpenAI Safety Tests
OpenAI and Anthropic share findings from a first-of-its-kind joint safety evaluation, testing each other’s models for misalignment, instruction following, hallucinations, jailbreaking, and more—highlighting progress, challenges, and the value of cross-lab collaboration.
https://openai.com/index/openai-anthropic-safety-evaluation/


Seonglae Cho